Immediate Impressions on JISC’s “Course Data: Making the most of Course Information” Funding Call

Notes on the JISC Grant Funding Call 8/11: “Course Data: Making the most of Course Information” Capital Programme – Call for Letters of Commitment

This post builds on quick commentaries around other reports in the area of Higher Education course data: Immediate Thoughts on the “Provision of information about higher education” and Getting Access to University Course Code Data (or not… (yet…)). It doesn’t necessarily represent my own opinions, let alone those of my employer.

1. The Joint Information Systems Committee (JISC) and the Higher Education Funding Council for England (HEFCE) invite English Universities and FE colleges (teaching over 400 HE FTEs) to become involved in a new programme of work which will help prepare the sector for increasing demands on course data.

3. Funding is available for projects starting from Monday 12 September 2011 for an initial period of approximately three months. Projects selected to go forward into Stage 2 will continue for an additional 12 to 15 months. All projects must be complete by 29 March 2013.

So how does this fit with the timeline for HEFCE Key Information Set (KIS) development if the called for work is relevant to that? (Note: HEFCE makes available much of the monies disbursed by JISC, and HEFCE is managing the KIS work directly.)

As soon as possible and not later than the end of September 2011: Technical guidance published by HEFCE
January to March 2012: Submission system open for KISs to be published in September 2012: Institutions submit their data to
June to early July 2012: 2012 NSS and DLHE data available to HEFCE
July to August 2012: HEFCE merges data submitted by institutions with 2012 NSS and DLHE data. Institutions quality check and sign off their final
September 2012: KISs available for institutions to upload. All KISs to be accessible via institutional web-sites by the end of the month

[HEFCE: Provision of information about higher education]

So given the timings, the JISC second phase work looks as if it is supporting processes relating to, and publication of, different sorts of data to the KIS data, although phase 1 work may be relevant to KIS releases?

10. There are 3 main drivers for making it easier for people to find and compare courses:
– prospective fee paying students want to know more about the academic experience a course will provide and be able to compare this with other courses;
– better informed students are more likely to choose a course that they will complete, and be more motivated to achieve better results;
– increased scrutiny by quality assurance agencies and the Government’s requirement for transparency of publicly funded bodies.

11. JISC have made it easier for prospective students to decide which course to study by creating an internationally recognised data standard for course information, known as XCRI-CAP which is conformant with the new European standard for Advertising Learning Opportunities. This will make transferring and advertising information about courses between institutions and organisations more efficient and effective. Placing this data at a consistent COOL URI makes it easier to find.

So there are two end-user groups in mind for the course related information: prospective students, and the scrutineers. XCRI-CAP relates to the publication of information describing at a high level the subject content of a course, rather than the sorts of “metadata” around courses that the KIS will provide. If we were building a course comparison website, the XCRI-CAP data might provide course descriptions relating to a course, whereas the KIS data would provide student satisfaction ratings, teaching hours, assessment strategies, graduate employment rates and salaries. Pricing related information might be common to both sets?

[Images: KIS example; KIS example 2. Example of what the KIS display might look like.]

Within the university website, developers will be required to identify which course a page relates to, and then call in the appropriate KIS widget from HEFCE or its agent, presumably by passing parameters relating to: institution identifier; course identifier.
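As a rough sketch of what that call might look like from the university side: the widget address and the parameter names below are pure invention on my part (nothing has been published yet), as are the example identifiers. The only point is that the page template needs to know two things, the institution identifier and the course identifier.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical endpoint: no real KIS widget address has been announced.
KIS_WIDGET_BASE = "https://kis-widget.example.org/embed"


def kis_widget_url(institution_id, course_id):
    """Build the URL a course page might use to call in its KIS widget.

    The parameter names 'institution' and 'course' are assumptions.
    """
    query = urlencode({"institution": institution_id, "course": course_id})
    return f"{KIS_WIDGET_BASE}?{query}"


def kis_widget_iframe(institution_id, course_id):
    """Wrap the widget URL in an iframe snippet for a course page template."""
    # Escape the URL so the '&' in the query string is valid inside an attribute.
    url = escape(kis_widget_url(institution_id, course_id), quote=True)
    return f'<iframe src="{url}" title="Key Information Set"></iframe>'


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(kis_widget_iframe("inst-a", "C100"))
```

Whatever the eventual mechanism, the course page has to carry the (institution, course) pair somewhere a third party can also get at it.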

In order to display both XCRI-CAP style data and KIS data on the same third party site web page, the third party will need to be able to identify the course identifier and the university identifier. It will also need a way of identifying which course codes are offered by each institution. In order to satisfy requests from potential applicants searching for a particular topic anywhere in the country*, the third party would ideally have access to an index (or at least a comprehensive list either of courses for each institution, or of institutions by course) that allows it to identify and return the set of (institution, course) pairs for which the course satisfies the search term. (Alternatively, for every request, the third party could query every university separately for related courses, aggregate these responses, and then annotate each result with a link to the corresponding KIS information, or its widget.) If the aggregator were to offer a service whereby potential applicants could rank each result according to one or more KIS data elements, it would need to associate the KIS data for each course with the corresponding (institution, course) pair, and then use this aggregated data set to present the result to the end user. Again, this could be achieved by making separate requests to the KIS information server, once for each (institution, course) pair; or it could draw on its own index of this data if the information was openly licensed.

* when thinking about course selection, I often have four scenarios in mind: a) I know what course I want to do and where I want to do it; b) I know where I want to go but don’t know what course to do; c) I know what course I want to do, but don’t know where to do it; d) I don’t know what course to do or where to do it…
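To make the aggregator idea a bit more concrete, here is a minimal sketch of the index-then-rank approach, assuming the aggregator holds its own copy of both the course list and the KIS data. All the identifiers, titles, field names and figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    institution: str  # hypothetical institution identifier
    code: str         # hypothetical course identifier
    title: str


# Made-up entries standing in for a comprehensive course catalogue.
CATALOGUE = [
    Course("inst-a", "C100", "Robotics and Artificial Intelligence"),
    Course("inst-b", "R200", "Robotics"),
    Course("inst-b", "H300", "History"),
]

# Made-up KIS values keyed by (institution, course) pair.
KIS = {
    ("inst-a", "C100"): {"satisfaction": 82},
    ("inst-b", "R200"): {"satisfaction": 91},
    ("inst-b", "H300"): {"satisfaction": 88},
}


def search(term):
    """Return the (institution, course) pairs whose title matches the term."""
    term = term.lower()
    return [(c.institution, c.code) for c in CATALOGUE if term in c.title.lower()]


def rank(pairs, element):
    """Order pairs by a KIS element, highest first; pairs with no data go last."""
    def sort_key(pair):
        value = KIS.get(pair, {}).get(element)
        return (value is None, -(value or 0))
    return sorted(pairs, key=sort_key)


if __name__ == "__main__":
    print(rank(search("robotics"), "satisfaction"))
```

The alternative (querying every university and then the KIS server once per result) would replace the two lookups above with a fan-out of network requests, which is exactly why a locally held, openly licensed index looks attractive.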

Just by the by, I wonder if the intention of the HEFCE technical working group is to come up with a structured machine readable standard for communicating the KIS information via the KIS widget? That is, will content represented via the KIS widget be marked up in semantic form, or will semantics at the data representation level have to be reverse engineered from the presentation of the information? Where the KIS renders graphical elements, will the charts be generated directly from data transported to the widget, or will the provision simply be flat image files? (Charts displayed in widgets can come in three flavours: 1) as a flat image file with an arbitrary URL (e.g. kisDataImage4.png), noting that data underlying the graph may be described in surrounding metadata, such as within img tag attributes; 2) as an image file generated from data contained within the URL (e.g. as in the mechanism used by the Google Charts API); 3) through the enhancement of data contained within the page (for example, in a Javascript data structure or an HTML table).)
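If the widget took the third route and enhanced an HTML table already in the page, getting the data back out would be straightforward. The sketch below uses only the Python standard library; the table markup and the percentages are invented, since nothing is known yet about how the KIS widget will actually be marked up.

```python
from html.parser import HTMLParser


class TableData(HTMLParser):
    """Collect the text of each table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return the table contents as a list of rows, each a list of cell strings."""
    parser = TableData()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented markup and figures, for illustration only.
SAMPLE = """
<table>
  <tr><th>Activity</th><th>Year 1 (%)</th></tr>
  <tr><td>Scheduled learning and teaching</td><td>30</td></tr>
  <tr><td>Guided independent study</td><td>70</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in table_rows(SAMPLE):
        print(row)
```

With a flat image file and no surrounding metadata (the first flavour), none of this works and the numbers would have to be recovered some other way.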

The KIS data only partially overlaps with the XCRI-CAP data, so I wonder: to what extent will it be possible to JOIN the two data sets (that is, how will we be able to link XCRI-CAP and KIS data? Via HEI+coursecode keys, presumably?)
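Assuming both data sets do end up carrying the same HEI and course code identifiers (which is the open question), the JOIN itself is trivial. A minimal sketch, with made-up keys and field names:

```python
# Made-up XCRI-CAP style records: descriptive information about each course.
xcri = {
    ("inst-a", "C100"): {"title": "Robotics and Artificial Intelligence"},
    ("inst-b", "R200"): {"title": "Robotics"},
}

# Made-up KIS style records: the "metadata" around each course.
kis = {
    ("inst-a", "C100"): {"satisfaction": 82},
    ("inst-c", "P300"): {"satisfaction": 75},
}


def join_course_data(left, right):
    """Inner join: keep only (institution, course code) keys present in both sets."""
    shared = left.keys() & right.keys()
    return {key: {**left[key], **right[key]} for key in shared}


def unmatched(left, right):
    """Keys that appear in only one of the two sets, i.e. records that fail to join."""
    return sorted(left.keys() ^ right.keys())


if __name__ == "__main__":
    print(join_course_data(xcri, kis))
    print(unmatched(xcri, kis))
```

The interesting output is probably the second one: the size of the unmatched list would tell us how well the two identifier schemes actually line up.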

12. The proposed programme will support the sector to prepare for the increasing demand for course information, and increase the availability of high-quality, accurate information about part-time, online and distance learning opportunities offered by UK HEIs by:

– funding institutions to make the process and technical innovations necessary to release a structured, machine-readable feed of their course-related information, and;
– creating a proof-of-concept aggregator and discovery service to bring together this course information and enable prospective students to search it.

So – what I think the JISC are suggesting is that they are looking to fund work on the “wider information set” of information around courses? That JISC are also looking to create a “proof-of-concept aggregator and discovery service to bring together this course information and enable prospective students to search it” sounds interesting. I wonder how this would sit in the context of:

  1. UCAS (which currently concentrates course listings as a basis for a single point of application for entry; how will entry work for the private universities? Cf. also the OU, which has only just started to make use of the UCAS entry route, and which also supports a significant direct entry route onto modules?)
  2. third party services such as Hotcourses
  3. custom search engines such as CourseDetective, which search over online course prospectuses (and which cost approx. 2 volunteered FTE days to put together at a hackday…;-)

It’s also worth bearing in mind that my TSO OpenUp competition entry also suggested the opening up of course code scaffolding data so that third parties could start to create aggregated and enriched datasets around courses, as well as building services on top of that data that would potentially be revenue generating and commercially sustainable…

Just on the topic of “wider information sets”, here’s what the HEFCE KIS consultation report had to say on the matter:

The wider information set
32. Higher education providers already publish a wide range of information about their institution and the courses they deliver. The information published has been considered by QAA in the context of institutional audit (for publicly funded higher education institutions and those privately funded providers that subscribe to QAA) or of Integrated Quality and Enhancement Review (for further education colleges (FECs) offering HE courses) and is subject to a ‘comment’ in that context. The consultation proposed that institutions should make this information more public-facing, noting that published information would, in due course, be subject to a judgement in QAA review processes.

33. It was proposed that this wider information set has two purposes: to provide information about higher education to a wide variety of audiences including:
prospective and current students; students’ parents and advisers; employers; the media; and the institution itself to form part of the evidence used in QAA audit and review.

34. The required information set was presented in the consultation document as a minimum requirement, with institutions continuing to publish as much other information as they wished. Institutions were asked to consider whether any of the information could be presented in more accessible ways.

Information about aspects of course/awards (not available in the KIS):

Information to be provided:
– prospectuses, programme guides, module descriptors or similar programme specifications;
– results of internal student surveys;
– links with employers – where employers have input into a course or programme (this could be quite a high-level statement);
– partnership agreements, links with awarding bodies/delivery partners.
Level of information: Course/programme level.
Availability: All apart from results of internal surveys to be publicly available. Results of internal surveys should be available internally.

[HEFCE: Provision of information about higher education]

If there is such pent-up demand for aggregated course discovery services, then they should also be able to run as commercial services? One thing that I would argue currently limits innovation in this area is access to a comprehensive qualification catalogue across the UK. UCAS do have this data, and they do sell it. But I want to play with it and see if I can build a service round it, rather than deciding to quit my job, raise finance, buy the data from UCAS and then see if I can make a go of building a commercial service around the data. UCAS would still benefit from traffic driven to the UCAS site for course registrations. (But then, if aggregators were also aggregating information about courses in the private sector that supported direct entry and did not require central applications and clearing, aggregators might also start recommending courses outside the scope of UCAS…? Hmmm… Because the private universities would probably provide a commercial incentive to drive traffic to them in the form of affiliate fees based on registrations resulting from referrals… Hmmm… This is all starting to put me in mind of things like FOTA, Formula One and the FIA…!)

Another route to a comprehensive course catalogue is through indexing catalogue feeds (akin to website sitemap feeds that detail all the pages on a website to make it easy for search engines to index them) published directly by the universities, such as XCRI-CAP feeds…

13. The availability of useable course data feeds, and the demonstration of the proof-of-concept aggregator, is intended to provide a catalyst to the feeds being used within existing aggregators, catalogues or information, advice and guidance services, or to form the basis of new services.

I’m not sure an incentive is required.. just open access to the data, free in the first instance. (And if companies do start to make money from it, then license fees can kick in. I don’t think people would have a problem with that…)

15. Between September 2011 and March 2013, JISC intends to fund projects that help institutions review and adapt their internal processes to permit easier access to their course data to meet the needs of various stakeholders. As a minimum, and to provide a clear focus for this overarching activity, the programme will concentrate on the implementation of an XCRI-CAP standard system-generated feed. The programme will be staged to ensure maximum benefit is achieved.

If this data is already exposed via online course prospectuses, a developer with data scraper in hand could probably get a large chunk of this data anyway over the next three to six months. (The CourseDetective CSE definition file already provides a basis for anyone wanting to spider university course catalogues… Hmmm… maybe that’s a good reason for me to get to grips with Lucene…? Ideally, course prospectuses would also produce a sitemap (or XCRI) feed providing URLs for all the course pages currently published via the online prospectus to make it easy for third parties to index, or harvest, this data. The provision of semantic markup in a page, whether through RDFa, microformats, microdata or metadata would also simplify the scraping (i.e. machine parseability) of the course pages. At the very least, using template based, sensibly structured presentation markup that enforces markup conventions that suggest de facto semantics makes pages reliably scrapeable and provides one way of supporting the harvesting of data (if license conditions allow…)) Because, of course, a major reason why potentially commercial services don’t just scrape the data to build course comparison sites relates to the licensing/copyright restrictions that may exist, deliberately or by default, over the university prospectus data that is published online… (Not everyone’s a pirate;-)
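By way of illustration, here is roughly how little work a sitemap feed leaves for the harvester. The sketch assumes a prospectus publishes a standard sitemap and that course pages live under a /courses/ path; the example.ac.uk URLs and that path convention are my own inventions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def course_urls(sitemap_xml, path_prefix="/courses/"):
    """List the page URLs in a sitemap whose path starts with path_prefix.

    The '/courses/' convention is an assumption; real prospectuses will differ.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [u for u in urls if urlparse(u).path.startswith(path_prefix)]


# Invented sitemap for a fictional institution.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.ac.uk/</loc></url>
  <url><loc>https://www.example.ac.uk/courses/robotics-bsc</loc></url>
  <url><loc>https://www.example.ac.uk/courses/history-ba</loc></url>
  <url><loc>https://www.example.ac.uk/about/contact</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in course_urls(SAMPLE_SITEMAP):
        print(url)
```

Fetching and parsing each of those pages is then the scraping step proper, and that is where the licence conditions start to bite.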

16. In Stage 1, institutions will review the maturity of their management of course data using the XCRI Self Assessment Framework. This could cover the full course data life cycle, but must include a particular focus on prospectus and course advertising information. Based on the outcomes of this review, institutions will produce an implementation plan for how they will improve processes to, as a minimum, create a system-generated course advertising feed in a XCRI CAP 1.2 format with a COOL URI.

Ah, ha….

So I wonder, would JISC indemnify a third party looking to scrape, aggregate, and republish this data in a standard form via an open API and a permissive license, against actions taken against them by UCAS and the universities for breach of copyright?! I also wonder whether JISC will be providing guidance about what license conditions they expect XCRI-CAP data to be published under? Or is that out of scope?

19. The anticipated outcomes from this programme of work are:
– There will be increased usage of appropriate technology to streamline course data processes leading to:
— More standardised, and therefore comparable, course information in a consistent location making discovery easier.
— Improved quality and therefore more efficient and effective course data.
— Increased ease in finding and comparing courses, especially types of courses that are currently hard to find, such as ones delivered by distance learning.

– Institutions are able to make appropriate and informed decisions about their processes for managing course-related data, leading to a reduced administrative data burden, cost-effective working, and better business intelligence.

Ah… this is actually different to getting the data out there, then, in a way that third parties can use it? It’s more about tweaking systems and processes inside the institution to support the provisioning of data in ways that make it more accessible to third party aggregators? The course aggregator is then a red herring – it’s just there to provide a reference/candidate client/consumer against which the released data can be targeted.

25. There will be a support and synthesis project that will be working with projects from the start of the programme to help them shape their implementation plans in Stage 1 and other outputs in Stage 2 that are of most use to the sector. Projects are expected to engage with the support and synthesis project and to be proactive in sharing outputs throughout the project. This information will be synthesised and shared with the sector; where that information is sensitive, it will be shared in an aggregated, anonymised form.

A “support and synthesis project” within JISC presumably, (i.e. run by the usual suspects)? Rather than sponsoring and indemnifying the open data community on the one hand, or encouraging potential startups on the other, to start building user facing (potential student) services, along with the necessary business model that will make them sustainable, and maybe even profitable?

26. Funding is provided to enable institutions to carry out project work, but also to release key staff to prepare for, take part in and follow up on these programme-level activities. Projects should allow at least 5 person-days in Stage 1 and 10 person-days in Stage 2.

Such is the price of funding HE based developer activities. 5 days project work: £10k. 10 days project work: £40k-80k. So now you know…

Immediate Thoughts on the “Provision of information about higher education”

Some immediate thoughts on reading the “Provision of information about higher education” consultation report. Note that the opinions expressed below may not even belong to me, let alone my employer. (They’re just imaginings… or nightmare visions…)

What I still need to do is try to find out how the requirement to provide KIS data over the coming months fits in with JISC’s current Grant Funding Call 8/11: ‘Course Data: Making the most of Course Information’ Capital Programme – Call for Letters of Commitment which is “designed to ensure a high number of engaged institutions, which is vital to get the critical mass needed to effectively demonstrate to the sector the huge potential of organising and presenting course information in a standardised way.” (The initial call is for £10k for each eligible UK HEI, and a second tranche of £40-80,000 for each of 80 or so plan execution projects. “Do the math”, as they say…) I don’t know how much HEFCE intend to give to UK HEIs to help underwrite the roll out of KIS (a fair chunk will go to the vendors that provide enterprise software to the HEIs, I guess..?) but I imagine that that will be a not insignificant sum. I just wonder what we’d have been able to do if we’d managed to get hold of the set of course code data that corresponds to the courses offered by UK HEIs? If UCAS would just relax their license conditions, I’m guessing we could even scrape the data and they wouldn’t even have to work out how to drop the corresponding table and make it available in some way… But if we respect their license conditions, we’re *****d.
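For anyone who does want to do the math on the JISC side, a back-of-the-envelope version follows. It takes "80 or so" as exactly 80 projects, which is an assumption, and leaves the number of institutions in the first tranche as a parameter because I don't have that figure.

```python
def stage2_range(projects=80, low=40_000, high=80_000):
    """Lower and upper bound, in pounds, for the second tranche.

    projects=80 treats "80 or so" as exact, which is an assumption.
    """
    return projects * low, projects * high


def stage1_total(institutions, grant=10_000):
    """Total first tranche, in pounds, for a given number of funded institutions."""
    return institutions * grant


if __name__ == "__main__":
    low, high = stage2_range()
    print(f"Stage 2: £{low:,} to £{high:,}")
```

On those assumptions the second tranche alone comes to somewhere between £3.2m and £6.4m.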

1. This is a joint publication by HEFCE, Universities UK (UUK) and GuildHE, setting out how it is intended to improve the accessibility and usefulness of information about higher education (HE).

Who says what’s useful?

6. Universities and colleges should publish Key Information Sets (KISs) for undergraduate courses, whether full- or part-time. These KISs will contain information on student satisfaction, graduate outcomes, learning and teaching activities, assessment methods, tuition fees and student finance, accommodation and professional accreditation.

A lot of this data is already available as public data from original sources, or via curated datastores such as the Guardian Datablog. What is lacking at the current time is the scaffolding that lets us create resources capable of spanning the sector at qualification level. Some time ago, I described a simple visual application for comparing summary statistics relating to satisfaction, fees, salary levels and so on across UK universities (Does Funding Equal Happiness in Higher Education?). That was a first step. The second step was to try to start building up information from the course level and begin using that as the focus for comparisons (as well as building out other services, such as book recommendations related to courses). Which was in part why I entered the TSO OpenUp competition.

Through HESA subject codes (which structure subject areas into a three level categorisation), it is possible to compare statistics relating to broad teaching subject offerings across multiple different providers within a particular topic area. Cross-relating teaching subject areas to research areas is still an ad hoc process though, as is obtaining research funding data from across the UK research funding councils and agencies, let alone trying to relate it to teaching subject areas. (Exploiting research for teaching is one of the claimed benefits of undergraduate study; maybe through making accessible an easy way of comparing the amount of research funding provided to particular institutions in different subject areas and the related teaching areas we might get a better handle on the actual relationship that exists between teaching and research excellence?)

Nor is it a simple matter to compare, in detail, the qualifications across teaching providers within a geographical area. The only place that currently describes all the current UK HE qualifications on offer each year is the UCAS website, which also acts as a gateway to applications to HE. One of the key considerations when developing comparison services is the extent to which a service can provide comprehensive coverage over the range of offerings that are being compared. In a very real sense, a comprehensive catalogue of offerings provides the key infrastructure that innovative third parties can build upon. By enriching and annotating a common, core dataset, vendors can develop differentiated services whilst maintaining a level of consistency between them (i.e. the services become comparable). An opportunity also arises for vendors to offer business to business services over that core data set.

The provision of a common, key information set about each course/qualification within a university can thus be picked apart as follows:

– firstly, that there exists a comprehensive directory of courses;
– secondly, that for each course, there exists a common set of data attributes, aligned to a common scale;
– thirdly, that the information is provided in a consistent way so as to “support” comparison.

As I have already mentioned, there is a significant amount of data available in public through open licenses that could already be used for the provision of comparison services. What is missing is the scaffolding – the complete course catalogue – that allows this to be done reliably across the sector.

(There is also arguably a lack of opportunity in certain areas for business development. One model might see comparison services acting almost in the role of “independent” educational advisers, helping guide a potential student to an informed choice, and reaping some benefit from that process. For example, let us crudely model the student application lifecycle as: discovery (where to go/what to do), application, study, completion/graduation, employment. In the discovery phase, services might sell advertising, and pick up affiliate fees for prospectus requests for example. In a mature market, the application phase might also accommodate affiliate or referral fees, for example, based on encouraging applications, or even better, accepted and taken up applications. The financial services industry, for all its sins, supports a variety of models for repaying an agent who signs up a client to a longstanding financial product, replete with bonuses and other incentives that encourage the agent to find a product that the client will actually stick with. On completing a degree with a given grade, the agent may get a bonus. (Retention initiatives can start early, arguably before the student even accepts a place at university, through helping them make a decision regarding a course that is likely to suit them!) If you can imagine that universities might set up as recruitment agents, taking a fee for placing a graduate in a particular job on graduation, it’s not hard to also imagine that a bonus might be paid from that placement fee to the agent responsible for referring the undergraduate applicant, as was, in the first place.)

13. Institutions will be required to submit data to HEFCE for inclusion in the KIS. Institutions who subscribe to the QAA but who do not currently take part in the NSS and DLHE should take steps to do so.

So a data burden will be placed on institutions to provide information in a standard way to a central aggregating service? Will there be an opportunity for HEIs to publish this data via an open API, and allow HEFCE to pull/harvest the data from there? Or will the data be required to pass from the HEI, through HEFCE so that HEFCE can put a stamp of approval on it, before it is allowed to be branded as part of the institution’s KIS?

14. All KISs should be made available via institutional web-sites by the end of September 2012.

But will the KIS data also support services that allow the direct comparison of KIS data across institutions on first (university), second (HEFCE, UCAS, Unistats, etc.) and third (commercial, or not-for-profit) party sites without having to visit each of those institutions separately?

22. The plans are based on extensive research, consultation and pilot processes. We are very grateful to all who have given their time and views so generously. There were 215 responses to HEFCE 2010/31, all of which have been carefully considered. We have also taken into account: the views of 2,000 prospective and current students on useful information; several expert working groups considering specific parts of the KIS; a pilot with eight institutions; and user testing with more than 200 prospective HE students. We are particularly pleased to have engaged closely with the National Union of Students in this project, and to have received consultation responses from 30 student unions. We have also liaised with the Academic Registrars’ Council, in an attempt to ensure that the next steps are both feasible and proportionate to implement.

I wonder: did they also consult with open data advocates or web development companies who are familiar with putting data to work in a customer-facing, value adding way? To my shame, I didn’t respond – I came across the consultation after it had closed. (Which suggests the consultation didn’t reach out into that part of the open data community I inhabit? Or maybe I did see it and missed/didn’t pick up on the significance of it at the time:-( )

27. The consultation made three primary proposals which are summarised in this section. The first question focused on the purposes of providing information about HE. Responses broadly agreed that information about HE has three purposes:
– to inform people about the quality of higher education and, in particular, to give prospective students information that will help them choose what and where to study
– as evidence for quality assurance processes in institutions
– as information that institutions can use to enhance the quality of their HE provision.

29. The consultation proposed that universities’ and colleges’ web-sites should use a standardised way of publishing key pieces of information about each undergraduate course they offer, by using KISs.

30. KISs would make it easier to find information that prospective students have identified as important to their decisions, and which is mostly already available. The categories of information were identified during research undertaken with 2,000 prospective students, current students and careers advisers by Oakleigh Consulting and Staffordshire University.

So the implication here is that I can compare the data, because each university will separately publish a standard set of data in the same format. So to compare 14 different courses across 8 universities, I probably need to have 14 browser windows open on the same screen at the same time?!

31. In parallel to the consultation, a programme of KIS development work was undertaken. This looked specifically at the information items that do not currently exist in a national comparable format (about learning and teaching, assessment, professional accreditation and accommodation costs) and piloting the processes institutions need to undertake to provide these data. There were also user tests with prospective students. For further details see Annex A.

So the consultation looked at what sort of data might be used to enrich the core data set. One might argue that if the core, course data set were available, third party comparison services might already have started to explore various ways of annotating, enriching and pivoting around the data?

36. The principle of the KIS is that it presents information we have identified that prospective students find useful, in a place we know they already look for such information. In summary, this is information on study, satisfaction, costs and employability, presented on the course information sections of institution’s own web-sites.

“[I]n a place we know they already look for such information”: you could read that as being anti-competitive…? I’d also argue that it doesn’t support the ability to make comparisons. I assume that energy suppliers and mobile phone operators publish similar sorts of information about tariffs on their websites? Why, then, do comparison sites exist?! I’d argue it’s not because they don’t have KIS tables on their sites (though that may contribute). Rather, it’s easier to make a comparison across sites in the context of a single location. (And here, I fear, I start to smell a trap… Because “a place we know they already look for such information” exists in the form of UCAS…)

46. There will be three categories of learning and teaching activities:
– scheduled learning and teaching activities
– guided independent study
– placement/study abroad.

47. Information on these will be presented in a bar chart, as a proportion of hours, on a year-by-year basis, showing each year/stage of study, rather than aggregated for the course as a whole. For KISs relating to part-time study, three bars should also be provided for a standard undergraduate course, each referring to the time equivalent to one year of study if studied full-time.

48. In the interest of providing as much relevant information to the user as possible, a web-link would follow that would lead users to more detailed information. This might be the programme specification or other document, but we would expect this would present more detailed information about learning and teaching, for instance possibly module-level contact hours. This would provide useful contextualised data – something that was a strong theme emerging from consultation responses.

Being able to reliably identify links to programme specifications could be really handy, e.g. for things like the Course Detective approach to custom search engine development…?

67. The salaries for all institutions data will be adjusted to account for regional variations in the salaries earned by graduates in different parts of the country. A link from the KIS to institutional web-sites will enable institutions to provide additional contextual information with particular reference to the different circumstances of different employment sectors (for example the creative industries.)

I can see this causing all sorts of problems when it comes to offering comparisons?

92. Information derived from the NSS and DLHE survey will be presented at course level if sufficient data are available; otherwise NSS and DLHE data will be presented at the most detailed level possible of the Joint Academic Coding System (JACS), subject to the surveys’ response rates and threshold requirements. This information is held by HEFCE and HESA for publicly funded institutions and others that subscribe to HESA.
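
The fallback described in paragraph 92 might be sketched along these lines. The level names, the ordering and the threshold value are all my assumptions; the quoted paragraphs don't give the actual threshold numbers.

```python
# Hypothetical aggregation levels, most detailed first. The names are
# placeholders, not official JACS terminology.
LEVELS = ("course", "jacs_detailed", "jacs_level_1")


def pick_level(respondents, threshold, levels=LEVELS):
    """Return the most detailed level whose respondent count meets the threshold.

    `respondents` maps a level name to the number of survey responses
    available at that level. Returns None if no level qualifies.
    """
    for level in levels:
        if respondents.get(level, 0) >= threshold:
            return level
    return None


# Made-up counts and a made-up threshold of 20, for illustration only.
example = pick_level({"course": 8, "jacs_detailed": 15, "jacs_level_1": 60}, threshold=20)
```

With those invented numbers the course-level and detailed-subject counts both fall short, so the data would only be shown at the broadest subject level.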

If data describing UK HE courses were freely available, work could already have started on this…?

93. Annex C provides a detailed breakdown of the expected coverage of the KIS for HEIs, but in summary:
a. The data thresholds we intend to apply to the NSS and DLHE data (which mirror the thresholds we apply on Unistats) mean that roughly one in seven single subject, full-time, first degree KISs will have both DLHE and NSS data available at course level, although in some cases the data presented may need to be aggregated across two years. However, over 95 per cent of KISs will be able to present DLHE or NSS data, or both, when data are included that is aggregated to JACS level 1 and across two years.
b. We expect that about 2 per cent of single subject, part-time, first degree KISs will have full data available; this rises to about 35 per cent when data are included which are aggregated to JACS level 1 and across two years.
c. We expect the KISs where full data are available to cover about 40 per cent of the student body; after allowing for aggregation, the proportion where some data are available is likely to cover over 90 per cent of the student body.

One argument against making a comprehensive course catalogue available under an open public license is that if it were to be used as scaffolding for aggregating different, comparative data sources, lack of coverage over the whole course listing would be confusing and offer a poor user experience. Err…? “[R]oughly one in seven single subject, full-time, first degree KISs will have both DLHE and NSS data available at course level”. So that reason isn’t a deal breaker, then?!

95. We recognise that, even aggregating data over years or over JACS levels, there will be, as on Unistats at present, a number of courses for which it will not be possible to provide data derived from the NSS or DLHE due to the small size of the student cohorts concerned. The thresholds for publication reflect both the need to ensure the statistical validity of the information and the need to meet data protection requirements. There will still be elements of the KIS that will be useful to prospective students, but we recognise the need to ensure prospective students do not negatively interpret the absence of data. We will undertake further user testing over the next few months to finalise appropriate explanatory text.


98. Consideration has been given to who should undertake the production of the KISs, and how. Requiring individual institutions to create their own KISs was considered, but it was felt to be problematic because it would place a significant burden on individual institutions and would pose a challenge in controlling the quality of – potentially – several hundred different production processes, hindering the creation of a single, uniform and credible information source. This task therefore needs to be undertaken by a single body.

So institutions are not going to have a new data burden placed on them?

99. The first year of KISs (those to be published in September 2012) will be centrally created by HEFCE in partnership with HESA. From year two onwards it is intended that central creation will pass to HESA.

100. In the first year, HEFCE will draw data from the NSS and DLHE and institutions will provide additional data (as set out in Table 1). Once this has been collated, HEFCE will provide institutions with web code to be inserted appropriately on their own web-sites.

Hmmm… when I won the TSO Open Up competition, the plan was to get UCAS course code data and then start annotating and enriching it howsoever we could. The reason why I wanted the UCAS data is that it provides the scaffolding to build from. The user focus is the course, so it made sense to build up views over the data from the course level. (We could have started trying to build services out at the level of HESA codes, but that wasn’t what the prize was awarded for.) During the competition pitch, I made the claim that course code data was akin to postcode data to the extent that rights over the seemingly most useful identifier space were controlled by a restrictive license. I don’t yet know what services I want to build out over the course code space, but why is that a reason to prevent innovation in the development of services around course codes by locking those codes down?

103. In order for KISs to be published during September 2012, for use by applicants for entry in academic year 2013-14, institutions must submit their data returns to HEFCE by summer 2012.

So the data burden is on the universities?! But the aggregation – where the value is locked up – is under the control of the centre? Hmmm… thinks… SCONUL charge 80 quid (?) for their aggregated report on HE library stats data, but I’ve managed to FOI the return made to SCONUL by individual libraries. So if there is a KIS-like return from HEIs to HEFCE, it should be FOIable, and we can create a copy of the aggregate by aggregating the responses to FOI requests. Hmmm…

105. In the main, we would expect the KIS to be revised at most annually; however, a system will be set up to enable exceptions to be processed, for example, corrections to be made or financial information updated. More detail will follow in the technical guidance.

Another of the arguments I’ve heard – this time from universities – to explain a university’s unwillingness to publish a course data API or data dumps is that a third party that aggregates data from universities may end up with data that is stale or out of sync with data on the university website. I suspect that a third party would be quicker to respond to changes than once every 52 weeks…

106. HEFCE is in discussion with the primary providers of institutional data management software to ensure that the new data requirements for the KIS can be incorporated into existing applications as soon as possible.

So how much do we think the third party software vendors are going to charge to make the changes to their systems? And hands up who thinks that those changes will also be antagonistic to developers who might be minded to open up the data via APIs. After all, if you can get data out of your commercially licensed enterprise software via a programmable API, there’s less requirement to stump up the cash to pay for maintenance and the implementation of “additional” features…

107. The KIS will have a strong brand, including a unique logo. This is to ensure that the KIS is as engaging to users as possible, as well as distinguishing it from any other information sources available.

…which sounds to me like someone’s twigged there may be value locked up in the data, and they’re not willing to let it go…

108. A core feature of the KIS is that it is standardised and comparable across HEIs, with consistent branding and presentation. Therefore, in order to avoid confusion, institutions should not publish a document called the KIS or with the KIS logo for any courses where not required.

Brand police… Total ownership. We can haz ur data; we pwn ur data.

110. It is likely the KIS for each course will be available through an embedded ‘widget’ on the institution’s web-site. We do not intend to be prescriptive about where on the web-site this should appear, other than that it should be found near other course information. The widget would contain three items of top-line information, and the option to click through for the full KIS.

Hmmm… did somebody just discover widgets?! So the idea here is to control the brand through a KIS branded widget that can be embedded on University websites?

The obvious question to an open data freak would be: will there be a freely available open API with that, and will the data made available through the API be openly licensed?

The open systems advocate in me would also wonder whether there might be mileage in each institution publishing its own data in an open way via an open API that could be harvested by the central HEFCE aggregator or by a third party. In addition, the KIS data would be available as a service within the institution to institutional developers.
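
To make the harvesting idea a bit more concrete, here’s a minimal sketch of what an aggregator (central or third party) might do with per-institution feeds. Everything in it is an assumption: the feed format, the field names and the institution identifiers are invented, and the feeds are inline JSON strings rather than real API calls.

```python
import json

# Hypothetical per-institution feeds, as each institution's open API might
# return them. The identifiers, course codes and fields are all invented.
feeds = {
    "inst-a": '[{"course_code": "X100", "title": "Example Course One"},'
              ' {"course_code": "X200", "title": "Example Course Two"}]',
    "inst-b": '[{"course_code": "X100", "title": "Example Course Three"}]',
}


def harvest(feeds):
    """Merge per-institution course records into one catalogue.

    Records are keyed by (institution, course_code) so that identical
    course codes at different institutions do not overwrite each other.
    """
    catalogue = {}
    for institution, raw in feeds.items():
        for record in json.loads(raw):
            catalogue[(institution, record["course_code"])] = record
    return catalogue


catalogue = harvest(feeds)
```

The same `harvest` function could be run inside an institution against its own feed, which is the "available as a service to institutional developers" part.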

113. In HEFCE 2010/31 we suggested that the KIS should be accessible from the UCAS web-site. Although it was pointed out that not all applications go through UCAS, there was broad support for this approach in the consultation and discussions with UCAS are continuing. UCAS is keen to link the KIS to its site and to explore the possibility of incorporating a comparison function into its planned ‘course finder’ facility, for all courses there are KISs for (including part-time courses), not just those they process applications for.

So HEFCE want to run a data service…?! Will it be an open data service? Or are HEFCE going to get a copy of the course code scaffolding grail and use it to act as infrastructure for a data service that aggregates and re-presents data that is in part already largely available, albeit in a less structured way, via a branded and content controlled widget?

114. We would also like to work with other organisations that provide student information on HE and other related careers guidance. We are keen to promote and publicise the KIS through the various student web-sites and social media outlets that exist.

So will third parties be encouraged and supported in developing their own takes on enriched KIS data?

116. Because KISs will be created centrally, a central database of KISs will be available. HEPISG needs to consider how to use this information, recognising the Government’s intention that data on publicly funded provision should be available for general use. More information will be published on the HEFCE web-site in due course.

Ah – so the data may be available via an open public license. Tip to the HEPISG folk: why not build an API around the data, and serve the widget from that? Furthermore, by making the core course code data available as a dataset, third parties would be able to annotate and enrich that data and serve it as additional information around the “officially sanctioned” KIS data pulled from the API. Finally, a question: if third parties are going to use clustering techniques so that they can provide recommendations on “similar” courses, will they have access to the whole KIS data set so that they can run their own clustering algorithms?!

119. Currently, there is information available via Unistats that will not be available through the KIS. We do not envisage, therefore, that any changes will be made to the Unistats web-site in the KIS’ first year of operation. The focus will be on ensuring that the KIS is available on institutional web-sites as advised in the Oakleigh Consulting and Staffordshire University research, with links to, and from, the UCAS web-site.

So if students want to compare courses, they need to go to N different pages to find the KIS widget on each, and then go and fight with the Unistats website?

120. However, we recognise that, in the longer term, there will be a need to revisit the arrangements to ensure we meet the needs of students for good access to information and that we secure the best use of public money and institutional time. As we move to more established arrangements for the creation and maintenance of the KIS, and look at the use of potential sites for comparing information, we will consider the future of Unistats in the light of the wider policy environment for higher education.

Just open the course/qualification scaffolding data.

123. As well as the KIS and Unistats data, a wider set of information is to be made available by all publicly funded HEIs, FECs with undergraduate provision, and private providers who subscribe to the QAA.

This is the sort of thing third parties might be keen to develop. But to scaffold the collection and delivery of the additional data annotations, the course data could be really handy…

[The paper goes on a bit more, but it’s making me angry so I figure I need to take a break!]