Immediate Thoughts on the “Provision of information about higher education”
Some immediate thoughts on reading the “Provision of information about higher education” consultation report. Note that the opinions expressed below may not even belong to me, let alone my employer. (They’re just imaginings… or nightmare visions…)
What I still need to do is try to find out how the requirement to provide KIS data over the coming months fits in with JISC’s current Grant Funding Call 8/11: ‘Course Data: Making the most of Course Information’ Capital Programme – Call for Letters of Commitment, which is “designed to ensure a high number of engaged institutions, which is vital to get the critical mass needed to effectively demonstrate to the sector the huge potential of organising and presenting course information in a standardised way.” (The initial call is for £10k for each eligible UK HEI, with a second tranche of £40-80,000 for each of 80 or so plan execution projects. “Do the math”, as they say…) I don’t know how much HEFCE intend to give to UK HEIs to help underwrite the roll out of KIS (a fair chunk will go to the vendors that provide enterprise software to the HEIs, I guess..?) but I imagine that that will be a not insignificant sum. I just wonder what we’d have been able to do if we’d managed to get hold of the set of course code data that corresponds to the courses offered by UK HEIs? If UCAS would just relax their license conditions, I’m guessing we could even scrape the data and they wouldn’t even have to work out how to drop the corresponding table and make it available in some way… But if we respect their license conditions, we’re *****d.
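To put rough numbers on that “do the math” aside, here is a minimal sketch in Python. The tranche figures are the ones quoted from the call; the number of eligible HEIs isn’t given here, so `n_eligible_heis` (and the 150 used in the example) is purely a placeholder assumption.

```python
# Rough arithmetic on the JISC call figures quoted above.
# ASSUMPTION: n_eligible_heis is a placeholder; the real number of eligible
# UK HEIs is not stated in this post.

def jisc_call_range(n_eligible_heis, n_projects=80,
                    first_tranche=10_000,
                    second_low=40_000, second_high=80_000):
    """Return (low, high) total spend in pounds across both tranches."""
    stage_one = n_eligible_heis * first_tranche
    low = stage_one + n_projects * second_low
    high = stage_one + n_projects * second_high
    return low, high

# Second tranche alone: 80 projects at 40k to 80k pounds each.
print(jisc_call_range(0))    # (3200000, 6400000)

# With a purely illustrative 150 institutions taking up the first tranche.
print(jisc_call_range(150))  # (4700000, 7900000)
```

So the second tranche alone comes to somewhere between £3.2m and £6.4m, before counting the £10k letters of commitment.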
1. This is a joint publication by HEFCE, Universities UK (UUK) and GuildHE, setting out how it is intended to improve the accessibility and usefulness of information about higher education (HE).
Who says what’s useful?
6. Universities and colleges should publish Key Information Sets (KISs) for undergraduate courses, whether full- or part-time. These KISs will contain information on student satisfaction, graduate outcomes, learning and teaching activities, assessment methods, tuition fees and student finance, accommodation and professional accreditation.
A lot of this data is already available as public data from original sources, or via curated datastores such as the Guardian Datablog. What is lacking at the current time is the scaffolding that lets us create resources capable of spanning the sector at qualification level. Some time ago, I described a simple visual application for comparing summary statistics relating to satisfaction, fees, salary levels and so on across UK universities (Does Funding Equal Happiness in Higher Education?). That was a first step. The second step was to try to start building up information from the course level and begin using that as the focus for comparisons (as well as building out other services, such as book recommendations related to courses). Which was in part why I entered the TSO OpenUp competition…
Through HESA subject codes (which structure subject areas into a three-level categorisation), it is possible to compare statistics relating to broad teaching subject offerings across multiple different providers within a particular topic area. Cross-relating teaching subject areas to research areas is still an ad hoc process though, as is obtaining research funding data from across the UK research funding councils and agencies, let alone trying to relate it to teaching subject areas. (Exploiting research for teaching is one of the claimed benefits of undergraduate study; maybe through making accessible an easy way of comparing the amount of research funding provided to particular institutions in different subject areas and the related teaching areas we might get a better handle on the actual relationship that exists between teaching and research excellence?)
Nor is it a simple matter to compare, in detail, the qualifications across teaching providers within a geographical area. The only place that currently describes all the current UK HE qualifications on offer each year is the UCAS website, which also acts as a gateway to applications to HE. One of the key considerations when developing comparison services is the extent to which a service can provide comprehensive coverage over the range of offerings that are being compared. In a very real sense, a comprehensive catalogue of offerings provides the key infrastructure that innovative third parties can build upon. By enriching and annotating a common, core dataset, vendors can develop differentiated services whilst maintaining a level of consistency between them (i.e. the services become comparable). An opportunity also arises for vendors to offer business to business services over that core data set.
The provision of a common key information set about each course/qualification within a university can thus be picked apart as follows:
- firstly, that there exists a comprehensive directory of courses;
- secondly, that for each course, there exists a common set of data attributes, aligned to a common scale;
- thirdly, that the information is provided in a consistent way so as to “support” comparison.
As I have already mentioned, there is a significant amount of data available in public through open licenses that could already be used for the provision of comparison services. What is missing is the scaffolding – the complete course catalogue – that allows this to be done reliably across the sector.
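To make the three requirements concrete, here is a minimal sketch of what that scaffolding might look like as a data structure: a directory keyed by course code, a common set of attributes on each entry, and one function that lays courses side by side. The class, field names, course codes and numbers are all invented for illustration; none of them come from the KIS specification or from UCAS.

```python
from dataclasses import dataclass, field

@dataclass
class Course:
    code: str            # hypothetical course identifier
    institution: str
    title: str
    attributes: dict = field(default_factory=dict)  # common attribute name -> value

def compare(catalogue, codes, attribute_names):
    """Build one comparison table: a row per course, a column per attribute.
    Missing values show up as None rather than being silently dropped."""
    rows = []
    for code in codes:
        course = catalogue[code]
        row = {"code": course.code, "institution": course.institution}
        for name in attribute_names:
            row[name] = course.attributes.get(name)
        rows.append(row)
    return rows

# Invented example data: two courses, one with a gap in its attributes.
catalogue = {
    "X101": Course("X101", "Uni A", "Physics",
                   {"satisfaction_pct": 85, "fee_gbp": 9000}),
    "Y202": Course("Y202", "Uni B", "Physics",
                   {"satisfaction_pct": 78}),
}

for row in compare(catalogue, ["X101", "Y202"], ["satisfaction_pct", "fee_gbp"]):
    print(row)
```

The point of the sketch is that the comparison function is trivial once the directory exists; it is the directory that is missing.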
(There is also arguably a lack of opportunity in certain areas for business development. One model might see comparison services acting almost in the role of “independent” educational advisers, helping guide a potential student to an informed choice, and reaping some benefit from that process. For example, let us crudely model the student application lifecycle as: discovery (where to go/what to do), application, study, completion/graduation, employment. In the discovery phase, services might sell advertising, and pick up affiliate fees for prospectus requests for example. In a mature market, the application phase might also accommodate affiliate or referral fees, for example, based on encouraging applications, or even better, accepted and taken up applications. The financial services industry, for all its sins, supports a variety of models for repaying an agent who signs up a client to a longstanding financial product, replete with bonuses and other incentives that encourage the agent to find a product that the client will actually stick with. On completing a degree with a given grade, the agent may get a bonus. (Retention initiatives can start early, arguably before the student even accepts a place at university, through helping them make a decision regarding a course that is likely to suit them!) If you can imagine that universities might set up as recruitment agents, taking a fee for placing a graduate in a particular job on graduation, it’s not hard to also imagine that a bonus might be paid from that placement fee to the agent responsible for referring the undergraduate applicant, as was, in the first place.)
13. Institutions will be required to submit data to HEFCE for inclusion in the KIS. Institutions who subscribe to the QAA but who do not currently take part in the NSS and DLHE should take steps to do so.
So a data burden will be placed on institutions to provide information in a standard way to a central aggregating service? Will there be an opportunity for HEIs to publish this data via an open API, and allow HEFCE to pull/harvest the data from there? Or will the data be required to pass from the HEI, through HEFCE so that HEFCE can put a stamp of approval on it, before it is allowed to be branded as part of the institution’s KIS?
14. All KISs should be made available via institutional web-sites by the end of September 2012.
But will the KIS data also support services that allow the direct comparison of KIS data across institutions on first (university), second (HEFCE, UCAS, Unistats, etc.) and third (commercial, or not-for-profit) party sites without having to visit each of those institutions separately?
22. The plans are based on extensive research, consultation and pilot processes. We are very grateful to all who have given their time and views so generously. There were 215 responses to HEFCE 2010/31, all of which have been carefully considered. We have also taken into account: the views of 2,000 prospective and current students on useful information; several expert working groups considering specific parts of the KIS; a pilot with eight institutions; and user testing with more than 200 prospective HE students. We are particularly pleased to have engaged closely with the National Union of Students in this project, and to have received consultation responses from 30 student unions. We have also liaised with the Academic Registrars’ Council, in an attempt to ensure that the next steps are both feasible and proportionate to implement.
I wonder: did they also consult with open data advocates or web development companies who are familiar with putting data to work in a customer-facing, value adding way? To my shame, I didn’t respond – I came across the consultation after it had closed. (Which suggests the consultation didn’t reach out into that part of the open data community I inhabit? Or maybe I did see it and missed/didn’t pick up on the significance of it at the time:-(
27. The consultation made three primary proposals which are summarised in this section. The first question focused on the purposes of providing information about HE. Responses broadly agreed that information about HE has three purposes:
- to inform people about the quality of higher education and, in particular, to give prospective students information that will help them choose what and where to study
- as evidence for quality assurance processes in institutions
- as information that institutions can use to enhance the quality of their HE provision.
29. The consultation proposed that universities’ and colleges’ web-sites should use a standardised way of publishing key pieces of information about each undergraduate course they offer, by using KISs.
30. KISs would make it easier to find information that prospective students have identified as important to their decisions, and which is mostly already available. The categories of information were identified during research undertaken with 2,000 prospective students, current students and careers advisers by Oakleigh Consulting and Staffordshire University.
So the implication here is that I can compare the data, because each university will separately publish a standard set of data in the same format. So to compare 14 different courses across 8 universities, I probably need to have 14 browser windows open on the same screen at the same time?!
31. In parallel to the consultation, a programme of KIS development work was undertaken. This looked specifically at the information items that do not currently exist in a national comparable format (about learning and teaching, assessment, professional accreditation and accommodation costs) and piloting the processes institutions need to undertake to provide these data. There were also user tests with prospective students. For further details see Annex A.
So the consultation looked at what sort of data might be used to enrich the core data set. One might argue that if the core, course data set were available, third party comparison services might already have started to explore various ways of annotating, enriching and pivoting around the data?
36. The principle of the KIS is that it presents information we have identified that prospective students find useful, in a place we know they already look for such information. In summary, this is information on study, satisfaction, costs and employability, presented on the course information sections of institution’s own web-sites.
“[I]n a place we know they already look for such information”: you could read that as being anti-competitive…? I’d also argue that it doesn’t support the ability to make comparisons. I assume that energy suppliers and mobile phone operators publish similar sorts of information about tariffs on their websites? Why, then, do comparison sites exist?! I’d argue it’s not because they don’t have KIS tables on their sites (though that may contribute). Rather, it’s easier to make a comparison across sites in the context of a single location. (And here, I fear, I start to smell a trap… Because “a place we know they already look for such information” exists in the form of UCAS…)
46. There will be three categories of learning and teaching activities:
- scheduled learning and teaching activities
- guided independent study
- placement/study abroad.
47. Information on these will be presented in a bar chart, as a proportion of hours, on a year-by-year basis, showing each year/stage of study, rather than aggregated for the course as a whole. For KISs relating to part-time study, three bars should also be provided for a standard undergraduate course, each referring to the time equivalent to one year of study if studied full-time.
48. In the interest of providing as much relevant information to the user as possible, a web-link would follow that would lead users to more detailed information. This might be the programme specification or other document, but we would expect this would present more detailed information about learning and teaching, for instance possibly module-level contact hours. This would provide useful contextualised data – something that was a strong theme emerging from consultation responses.
Being able to reliably identify links to programme specifications could be really handy, e.g. for things like the Course Detective approach to custom search engine development…?
67. The salaries for all institutions data will be adjusted to account for regional variations in the salaries earned by graduates in different parts of the country. A link from the KIS to institutional web-sites will enable institutions to provide additional contextual information with particular reference to the different circumstances of different employment sectors (for example the creative industries.)
I can see this causing all sorts of problems when it comes to offering comparisons?
92. Information derived from the NSS and DLHE survey will be presented at course level if sufficient data are available; otherwise NSS and DLHE data will be presented at the most detailed level possible of the Joint Academic Coding System (JACS), subject to the surveys’ response rates and threshold requirements. This information is held by HEFCE and HESA for publicly funded institutions and others that subscribe to HESA.
If data describing UK HE courses were freely available, work could already have started on this…?
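The fallback described in paragraph 92 (course level if there are enough data, otherwise the most detailed JACS level that passes the thresholds) is easy enough to sketch. The level names, their ordering and the threshold value below are my own assumptions for illustration; the report doesn’t spell them out in the passage quoted here.

```python
# ASSUMPTIONS: the level names, their ordering and the threshold value are
# illustrative. The report only says data fall back from course level to
# the most detailed JACS level that meets the publication thresholds.
LEVELS = ["course", "jacs3", "jacs2", "jacs1"]  # most to least detailed

def publishable_level(responses_by_level, min_responses):
    """Return the most detailed level whose response count meets the
    threshold, or None if no level qualifies."""
    for level in LEVELS:
        if responses_by_level.get(level, 0) >= min_responses:
            return level
    return None

# Invented response counts for one course and its enclosing subject groups.
counts = {"course": 12, "jacs3": 31, "jacs2": 90, "jacs1": 240}
print(publishable_level(counts, min_responses=20))         # jacs3
print(publishable_level({"course": 5}, min_responses=20))  # None
```

Given a course directory and the public survey data, a third party could run exactly this kind of roll-up for itself.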
93. Annex C provides a detailed breakdown of the expected coverage of the KIS for HEIs, but
a. The data thresholds we intend to apply to the NSS and DLHE data (which mirror the thresholds we apply on Unistats) mean that roughly one in seven single subject, full-time, first degree KISs will have both DLHE and NSS data available at course level, although in some cases the data presented may need to be aggregated across two years. However, over 95 per cent of KISs will be able to present DLHE or NSS data, or both, when data are included that is aggregated to JACS level 1 and across two years.
b. We expect that about 2 per cent of single subject, part-time, first degree KISs will have full data available; this rises to about 35 per cent when data are included which are aggregated to JACS level 1 and across two years.
c. We expect the KISs where full data are available to cover about 40 per cent of the student body; after allowing for aggregation, the proportion where some data are available is likely to cover over 90 per cent of the student body.
One argument against making a comprehensive course catalogue available under an open public license is that if it were to be used as scaffolding for aggregating different, comparative data sources, lack of coverage over the whole course listing would be confusing and offer a poor user experience. Err…? “[R]oughly one in seven single subject, full-time, first degree KISs will have both DLHE and NSS data available at course level”. So that reason isn’t a deal breaker, then?!
95. We recognise that, even aggregating data over years or over JACS levels, there will be, as on Unistats at present, a number of courses for which it will not be possible to provide data derived from the NSS or DLHE due to the small size of the student cohorts concerned. The thresholds for publication reflect both the need to ensure the statistical validity of the information and the need to meet data protection requirements. There will still be elements of the KIS that will be useful to prospective students, but we recognise the need to ensure prospective students do not negatively interpret the absence of data. We will undertake further user testing over the next few months to finalise appropriate explanatory text.
98. Consideration has been given to who should undertake the production of the KISs, and how. Requiring individual institutions to create their own KISs was considered, but it was felt to be problematic because it would place a significant burden on individual institutions and would pose a challenge in controlling the quality of – potentially – several hundred different production processes, hindering the creation of a single, uniform and credible information source. This task therefore needs to be undertaken by a single body.
So institutions are not going to have a new data burden placed on them?
99. The first year of KISs (those to be published in September 2012) will be centrally created by HEFCE in partnership with HESA. From year two onwards it is intended that central creation will pass to HESA.
100. In the first year, HEFCE will draw data from the NSS and DLHE and institutions will provide additional data (as set out in Table 1). Once this has been collated, HEFCE will provide institutions with web code to be inserted appropriately on their own web-sites.
Hmmm.. when I won the TSO Open Up competition, the plan was to get UCAS course code data and then start annotating and enriching it howsoever we could. The reason why I wanted the UCAS data is that it provides the scaffolding to build from. The user focus is the course, so it made sense to build up views over the data from the course level. (We could have started trying to build services out at the level of HESA codes, but that wasn’t what the prize was awarded for.) During the competition pitch, I made the claim that course code data was akin to postcode data to the extent that rights over the seemingly most useful identifier space was controlled by a restrictive license. I don’t yet know what services I want to build out over the course code space, but why is that a reason to prevent innovation in the development of services around course codes by locking those codes down?
103. In order for KISs to be published during September 2012, for use by applicants for entry in academic year 2013-14, institutions must submit their data returns to HEFCE by summer 2012.
So the data burden is on the universities?! But the aggregation – where the value is locked up – is under the control of the centre? Hmmm… thinks… SCONUL charge 80 quid (?) for their aggregated report on HE library stats data, but I’ve managed to FOI the returns made to SCONUL by individual libraries. So if there is a KIS-like return from HEIs to HEFCE, it should be FOIable, and we can create a copy of the aggregate by aggregating FOI requests. Hmmm…
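Rebuilding the aggregate from individual returns is not hard once you have them. A minimal sketch, assuming each FOI’d return has been keyed into a dict with the same field names; the field names and numbers are made up and are not taken from any real return.

```python
from statistics import mean

def aggregate_returns(returns, fields):
    """Combine per-institution returns into sector-level summary figures.
    Each return is a dict with an 'institution' key plus numeric fields;
    a missing or None field just means that institution didn't report it."""
    summary = {}
    for name in fields:
        values = [r[name] for r in returns if r.get(name) is not None]
        summary[name] = {
            "institutions_reporting": len(values),
            "total": sum(values),
            "mean": mean(values) if values else None,
        }
    return summary

# Invented returns, as they might look after transcribing three FOI responses.
foi_returns = [
    {"institution": "Uni A", "loans": 120000, "spend_gbp": 2500000},
    {"institution": "Uni B", "loans": 80000, "spend_gbp": None},
    {"institution": "Uni C", "loans": 100000, "spend_gbp": 1500000},
]

print(aggregate_returns(foi_returns, ["loans", "spend_gbp"]))
```

Keeping a count of how many institutions reported each field matters, because an FOI-built aggregate will almost certainly have gaps.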
105. In the main, we would expect the KIS to be revised at most annually; however, a system will be set up to enable exceptions to be processed, for example, corrections to be made or financial information updated. More detail will follow in the technical guidance.
Another of the arguments I’ve heard – this time from universities – to explain a university’s unwillingness to publish a course data API or data dumps is that a third party that aggregates data from universities may end up with data that is stale or out of synch with data on the university website. I suspect that a third party would be quicker to respond to changes than once every 52 weeks…
106. HEFCE is in discussion with the primary providers of institutional data management software to ensure that the new data requirements for the KIS can be incorporated into existing applications as soon as possible.
So how much do we think the third party software vendors are going to claim for making the changes to their systems? And hands up who thinks that those changes will also be antagonistic to developers who might be minded to open up the data via APIs. After all, if you can get data out of your commercially licensed enterprise software via a programmable API, there’s less requirement to stump up the cash to pay for maintenance and the implementation of “additional” features…
107. The KIS will have a strong brand, including a unique logo. This is to ensure that the KIS is as engaging to users as possible, as well as distinguishing it from any other information sources available.
…which sounds to me like someone’s twigged there may be value locked up in the data, and they’re not willing to let it go…
108. A core feature of the KIS is that it is standardised and comparable across HEIs, with consistent branding and presentation. Therefore, in order to avoid confusion, institutions should not publish a document called the KIS or with the KIS logo for any courses where not required.
Brand police… Total ownership. We can haz ur data; we pwn ur data.
110. It is likely the KIS for each course will be available through an embedded ‘widget’ on the institution’s web-site. We do not intend to be prescriptive about where on the web-site this should appear, other than that it should be found near other course information. The widget would contain three items of top-line information, and the option to click through for the full KIS.
Hmmm… did somebody just discover widgets?! So the idea here is to control the brand through a KIS branded widget that can be embedded on University websites?
The obvious question to an open data freak would be: will there be a freely available open API with that, and will the data made available through the API be openly licensed?
The open systems advocate in me would also wonder whether there might be mileage in each institution publishing its own data in an open way via an open API that could be harvested by the central HEFCE aggregator or by a third party. In addition, the KIS data would be available as a service within the institution to institutional developers.
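Here is a minimal sketch of what the harvesting side of that might look like. Everything specific is an assumption: the URLs are placeholders, the `{"courses": [...]}` payload shape is invented, and `fetch` is passed in as a plain callable so the same code could sit behind HEFCE’s aggregator, a third party service or an in-house script.

```python
import json

def harvest(endpoints, fetch):
    """Pull course records from each institution's published endpoint.
    `fetch` is any callable that takes a URL and returns a JSON string.
    Returns (records, failures) so one bad feed doesn't sink the harvest."""
    records, failures = [], []
    for institution, url in endpoints.items():
        try:
            payload = json.loads(fetch(url))
        except (OSError, ValueError) as err:
            failures.append((institution, str(err)))
            continue
        for record in payload.get("courses", []):
            records.append({"institution": institution, **record})
    return records, failures

# Stand-in for the web so the sketch runs offline. The URLs are placeholders.
FAKE_WEB = {
    "https://example.org/uni-a/kis.json":
        '{"courses": [{"code": "X101", "title": "Physics"}]}',
    "https://example.org/uni-b/kis.json": "not json",
}

def fake_fetch(url):
    return FAKE_WEB[url]

endpoints = {
    "Uni A": "https://example.org/uni-a/kis.json",
    "Uni B": "https://example.org/uni-b/kis.json",
}
records, failures = harvest(endpoints, fake_fetch)
print(records)
print([name for name, _ in failures])
```

In this model the institution stays the authoritative publisher, and the central aggregator is just one consumer among several.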
113. In HEFCE 2010/31 we suggested that the KIS should be accessible from the UCAS web-site. Although it was pointed out that not all applications go through UCAS, there was broad support for this approach in the consultation and discussions with UCAS are continuing. UCAS is keen to link the KIS to its site and to explore the possibility of incorporating a comparison function into its planned ‘course finder’ facility, for all courses there are KISs for (including part-time courses), not just those they process applications for.
So HEFCE want to run a data service…?! Will it be an open data service? Or are HEFCE going to get a copy of the course code scaffolding grail and use it to act as infrastructure for a data service that aggregates and re-presents data that is in part already largely available, albeit in a less structured way, via a branded and content controlled widget?
114. We would also like to work with other organisations that provide student information on HE and other related careers guidance. We are keen to promote and publicise the KIS through the various student web-sites and social media outlets that exist.
So will third parties be encouraged and supported in developing their own takes on enriched KIS data?
116. Because KISs will be created centrally, a central database of KISs will be available. HEPISG needs to consider how to use this information, recognising the Government’s intention that data on publicly funded provision should be available for general use. More information will be published on the HEFCE web-site in due course.
Ah – so the data may be available via an open public license. Tip to the HEPISG folk: why not build an API around the data, and serve the widget from that? Furthermore, by making the core course code data available as a dataset, third parties would be able to annotate and enrich that data and serve it as additional information around the “officially sanctioned” KIS data pulled from the API. Finally, a question: if third parties are going to use clustering techniques so that they can provide recommendations on “similar” courses, will they have access to the whole KIS data set so that they can run their own clustering algorithms?!
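To show why access to the whole data set matters, here is a minimal sketch of a “similar courses” ranking. It is a plain nearest-neighbour ranking rather than clustering proper, the attribute names and values are invented, and the min-max scaling is my own choice; the thing to notice is that even the scaling step needs every course’s values, not just the pair being compared.

```python
from math import sqrt

def normalise(courses, keys):
    """Rescale each attribute to 0..1 across ALL courses so that no single
    attribute (e.g. salary in pounds) swamps the others."""
    scaled = {code: {} for code in courses}
    for key in keys:
        values = [attrs[key] for attrs in courses.values()]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # avoid dividing by zero when all values match
        for code, attrs in courses.items():
            scaled[code][key] = (attrs[key] - lo) / span
    return scaled

def most_similar(courses, target, keys, n=2):
    """Return the n course codes closest to `target` by Euclidean distance
    over the normalised attributes."""
    scaled = normalise(courses, keys)

    def distance(code):
        return sqrt(sum((scaled[code][k] - scaled[target][k]) ** 2 for k in keys))

    others = [code for code in courses if code != target]
    return sorted(others, key=distance)[:n]

# Invented KIS-style attributes for four hypothetical courses.
courses = {
    "A": {"satisfaction": 85, "salary": 22000},
    "B": {"satisfaction": 84, "salary": 22500},
    "C": {"satisfaction": 70, "salary": 30000},
    "D": {"satisfaction": 88, "salary": 21000},
}
print(most_similar(courses, "A", ["satisfaction", "salary"]))  # ['B', 'D']
```

A widget that only exposes one course at a time gives a third party no way of doing this.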
119. Currently, there is information available via Unistats that will not be available through the KIS. We do not envisage, therefore, that any changes will be made to the Unistats web-site in the KIS’ first year of operation. The focus will be on ensuring that the KIS is available on institutional web-sites as advised in the Oakleigh Consulting and Staffordshire University research, with links to, and from, the UCAS web-site.
So if students want to compare courses, they need to go to N different pages to find the KIS widget on each, and then go and fight with the Unistats website?
120. However, we recognise that, in the longer term, there will be a need to revisit the arrangements to ensure we meet the needs of students for good access to information and that we secure the best use of public money and institutional time. As we move to more established arrangements for the creation and maintenance of the KIS, and look at the use of potential sites for comparing information, we will consider the future of Unistats in the light of the wider policy environment for higher education.
Just open the course/qualification scaffolding data…
123. As well as the KIS and Unistats data, a wider set of information is to be made available by all publicly funded HEIs, FECs with undergraduate provision, and private providers who subscribe to the QAA.
This is the sort of thing third parties might be keen to develop. But to scaffold the collection and delivery of the additional data annotations, the course data could be really handy…
[The paper goes on a bit more, but it's making me angry so I figure I need to take a break!]