Immediate Impressions on JISC’s “Course Data: Making the most of Course Information” Funding Call

Notes on the JISC Grant Funding Call 8/11: “Course Data: Making the most of Course Information” Capital Programme – Call for Letters of Commitment

This post builds on quick commentaries around other reports in the area of Higher Education course data: Immediate Thoughts on the “Provision of information about higher education” and Getting Access to University Course Code Data (or not… (yet…)). It doesn’t necessarily represent my own opinions, let alone those of my employer.

1. The Joint Information Systems Committee (JISC) and the Higher Education Funding Council for England (HEFCE) invite English Universities and FE colleges (teaching over 400 HE FTEs) to become involved in a new programme of work which will help prepare the sector for increasing demands on course data.

3. Funding is available for projects starting from Monday 12 September 2011 for an initial period of approximately three months. Projects selected to go forward into Stage 2 will continue for an additional 12 to 15 months. All projects must be complete by 29 March 2013.

So how does this fit with the timeline for HEFCE Key Information Set (KIS) development, if the called-for work is relevant to that? (Note: HEFCE makes available much of the monies disbursed by JISC, and HEFCE is managing the KIS work directly.)

As soon as possible and not later than the end of September 2011 – Technical guidance published by HEFCE
January to March 2012 – Submission system open for KISs to be published in September 2012: Institutions submit their data to HEFCE
June to early July 2012 – 2012 NSS and DLHE data available to HEFCE
July to August 2012 – HEFCE merges data submitted by institutions with 2012 NSS and DLHE data. Institutions quality check and sign off their final KISs
September 2012 – KISs available for institutions to upload. All KISs to be accessible via institutional web-sites by the end of the month

[HEFCE: Provision of information about higher education]

So given the timings, the JISC second phase work looks as if it is supporting processes relating to, and publication of, different sorts of data to the KIS data, although phase 1 work may be relevant to KIS releases?

10. There are 3 main drivers for making it easier for people to find and compare courses:
– prospective fee paying students want to know more about the academic experience a course will provide and be able to compare this with other courses;
– better informed students are more likely to choose a course that they will complete, and be more motivated to achieve better results;
– increased scrutiny by quality assurance agencies and the Government’s requirement for transparency of publicly funded bodies.

11. JISC have made it easier for prospective students to decide which course to study by creating an internationally recognised data standard for course information, known as XCRI-CAP which is conformant with the new European standard for Advertising Learning Opportunities. This will make transferring and advertising information about courses between institutions and organisations more efficient and effective. Placing this data at a consistent COOL URI makes it easier to find.

So there are two end-user groups in mind for the course related information: prospective students, and the scrutineers. XCRI-CAP relates to the publication of information describing at a high level the subject content of a course, rather than the sorts of “metadata” around courses that the KIS will provide. If we were building a course comparison website, the XCRI-CAP data might provide course descriptions relating to a course, whereas the KIS data would provide student satisfaction ratings, teaching hours, assessment strategies, graduate employment rates and salaries. Pricing related information might be common to both sets?

[Images: KIS example; KIS example 2]
Example of what the KIS display might look like.

Within the university website, developers will be required to identify which course a page relates to, and then call in the appropriate KIS widget from HEFCE or its agent, presumably by passing parameters relating to: institution identifier; course identifier.

In order to display both XCRI-CAP style data and KIS data on the same third party web page, the third party will need to be able to identify the course identifier and the university identifier. It will also need a way of identifying which course codes are offered by each institution. In order to satisfy requests from potential applicants searching for a particular topic anywhere in the country*, the third party would ideally have access to an index (or at least a comprehensive list either of courses for each institution, or of institutions by course) that allows it to identify and return the set of (institution, course) pairs for which the course satisfies the search term. (Alternatively, for every request, the third party could query every university separately for related courses, aggregate these responses, and then annotate each result with a link to the corresponding KIS information, or its widget.) If the aggregator were to offer a service whereby potential applicants could rank each result according to one or more KIS data elements, it would need to associate the KIS data for each course with the corresponding (institution, course) pair, and then use this aggregated data set to present the results to the end user. Again, this could be achieved by making separate requests to the KIS information server, one for each (institution, course) pair; or it could draw on its own index of this data if the information was openly licensed.

* when thinking about course selection, I often have four scenarios in mind: a) I know what course I want to do and where I want to do it; b) I know where I want to go but don’t know what course to do; c) I know what course I want to do, but don’t know where to do it; d) I don’t know what course to do or where to do it…
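
As a rough sketch of the index-then-rank idea described above: the snippet below builds a tiny word index over course descriptions, returns the matching (institution, course) pairs for a search term, and orders them by one KIS-style data element. Everything in it is an assumption made for illustration. The institution and course identifiers, the description text, and the `satisfaction` field and its values are invented, and no real HEFCE or XCRI-CAP service is being called.

```python
from collections import defaultdict

# Hypothetical XCRI-CAP style descriptions, keyed by (institution, course).
DESCRIPTIONS = {
    ("uni-a", "phys101"): "Physics with astronomy",
    ("uni-b", "phys200"): "Applied physics",
    ("uni-b", "hist300"): "Modern history",
}

# Hypothetical KIS style data for the same keys (field name invented).
KIS = {
    ("uni-a", "phys101"): {"satisfaction": 82},
    ("uni-b", "phys200"): {"satisfaction": 91},
    ("uni-b", "hist300"): {"satisfaction": 88},
}


def build_index(descriptions):
    """Map each lower-cased word to the set of (institution, course) pairs using it."""
    index = defaultdict(set)
    for pair, text in descriptions.items():
        for word in text.lower().split():
            index[word].add(pair)
    return index


def search(index, term, kis, rank_by):
    """Return the pairs matching term, highest value of the chosen KIS element first."""
    matches = index.get(term.lower(), set())
    return sorted(matches, key=lambda pair: kis[pair][rank_by], reverse=True)


INDEX = build_index(DESCRIPTIONS)
results = search(INDEX, "physics", KIS, "satisfaction")
```

Here the aggregator holds its own copy of both data sets, which is the "own index" option; the per-request alternative would replace the two dictionaries with calls out to each institution and to the KIS server.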

Just by the by, I wonder if the intention of the HEFCE technical working group is to come up with a structured machine readable standard for communicating the KIS information via the KIS widget? That is, will content represented via the KIS widget be marked up in semantic form, or will semantics at the data representation level have to be reverse engineered from the presentation of the information? Where the KIS renders graphical elements, will the charts be generated directly from data transported to the widget, or will the provision simply be flat image files? (Charts displayed in widgets can come in three flavours: 1) as a flat image file with an arbitrary URL (e.g. kisDataImage4.png), noting that data underlying the graph may be described in surrounding metadata, such as within img tag attributes; 2) as an image file generated from data contained within the URL (e.g. as in the mechanism used by the Google Charts API); 3) through the enhancement of data contained within the page (for example, in a Javascript data structure or an HTML table).)
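
To make the second chart flavour a little more concrete, here is a minimal sketch in which the data travels inside the image URL, so a third party can recover it just by parsing the query string. The host name is hypothetical, and the parameter names (`cht`, `chs`, `chd`) are only loosely modelled on the style of image chart URLs; treat all of them as assumptions rather than as a description of any real KIS widget.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def chart_url(values, base="https://charts.example.org/chart"):
    """Build an image URL that carries the chart data in its query string."""
    query = urlencode({
        "cht": "bvs",                                    # assumed chart type code
        "chs": "300x200",                                # assumed image size
        "chd": "t:" + ",".join(str(v) for v in values),  # the data itself
    })
    return f"{base}?{query}"


def data_from_url(url):
    """Recover the numeric series from a URL built by chart_url()."""
    params = parse_qs(urlparse(url).query)
    series = params["chd"][0]
    # Drop the assumed "t:" prefix, then split the comma separated values.
    return [int(v) for v in series.split(":", 1)[1].split(",")]


url = chart_url([82, 91, 88])
recovered = data_from_url(url)
```

With a flat image file (the first flavour) there is nothing equivalent to parse, so the numbers would have to come from surrounding metadata or be read off the picture.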

The KIS data only partially overlaps with the XCRI-CAP data, so I wonder: to what extent will it be possible to JOIN the two data sets (that is, how will we be able to link XCRI-CAP and KIS data? Via HEI+coursecode keys, presumably?)
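
If the join did turn out to be on an HEI+coursecode key, it might look something like the sketch below. The keys, titles and the `teaching_hours` field are made up, and the choice of a composite (institution, course) key is itself an assumption; the point is only that the records which appear on one side but not the other are the ones that cannot be linked.

```python
def join_on_key(xcri, kis):
    """Inner-join two dicts on their shared (institution, course) keys.

    Also report the keys present on only one side, since those are the
    records that cannot be linked.
    """
    shared = xcri.keys() & kis.keys()
    joined = {key: {**xcri[key], **kis[key]} for key in shared}
    only_xcri = sorted(xcri.keys() - kis.keys())
    only_kis = sorted(kis.keys() - xcri.keys())
    return joined, only_xcri, only_kis


# Hypothetical XCRI-CAP style and KIS style records.
xcri_records = {
    ("uni-a", "c1"): {"title": "Physics"},
    ("uni-a", "c2"): {"title": "History"},
}
kis_records = {
    ("uni-a", "c1"): {"teaching_hours": 12},
    ("uni-b", "c9"): {"teaching_hours": 8},
}

joined, only_xcri, only_kis = join_on_key(xcri_records, kis_records)
```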

12. The proposed programme will support the sector to prepare for the increasing demand for course information, and increase the availability of high-quality, accurate information about part-time, online and distance learning opportunities offered by UK HEIs by:

– funding institutions to make the process and technical innovations necessary to release a structured, machine-readable feed of their course-related information, and;
– creating a proof-of-concept aggregator and discovery service to bring together this course information and enable prospective students to search it.

So – what I think the JISC are suggesting is that they are looking to fund work on the “wider information set” of information around courses? That JISC are also looking to create a “proof-of-concept aggregator and discovery service to bring together this course information and enable prospective students to search it” sounds interesting. I wonder how this would sit in the context of:

  1. UCAS (which currently concentrates course listings as a basis for a single point of application for entry; how will entry work for the private universities? Cf. also the OU, which has only just started to make use of the UCAS entry route, and which also supports a significant direct entry route onto modules?)
  2. third party services such as ???Hotcourses
  3. custom search engines such as CourseDetective, which search over online course prospectuses (and which cost approx. 2 volunteered FTE days to put together at a hackday…;-)

It’s also worth bearing in mind that my TSO OpenUp competition entry also suggested the opening up of course code scaffolding data so that third parties could start to create aggregated and enriched datasets around courses, as well as building services on top of that data that would potentially be revenue generating and commercially sustainable…

Just on the topic of “wider information sets”, here’s what the HEFCE KIS consultation report had to say on the matter:

The wider information set
32. Higher education providers already publish a wide range of information about their institution and the courses they deliver. The information published has been considered by QAA in the context of institutional audit (for publicly funded higher education institutions and those privately funded providers that subscribe to QAA) or of Integrated Quality and Enhancement Review (for further education colleges (FECs) offering HE courses) and is subject to a ‘comment’ in that context. The consultation proposed that institutions should make this information more public-facing, noting that published information would, in due course, be subject to a judgement in QAA review processes.

33. It was proposed that this wider information set has two purposes:
– to provide information about higher education to a wide variety of audiences including: prospective and current students; students’ parents and advisers; employers; the media; and the institution itself;
– to form part of the evidence used in QAA audit and review.

34. The required information set was presented in the consultation document as a minimum requirement, with institutions continuing to publish as much other information as they wished. Institutions were asked to consider whether any of the information could be presented in more accessible ways.

Information about aspects of course/awards (not available in the KIS):

Information to be provided:
– prospectuses, programme guides, module descriptors or similar programme specifications;
– results of internal student surveys;
– links with employers – where employers have input into a course or programme (this could be quite a high-level statement);
– partnership agreements, links with awarding bodies/delivery partners.
Level of information: Course/programme level
Availability: All apart from results of internal surveys to be publicly available. Results of internal surveys should be available internally.

[HEFCE: Provision of information about higher education]

If there is such pent-up demand for aggregated course discovery services, then they should also be able to run as commercial services? One thing that I would argue currently limits innovation in this area is access to a comprehensive qualification catalogue across the UK. UCAS do have this data, and they do sell it. But I want to play with it and see if I can build a service round it, rather than deciding to quit my job, raise finance, buy the data from UCAS and then see if I can make a go of building a commercial service around the data. UCAS would still benefit from traffic driven to the UCAS site for course registrations. (But then, if aggregators were also aggregating information about courses in the private sector that supported direct entry and did not require central applications and clearing, aggregators might also start recommending courses outside the scope of UCAS…? Hmmm… Because the private universities would probably provide a commercial incentive to drive traffic to them in the form of affiliate fees based on registrations resulting from referrals… Hmmm… This is all starting to put me in mind of things like FOTA, Formula One and the FIA…!)

Another route to a comprehensive course catalogue is through indexing catalogue feeds (akin to website sitemap feeds that detail all the pages on a website to make it easy for search engines to index them) published directly by the universities, such as XCRI-CAP feeds…
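
By way of illustration, here is a minimal sketch of how an indexer might pull course page URLs out of a sitemap-style feed. The XML namespace is the one defined by the sitemaps.org protocol; the sample feed, the example.ac.uk addresses and the `/courses/` path convention are all invented for the example, and a real XCRI-CAP feed would carry far richer, differently structured data.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sitemap for a made-up prospectus site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.ac.uk/courses/physics</loc></url>
  <url><loc>https://www.example.ac.uk/courses/history</loc></url>
  <url><loc>https://www.example.ac.uk/about</loc></url>
</urlset>
"""


def course_urls(sitemap_xml, path_marker="/courses/"):
    """List the page URLs in a sitemap whose path suggests a course page."""
    root = ET.fromstring(sitemap_xml)
    locs = (loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS))
    # The path marker is an assumed site convention, not part of the protocol.
    return [url for url in locs if path_marker in url]


urls = course_urls(SAMPLE_SITEMAP)
```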

13. The availability of useable course data feeds, and the demonstration of the proof-of-concept aggregator, is intended to provide a catalyst to the feeds being used within existing aggregators, catalogues or information, advice and guidance services, or to form the basis of new services.

I’m not sure an incentive is required.. just open access to the data, free in the first instance. (And if companies do start to make money from it, then license fees can kick in. I don’t think people would have a problem with that…)

15. Between September 2011 and March 2013, JISC intends to fund projects that help institutions review and adapt their internal processes to permit easier access to their course data to meet the needs of various stakeholders. As a minimum, and to provide a clear focus for this overarching activity, the programme will concentrate on the implementation of an XCRI-CAP standard system-generated feed. The programme will be staged to ensure maximum benefit is achieved.

If this data is already exposed via online course prospectuses, a developer with data scraper in hand could probably get a large chunk of this data anyway over the next three to six months. (The CourseDetective CSE definition file already provides a basis for anyone wanting to spider university course catalogues… Hmmm… maybe that’s a good reason for me to get to grips with Lucene…? Ideally, course prospectuses would also produce a sitemap (or XCRI) feed providing URLs for all the course pages currently published via the online prospectus to make it easy for third parties to index, or harvest, this data. The provision of semantic markup in a page, whether through RDFa, microformats, microdata or metadata, would also simplify the scraping (i.e. machine parseability) of the course pages. At the very least, using template based, sensibly structured presentation markup that enforces markup conventions that suggest de facto semantics makes pages reliably scrapeable and provides one way of supporting the harvesting of data (if license conditions allow…)) Because, of course, a major reason why potentially commercial services don’t just scrape the data to build course comparison sites relates to the licensing/copyright restrictions that may exist, deliberately or by default, over the university prospectus data that is published online… (Not everyone’s a pirate;-)
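
To show what "markup conventions that suggest de facto semantics" buys a scraper, here is a small sketch using Python’s standard `html.parser`. The page fragment, the class names (`course-title`, `course-code`) and the course code are all invented; the assumption is simply that a prospectus template labels each field with a consistent class attribute.

```python
from html.parser import HTMLParser


class CourseFieldParser(HTMLParser):
    """Collect text from elements whose class attribute names a wanted field."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self.current = name
                self.fields[name] = ""

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field.
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data


# Made-up fragment of a template-generated course page.
PAGE = """
<div class="course">
  <h1 class="course-title">BSc Physics</h1>
  <p class="course-code">X123</p>
  <p class="blurb">Study the physical world.</p>
</div>
"""

parser = CourseFieldParser(["course-title", "course-code"])
parser.feed(PAGE)
parser.close()
```

The same dozen lines would work across every course page built from that template, which is the sense in which consistent presentation markup makes pages reliably scrapeable; whether the license allows it is a separate question.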

16. In Stage 1, institutions will review the maturity of their management of course data using the XCRI Self Assessment Framework. This could cover the full course data life cycle, but must include a particular focus on prospectus and course advertising information. Based on the outcomes of this review, institutions will produce an implementation plan for how they will improve processes to, as a minimum, create a system-generated course advertising feed in a XCRI CAP 1.2 format with a COOL URI.

Ah, ha….

So I wonder, would JISC indemnify a third party looking to scrape, aggregate, and republish this data in a standard form via an open API and a permissive license, against actions taken against them by UCAS and the universities for breach of copyright?! I also wonder whether JISC will be providing guidance about what license conditions they expect XCRI-CAP data to be published under? Or is that out of scope?

19. The anticipated outcomes from this programme of work are:
– There will be increased usage of appropriate technology to streamline course data processes leading to:
— More standardised, and therefore comparable, course information in a consistent location making discovery easier.
— Improved quality and therefore more efficient and effective course data.
— Increased ease in finding and comparing courses, especially types of courses that are currently hard to find, such as ones delivered by distance learning.

– Institutions are able to make appropriate and informed decisions about their processes for managing course-related data, leading to a reduced administrative data burden, cost-effective working, and better business intelligence.

Ah… this is actually different to getting the data out there, then, in a way that third parties can use it? It’s more about tweaking systems and processes inside the institution to support the provisioning of data in ways that make it more accessible to third party aggregators? The course aggregator is then a red herring – it’s just there to provide a reference/candidate client/consumer against which the released data can be targeted.

25. There will be a support and synthesis project that will be working with projects from the start of the programme to help them shape their implementation plans in Stage 1 and other outputs in Stage 2 that are of most use to the sector. Projects are expected to engage with the support and synthesis project and to be proactive in sharing outputs throughout the project. This information will be synthesised and shared with the sector; where that information is sensitive, it will be shared in an aggregated, anonymised form.

A “support and synthesis project” within JISC presumably, (i.e. run by the usual suspects)? Rather than sponsoring and indemnifying the open data community on the one hand, or encouraging potential startups on the other, to start building user facing (potential student) services, along with the necessary business model that will make them sustainable, and maybe even profitable?

26. Funding is provided to enable institutions to carry out project work, but also to release key staff to prepare for, take part in and follow up on these programme-level activities. Projects should allow at least 5 person-days in Stage 1 and 10 person-days in Stage 2.

Such is the price of funding HE based developer activities. 5 days project work: £10k. 10 days project work: £40k-80k. So now you know…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

5 thoughts on “Immediate Impressions on JISC’s “Course Data: Making the most of Course Information” Funding Call”

  1. Thanks for a stimulating ‘impression’, as ever, Tony. I offer a few comments on your impressions – hope these are an aid to your thoughts.

    The JISC call doesn’t fit in with the KIS timeline – that’s not explicit or implicit, as far as I know. The two initiatives have separate masters and were launched with separate objectives. I, and others, have been trying to link them together, with varying degrees of success.

    The KIS data and what might be called ‘normal’ XCRI-CAP data are different. The latter is primarily text-based descriptive data of the course content variety, whereas KIS data is primarily statistical and numerical. Pricing related information might be common to both sets in the future – with so little variation in HE fees in the past, it hasn’t been an issue in XCRI-CAP data. XCRI-CAP has a cost element for this information – KIS *may* be more sophisticated, but it’s a complicated area, owing to bursaries, scholarships, and so on.

    “…if the information was openly licensed.” Indeed so.

    “to what extent will it be possible to JOIN the two data sets (that is, how will we be able to link XCRI-CAP and KIS data? Via HEI+coursecode keys, presumably?).” UCAS course codes are poor course identifiers. In the XCRI-CAP standard we encourage institutions to use permanent course URIs. It would be useful if the KIS also used the same URIs.

    The context of JISC’s proof-of-concept aggregator is that the call refers specifically to part-time, online / distance learning and CPD courses, rather than “UCAS courses” (UG FT). These course types are also relatively poorly covered by other third party services such as Hotcourses (though the OU courses are there of course).

    “If there is such pent-up demand for aggregated course discovery services, then they should also be able to run as commercial services?” On the contrary, recently we’ve seen the loss of several aggregator services, and a reduction in the coverage of the old National Learning Directory. However, we also add in the KIS (which is new) and the enhanced capability offered by linked data. It seems likely, to me, that we’ll get new services sprouting up, as long as the data is open.

    “a comprehensive qualification catalogue across the UK. UCAS do have this data, and they do sell it.” Well, in fact UCAS doesn’t have a comprehensive qualification catalogue – they have a database of course entry points. And at the moment it covers primarily undergraduate full time courses (with some small additions, such as postgrad teacher training). I’m not sure how much UCAS would benefit from more traffic – for applications to courses in the UCAS scheme, applicants are supposed to register at UCAS (and the vast majority do); for applications not in the UCAS scheme, the traffic is irrelevant. Many private universities are UCAS members; the marketing pull of UCAS is such that the vast majority of UK HEIs are in full membership – the OU is an exception for various reasons.

    “I’m not sure an incentive is required.. just open access to the data, free in the first instance.” I don’t think there’s a particular incentive. Various XCRI-CAP feeds have been available for years, in particular the OU ones, but few have used them. A major problem may be the lack of a network effect. There’s not much point in being in a tiny network. The JISC call aims to resolve that problem. I totally agree that free and open access to the feeds is essential. Plus a number of people prepared to play with the data and create useful services. We have a group in the East Midlands (led by Kirstie Coolin at University of Nottingham’s Centre for International ePortfolio Development, and including myself) that has produced several joined up services using this type of data, and adding in LMI, jobs, etc, etc. Developing production level services and making it all sustainable is not simple however. Loss of the Lifelong Learning Networks and Aimhigher has severely dented the construction of local and regional networks.

    JISC won’t own any of the data, the institutions will. The XCRI-CAP initiative makes no recommendations about copyright. Personally I’d expect every institution to want to make its courses data available freely, but I’ve heard arguments against it (mainly bad ones). So licensing will depend on institutions (as now for courses information on HEI websites).

    The focus of the call is on getting institutions to put their data out there. Personally I think we need to also address community needs as well, in particular ‘communities of practice’ that might want data sliced and diced already, so that service construction can be simplified, and quality and consistency of data across feeds can be increased. My own view is that a community of practice could agree (more or less formally) a data specification that HEIs could serve up, possibly through brokers or other mapping and transformation services. My hope is that these will grow up organically – but there’s very little soil for this.

    I think you’ve misunderstood the support and synthesis project. It’s a project within the JISC XCRI programme to support the projects and synthesise the projects’ issues and lessons learned. It won’t be designed to do anything with data or to design services. With potentially 80 institutional based projects, it’s a way of giving support to those inexperienced with XCRI-CAP or with JISC project methods.

    “Projects should allow at least 5 person-days in Stage 1 and 10 person-days in Stage 2.” This means that projects have to allow this amount of time for programme-level activities *in*addition*to* the project work. These are days for liaison with other projects and with the support and synthesis project. I suspect that the rates for these days will be very low.

    Alan

    1. @alan thanks for the comprehensive comments… I would normally have done a bit (maybe that should have been a lot?!) more fact checking/linking, as well as a couple of read-thru-and–tweak sessions, but the post was written in the absence of a net connection and uploaded opportunistically when I did find time and bandwidth together!

  2. I was going to leave a well thought out and insightful comment, but Alan has covered pretty much everything I wanted to say!

    Nice post, though!
