Posts Tagged ‘TSOopenup’
I notice that there’s a couple of days left for institutions to get £10k from JISC in order to look at what it would take to start publishing course data via XCRI feeds, with another £40-80k each for up to 80 institutions to do something about it (JISC Grant Funding 8/11: JISC ‘Course Data: Making the most of Course Information’ Capital Programme – Call for Letters of Commitment; see also Immediate Impressions on JISC’s “Course Data: Making the most of Course Information” Funding Call, as well as the associated comments):
As funding for higher education is reduced and the cost to individuals rises, we see a move towards a consumer-led market for education and increased student expectations. One of the key themes about delivering a better student experience discussed in the recent Whitepaper mentions improving the information available to prospective students.
Nowadays, information about a college or university is more likely found via a laptop than in a prospectus. In this competitive climate publicising courses while embracing new technologies is ever more important for institutions.
JISC have made it easier for prospective students to decide which course to study by creating an internationally recognised data standard for course information, known as XCRICAP. This will make transferring and advertising information about courses between institutions and organisations, more efficient and effective.
The focus of this new programme is to enable institutions to publish electronic prospectus information in a standard format for all types of courses, especially online, distance, part time, post graduate and continuing professional development. This standard data could then be shared with many different aggregator agencies (such as UCAS, the National Learning Directory, 14-19 Prospectus websites, or new services yet to be developed) to collect and share with prospective student
All well and good, but:
- there still won’t be a single, centralised directory of UK courses, the sort of thing than can be used to scaffold other services. I know it isn’t perfect, but UCAS has some sort of directory of UK undergrad HE courses that can be applied for via central clearing, but it’s not available as open data.
- the universities are being offered £10k each to explore how they can start to make more of their course data. There seems to be the expectation that some good will follow, and aggregation services will flower around this data (This standard data could then be shared with many different aggregator agencies (such as … new services yet to be developed). I think they might too. (For example, we’re already starting to see sites like Which University? provide shiny front ends to HESA and NSS data.) But why should these aggregation sites have to wait for the universities to scope out, plan, commission, delay and then maybe or maybe not deliver open XCRI feeds. (Hmm, I wonder: does the JISC money place any requirements on universities making their XCRI-CAP feeds available under an open license that allows commercial reuse?)
When we cobbled together the Course Detective search engine, we exploited Google’s index of UK HE websites to provide a search engine that provides a customised search over the course prospectus webpages on UK HE websites. Being a Google Custom Search Engine there’s only so much we can do with it, but whilst we wait for all the UK HEIs to get round to publishing course marketing feeds, it’s a start.
Of course, if we had our own index, we could offer a more refined search service, with all sorts of potential enhancements and enrichment. Which is where copyright kicks in…
…because course catalogue webpages are generally copyright the host institution, and not published under an open license that allows for commercial reuse.
(I’m not sure how the law stands against general indexing for web search purposes vs indexing only a limited domain (such as course catalogue pages on UK HEI websites) vs scraping pages from a limited domain (such as course catalogue pages on UK HEI websites) in order to create a structured search engine over UK HE course pages. But I suspect the latter two cases breach copyright in ways that are harder to argue your way out of then a “we index everything we can find, regardless” web search engine. (I’m not sure how domain limited Google CSEs figure either? Or folk who run searches with the site: limit?))
To kickstart the “so what could we do with a UK wide aggregation of course data?”, I wonder whether UK HEIs who are going to pick up the £10k from JISC’s table might also consider doing the following:
- licensing their their course catalogue web pages with an open, commercial license (no one really understands what non-commercial means…and the aim would be to build sustainable services that help people find courses in a fair (open algorithmic) way that they might want to take…)
- publishing a sitemap/site feed that makes it clear where the course catalogue content lives (as a starter for 10, we have the Course Detective CSE definition file [XML]). That way, the sites could retain some element of control over which parts of the site good citizen scrapers could crawl. (I guess a robots.txt file might also be used to express this sort of policy?)
The license would allow third parties to start scraping and indexing course catalogue content, develop normalised forms of that data, and start working on discovery services around that data. A major aim of such sites would presumably be to support course discovery by potential students and their families, and ultimately drive traffic back to the university websites, or on to the UCAS website. Such sites, once established, would also provide a natural sink for XCRI-CAP feeds as and when they are published (although I suspect JISC would also like to be able to run a pilot project looking at developing an aggregator service around XCRI-CAP feeds as well;-) In addition, the sites might well identify additional – pragmatic – requirements on other sorts of data that might contribute to intermediary course discovery and course comparison sites.
It’s already looking as if the KIS – Key Information Set – data that will supposedly support course choice won’t be as open as it might otherwise be (e.g. Immediate Thoughts on the “Provision of information about higher education”); it would be a shame if the universities themselves also sought to limit the discoverability of their courses via cross-sector course discovery sites…
A couple of weeks or so ago, having picked up the TSO OpenUp competition prize for suggesting that it would be a Good Thing for UCAS/university course code data to be made available, I had a meeting with the TSO folk to chat over “what next?” The meeting was an upbeat one with a plan to get started as soon as possible with a scrape of the the UCAS website… so what’s happened since…?
First up – a reading of the UCAS website Terms and Conditions suggests that scraping is a no-no…
6. Intellectual property rights
e. Copying, distributing or any use of the material contained on the website for any commercial purpose is prohibited.
f. You may not create a database by systematically downloading substantial parts of the website
(In the finest traditions of the web, you aren’t allowed to deep link into the site without permission either: 6.c inks to the website are not permitted, other than links to the homepage for your personal use, except with our prior written permission. Links to the website from within a frameset definition are not permitted except with our prior written permission.)
So, err, I guess my link to the terms and conditions breaks those terms and conditions? Oops…;-) Should I be sending them something like this do you think?
As per your terms and conditions, (paragraph 6 c) please may I publish a link to your terms and conditions web page [ http://www.ucas.com/terms_and_conditions ] in a blog post I am writing that, in part, refers to your terms and conditions?
As a fallback, I put a couple of trial balloon FOI requests in to a couple of universities asking for the course names and UCAS course codes for courses offered in 2010/11, along with the search keywords associated with each course (doh! I did it again, deep linking into the UCAS site…)
PS Please may I also link to the page describing course search keywords [ http://www.ucas.com/he_staff/courses/coursesearchkeywords ] ?
The first request went to the University of Southampton, in part because I knew that they already publish chunks of the data (as data) as part of their #opensoton Open Data initiative. (This probably means I was abusing the FOI system, but a point maybe needed to be made…?!;-) The second request was put in to the University of Bristol.
The requests were of the form:
I would be grateful if you could send me in spreadsheet, machine readable electronic form or plain text a copy of the course codes, course titles and search keywords for each course as submitted to UCAS for the 2010-2011 (October 2010) student entry.
If possible, would you also provide HESA subject category codes associated with each course.
So how did I get on?
Bristol’s response was as follows:
On discussion with our Admissions and Student Information teams, it appears that the University does not actually hold this data – it is held on a UCAS database. UCAS are not currently subject to the Freedom of Information Act (they will be in due course) but it may be worth talking to them directly to see if they are willing to assist.
Course codes and titles may be found here: http://www.soton.ac.uk/corporateservices/foi/request-66210-6124d691.pdf Keywords were not held by the University you should inquire with UCAS (http://www.ucas.com). HESA subject category codes may be found here: http://www.hesa.ac.uk/index.php/content/view/1806/296/
So what did I learn?
- I don’t seem to have made it clear enough to Southampton that I wanted the the 2-tuple (course code, HESA code) for each course. So how should I have asked for that data (the response pointed me to the list of all HESA codes. What I wanted was, for each course code, the course code/HESA code pair).
- Generalising from an example of one;-), there seems to be a disconnect between FOI and open data branches of organisations. In my ideal world, the FOI person (an advocate for the person making the request) would also be on good terms with the Open Data team in the organisation, if not a data wrangler themselves. For data requests, the FOI person would make sure the data is released as open data as part of the process of fulfilling the request and then refer the person making the request to the open data site (see also: Open Data Processes – Taps, Query Paths/Audit Trails and Round Tripping). Southampton have part of this process already – the course data is in a PDF on the their site and I was referred to it. (Note that the PDF is not just any PDF – have a look at it! – rather than the spreadsheet, machine readable electronic form or plain text I requested, even though @cgutteridge had posted a link to the SPARQL opendata query for the course code/UCAS code information I’d requested as a reply to my FOI request on the WhatDoTheyKnow site.)
- Universities don’t necessarily have any record of the search keywords they associate with the courses they post on UCAS. The UCAS website suggests that (doh!) “[r]ecent analysis of unique IP address use of the UCAS Course Search indicates that the subject search is by far the most popular of the 3 search options currently available”, such that “[w]hen an applicant uses our Course Search facility to search for available courses, they can choose a keyword by which to search, known as the ‘subject search’.” Which is to say, universities have no local record of the terms they use to describe courses that are the the primary way of discovering their courses on UCAS? Blimey… (I wonder how much universities spend on Google AdWords for advertising particular courses on their own course prospectus websites and how they go about selecting those terms?)
- Asking for a machine readable “data as data” response has no teeth at the current time. I don’t know if the Protection of Freedoms bill clause that “extends Freedom of Information rights by requiring datasets to be available in a re-usable format” will change this? It seems like it might?
(a) an applicant makes a request for information to a public authority in respect of information that is, or forms part of, a dataset held by the public authority, and
(b) on making the request for information, the applicant expresses a preference for communication by means of the provision to the applicant of a copy of the information in electronic form, the public authority must, so far as reasonably practicable, provide the information to the applicant in an electronic form which is capable of re-use.
So what next? UCAS is a charity that appears to be operated by, for, and on behalf of UK Higher Education (e.g. UCAS Directors’ Report and Accounts 2009). Whilst not FOIable yet, it looked set to become FOIable from October 2011 (Ministry of Justice: Greater transparency in Freedom of Information), though I haven’t been able to find the SI and commencement date that enact this…?). IF it does become FOIable, we may be able to get the data out that way (although memories of the battle between open data advocates and the Ordnance Survey come to mind…) Hopefully, though, we’ll be able to get the data open by more amicable means before then…:-)
PS a couple of other things that I’ve been dipping into relating to this project. Firstly, the UCAS Business Plan 2009-2012 (doh!):
PPS Please may I also link to your Corporate Business Plan 2009-2012 [ http://www.ucas.com/documents/corporate/corpbusplan09-12.pdf ]
Secondly, the Cabinet Office’s “Better Choices: Better Deals” strategy document [PDF], which as well as its “MyData” right to personal data initiative, also encourages business to put their information (and data…) to work. Whether or not you agree that more information may help to make for better choices from potential students, or that comparison sites have a role to play in this, the UK government appears to believe it and looks set to support the development of businesses operating in this area. For example:
Effective consumer choices are also important in the public sector – such as decisions about what and where to study.
However, unlike in private markets, public services are generally:
● Free at the point of delivery, so prices do not give us clues about quality or popularity.
● Not motivated by profits, so there is little incentive to highlight differences and encourage switching.
● Supplied under a universal service obligation, such that they serve a particularly broad range of users, from the very informed to the highly vulnerable.
In the same way that comparison and feedback sites have developed for private markets, some choice-tools have already emerged for public services. For example, parents and prospective students can use league tables to compare school and university performance, while patients can access websites comparing waiting times for treatments across different healthcare providers, and feedback from fellow consumers about the performance of a local GP practice. Their role is likely to become more important in future as public service markets are opened up and there is scope for further choice-tools to be developed [Better Choices: Better Deals, p. 32]
If you’re looking to put a bid or business plan together based on using public data as a basis for comparison services, the Better Choices document has more than a few quotable sections;-)
Here’s the presentation I gave to the judging panel at the TSO OpenUp competition final yesterday. As ever, it doesn’t make sense with[out] (doh!) me talking, though I did add some notes in to the Powerpoint deck: Opening up UCAS Course Code Data
(I had hoped Slideshare would be able to use the notes as a transcript, bit it doesn’t seem to do that, and I can’t see how to cut and paste the notes in by hand?:-(
A quick summary:
The “Big Idea” behind my entry to the TSO competition was a simple one – make UCAS course data (course code, title and institution) avaliable as data. By opening up the data we make it possible for third parties to construct services and applications based around complete data skeleton of all the courses offered for undergraduate entry through clearing in a particular year across UK higher education.
The data acts as scaffolding that can be used to develop consumer facing applications across HE (e.g. improved course choice applications) as well as support internal “vertical” activities within HEIs that may also be transferable across HEIs.
Primary value is generated from taking the course code scaffolding and annotating it with related data. Access to this dataset may be sold on in a B2B context via data platform services. Consumer facing applications with their own revenue streams may also be built on top of the data platform.
This idea makes data available that can potentially disrupt the currently discovery model for course choice and selection (but in its current form, not in university application or enrollment), in Higher Education in the UK.
Here are the notes I doodled to myself in preparation for the pitch. Now the idea has been picked up, it will need tightening up and may change significantly! ;-) Which is to say – in this form, it is just my original personal opinion on the idea, and all ‘facts’ need checking…
But when selected to pitch the idea, it became clear that an application or two were also required, or at least some good business reasons for opening up this data…
So here we go…
Postgraduate students and Open University students do not go through UCAS. Other direct entry routes to higher education courses may also be available.
According to UCAS, in 2010, there were 697,351 applicants with 487,329 acceptances, compared with 639,860 applications and 481,854 acceptances in 2009. [ Slightly different figures in end of cycle report 2009/10? ]
For convenience, hold in mind the thought that course codes could be to course marketing, what postcodes are for geo related applications… They provide a natural identifier that other things can be associated with.
Associated with each degree course is a course code. UCAS course codes are also associated with JACS codes – Joint Academic Coding System identifiers – that relate to particular topics of study. “The UCAS course codes have no meaning other than “this course is offered by this institution for this application cycle”.” link]
“UCAS course code is 4 character reference which can be any combination of letters and numbers.
Each course is also assigned up to three JACS (Joint Academic Coding System) codes in order to classify the course for *J purposes. The JACS system was introduced for 2002 entry, and replaced UCAS Standard Classification of Academic Subjects (SCAS). Each JACS code consists of a single letter followed by 3 numbers. JACS is divided into subject areas, with a related initial letter for each. JACS codes are allocated to courses for the *J return.
The JACS system is used by the Higher Education Statistics Agency (HESA), and is the result of a joint UCAS-HESA subject code harmonization project.
JACS is also used by UK institutions to identify the subject matter of programmes and modules. These institutions include the Department for Innovation, Universities and Skills (DIUS), the Home Office and the Higher Education Funding Council for England (HEFCE).”
Keywords: up to 10 keywords per course are allocated to each course from a restricted list of just over 4,500 valid keywords.
“Main keyword: This is generally a broad subject category, usually expressed as a single word, for example ‘Business’.
Suggested keyword (SUG): Where a search on a main keyword identifies more than 200 courses, the Course Search user is prompted to select from a set of secondary keywords or phrases. These are the more specific ‘Suggested keywords’ attached to the courses identified. For example, ‘Business Administration’ is one of a range of ‘Suggested keywords’ which could be attached to a Business course (there are more than 60 others to choose from). A course in Business Administration would typically have this as the ‘Suggested keyword’, with ‘Business’ as the main keyword.
However, if a course only has a ‘Suggested keyword’ and not a related ‘Main keyword’, the course will not be displayed in any search under the ‘Main keyword’ alone.
Single subject: Main keywords can be ticked as ‘Single subject’. This means that the course will be displayed by a keyword search on the subject, when the user chooses the ‘single subject’ option below. You may have a maximum of two keywords indicated as single subjects per course.”
“Between January and March 2010, approximately 600,000 unique IP addresses access the UCAS course code search function. During the same time period, almost 5 million unique IP addresses accessed the UCAS subject search function.” [link]
“New courses from 2012 will be given UCAS codes that should not be used for subject classification purposes. However, all courses will still be assigned up to three individual JACS3 codes based on the subject content of the course.
An analysis of unique IP address activity on the UCAS Course Search has shown that very few searches are conducted using the course code, compared to the subject search function. UCAS Courses Data Team will be working to improve the subject search and course keywords over the coming year to enable potential applicants to accurately find suitable courses.” [link]
Course code identifiers have an important role to play within a university administrations, for example in marshalling resources around a course, although they are not used by students. (On the other hand, students may have a familiarity with module codes.) Course codes identify courses that are the subject of quality assessment by the QAA. To a certain extent, a complete catalogue of course codes allows third parties to organise offerings based around UK higher education degrees in a comprehensive way and link in to the UCAS application procedure.
– the release of horizontal data across the UK HE sector by HEIs, such as course catalogue information;
– vertical scaffolding within an institution for elaboration by module codes, which in turn may be associated with module descriptions, reading lists, educational resources, etc.
– the development across HE of services supporting student choice – for example “compare the uni” type services
XCRI is JISC’s preferred way of doing this, and I think there has been some lobbying of HEFCE from various JISC projects, but I’m not sure how successful it’s been?
Also context of data burden on HEIs, reporting to Professional, Statutory and Regulatory Bodies – PSURBS.
Reconciliation with HESA Institution and campus identifiers, as well as the JISCMU API and Guardian Datablog Rosetta Stone spreadsheet
By hosting course code data, and using it as scaffolding within a Linked Data cloud around HE courses, a valuable platform service can be made available to HEIs as well as commercial operations designed to support student choice when it comes to selecting an appropriate course and university.
Opening up the data facilitates rapid innovation projects within HEIs, and makes it possible for innovators within an HEI to make progress on projects that span across course offerings even if they don’t have easy access to that data from their own institution.
CompareTheUni has had a holding page up for months – but will it ever launch? Uni&Books crowd sources module codes and associated reading links. Talis Aspire is a commercial reading list system that associates resources with module codes.
Guardian datablog picked up the post, and I still get traffic from there on a daily basis… [link ]
One demonstrator I built used a bookmarklet to annotate UCAS course pages with a link to a resource page showing what books had been borrowed by students on that course at Huddersfiled University. [Link ]
The course codes also provide hooks against which it may be possible to deploy mappings across skills frameworks, e.g. SFIA in IT world. The course codes will also have associated JACS subject code mappings and UCAS search terms, which in turn may provide weak links into other domains, such as the world of books using vocabularies such as the Library of Congress Subject headings and Dewey classification codes.
Marketing of services built on top of the data platform will need to be marketed to the target audience using appropriate channels. Specialist marketers such as Campus Group may be appropriate partners here.
For platform business – e.g. business model based around selling queries on linked/aggregated/mapped datasets. If you imagine a query returning results with several attributes, each result is a row and each attribute is a column, If you allow free access to x thousand query cells returned a day, and then charge for cells above that limit, you:
Encourage wider innovation around your platform; let people run narrow queries or broad queries. License on use of data for folk to use on their own datastores/augmented with their own triples.
Generate revenue that scales on a metered basis according to usage;
– offer additional analytics that get your tracking script in third party web pages, helping train your learning classifiers, which makes platform more valuable.
For a consumer facing application – eg a course choice site for potential appications is the easiest to imagine:
– Short term model would be advertising (e.g. course/uni ads), affiliate fees on booksales for first year books? Seond hand books market eg via Facebook marketplace?
– Medium term – affiliate for for prospectus application/fulfilment
Long term – affiliate fee for course registration