A couple of weeks or so ago, having picked up the TSO OpenUp competition prize for suggesting that it would be a Good Thing for UCAS/university course code data to be made available, I had a meeting with the TSO folk to chat over “what next?” The meeting was an upbeat one with a plan to get started as soon as possible with a scrape of the the UCAS website… so what’s happened since…?
First up – a reading of the UCAS website Terms and Conditions suggests that scraping is a no-no…
6. Intellectual property rights
e. Copying, distributing or any use of the material contained on the website for any commercial purpose is prohibited.
f. You may not create a database by systematically downloading substantial parts of the website
(In the finest traditions of the web, you aren’t allowed to deep link into the site without permission either: 6.c inks to the website are not permitted, other than links to the homepage for your personal use, except with our prior written permission. Links to the website from within a frameset definition are not permitted except with our prior written permission.)
So, err, I guess my link to the terms and conditions breaks those terms and conditions? Oops…;-) Should I be sending them something like this do you think?
As per your terms and conditions, (paragraph 6 c) please may I publish a link to your terms and conditions web page [ http://www.ucas.com/terms_and_conditions ] in a blog post I am writing that, in part, refers to your terms and conditions?
As a fallback, I put a couple of trial balloon FOI requests in to a couple of universities asking for the course names and UCAS course codes for courses offered in 2010/11, along with the search keywords associated with each course (doh! I did it again, deep linking into the UCAS site…)
PS Please may I also link to the page describing course search keywords [ http://www.ucas.com/he_staff/courses/coursesearchkeywords ] ?
The first request went to the University of Southampton, in part because I knew that they already publish chunks of the data (as data) as part of their #opensoton Open Data initiative. (This probably means I was abusing the FOI system, but a point maybe needed to be made…?!;-) The second request was put in to the University of Bristol.
The requests were of the form:
I would be grateful if you could send me in spreadsheet, machine readable electronic form or plain text a copy of the course codes, course titles and search keywords for each course as submitted to UCAS for the 2010-2011 (October 2010) student entry.
If possible, would you also provide HESA subject category codes associated with each course.
So how did I get on?
Bristol’s response was as follows:
On discussion with our Admissions and Student Information teams, it appears that the University does not actually hold this data – it is held on a UCAS database. UCAS are not currently subject to the Freedom of Information Act (they will be in due course) but it may be worth talking to them directly to see if they are willing to assist.
Course codes and titles may be found here: http://www.soton.ac.uk/corporateservices/foi/request-66210-6124d691.pdf Keywords were not held by the University you should inquire with UCAS (http://www.ucas.com). HESA subject category codes may be found here: http://www.hesa.ac.uk/index.php/content/view/1806/296/
So what did I learn?
- I don’t seem to have made it clear enough to Southampton that I wanted the the 2-tuple (course code, HESA code) for each course. So how should I have asked for that data (the response pointed me to the list of all HESA codes. What I wanted was, for each course code, the course code/HESA code pair).
- Generalising from an example of one;-), there seems to be a disconnect between FOI and open data branches of organisations. In my ideal world, the FOI person (an advocate for the person making the request) would also be on good terms with the Open Data team in the organisation, if not a data wrangler themselves. For data requests, the FOI person would make sure the data is released as open data as part of the process of fulfilling the request and then refer the person making the request to the open data site (see also: Open Data Processes – Taps, Query Paths/Audit Trails and Round Tripping). Southampton have part of this process already – the course data is in a PDF on the their site and I was referred to it. (Note that the PDF is not just any PDF – have a look at it! – rather than the spreadsheet, machine readable electronic form or plain text I requested, even though @cgutteridge had posted a link to the SPARQL opendata query for the course code/UCAS code information I’d requested as a reply to my FOI request on the WhatDoTheyKnow site.)
- Universities don’t necessarily have any record of the search keywords they associate with the courses they post on UCAS. The UCAS website suggests that (doh!) “[r]ecent analysis of unique IP address use of the UCAS Course Search indicates that the subject search is by far the most popular of the 3 search options currently available”, such that “[w]hen an applicant uses our Course Search facility to search for available courses, they can choose a keyword by which to search, known as the ‘subject search’.” Which is to say, universities have no local record of the terms they use to describe courses that are the the primary way of discovering their courses on UCAS? Blimey… (I wonder how much universities spend on Google AdWords for advertising particular courses on their own course prospectus websites and how they go about selecting those terms?)
- Asking for a machine readable “data as data” response has no teeth at the current time. I don’t know if the Protection of Freedoms bill clause that “extends Freedom of Information rights by requiring datasets to be available in a re-usable format” will change this? It seems like it might?
(a) an applicant makes a request for information to a public authority in respect of information that is, or forms part of, a dataset held by the public authority, and
(b) on making the request for information, the applicant expresses a preference for communication by means of the provision to the applicant of a copy of the information in electronic form, the public authority must, so far as reasonably practicable, provide the information to the applicant in an electronic form which is capable of re-use.
So what next? UCAS is a charity that appears to be operated by, for, and on behalf of UK Higher Education (e.g. UCAS Directors’ Report and Accounts 2009). Whilst not FOIable yet, it looked set to become FOIable from October 2011 (Ministry of Justice: Greater transparency in Freedom of Information), though I haven’t been able to find the SI and commencement date that enact this…?). IF it does become FOIable, we may be able to get the data out that way (although memories of the battle between open data advocates and the Ordnance Survey come to mind…) Hopefully, though, we’ll be able to get the data open by more amicable means before then…:-)
PS a couple of other things that I’ve been dipping into relating to this project. Firstly, the UCAS Business Plan 2009-2012 (doh!):
PPS Please may I also link to your Corporate Business Plan 2009-2012 [ http://www.ucas.com/documents/corporate/corpbusplan09-12.pdf ]
Secondly, the Cabinet Office’s “Better Choices: Better Deals” strategy document [PDF], which as well as its “MyData” right to personal data initiative, also encourages business to put their information (and data…) to work. Whether or not you agree that more information may help to make for better choices from potential students, or that comparison sites have a role to play in this, the UK government appears to believe it and looks set to support the development of businesses operating in this area. For example:
Effective consumer choices are also important in the public sector – such as decisions about what and where to study.
However, unlike in private markets, public services are generally:
● Free at the point of delivery, so prices do not give us clues about quality or popularity.
● Not motivated by profits, so there is little incentive to highlight differences and encourage switching.
● Supplied under a universal service obligation, such that they serve a particularly broad range of users, from the very informed to the highly vulnerable.
In the same way that comparison and feedback sites have developed for private markets, some choice-tools have already emerged for public services. For example, parents and prospective students can use league tables to compare school and university performance, while patients can access websites comparing waiting times for treatments across different healthcare providers, and feedback from fellow consumers about the performance of a local GP practice. Their role is likely to become more important in future as public service markets are opened up and there is scope for further choice-tools to be developed [Better Choices: Better Deals, p. 32]
If you’re looking to put a bid or business plan together based on using public data as a basis for comparison services, the Better Choices document has more than a few quotable sections;-)