A post on the Guardian Datablog yesterday (Higher education funding: which institutions will be affected?) alerted me to the release of HEFCE’s “provisional allocations of recurrent funding for teaching and research, and the setting of student number control limits for institutions, for academic year 2012-13” (funding data).
Here are the OU figures for teaching:
| Funding for old-regime students (mainstream) | Funding for old-regime students (co-funding) | High cost funding for new-regime students | Widening participation | Teaching enhancement and student success | Other targeted allocations | Other recurrent teaching grants | Total teaching funding |
HEFCE preliminary teaching funding allocations to the Open University, 2012-13
Of the research funding for 2012-13, mainstream funding was 8,030,807, the RDP supervision fund came in at 1,282,371, along with 604,103 “other”, making up the full 9,917,281 research allocation.
Adding Higher Education Innovation Funding of 950,000, the OU’s total allocation was 139,714,060.
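As a quick sanity check on those figures, here is a minimal sketch that re-adds the research components and backs out the teaching total implied by the headline number. It assumes the total allocation is simply teaching plus research plus HEIF, which the HEFCE release may or may not define in exactly that way; the variable names are mine.

```python
# Figures as quoted in this post (HEFCE provisional allocations to the OU, 2012-13).
research_parts = {
    "mainstream": 8_030_807,
    "rdp_supervision": 1_282_371,
    "other": 604_103,
}
research_total = sum(research_parts.values())

heif = 950_000                  # Higher Education Innovation Funding
total_allocation = 139_714_060  # headline total quoted above

# Assumption: total = teaching + research + HEIF, with no other components.
implied_teaching_total = total_allocation - research_total - heif

print("Research total:", research_total)
print("Implied teaching total:", implied_teaching_total)
```

If the research components do not sum to the quoted research allocation, that is a hint that something was mistyped along the way.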
So what other funding comes into the universities from public funds?
Open Spending publishes data on spending by government departments with named organisations, so we can search it for money spent by government departments with the universities (for example, here is a search on OpenSpending.org for “open university”).
Given the amounts spent by public bodies on consultancy (try searching OpenCorporates for mentions of PriceWaterhouseCoopers, or any of EDS, Capita, Accenture, Deloitte, McKinsey, BT’s consulting arm, IBM, Booz Allen, PA, KPMG (h/t @loveitloveit)), university-based consultancy may come in reasonably cheaply?
The universities also receive funding for research via the UK research councils (EPSRC, ESRC, AHRC, MRC, BBSRC, NERC, STFC) along with innovation funding from JISC. Unpicking the research council funding awards to universities can be a bit of a chore, but scrapers are appearing on Scraperwiki that make for easier access to individual grant awards data:
- AHRC funding scraper; [grab data using queries of the form select * from `swdata` where organisation like "%open university%" on scraper arts-humanities-research-council-grants]
- EPSRC funding scraper; [grab data using queries of the form select * from `grants` where department_id in (select distinct id as department_id from `departments` where organisation_id in (select id from `organisations` where name like "%open university%")) on scraper epsrc_grants_1]
- ESRC funding scraper; [grab data using queries of the form select * from `grantdata` where institution like "%open university%" on scraper esrc_research_grants]
- BBSRC funding [broken?] scraper;
- NERC funding [broken?] scraper;
- STFC funding scraper; [grab data using queries of the form select * from `swdata` where institution like "%open university%" on scraper stfc-institution-data]
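The queries listed above can be turned into request URLs along the following lines. This is only a sketch: the endpoint in `API_BASE` is my assumption about the form of the ScraperWiki datastore API (and it may not be live), and `build_query_url` is a hypothetical helper, not part of any ScraperWiki library. The scraper names and SQL are the ones listed above.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint for the ScraperWiki datastore API; check before relying on it.
API_BASE = "https://api.scraperwiki.com/api/1.0/datastore/sqlite"

# Scraper short names mapped to the query templates listed in the post.
QUERIES = {
    "arts-humanities-research-council-grants":
        'select * from `swdata` where organisation like "%{name}%"',
    "epsrc_grants_1":
        "select * from `grants` where department_id in "
        "(select distinct id as department_id from `departments` "
        "where organisation_id in "
        '(select id from `organisations` where name like "%{name}%"))',
    "esrc_research_grants":
        'select * from `grantdata` where institution like "%{name}%"',
    "stfc-institution-data":
        'select * from `swdata` where institution like "%{name}%"',
}


def build_query_url(scraper, name, fmt="json"):
    """Return a URL that would run the scraper's query for the given institution name."""
    query = QUERIES[scraper].format(name=name)
    params = {"format": fmt, "name": scraper, "query": query}
    return API_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    for scraper in QUERIES:
        print(build_query_url(scraper, "open university"))
```

Building the URLs is kept separate from fetching them, so the queries can be inspected (or pasted into a browser) before anything is requested.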
In order to get a unified view over the detailed funding of the institutions from these different sources, the data needs to be reconciled. There are several ID schemes for identifying universities (eg UCAS or HESA codes; see for example GetTheData: Universities by Mission Group) but even official data releases tend not to make use of these, preferring instead to rely solely on institution names, as for example in the case of the recent HEFCE provisional funding data release. [D’oh! This is not the case: the identifiers are there, apparently. I have to admit I didn’t check and was being a little hasty… See the contribution/correction from David Kernohan in the comments to this post.]
For some time, I’ve been trying to put my finger on why data releases like this are so hard to work with, and I think I’ve twigged it… even when released in a spreadsheet form, the data often still isn’t immediately “database-ready” data. Getting data from a spreadsheet into a database often requires an element of hands-on crafting – coping with rows that contain irregular comment data, as well as handling columns or rows with multicolumn and multirow labels. So here are a couple of things that would make life easier in the short term, though they maybe don’t represent best practice in the longer term…:
1) release data as simple CSV files (odd as it may seem), because these can be easily loaded into applications that can actually work on the data as data. (I haven’t started to think too much yet about pragmatic ways of dealing with spreadsheets where cell values are generated by formulae, because they provide an audit trail from one data set to derived views generated from that data.)
2) have a column containing regular identifiers using a known identification scheme, for example, HESA or UCAS codes for HEIs. If the data set is a bit messy, and you can only partially fill the ID column, then only partially fill it; it’ll make life easier joining those rows at least to other related datasets…
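To make the “hands-on crafting” a little more concrete, here is a minimal sketch of turning a spreadsheet-style export into something closer to database-ready rows. The sample data is entirely invented, and the layout (one title row, a two-row header with a merged group label, a trailing note row) is just an assumption about what such a sheet might look like; `flatten_headers` and `tidy` are hypothetical helper names.

```python
import csv
import io

# Invented example of a spreadsheet export: title row, two-row header with a
# merged group label ("Teaching" spans two columns), data rows, and a note row.
RAW = """Provisional allocations (illustrative data only),,,
Institution,Teaching,,Research
,Mainstream,Targeted,Total
Example University A,100,20,30
Example University B,200,,50
Note: figures are made up for this sketch,,,
"""


def flatten_headers(group_row, label_row):
    """Merge a two-row header into one label per column, filling merged cells forward."""
    headers, current_group = [], ""
    for group, label in zip(group_row, label_row):
        current_group = group.strip() or current_group
        parts = [p for p in (current_group, label.strip()) if p]
        headers.append(" / ".join(parts))
    return headers


def tidy(raw, header_start=1):
    """Return (headers, records), skipping the title row and comment-only rows."""
    rows = list(csv.reader(io.StringIO(raw)))
    headers = flatten_headers(rows[header_start], rows[header_start + 1])
    records = []
    for row in rows[header_start + 2:]:
        # Keep a row only if some cell after the first holds a value; this
        # drops note rows that only occupy the first column.
        if not any(cell.strip() for cell in row[1:]):
            continue
        records.append(dict(zip(headers, row)))
    return headers, records


if __name__ == "__main__":
    headers, records = tidy(RAW)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=headers)
    writer.writeheader()
    writer.writerows(records)
    print(out.getvalue())
```

The point is not the particular rules used here, which will differ from sheet to sheet, but that every release published as a “presentation” spreadsheet forces each user to write something like this before they can do anything with the data.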
As far as UK HE goes, the JISC monitoring unit/JISCMU has an API over various administrative data elements relating to UK HEIs (eg GetTheData: Postcode data for HE and FE institutes), but I don’t think it offers a Google Refine reconciliation service (ideally with some sort of optional string similarity service)…? Yet?! ;-) Maybe that’d make for a good rapid innovation project???
PS I’m reminded of a couple of related things: 1) Test Your RESTful API With YQL, a corollary to the idea that you can check your data at least works by trying to use it (eg generate a simple chart from it), mapped to the world of APIs: if you can’t easily generate a YQL table/wrapper for it, it’s maybe not that easy to use? 2) the scraperwiki/okf post from @frabcus and @rufuspollock on the need for data management systems not content management systems.
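As a rough illustration of what name reconciliation with a string similarity fallback might involve, here is a sketch using Python’s standard `difflib`. The reference table and its identifiers are invented placeholders (not real HESA or UCAS codes), `reconcile` is a hypothetical helper, and the 0.8 cutoff is an arbitrary choice; this is not how Google Refine or the JISCMU API actually work.

```python
import difflib

# Invented reference table: normalised institution name -> placeholder identifier.
# These are NOT real HESA or UCAS codes.
REFERENCE = {
    "the open university": "ID001",
    "university of example": "ID002",
    "example metropolitan university": "ID003",
}


def normalise(name):
    """Lower-case, drop commas and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().replace(",", " ").split())


def reconcile(name, reference=REFERENCE, cutoff=0.8):
    """Return (identifier, score); identifier is None when nothing is close enough."""
    key = normalise(name)
    if key in reference:
        return reference[key], 1.0
    matches = difflib.get_close_matches(key, list(reference), n=1, cutoff=cutoff)
    if not matches:
        return None, 0.0
    score = difflib.SequenceMatcher(None, key, matches[0]).ratio()
    return reference[matches[0]], score


if __name__ == "__main__":
    for raw_name in ["The Open University", "Open University", "Acme Widgets Ltd"]:
        print(raw_name, "->", reconcile(raw_name))
```

Names that fail to match are left with no identifier rather than forced onto the nearest candidate, in the spirit of only partially filling the ID column when that is all that can be done reliably.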
PPS Looking at the actual Guardian figures reveals all sorts of market levers appearing… Via @dkernohan, FT: A quiet Big Bang in universities