Academic Library Usage Data as Reported to SCONUL, via FOI, And a Thought About Whitebox Data Reporting

Something I’ve been meaning to do for ages, but have only just got round to starting, is to send up trial balloon FOI requests for the data that one public organisation releases to another as part of a formal or templated reporting procedure.

So here’s the first one – an FOI request, made via MySociety’s WhatDoTheyKnow service, to the University of Bath Library for a copy of the data they returned to SCONUL for the period 2008/2009 – and here’s the response, along with a copy of the return.

(In general, I wonder whether it would be more useful to ask for a copy of the document in the format in which it was submitted, for example as a Microsoft Word document, if that was how the return was made.)

The information reported to SCONUL is not available from SCONUL for free, although aggregated data from across the UK HE sector is available via a paid-for report. A copy of the current questionnaire used to collect the data is available.

It seems to me that what requests of this sort do is set a precedent for the release of data produced as part of a formal or standardised reporting process, a precedent that could be used to encourage (oblige?) other institutions in the same sector to make the same information available in the same way.

So here’s what I have in mind: a site that collects and collates information about the standard reports used to transport information between public sector organisations (including copies of the forms used to collate that data), covering, but not limited to, the information/data that public institutions are obliged to return to government or overseeing agencies.

For example, this DCLG list of the minimum data central Government needs from local authorities is a good start – is there an equivalent for universities (come on, BIS…;-)? [Ah, maybe this is a place to start, at least as far as HESA goes: HEFCE Report 2008: Making your data work for you – Data quality and efficiency in higher education. I imagine there is also a considerable data burden arising from REF reporting?]

As and when reports are demonstrated to be FOIable, their contents also become candidates for open data release. One aim here is to start making data chains visible to the organisations that are producing the data (internal transparency) so that the organisation can become more aware of its own data resources and how they might be used elsewhere within the organisation. (Transparency within the organisation may also lead to a reduction in duplication of effort creating or collating the same data at several different locations within the same organisation?)

The claim I guess I’m making for this approach to opening up data can be summarised as follows: data that is produced as part of formal reporting, and that is FOIable, should be made public as a matter of course. Because that data has already been collated and formatted for the report, there should be little extra effort required to open it up. Indeed, it may be possible to submit the reports via an open and transparent whitebox reporting process.

[See also: Putting Public Open Data to Work…?]

PS For what it’s worth, I think the SCONUL data application provides another example of a situation where it might be useful to have a WhatDoTheyKnow service that allows you to make the same (bulk) request to every institution in a particular sector (such as universities, or local councils). I can see there may need to be controls around such a service to prevent abuse, but…
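As a rough illustration of what such a bulk request feature might involve, here’s a minimal Python sketch that expands one request template across a list of public bodies. The request wording, the `bulk_requests()` function and the `MAX_BATCH` cap (standing in for one possible abuse control) are all my own assumptions, not anything WhatDoTheyKnow actually offers.

```python
from string import Template

# Hypothetical request wording; $body, $recipient and $period are filled in per request.
REQUEST = Template(
    "Dear $body,\n\n"
    "Under the Freedom of Information Act 2000, please could you provide a copy "
    "of the data you returned to $recipient for the period $period, "
    "in the format in which it was submitted.\n"
)

# Hypothetical abuse control: refuse batches larger than this.
MAX_BATCH = 200


def bulk_requests(bodies: list[str], recipient: str, period: str,
                  max_batch: int = MAX_BATCH) -> dict[str, str]:
    """Return one request text per public body, refusing oversized batches."""
    unique = list(dict.fromkeys(bodies))  # drop duplicate bodies, keep their order
    if len(unique) > max_batch:
        raise ValueError(f"batch of {len(unique)} bodies exceeds limit of {max_batch}")
    return {
        body: REQUEST.substitute(body=body, recipient=recipient, period=period)
        for body in unique
    }
```

So `bulk_requests(["University of Bath Library", ...], "SCONUL", "2008/2009")` would produce one identically worded letter per library; a real service would presumably need rather more in the way of controls than a simple size cap.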

PPS I wonder, do MySociety license WhatDoTheyKnow to any public institutions to help them manage their FOI process?

PPPS Here’s a related comment I posted to the Public Data Corporation engagement exercise:

Question 5 – What methods of access to datasets would most benefit you or your organisation?

One particular class of data that interests me is data that is:

1) reported by a local organisation to a central body;
2) submitted using a standardised, templated reporting format; and
3) FOIable from the local organisation and/or from the central body.

For example, in Higher Education, this might include data on library usage as reported to SCONUL, or marketing information about courses submitted to UCAS.

It can often be hard to find out how to phrase an FOI request to obtain this data as submitted, unless you know the type of reporting form used to submit it.

What I would like to see is the Public Data Corporation acting in part as a Public Data Exchange Directory, showing how different classes of public organisation make standard (public-data-containing) reports to other public organisations, detailing the standard report formats, with names/identifiers for those forms where appropriate, and describing which sections of each report are FOIable. This could also link in to the list of local council data burdens (…), for example, and/or the code of practice for local authority transparency (…).
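To make the idea of a Public Data Exchange Directory a little more concrete, here’s a minimal Python sketch of what a directory entry might record: the form identifier, who reports to whom, and which sections are FOIable. It also encodes the three criteria listed above as a simple check. All the class, field and method names are hypothetical, and any form or section names you plug in are placeholders rather than the structure of a real return such as SCONUL’s.

```python
from dataclasses import dataclass, field


@dataclass
class ReportSection:
    name: str
    foiable: bool  # has this section been shown to be releasable under FOI?


@dataclass
class ReportTemplate:
    form_id: str          # hypothetical name/identifier for the reporting form
    reporting_class: str  # class of organisation making the report
    central_body: str     # body to which the report is made
    templated: bool       # submitted using a standard, templated format?
    sections: list[ReportSection] = field(default_factory=list)

    def foiable_sections(self) -> list[str]:
        return [s.name for s in self.sections if s.foiable]

    def is_exchange_candidate(self) -> bool:
        # The three criteria: reported to a central body, templated,
        # and at least partly FOIable.
        return bool(self.central_body) and self.templated and bool(self.foiable_sections())


class Directory:
    """Hypothetical directory of standard reports, indexed by form identifier."""

    def __init__(self) -> None:
        self._entries: dict[str, ReportTemplate] = {}

    def register(self, template: ReportTemplate) -> None:
        self._entries[template.form_id] = template

    def reports_from(self, reporting_class: str) -> list[ReportTemplate]:
        """All standard reports made by a given class of organisation."""
        return [t for t in self._entries.values() if t.reporting_class == reporting_class]
```

Looking up `reports_from("university library")` would then list the forms a library is expected to submit, along with their FOIable sections, which is exactly the information you need in order to phrase a request for the data as submitted.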

The next step would be to introduce a pubsub (publish-subscribe) model in the reporting chain for reporting documents* that are wholly FOIable. This could happen in several ways:

A) /open report publication/ – the publishing organisation could post its report to its open data reporting store, and the consuming organisation (the one to which the report is being made) would subscribe to that store, collecting the data from there as it is published; third parties could also subscribe to the local publishing store and be alerted to reports as they are published. If co-publication to the central organisation and the public is not appropriate, the report could be withheld from public/press consumption for a specified number of days, or released to the press, but not the public, under embargo.

B) /open deposit/ – the publishing organisation publishes the report/data to an open deposit box owned by the central organisation which is receiving the report. After a specified period of time, the report is made public (ie published) via that central deposit box.

C) /data corp in the middle/ – a centralised architecture in which local organisations submit public reports to a Public Data Exchange, which then passes them on to the central body to which reports are made, and publishes them to the public, maybe after a fixed period of time.
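All three approaches share the same basic mechanism: a store that the central body subscribes to, with public release possibly delayed by an embargo. Here’s a minimal Python sketch of that mechanism, under my own assumptions; the `Report` and `ReportStore` names, the callback-based subscriptions and the single fixed embargo period are all hypothetical simplifications.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Report:
    report_id: str
    sender: str
    data: dict
    submitted: datetime


Subscriber = Callable[[Report], None]


class ReportStore:
    """Publish-subscribe report store with an optional public embargo (a sketch)."""

    def __init__(self, embargo: timedelta = timedelta(0)) -> None:
        self.embargo = embargo
        self._reports: list[Report] = []
        self._recipients: list[Subscriber] = []  # e.g. the central body: told at once
        self._public: list[Subscriber] = []      # third parties: told after the embargo
        self._released: set[str] = set()

    def subscribe(self, callback: Subscriber, public: bool = True) -> None:
        (self._public if public else self._recipients).append(callback)

    def publish(self, report: Report) -> None:
        self._reports.append(report)
        for notify in self._recipients:
            notify(report)
        # With a zero embargo, public subscribers are notified straight away.
        self.release_due(report.submitted)

    def release_due(self, now: datetime) -> None:
        """Notify public subscribers of each report whose embargo has expired."""
        for report in self._reports:
            if report.report_id in self._released:
                continue
            if now >= report.submitted + self.embargo:
                self._released.add(report.report_id)
                for notify in self._public:
                    notify(report)
```

Under A) the store would be run by the publishing organisation, with the central body as a non-public subscriber; under B) it would be the central body’s deposit box; and under C) it would sit in a Public Data Exchange between the two. A press-only embargo could be modelled by subscribing the press as non-public recipients.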

The intention of all three approaches described above is to provide an open window onto the reporting chain. At present, open public data tends to be data that is published via a separate branch “to the public”. In contrast, the above approach suggests that public data publication acts as a view onto all or part of the data as it goes about its daily business of being published from one organisation to another. That is, public data publication becomes a “tap” onto a dataflow/workflow process.
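One way to picture that “tap” is as a pass-through stage in the reporting pipeline: records flow on to the central body unchanged, while a copy of their FOIable fields goes to a public sink. The following is a minimal Python illustration; the `tap()` function, the field-level FOIable filter and the example field names are my own assumptions.

```python
from typing import Callable, Iterable, Iterator


def tap(records: Iterable[dict],
        public_sink: Callable[[dict], None],
        foiable_fields: set[str]) -> Iterator[dict]:
    """Yield each record unchanged, copying its FOIable fields to a public sink."""
    for record in records:
        public_sink({k: v for k, v in record.items() if k in foiable_fields})
        yield record


# Example: two hypothetical report records passing from a library to a central body.
public_view: list[dict] = []
reports = [{"field_a": 1, "internal_note": "x"}, {"field_a": 2, "internal_note": "y"}]
received = list(tap(reports, public_view.append, {"field_a"}))
# received is identical to reports; public_view holds only the "field_a" values.
```

Because `tap()` is a generator, the public copy is made only as each record actually passes along the chain, so the public sees the data as it flows rather than a separately prepared extract.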

If one of the aims of data exploitation is to introduce efficiencies as well as reuse in data-related activities, third parties need to be able to work with data as it is currently used.

A final issue relates to the way data is published. The JISC Resource Discovery Taskforce is currently consulting [ http://rdtfmetadata.jiscpress…. ] about metadata standards for describing resources in the Museums, Libraries and Archives field, and work is also ongoing on efficient and complete ways of publishing scientific data. To the extent that generic models or guidance are possible with respect to the representation of arbitrary datasets, it may be worth liaising with those working groups on generic guidelines for effective data publishing conventions. [Disclaimer: I am on the RDTF Technical Advisory Group]

* When talking about reports, I include the following sense: where a report is made, it is likely to include summary reports and maybe complete datasets. Ideally, data contained in reports should also be made available as “raw data” in an open data format, for example one compliant with two or more stars in the W3C Linked Data 5 star publishing scheme. In addition, where summary reports appear that reference views over raw datasets, the queries (for example, database queries) that generate the summary report view from the raw data should also be published, thus providing transparency over how the raw data generates the summary statistics in the final report, for example.
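As an illustration of publishing the query alongside the raw data, here’s a minimal Python sketch using the standard library’s `sqlite3` and `csv` modules. The `loans` table, its columns, the rows and the query are all made up for the example; the point is simply that the published SQL, run against the published raw data, reproduces the summary figures in the report.

```python
import csv
import io
import sqlite3

# Hypothetical raw data: one row per loan transaction (illustrative only).
RAW_ROWS = [("2009-01-05", "book"), ("2009-01-05", "journal"), ("2009-02-11", "book")]

# The query would be published with the report, so anyone can see exactly
# how the summary statistic is derived from the raw data.
SUMMARY_SQL = """
SELECT item_type, COUNT(*) AS loans
FROM loans
GROUP BY item_type
ORDER BY item_type
"""


def build_db(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Load the raw rows into an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE loans (loan_date TEXT, item_type TEXT)")
    con.executemany("INSERT INTO loans VALUES (?, ?)", rows)
    return con


def summary(con: sqlite3.Connection) -> list[tuple]:
    """Regenerate the summary view by running the published query."""
    return con.execute(SUMMARY_SQL).fetchall()


def raw_as_csv(rows: list[tuple[str, str]]) -> str:
    """Serialise the raw data as CSV, a non-proprietary open format."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["loan_date", "item_type"])
    writer.writerows(rows)
    return buf.getvalue()


print(summary(build_db(RAW_ROWS)))  # -> [('book', 2), ('journal', 1)]
```

Publishing the raw data as CSV rather than, say, a spreadsheet file also nudges it up the 5 star scheme, since CSV is a non-proprietary format.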