The FOI Route to Real (Fake) Open Data via WhatDoTheyKnow

In FOI Signals on Useful Open Data?, I pondered whether we could make use of information about FOI requests to help identify what sorts of data folk might actually be interested in, by virtue of their having made Freedom of Information (FOI) requests for that data.

I couldn’t help but start to try working various elements of that idea through, so here’s a simple baby step to begin with – a scraper on Scraperwiki (Scraperwiki scraper: WhatDoTheyKnow requests) that searches for FOI requests made through WhatDoTheyKnow that got one or more Excel/xls spreadsheets back as an attachment.
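The core of that scraper might be sketched roughly as follows – a minimal, stdlib-only Python sketch in which the search URL pattern, the `filetype:xls status:successful` query, and the result-page markup are all assumptions about how WhatDoTheyKnow works rather than confirmed details:

```python
import re
import urllib.parse
import urllib.request

# Assumed search URL pattern - check against the live site before relying on it.
SEARCH_URL = "https://www.whatdotheyknow.com/search/{query}/requests"

def search_url(query, page=1):
    """Build a WhatDoTheyKnow search URL for a given query and results page."""
    return SEARCH_URL.format(query=urllib.parse.quote(query)) + "?page={}".format(page)

def request_links(html):
    """Pull /request/... links out of a page of search results.
    (A production scraper would use a proper HTML parser rather than a regex.)"""
    return sorted(set(re.findall(r'href="(/request/[^"#]+)"', html)))

# Example usage (uncomment to run against the live site):
# html = urllib.request.urlopen(search_url("filetype:xls status:successful")).read().decode()
# print(request_links(html))
```

Each returned link can then be fetched in turn to look for the attached spreadsheet files.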

I’ve also popped up a Scraperwiki view that lets you browse the data-returning requests made to local councils or universities.

Clicking through on an FOI request link takes you to the response that contains the data file, which can be downloaded directly or previewed on Zoho:

It strikes me that if I crawled the response pages, I could build my own index of data files, catalogued according to FOI request titles, in effect generating a “fake” open data catalogue as powered by FOI requests…? (What would be really handy in the local council requests would be if the responses were tagged with appropriate LGSL codes or IPSV terms (indexing on the way out) as a form of useful public metadata that could help put the FOI-released data to work…?)
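Building such an index could be as simple as keying scraped attachment URLs by request title. A toy sketch, in which the titles and URLs are made up for illustration and stand in for whatever the response-page crawler would actually return:

```python
from collections import defaultdict

def build_catalogue(records):
    """Build a {request_title: [attachment_url, ...]} index from
    (title, url) pairs scraped from FOI response pages."""
    catalogue = defaultdict(list)
    for title, url in records:
        catalogue[title].append(url)
    return dict(catalogue)

# Hypothetical scraped records:
scraped = [
    ("Spending over 500 pounds", "https://example.org/spend.xls"),
    ("Spending over 500 pounds", "https://example.org/spend2.xls"),
    ("Parking fines 2010", "https://example.org/fines.xls"),
]
catalogue = build_catalogue(scraped)
# catalogue["Spending over 500 pounds"] now lists both spreadsheet URLs
```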

Insofar as the requests may or may not be useful for signalling particular topic areas as good candidates for “standard” open data releases, I still need to do some text analysis on the request titles. In the meantime, you can enter a keyword/key phrase in the Request text box in order to filter the table results to only show requests whose title contains the keyword/phrase. (The Council drop down list allows you to filter the table so that it only shows requests for a particular university/council.)

PS via a post on HelpMeInvestigate, I came across this list of FOI responses to requests made to the NHS Prescription Pricing Division. From a quick skim, some of the responses have “data” file attachments, though in the form of PDFs rather than spreadsheets/CSV. However, it would be possible to scrape the pages to at least identify ones that do have attachments (which is a clue they may contain data sets?)
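A crude way of flagging likely data attachments, once you have the links, is to look at the file extensions. A heuristic sketch (the extension lists are my own guesses at what counts as “data”):

```python
import os
import urllib.parse

DATA_EXTS = {".xls", ".xlsx", ".csv"}       # probably tabular data
MAYBE_DATA_EXTS = {".pdf"}                  # may hold tables needing further extraction

def classify(url):
    """Classify an attachment URL as 'data', 'maybe-data', or 'other'
    based purely on its file extension."""
    ext = os.path.splitext(urllib.parse.urlparse(url).path)[1].lower()
    if ext in DATA_EXTS:
        return "data"
    if ext in MAYBE_DATA_EXTS:
        return "maybe-data"
    return "other"

# e.g. classify("http://example.org/foi/response.pdf") -> "maybe-data"
```

Running that over a scraped disclosure log would at least separate the responses worth a closer look from plain correspondence.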

So now I’m wondering – what other bodies produce full lists of FOI requests they have received, along with the responses to them?

PPS See also this search query on FOI Release publications.

FOI Signals on Useful Open Data?

In a post reviewing an Open Data Cities event (Open data ‘must be driven by need’), Tom Steinberg is quoted as follows:

“A lot of the attitude around open data is what can we give away, what can we give out?” said Steinberg, founder and director of charity mySociety. “Then they say ‘no-one seems to be using it, let’s have a hackday, see if we can create incentives.’ Meanwhile in the freedom of information department there is a pile of requests building up that won’t go away based on real desires – someone really wants to know something.”

The trick for councils will be to train staff right across the authority to spot information requests that could be handled by releasing new types of data, and then to empower someone to help make sure this happens, he said.

Hooking up FOI and open data processes (along with data burden related reporting requirements) seems to be a sensible route to me as a way of trying to identify what data might be usefully opened up as a part of normal workflow.

So how might we go about using FOI signals to identify the sorts of datasets that councils might usefully release?

Defining ‘useful’ is the first step, so for now let’s assume that if someone has gone to the trouble of putting in an FOI request, it’s useful. (If nothing else, publishing the data as part of an opendata process rather than via an FOI request route makes for less work for the FOI department if a similar request is made at a later date, so it may be cost-saving from that point of view too?)

The next step is: where can we find some data relating to FOI requests?

WhatDoTheyKnow, the mySociety site that makes it easy to submit and track FOI requests, seems like a good place to start. To begin with, it’s easy enough to find a list of councils to whom FOI requests have been made via WhatDoTheyKnow (this information is also available more directly as data by downloading the full list of bodies listed on WhatDoTheyKnow and then extracting items tagged with local_council [a copy of this listing is also available as a db on Scraperwiki]).
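Filtering that downloaded list for councils might look like the following sketch; the column names (“URL name”, “Tags”) and the space-separated tag format are assumptions about the export, so check them against the actual file:

```python
import csv

def council_slugs(csvfile, tag="local_council"):
    """Return the URL slugs of bodies carrying the given tag,
    from a file-like object containing the bodies CSV export."""
    reader = csv.DictReader(csvfile)
    return [row["URL name"] for row in reader
            if tag in (row.get("Tags") or "").split()]

# Example usage with the downloaded export:
# with open("all-authorities.csv") as f:
#     slugs = council_slugs(f)
```

Each slug can then be dropped into a `requested_from:` style search qualifier to pull back that council’s requests.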

Interlude: prompted by a comment, here’s a quick poll…

We can now use the unique URL slug/identifier associated with each council to find FOI requests made to that council. Here are a couple of advanced search patterns that may be useful:

(It might also be worth running searches using the keyword data?)

Results are returned in page lengths of 25 items, so to see all results you need to page through them (using a qualifier of the form &page=N for results page N).
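The paging can be wrapped in a simple loop that keeps requesting pages until one comes back short. In this sketch, `fetch` stands in for whatever function actually retrieves and parses a results page for a given page number:

```python
PAGE_SIZE = 25  # WhatDoTheyKnow returns results 25 to a page

def all_results(fetch, page_size=PAGE_SIZE):
    """Accumulate results across pages, stopping at the first page
    that returns fewer than page_size items (or none at all)."""
    results, page = [], 1
    while True:
        batch = fetch(page)
        results.extend(batch)
        if len(batch) < page_size:
            return results
        page += 1
```

Note the stopping rule assumes a short page only ever appears last; a politer scraper would also sleep between fetches.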

Results include a title/link text that identifies the topic of the FOI request, and a link to the request page itself, which logs all correspondence (and returned files) associated with the request.

(Methinks it would be really handy if the search results were made available as JSON feeds…)

If we scrape the link text of successful requests to all of (or a reasonable sample of) the councils, or grab the subject of requests that returned data files, we should be able to do some simple text analysis that might identify recurring topics that are the subject of requests across several councils. This might help signal the sorts of data that are commonly requested across different councils, as well as datasets or information that might be candidates for opening up as “useful” open data or open information. (Of course, it might not turn up anything of interest at all… But the experiment is quite a quick one to run in a basic form…)
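As a first pass at that text analysis, a simple bag-of-words count over the request titles (with a small, ad hoc stopword list) would surface recurring terms; the example titles below are invented:

```python
import re
from collections import Counter

# Ad hoc stopword list - a real run would want something more complete.
STOPWORDS = {"the", "of", "for", "and", "to", "in", "on", "a", "request"}

def topic_counts(titles):
    """Count how many titles each (non-stopword) term appears in.
    Using a set per title means a term counts once per request."""
    counts = Counter()
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower())) - STOPWORDS
        counts.update(words)
    return counts

titles = [
    "Spending on agency staff",
    "Agency staff costs 2010",
    "Allotment waiting lists",
]
counts = topic_counts(titles)
# counts.most_common() surfaces "agency"/"staff" as the recurring topic here
```

Terms that recur across titles from different councils would be the candidate “standard release” topics.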

A more laboured approach might be to do text analysis over the text of the actual requests (or clarifications), but this would involve scraping the WhatDoTheyKnow site rather more intensively (i.e. grabbing each request page rather than just scraping the search results).

Worth doing, or not? Or maybe someone’s tried this approach already? If so, anyone got a link…?

PS Related: a recent report by the National Audit Office on Government progress on its transparency of public information agenda – NAO: Implementing transparency report (NAO: Press Release – Implementing transparency). The report includes a handy timeline over the last few years capturing notable events in the history of UK open public data (my own, less complete version is here: Open Standards and Open Data).