In a post reviewing Open Data Cities (Open data ‘must be driven by need’), Tom Steinberg is quoted as follows:
“A lot of the attitude around open data is what can we give away, what can we give out?” said Steinberg, founder and director of charity mySociety. “Then they say ‘no-one seems to be using it, let’s have a hackday, see if we can create incentives.’ Meanwhile in the freedom of information department there is a pile of requests building up that won’t go away based on real desires – someone really wants to know something.”
The trick for councils will be to train staff right across the authority to spot information requests that could be handled by releasing new types of data, and then to empower someone to help make sure this happens, he said.
Hooking up FOI and open data processes (along with data-burden related reporting requirements) seems a sensible route to me: a way of identifying, as part of normal workflow, what data might usefully be opened up.
So how might we go about using FOI signals to identify the sorts of datasets that councils might usefully release?
Defining ‘useful’ is the first step; for now, let’s assume that if someone has gone to the trouble of putting in an FOI request, the information is useful. (If nothing else, publishing the data as part of an open data process rather than via an FOI request route makes for less work for the FOI department if a similar request is made at a later date, so it may be cost-saving from that point of view too?)
The next step is: where can we find some data relating to FOI requests?
WhatDoTheyKnow, the mySociety site that makes it easy to submit and track FOI requests, seems like a good place to start. To begin with, it’s easy enough to find a list of councils to whom FOI requests have been made via WhatDoTheyKnow (this information is also available more directly as data by downloading the full list of bodies listed on WhatDoTheyKnow and then extracting the items tagged local_council; a copy of this listing is also available as a database on ScraperWiki).
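For what it’s worth, here’s a minimal sketch of pulling that list down programmatically. The CSV filename and the column names (‘URL name’, ‘Tags’) are assumptions based on the site’s export – check the actual download before relying on them:

```python
# A minimal sketch: grab the full list of bodies from WhatDoTheyKnow and
# keep those tagged 'local_council'. The CSV URL and the 'URL name' /
# 'Tags' column names are assumptions - check the live export headers.
import csv
import io
import urllib.request

CSV_URL = "https://www.whatdotheyknow.com/body/all-authorities.csv"

with urllib.request.urlopen(CSV_URL) as resp:
    reader = csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8"))
    councils = [row for row in reader
                if "local_council" in row.get("Tags", "").split()]

# The 'URL name' column gives the slug used in requested_from: searches
slugs = [row["URL name"] for row in councils]
print(len(slugs), slugs[:5])
```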
Interlude: prompted by a comment, here’s a quick poll…
We can now use the unique URL slug/identifier associated with each council to find FOI requests made to that council. Here are a couple of advanced search patterns that may be useful:
- filetype:xls requested_from:kent_county_council – search a particular council for requests that had an Excel spreadsheet file (presumably: some data…) in response
- status:successful requested_from:kent_county_council – search for successfully handled requests
(It might also be worth running searches using the keyword data?)
Results are returned in page lengths of 25 items, so to see all results you need to page through them (using a qualifier of the form &page=N for results page N).
Results include a title/link text that identifies the topic of the FOI request, and a link to the request page itself, which logs all correspondence (and returned files) associated with the request.
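Putting the search patterns and paging together, something like the following sketch might do the scraping. The search URL pattern and the CSS selector for result links are assumptions – inspect the live search pages to confirm them:

```python
# A rough sketch of paging through WhatDoTheyKnow search results and
# scraping the request titles (the link text of each result).
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

def search_request_titles(query, max_pages=10):
    titles = []
    for page in range(1, max_pages + 1):
        # Assumed URL pattern - check against the live site
        url = (f"https://www.whatdotheyknow.com/search/"
               f"{urllib.parse.quote(query)}?page={page}")
        with urllib.request.urlopen(url) as resp:
            soup = BeautifulSoup(resp.read(), "html.parser")
        # Assumed: each result heading links to a /request/... page
        links = [a.get_text(strip=True)
                 for a in soup.select("a[href*='/request/']")]
        if not links:
            break  # ran out of results pages
        titles.extend(links)
    return titles

titles = search_request_titles("status:successful requested_from:kent_county_council")
```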
(Methinks it would be really handy if the search results were made available as JSON feeds…)
If we scrape the link text of successful requests to all (or a reasonable sample) of the councils, or grab the subject of requests that returned data files, we should be able to do some simple text analysis that might identify recurring topics that are the subject of requests across several councils. This might help signal the sorts of data that are commonly requested across different councils, as well as datasets or information that might be candidates for opening up as “useful” open data or open information. (Of course, it might not turn up anything of interest at all… But the experiment is quite a quick one to run in a basic form…)
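As a crude first pass at that text analysis, we might just count word frequencies across the scraped titles; bigrams or tf-idf across councils would be a natural next step, but even this may be enough to surface recurring topics:

```python
# Count recurring words across the scraped request titles (the `titles`
# list from the previous sketch). Deliberately crude - a stopword list
# filters out FOI boilerplate words.
from collections import Counter
import re

STOPWORDS = {"the", "of", "and", "to", "in", "for", "a", "on",
             "request", "freedom", "information"}

def topic_counts(titles):
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts

print(topic_counts(titles).most_common(20))
```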
A more laboured approach might be to do text analysis over the text of the actual requests (or clarifications), but this would involve scraping the WhatDoTheyKnow site rather more intensively (i.e. grabbing each request page rather than just scraping the search results).
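A sketch of what that heavier approach might look like for a single request page; the selector for the correspondence blocks is a guess, so inspect a real request page to find the right one:

```python
# Fetch an individual request page and pull out the correspondence text
# for analysis. The div.correspondence selector is an assumption.
import urllib.request
from bs4 import BeautifulSoup

def request_text(request_url):
    with urllib.request.urlopen(request_url) as resp:
        soup = BeautifulSoup(resp.read(), "html.parser")
    # Assumed: each piece of correspondence sits in a div.correspondence
    return "\n".join(div.get_text(" ", strip=True)
                     for div in soup.select("div.correspondence"))
```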
Worth doing, or not? Or maybe someone’s tried this approach already? If so, anyone got a link…?
PS Related: a recent report by the National Audit Office on the Government’s progress with its transparency of public information agenda – NAO: Implementing transparency report (NAO: Press Release – Implementing transparency). The report includes a handy timeline capturing notable events in the history of UK open public data over the last few years (my own, less complete version is here: Open Standards and Open Data).
Excellent post Tony!
As someone who works in a local authority on the transparency and open data agenda, when Tom Steinberg suggested that “the trick for councils will be to train staff right across the authority to spot information requests that could be handled by releasing new types of data, and then to empower someone to help make sure this happens”, it immediately seemed like the right idea. It’s simple and obvious, but not something we were giving any priority.
I’d already been playing with the whatdotheyknow data along with a copy of our internal FOI tracker information to see if I could spot any patterns with our FOI requests so Tony’s post here is very timely. I think there could be potential in this approach.
If somehow we could rank FOI data requests by popularity then I would certainly commit to trying to get the top ones released locally. As these releases would reflect demand, there is an improved chance that something exciting could happen with them. I’m sure I wouldn’t be alone in welcoming some ideas on which datasets to target for release.
From my analysis, less than 10% of our FOI requests come from WhatDoTheyKnow. I don’t know if this is typical of other local authorities?
I suspect one of the problems will be with matching what is in effect the same dataset but described very differently.
And finally, just to note that you can get the JSON link by viewing the page source of the search results.
@Martin – thanks for the comment. I suspect that FOI, open data, day-to-day reporting as part of the data burden, and requests made under the Data Protection Act are often viewed completely independently of each other, whereas to a data junkie they are all places to look for data… (I hedged around this in “Academic Library Usage Data as Reported to SCONUL, via FOI, And a Thought About Whitebox Data Reporting” https://blog.ouseful.info/2011/03/18/academic-library-usage-data-as-reported-to-sconul-via-foi-and-a-thought-about-whitebox-data-reporting/ and “Putting Public Open Data to Work…?” https://blog.ouseful.info/2011/01/21/putting-public-open-data-to-work/, though I don’t seem to have expressed the exact sentiment that Tom Steinberg did… Sigh..;-)
The popularity thing is an odd one. What is it that marks a dataset out as “popular”, or as having some sort of unlockable value that outweighs the cost of releasing it? One thing I’m taken by is how services such as RateMyPlace could scale to a national level whilst still being locally relevant/useful. That is, some locally produced datasets may have most of their value realised when aggregated with datasets from other councils. I guess this is one thing that a review of WhatDoTheyKnow data across the board might help reveal? Something else such a review might reveal is queries where several individuals make similar requests. This may reflect a local or national campaign (in which case there may be justification for a timely data release on grounds of transparency, consultation and public engagement), or it may reflect demand at an individual citizen level for a particular class of data or information.
The 10% thing is interesting too… I’ll see if I can add a quick poll to the post…
As to the JSON feed – doh! I’d missed the /feed/ path selector and had instead just tried adding a .json suffix to the search results page URL. From a quick look, though, the search results JSON feed only seems to return a single page, and doesn’t provide metadata about how many results pages there are or a way of accessing the others???
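For what it’s worth, this was the sort of thing I ended up trying; the /feed/ URL construction is lifted from the links in the search page source, so treat it as illustrative rather than definitive:

```python
# Grab the search results JSON feed. The /feed/search/... URL pattern
# is an assumption based on the links in the search page source.
import json
import urllib.parse
import urllib.request

query = urllib.parse.quote("status:successful requested_from:kent_county_council")
url = f"https://www.whatdotheyknow.com/feed/search/{query}.json"

with urllib.request.urlopen(url) as resp:
    items = json.load(resp)

# Only one page of results comes back, with no obvious paging metadata
print(len(items))
```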