One way of finding data-related files scattered around an organisation's website is to run a web search using a search limit that specifies a data-y filetype, such as xlsx for an Excel spreadsheet (xls files are also good candidates). For example, on the Parliament website, we could run a query along the lines of filetype:xlsx site:parliament.uk and then opt to display the omitted results.
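As a rough sketch of how such queries might be put together programmatically, the following builds the search string from a site and one or more filetypes, and wraps it in a Google search URL. The function names are my own, and the only thing assumed about the search engine is the familiar q= query parameter.

```python
from urllib.parse import urlencode


def build_filetype_query(site, filetypes, inurl=None):
    """Assemble a search string such as 'filetype:xlsx site:parliament.uk'."""
    terms = [f"filetype:{ft}" for ft in filetypes]
    # A single filetype stands alone; several are OR'd together in brackets.
    filetype_part = terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"
    parts = [filetype_part, f"site:{site}"]
    if inurl:
        # Optionally require a word to appear in the result URL.
        parts.append(f"inurl:{inurl}")
    return " ".join(parts)


def search_url(query):
    """Wrap the query in a search URL (hypothetical helper, q= parameter only)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_filetype_query("parliament.uk", ["xlsx"])
print(query)
print(search_url(query))
```

Passing several filetypes, for example ["xls", "csv", "xlsx"], gives a single bracketed OR query rather than requiring one search per filetype.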
Taken together, these files form an ad hoc datastore (e.g. as per this demo on using FOI responses on WhatDoTheyKnow as an “as if” open datastore).
Looking at the URLs, we see that data-containing files are strewn about the online Parliamentary estate (that is, the website;-)…
Freedom of Information Related Datasets
Responses to Written Questions often come with datafile attachments.
These files are posted to the subdomain http://qna.files.parliament.uk/qna-attachments.
Looking at an actual URL, something like http://qna.files.parliament.uk/qna-attachments/454264/original/28152%20-%20table.xlsx, it looks as if some guesswork is required to generate the URL from the data contained in the API response? (For example, how might original attachments be distinguished from other attachments, such as “revised” ones, maybe?)
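To make the guesswork a little more concrete, here is a minimal sketch that pulls the example URL apart into its path segments. The segment names (collection, attachment_id, variant, filename) are my own guesses at what each part means, based on nothing more than the shape of that one URL; they are not documented anywhere I have seen.

```python
from urllib.parse import urlsplit, unquote


def parse_attachment_url(url):
    """Split an attachment URL into its (guessed) component parts."""
    path = urlsplit(url).path
    # Drop empty segments and decode %20 style escapes.
    segments = [unquote(s) for s in path.split("/") if s]
    if len(segments) != 4:
        raise ValueError(f"Unexpected path layout: {path}")
    # Assumed layout: /<collection>/<attachment id>/<variant>/<filename>
    collection, attachment_id, variant, filename = segments
    return {
        "collection": collection,
        "attachment_id": attachment_id,
        "variant": variant,
        "filename": filename,
    }


example = ("http://qna.files.parliament.uk/qna-attachments/"
           "454264/original/28152%20-%20table.xlsx")
print(parse_attachment_url(example))
```

If the API response carries an attachment identifier and a filename, a URL could presumably be rebuilt by reversing this split, although whether anything other than "original" ever appears in the third position is still an open question.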
The data files also appear on the http://qna.files.parliament.uk/ subdomain although it looks like they’re on a different path to the answered question attachments (http://qna.files.parliament.uk/ws-attachments compared to http://qna.files.parliament.uk/qna-attachments). This subdomain doesn’t appear to have the data files indexed and searchable on Google? I don’t see a Written Statements API on http://explore.data.parliament.uk/ either?
Deposited papers often include supporting documents, including spreadsheets.
Files are located under http://data.parliament.uk/DepositedPapers/Files/.
At the current time there is no API search over deposited papers.
A range of documents may be associated with Committees, including reports, responses to reports, and correspondence, as well as evidence submissions. These appear to mainly be PDF documents. Written evidence documents are rooted on http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/ and can be found from committee written evidence web (HTML) pages rooted on the same path.
A web search for site:parliament.uk inurl:committee (filetype:xls OR filetype:csv OR filetype:xlsx) doesn’t turn up any results.
Parliamentary Research Briefings
Research briefings are published by Commons and Lords Libraries, and may include additional documents.
Briefings may be published along with supporting documents, including spreadsheets.
The files are published under the following subdomain and path: http://researchbriefings.files.parliament.uk/.
The file attachment URLs can be found via the Research Briefings API.
This response is a cut-down result; the full resource description, including links to supplementary items, can be found by keying on the numeric identifier from the _about URI under which the “naturally” identified resource (e.g. SN06643) is described.
Data files can be found variously around the Parliamentary website, including down the following paths:
http://qna.files.parliament.uk/qna-attachments (appear in Written Answers API results);
http://researchbriefings.files.parliament.uk/ (appear in Research Briefings API results).
(I don’t think the API supports querying resources that specifically include attachments in general, or attachments of a particular filetype?)
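Given a list of file URLs harvested from a web search, one simple way of seeing where the data lives is to tag each URL with the part of the Parliamentary estate it sits under. The sketch below uses the path roots mentioned in this post; the labels are my own, and reading ws-attachments as "Written Statements" is an assumption on my part.

```python
from urllib.parse import urlsplit

# Host + path prefixes taken from the paths noted in this post.
# The labels are informal, and "Written Statements" is a guess.
KNOWN_PATHS = {
    "qna.files.parliament.uk/qna-attachments": "Written Answers",
    "qna.files.parliament.uk/ws-attachments": "Written Statements",
    "data.parliament.uk/DepositedPapers/Files/": "Deposited Papers",
    "data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/":
        "Committee written evidence",
    "researchbriefings.files.parliament.uk/": "Research Briefings",
}


def source_of(url):
    """Return a label for the area of the site a file URL falls under."""
    parts = urlsplit(url)
    # Ignore the scheme, so http and https URLs are treated alike.
    location = parts.netloc.lower() + parts.path
    for prefix, label in KNOWN_PATHS.items():
        if location.startswith(prefix):
            return label
    return "unknown"


print(source_of("http://qna.files.parliament.uk/qna-attachments/"
                "454264/original/28152%20-%20table.xlsx"))
```

Anything that comes back as "unknown" is then a hint that there may be yet another stash of files to investigate.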
What would be nice would be support for discovering some of these resources. A quick way into this would be the ability to limit search query responses to webpages that link to a data file, on the grounds that the linking web page probably contains some of the keywords that you’re likely to be searching for data around?
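In the absence of such a search limit, the check for whether a page links to a data file is easy enough to sketch for pages you already have to hand. The following uses Python's standard library HTML parser to pull out links whose path ends in a data-y file extension; the list of extensions, and the names DataLinkFinder and find_data_links, are purely illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# File extensions treated as "data-y" (an illustrative choice).
DATA_EXTENSIONS = (".xlsx", ".xls", ".csv")


class DataLinkFinder(HTMLParser):
    """Collect links from <a href> tags that point at data files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.data_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Test the path only, so query strings do not hide the extension.
        if urlsplit(absolute).path.lower().endswith(DATA_EXTENSIONS):
            self.data_links.append(absolute)


def find_data_links(html, base_url):
    """Return the absolute URLs of data files linked from an HTML page."""
    finder = DataLinkFinder(base_url)
    finder.feed(html)
    return finder.data_links


page = '<p><a href="/files/table.xlsx">Table</a> <a href="about.html">About</a></p>'
print(find_data_links(page, "http://example.org/page/"))
```

A page for which find_data_links returns a non-empty list is then a candidate "data page", and its text can be indexed or keyword-searched in the usual way.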