Whenever I make a data-related FOI request, or a request for data within my own organisation, I try to ask for copies of the data as data, in the form of a spreadsheet, CSV file or database dump. One advantage of this approach is that it can help establish precedents that argue against the non-disclosure of similar datasets from other organisations. (It also provides an FOI Route to Real (Fake) Open Data via WhatDoTheyKnow, a means by which we can build an index of data files released in response to FOI requests made via WhatDoTheyKnow. I also note a technique from Paul Bradshaw for requesting schemas and data dictionaries, either as a basis for making specific requests where whole database dump requests may not be possible, or for enriching/decoding datasets that have been released.)
It hadn’t occurred to me to try to request data explicitly (and sneakily?!;-) in a form that, if lazily produced, might actually contain more data than the publisher intended, but it strikes me that this could be a handy social engineering trick. Take this case, for example, from the ICO blog post The risk of revealing too much, which describes in part how responses to FOI requests made via WhatDoTheyKnow may accidentally be revealing personally identifiable information:
The issue relates to responses to freedom of information (FOI) requests provided in spreadsheets, which are inadvertently revealing personal information. Public authorities will often respond to requests by supplying the information requested in spreadsheet format. Sometimes that will be in the form of a ‘pivot table’, which can neatly summarise the information, without revealing the underlying personal information the summary is based on.
Unfortunately, it has come to our attention that public authorities are not always properly removing the underlying data before disclosing. Pivot tables, both in Microsoft Excel and other spreadsheet programs, retain a copy of the source data used. This information is hidden from view, but is easily accessible.
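As a rough illustration of how "easily accessible" that hidden copy can be, here is a minimal sketch in Python. It assumes the spreadsheet is a modern .xlsx file (which is a zip package) and that any pivot cache is stored under xl/pivotCache/, as it is in the Office Open XML files I have looked at; older .xls files are structured differently and are not handled. The function names are my own, made up for this example.

```python
import zipfile


def find_pivot_cache_parts(xlsx):
    """Return the names of pivot cache parts inside an .xlsx package.

    `xlsx` may be a file path or a binary file-like object.
    Assumption: pivot caches live under "xl/pivotCache/" in the package.
    """
    with zipfile.ZipFile(xlsx) as package:
        return sorted(
            name for name in package.namelist()
            if name.startswith("xl/pivotCache/")
        )


def has_cached_source_records(xlsx):
    """True if a pivot cache *records* part (the copied source rows) is present."""
    return any(
        "pivotCacheRecords" in name for name in find_pivot_cache_parts(xlsx)
    )
```

Calling has_cached_source_records("response.xlsx") on a released file (the filename is just a placeholder) gives a quick yes/no on whether a copy of the underlying rows appears to have travelled with the summary. A publisher could run the same check before disclosure.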
The pivot table leak is just a variant on revealing the document metadata that describes tracked changes in a Word document (e.g. Microsystems white paper: What Lies Beneath Your Documents May Embarrass, Hurt or Cost You), in that it relies on the user being unaware of how the document is actually structured or what metadata it may contain. It represents another example of how our folk understanding of IT is often at odds with what’s actually going on inside an application.
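To make the tracked changes case concrete, here is a small hedged sketch, again in Python. It assumes a .docx file (a zip package whose main text lives in word/document.xml) and relies on tracked deletions being recorded as w:del elements wrapping w:delText runs. It only looks at the main document body, not headers, footers or comments, and the function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_deletions(docx):
    """Return the text of each tracked deletion found in a .docx body.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    deleted = []
    for deletion in root.iter(f"{{{W_NS}}}del"):
        # A deletion can span several runs; join their text together.
        text = "".join(
            t.text or "" for t in deletion.iter(f"{{{W_NS}}}delText")
        )
        if text:
            deleted.append(text)
    return deleted
```

Anything this returns is text the author thought they had removed but which is still sitting in the file for anyone who cares to look.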
I don’t know if it’s still the case, but chart objects embedded in a Word doc from an Excel spreadsheet used to carry with them the original data from which the chart was derived, which provided another way of getting hold of actual data points… (For example, a document publisher may think they are giving you an image of a chart, not appreciating that the chart is actually constructed from data values contained within the chart object.)
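One way to test the chart case against a given file is sketched below. This is a guess at a reasonable approach rather than a definitive recipe: it assumes a .docx package in which charts are stored as XML parts under word/charts/, with plotted numbers cached in c:numCache elements, which is how the Office Open XML charts I have seen are laid out. Charts pasted in as plain pictures will not show up, and the function name is invented for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML chart namespace used inside chart parts.
C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"


def chart_cached_values(docx):
    """Map each chart part in a .docx to the numeric values cached inside it.

    Values are returned as the strings stored in the XML.
    Parts under word/charts/ with no cached numbers are left out.
    """
    found = {}
    with zipfile.ZipFile(docx) as package:
        for name in package.namelist():
            if not (name.startswith("word/charts/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            values = [
                v.text
                for cache in root.iter(f"{{{C_NS}}}numCache")
                for v in cache.iter(f"{{{C_NS}}}v")
            ]
            if values:
                found[name] = values
    return found
```

If the result is non-empty, the "image" of the chart is carrying its data points along with it.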
Of course, if data were published using simple text formats, such as CSV, there would be nowhere to hide any embarrassing metadata…