
OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education and data journalism

Disclosing More Information Than You Intended

Whenever I make a data-related FOI request, or a request for data within my own organisation, I try to ask for copies of the data as data, in the form of a spreadsheet, CSV file or database dump. One advantage of this approach is that it can help establish precedents that argue against the non-disclosure of similar datasets from other organisations. (It also provides an FOI Route to Real (Fake) Open Data via WhatDoTheyKnow, a means by which we can build an index of data files released in response to FOI requests made via WhatDoTheyKnow. I also note a technique from Paul Bradshaw for requesting schemas and data dictionaries as a basis for making specific requests where whole database dump requests may not be possible, or for enriching/decoding datasets that have been released.)

It hadn’t occurred to me to try to request data explicitly (and sneakily?!;-) in a form that, if lazily produced, might actually contain more data than the publisher intended, but it strikes me that this could be a handy social engineering trick. Take this case, for example, from the ICO blog post The risk of revealing too much, which describes in part how responses to FOI requests made via WhatDoTheyKnow may accidentally be revealing personally identifiable information:

The issue relates to responses to freedom of information (FOI) requests provided in spreadsheets, which are inadvertently revealing personal information. Public authorities will often respond to requests by supplying the information requested in spreadsheet format. Sometimes that will be in the form of a ‘pivot table’, which can neatly summarise the information, without revealing the underlying personal information the summary is based on.

Unfortunately, it has come to our attention that public authorities are not always properly removing the underlying data before disclosing. Pivot tables, both in Microsoft Excel and other spreadsheet programs, retain a copy of the source data used. This information is hidden from view, but is easily accessible.
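As a rough illustration of how easy that hidden copy is to find: an .xlsx file is a zip package, and workbooks written by Excel typically keep each pivot table's cache in parts under `xl/pivotCache/`, with the retained source rows in a `pivotCacheRecords` part. The sketch below only lists those parts; the function names and the in-memory demo package are my own assumptions for illustration, not anything from the ICO post.

```python
import io
import zipfile


def find_pivot_cache_parts(xlsx_file):
    """List the pivot cache parts stored inside an .xlsx package.

    `xlsx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx_file) as package:
        return sorted(
            name for name in package.namelist()
            if name.startswith("xl/pivotCache/")
        )


def has_cached_source_records(xlsx_file):
    """True if a pivot table in the workbook kept a copy of its source rows."""
    return any(
        "pivotCacheRecords" in name
        for name in find_pivot_cache_parts(xlsx_file)
    )


def build_demo_workbook(with_pivot_cache=True):
    """Build a stand-in package in memory (hypothetical contents, not a real workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("xl/workbook.xml", "<workbook/>")
        if with_pivot_cache:
            package.writestr(
                "xl/pivotCache/pivotCacheDefinition1.xml", "<pivotCacheDefinition/>"
            )
            package.writestr(
                "xl/pivotCache/pivotCacheRecords1.xml", "<pivotCacheRecords/>"
            )
    buffer.seek(0)
    return buffer


demo = build_demo_workbook()
print(find_pivot_cache_parts(demo))
print(has_cached_source_records(demo))
```

A publisher could run a check like this before release. If the records part is present, the summary table is travelling with its underlying rows.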

This is just a variant on revealing the document metadata that describes tracked changes in a Word document (e.g. Microsystems white paper: What Lies Beneath Your Documents May Embarrass, Hurt or Cost You) in that it relies on the user being unaware of how the document is actually structured or what metadata it may contain, and represents another example of how our folk understanding of IT is often at odds with what’s actually going on inside an application.
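To make the tracked changes case concrete, here is a minimal sketch, assuming a .docx file (a zip package whose main text lives in `word/document.xml`), where tracked insertions are wrapped in `w:ins` elements and tracked deletions keep their text in `w:delText`. The function name and the tiny demo document are hypothetical, and a real document may also hold revisions in headers, footers or comments that this does not look at.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(docx_file):
    """Return the inserted and deleted text still recorded in the main document part."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    inserted = [
        "".join(t.text or "" for t in ins.iter(f"{{{W_NS}}}t"))
        for ins in root.iter(f"{{{W_NS}}}ins")
    ]
    deleted = [
        "".join(t.text or "" for t in dele.iter(f"{{{W_NS}}}delText"))
        for dele in root.iter(f"{{{W_NS}}}del")
    ]
    # Drop empty entries, e.g. revision marks that carry no text of their own.
    return {
        "inserted": [text for text in inserted if text],
        "deleted": [text for text in deleted if text],
    }


def build_demo_docx():
    """Build a stand-in .docx in memory with one tracked deletion and one insertion."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>The budget is </w:t></w:r>"
        '<w:del w:id="1" w:author="A"><w:r><w:delText>2m</w:delText></w:r></w:del>'
        '<w:ins w:id="2" w:author="A"><w:r><w:t>1m</w:t></w:r></w:ins>'
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


print(tracked_changes(build_demo_docx()))
```

The reader of the finished document sees only "The budget is 1m", but the earlier figure is still sitting in the file.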

I don’t know if it’s still the case, but chart objects embedded in a Word doc from an Excel spreadsheet used to carry with them the original data from which the chart was derived, which provides another way of getting hold of actual data points… (For example, a document publisher may think they are giving you an image of a chart, not appreciating that the chart is actually constructed from data values contained within the chart object.)
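For the newer .docx format, one way to check is sketched below. It assumes that chart parts sit under `word/charts/` and hold the plotted numbers in `c:numCache` elements, and that any embedded source workbook sits under `word/embeddings/`. Those locations are what Word conventionally writes rather than a guarantee, and the function names and demo document are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CHART_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"


def embedded_objects(docx_file):
    """List embedded files (often the source workbook) inside a .docx package."""
    with zipfile.ZipFile(docx_file) as package:
        return sorted(
            name for name in package.namelist()
            if name.startswith("word/embeddings/")
        )


def cached_chart_values(docx_file):
    """Map each chart part to the numeric values cached inside it."""
    values = {}
    with zipfile.ZipFile(docx_file) as package:
        for name in package.namelist():
            if not (name.startswith("word/charts/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            numbers = [
                float(v.text)
                for cache in root.iter(f"{{{CHART_NS}}}numCache")
                for v in cache.iter(f"{{{CHART_NS}}}v")
            ]
            if numbers:  # skip parts with no cached numbers
                values[name] = numbers
    return values


def build_demo_docx():
    """Build a stand-in .docx in memory holding one chart with two cached points."""
    chart_xml = (
        f'<c:chartSpace xmlns:c="{CHART_NS}"><c:chart><c:plotArea><c:barChart>'
        "<c:ser><c:val><c:numRef><c:f>Sheet1!$B$2:$B$3</c:f><c:numCache>"
        '<c:formatCode>General</c:formatCode><c:ptCount val="2"/>'
        '<c:pt idx="0"><c:v>3</c:v></c:pt><c:pt idx="1"><c:v>5</c:v></c:pt>'
        "</c:numCache></c:numRef></c:val></c:ser>"
        "</c:barChart></c:plotArea></c:chart></c:chartSpace>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/charts/chart1.xml", chart_xml)
        package.writestr("word/embeddings/Microsoft_Excel_Sheet1.xlsx", b"")
    buffer.seek(0)
    return buffer


print(embedded_objects(build_demo_docx()))
print(cached_chart_values(build_demo_docx()))
```

If either function returns anything, the "picture" of a chart is carrying data values along with it.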

Of course, if data were published using simple text formats, such as CSV, there would be nowhere to hide any embarrassing metadata…


Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

Posted on August 6, 2013. Categories: Infoskills. Tags: data security, metadata.

© AJ Hirst 2008-2020
Creative Commons License
Attribution: Tony Hirst.
