OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Guardian Datastore MPs’ Expenses Spreadsheet as a Database

Continuing my exploration of what is and isn’t acceptable around the edges of doing stuff with other people’s data(?!), the Guardian datastore has just published a Google spreadsheet containing partial details of MPs’ expenses data over the period July-December 2009 (MPs’ expenses: every claim from July to December 2009):

thanks to the work of Guardian developer Daniel Vydra and his team, we’ve managed to scrape the entire lot out of the Commons website for you as a downloadable spreadsheet. You cannot get this anywhere else.

In sharing the data, the Guardian folks have opted to share the spreadsheet via a link that includes an authorisation token. This means that if you try to view the spreadsheet using just the spreadsheet key, you won’t be allowed to see it. (You also need to be logged in to a Google account to view the data, both as a spreadsheet and in order to interrogate it via the visualisation API.) Which is to say, the Guardian datastore folks are taking what steps they can to make the data public, whilst retaining some control over it (because they have invested resource in collecting the data in the form they’re re-presenting it, and reasonably want to make a return from it…)

But in sharing the link that includes the token on a public website, we can see the key – and hence use it to access the data in the spreadsheet, and do more with it… which may be seen as providing a value add service over the data, or as unreasonably freeloading off the back of the Guardian’s data scraping efforts…

So, I just pasted the spreadsheet key and authorisation token into the cut down Guardian datastore explorer script I used in Using CSV Docs As a Database to generate an explorer for the expenses data.
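For anyone wondering what "pasting in the key and token" amounts to, here is a minimal sketch of the idea in Python. It is not the explorer script itself: the endpoint URL, the `authkey` parameter name and the placeholder values `KEY123` and `TOKEN456` are all assumptions made for illustration, and the real URL shape depends on how Google is serving spreadsheets at the time. The `tq` (query) and `tqx=out:csv` (output format) parameters are the visualisation API's own.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint for spreadsheet queries; treat it as a placeholder.
QUERY_ENDPOINT = "https://spreadsheets.google.com/tq"


def build_query_url(key, query, authkey=None, out="csv"):
    """Build a query URL from a spreadsheet key, a query and an optional token."""
    params = {"key": key, "tq": query, "tqx": "out:" + out}
    if authkey:
        # Assumption: the token from the shared link travels as 'authkey'.
        params["authkey"] = authkey
    return QUERY_ENDPOINT + "?" + urlencode(params)


# Hypothetical key and token, standing in for the ones in the shared link.
url = build_query_url("KEY123", "select A, sum(E) group by A", authkey="TOKEN456")

# Round-trip check: decode the URL again to see what was sent.
decoded = parse_qs(urlparse(url).query)
print(decoded["tq"][0])
```

Nothing is fetched here; the sketch only shows that the key and the token are ordinary URL parameters, which is why a token published in a link is effectively public.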

So, for example, we can run a report to group expenses by category and MP:

MP expenses explorer

Or how about claims over 5000 pounds (also viewing the information as an HTML table, for example).

Remember, on the datastore explorer page, you can click on column headings to order the data according to that column.

Here’s another example – select A, sum(E) where E>0, group by A, order by sum(E) asc, viewed as a column chart:

Datastore exploration
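Written out in full, the selections above make up a single query string. The helper below is a hypothetical sketch (the function name and its arguments are mine, not part of the explorer) that assembles the clauses in the order the query language expects them: select, where, group by, order by, limit.

```python
def build_query(select, where=None, group_by=None, order_by=None,
                descending=False, limit=None):
    """Assemble a visualisation API style query string from its clauses."""
    parts = ["select " + ", ".join(select)]
    if where:
        parts.append("where " + where)
    if group_by:
        parts.append("group by " + ", ".join(group_by))
    if order_by:
        direction = "desc" if descending else "asc"
        parts.append("order by " + order_by + " " + direction)
    if limit is not None:
        parts.append("limit " + str(int(limit)))
    return " ".join(parts)


# The column chart example: totals of column E per value of column A.
query = build_query(["A", "sum(E)"], where="E > 0",
                    group_by=["A"], order_by="sum(E)")
print(query)
```

The column letters are the spreadsheet's own; from the reports they produce, A appears to hold the MP's name and E the amount claimed, but that reading is an assumption.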

We can also (now!) limit the number of results returned, e.g. to show the 10 MPs with lowest claims to date (the datastore blog post explains why the data is incomplete and to be treated warily).

Limiting results in datastore explorer

Changing the asc order to desc in the above query gives possibly a more interesting result, the MPs who have the largest claims to date (presumably because they have got round to filing their claims!;-)

Datastore exploring
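To make it clear what the grouped, ordered and limited query is doing, here is a small local emulation in Python. The rows are invented placeholders ("MP A", "MP B" and so on), not figures from the Guardian spreadsheet, and the function is only a sketch of the semantics: filter on amount > 0, sum per name, sort by the total, then keep the first few.

```python
from collections import defaultdict


def total_by_name(rows, limit=None, descending=False):
    """Emulate: select A, sum(E) where E > 0 group by A order by sum(E) limit N."""
    totals = defaultdict(float)
    for name, amount in rows:
        if amount > 0:  # the 'where E > 0' filter
            totals[name] += amount  # the 'group by A' with sum(E)
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=descending)
    return ranked if limit is None else ranked[:limit]


# Invented rows for illustration only: (name, claim amount).
rows = [("MP A", 120.0), ("MP B", 40.0), ("MP A", 30.0),
        ("MP C", 0.0), ("MP B", 15.5)]

print(total_by_name(rows, limit=2))                    # lowest totals first
print(total_by_name(rows, limit=2, descending=True))   # highest totals first
```

Switching `descending` is the local equivalent of changing asc to desc in the query: same grouping, opposite end of the ranking.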

Okay – enough for now; the reason I’m posting this is in part to ask the question: is this an unfair use of the Guardian datastore data, does it detract from the work they put in that lets them claim “You cannot get this anywhere else”, and does it impact on the returns they might expect to gain?

Should they/could they try to assert some sort of database collection right over the collection/curation and re-presentation of data that is otherwise publicly available, one that would (nominally!) prevent me from using this data? Does the publication of the data using the shared link with the authorisation token imply some sort of license under which that data is made available? E.g. because it is a shared link rather than a public link, could the Datastore attach some sort of tacit click-wrap license conditions to the data that I accept when I click through the shared link? (Does the/can the sharing come with conditions attached?)

PS It seems there was a minor “issue” with the settings of the spreadsheet, a result of recent changes to the Google sharing setup. Spreadsheets should now be fully viewable… But as I mention in a comment below, I think there are still interesting questions to be considered around the extent to which publishers of “public” data can get a return on that data?

Written by Tony Hirst

June 25, 2010 at 12:51 pm

7 Responses

  1. Er, not entirely. As you know, Google Docs have changed their sharing options recently – it automatically creates the auth key when you opt to share the document, so it’s certainly not intentional on our part. I’ll change the address now on the post.

    Simon Rogers

    June 25, 2010 at 12:56 pm

    • Ah – hmmm… I thought you must have opted to share the link, rather than make the spreadsheet public?

      But the questions remain valid, I think? Is there a way you can publish a view over a particular subset, collection or arrangement of data from public sources, that required effort on your part, and that allows you legitimately to expect some sort of period of exclusive or licensed exploitation of that view?

      PS your response also helps me learn another lesson – if I was following a journalistic process, I’d have checked my facts and got in touch with you before I posted the above, wouldn’t I?!;-)

      Tony Hirst

      June 25, 2010 at 1:08 pm

  2. OK, interestingly, Google Docs appear to have changed their settings so that you can’t access a Google spreadsheet with the cut-down url anymore, without logging in to Google Docs. I can’t see a way around this – can anyone else?

    In the meantime, you can still opt to just download the spreadsheet into any format – and do whatever you like with it. We’re not trying to retain ownership over anything.

    Simon Rogers

    editor, Guardian Datablog

    Simon Rogers

    June 25, 2010 at 1:04 pm

  3. Hi Tony – have mastered new settings now and the doc is fully public.

    On the second point – eh? As I said, am a bit dense today…

    Simon Rogers

    June 25, 2010 at 1:20 pm

  4. [...] yourself will encourage others to do the same. He cited the Guardian’s datastore as an example (blog on it here). The main point from his talk (slideshare.net/onlinejournalist) was building a user-driven project [...]

  5. [...] Search « Guardian Datastore MPs’ Expenses Spreadsheet as a Database [...]

  6. [...] Services such as Google Spreadsheets provide online spreadsheets that support traditional spreadsheet operations that include chart generation (using standard chart types familiar to spreadsheet users) and support for interactive graphical widgets (including more exotic chart types, such as tree maps), powered by spreadsheet data, that can be embedded in third party webpages. Simple aggregate reshaping of data is provided in the form of support for Pivot Tables. (Note however that Google Spreadsheet functionality is sometimes a little bug ridden…) Google spreadsheets also provides a powerful query API (the Google Visualisation API), that allows the spreadsheet to be treated as a database. For an example in another government domain, see Government Spending Data Explorer; see also Guardian Datastore MPs’ Expenses Spreadsheet as a Database ). [...]


Comments are closed.
