Querying a Google Spreadsheet of MPs’ Expenses Data: So Who Claimed for “biscuits”?
Yesterday, the Guardian published to their Data Store a spreadsheet containing all the MPs’ expenses revelations to date (“MPs’ expenses in the news: all the revelations, as a spreadsheet”)*.
So it struck me that I should be able to find a way of easily searching that data to find just those MPs who had, for example, been claiming for biscuits…
[If you don't want to read how it's done, cut straight to the MPs' Expenses Search to find who's been claiming for biscuits]
* I actually found the link to the story just now from a known item search on Google, limited to the Guardian domain and restricted to the last 24 hours using one of the new Google search options (I remembered seeing the story on the Guardian website somewhere yesterday). Note to self: this is a really handy trick for searching over recent content on a particular site :-)
(To tidy those search results even more, and remove the RSS feed results, just add -inurl:feedarticle to the search terms… i.e. exclude results that have feedarticle in the URL.)
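For the record, the same site-limited, recent-only search can be written out as a URL. This is a small sketch, assuming Google's tbs=qdr:d URL parameter for the "past 24 hours" restriction; the function name recentSiteSearchUrl and the search terms are illustrative.

```javascript
// Compose a Google web search URL for recent content on one site.
// Assumption: the tbs=qdr:d parameter limits results to the past 24 hours.
function recentSiteSearchUrl(terms, site, excludedUrlWord) {
  // site: limits results to one domain; -inurl: drops any result whose URL
  // contains the given word (e.g. the feedarticle pages mentioned above).
  const q = terms + ' site:' + site + ' -inurl:' + excludedUrlWord;
  // URLSearchParams takes care of encoding the spaces and colons.
  const params = new URLSearchParams({ q: q, tbs: 'qdr:d' });
  return 'https://www.google.com/search?' + params.toString();
}

console.log(recentSiteSearchUrl('MPs expenses spreadsheet', 'guardian.co.uk', 'feedarticle'));
```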
Anyway, the question was how to search the data in the spreadsheet. I had a half memory (from HTML Tables and the Data Web) of Google releasing a query language that would let you query data in a “data table” object embedded in a web page – the Google Query Language – which, it turns out, can be used to interrogate anything defined as a Google visualisation API data source…
…and it just so happens that Google spreadsheets are so defined: Using a Google Spreadsheet as a Data Source.
So this means that I should be able to use the Google visualisation API query language to run a query on a Google Spreadsheet; like the MPs’ expenses data spreadsheet; like asking it for who’s claimed for biscuits…
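Concretely, a query against a spreadsheet data source travels as the tq parameter of the spreadsheet's data source URL. A minimal sketch, assuming the URL format Google Sheets uses today (…/spreadsheets/d/KEY/gviz/tq; older spreadsheets used a different host and path); SPREADSHEET_KEY is a placeholder, not the key of the expenses spreadsheet.

```javascript
// Build a visualisation API data source URL for a spreadsheet, with the
// query language string URL-encoded into the tq parameter.
// Assumption: the current /spreadsheets/d/<key>/gviz/tq endpoint.
function spreadsheetQueryUrl(key, queryString) {
  const base = 'https://docs.google.com/spreadsheets/d/' + encodeURIComponent(key) + '/gviz/tq';
  return base + '?tq=' + encodeURIComponent(queryString);
}

console.log(spreadsheetQueryUrl('SPREADSHEET_KEY', 'select * where M contains "biscuits"'));
// → https://docs.google.com/spreadsheets/d/SPREADSHEET_KEY/gviz/tq?tq=select%20*%20where%20M%20contains%20%22biscuits%22
```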
So here’s what I want to do:
1) create a data table that pulls data in from a Google spreadsheet;
2) actually, that’s not strictly true – I want to run a query on the spreadsheet that pulls in some of the data from the spreadsheet (in particular, just the rows that satisfy the query);
3) I want to display the results in a table using the visualisation API libraries (so I don’t have to write any code to display the data myself; more than that, I don’t even need to understand how the data has been returned from the spreadsheet).
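The three steps above map onto a handful of visualisation API objects: a google.visualization.Query pointed at the spreadsheet, setQuery() to restrict the rows, and a google.visualization.Table to draw whatever comes back. This is a sketch rather than the code used for the search page: it assumes the visualisation library, with its table package, is already loaded, and it takes the google object as a parameter (in a page it is the global from Google's loader) so the flow can be tried without a browser. The function name queryAndDrawTable is hypothetical.

```javascript
// Steps 1 to 3: query a spreadsheet data source and draw the matching rows as a table.
// `google` is the visualisation API global; `container` is the element to draw into.
function queryAndDrawTable(google, dataSourceUrl, queryString, container) {
  // Steps 1 and 2: point a Query at the spreadsheet and ask only for the rows we want.
  const query = new google.visualization.Query(dataSourceUrl);
  query.setQuery(queryString);
  query.send(function (response) {
    if (response.isError()) {
      // Show the data source's own explanation of what went wrong.
      container.textContent = 'Query error: ' + response.getMessage() + ' ' + response.getDetailedMessage();
      return;
    }
    // Step 3: hand the returned DataTable straight to the Table visualisation,
    // without ever looking inside it ourselves.
    const table = new google.visualization.Table(container);
    table.draw(response.getDataTable(), { showRowNumber: true });
  });
}
```

In a page, container would be something like a div fetched with document.getElementById, and the call would sit inside the loader's on-load callback.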
Okay – so the ambitious next step is to try to write a test query on the spreadsheet by making sense of the Google documentation, which is never as helpful as it might be.
No joy, so in the end I copied and pasted some example code from Google’s interactive AJAX APIs Playground, taking the working example closest to what I wanted: one that just gets data from a spreadsheet into a web page using the Google visualisation API libraries.
Okay – so what this example does is run a query on a spreadsheet and plot the data as a map. Just seeing the code isn’t much help though – what libraries do I need to load to run it? So I exported the whole example into a standalone worked example, did a View Source, and copied the code wholesale.
Good, I now have a canned example that pulls in data from a spreadsheet. Next step – I want to display a data table, not a map.
Again, the API Playground comes in handy – check out the table example and see what bits of the code need changing:
Change the demo code so it displays the data from the example spreadsheet as a table rather than a map, and check it works. It does. Good… So now change the spreadsheet key and see if it works to display the expenses data. It does. Good again.
Okay, now I can start to write some test queries. The AJAX API playground provides a crib again, this time in the form of the Using the Query Language example:
(Hmmm… maybe I should have just worked from this example from the start? Ah well, never mind, note to self: teach the changes required from just this example next time…)
Now it’s fun time… writing the query. The query language documentation suggests only equivalence-style relations are possible, but I want to use a condition along the lines of “select * where M LIKE '%biscuits%'” – that is, give me [select] all the columns in a row [*] where [where] column M [M] contains [LIKE] the word ‘biscuits’ ['%biscuits%'].
Typing a suitably encoded test query URL into the browser location bar (there’s a tool to encode the query string on the query language documentation page) didn’t work :-( BUT it turned up an informative error message that described some phrases the query language does support, or at least that the spreadsheet expects.
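When a query URL is fetched directly, the reply is wrapped as a call to google.visualization.Query.setResponse(…), with a status field and either an errors list or a table of cols and rows. A sketch of unwrapping that reply, assuming that response shape (current Sheets responses may also carry a short comment prefix); parseGvizResponse is a hypothetical name.

```javascript
// Unwrap a raw visualisation API reply and either return the row values
// or throw with the data source's error text.
// Assumption: body looks like google.visualization.Query.setResponse({...});
function parseGvizResponse(text) {
  const marker = 'setResponse(';
  const start = text.indexOf(marker);
  const end = text.lastIndexOf(')');
  if (start < 0 || end < start) throw new Error('Not a visualization API response');
  const payload = JSON.parse(text.slice(start + marker.length, end));
  if (payload.status === 'error') {
    // The error text is where the data source explains what it expected.
    const details = (payload.errors || []).map(e => e.detailed_message || e.message);
    throw new Error('Query failed: ' + details.join('; '));
  }
  // Each row is {c: [cell, ...]}; each cell is {v: value} or null for an empty cell.
  return payload.table.rows.map(row => row.c.map(cell => (cell ? cell.v : null)));
}
```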
So let’s try contains rather than LIKE… which works…
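The current query language documentation lists starts with and ends with alongside contains, which covers the common shapes of a SQL LIKE pattern. A sketch of that translation, assuming only a leading and/or trailing % wildcard; anything else is rejected rather than guessed at, and likeToCondition is a hypothetical helper.

```javascript
// Translate a simple LIKE pattern into a contains / starts with / ends with condition.
// Only leading and trailing % wildcards are handled; double quotes are refused
// because they would end the string literal early.
function likeToCondition(column, pattern) {
  const leading = pattern.startsWith('%');
  const trailing = pattern.endsWith('%') && pattern.length > 1;
  const term = pattern.slice(leading ? 1 : 0, trailing ? -1 : undefined);
  if (term.includes('%') || term.includes('"')) {
    throw new Error('Unsupported pattern: ' + pattern);
  }
  let op;
  if (leading && trailing) op = 'contains';      // %biscuits%
  else if (trailing) op = 'starts with';         // biscuits%
  else if (leading) op = 'ends with';            // %biscuits
  else op = '=';                                 // biscuits (exact match)
  return column + ' ' + op + ' "' + term + '"';
}

console.log('select * where ' + likeToCondition('M', '%biscuits%'));
// → select * where M contains "biscuits"
```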
Okay, so now the long and the short of it is, I know how to write queries.
So for example, here’s a query that searches the name column (so you can search for your MP by name, e.g. Huhne):
var query = 'select * where A contains "' + q + '"';
Here’s one that searches the constituency column (so you can search for your MP by constituency, e.g. Edinburgh):
var query = 'select * where B contains "' + q + '"';
And here’s searching several columns for a particular item:
var query = 'select * where (M contains "' + q + '" OR O contains "' + q + '" OR Q contains "' + q + '" OR S contains "' + q + '" OR U contains "' + q + '" OR V contains "' + q + '")';
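Rather than writing out each OR clause by hand, the queries above can be generated from a list of column letters. A minimal sketch; buildSearchQuery is a hypothetical name, and since a double quote typed into the search box would end the string literal early, this version simply strips double quotes from the term instead of relying on any escape syntax.

```javascript
// Generate "select * where (X contains "q" OR Y contains "q" ...)" for any set of columns.
function buildSearchQuery(columns, q) {
  // Drop double quotes so user input cannot break out of the string literal.
  const term = String(q).replace(/"/g, '');
  const clauses = columns.map(col => col + ' contains "' + term + '"');
  return 'select * where (' + clauses.join(' OR ') + ')';
}

console.log(buildSearchQuery(['A'], 'Huhne'));
// → select * where (A contains "Huhne")
console.log(buildSearchQuery(['M', 'O', 'Q', 'S', 'U', 'V'], 'biscuits'));
```

The single-column searches are just the one-element case, so the name and constituency queries come from the same function.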
Add it all together, and what have you got? A way of searching to see who’s been claiming for biscuits:
Note that searches are case sensitive… (does anyone know if there’s a way round this?)
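One possible workaround, assuming the lower() scalar function listed in the current query language documentation: lower-case both the column values and the search term, so "Biscuits" and "biscuits" match alike. A sketch only; caseInsensitiveQuery is a hypothetical name, and it strips double quotes from the term for the same reason as before.

```javascript
// Case-insensitive version of the multi-column search, using lower() on each column
// and lower-casing the search term in JavaScript before it goes into the query.
function caseInsensitiveQuery(columns, q) {
  const term = String(q).replace(/"/g, '').toLowerCase();
  const clauses = columns.map(col => 'lower(' + col + ') contains "' + term + '"');
  return 'select * where (' + clauses.join(' OR ') + ')';
}

console.log(caseInsensitiveQuery(['M', 'O'], 'Biscuits'));
// → select * where (lower(M) contains "biscuits" OR lower(O) contains "biscuits")
```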
So there you have it: an MPs’ expenses search engine via Google Spreadsheets :-)