Yesterday, the Guardian published a spreadsheet to their Data Store containing all the MPs’ expenses revelations to date in a spreadsheet form (“MPs’ expenses in the news: all the revelations, as a spreadsheet“)*.
So it struck me that I should be able to find a way of easily searching that data to find just those MPs who had, for example, been claiming for biscuits…
[If you don’t want to read how it’s done, cust straight to the MPs’ Expenses Search to find who’s been claiming for biscuits]
* I actually found the link to the story just now from a known item search on Google, site limited to the Guardian domain and restricted in time to the last 24 hours using on of the new Google search options (I remembered seeing the story on the Guardian website somewhere yesterday). Note to self: this is a really handy trick for searching over recent content on a particular site:-)
(To tidy those search results even more, and remove the RSS feed results, just add -inurl:feedarticle to the search terms… i.e. exclude results that have feedarticle in the URL.)
Anyway, the question was, how to search the data in the spreadsheet. Now I had a half memory from HTML Tables and the Data Web of Google releasing a query language that would allow you to query data in a “data table” object embedded in a web page – the Google QUery Language – which it turns out can be used to interrogate anything defined as a Google visualisation API data source…
…and it just so happens that Google spreadsheets are so defined: Using a Google Spreadsheet as a Data Source.
So this means that I should be able to use the Google visualisation API query language to run a query on a Google Spreadsheet; like the MPs’ expenses data spreadsheet; like asking it for who’s claimed for biscuits…
So here’s what I want to do:
1) create a data table that pulls data in from a Google spreadsheet;
2) actually, that’s not strictly true – I want to run a query on the spreadsheet that pulls in some of the data from the spreadsheet (in particular, just the rows that satisfy the query);
3) I want to display the results in a table using the visualisastion API libraries (so then I don’t have to write any code to display the data myself; and more than that, I don’t even need to understand how the data has been returned from the spreadsheet).
Okay – so the ambitious next step is to try to write a test query on the spreadsheet by trying to make sense of the Google documentation, which is never as helpful as it might be.
No joy, so in the end, I copied and pasted some example code from the closest working example to what I wanted from Google’s interactive AJAX APIs Playground – an example of just getting data into a web page from a spreadsheet using the Google visualisation API libraries:
Okay – so what this example does is run a query on a spreadsheet and plot the data as a map. Just seeing the code isn’t much help though – what libraries do I need to load to run it? So I exported the whole example into a standalone worked example, did a View Source, and copied the code wholesale.
Good, I now have a canned example that pulls in data from a spreadsheet. Next step – I want to display a data table, not a map.
Again, the API Playground comes in handy – check out the table example and see what bits of the code need changing:
Change the demo code so it displays the data from the example spreadsheet as a table rather than a map, and check it works. It does. Good… So now change the spreadsheet key and see if it works to display the expenses data. It does. Good again.
Okay, now I can start to write some test queries. The AJAX API playground provides a crib again, this time in the form of the Using the Query Language example:
(Hmmm… maybe I should have just worked from this example from the start? Ah well, never mind, note to self: teach the changes required from just this example next time…)
Now it’s fun time… writing the query, the query language documentation suggests only equivalence style relations are possible, but I want to use a conditions along the lines of “select * where M LIKE ‘%biscuits%’ – that is, give me [select] all the columns in a row [*] where [where] column M [M] contains [LIKE] the word ‘biscuits’ [‘%biscuits%’].
Typing a suitably encoded a test query URL (there’s a tool to encode the query string on the query language documentation page) into the browser location bar didn’t work :-( BUT, it turned up an informative error message that described some phrases the query language does support, or at least, that are expected by the spreadsheet:
So let’s try contains rather than LIKE… which works…
Okay, so now the long and the short of it is, I know how to write queries.
So for example, here’s searching the name column (so you can search for your MP by name):
var query=’select * where A contains “‘+q+'”‘ (e.g. search for Huhne)
Here’s searching the constitutency column (so you can search or your MP by constituency):
var query=’select * where B contains “‘+q+'”‘ (e.g. Edinburgh)
And here’s searching several columns for a particular item:
var query=’select * where (M contains “‘+q+'” OR O contains “‘+q+'” OR Q contains “‘+q+'” OR S contains “‘+q+'” OR U contains “‘+q+'” OR V contains “‘+q+'”)’
Add it all together, and what have you got? A way of searching to see who’s been claiming for biscuits:
Note that searches are case sensitive…(anyone know if there’s a way round this?)
So there you have it: an MP’s expenses search engine via Google Spreadsheets :-)
8 thoughts on “Querying a Google Spreadsheet of MPs’ Expenses Data: So Who Claimed for “biscuits”?”
Also: when viewing the Google spreadsheet click View > List, which gives you dropdown menus and allows you to select a value to filter the table by. This can be useful as it prevents you fruitlessly searching with queries that will return no results.
Spreadsheet view: http://spreadsheets.google.com/ccc?key=rvWgEEGK9xuUQBR1EFcxHWA
List view: http://spreadsheets.google.com/lv?key=rvWgEEGK9xuUQBR1EFcxHWA&f=0
This is a great post .. I think for the case sensitive ..you can convert the whole JSON into a lower case by using document.write(txt.toLowerCase()); on way of returning ..atleast your query will run smoothly that way
I was admiring your “MP Expense Search” and thinking it could be modified somehow for my needs. I just need to give visitors to my Google Site the ability to search 1 or 2 columns of my Google Spreadsheet, and get results as a table — much like your biscuits search of MPs’ expenses. I’m no programmer, but can figure out some things (just not this). Any code I could modify perhaps and drop into a script and then insert as a Google Apps Script Gadget???
@michael I don’t have the energy tonight to put a script together right now, but one approach might be to grab an HTML returning query URL based on your spreadsheet from http://ouseful.open.ac.uk/datastore/gspreadsheetdb4.php
(A slightly tidied up version of that code can be found by viewing source on this single page web app: http://ouseful.open.ac.uk/datastore/iip11.html?run=true&gsKey=tPfI0kerLllVLcQw7-P1FcQ&gqc=*&gqw=&gqo=&gql=20&gqg= )
I’m in over my head. Those look much more complex, and I’m sorry I can’t make sense of it. I just need a search box that returns a table of results from my spreadsheet, or that hides spreadsheet rows that don’t contain the search words. I don’t expect you to work on this. Just thought there might be an easier solution out there. Thanks.
@Michael Don’t let the apparent complexity of it scare you! When you say “I just need a search box that returns a table of results from my spreadsheet”, what do you actually mean? That you want a search box that returns rows that contains the search terms in one particular column, or rows that contain the search term that appears in any column? How do you want the results ordered? Do you want to return every column in the spreadsheet for the rows you get a hit on, or just particular columns? “Just” is a really really loaded word, packed with assumptions and expectations that only you know about…;-)
You’re so right. My basic need is a search box that queries a specific column (in my case, the description column) and returns only rows (displaying ALL columns) for which the specified column contains the search term. Results would be ordered A-Z by the contents of another column (timestamp).
Ideally, I could replicate this for 2 or 3 of the columns, so users could have the option of searching within different data sets. Example:
Search Column C (Description) for the term:______
Search Column D (Agency) for the term:_____
That was my “just.” If this doesn’t pan out easily, I might fall back on a Visual Basic solution within an Excel spreadsheet that’s shared on the network of users. I appreciate your efforts to help. I’ve already learned quite a bit, even if I don’t end up implementing this solution.
Comments are closed.