So it seems that the Numeracy in the Newsroom blog liked the ‘roll your Google Squared’ toy that I rediscovered in Is Google Squared Just a Neatly Packaged and Generalised =googlelookup Array?.
But how about if you wanted to roll your own square by pulling in data from data sources you had defined yourself, such as spreadsheets in the Guardian datastore?
Well, it occurred to me that it should be easy enough to generate queries onto those spreadsheets (e.g. using the link generators described in First Steps Towards a Generic Google Spreadsheets Query Tool, or At Least, A Guardian Datastore Interactive Playground et al.) and roll them into a square.
Okay – so I know I’ve lost you but I only have 10 mins to crank this post out, so I’ll walk you through a demo (ish).
Create a new Google spreadsheet and pull in some data from the Conservative Shadow Cabinet expenses spreadsheet, such as their total expenses declared in that spreadsheet to date:
Pop that in cell A1 and it gives something like:
So now suppose I want to pull in data from the Guardian spreadsheets. I can use the name as a key and construct a URI that will query the Guardian spreadsheets by MP name. An easy way to gt an example query is to use the Guardian Datastore Explorer:
So having got a pattern for a URI:
we can look at it and see how it might be constructed in the spreadsheet:
That is, we construct the URI around the name contained in the cell a couple of columns to the left, (data that was itself pulled in from a Tory party spreadsheet).
(Note that I’m generating an HTML preview, but you should be able to grab CSV too: just set tqx=out:csv).
Drag the cell down to generate URIs for all the Shadow Cabinet:
We can now import data into the spreadsheet using that constructed URI:
Remember, I generated an HTML preview table URI. (If you generated a CSV URI, use =importdata(C2). Note I was using CSV originally, but then it seemed to stop working, so I switched to HTML).
Here’s what you get when you copy that (relative) formula for each MP:
Okay, so there’s an issue here – we’re only getting very other MP because the table headers are being pulled in (the same happens with CSV – I couldn’t find a way to get the importdata formula to suppress the header?). A clunky workaround you be to create another set of columns that pull in the other alternate rows (i.e. start pulling in data from row 3 rather than row 2).
You’ve also got the header crap everywhere, but I’m guessing we could probably write a query language call that would pull out the data from this spreadsheet and ignore the rows containing the column headings.
Anyway, so what?
So – I can create a data square in a Google Spreadsheet that pulls in and essentially combines data from multiple spreadsheet data source that I know and trust, as long as there is a common key query term (name in this case).