Using Many Eyes Wikified to Visualise Guardian Data Store Data on Google Docs
Last week, I posted a quick demo of how to visualise data stored in a Google spreadsheet in Many Eyes Wikified (HEFCE Grant Funding, in Pictures).
The data I used was the latest batch of HEFCE teaching funding data, but Joss soon tweeted to say he’d got Research funding data up on Google spreadsheets, and could I do something with that? You can see the results here: Visualising UK HEI Research Funding data on Many Eyes Wikified (Joss has also had a go: RAE: UK research funding results visualised).
Anyway, today the Guardian announced a new content API (more on that later – authorised developer keys are still like gold dust), as well as the Guardian data store (strapline: “facts you can use”) and the associated Data Store Blog.
Interestingly, the data is being stored on Google docs, in part because Google spreadsheets offer an API and a wide variety of export formats.
As regular OUseful.info readers will know, one of the export formats from Google spreadsheets is CSV – Comma Separated Values data – which just so happens to be liked by services such as Dabble DB and Many Eyes. I’ll try to come up with a demo of how to mash up several different data sets in Dabble DB over the next few days, but as I’ve a spare half-hour now, I thought I’d post a quick demo of how to visualise some of the Guardian data store spreadsheet data in Many Eyes Wikified.
So to start, let’s look at the RAE2008 results data – University research department rankings (you can find the actual data here: http://spreadsheets.google.com/pub?key=phNtm3LmDZEM-RqeOVUPDJQ).
If you speak URL, you’ll know that you can get the CSV version of the data by adding &output=csv to the URL, like this: http://spreadsheets.google.com/pub?key=phNtm3LmDZEM-RqeOVUPDJQ&output=csv
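For anyone who would rather script that step than edit the URL by hand, here is a minimal sketch in Python. The function name csv_export_url is my own invention for illustration; the only things taken from the walkthrough are the published spreadsheet URL and the output=csv argument. It only builds the URL and does not fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def csv_export_url(pub_url):
    """Return pub_url with output=csv added to its query string.

    csv_export_url is a hypothetical helper name, not part of any Google API.
    """
    parts = urlsplit(pub_url)
    # Keep whatever arguments are already there (e.g. key=...).
    params = dict(parse_qsl(parts.query))
    params["output"] = "csv"
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


if __name__ == "__main__":
    url = "http://spreadsheets.google.com/pub?key=phNtm3LmDZEM-RqeOVUPDJQ"
    print(csv_export_url(url))
    # http://spreadsheets.google.com/pub?key=phNtm3LmDZEM-RqeOVUPDJQ&output=csv
```

Parsing the query string rather than just appending "&output=csv" means the helper still does the right thing if the URL already carries an output argument.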
Inspection of the CSV output suggests there’s some crap we don’t want at the top – i.e. rows that aren’t actual column headings – as well as at the end of the file:
(Note this “crap” is actually important metadata – it describes the data and its provenance – but it’s not the actual data we want to visualise).
Grabbing the actual data, without the metadata, can be achieved by requesting a particular range of cells using the &range= URL argument. Inspection of the table suggests that meaningful data can be found in the columnar range of A to H; guesswork and a bit of binary search identifies the actual range of cell data as A2:H2365 – so we can export JUST the data, as CSV, using the URL http://spreadsheets.google.com/pub?key=phNtm3LmDZEM-RqeOVUPDJQ&output=csv&range=A2:H2365.
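The "bit of binary search" can be sketched in Python too. This is only an illustration under some loud assumptions: the helper names range_csv_url and last_data_row are mine, and has_data stands in for whatever check you use to decide that a row holds real data (by eye, or by fetching a one-row range). It also assumes the data rows are contiguous, so the check is true up to the last data row and false after it.

```python
from urllib.parse import urlencode


def range_csv_url(key, cell_range):
    """Build the CSV export URL for one cell range of a published sheet.

    Hypothetical helper; it uses only the key, output and range arguments
    described in the post.
    """
    query = urlencode(
        {"key": key, "output": "csv", "range": cell_range},
        safe=":",  # leave the colon in A2:H2365 unescaped
    )
    return "http://spreadsheets.google.com/pub?" + query


def last_data_row(has_data, lo, hi):
    """Binary search for the largest row in [lo, hi] where has_data(row) is true.

    Assumes has_data(lo) is true and that has_data flips from true to false
    exactly once as the row number increases.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if has_data(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    # Stand-in probe: pretend the data ends at row 2365, as found in the post.
    last = last_data_row(lambda row: row <= 2365, 2, 5000)
    print(range_csv_url("phNtm3LmDZEM-RqeOVUPDJQ", "A2:H%d" % last))
```

Starting from a generous guess at the upper bound (5000 rows here), the search needs about a dozen probes rather than thousands.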
If you create a new page on Many Eyes Wikified, this data can be imported into a wiki page there as follows:
We can now use this data page as the basis of a set of Many Eyes visualisations. Noting that the “relative URL address” of the data page is ousefulTestboard/GuardianUKRAERankings2008 (the full URL of the wikified data page is http://manyeyes.alphaworks.ibm.com/wikified/ousefulTestboard/GuardianUKRAERankings2008), create a new page and put a visualisation placeholder or two in it:
Saving that page – and clicking through on the visualisation placeholder links – means you can now create your visualisation (Many Eyes seems to try to guess what visualisation you want if you use an appropriate visualisation name?):
Select the settings you want for your visualisation, and hit save:
A visualisation page will be created automatically, and a smaller, embedded version of the visualisation will appear in the wiki page:
If you visit the visualisation page – for example, this Treemap visualisation – you should find it is fully interactive, which means you can explore the data for yourself, as I’ll show in a later post…