Visualising MPs’ Expenses Using Scatter Plots, Charts and Maps
A couple of days ago, the Guardian’s @datastore announced that a spreadsheet of UK MPs’ (Members of Parliament) expenses had been posted to the Guardian OpenPlatform datastore on Google Spreadsheets.
Just because, I though it would be nice to visualise the spreadsheet using some Many Eyes Wikified charts, so I had a look at the data, and sighed a little: in many of the spreadsheet cells was a pound sign, and Many Eyes doesn’t like those – it just wants numbers… So I went in to Yahoo pipes to create a pipe to tidy up the CSV output of the spreadsheet so I could pipe it into Many Eyes Wikified… and drew a blank: I couldn’t get the pipe to work (no CSV – just HTML (it turns out I was using the wrong URL pattern from the spreadsheet – doh!)). So I exported the CSV, reg-exped it in a text editor, adn uploaded it to create a new spreadsheet. (Which reminds me: note to self – create a tidy-upper pipe fed from the datastore and refactor the wikified data page to feed from the pipe…)
[Many Eyes Wikified is no longer available as a service – to replicate the following visulisations, you need to upload the data to Many Eyes (the none wikified version…). I think this is the spreadsheet I was pulling in to the Wikified service…]
So anyway, here are some interactive ways of visualising MPs’ expenses data using Many Eyes wikified…
Firstly, a bar char – select which expenses category you’d like to chart and then view the ranked distribution by sorting by values. If you mouse over any of the bars, you’ll see which MP made that claim:
Second up, a block histogram view. This chart is good for looking at the natural distribution of different claim categories. The search box makes it easy to search for your MP by name:
Again, mousing over any of the blocks identifies the name of the MP making that claim.
Thirdly, a scatter plot. This display lets you compare an MP’s claims across two categories, and potentially bring in a thrid category using the dot size:
As with the other visulisations, mouse over any point to see which MP it belongs to.
By the by, along the way I did a couple of other Yahoo pipes – one to extract expenses by MP name, (which simply pulls in CSV from the spreadsheet, then filters on an MP’s surname), the other MPs’ expenses by postcode. The latter pipe actually embeds the foemer, and works by looking up the name of the MP by postcode, using the TheyWorkForYou API; this name is then passed in to an embedded ‘expenses by name’ pipe.
Anyway, back to the viz biz: Charles Arthur generously picked up on my tweets announcing the visualisations with a blog post on the Guardian data blog (Visualising MP expenses) in a post that included the tease:
But what we need now is a dataset which shows constituency distances
from Westminster, so that we can compare that against travel. And perhaps someone else can work out the travelling MPs’ carbon footprints based on whether they went by air or rail or car
No fair… Okay – so where to get the location data for each MP. Well, the TheyWorkForYou API came to my rescue again. One call (to getConstituencies) pulled out details of each constituency, which included the lat/long geo-coordinated of the ‘centre’ of each constituency (along with the co-ordinates of the bounding box round each constituency… maybe I’ll use those another time ;-) A second call (to getMPs) pulled out all the MPs, and their constituencies. Loading both sets of data into different sheets on Dabble DB, meant I could then link them together by constituency name (for more on linking data in Dabble DB, see Mash/Combining Data from Three Separate Sources Using Dabble DB and Using Dabble DB in an Online Mashup Context).
Adding the MP data into Dabble DB after a further bit of cleaning – removing things like Mr, Miss, and Sir from the firstnames etc – and linking by MP name meant that I could now generate a single data view that exposed MPs by name, constituency, and expense claims, along with the geolocation of the midpoint of their constituency.
After grabbing the CSV feed out of this Dabble DB view into a pipe, and tidying up the data a little once again (eg removing commas in the formatted numbers), it was an easy matter to pull the JSON output from the pipe into a map, and plot different coloured markers depending what ‘band’ the MPs’ total expenses fell into. Here’s a snapshot of that first map:
All well and good – what’s nice about this view is that it’s quite easy to see which MPs appear to be claiming disproportionately more than other MPs with constituencies in a similar area. (There may be good reason for this, like, err… whatever. This tool is just a starting point for sensemaking round the data, right?!;-). If you click on one of the markers you can pop up a little info window, too (rather sparse in this first demo):
In that first map, the only expenses data I was exposing, and mapping, was the total travel expenses claimed. So over a coffee this afternoon, I created a richer view, and tweaked the map code to let me inspect a couple of other data sets. You can find the map here: MPs’ travel expenses map.
So for example, we can look at mileage claims:
Or the total expenses claimed for living away from the primary home:
One thing these quick to put together maps show is how powerful map based displays can be used to get a feel for local differences where there is a difference. (There may well be a good reason for this, of course; including errors in the data set being used…)
It’s also interesting to use the map based displays in conjunction with other chart based visualisations, such as the MPs’ expenses visualisations on Many Eyes Wikified, to explore the data in a far more natural way than trying to make sense of a spreadsheet containing the MPs’ expenses data.
PS the code is all as is; if it’s broken and the visualisations are consequently wrong/misleading, then I apologise in advance… ;-)
PPS See also: My Guardian OpenPlatform API’n’Data Hacks’n’Mashups Roundup, which describes 6 different recipes for playing with Guardian openplatform resources. And if you’re into F1, see Visualising Lap Time Data – Australian Grand Prix, 2009 ! ;-)
PPPS see also MPs’ Expenses by Constituency, Sort Of…, where I plot a couple of really colourful proportional symbol maps based on total travel expenses…