A couple of days ago, the Guardian’s @datastore announced that a spreadsheet of UK MPs’ (Members of Parliament) expenses had been posted to the Guardian OpenPlatform datastore on Google Spreadsheets.
Just because, I though it would be nice to visualise the spreadsheet using some Many Eyes Wikified charts, so I had a look at the data, and sighed a little: in many of the spreadsheet cells was a pound sign, and Many Eyes doesn’t like those – it just wants numbers… So I went in to Yahoo pipes to create a pipe to tidy up the CSV output of the spreadsheet so I could pipe it into Many Eyes Wikified… and drew a blank: I couldn’t get the pipe to work (no CSV – just HTML (it turns out I was using the wrong URL pattern from the spreadsheet – doh!)). So I exported the CSV, reg-exped it in a text editor, adn uploaded it to create a new spreadsheet. (Which reminds me: note to self – create a tidy-upper pipe fed from the datastore and refactor the wikified data page to feed from the pipe…)
[Many Eyes Wikified is no longer available as a service – to replicate the following visulisations, you need to upload the data to Many Eyes (the none wikified version…). I think this is the spreadsheet I was pulling in to the Wikified service…]
So anyway, here are some interactive ways of visualising MPs’ expenses data using Many Eyes wikified…
Firstly, a bar char – select which expenses category you’d like to chart and then view the ranked distribution by sorting by values. If you mouse over any of the bars, you’ll see which MP made that claim:
Second up, a block histogram view. This chart is good for looking at the natural distribution of different claim categories. The search box makes it easy to search for your MP by name:
Again, mousing over any of the blocks identifies the name of the MP making that claim.
Thirdly, a scatter plot. This display lets you compare an MP’s claims across two categories, and potentially bring in a thrid category using the dot size:
As with the other visulisations, mouse over any point to see which MP it belongs to.
By the by, along the way I did a couple of other Yahoo pipes – one to extract expenses by MP name, (which simply pulls in CSV from the spreadsheet, then filters on an MP’s surname), the other MPs’ expenses by postcode. The latter pipe actually embeds the foemer, and works by looking up the name of the MP by postcode, using the TheyWorkForYou API; this name is then passed in to an embedded ‘expenses by name’ pipe.
Anyway, back to the viz biz: Charles Arthur generously picked up on my tweets announcing the visualisations with a blog post on the Guardian data blog (Visualising MP expenses) in a post that included the tease:
But what we need now is a dataset which shows constituency distances
from Westminster, so that we can compare that against travel. And perhaps someone else can work out the travelling MPs’ carbon footprints based on whether they went by air or rail or car
No fair… Okay – so where to get the location data for each MP. Well, the TheyWorkForYou API came to my rescue again. One call (to getConstituencies) pulled out details of each constituency, which included the lat/long geo-coordinated of the ‘centre’ of each constituency (along with the co-ordinates of the bounding box round each constituency… maybe I’ll use those another time ;-) A second call (to getMPs) pulled out all the MPs, and their constituencies. Loading both sets of data into different sheets on Dabble DB, meant I could then link them together by constituency name (for more on linking data in Dabble DB, see Mash/Combining Data from Three Separate Sources Using Dabble DB and Using Dabble DB in an Online Mashup Context).
Adding the MP data into Dabble DB after a further bit of cleaning – removing things like Mr, Miss, and Sir from the firstnames etc – and linking by MP name meant that I could now generate a single data view that exposed MPs by name, constituency, and expense claims, along with the geolocation of the midpoint of their constituency.
After grabbing the CSV feed out of this Dabble DB view into a pipe, and tidying up the data a little once again (eg removing commas in the formatted numbers), it was an easy matter to pull the JSON output from the pipe into a map, and plot different coloured markers depending what ‘band’ the MPs’ total expenses fell into. Here’s a snapshot of that first map:
All well and good – what’s nice about this view is that it’s quite easy to see which MPs appear to be claiming disproportionately more than other MPs with constituencies in a similar area. (There may be good reason for this, like, err… whatever. This tool is just a starting point for sensemaking round the data, right?!;-). If you click on one of the markers you can pop up a little info window, too (rather sparse in this first demo):
In that first map, the only expenses data I was exposing, and mapping, was the total travel expenses claimed. So over a coffee this afternoon, I created a richer view, and tweaked the map code to let me inspect a couple of other data sets. You can find the map here: MPs’ travel expenses map.
So for example, we can look at mileage claims:
Or the total expenses claimed for living away from the primary home:
One thing these quick to put together maps show is how powerful map based displays can be used to get a feel for local differences where there is a difference. (There may well be a good reason for this, of course; including errors in the data set being used…)
It’s also interesting to use the map based displays in conjunction with other chart based visualisations, such as the MPs’ expenses visualisations on Many Eyes Wikified, to explore the data in a far more natural way than trying to make sense of a spreadsheet containing the MPs’ expenses data.
PS the code is all as is; if it’s broken and the visualisations are consequently wrong/misleading, then I apologise in advance… ;-)
PPS See also: My Guardian OpenPlatform API’n’Data Hacks’n’Mashups Roundup, which describes 6 different recipes for playing with Guardian openplatform resources. And if you’re into F1, see Visualising Lap Time Data – Australian Grand Prix, 2009 ! ;-)
PPPS see also MPs’ Expenses by Constituency, Sort Of…, where I plot a couple of really colourful proportional symbol maps based on total travel expenses…
30 thoughts on “Visualising MPs’ Expenses Using Scatter Plots, Charts and Maps”
This is fantastic!
Very nice! Are you charging them yet?
“Are you charging them yet”
If I was, what would be the going rate, what would i be charging for, and what would likely conditions arising (on both sides) be???? ;-)
I was half-way doing something similar, but you’ve beaten me to it.
One trick I was going to use, which might be helpful: the TheyWorkForYou API call to ‘getConstituencies’ also lets you ask for all constituencies within X kilometres of a given long/lat. Even better, it returns the distance from that long/lat point in the results.
So you can ask for all constituencies within 999km of 51.5000 / -0.1246… ie the House of Commons chamber, more or less… and bingo. The distance in km, to 12 significant figures!, for every (mainland) constituency in the country. ;)
@simon It would be interesting to just pop that distance vs expenses data onto a scatter plot (and do the same for eg mileage and/or rail data)?
In terms of helping see why the expenses are so, though, the raw distance measure doesn’t take into account that two people the same distance away from Westminster may actually have a very different public transport option thought…
…but it would maybe help normalise the data in terms of eg. somewhere up north being a similar ditance away from London as somewhere south west… and it may also identify different travel costs if eg similar distances but from different parts of the country have different expenses associated with them?
I guess the map way of displaying this would be to plot concentric circles over the map, centred on Westminster?
I think that’s exactly right, Tony. A simple distance-to-cost calculation won’t answer anything directly – but it would ask a few interesting questions anyway.
Awe inspiring. Really, I am inspired. How can we make this easier for anyone else to do?
It would be nice to relate expenses claimed to the usefulness/effectiveness of the MP! For example, expenses against speeches made, bills proposed or votes cast. This way the taxpayer can judge value for money spent. Not sure how to get hold of the data to do this.
Good work, we have done some work on the MPs’ Expenses data and are looking to aggregate it with other data sets.
I thought you might be interested in OpenPSI ( http://www.openpsi.org ), a collaboration between the University of Southampton and the UK government, lead by the National Archive, to trial a new form of community provisioned information service.
We are exposing UK government data using the Semantic Web standards, RDF. We have SPARQL end point so data mashups can be created by issuing query requests.
As well as providing some data sources, we are trying to spark interaction between government information providers, academic researchers and information intermediaries, specifically to bridge the gap between those researchers who may not have all the technical skills or data knowledge to answer important research questions.
Comments are closed.