Where Linked Data Would Be Useful – Creating More MPs’ Maps from the Guardian Politics API

So given the news from the Commons today, I was wondering where the current crop of MPs came from in terms of birthplace, school, and university… Would a map-based view show MPs drawn from right across the UK, or something a little more clumpy…?

When I went looking for sources of biographical data, two came to mind – Wikipedia infoboxes (and hence DBpedia) and the Guardian Politics API. In this post, I’ll describe a minimal – and not very reliable – recipe for plotting a map of UK MPs’ alma maters based on data grabbed from the Guardian Politics API, identifying a couple of ways in which the data could be made so much more useful, and indicating why the Linked Data approach is a Good Thing…

Just because, I’ll give a Yahoo Pipes recipe…

The first step is a handler pipe for grabbing an MP’s details from the Guardian Politics API from their Guardian ID:

MP details via Guardian Politics API

The next step is to get a list of current MPs and annotate the list with MP details using the helper pipe:

Lookup MP details

Not every MP has an alma mater listed, so we filter out the ones where there is no university information. We then use the university data as the input to a rough and ready geocoder, which does its best to identify a location and then geocode it. The Yahoo Pipes trick of putting geo-data into the y:location attribute means that the pipe will automagically generate KML and map-based previews of the output of the pipe.

Lookup location by university
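For anyone who would rather not read pipe diagrams, here is a rough Python sketch of the same filter-then-geocode step. It is not what the pipe literally does: the record shape (a dict with "name" and "university" keys) and the geocode callable are assumptions made purely for illustration, and the "location" key merely stands in for the y:location attribute.

```python
def annotate_with_location(mps, geocode):
    """Drop MPs with no university listed, then attach a location to the rest.

    mps     -- list of dicts with hypothetical "name" and "university" keys
    geocode -- any callable mapping a place name to (lat, lon), or None
               when it cannot find a match (an assumed interface)
    """
    located = []
    for mp in mps:
        university = (mp.get("university") or "").strip()
        if not university:
            # No alma mater listed: filter this MP out
            continue
        coords = geocode(university)
        if coords is None:
            # The geocoder could not place this string: skip rather than guess
            continue
        lat, lon = coords
        # "location" plays the role of y:location in the pipe
        located.append({**mp, "location": {"lat": lat, "lon": lon}})
    return located
```

Passing the geocoder in as a function keeps the sketch independent of any particular geocoding service, which is handy given how unreliable the rough and ready one turns out to be.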

Finally, we tidy up the feed a little:

Tidy the pipe...

Here’s the output of the resulting pipe:

MPs by university - badly coded...

Clicking on the various markers, we see that there is a lot of miscoding going on. Also, some MPs have several universities listed, which may also contribute to the confusion. (A rough and ready way of handling that would be to split the university field on a semi-colon, and just use the first listed university in the location lookup.)
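The semi-colon trick is simple enough to sketch in a few lines of Python. This assumes the university field arrives as a single string with institutions separated by semi-colons; the function name is my own invention, not anything from the API.

```python
def first_university(field):
    """Return the first non-empty institution in a semi-colon separated field."""
    if not field:
        return None
    for part in field.split(";"):
        name = part.strip()
        if name:
            # Only the first listed institution is used for the location lookup
            return name
    return None
```

It is still a fudge, of course: the first listed university is not necessarily the one you would most want on the map, and the free-text name still has to survive the geocoder.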

So what would make things easier? The Guardian Politics API is getting the data out there, but can it be improved in any way in order to make it a little (or a lot?!) more useful in a machine automated context such as this?

I think so…

Here’s one possible approach: a few weeks ago, the JISC Monitoring Unit published a lookup service for looking up UK HEIs using a variety of identifier schemes and a crude name-based lookup, and returning synonymous identifiers, canonical URLs and lat/long data: data.jiscmu.ac.uk. As identified in the announcement post, this information complements rather more formally some of that already collated in the Guardian’s Education Datastore Rosetta Stone spreadsheet…

(I’m not sure if Leigh Dodds looked at how the JISCMU data could be used as part of a Google Refine reconciliation API service? I seem to remember a brief flurry of tweets on a related topic at the time…;-)

So, what would be really useful would be for the Guardian Politics API to use a weak Linked Data approach and provide a list of HEI identifiers using a formal identification scheme such as UCAS or HESA codes, so that we knew which institutions they were actually referring to (though this wouldn’t cope with overseas universities… Hmm… is there an international identifier scheme for universities?)

We could then hop over to the JISCMU service and pull down the lat/long information, before popping it on a map.
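To make the idea concrete, here is a minimal Python sketch of that join. Everything about the data shape is hypothetical: the "hei_ids" field is exactly what the Guardian Politics API does not currently provide, and the hei_locations dict simply stands in for whatever the JISCMU service returns (I have not tried to reproduce its actual response format here).

```python
def locate_by_identifier(mps, hei_locations):
    """Join MP records to institution coordinates via shared identifiers.

    mps           -- list of dicts with "name" and a hypothetical "hei_ids" list
    hei_locations -- dict mapping an institution identifier to (lat, lon),
                     standing in for a lookup against the JISCMU data
    """
    placed = []
    unmatched = []
    for mp in mps:
        for hei_id in mp.get("hei_ids", []):
            coords = hei_locations.get(hei_id)
            if coords is None:
                # e.g. an overseas university with no code in the scheme
                unmatched.append((mp["name"], hei_id))
            else:
                lat, lon = coords
                placed.append(
                    {"name": mp["name"], "hei_id": hei_id, "lat": lat, "lon": lon}
                )
    return placed, unmatched
```

The point is that there is no guessing step: an identifier either matches or it doesn't, and the ones that don't match are reported rather than silently plotted in the wrong place.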

Looking deeper into the Guardian Politics API, we also see a field for listing the MPs’ schools… which in turn could be enhanced by including identifiers used in the data.gov.uk education datastore.

So – Linked Data: can you see how it works yet? And do you get the feeling that network effects could kick in really quickly as data is enhanced with linking elements such as well-defined identifiers using known identification schemes?

PS Chris Gutteridge has also picked up the challenge of this post, contributing a list of DBpedia URIs for current MPs to the cause. I wouldn’t be surprised if he turns up a whole load more data and actually cracks the problem way before I do!

Ah – seems like Chris has been on the case, and produced (with caveats: “Note that the data is patchy. It only shows MPs with a geocoded birthplace/university listed on dbpedia”) a map [updated] of 313 MPs’ birthplaces:

MPs birthplaces

as well as a map [updated] of 176 MPs’ universities (though I don’t have a valid link for this… yet…;-) Ah – here it is:

MPs alma mater map

UPDATE: here’s the recipe – Studying the MPs

PPS I really need to add Chris’ geo-tagged RDF to KML converter service (described here) to my toolkit…