Last Week’s Football Reports from the Guardian Content Store API (with a little dash of SPARQL)

A big :-) from me today – at last I think I’ve started to get my head round this mashup malarkey properly… forget the re-presentation stuff, the real power comes from using one information source to enrich another… but as map demos are the sine qua non of mashup demos, I’ll show you what I mean with a map demo…

So to start, here’s a simple query on the Guardian content store API for football match reports:

It’s easy enough to construct the query URI using a relative date in the Yahoo pipe, so the query will always return the most recent match reports (in this case, matc h reports since “last saturday”):

It’s easy enough to use these results to generate an RSS feed of the most recent match reports:

Pulling the images in as Media RSS (eg media:group) elements means that things like the Google Ajax slide show control and the Pipes previewer can automatically generate a slideshow for you…

You can also get the straight feed of course:

A little bit of tinkering with the creation of the description element means we can bring the original byline and match score in to the description too:

Inspecting the API query results by eye, you might notice that a lot of the bylines have the form “John Doe at the Oojamaflip Stadium”:


It’s easy enough to exploit this structural pattern to grab the stadium name using a regular expression or two:

I thien did a little experiment running the name of the stadia, and the name of the stadia plius football ground, UK through the Yahoo Location Extractor block to try to plot the sotries on map locations corresponding to the football ground locations, but the results weren’t that good…

…so I tweeted:

And got a couple of responses…

The XQuery/DBpedia with SPARQL – Stadium locations link looked pretty interesting, so I tweaked the example query on that page to return a list of English football stadia and their locations:

PREFIX p: <;
PREFIX skos: <;
PREFIX geo: <;
PREFIX rdfs: <;
{?ground skos:subject <;.
?ground geo:long ?long.
?ground geo:lat ?lat.
?ground rdfs:label ?groundname.
FILTER (lang(?groundname) ='en').

and created a pipe to call dBpedia with that query (dbpedia example – English football stadium location lookup pipe):

Because I don’t know how to write SPARQL, I wasn’t sure how to tweak the query to just return the record for a given stadium name (feel tfree to comment telling me how ;-) – so instead I used a pipe filter block to filter the results instead. (This combination of search and filter can be a very powerful one when you don’t know how to phrase a particular qusry, or when a query language doesn’t support a search limit you want…

It was now a simple matter to add this pipe in to geocode the locations of the appropriate stadium for each match report:

So let’s recap – we call the Guardian content API for match reports since “last saturday” and construct a nice RSS feed from it, with description text that includes the byline and match score, as well as the match report. Then we pull out the name of stadium each match was played at (relying on the convention that seems to work much of the time that the byline records the stadium) and pass it through another pipe that asks DBpedia for a list of UK football stadium locations, and then filters out the one we want.

Tweak the location data to a form Yahoo pipes likes (which means it will create a nice geoRSS or KML feed for us) and what do we get? Map based match reports:

As I’ve show in this blog many times before, it’s easy enough to grab a KML feed from the More options pipe output and view the results elsewhere:

(Click on a marker on the google map and it will pop up the match report.)

So what do we learn from this? Hmmm – that I need to learn to speak SPARQL, maybe?!

PS @kitwallace has come up trumps with a tweak to the SPARQL query that will do the query by stadium name in one:
FILTER (lang(?groundname) =’en’ && regex(?groundname,’Old Trafford’)). Ta, muchly :-)