Last Week’s Football Reports from the Guardian Content Store API (with a little dash of SPARQL)

A big :-) from me today – at last I think I’ve started to get my head round this mashup malarkey properly… forget the re-presentation stuff, the real power comes from using one information source to enrich another… but as map demos are the sine qua non of mashup demos, I’ll show you what I mean with a map demo…

So to start, here’s a simple query on the Guardian content store API for football match reports:

It’s easy enough to construct the query URI using a relative date in the Yahoo pipe, so the query will always return the most recent match reports (in this case, match reports since “last Saturday”):
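The pipe does the relative-date arithmetic for you. As a rough Python sketch of the same idea, assuming “last Saturday” means the most recent Saturday strictly before today (so on a Saturday it goes back a full week), you could compute the cut-off date like this. The YYYYMMDD output format is an assumption, not necessarily what the content API expects:

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday(): Monday == 0 ... Sunday == 6


def last_saturday(today: date) -> date:
    """Return the most recent Saturday strictly before `today`.

    Assumption: on a Saturday, "last saturday" means a week ago.
    """
    days_back = (today.weekday() - SATURDAY) % 7
    if days_back == 0:
        days_back = 7
    return today - timedelta(days=days_back)


if __name__ == "__main__":
    since = last_saturday(date.today())
    # YYYYMMDD is an assumed format for the API's date parameter
    print(since.strftime("%Y%m%d"))
```

For example, on Wednesday 11 March 2009 this gives Saturday 7 March 2009.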

It’s easy enough to use these results to generate an RSS feed of the most recent match reports:

Pulling the images in as Media RSS (eg media:group) elements means that things like the Google Ajax slide show control and the Pipes previewer can automatically generate a slideshow for you…
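For anyone building the feed outside Pipes, here is a minimal sketch of a single RSS item that carries its images in a media:group, using Python's standard xml.etree module and the Media RSS namespace (http://search.yahoo.com/mrss/). The function name and arguments are illustrative only:

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)  # serialise as media:group etc.


def rss_item(title: str, link: str, description: str,
             image_urls: list[str]) -> ET.Element:
    """Build an RSS <item> whose images sit inside a <media:group>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = description
    group = ET.SubElement(item, f"{{{MEDIA_NS}}}group")
    for url in image_urls:
        # medium="image" tells consumers such as slideshow widgets what this is
        ET.SubElement(group, f"{{{MEDIA_NS}}}content", url=url, medium="image")
    return item


if __name__ == "__main__":
    item = rss_item("Example report", "http://example.com/report",
                    "Report text", ["http://example.com/pic.jpg"])
    print(ET.tostring(item, encoding="unicode"))
```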

You can also get the straight feed of course:

A little bit of tinkering with the creation of the description element means we can bring the original byline and match score into the description too:
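The description tweak amounts to a bit of string concatenation. A minimal sketch, where byline, score and body_html are hypothetical names standing in for whatever the API results actually call those fields:

```python
from html import escape


def build_description(byline: str, score: str, body_html: str) -> str:
    """Put the byline and score above the match report body.

    byline, score and body_html are hypothetical names for the API fields.
    The byline and score are plain text, so they are HTML-escaped; the body
    is assumed to be HTML already and is passed through unchanged.
    """
    header = (f"<p><em>{escape(byline)}</em></p>"
              f"<p><strong>{escape(score)}</strong></p>")
    return header + body_html
```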

Inspecting the API query results by eye, you might notice that a lot of the bylines have the form “John Doe at the Oojamaflip Stadium”:


It’s easy enough to exploit this structural pattern to grab the stadium name using a regular expression or two:
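Here's one way to write that regular expression in Python. It's a sketch that assumes the stadium is whatever follows the last “ at ” in the byline, with a lower-case leading “the” dropped (so “at the Emirates” gives “Emirates”, while “at The Hawthorns” keeps the capitalised proper name). Bylines that don't fit the pattern return None:

```python
import re
from typing import Optional

# Greedy ".+" backtracks to the LAST " at " in the byline; a lower-case
# "the " straight after it is dropped, a capitalised "The" is kept.
BYLINE_RE = re.compile(r"^.+\sat\s+(?:the\s+)?(?P<stadium>.+?)\s*$")


def extract_stadium(byline: str) -> Optional[str]:
    """Return the stadium named in a "Name at Stadium" byline, else None."""
    match = BYLINE_RE.match(byline.strip())
    return match.group("stadium") if match else None


if __name__ == "__main__":
    print(extract_stadium("John Doe at the Oojamaflip Stadium"))
```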

I then did a little experiment running the names of the stadia, and the names of the stadia plus “football ground, UK”, through the Yahoo Location Extractor block to try to plot the stories at map locations corresponding to the football grounds, but the results weren’t that good…

…so I tweeted:

And got a couple of responses…

The XQuery/DBpedia with SPARQL – Stadium locations link looked pretty interesting, so I tweaked the example query on that page to return a list of English football stadia and their locations:

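PREFIX p: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?ground skos:subject <…>.
?ground geo:long ?long.
?ground geo:lat ?lat.
?ground rdfs:label ?groundname.
FILTER (lang(?groundname) ='en').
}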

and created a pipe to call DBpedia with that query (dbpedia example – English football stadium location lookup pipe):
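Outside Pipes, the same lookup is a plain HTTP GET against DBpedia's SPARQL endpoint. This Python sketch assumes the public endpoint at https://dbpedia.org/sparql and asks for the standard SPARQL JSON results format via the Accept header; the groundname, lat and long keys match the variables in the query above:

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public DBpedia endpoint


def sparql_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET URL (SPARQL 1.1 Protocol style)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the query and decode the SPARQL JSON results (needs network)."""
    request = urllib.request.Request(
        sparql_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def stadia_from_results(results: dict) -> list[dict]:
    """Flatten result bindings into {"name", "lat", "long"} dictionaries."""
    return [
        {
            "name": row["groundname"]["value"],
            "lat": float(row["lat"]["value"]),
            "long": float(row["long"]["value"]),
        }
        for row in results["results"]["bindings"]
    ]
```

Calling stadia_from_results(run_query(query)), with query holding the stadium query as a string, would give a plain list of stadia and coordinates to work with.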

Because I don’t know how to write SPARQL, I wasn’t sure how to tweak the query to just return the record for a given stadium name (feel free to comment telling me how ;-) – so I used a pipe filter block to filter the results instead. (This combination of search and filter can be a very powerful one when you don’t know how to phrase a particular query, or when a query language doesn’t support a search limit you want…)
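The search-then-filter trick is easy to reproduce in code too: fetch the broad result set once, then keep only the record you're after. This sketch does a case-insensitive substring match on the stadium name, roughly the effect of a “contains” rule in a filter block; the row layout ({"name": ..., "lat": ..., "long": ...}) is an assumption:

```python
from typing import Optional


def find_stadium(rows: list[dict], wanted: str) -> Optional[dict]:
    """Return the first row whose name contains `wanted` (case-insensitive).

    Search broadly, then filter: the query returns every stadium and this
    keeps just the one we want. An empty `wanted` matches nothing.
    """
    needle = wanted.strip().casefold()
    if not needle:
        return None
    for row in rows:
        if needle in row["name"].casefold():
            return row
    return None
```

A substring match is forgiving (“Emirates” finds “Emirates Stadium”), but it can pick the wrong ground for short or generic names, so an exact match may be safer where the names line up.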

It was now a simple matter to add this pipe to geocode the location of the appropriate stadium for each match report:
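In code, the geocoding step is a join between the match reports and the stadium list. A minimal sketch using an exact (case-insensitive) name match, assuming each report dictionary carries a hypothetical stadium key (filled from the byline, possibly None) and each stadium a name, lat and long. Reports with no matching stadium are kept, just without coordinates; indexing the stadium list once means each report costs a dictionary lookup rather than another query:

```python
def enrich_reports(reports: list[dict], stadia: list[dict]) -> list[dict]:
    """Copy each report, adding lat/long when its stadium is known.

    Hypothetical keys: reports carry "stadium" (from the byline, possibly
    None); stadia carry "name", "lat" and "long".
    """
    # Index once by case-folded name so each report is a dictionary lookup
    index = {s["name"].casefold(): s for s in stadia}
    enriched = []
    for report in reports:
        out = dict(report)  # leave the input untouched
        stadium = index.get((report.get("stadium") or "").casefold())
        if stadium is not None:
            out["lat"] = stadium["lat"]
            out["long"] = stadium["long"]
        enriched.append(out)
    return enriched
```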

So let’s recap – we call the Guardian content API for match reports since “last Saturday” and construct a nice RSS feed from it, with description text that includes the byline and match score, as well as the match report. Then we pull out the name of the stadium each match was played at (relying on a convention that holds much of the time: the byline records the stadium) and pass it through another pipe that asks DBpedia for a list of UK football stadium locations, and then picks out the one we want.

Tweak the location data into a form Yahoo Pipes likes (which means it will create a nice GeoRSS or KML feed for us) and what do we get? Map-based match reports:
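One target format for the tweaked location data is GeoRSS. In its GeoRSS-Simple form, a location is a single georss:point element holding latitude then longitude, separated by a space. A minimal Python sketch for adding one to an RSS item (the function name is illustrative):

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def add_georss_point(item: ET.Element, lat: float, long: float) -> ET.Element:
    """Append <georss:point>lat long</georss:point> to an RSS item."""
    point = ET.SubElement(item, f"{{{GEORSS_NS}}}point")
    point.text = f"{lat} {long}"  # GeoRSS-Simple: latitude first, space-separated
    return point


if __name__ == "__main__":
    item = ET.Element("item")
    ET.SubElement(item, "title").text = "Example report"
    add_georss_point(item, 51.5, -0.1)
    print(ET.tostring(item, encoding="unicode"))
```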

As I’ve shown in this blog many times before, it’s easy enough to grab a KML feed from the More options pipe output and view the results elsewhere:

(Click on a marker on the Google map and it will pop up the match report.)
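If you'd rather generate the KML yourself than take it from the pipe output, a sketch along these lines would do. Note that KML coordinates run longitude,latitude, the reverse of GeoRSS, and that the Placemark description is what shows up in the marker's pop-up. The report keys (title, description, lat, long) are assumptions:

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # emit KML as the default namespace


def _k(tag: str) -> str:
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def kml_document(reports: list[dict]) -> str:
    """Return a KML string with one Placemark per geocoded report."""
    kml = ET.Element(_k("kml"))
    doc = ET.SubElement(kml, _k("Document"))
    for report in reports:
        if "lat" not in report or "long" not in report:
            continue  # skip reports that could not be geocoded
        placemark = ET.SubElement(doc, _k("Placemark"))
        ET.SubElement(placemark, _k("name")).text = report["title"]
        # the description is what pops up when the marker is clicked
        ET.SubElement(placemark, _k("description")).text = report.get("description", "")
        point = ET.SubElement(placemark, _k("Point"))
        # KML wants longitude,latitude: the reverse of GeoRSS order
        ET.SubElement(point, _k("coordinates")).text = f"{report['long']},{report['lat']}"
    return ET.tostring(kml, encoding="unicode")
```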

So what do we learn from this? Hmmm – that I need to learn to speak SPARQL, maybe?!

PS @kitwallace has come up trumps with a tweak to the SPARQL query that will do the query by stadium name in one:
FILTER (lang(?groundname) = 'en' && regex(?groundname, 'Old Trafford')).
Ta, muchly :-)
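If you build that FILTER for arbitrary stadium names, two kinds of escaping matter: regex metacharacters (the full stop in “St. James' Park”, say) should be backslash-escaped so they match literally, and the string literal itself has to survive any quotes in the name. Here's a hedged Python sketch that uses a double-quoted literal, so apostrophes are harmless (SPARQL accepts single- or double-quoted strings); the function names are illustrative:

```python
# Characters with special meaning in SPARQL (XPath-style) regular expressions
REGEX_META = set("\\|.?*+{}()[]^$-")


def sparql_regex_literal(text: str) -> str:
    """Quote `text` as a SPARQL string whose regex matches it literally."""
    # 1. Backslash-escape regex metacharacters ("St. James" -> "St\. James")
    pattern = "".join("\\" + c if c in REGEX_META else c for c in text)
    # 2. Escape backslashes and double quotes for a "..." SPARQL literal
    pattern = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{pattern}"'


def ground_filter(name: str) -> str:
    """Build the one-step FILTER: English label containing `name`."""
    return (f'FILTER (lang(?groundname) = "en" && '
            f"regex(?groundname, {sparql_regex_literal(name)}))")


if __name__ == "__main__":
    print(ground_filter("Old Trafford"))
```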

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

8 thoughts on “Last Week’s Football Reports from the Guardian Content Store API (with a little dash of SPARQL)”

  1. Very nicely done! Regarding your tweet about some canned SPARQLs and tags from the Guardian API, maybe using the set of classes in DBpedia would provide some useful filtering to get canned SPARQLs started. DBpedia’s RDF server (Virtuoso) also has a special ‘bif:contains’ predicate that works similarly to the regex filter:

    ?article bif:contains "your string"

    So if you have a tag, say, “football” maybe start with something like this:

    PREFIX dbpprop: <http://dbpedia.org/property/>
    PREFIX dbpont: <http://dbpedia.org/ontology/>

    select distinct ?name

    where {
    ?res a dbpont:Person ;
    dbpprop:name ?name ;
    dbpprop:abstract ?abs .
    ?abs bif:contains "football" .

    FILTER (lang(?abs) = 'en')
    }

    LIMIT 100

    ?res a dbpont:Person says to look for people (there are lots of other classes in DBpedia that you could use to filter), then ?abs bif:contains "your tag" digs through the abstract text, similar to the regex.

    Then, for different searches, you might just be adding in a new predicate instead of dbpprop:name, as in what you have above.

    I’ve found, though, that I need to remember that it’s still based on a wiki: what people have entered for the same property can differ wildly (e.g., sometimes the object is a literal, and sometimes it is a link to another Wikipedia page).

    Hope that helps!

  2. Arghh – deleted this trackback by mistake:

    “I’m a Hatters fan, not a Gooner. Do I really need to know where The Emirates is? No. I want to know where my County match report is. And it ain’t going to be on a Guardian API…”
