OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Posts Tagged ‘sparql’

Getting Started with data.gov.uk, Triplr SPARYQL and Yahoo Pipes


RDF, SPARQL and the semantic web are too scary for mortals, right? So here’s a hopefully easier way in for those of us who are put off by the syntactic nightmare that defines the world of formal Linked Data: Yahoo Pipes :-)

A few weeks ago, I was fortunate enough to get one of the developer preview keys for data.gov.uk, the UK’s answer to Tim Berners-Lee’s call to open up public data.

The UK data.gov.uk solution is currently a hybrid – a growing set of separate Linked Data stores (hosted on the Talis Platform, I think?) covering areas such as education, finance and transport; and a set of links to CSV and Excel spreadsheets available for download on a wide variety of Government department websites. (Some of them have nice URLs, some don’t; if you think they should, how does this sound? Designing URI Sets for the UK Public Sector: Machine- and human-readable formats. Most of the data is public too – it’s just the meta-site – data.gov.uk – that I think is under wraps at the moment?)

I’ve played with online CSV and Excel spreadsheets before (e.g. in the context of the Guardian Datastore), but I’ve always found SPARQL endpoints and RDF a little bit, err, terrifying, so last week I felt it was time to bite the bullet and spend an hour or two trying to do something – anything – with some hardcore Linked Data. Or at least, try to just do something – anything – with some data out of a single data store.

So where to start? Regular readers will know that I try to use free online apps and client side Javascript code wherever possible (I don’t want to have to assume the availability of access to my own web server), so it made sense to look at Yahoo Pipes :-)

I’ve done the odd demo of how to use SPARQL in a Yahoo Pipe before (Last Week’s Football Reports from the Guardian Content Store API (with a little dash of SPARQL), which is not about the football, right?) but a tweet last week tipped me off to a potentially more abstracted way of writing SPARQL queries in a Pipes environment: Triplr’s SPARQL + YQL = SPARYQL.

Ooh…. the idea is that you can wrap a SPARQL query in a YQL query, which in turn suggests two other things…

Firstly, I now have a way I’m already familiar with of generating and debugging bookmarkable RESTful queries to SPARQL endpoints:

Secondly, the Yahoo Pipes YQL block provides a handy container for making the SPARYQL queries and pulling the results back into a pipes environment.

Here’s the pipework… (based on an original pipe by @hapdaniel)

SPARQL + YQL= SPARYQL pipe http://pipes.yahoo.com/ouseful/sparyql

So what can we do with it? Not being particularly fluent in SPARQL, I had a poke around for some examples I could cut, paste, hack and tinker with and found a few nice examples on the [n]^2 blog: SPARQLing data.gov.uk: Edubase Data

So here’s a quick demo – a pipe runs a query that looks for the 10 schools with the latest opening dates on data.gov.uk’s education datastore:

SELECT ?school ?name ?date ?easting ?northing WHERE {?school a sch-ont:School; sch-ont:establishmentName ?name; sch-ont:openDate ?date ; sch-ont:easting ?easting; sch-ont:northing ?northing . } ORDER BY DESC(?date) LIMIT 10
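For anyone who would rather see the "bookmarkable RESTful query" idea outside Pipes, here is a minimal Python sketch. It relies only on the standard SPARQL protocol convention of passing the query text in a `query` URL parameter. The endpoint address is a made-up placeholder rather than the real data.gov.uk one, and the `sch-ont:` prefix declaration is left out exactly as in the query above, so a real endpoint would probably want it added.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address, used purely for illustration.
ENDPOINT = "http://example.org/education/sparql"

# The query from the post, reflowed for readability (prefix declaration omitted).
QUERY = """SELECT ?school ?name ?date ?easting ?northing WHERE {
  ?school a sch-ont:School;
    sch-ont:establishmentName ?name;
    sch-ont:openDate ?date;
    sch-ont:easting ?easting;
    sch-ont:northing ?northing .
} ORDER BY DESC(?date) LIMIT 10"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be pasted into a browser or bookmarked, which is the same debugging convenience the YQL wrapper offers.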

In order to plot the schools on a map, it’s necessary to convert northings and eastings to latitude and longitude. A cry for help on Twitter was quickly responded to by @kitwallace who gave me a link to a service that did just the job… almost – for some reason, Pipes didn’t like the output, so I had to run the query through a YQL proxy:

(Note that Kit soon came up with a fix, so I could actually just call the service directly via a Data Source block using a call of the form http://www.cems.uwe.ac.uk/xmlwiki/geo/OS2latlong2.xq?easting=527085&northing=185400.)
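As a rough illustration of the call form quoted above, this Python sketch builds the request URL for a given easting and northing. The service address and parameter names are taken from the URL in the post; the function name is my own, and the service may well no longer be live, so the sketch only constructs the URL and does not fetch it.

```python
from urllib.parse import urlencode

# Service address as quoted in the post; it may no longer be available.
SERVICE = "http://www.cems.uwe.ac.uk/xmlwiki/geo/OS2latlong2.xq"


def os_to_latlong_url(easting: int, northing: int) -> str:
    """Build the conversion request URL for one easting/northing pair."""
    return SERVICE + "?" + urlencode({"easting": easting, "northing": northing})


print(os_to_latlong_url(527085, 185400))
```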

Here’s the result:

Note that the KML output from the pipe can be plotted directly in a Google map (simply paste the KML URL into the Google Maps search box and hit return).

By writing two or three different queries, and pulling the data separately into a web page via the JSON feed, we can easily create a map that displays the schools that have opened and closed between 1/1/08 and 1/10/09:

If we take the CSV output of the pipe, we can also see how it’s possible to transport the content into a Google spreadsheet (once again thanks to @hapdaniel for pointing out that changing the output switch of a pipe’s RSS feed from rss to csv does the format conversion job nicely):

which gives:

(Note that the CSV import seems to require quite a flat data structure (though it is trying really hard with the more hierarchical data – it’s just not quite managing to catch the data values?), so some renaming within the pipe might be required to make sure that the child attributes of each feed item do not have any children of their own. Empty attributes also need pruning.)
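The flattening and pruning step can be sketched in Python, as a stand-in for the renaming done inside the pipe. This is only an illustration: the item structure and field names below are invented, and the underscore-joined column names are one arbitrary choice.

```python
import csv
import io


def flatten(item, parent_key="", sep="_"):
    """Collapse nested dicts into one level and drop empty values."""
    flat = {}
    for key, value in item.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Child attributes are promoted to top-level columns.
            flat.update(flatten(value, name, sep))
        elif value not in (None, "", [], {}):
            flat[name] = value
    return flat


def to_csv(items):
    """Write flattened items as CSV text, one row per item."""
    rows = [flatten(item) for item in items]
    # Column order follows first appearance across all rows.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample item, not real data.gov.uk output.
sample = [{"title": "Example School", "location": {"lat": "51.5", "long": "-0.2"}, "note": ""}]
print(to_csv(sample))
```

Each item ends up with no children of its own and no empty attributes, which is the shape the spreadsheet import seemed to want.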

PS I did try importing the XML output from a RESTful YQL query into a Google spreadsheet with an =importXML formula but it didn’t seem to work. Firstly, the RESTful URI was too long (easily solved by rewriting it as a shortened URI). Secondly, the Google spreadsheet didn’t seem to like the output XML :-(

So near, yet so far… but still, it poses the question: could we write containerised queries/topic specific APIs over data.gov.uk SPARQL endpoints that expose the results in a spreadsheet capable of importing XML?

Written by Tony Hirst

October 20, 2009 at 8:41 am

Posted in Pipework, Tinkering


Last Week’s Football Reports from the Guardian Content Store API (with a little dash of SPARQL)


A big :-) from me today – at last I think I’ve started to get my head round this mashup malarkey properly… forget the re-presentation stuff, the real power comes from using one information source to enrich another… but as map demos are the sine qua non of mashup demos, I’ll show you what I mean with a map demo…

So to start, here’s a simple query on the Guardian content store API for football match reports:

http://api.guardianapis.com/content/search?
filter=/football&filter=/global/matchreports&after=20090314&api_key=MYSECRETACTIVATEDKEY

It’s easy enough to construct the query URI using a relative date in the Yahoo pipe, so the query will always return the most recent match reports (in this case, match reports since “last Saturday”):
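Outside Pipes, the same relative-date trick might look something like the Python sketch below. The base URL and parameters are copied from the query shown above (that API version may no longer exist), the function names are my own, and "last Saturday" is assumed to mean the most recent Saturday on or before today.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Base URL as quoted in the post.
BASE = "http://api.guardianapis.com/content/search"


def last_saturday(today: date) -> date:
    """Most recent Saturday on or before `today` (Monday is weekday 0, Saturday is 5)."""
    return today - timedelta(days=(today.weekday() - 5) % 7)


def match_reports_url(today: date, api_key: str) -> str:
    """Build the match report query with a date relative to `today`."""
    params = [
        ("filter", "/football"),
        ("filter", "/global/matchreports"),
        ("after", last_saturday(today).strftime("%Y%m%d")),
        ("api_key", api_key),
    ]
    # safe="/" keeps the slashes in the filter values unescaped, as in the post.
    return BASE + "?" + urlencode(params, safe="/")


print(match_reports_url(date(2009, 3, 18), "MYSECRETACTIVATEDKEY"))
```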

It’s easy enough to use these results to generate an RSS feed of the most recent match reports:

Pulling the images in as Media RSS (eg media:group) elements means that things like the Google Ajax slide show control and the Pipes previewer can automatically generate a slideshow for you…

You can also get the straight feed of course:

A little bit of tinkering with the creation of the description element means we can bring the original byline and match score in to the description too:

Inspecting the API query results by eye, you might notice that a lot of the bylines have the form “John Doe at the Oojamaflip Stadium”:

Hmmm…

It’s easy enough to exploit this structural pattern to grab the stadium name using a regular expression or two:
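For illustration, here is one way the byline pattern could be matched in Python. The expression is my own guess at a workable pattern, not the one used in the pipe, and it assumes the byline really does end with "at [the] <stadium name>".

```python
import re

# Matches "... at Stadium" or "... at the Stadium" at the end of a byline.
BYLINE_VENUE = re.compile(r"\s+at\s+(?:the\s+)?(?P<venue>.+?)\s*$")


def stadium_from_byline(byline):
    """Return the stadium name from a byline, or None if the pattern is absent."""
    match = BYLINE_VENUE.search(byline)
    return match.group("venue") if match else None


print(stadium_from_byline("John Doe at the Oojamaflip Stadium"))
print(stadium_from_byline("John Doe"))
```

Bylines that don't follow the convention simply come back as None, so those reports would go unmapped rather than being mapped wrongly.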

I then did a little experiment running the name of the stadium, and the name of the stadium plus football ground, UK through the Yahoo Location Extractor block to try to plot the stories on map locations corresponding to the football ground locations, but the results weren’t that good…

…so I tweeted:

And got a couple of responses…

The XQuery/DBpedia with SPARQL – Stadium locations link looked pretty interesting, so I tweaked the example query on that page to return a list of English football stadia and their locations:

PREFIX p: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{?ground skos:subject <http://dbpedia.org/resource/Category:Football_venues_in_England>.
?ground geo:long ?long.
?ground geo:lat ?lat.
?ground rdfs:label ?groundname.
FILTER (lang(?groundname) ='en').
}

and created a pipe to call DBpedia with that query (dbpedia example – English football stadium location lookup pipe):

Because I don’t know how to write SPARQL, I wasn’t sure how to tweak the query to just return the record for a given stadium name (feel free to comment telling me how ;-) – so I used a pipe filter block to filter the results instead. (This combination of search and filter can be a very powerful one when you don’t know how to phrase a particular query, or when a query language doesn’t support a search limit you want…)
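The search-then-filter approach can be sketched in Python against the standard SPARQL JSON results layout (a `results.bindings` list in which each variable carries a `value`). The variable names match the query above; the function name and the sample record are invented for the example and are not real DBpedia data.

```python
def find_ground(results, name):
    """Return (label, lat, long) for every binding whose label contains `name`."""
    wanted = name.lower()
    matches = []
    for row in results["results"]["bindings"]:
        label = row["groundname"]["value"]
        if wanted in label.lower():
            lat = float(row["lat"]["value"])
            long = float(row["long"]["value"])
            matches.append((label, lat, long))
    return matches


# Made-up sample in SPARQL JSON results shape, for illustration only.
sample_results = {
    "head": {"vars": ["ground", "long", "lat", "groundname"]},
    "results": {
        "bindings": [
            {
                "groundname": {"type": "literal", "value": "Oojamaflip Stadium"},
                "lat": {"type": "literal", "value": "51.0"},
                "long": {"type": "literal", "value": "-1.0"},
            },
            {
                "groundname": {"type": "literal", "value": "Another Ground"},
                "lat": {"type": "literal", "value": "52.0"},
                "long": {"type": "literal", "value": "-2.0"},
            },
        ]
    },
}

print(find_ground(sample_results, "oojamaflip"))
```

Fetching the whole list and filtering locally costs more bandwidth than a narrower query, but it needs no knowledge of the query language beyond the example already to hand.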

It was now a simple matter to add this pipe in to geocode the locations of the appropriate stadium for each match report:

So let’s recap – we call the Guardian content API for match reports since “last Saturday” and construct a nice RSS feed from it, with description text that includes the byline and match score, as well as the match report. Then we pull out the name of the stadium each match was played at (relying on the convention, which seems to work much of the time, that the byline records the stadium) and pass it through another pipe that asks DBpedia for a list of English football stadium locations, and then filters out the one we want.

Tweak the location data to a form Yahoo pipes likes (which means it will create a nice geoRSS or KML feed for us) and what do we get? Map based match reports:

As I’ve shown in this blog many times before, it’s easy enough to grab a KML feed from the More options pipe output and view the results elsewhere:

(Click on a marker on the google map and it will pop up the match report.)

So what do we learn from this? Hmmm – that I need to learn to speak SPARQL, maybe?!

PS @kitwallace has come up trumps with a tweak to the SPARQL query that will do the query by stadium name in one:
FILTER (lang(?groundname) = 'en' && regex(?groundname, 'Old Trafford')). Ta, muchly :-)
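
To make the stadium name a parameter, the query text could be generated along the lines of this Python sketch. The template reuses the query from the post with the combined filter; the function names are my own, and the escaping only covers backslashes and single quotes in the string literal. Names containing regex metacharacters are not handled, so treat it as a starting point rather than a safe query builder.

```python
# Query from the post with the stadium-name filter added; doubled braces are
# literal braces for str.format.
QUERY_TEMPLATE = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{{?ground skos:subject <http://dbpedia.org/resource/Category:Football_venues_in_England>.
?ground geo:long ?long.
?ground geo:lat ?lat.
?ground rdfs:label ?groundname.
FILTER (lang(?groundname) = 'en' && regex(?groundname, '{name}')).
}}"""


def sparql_literal(text: str) -> str:
    """Escape backslashes and single quotes for a single-quoted SPARQL string."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def stadium_query(name: str) -> str:
    """Return the stadium lookup query restricted to labels matching `name`."""
    return QUERY_TEMPLATE.format(name=sparql_literal(name))


print(stadium_query("Old Trafford"))
```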

Written by Tony Hirst

March 18, 2009 at 9:53 am
