Viewing SPARQLed data.gov.uk Data in a Google Spreadsheet

This is a stub post as much as anything to help me keep tabs on a technique I’ve not had any time to properly play with, let alone document: consuming Linked Data in a Google spreadsheet.

First up – why might this be useful? In short, I think that many people who might want to make use of data.gov.uk data are probably comfortable with spreadsheets but not with code, so giving them access to SPARQL and RDF is not necessarily useful.

So whilst the recipe shown here is a hacky one, at least it opens up the playground a little to explore what the issues might be – and what things might be facilitated by – providing fluid, live data routes from RDF triple stores into spreadsheets.

So here’s what I’m aiming for – some data from the education datastore in a spreadsheet:

And how did it get there? Via a CSV import using the =importData formula.

And where did the CSV come from? The sparqlproxy webservice:

As well as giving CSV output, the serivce can also gnerate HTML and a variety of JSON formats, including the MIT Simile and Googl Viz API formats (which means it’s easy to just plug other data into a wide variety of visualisation formats.

To get the data into a Google spreadsheet, simply copy the CSV URI into an =importData(“CSV_URI_HERE”) formula in a spreadsheet cell.

The sparqlproxy service can also pull in and transform queries that have been posted on the web:

So for example, in the above case the query at http://data-gov.tw.rpi.edu/sparql/stat_ten_triples_default.sparql looks like:

What this means is that someone else can write complex queries and mortals can then access the data and display it however they want. (What I’d really like to see is a social site that supports the sharing of endpoint/query pairs for particular queries (I could probably even hack something to do this using delicious?) ;-)

Once the data is in the spreadsheet, it can be played with in the normal way of course. So for example, I can query the spreadsheet using my old prototype Guardian datastore explorer:

In the above example, the route is then:

1) a sparql query onto http://services.data.gov.uk/education/sparql is run through
2) the http://data-gov.tw.rpi.edu/ws/sparqlproxy.php sparqlproxy service to produce a CSV output that is
3) pulled into a Google spreadsheet using an =importData() formula, and in turn
4) queried using my Google Datastore explorer using the Google visualisation API query language and then
5) rendered in a Google Visualisation table widget.

Lurvely… :-)

Mapping Recent School Openings and Closures

Just after I put together the pipework for Getting Started with data.gov.uk, Triplr SPARYQL and Yahoo Pipes, I also cut and pasted some of the code from a previous map based mashup to demo how to make a SPARQL call via a pipe that calls on the UK Gov education Linked Data datastore from within a web page, and then display the geocoded results on a map.

Here’s the demo – School openings and closures in the UK, 1/1/08-1/10/09

If you View Source, you’ll see the code boils down to:

//schools closed between 1/1/08 and 1/10/09
q="SELECT ?school ?name ?opendate ?closedate ?easting ?northing WHERE {?school a sch-ont:School;  sch-ont:establishmentName ?name;sch-ont:easting ?easting; sch-ont:northing ?northing; sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Closed ; sch-ont:closeDate ?closedate ; sch-ont:openDate ?opendate . FILTER (?closedate > '2008-01-01'^^xsd:date && ?closedate < '2009-10-01'^^xsd:date)}"

u=ur+encodeURIComponent(q);
getPipeGeoData(u, 'parseJSON_purple');

In all I make three calls to a pipe that calls on the data.gov.uk education datastore, one for schools opened between 1/1/08 and 1/10/09:
SELECT ?school ?name ?opendate ?easting ?northing WHERE {?school a sch-ont:School; sch-ont:establishmentName ?name;sch-ont:easting ?easting; sch-ont:northing ?northing; sch-ont:openDate ?opendate . FILTER (?opendate > '2008-01-01'^^xsd:date && ?opendate < '2009-10-01'^^xsd:date)}

one for schools closed between 1/1/08 and 1/10/09:
SELECT ?school ?name ?opendate ?closedate ?easting ?northing WHERE {?school a sch-ont:School; sch-ont:establishmentName ?name;sch-ont:easting ?easting; sch-ont:northing ?northing; sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Closed ; sch-ont:closeDate ?closedate ; sch-ont:openDate ?opendate . FILTER (?closedate > '2008-01-01'^^xsd:date && ?closedate < '2009-10-01'^^xsd:date)}

and one for schools proposed to close:
SELECT ?school ?name ?easting ?northing ?opendate WHERE {?school a sch-ont:School; sch-ont:establishmentName ?name;sch-ont:easting ?easting; sch-ont:northing ?northing ; sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Open__but_proposed_to_close; sch-ont:openDate ?opendate . }

(I cribbed how to write these queries from a Talis blog: SPARQLing data.gov.uk: Edubase Data;-)

The results of each call are displayed using the different coloured markers.

(The rest of the code is really horrible. Note to self: get round to learning JQuery.)

Getting Started with data.gov.uk, Triplr SPARYQL and Yahoo Pipes

RDF, SPARQL and the semantic web are too scarey for mortals, right? So here’s a hopefully easier way in for those of us who are put off by the syntactic nightmare that defines the world of formal Linked Data: Yahoo Pipes :-)

A few weeks ago, I was fortunate enough to get one of the developer preview keys for data.gov.uk, the UK answer to Tim Berners Lee’s call to open up public data.

The UK data.gov.uk solution is currently a hybrid – a growing set of separate Linked Data stores (hosted on the Talis Platform, I think?) covering areas such as education, finance and transport; and a set of links to CSV and Excel spreadsheets available for download on a wide variety of Government department websites. (Some of them have nice URLs, some don’t; if you think they should, how does this sound? Designing URI Sets for the UK Public Sector: Machine- and human-readable formats. Most of the data is public too – it’s just the meta-sit – data.gov.uk – that I think is under wraps at the moment?)

I’ve played with online CSV and Excel spreadsheets before (e.g. in the context of the Guardian Datastore), but I’ve always found SPARQL endpoints and RDF a little bit, err, terrifying, so last week I felt it was time to bite the bullet and spend an hour or two trying to do something – anything – with some hardcore Linked Data. Or at least, try to just do something – anything – with some data out of a single data store.

So where to start? Regular readers will know that I try to use free online apps and client side Javascript code wherever possible (I don’t want to have to assume the availability of access to my own web server), so it made sense to look to at Yahoo Pipes :-)

I’ve done the odd demo of how to use SPARQL in a Yahoo Pipe before (Last Week’s Football Reports from the Guardian Content Store API (with a little dash of SPARQL), which is not about the football, right?) but a tweet last week tipped me off to a potentially more abstracted way of writing SPARQL queries in a Pipes environment: Triplr’s SPARQL + YQL = SPARYQL.

Ooh…. the idea is that you can wrap a SPARQL query in a YQL query, which in turn suggests two other things…

Firstly, I now have a way I’m already familiar with of generating and debugging bookmarkable RESTful queries to SPARQL endpoints:

Secondly, the Yahoo Pipes YQL block provides a handy container for making the SPARYQL queries and pulling the results back into a pipes environment.

Here’s the pipework… (based on an original pipe by @hapdaniel)

SPARQL + YQL= SPARYQL pipe http://pipes.yahoo.com/ouseful/sparyql

So what can we do with it? Not being particularly fluent in SPARQL, I had a poke around for some examples I could cut, paste, hack and tinker with and found a few nice examples on the [n]^2 blog: SPARQLing data.gov.uk: Edubase Data

So here’s a quick demo – a pipe runs a query that looks for the 10 schools with the latest opening dates on data.gov.uk’s education datastore:

SELECT ?school ?name ?date ?easting ?northing WHERE {?school a sch-ont:School; sch-ont:establishmentName ?name; sch-ont:openDate ?date ; sch-ont:easting ?easting; sch-ont:northing ?northing . } ORDER BY DESC(?date) LIMIT 10

In order to plot the schools on a map, it’s necessary to convert northings and easting to latitude and longitude. A cry for help on twitter was quickly responded to by @kitwallace who gave me a link to a service that did just the job… almost – for some reason, Pipes didn’t like the output, so I had to run the query through a YQL proxy:

(Note that Kit soon came up with a fix, so I could actually just call the service directly via a Data Source block using a call of the form http://www.cems.uwe.ac.uk/xmlwiki/geo/OS2latlong2.xq?easting=527085&northing=185400.)

Here’s the result:

Note that the KML output from the pipe can be plotted directly in a Google map (simply paste the KML URL into Google Map search box and hit return.)

By writing two or three different queries, and pulling the data separately into a web page via the JSON feed, we can easily create a map that displays the schools that have opened and closed between 1/1/08 and 1/10/09:

If we take the CSV output of the pipe, we can also see how it’s possible to transport the content into a Google spreadsheet (once again thanks to @hapdaniel for pointing out that changing the output switch of a pipe’s RSS feed from rss to csv does the format conversion job nicely):

which gives:

(Note that the CSV import seems to require quite a flat data structure (though it is trying really hard with the more hierarchical data – it’s just not quite managing to catch the data values?), so some renaming within the pipe might be required to make sure that the child attributes of each feed item do not have any children of their own. Empty attributes also need pruning.)

PS I did try importing the XML output from a RESTful YQL query into a Google spreadsheet with an =importXML formula but it didn’t seem to work. Firstly, the RESTful URI was too long (easily solved by rewriting it as a shortenedURI). Secondly, the Google spreadsheet didn’t seem to like output XML :-(

So near, yet so far… but still, it poses the question: could we write containerised queries/topic specific APIs over data.gov.uk SPARQL endpoints that expose the results in a spreadsheet capable of importing XML?