Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula)

Disclaimer: before any Linked Data purists say I’m missing the point about what Linked Data is, does, or whatever, I don’t care, okay? I just don’t care. This is for me, and me is someone who can’t get to grips with writing SPARQL queries, can’t stand the sight of unreadable <rdf> <all:over the=”place”>, can’t even work out how to find things are queryable in a Linked Data triple store, let alone write queries that link data from one data store with data from another data store (or maybe SPARQL can’t do that yet? Not that I care about that either, because I can, in Yahoo Pipes, or Google Spreadsheets, and in a way that’s meaningful to me…)

In Mulling Over =datagovukLookup() in Google Spreadsheets, I started wondering about whether or not it would be useful to be able to write formulae to look up “live facts” in various datastores from within a spreadsheet (you know, that Office app that is used pretty much universally in workplace whenever there is tabular data to hand. That or Access of course…)

Anyway, I’ve started tinkering with how it might work, so now I can do things like this:

The formulae in columns G, H and I are defined according to a Google Apps script, that takes a school ID and then returns something linked to it in the data.gov.uk education datastore, such as the name of the school, or its total capacity.

Formulae look like:

  • =datagovuk_education(A2,”name”)
  • =datagovuk_education(A2,”opendate”)
  • =datagovuk_education(A2,”totcapacity”)

and are defined to return a single cell element. (I haven’t worked out how to return several cells worth of content from a Google Apps Script yet!)

At the moment, te script is a little messy, taking the form:

function datagovuk_education(id,typ) {
  var ret=""; var args=""
  switch (typ) {
    case 'totcapacity':
      args= _datagovuk_education_capacity_quri(id);
      break;
    ...
    default:
      //hack something here;
  }
  var x=UrlFetchApp.fetch('http://data-gov.tw.rpi.edu/ws/sparqlproxy.php',{method: 'post', payload: args});
  var ret=x.getContentText();
  var xmltest=Xml.parse(ret);
  ret=xmltest.sparql.results.result.binding.literal.getText();

  return ret;
}
function _datagovuk_education_capacity_quri(id){
  return "query=prefix+sch-ont%3A+%3Chttp%3A%2F%2Feducation.data.gov.uk%2Fdef%2Fschool%2F%3E%0D%0ASELECT+%3FschoolCapacity+WHERE+{%0D%0A%3Fschool+a+sch-ont%3ASchool%3B%0D%0Asch-ont%3AuniqueReferenceNumber+"+id+"%3B%0D%0Asch-ont%3AschoolCapacity+%3FschoolCapacity.%0D%0A}+ORDER+BY+DESC%28%3Fdate%29+LIMIT+1&output=xml&callback=&tqx=&service-uri=http%3A%2F%2Fservices.data.gov.uk%2Feducation%2Fsparql";
}

The datagovuk_education(id,typ) function takes the school ID and the requested property, uses the case statement to create an appropriate query string, and then fetches the data from the education datastore, returning the result in an XML format like this. The data is pulled from the datastore via Sparqlproxy, and the query string URIs generated (at the moment) by adding the school ID number into a query string generated by running the desired SPARQL query on Sparqlproxy and then grabbing the appropriate part of the URI. (It’s early days yet on this hack!;-)

By defining appropriate Apps script functions, I can also create formulae to call other datastores, run queries on Google spreadsheets (e.g. in the Guardian datastore) and so on. I assume similar sorts of functionality would be supported using VB Macros in Excel?

Anyway – this is my starter for ten on how to make live datastore data available to the masses. It’ll be interesting to see whether this approach (or one like it) is used in favour of getting temps to write SPARQL queries and RDF parsers… The obvious problem is that my approach can lead to an explosion in the number of formulae and parameters you need to learn; the upside is that I think these could be quite easily documented in a matrix/linked formulae chart. The approach also scales to pulling in data from CSV stores and other online spreadsheets, using spreadsheets as a database via the =QUERY() formula (e.g. Using Google Spreadsheets Like a Database – The QUERY Formula), and so on. There might also be a market for selling prepackaged or custom formulae as script bundles via a script store within a larger Google Apps App store

PS I’m trying to collect example SPARQL queries that run over the various data.gov.uk end points because: a) I’m really struggling in getting my head round writing my own, not least because I struggle to make sense of the ontologies, if I can find them, and write valid queries off the back of them; even (in fact, especially) really simple training/example queries will do! b) coming up with queries that either have interesting/informative/useful results, or clearly demonstrate an important ‘teaching point’ about the construction of SPARQL queries, is something I haven’t yet got a feel for. If you’ve written any, and you’re willing to share, please post a gist to github and add a link in a comment here.

PPS utility bits, so I don’t lose track of them:
education datastore ontology
– Apps script XML Element class

PPPS HEre’s how to dump a 2D CSV table into a range of cells: Writing 2D Data Arrays to a Google Spreadsheet from Google Apps Script Making an HTTP POST Request for CSV Data

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...