Using SPARQL Query Libraries to Generate Simple Linked Data API Wrappers

A handful of open Linked Data posts have appeared in my feeds over the last couple of days, including (via R-bloggers) SPARQL with R in less than 5 minutes, which shows how to query US data.gov Linked Data, and Leigh Dodds' Brief Review of the Land Registry Linked Data.

I was going to post a couple of examples merging those two posts – showing how to access Land Registry data via Leigh's example queries in R, then plotting some of the results using ggplot2 – but another post of Leigh's today, SPARQL-doc – a simple convention for documenting individual SPARQL queries – has sparked another thought…

For some time I've been intrigued by the idea of a marketplace in queries over public datasets, as well as the public sharing of generally useful queries. A good query is like a good gold pan, or a good interview question – it can get a dataset to reveal something valuable that may otherwise have lain hidden. Coming up with a good query in part requires having a good understanding of the structure of a dataset, and in part having an eye for what sorts of secrets the data may contain: the next step is crafting a well phrased query that can tease that secret out. Creating the query might take some time, some effort, and some degree of expertise in query optimisation to make it actually runnable in reasonable time (which is why I figure there may be a market for such things*), but once written, the query is there. And if it can be appropriately parameterised, it may generalise.

(*There are actually a couple of models I can think of: 1) I keep the query secret, but run it and give you the results; 2) I license the “query source code” to you and let you run it yourself. Hmm, I wonder: do folk license queries they share? How, and to what extent, might derived queries/query modifications be accommodated in such a licensing scheme?)

Pondering Leigh's SPARQL-doc post alongside a few other things – another post via R-bloggers, Building a package in RStudio is actually very easy (which describes how to package a set of R files for distribution via github); asdfree (analyze survey data for free), a site that "announces obsessively-detailed instructions to analyze us government survey data with free tools" (and which includes R bundles to get you started quickly…); the resource listing Documentation for package 'datasets' version 2.15.2, which describes a bundled package of datasets for R; and the Linked Data API, which sought to provide a simple RESTful API over SPARQL endpoints – I wondered the following:

How about developing and sharing commented query libraries around Linked Data endpoints that could be used in arbitrary Linked Data clients?

(By “Linked Data clients”, I mean different user agent contexts. So for example, calling a query from Python, or R, or Google Spreadsheets.) That’s it… Simple.

One approach (the simplest?) might be to put each separate query into a separate file, with a filename from which a function name for calling that query could be derived. Putting all the queries into a directory and zipping them up would provide a minimal packaging format. An additional manifest file might minimally document each filename along with the parameters that can be passed into and returned from the query. Helper libraries in arbitrary languages would open the query package and "compile" a programme library/set of "API" calling functions for that language (so for example, in R it would create a set of R functions, in Python a set of Python functions).
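As a very rough sketch of what I have in mind – everything here (the .rq file extension, the manifest layout, the placeholder convention, the function names) is made up for illustration rather than taken from any existing tool – a helper might unpack a zip of query files plus a manifest and generate one callable per query:

import json
import zipfile

def load_query_package(path, run_sparql):
    # Unpack a zip of SPARQL query files plus a manifest.json and return
    # a dict of callables, one per query (file/manifest format is hypothetical)
    api = {}
    with zipfile.ZipFile(path) as z:
        manifest = json.loads(z.read('manifest.json').decode('utf-8'))
        for entry in manifest['queries']:
            # e.g. {"file": "schoolsByDistrict.rq", "params": ["districtcode"]}
            template = z.read(entry['file']).decode('utf-8')
            name = entry['file'].rsplit('.', 1)[0]
            api[name] = _make_caller(template, entry['params'], run_sparql)
    return api

def _make_caller(template, params, run_sparql):
    def caller(**kwargs):
        query = template
        for p in params:
            # naive parameterisation: swap %%param%% placeholders for supplied values
            query = query.replace('%%' + p + '%%', kwargs[p])
        return run_sparql(query)
    return caller

Calling load_query_package('eduqueries.zip', my_sparql_runner) would then give you something like api['schoolsByDistrict'](districtcode='00MG') – which is pretty much the "compile a set of API calling functions" step, just without any of the cross-language niceties.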

(This reminds me of a Twitter exchange with Nick Jackson/@jacksonj04 a couple of days ago around “self-assembling” API programme libraries that could be compiled in an arbitrary language from a JSON API, cf. Swagger (presentation), which I haven’t had time to look at yet.)

The idea, then, is this:

  1. Define a simple file format for declaring documented SPARQL queries.
  2. Define a simple packaging format for bundling separate SPARQL queries.
  3. The simply packaged set of queries defines a simple "raw query" API over a Linked Data dataset.
  4. Describe a simple protocol for creating programming language specific library wrappers around the API from the query bundle package.

So.. I guess two questions arise: 1) would this be useful? 2) how hard could it be?

[See also: @ldodds again, on Publishing SPARQL queries and documentation using github]

Mapping How Programming Languages Influenced Each Other According to Wikipedia

By way of demonstrating how the recipe described in Visualising Related Entries in Wikipedia Using Gephi can easily be turned to other things, here’s a map of how different computer programming languages influence each other according to DBpedia/Wikipedia:

Here’s the code that I pasted in to the Request area of the Gephi Semantic Web Import plugin as configured for a DBpedia import:

prefix gephi:<http://gephi.org/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT{
  ?a gephi:label ?an .
  ?b gephi:label ?bn .
  ?a <http://dbpedia.org/ontology/influencedBy> ?b
} WHERE {
?a a <http://dbpedia.org/ontology/ProgrammingLanguage>.
?b a <http://dbpedia.org/ontology/ProgrammingLanguage>.
?a <http://dbpedia.org/ontology/influencedBy> ?b.
?a foaf:name ?an.
?b foaf:name ?bn.
}

As to how I found the <http://dbpedia.org/ontology/ProgrammingLanguage> relation, I had a play around with the SNORQL query interface for DBpedia, looking for possible relations using queries along the lines of:

SELECT DISTINCT ?c WHERE {
?a <http://dbpedia.org/ontology/influencedBy> ?b.
?a rdf:type ?c.
?b a ?c.
} limit 50 offset 150

(I think a (as in ?x a ?y) and rdf:type are synonyms?)

This query looks for pairs of things (?a, ?b), each of the same type, ?c, where ?b also influences ?a, then reports what sort of thing (?c) they are (philosophers, for example, or programming languages). We can then use this thing in our custom Wikipedia/DBpedia/Gephi semantic web mapping request to map out the “internal” influence network pertaining to that thing (internal in the sense that the things that are influencing and influenced are both representatives of the same, erm, thing…;-).

The limit term specifies how many results to return; the offset essentially allows you to page through results (so an offset of 500 will return results starting with the 501st result overall). DISTINCT ensures we see unique relations.

If you see a relation that looks like dbpedia:ontology/Philosopher, put it in angle brackets (<>) and replace dbpedia: with http://dbpedia.org/ to give something like <http://dbpedia.org/ontology/Philosopher>.
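By way of a sketch, the same sort of rummaging can be scripted rather than clicked through in SNORQL – the following assumes the SPARQLWrapper Python library and the public DBpedia endpoint, and simply pages through the candidate types:

from SPARQLWrapper import SPARQLWrapper, JSON

# Page through the types of things that appear in influencedBy relations on DBpedia
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

query_template = """
SELECT DISTINCT ?c WHERE {
?a <http://dbpedia.org/ontology/influencedBy> ?b.
?a rdf:type ?c.
?b a ?c.
} LIMIT 50 OFFSET %d
"""

for offset in range(0, 200, 50):
    sparql.setQuery(query_template % offset)
    results = sparql.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding["c"]["value"])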

PS see how to use a similar technique to map out musical genres ascribed to bands on Wikipedia

Visualising Related Entries in Wikipedia Using Gephi

Sometime last week, @mediaczar tipped me off to a neat recipe on the wonderfully named Drunks&Lampposts blog, Graphing the history of philosophy, that uses Gephi to map an influence network in the world of philosophy. The data is based on the extraction of the “influencedBy” relationship over philosophers referred to in Wikipedia using the machine readable, structured data view of Wikipedia that is DBpedia.

The recipe given hints at how to extract data from DBpedia, tidy it up and then import it into Gephi… but there is a quicker way: the Gephi Semantic Web Import plugin. (If it's not already installed, you can install this plugin via the Tools -> Plugins menu, then look in the Available Plugins list.)

To get DBpedia data into Gephi, we need to do three things:

– tell the importer where to find the data by giving it a URL (the “Driver” configuration setting);
– tell the importer what data we want to get back, by specifying what is essentially a database query (the “Request” configuration setting);
– tell Gephi how to create the network we want to visualise from the data returned from DBpedia (in the context of the “Request” configuration).

Fortunately, we don’t have to work out how to do this from scratch – from the Semantic Web Import Configuration panel, configure the importer by setting the configuration to DBPediaMovies.

Hitting “Set Configuration” sets up the Driver (Remote SOAP Endpoint with Endpoint URL http://dbpedia.org/sparql):

and provides a dummy, sample query Request:

We need to do some work creating our own query now, but not too much – we can use this DBpediaMovies example and the query given on the Drunks&Lampposts blog as a starting point:

SELECT *
WHERE {
?p a
<http://dbpedia.org/ontology/Philosopher> .
?p <http://dbpedia.org/ontology/influenced> ?influenced.
}

This query essentially says: ‘give me all the pairs of people, (?p, ?influenced), where each person ?p is a philosopher, and each person ?influenced is influenced by ?p’.

We can replace the WHERE part of the query in the Semantic Web Importer with the WHERE part of this query, but what graph do we want to put together in the CONSTRUCT part of the Request?

The graph we are going to visualise will have nodes that are philosophers or the people who influenced them. The edges connecting the nodes will represent that one influenced the other, using a directed line (with an arrow) to show that A influenced B, for example.

The following construction should achieve this:

CONSTRUCT{
?p <http://dbpedia.org/ontology/influenced> ?influenced.
} WHERE {
  ?p a
<http://dbpedia.org/ontology/Philosopher> .
?p <http://dbpedia.org/ontology/influenced> ?influenced.
} LIMIT 10000

(The LIMIT argument limits the number of rows of data we’re going to get back. It’s often good practice to set this quite low when you’re trying out a new query!)

Hit Run and a graph should be imported:

If you click on the Graph panel (in the main Overview view of the Gephi tool), you should see the graph:

If we run the PageRank or Eigenvector centrality statistic, size the nodes according to that value, and lay out the graph using a force directed or Fruchterman-Reingold layout algorithm, we get something like this:

The nodes are labelled in a rather clumsy way – http://dbpedia.org/page/Martin_Heidegger – for example, but we can tidy this up. Going to one of the DBpedia pages, such as http://dbpedia.org/page/Martin_Heidegger, we find what else DBpedia knows about this person:

In particular, we see we can get hold of the name of the philosopher using the foaf:name property/relation. If you look back to the original DBpediaMovies example, we can start to pick it apart. It looks as if there are a set of gephi properties we can use to create our network, including a "label" property. Maybe this will help us label our nodes more clearly, using the actual name of a philosopher for example? You may also notice the declaration of a gephi "prefix", which appears in various constructions (such as gephi:label). Hmmm.. Maybe gephi:label is to prefix gephi:<http://gephi.org/> as foaf:name is to something? If we do a web search for the phrase foaf:name prefix, we turn up several results that contain the phrase prefix foaf:<http://xmlns.com/foaf/0.1/>, so maybe we need one of those to get the foaf:name out of DBpedia…?

But how do we get it out? We've already seen that we can get hold of a person who was influenced by a philosopher by asking for results where this relation holds: ?p <http://dbpedia.org/ontology/influenced> ?influenced. So it follows that we can get the name of a philosopher (?pname) by asking for the foaf:name in the WHERE part of the query:

?p foaf:name ?pname.

and then using this name as a label in the CONSTRUCTion:

?p gephi:label ?pname.

We can also do a similar exercise for the person who is influenced.

Looking through the DBpedia record, I notice that as well as an influenced relation, there is an influencedBy relation (I think this is the one that was actually used in the Drunks&Lampposts blog?). So let's use that in this final version of the query:

prefix gephi:<http://gephi.org/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT{
  ?philosopher gephi:label ?philosopherName .
  ?influence gephi:label ?influenceName .
  ?philosopher <http://dbpedia.org/ontology/influencedBy> ?influence
} WHERE {
  ?philosopher a
  <http://dbpedia.org/ontology/Philosopher> .
  ?philosopher <http://dbpedia.org/ontology/influencedBy> ?influence.
  ?philosopher foaf:name ?philosopherName.
  ?influence foaf:name ?influenceName.
} LIMIT 10000

If you've already run a query to load in a graph, running this query may lay the new data on top of the previous graph, so it's best to clear the workspace first. At the bottom right of the screen is a list of workspaces – click on the RDF Request Graph label to pop up a list of workspaces, and close the RDF Request Graph one by clicking on the x.

Now run the query into a newly launched, pristine workspace, and play with the graph to your heart’s content…:-) [I’ll maybe post more on this later – in the meantime, if you’re new to Gephi, here are some Gephi tutorials]

Here’s what I get sizing nodes and labels by PageRank, and laying out the graph by using a combination of Force Atlas2, Expansion and Label Adjust (to stop labels overlapping) layout tools:

Using the Ego Network filter, we can then focus on the immediate influence network (influencers and influenced) of an individual philosopher:

What this recipe hopefully shows is how you can directly load data from DBpedia into Gephi. The two tricks you need to learn to do this for other data sets are:

1) figuring out how to get data out of DBpedia (the WHERE part of the Request);
2) figuring out how to get that data into shape for Gephi (the CONSTRUCT part of the request).
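If you want to script that workflow rather than use the Gephi importer, here's a rough sketch (assuming the SPARQLWrapper and networkx Python libraries): rather than the CONSTRUCT/gephi:label convention the plugin uses, a plain SELECT plus networkx gets you to a GraphML file that Gephi can open:

from SPARQLWrapper import SPARQLWrapper, JSON
import networkx as nx

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?philosopherName ?influenceName WHERE {
?philosopher a <http://dbpedia.org/ontology/Philosopher> .
?philosopher <http://dbpedia.org/ontology/influencedBy> ?influence.
?philosopher foaf:name ?philosopherName.
?influence foaf:name ?influenceName.
} LIMIT 10000
""")

g = nx.DiGraph()
for b in sparql.query().convert()["results"]["bindings"]:
    # edge direction: influence -> philosopher ("A influenced B")
    g.add_edge(b["influenceName"]["value"], b["philosopherName"]["value"])

nx.write_graphml(g, "philosopher_influence.graphml")
print("%d nodes, %d edges" % (g.number_of_nodes(), g.number_of_edges()))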

If you come up with any other interesting graphs, please post Request fragments in the comments below:-)

[See also: Graphing Every* Idea In History]

PS via @sciencebase (Mapping research on Wikipedia with Wikimaps), there's this related tool: WikiMaps, an online (and desktop?) tool for visualising various Wikipedia powered graphs, such as, erm, Justin Bieber's network…

Any other related tools out there for constructing and visualising Wikipedia powered network maps? Please add a link via the comments if you know of any…

PPS for a generalisation of this approach, and a recipe for finding other DBpedia networks to map, see Mapping How Programming Languages Influenced Each Other According to Wikipedia.

PPPS Here’s another handy recipe that shows how to pull SPARQLed DBPedia queries into R, analyse them there, and then generate a graphML file for rendering in Gephi: SPARQL Package for R / Gephi – Movie star graph visualization Tutorial

PPPPS related – a large scale version of this? Wikipedia Mining Algorithm Reveals The Most Influential People In 35 Centuries Of Human History

Handling RDF on Your Own System – Quick Start

One of the things that I think tends towards being a bit of an elephant in the Linked Data room is the practical difficulty of running a query that links together results from two different datastores, even if they share common identifiers. The solution – at the moment at least – seems to require grabbing a dump of both datastores, uploading them to a common datastore and then querying that…

…which means you need to run your own triple store…

This quick post links out to the work of two others, as much as a placeholder for myself as for anything, describing how to get started doing exactly that…

First up, John Goodwin, aka @gothwin, (a go to person if you ever have dealings with the Ordnance Survey Linked Data) on How can I use the Ordnance Survey Linked Data: a python rdflib example. As John describes it:

[T]his post shows how you just need rdflib and Python to build a simple linked data mashup – no separate triplestore is required! RDF is loaded into a Graph. Triples in this Graph reference postcode URIs. These URIs are de-referenced and the RDF behind them is loaded into the Graph. We have now enhanced the data in the Graph with local authority area information. So as well as knowing the postcode of the organisations taking part in certain projects we now also know which local authority area they are in. Job done! We can now analyse funding data at the level of postcode, local authority area and (as an exercise for the reader) European region.
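I won't reproduce John's code here, but the gist of the pattern, as a minimal sketch using rdflib (the local file name is made up, and it assumes the OS server content-negotiates an RDF representation when you de-reference a postcode URI), looks something like this:

from rdflib import Graph, URIRef

g = Graph()

# Load some local RDF that mentions postcode URIs (illustrative file name)
g.parse("funding_data.rdf", format="xml")

# De-reference a postcode URI and pull the RDF behind it into the same graph
postcode = URIRef("http://data.ordnancesurvey.co.uk/id/postcodeunit/MK76AA")
g.parse(postcode)

# The graph now holds the local data *and* the postcode's ward/district links,
# so a single query (or triple iteration) can join across the two
for s, p, o in g.triples((postcode, None, None)):
    print(p, o)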

Secondly, if you want to run a fully blown triple store on your own localhost, check out this post from Jeni Tennison, aka @jenit, (a go to person if you’re using the data.gov.uk Linked Datastores, or have an interest in the Linked Data JSON API): Getting Started with RDF and SPARQL Using 4store and RDF.rb, which documents how to get started on the following challenges (via Richard Pope’s Linked Data/RDF/SPARQL Documentation Challenge):

1. Install an RDF store from a package management system on a computer running either Apple's OSX or Ubuntu Desktop.
2. Install a code library (again from a package management system) for talking to the RDF store in either PHP, Ruby or Python.
3. Programmatically load some real-world data into the RDF datastore using either PHP, Ruby or Python.
4. Programmatically retrieve data from the datastore with SPARQL using either PHP, Ruby or Python.
5. Convert retrieved data into an object or datatype that can be used by the chosen programming language (e.g. a Python dictionary).
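As a minimal sketch of the last three of those steps – using rdflib's in-memory store rather than a separate 4store install, and with a purely illustrative file name and query – the Python side might look like this:

from rdflib import Graph

# Load some real-world RDF into a (here, in-memory) store
g = Graph()
g.parse("schools.rdf", format="xml")

# Retrieve data from the store with SPARQL
results = g.query("""
    PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
    SELECT ?school ?name WHERE {
        ?school sch-ont:establishmentName ?name .
    } LIMIT 10
""")

# Convert the result rows into plain Python dictionaries
rows = [dict((str(var), str(row[var])) for var in results.vars) for row in results]
print(rows)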

PS it may also be worth checking out these posts from Kingsley Idehen:
SPARQL Guide for the PHP Developer
SPARQL Guide for Python Developer
SPARQL Guide for the Javascript Developer
SPARQL for the Ruby Developer

Dewey Fencing, and Getting Started With Bibliographica Linked Data

If you follow @redjets on Twitter, you can follow the departure and arrival of the Red Funnel catamarans as they travel between Southampton and Cowes.

Redjets twitter feed

The service, put together by @andysc, tracks shipping movements on the Solent (AIS is the keyword, if you want to find out how;-) and watches out for the RedJet identifiers entering or leaving geographical areas around the ports. These virtual, digitally created boundaries are often referred to as geofences.

So for example, my ‘find folk tweeting near a location’ hack uses the Twitter search API to construct a geofenced locale and then search for tweets within that boundary.

But what does that have to do with Dewey? Simply this: if we have easy access to standardised classmarks, such as Dewey Decimal Classification ranges, or sets of Dewey classifications, we can create “Dewey fences” to search for books on a particular topic. (Nothing novel in this of course, but it provides a weak context for the rest of this post;-)

Last night, I came across a post announcing that the British Library [had] Release[d] 3 Million Records to the JISC funded OpenBib project. These records had been added to the bibliographica.org service, which just happens to have a Linked Data SPARQL endpoint. Because I’m in a phase of learning how to make sense of new (to me) datastores, I thought I’d see what I could get it to do.

The SPARQL page itself gives an example query, which identifies several of the key vocabularies (ontologies? I’m still trying to get the languaging right…;-) used in the datastore:

Bibliographica endpoint with sample query - http://bibliographica.org/sparql

Here’s the example:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?book ?title ?name ?description
WHERE {
?book a bibo:Book .
?book dc:title ?title . ?title bif:contains "Edinburgh" .
OPTIONAL { ?book dc:description ?description } .
OPTIONAL {
?book dc:contributor ?author . ?author foaf:name ?name
}
} GROUP BY ?book LIMIT 50

If we look at an actual book record, we get something like this book record:

Book record on Bibliographica

The book record page gives us the links we need in order to piece together our own queries, using the example query as an additional crib. So for example, by inspecting the link that specifies an ISBN relation on the book record page (http://purl.org/ontology/bibo/isbn; we have already declared PREFIX bibo: <http://purl.org/ontology/bibo/>, so we can write the isbn relation as bibo:isbn), I can tweak the original example query to also report the ISBN:

SELECT DISTINCT ?book ?title ?name ?description ?isbn
WHERE {
?book a bibo:Book .
# etc, as before
?book bibo:isbn ?isbn.
# etc
}

To search for a book by ISBN, and then return its title, we might use a query like this one:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT DISTINCT ?book ?title
WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn <urn:isbn:019857519x>.
}

Which is where Dewey classifications make their first appearance. Looking at the book record, we see there may be several subject terms associated with a book, including what are presumably Dewey classmarks. This means that given a book by ISBN, we should be able to look up its classmark, and then other books with the same classmark. We can also look up related books using the keyword subject terms, which may or may not conform to controlled vocabulary terms. In terms of fencing, we might also be able to take sets of books – such as books on a reading list – to create topical "Dewey fenced" areas that define a set of classmarks that are all associated in some way (e.g. the topic that forms the area of study for a given reading list). I'm not sure if these sets are likely to be useful in any way, but they'd allow us to ask the question about the extent to which a reading list models a Dewey-style view of the world, or whether it is "multi-disciplinary" (at least, according to Dewey…;-) The reason why this is interesting (to me, at least) is because to a certain extent, physical libraries are serendipity engines as well as discovery engines, based on the way books are physically laid out and associated (or not) with one another; and one of the things I'm interested in is useful, serendipitous discovery…

Anyway, back to code geekery… Looking at the book page, we can find out how to grab a list of subject terms – http://purl.org/dc/terms/subject (which we can write as dc:subject given the PREFIXes already declared) is the relation we want. Unfortunately, it’s not that simple, because the subject term doesn’t always map directly to the values displayed in the subject area of the book page. The following query tries to unpick just what the displayed subject terms refer to:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?book ?title ?subject ?label ?property ?value
WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn <urn:isbn:019857519x>.
?book dc:subject ?subject.
OPTIONAL { ?subject rdfs:label ?label }
OPTIONAL { ?subject ?property ?value }
} GROUP BY ?book order by ?subject LIMIT 50

And here’s the result:

Subject terms on bibliographica

We can also see how messy things are by looking at one of the other representations of the book page which unpacks the dc:subject values as follows:

dc:subject [ a skos:Concept;
skos:inScheme ;
skos:prefLabel "Genetics." ],
[ a skos:Concept;
skos:inScheme ;
skos:prefLabel "Evolution (Biology)" ],
[ a skos:Concept;
skos:inScheme ;
skos:prefLabel "Behavior genetics." ],
[ rdfs:label "Animals" ],
[ rdfs:label "Behaviour" ],
[ rdfs:label "expounded by" ],
[ rdfs:label "theories of survival of species" ],
[ rdfs:label "Animals" ],
[ rdfs:label "Genes" ],
[ a skos:Concept;
skos:inScheme ;
skos:notation "591.5"^^ ],
[ a skos:Concept;
skos:inScheme ;
skos:notation "591.1/5/1"^^ ],
[ a skos:Concept;
skos:inScheme ;
skos:notation "591.5"^^ ],
[ a skos:Concept;
skos:inScheme ;
skos:notation "591.1/5"^^ ];

So by my reckoning, this query should get us the Dewey Decimal classification(s) for a book given its ISBN:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?book ?title ?classmark
WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn <urn:isbn:019857519x>.
?book dc:subject ?subject.
?subject a skos:Concept;
skos:inScheme <http://dewey.info/scheme/e18>;
skos:notation ?classmark
}

which gives as a result:

Querying for a classmark - bibliographica

We should now be able to find other books with the same classmark by extending the query, I’m guessing like this????

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?othertitle ?otherisbn
WHERE {
?book a bibo:Book .
?book bibo:isbn <urn:isbn:019857519x>.
?book dc:subject ?subject.
?subject a skos:Concept; skos:inScheme <http://dewey.info/scheme/e18>; skos:notation ?classmark.
?otherbook a bibo:Book .
?otherbook dc:subject ?subject.
?otherbook dc:title ?othertitle.
?otherbook bibo:isbn ?otherisbn.
} LIMIT 50

(Hmmm… how do I say ?book is not equal to ?otherbook?)

…only, I can't check it right now because bibliographica has stopped playing again….:-( Which is a lesson to be learned, I guess… If you're running Linked Data queries across multiple different services and one of those services goes down, things can break…

PS that query didn’t work in the end… in the meantime, here’s a query for looking up books by classmark:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?title ?isbn
WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn ?isbn.
?book dc:subject ?subject.
?subject a skos:Concept; skos:inScheme <http://dewey.info/scheme/e18>; skos:notation '591.5'^^<ddc:Notation>.
} LIMIT 50

(I found the ^^<ddc:Notation> crib in the n3 version of the book information page… I’m not sure what the ddc prefix is supposed to be, although the query seems to run without me having to declare it explicitly?

[ a skos:Concept;
skos:inScheme <http://dewey.info/scheme/e19>;
skos:notation "591.5"^^ ],

UPDATE: doh! the < .. > means it doesn't need unpacking, right?)
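For what it's worth, here's a rough sketch of how the classmark lookup might be wrapped up from Python, using the SPARQLWrapper library against the bibliographica.org endpoint shown above (assuming, of course, that the endpoint is up and answering on the day you try it):

from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?title ?classmark WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn <urn:isbn:%s>.
?book dc:subject ?subject.
?subject a skos:Concept;
skos:inScheme <http://dewey.info/scheme/e18>;
skos:notation ?classmark
}
"""

def classmarks_for_isbn(isbn):
    # Return the Dewey classmark(s) bibliographica records for a given ISBN
    sparql = SPARQLWrapper("http://bibliographica.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery(QUERY % isbn)
    results = sparql.query().convert()
    return [b["classmark"]["value"] for b in results["results"]["bindings"]]

print(classmarks_for_isbn("019857519x"))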

data.open.ac.uk Linked Data Now Exposing Module Information

As HE becomes more and more corporatised, I suspect we’re going to see online supermarkets appearing that help you identify – and register on – degree courses in exchange for an affiliate/referral fee from the university concerned. For those sites to appear, they’ll need access to course catalogues, of course. UCAS currently holds the most comprehensive one that I know of, but it’s a pain to scrape and all but useless as a datasource. But if the universities publish course catalogue information themselves in a clean way (and ideally, a standardised way), it shouldn’t be too hard to construct aggregation sites ourselves…

So it was encouraging to see earlier this week an announcement that the OU’s data.open.ac.uk site has started publishing module data from the course catalogue – that is, data about the modules (as we now call them – they used to be called courses) that you can study with the OU.

The data includes various bits of administrative information about each module, the territories it can be studied in, and (most importantly?!) pricing information;-)

data.open.ac.uk - module data

You may remember that the data.open.ac.uk site itself launched a few weeks ago with the release of Linked Data sets including data about deposits in the open repository, as well as OU podcasts on iTunes (data.open.ac.uk Arrives, With Linked Data Goodness). Where podcasts are associated with a course, the magic of Linked Data means that we can easily get to the podcasts via the course/module identifier:

data.open.ac.uk

It’s also possible to find modules that bear an isSimilarTo relation to the current module, where isSimilarTo means (I think?) “was also studied by students taking this module”.

As an example of how to get at the data, here’s a Python script using the Python YQL library that lets me run a SPARQL query over the data.open.ac.uk course module data (the code includes a couple of example queries):

import yql

def run_sparql_query(query, endpoint):
    y = yql.Public()
    query='select * from sparql.search where query="'+query+'" and service="'+endpoint+'"'
    env = "http://datatables.org/alltables.env"
    return y.execute(query, env=env)

endpoint='http://data.open.ac.uk/query'

# This query finds the identifiers of postgraduate technology courses that are similar to each other
q1='''
select distinct ?x ?z from <http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/technology>.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z
} limit 10
'''

# This query finds the names and course codes of 
# postgraduate technology courses that are similar to each other
q2='''
select distinct ?code1 ?name1 ?code2 ?name2 from <http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/technology>.
?x <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name1.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z.
?z <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name2.
?x <http://purl.org/vocab/aiiso/schema#code> ?code1.
?z <http://purl.org/vocab/aiiso/schema#code> ?code2.
}
'''

# This query finds the names and course codes of 
# postgraduate courses that are similar to each other
q3='''
select distinct ?code1 ?name1 ?code2 ?name2 from <http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name1.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z.
?z <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name2.
?x <http://purl.org/vocab/aiiso/schema#code> ?code1.
?z <http://purl.org/vocab/aiiso/schema#code> ?code2.
}
'''

result=run_sparql_query(q3, endpoint)

for row in result.rows:
	for r in row['result']:
		print r

I’m not sure what purposes we can put any of this data to yet, but for starters I wondered just how connected the various postgraduate courses are based on the isSimilarTo relation. Using q3 from the code above, I generated a Gephi GDF/network file using the following snippet:

# Generate a Gephi GDF file showing connections between 
# modules that are similar to each other
fname='out2.gdf'
f=open(fname,'w')

f.write('nodedef> name VARCHAR, label VARCHAR, title VARCHAR\n')
ccodes=[]
for row in result.rows:
	for r in row['result']:
		if r['code1']['value'] not in ccodes:
			ccodes.append(r['code1']['value'])
			f.write(r['code1']['value']+','+r['code1']['value']+',"'+r['name1']['value']+'"\n')
		if r['code2']['value'] not in ccodes:
			ccodes.append(r['code2']['value'])
			f.write(r['code2']['value']+','+r['code2']['value']+',"'+r['name2']['value']+'"\n')
		
f.write('edgedef> c1 VARCHAR, c2 VARCHAR\n')
for row in result.rows:
	for r in row['result']:
		#print r
		f.write(r['code1']['value']+','+r['code2']['value']+'\n')

f.close()

to produce the following graph. (Size is out degree, colour is in degree. Edges go from ?x to ?z. Layout: Fruchterman Reingold, followed by Expansion.)

OU postgrad courses in gephi

The layout style is a force directed algorithm, which in this case has had the effect of picking out various clusters of highly connected courses (so for example, the E courses are clustered together, as are the M courses, B courses, T courses and so on.)

If we run the ego filter over this network on a particular module code, we can see which modules were studied alongside it:

ego filter on course codes

Note that in the above diagram, the nodes are sized/coloured according to in-degree/out-degree in the original, complete graph. If we re-calculate those measures on just this partition, we get the following:

Recoloured course network

If we return to the whole network, and run the Modularity class statistic, we can identify several different course clusters:

Modules - modularity class

Here’s one of them expanded:

A module cluster

Here are some more:

Course clusters

I’m not sure what use any of this is, but if nothing else, it shows there’s structure in that data (which is exactly what we’d expect, right?;-)

PS as to how I wrote my first query on this data, I copied the ‘postgraduate modules in computing’ example query from data.open.ac.uk:

http://data.open.ac.uk/query?query=select%20distinct%20%3Fx%20from%20%3Chttp://data.open.ac.uk/context/course%3Ewhere%20{%3Fx%20a%20%3Chttp://purl.org/vocab/aiiso/schema%23Module%3E.%0A%3Fx%20%3Chttp://data.open.ac.uk/saou/ontology%23courseLevel%3E%20%3Chttp://data.open.ac.uk/saou/ontology%23postgraduate%3E.%0A%3Fx%20%3Chttp://purl.org/dc/terms/subject%3E%20%3Chttp://data.open.ac.uk/topic/computing%3E%0A}%0A&limit=200

and pasted it into a tool that "unescapes" encoded URLs, which decodes the SPARQL query:

Unescaping text

I was then able to pull out the example query:
select distinct ?x from <http://data.open.ac.uk/context/course>
where {?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/computing>
}
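If you'd rather not paste the URL into a web tool, the same unescaping can be done in a couple of lines of Python – this uses Python 3's urllib.parse module (the URL is truncated here for readability):

from urllib.parse import urlparse, parse_qs

url = "http://data.open.ac.uk/query?query=select%20distinct%20%3Fx%20from%20..."  # paste the full URL from above

# parse_qs unescapes the %XX encodings as it splits out the query string parameters
print(parse_qs(urlparse(url).query)["query"][0])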

Just by the by, there’s a host of other handy text tools at Text Mechanic.

Accessing Linked Data in Scraperwiki via YQL

A comment from @frabcus earlier today alerted me to the fact that the Scraperwiki team had taken me up on my suggestion that they make the Python YQL library available in the Scraperwiki environment, so I thought I ought to come up with an example of using it…

YQL provides a general purpose standard query interface "to the web", interfacing with all manner of native APIs and providing a common way of querying them, and receiving responses from them. YQL is extensible too – if there isn't a wrapper for your favourite API, you can write one yourself and submit it to the community. (For a good overview of the rationale for, and philosophy behind, YQL, see Christian Heilmann's the Why of YQL.)

Browsing through the various community tables, I found one for handling SPARQL queries. The YQL wrapper expects a SPARQL query and an endpoint URL, and will return the results in the YQL standard form. (Here’s an example SPARQL query in the YQL developer console using the data.gov.uk education datastore.)

The YQL query format is:
select * from sparql.search where query=”YOUR_SPARQL_QUERY” and service=”SPARQL_ENDPOINT_URL”
and can be called in Python YQL in the following way (Python YQL usage):

def run_sparql_query(query, endpoint):
    y = yql.Public()
    query='select * from sparql.search where query="'+query+'" and service="'+endpoint+'"'
    env = "http://datatables.org/alltables.env"
    return y.execute(query, env=env)

For a couple of weeks now, I've been looking for an opportunity to try to do something – anything – with the newly released Ordnance Survey Linked Data (read @gothwin's introduction to it for more details: /location /location /location – exploring Ordnance Survey Linked Data – Part 2).

One of the things the OS Linked Data looks exceedingly good for is acting as glue, mapping between different representations for geographical and organisational areas; the data can also return regions that neighbour on a region, which could make for some interesting “next door to each other” ward, district or county level comparisons.

One of the most obvious ways in to the data is via a postcode. The following Linked Data query to the ordnance survey SPARQL endpoint (http://api.talis.com/stores/ordnance-survey/services/sparql) returns the OS district ID, ward and district name that a postcode exists in:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX postcode: <http://data.ordnancesurvey.co.uk/ontology/postcode/>

select ?district ?wardname ?districtname where { <http://data.ordnancesurvey.co.uk/id/postcodeunit/MK76AA>
postcode:district ?district; postcode:ward ?ward.
?district skos:prefLabel ?districtname.
?ward skos:prefLabel ?wardname
}

Here it is running in the YQL developer console:

OS postcode query in YQL developer console

(Just by the by, we can create a query alias for that query if we want, by changing the postcode (MK76AA in the example) to @postcode. This gives us a URL argument/variable called postcode whose value gets substituted in to the query whenever we call it:

http://query.yahooapis.com/v1/public/yql/psychemedia/ospostcodelookupdemo1?postcode=INSERTPOSTCODEHERE&env=http://datatables.org/alltables.env

[Note we manually need to add the environment variable &env=http://datatables.org/alltables.env to the URL created by the query alias generator/wizard.]

YQL query alias for sparql query

So… that’s SPARQL in YQL – but how can we use it in Scraperwiki… The newly added YQL wrapper makes it easy.. here’s an example, based on the above:

import scraperwiki
import yql

os_endpoint='http://api.talis.com/stores/ordnance-survey/services/sparql'

os_query='''
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX postcode: <http://data.ordnancesurvey.co.uk/ontology/postcode/>

select ?district ?wardname ?districtname where {
<http://data.ordnancesurvey.co.uk/id/postcodeunit/MAGIC_POSTCODE> postcode:district ?district; postcode:ward ?ward.
?district skos:prefLabel ?districtname.
?ward skos:prefLabel ?wardname
}
'''
postcode="MK7 6AA"

os_query=os_query.replace('MAGIC_POSTCODE',postcode.replace(' ',''))

def run_sparql_query(query, endpoint):
    y = yql.Public()
    query='select * from sparql.search where query="'+query+'" and service="'+endpoint+'"'
    env = "http://datatables.org/alltables.env"
    return y.execute(query, env=env)

result=run_sparql_query(os_query, os_endpoint)

for row in result.rows:
    print postcode,'is in the',row['result']['wardname']['value'],'ward of',row['result']['districtname']['value']
    record={ "id":postcode, "ward":row['result']['wardname']['value'],"district":row['result']['districtname']['value']}
    scraperwiki.datastore.save(["id"], record) 

I use the MAGIC_POSTCODE substitution to give me the freedom to create a procedure that will take in a postcode argument and add it in to the query. Note that I am probably breaking all sorts of Linked Data rules by constructing the URL that uniquely identifies (reifies?) the postcode in the ordnance survey URL namespace (that is, I construct something like <http://data.ordnancesurvey.co.uk/id/postcodeunit/MK76AA>, which contravenes the "URIs are opaque" rule that some folk advocate, but I'm a pragmatist;-)

Anyway, here’s a Scraperwiki example that scrapes a postcode from a web page, and looks up some of its details via the OS: simple Ordnance Survey Linked Data postcode lookup

The next thing I wanted to do was use two different Linked Data services. Here’s the setting. Suppose I know a postcode, and I want to lookup all the secondary schools in the council area that postcode exists in. How do I do that?

The data.gov.uk education datastore lets you look up schools in a council area given the council ID. Simon Hume gives some example queries to the education datastore here: Using SPARQL & the data.gov.uk school data. The following is a typical example:

prefix sch-ont: <http://education.data.gov.uk/def/school/>

SELECT ?name ?reference ?date WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:uniqueReferenceNumber ?reference ;
sch-ont:districtAdministrative <http://statistics.data.gov.uk/id/local-authority-district/00MG> ;
sch-ont:openDate ?date ;
sch-ont:phaseOfEducation <http://education.data.gov.uk/def/school/PhaseOfEducation_Secondary> .
}

Here, the secondary schools are being identified according to the district area they are in (00MG in this case).

But all I have is the postcode… Can Linked Data help me get from MK7 6AA to 00MG (or more specifically, from <http://data.ordnancesurvey.co.uk/id/postcodeunit/MAGIC_POSTCODE> to <http://statistics.data.gov.uk/id/local-authority-district/00MG>?)

Here’s what the OS knows about a postcode:

What the OS knows about a postcode

If we click on the District link, we can see what the OS knows about a district:

Local authority area code lookup in OS Linked Data

The Census Code corresponds to the local council id code used in the Education datastore (thanks to John Goodwin for pointing that out…). The identifier doesn’t provide a Linked Data URI, but we can construct one out of the code value:
<http://statistics.data.gov.uk/id/local-authority-district/MAGIC_DISTRICTCODE>

(Note that the statistics.data.gov.uk lookup on the district code does include a sameas URL link back to the OS identifier.)

Here's how we can get hold of the district code – it's the admingeo:hasCensusCode you're looking for:

os_query='''
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/>
PREFIX postcode: <http://data.ordnancesurvey.co.uk/ontology/postcode/>

select ?district ?nsdistrict ?wardname ?districtname where {
<http://data.ordnancesurvey.co.uk/id/postcodeunit/MAGIC_POSTCODE> postcode:district ?district; postcode:ward ?ward.
?district skos:prefLabel ?districtname.
?ward skos:prefLabel ?wardname .
?district admingeo:hasCensusCode ?nsdistrict.
}
'''

postcode='MK7 6AA'
os_endpoint='http://api.talis.com/stores/ordnance-survey/services/sparql'
os_query=os_query.replace('MAGIC_POSTCODE',postcode.replace(' ',''))

result=run_sparql_query(os_query, os_endpoint)

for row in result.rows:
    print row['result']['nsdistrict']['value']
    districtcode=row['result']['nsdistrict']['value']
    print postcode,'is in the',row['result']['wardname']['value'],'ward of',row['result']['districtname']['value']
    record={ "id":postcode, "ward":row['result']['wardname']['value'],"district":row['result']['districtname']['value']} 

So what does that mean… well, we managed to look up the district code from a postcode using the Ordnance Survey API, which means we can insert that code into a lookup on the education datastore to find schools in that council area:

import scraperwiki
import yql

def run_sparql_query(query, endpoint):
    '''
    # The following string replacement construction may be handy
    query = 'select * from flickr.photos.search where text=@text limit 3';
    y.execute(query, {"text": "panda"})
    '''
    y = yql.Public()
    query='select * from sparql.search where query="'+query+'" and service="'+endpoint+'"'
    env = "http://datatables.org/alltables.env"
    return y.execute(query, env=env)

edu_endpoint='http://services.data.gov.uk/education/sparql'    

edu_query='''
prefix sch-ont:  <http://education.data.gov.uk/def/school/>

SELECT ?name ?reference ?date WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:uniqueReferenceNumber ?reference ;
sch-ont:districtAdministrative <http://statistics.data.gov.uk/id/local-authority-district/MAGIC_DISTRICTCODE> ;
sch-ont:openDate ?date ;
sch-ont:phaseOfEducation <http://education.data.gov.uk/def/school/PhaseOfEducation_Secondary>.
}
'''
districtcode='00MG'
edu_query=edu_query.replace('MAGIC_DISTRICTCODE',districtcode)
result=run_sparql_query(edu_query, edu_endpoint)
for row in result.rows:
    for school in row['result']:
        print school['name']['value'],school['reference']['value'],school['date']['value']
        record={ "id":school['reference']['value'],"name":school['name']['value'],"openingDate":school['date']['value']}
        scraperwiki.datastore.save(["id"], record) 

Here’s a Scraperwiki example showing the two separate Linked Data calls chained together (click on the “Edit” tab to see the code).

Linked Data in Scraperwiki

Okay – so that was easy enough (?!;-). We've seen how:
– Scraperwiki supports calls to YQL;
– how to make SPARQL/Linked Data queries from Scraperwiki using YQL;
– how to get data from one Linked Data query and use it in another.

A big problem though is how do you know whether there is a linked data path from a data element in one Linked Data store (e.g. from a postcode lookup in the Ordnance Survey data) through to another datastore (e.g. district area codes in the education datastore), when you are a mere mortal and not a Linked Data guru?! Answers on the back of a postcard, please, or via the comments below;-)

PS whilst doing a little digging around, I came across some geo-referencing guidance on the National Statistics website that suggests that postcode areas might change over time (they also publish current and previous postcode info). So what do we assume about the status (currency, validity) of the Ordnance Survey postcode data?

PPS Just by the by, this may be useful to folk looking for Linked Data context around local councils: @pezholio’s First steps to councils publishing their own linked data

Too Much Information, Not Enough Data?

Trying to scope out ideas for a talk I'm due to give in Manchester next week on open civic data, I came across quite a few examples of local councils making map based "data" available – on a map. For example, Milton Keynes Council has a nice collection of links to interactive maps:

The maps look very pretty, when they load… I don’t know if it was my connection that was particularly slow when I tried to grab the screenshot, but I never did manage to load any hi-res tiles…

Milton Keynes interactive map

Manchester City Council also offers a range of map based navigation for identifying local schools:

Manchester schools maps

And if we tunnel down:

Manchester schools map

The point I wanted to make here is that what the councils are doing is more to do with displaying map based information than geocoded data. That is, it’s still hard for me to create my own map based views based on the stuff the councils are publishing.

What's valuable to me, in the role of a developer, is flexible access to the raw data, so that I can re-present it and make sense of it in a way that I decide.

So for example, in the case of the Manchester schools data, where I might decide to throw caution to the wind and plot all the schools on one map rather than present the information in the form of schools contained in arbitrarily drawn regions, it would be nice to be able to get a raw data feed of schools under the control of the Manchester local authority…

…which I seem to remember is something I can get from the data.gov.uk education datastore.

Looking through my own hacks, I found a description of a Yahoo pipe – which appears to have rotted:-( – that will return schools given a local authority code… but what is the code for Manchester?

A quick google turned up a post by Simon Hume entitled Using SPARQL & the data.gov.uk school data which points to a handy service that uses a query of the National Statistics SPARQL endpoint to list council ID codes (I’m not sure if this can be extended to return a Council ID code based on a user-supplied postcode? If you know how to do this, please post a clue in a comment to this post;-)

Simon’s post also contains another rather handy example of a SPARQL query over the education datastore that will “call back all the schools in your local authority”, including the lat/long coordinates, so they can be easily placed onto a map.

This is the data, the useful stuff. The maps the councils have published that are shown above display some useful information, for sure, but it’s not data…

So here's a thought: suppose that where councils feel they're adding value by producing maps like the ones shown above (and I think that sort of display can provide a valuable service on a council website), wouldn't it be great if:

1) the data they used to create the map came from a public datastore, such as one of the datastores on data.gov.uk, or maybe a queryable datastore local to the council; and

2) as a footnote to the page, or more likely on a page linked from it, a description was given of the query used to generate the data rendered on the information page.

Just a thought…

Getting Started with data.gov.uk… or not…

Go to any of the data.gov.uk SPARQL endpoints (that’s geeky techie scary speak for places where you can run geeky techie datastore query language queries and get back what looks to the eye like a whole jumble of confusing Radical Dance Faction lyrics [in joke;-0]) and you see a search box, of sorts… Like this one on the front of the finance datastore:

So, pop pickers:

One thing that I think would make the SPARQL page easier to use would be to have a list of links that would launch one of the last 10 or so queries that had run in a reasonable time and returned more than no results, displayed down the left hand side – so n00bs like me could at least have a chance at seeing what a successful query looked like. Appreciating that some folk might want to keep their query secret (more on this another day…;-), there should probably be a 'tick this box to keep your query out of the demo queries listing' option when folk submit a query.

(A more adventurous solution, but one that I’m not suggesting at the moment, might allow folk who have run a query from the SPARQL page on the data.gov.uk site “share this query” to a database of (shared) queries. Or if you’ve logged in to the site, there may be an option of saving it as a private query.)

That is all…

PS if you have some interesting SPARQL queries, please feel free to share them below or e.g. via the link on here: Bookmarking and Sharing Open Data Queries.

PPS from @iand “shouldnt that post link to the similar http://tw.rpi.edu/weblog/2009/10/23/probing-the-sparql-endpoint-of-datagovuk/“; and here’s one from @gothwin: /location /location /location – exploring Ordnance Survey Linked Data.

PPPS for anyone who took the middle way in the vote, then if there are any example queries in the comments to this post, do they help you get started at all? If you voted "what are you talking about?" please add a comment below about what you think data.gov.uk, Linked Data and SPARQL might be, and what you'd like to be able to do with them…

Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula)

Disclaimer: before any Linked Data purists say I'm missing the point about what Linked Data is, does, or whatever, I don't care, okay? I just don't care. This is for me, and me is someone who can't get to grips with writing SPARQL queries, can't stand the sight of unreadable <rdf> <all:over the="place">, can't even work out how to find what things are queryable in a Linked Data triple store, let alone write queries that link data from one data store with data from another data store (or maybe SPARQL can't do that yet? Not that I care about that either, because I can, in Yahoo Pipes, or Google Spreadsheets, and in a way that's meaningful to me…)

In Mulling Over =datagovukLookup() in Google Spreadsheets, I started wondering about whether or not it would be useful to be able to write formulae to look up "live facts" in various datastores from within a spreadsheet (you know, that Office app that is used pretty much universally in the workplace whenever there is tabular data to hand. That or Access, of course…)

Anyway, I’ve started tinkering with how it might work, so now I can do things like this:

The formulae in columns G, H and I are defined according to a Google Apps script, that takes a school ID and then returns something linked to it in the data.gov.uk education datastore, such as the name of the school, or its total capacity.

Formulae look like:

  • =datagovuk_education(A2,”name”)
  • =datagovuk_education(A2,”opendate”)
  • =datagovuk_education(A2,”totcapacity”)

and are defined to return a single cell element. (I haven’t worked out how to return several cells worth of content from a Google Apps Script yet!)

At the moment, the script is a little messy, taking the form:

function datagovuk_education(id,typ) {
  var ret=""; var args="";
  // Build the URL-encoded SPARQL query string for the requested property
  switch (typ) {
    case 'totcapacity':
      args= _datagovuk_education_capacity_quri(id);
      break;
    ...
    default:
      //hack something here;
  }
  // POST the query to the SPARQL proxy and pull the single literal value out of the XML response
  var x=UrlFetchApp.fetch('http://data-gov.tw.rpi.edu/ws/sparqlproxy.php',{method: 'post', payload: args});
  ret=x.getContentText();
  var xmltest=Xml.parse(ret);
  ret=xmltest.sparql.results.result.binding.literal.getText();

  return ret;
}
function _datagovuk_education_capacity_quri(id){
  return "query=prefix+sch-ont%3A+%3Chttp%3A%2F%2Feducation.data.gov.uk%2Fdef%2Fschool%2F%3E%0D%0ASELECT+%3FschoolCapacity+WHERE+{%0D%0A%3Fschool+a+sch-ont%3ASchool%3B%0D%0Asch-ont%3AuniqueReferenceNumber+"+id+"%3B%0D%0Asch-ont%3AschoolCapacity+%3FschoolCapacity.%0D%0A}+ORDER+BY+DESC%28%3Fdate%29+LIMIT+1&output=xml&callback=&tqx=&service-uri=http%3A%2F%2Fservices.data.gov.uk%2Feducation%2Fsparql";
}

The datagovuk_education(id,typ) function takes the school ID and the requested property, uses the case statement to create an appropriate query string, and then fetches the data from the education datastore, returning the result in an XML format like this. The data is pulled from the datastore via Sparqlproxy, and the query string URIs are generated (at the moment) by running the desired SPARQL query on Sparqlproxy, grabbing the appropriate part of the resulting URI, and then splicing the school ID number into it. (It's early days yet on this hack!;-)

By defining appropriate Apps script functions, I can also create formulae to call other datastores, run queries on Google spreadsheets (e.g. in the Guardian datastore) and so on. I assume similar sorts of functionality would be supported using VB Macros in Excel?

Anyway – this is my starter for ten on how to make live datastore data available to the masses. It’ll be interesting to see whether this approach (or one like it) is used in favour of getting temps to write SPARQL queries and RDF parsers… The obvious problem is that my approach can lead to an explosion in the number of formulae and parameters you need to learn; the upside is that I think these could be quite easily documented in a matrix/linked formulae chart. The approach also scales to pulling in data from CSV stores and other online spreadsheets, using spreadsheets as a database via the =QUERY() formula (e.g. Using Google Spreadsheets Like a Database – The QUERY Formula), and so on. There might also be a market for selling prepackaged or custom formulae as script bundles via a script store within a larger Google Apps App store

PS I’m trying to collect example SPARQL queries that run over the various data.gov.uk end points because: a) I’m really struggling in getting my head round writing my own, not least because I struggle to make sense of the ontologies, if I can find them, and write valid queries off the back of them; even (in fact, especially) really simple training/example queries will do! b) coming up with queries that either have interesting/informative/useful results, or clearly demonstrate an important ‘teaching point’ about the construction of SPARQL queries, is something I haven’t yet got a feel for. If you’ve written any, and you’re willing to share, please post a gist to github and add a link in a comment here.

PPS utility bits, so I don’t lose track of them:
– education datastore ontology
– Apps script XML Element class

PPPS Here's how to dump a 2D CSV table into a range of cells: Writing 2D Data Arrays to a Google Spreadsheet from Google Apps Script Making an HTTP POST Request for CSV Data