Using SPARQL Query Libraries to Generate Simple Linked Data API Wrappers

A handful of open Linked Data posts have appeared through my feeds in the last couple of days, including (via R-bloggers) SPARQL with R in less than 5 minutes, which shows how to query US data.gov Linked Data, and then Leigh Dodds’ Brief Review of the Land Registry Linked Data.

I was going to post a couple of examples merging those two posts – showing how to access Land Registry data via Leigh’s example queries in R, then plotting some of the results using ggplot2 – but another post of Leigh’s today, SPARQL-doc, a simple convention for documenting individual SPARQL queries, has sparked another thought…

For some time I’ve been intrigued by the idea of a marketplace in queries over public datasets, as well as the public sharing of generally useful queries. A good query is like a good gold pan, or a good interview question – it can get a dataset to reveal something valuable that may otherwise have lain hidden. Coming up with a good query in part requires having a good understanding of the structure of a dataset, and in part having an eye for what sorts of secret the data may contain: the next step is crafting a well-phrased query that can tease that secret out. Creating the query might take some time, some effort, and some degree of expertise in query optimisation to make it actually runnable in reasonable time (which is why I figure there may be a market for such things*), but once written, the query is there. And if it can be appropriately parameterised, it may generalise.

(*There are actually a couple of models I can think of: 1) I keep the query secret, but run it and give you the results; 2) I license the “query source code” to you and let you run it yourself. Hmm, I wonder: do folk license queries they share? How, and to what extent, might derived queries/query modifications be accommodated in such a licensing scheme?)

Pondering Leigh’s SPARQL-doc post alongside a few other things got me wondering: another post via R-bloggers, Building a package in RStudio is actually very easy (which describes how to package a set of R files for distribution via github); asdfree (analyze survey data for free), a site that “announces obsessively-detailed instructions to analyze us government survey data with free tools” (and which includes R bundles to get you started quickly…); the resource listing Documentation for package ‘datasets’ version 2.15.2, which describes a bundled package of datasets for R; and the Linked Data API, which sought to provide a simple RESTful API over SPARQL endpoints. So I wondered the following:

How about developing and sharing commented query libraries around Linked Data endpoints that could be used in arbitrary Linked Data clients?

(By “Linked Data clients”, I mean different user agent contexts. So for example, calling a query from Python, or R, or Google Spreadsheets.) That’s it… Simple.

One approach (the simplest?) might be to put each separate query into a separate file, with a filename that could be used to spawn a function name that could be used to call that query. Putting all the queries into a directory and zipping them up would provide a minimal packaging format. An additional manifest file might minimally document the filename along with the parameters that can be passed into and returned from the query. Helper libraries in arbitrary languages would open the query package and “compile” a programme library/set of “API” calling functions for that language (so for example, in R it would create a set of R functions, in Python a set of Python functions).
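As a rough illustration of the "one file per query, filename becomes function name" idea, here is a minimal Python sketch. Everything specific in it is an assumption made for the example rather than part of any existing convention: the `.rq` file extension, the `{{name}}` placeholder syntax, and the helper names `make_query_function` and `load_query_library`. The generated functions only return the filled-in query text; sending it to an endpoint is left to whatever SPARQL client you prefer.

```python
import re
from pathlib import Path
from types import SimpleNamespace

# Hypothetical placeholder convention: {{name}} marks a value to fill in.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def make_query_function(name, template):
    """Build a function that fills the template's placeholders."""
    params = sorted(set(PLACEHOLDER.findall(template)))

    def query(**kwargs):
        missing = [p for p in params if p not in kwargs]
        if missing:
            raise TypeError(f"{name}() missing parameters: {missing}")
        # Naive text substitution: no escaping or type checking is done.
        return PLACEHOLDER.sub(lambda m: str(kwargs[m.group(1)]), template)

    query.__name__ = name
    query.params = params  # expose the parameter names for documentation
    return query


def load_query_library(directory):
    """Turn every *.rq file in a directory into a function on one object."""
    library = SimpleNamespace()
    for path in sorted(Path(directory).glob("*.rq")):
        # "average-price.rq" becomes the function name "average_price".
        name = re.sub(r"\W", "_", path.stem)
        template = path.read_text(encoding="utf-8")
        setattr(library, name, make_query_function(name, template))
    return library
```

With a file called average-price.rq in a directory called queries, `load_query_library("queries").average_price(postcode="MK7 6AA")` would return the query text with the postcode filled in. The same approach in R would generate R functions from the same files.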

(This reminds me of a Twitter exchange with Nick Jackson/@jacksonj04 a couple of days ago around “self-assembling” API programme libraries that could be compiled in an arbitrary language from a JSON API, cf. Swagger (presentation), which I haven’t had time to look at yet.)

The idea, then, is this:

  1. Define a simple file format for declaring documented SPARQL queries
  2. Define a simple packaging format for bundling separate SPARQL queries
  3. The simply packaged set of queries defines a simple “raw query” API over a Linked Data dataset
  4. Describe a simple protocol for creating programming language specific library wrappers around the API from the query bundle package.
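To make steps 2 and 3 a little more concrete, here is a hedged sketch of what a zip-based query bundle with a manifest might look like in Python. The manifest layout (a `manifest.json` file with a `queries` list whose entries have `name`, `file`, `params` and `returns` keys) is invented for this example, as are the function names `build_package` and `read_package`; no such format has been agreed anywhere.

```python
import io
import json
import zipfile


def build_package(queries, manifest):
    """Bundle query files and a manifest into zip bytes.

    queries:  {filename: query text}
    manifest: dict describing the queries (hypothetical layout, see below)
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for filename, text in queries.items():
            zf.writestr(filename, text)
    return buffer.getvalue()


def read_package(data):
    """Read zip bytes back into a "raw query" API description.

    Returns {function name: {"query": ..., "params": [...], "returns": [...]}}.
    """
    api = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        for entry in manifest["queries"]:
            api[entry["name"]] = {
                "query": zf.read(entry["file"]).decode("utf-8"),
                "params": entry.get("params", []),
                "returns": entry.get("returns", []),
            }
    return api
```

A language-specific helper (step 4) would then walk the dictionary returned by `read_package` and emit one wrapper function per entry, using `params` for the arguments and `returns` for the documentation.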

So… I guess two questions arise: 1) would this be useful? 2) how hard could it be?

[See also: @ldodds again, on Publishing SPARQL queries and documentation using github]

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

4 thoughts on “Using SPARQL Query Libraries to Generate Simple Linked Data API Wrappers”

  1. Hi Tony,

    Nice post. I think this is achievable, and I’m planning to move sparql-doc in that direction. I’ve also been separately tinkering with the “sparql stored procedure” idea again. I think this ties the two things together.

    I’m planning to extend sparql-doc so that it supports some extra annotations and configuration, e.g.:

    @endpoint so that a query can be tied to an endpoint(s) it can be used against. This will also allow queries to be submitted directly from docs

    @param to indicate variables in the query that should be filled in. This will need to be a little more structured to indicate, e.g. optional/required, type of param, etc.

    I was also going to add a “package.json” that can be used to provide some default configuration for a package or directory of queries. This might be some overall documentation, or default values for annotations, e.g. @tag or @endpoint or @author

    Writing some additional code so that queries are automatically available as a function call, or perhaps as an API from a URL, won’t be too hard. Once I’ve got the docs stuff done I’ll look at a Ruby version of that.

    1. Hi Leigh – thanks for that comment. @endpoint and @param seem eminently sensible..:-) I thought about mentioning the Kasabi API stuff, but couldn’t remember the detail properly… Using my hazy sketchy knowledge of R, I’ll start trying to see if I can work both ends (SPARQL in R, SPARQL-doc’ed queries in text files) and then see if I can find a way of hooking them up in the middle…

  2. This is one of the things that SPIN does, but instead of files and directories it’s triples all the way down. Queries can be attached to function names which can pass parameters to the queries. (And instead of a manifest file, you just use rdfs:comment.) The use of triples to attach queries to resources has the added benefit of letting you attach SPARQL-based constraints to class definitions. See http://spinrdf.org/ for more.
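
For what it's worth, the @endpoint and @param annotations suggested in the first comment could be read from a query file with very little code. The sketch below assumes annotations live in a leading block of `#` comment lines, one `@key value` pair per line; that exact syntax, and the function name `parse_annotations`, are assumptions for illustration, not a description of how sparql-doc actually parses queries.

```python
import re

# Assumed annotation syntax: "# @key value" inside the leading comment block.
ANNOTATION = re.compile(r"^#\s*@(\w+)\s+(.*\S)\s*$")


def parse_annotations(query_text):
    """Split a query's leading comment block into description and annotations."""
    annotations = {}
    description = []
    for line in query_text.splitlines():
        if not line.startswith("#"):
            break  # the leading comment block ends at the first other line
        match = ANNOTATION.match(line)
        if match:
            # An annotation may repeat (e.g. several @param lines).
            annotations.setdefault(match.group(1), []).append(match.group(2))
        else:
            text = line.lstrip("#").strip()
            if text:
                description.append(text)
    return {"description": " ".join(description), "annotations": annotations}
```

The `param` list gives a wrapper generator its argument names, and the `endpoint` list tells it where the query can be run.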

