What OU courses/modules contain the word ‘music’ in the title? Can we use the OU’s Linked Data to find out?
Earlier this week, an OU press release proudly announced that “[t]he Open University (OU) is the first university in the UK to open up access to online data from across the institution as part of the Linked Open Data Movement.” Several other HEIs have also been exploring Linked Data releases ([lazyweb]: please feel free to add a link if that includes your institution in the comments to this post, and I’ll then collate them here…) but the OU’s LUCERO project is looking across units, and getting real data out, which I think is where the novelty lies?
“What this means,” explained David Matthewman, Chief Information Officer at The Open University, “is that members of the public, students, researchers and organisations will be able to easily search, extract and, more importantly, reuse The Open University’s information and data.”
Having the data available is one thing, of course, but what might it actually be useful for? And how easy (or difficult) is it to get started with accessing this data?
In this post, I’ll try to give some sample queries on the courses data that you can use as a basis for your own queries. This will get a little technical, and it will involve writing queries using a language called SPARQL, so if you have no idea what that is, you’d better read this first: My Understanding of SPARQL, the First Attempt…
Interrogating Linked Data such as that published at data.open.ac.uk requires three things:
– a query that states in a particular way the question we want to ask of the database and the result fields we want returned;
– the specification of a query endpoint, which tells the thing making the query where the query interface to the database to be queried lives;
– some sort of engine or mechanism for actually making the query and displaying the results.
The query form at http://data.open.ac.uk/query handles the second and third of these for us, which means all we need to do is provide the query itself.
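If you’d rather script things than use the form, the three ingredients map onto a few lines of Python. This is only a sketch: it assumes the form’s address will also accept a standard SPARQL protocol GET request with the query passed in a query parameter (the usual convention, but an assumption for this endpoint), and build_request_url is a name made up for the example. It builds the request address without actually fetching it.

```python
from urllib.parse import urlencode

# 2. The endpoint: where the query interface lives. The address is the one
#    used by the query form; treating it as a SPARQL protocol endpoint is an assumption.
ENDPOINT = "http://data.open.ac.uk/query"

# 1. The query: a placeholder question that just asks for any ten facts.
QUERY = "select ?s ?p ?o where { ?s ?p ?o. } limit 10"


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a GET request (hypothetical helper)."""
    return endpoint + "?" + urlencode({"query": query})


# 3. The engine: anything that can fetch this URL (a browser, curl, urllib.request...).
url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Pasting the printed address into a browser is then the scripted equivalent of pressing the submit button on the form, if the assumption about the endpoint holds.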
As is the way of most Linked Data tutorials I’ve seen, I’m probably now going to scare you off… I had wanted to start with a really, really simple and plausibly useful query (?!), but the most obvious query I can think of – looking up the name of a course from its course code – requires some syntactic clutter that makes the query look harder to read than it actually is… such is life!
At its heart, the query we want to ask is something like the following (T215 is the course code of interest):
select ?title where {
?x a course:Module.
?x course:code 'T215'.
?x course:name ?title.
}
This query says something along the following lines: “for something we’ll call ?x, that we require to be a course:Module, and that has the course code ‘T215’, find me a value of ?title corresponding to its course:name”.
Unfortunately, if we run the query as stated above, it won’t work, because the facts that are encoded in the database that satisfy the above query look like this:
<http://data.open.ac.uk/course/t215> a <http://purl.org/vocab/aiiso/schema#Module>
<http://data.open.ac.uk/course/t215> <http://purl.org/vocab/aiiso/schema#code> 'T215'^^<http://www.w3.org/2001/XMLSchema#string>
<http://data.open.ac.uk/course/t215> <http://purl.org/vocab/aiiso/schema#name> "Communication and information technologies"^^xsd:string
In order to make the query work, we need to write it something like this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
select ?title where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
}
(The ^^xsd:string monstrosity is essentially a type declaration for the value it is attached to, in this case saying that the value is a string. As with many programming languages, data can be typed in various ways – sets of characters as strings, as in this example, or numbers as integers (^^xsd:int), for example. If you get the type wrong, and it’s required, the match won’t work…)
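To see why the untyped version of the query comes back empty, here’s a toy model in Python. It is not how the OU datastore works internally: the Literal class and the subjects_with helper are invented for illustration, and the “value and type must both agree” rule is an assumption based on the behaviour described above.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
AIISO = "http://purl.org/vocab/aiiso/schema#"
T215 = "http://data.open.ac.uk/course/t215"


class Literal(NamedTuple):
    """A value plus an optional datatype (hypothetical, for illustration only)."""
    value: str
    datatype: Optional[str] = None  # None models an untyped literal


# The three facts about T215 listed earlier, as (subject, predicate, object).
triples = [
    (T215, "a", AIISO + "Module"),
    (T215, AIISO + "code", Literal("T215", XSD_STRING)),
    (T215, AIISO + "name", Literal("Communication and information technologies", XSD_STRING)),
]


def subjects_with(predicate, obj):
    """Return subjects that have exactly this predicate and object."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


# 'T215' with no type does not equal 'T215'^^xsd:string in this model...
plain_matches = subjects_with(AIISO + "code", Literal("T215"))
# ...but the typed value does.
typed_matches = subjects_with(AIISO + "code", Literal("T215", XSD_STRING))

print(plain_matches)  # []
print(typed_matches)  # ['http://data.open.ac.uk/course/t215']
```

Newer RDF stores may treat an untyped 'T215' as an xsd:string automatically, in which case the untyped form would match there; the model above mirrors the stricter behaviour.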
If you copy and paste that query into the form at http://data.open.ac.uk/query, you should get the following result when you submit the form:
Of course, if you’re a real Linked Data person, then you’d probably write the query in the following equivalent, but harder to read, way:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?title from
<http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://purl.org/vocab/aiiso/schema#code> 'T215'^^xsd:string.
?x <http://purl.org/vocab/aiiso/schema#name> ?title.
}
If the above freaks you out/does your head in, it does the same to mine;-) Just remember, a lot of that stuff is syntactic baggage that’s just required to make things work…
Also note that most LD folk wouldn’t have used “course” in the following prefix:
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
Instead, I suspect they’d have gone for something like aiiso:, (unless a conventional prefix exists for that vocabulary?); but I was trying not to frighten you off too early…!;-)
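If it helps, a prefix is nothing more than shorthand that gets swapped for the full address before the query runs, so the label you pick makes no difference to the result. A rough Python sketch of that substitution (the expand function is made up for this example, and real SPARQL parsers handle more cases than this):

```python
AIISO_NS = "http://purl.org/vocab/aiiso/schema#"


def expand(prefixed_name: str, prefixes: dict) -> str:
    """Swap the 'prefix:' part of a name for the namespace it stands for."""
    prefix, local_name = prefixed_name.split(":", 1)
    return "<" + prefixes[prefix] + local_name + ">"


# Two different labels for the same vocabulary.
my_prefixes = {"course": AIISO_NS}
ld_prefixes = {"aiiso": AIISO_NS}

print(expand("course:Module", my_prefixes))
print(expand("aiiso:Module", ld_prefixes))
# both print <http://purl.org/vocab/aiiso/schema#Module>
```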
As I mentioned above, the OU is looking to open up data as Linked Data in the same database from across the university. I think the following modified query tries to help the database out a little by telling it what sort of data we’re interested in, so it has a head start on knowing to look for course related data, compared to open repository/ORO reference data, which also exists in the datastore, for example:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
select ?title from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
}
How would you look up the title of the course with code T151? Or M366?
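Since only the course code changes from one lookup to the next, one way to play with this is to treat the query as a template. Here’s a minimal Python sketch; the title_query name is invented, and it assumes that course codes are made up of letters and digits only and are stored in upper case (as 'T215' is).

```python
import re

# The query from above, with a slot where the course code goes.
QUERY_TEMPLATE = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
select ?title from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code '%s'^^xsd:string.
?x course:name ?title.
}"""


def title_query(course_code: str) -> str:
    """Return the title lookup query for one course code (hypothetical helper)."""
    # Assumed shape of a code: letters and digits only. The check also keeps
    # stray quote marks from breaking the query.
    if not re.fullmatch(r"[A-Za-z0-9]+", course_code):
        raise ValueError("unexpected course code: %r" % course_code)
    return QUERY_TEMPLATE % course_code.upper()


print(title_query("T151"))
```

The printed query can be pasted straight into the form at http://data.open.ac.uk/query.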
Having got a single query going, how can we start to elaborate on it? If you look at the location http://data.open.ac.uk/page/course/t215.html (and I know this page exists because there’s a crib/link to it on the data.open.ac.uk homepage;-), you can see a view over the data in the datastore corresponding to the course with course code T215:
The things down the left hand side are links that associate particular sorts of data with that course.
So for example, the OUCourseLevel property is described by the link <http://data.open.ac.uk/saou/ontology#OUCourseLevel>, which means that there is a fact in the database corresponding to something like:
<http://data.open.ac.uk/course/t215> <http://data.open.ac.uk/saou/ontology#OUCourseLevel> 3
(Actually, that character 3 is probably typed, for example as something like “3”^^xsd:string)
Can you work out how to take that information and find the OU level (as well as the title) of a course from its course code?
How about this way?
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
select ?title ?ouLevel from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
?x <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?ouLevel.
}
If you look at the course data information page, you’ll see a variety of other linked properties there. Just follow the approach above to spot the links you want and generate your own queries. Here’s another example, which you can try in the SPARQL form as it stands, or after deleting the lines corresponding to properties you aren’t interested in (remember to remove the matching query variable from the select statement at the start of the query too):
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
PREFIX ou: <http://data.open.ac.uk/saou/ontology#>
select distinct ?title ?ccode ?eucredits ?oulevel ?scqflevel ?fheqlevel from <http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:code ?ccode.
?x course:name ?title.
?x ou:FHEQLevel ?fheqlevel.
?x ou:OUCourseLevel ?oulevel.
?x ou:SCQFLevel ?scqflevel.
?x ou:eu-number-of-credits ?eucredits.
}
Note that we can also return the course code we embedded in our search query as one of the selected arguments by also putting in a variable (?ccode) associated with the course:code for the course.
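Keeping the select line and the property lines in step by hand is easy to get wrong, so here’s a rough Python sketch that generates both from one list of the variables you want. The details_query function and the PROPERTIES lookup table are invented for the example; the property names are the ones used in the query above.

```python
PREFIXES = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
PREFIX ou: <http://data.open.ac.uk/saou/ontology#>"""

# Query variable name -> property, copied from the query above.
PROPERTIES = {
    "ccode": "course:code",
    "title": "course:name",
    "fheqlevel": "ou:FHEQLevel",
    "oulevel": "ou:OUCourseLevel",
    "scqflevel": "ou:SCQFLevel",
    "eucredits": "ou:eu-number-of-credits",
}


def details_query(course_code, wanted):
    """Build a query that returns only the wanted variables for one course."""
    # Each wanted name appears once in the select line and once as a pattern.
    select = " ".join("?" + name for name in wanted)
    patterns = "\n".join("?x %s ?%s." % (PROPERTIES[name], name) for name in wanted)
    return (
        PREFIXES + "\n"
        + "select distinct " + select
        + " from <http://data.open.ac.uk/context/course> where {\n"
        + "?x a course:Module.\n"
        + "?x course:code '" + course_code + "'^^xsd:string.\n"
        + patterns + "\n}"
    )


print(details_query("T215", ["title", "oulevel"]))
```

Asking for a name that isn’t in the table raises a KeyError, which is a crude but handy reminder to add the property first.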
Another way of finding a course by course code is to find all the courses that match the arguments specified, and then filter the results. I suspect the following query makes the server do a lot of work finding information for all manner of courses, before filtering the results to find the details for the course we want… (or maybe the query engine can optimise its way out of some of the apparent inefficiencies of this query?)
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
PREFIX ou: <http://data.open.ac.uk/saou/ontology#>
select distinct ?title ?ccode ?eucredits ?oulevel ?scqflevel ?fheqlevel from <http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code ?ccode.
?x course:name ?title.
?x ou:FHEQLevel ?fheqlevel.
?x ou:OUCourseLevel ?oulevel.
?x ou:SCQFLevel ?scqflevel.
?x ou:eu-number-of-credits ?eucredits.
FILTER(regex(str(?ccode), 'T215'))
}
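One thing to be aware of with the regex filter is that it looks for the pattern anywhere in the value rather than requiring the whole value to match, so it is not quite the same test as the exact course:code match used earlier. Python’s re.search behaves the same way for a simple pattern like this, so here’s a small sketch of the difference. The course codes other than T215 and T151 are invented for the purpose, and regex_filter is a made-up helper.

```python
import re

# Codes to filter; 'T2150' and 'XT215' are hypothetical, invented to show the behaviour.
codes = ["T215", "T2150", "XT215", "T151"]


def regex_filter(pattern, values):
    """Keep values where the pattern matches anywhere, as an unanchored regex does."""
    return [v for v in values if re.search(pattern, v)]


print(regex_filter("T215", codes))    # ['T215', 'T2150', 'XT215']
print(regex_filter("^T215$", codes))  # ['T215']
```

In the query itself, the anchored version would be written FILTER(regex(str(?ccode), '^T215$')).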
How do you think you could tweak the above query to search for courses that contain ‘music’ in the ?title/course:name?
Could you tweak it further to find courses with music in the title at SCQF level 10? (Hint: level 10 is encoded in the database as '10'^^xsd:int. Which is to say ?x ou:SCQFLevel '10'^^xsd:int. will identify courses where ?x is at ou:SCQFLevel 10…;-)
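If you get stuck, here’s one possible answer, written as a little Python function that prints the query for you to paste into the form. It’s a sketch rather than the definitive way of doing it: title_search_query is an invented name, the search word is assumed to be a plain word with no quote marks or special regex characters in it, and the 'i' flag (which asks regex to ignore case, so that ‘Music’ matches too) is my addition.

```python
def title_search_query(word, scqf_level=None):
    """Build a query for modules whose title contains word, ignoring case."""
    lines = [
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>",
        "PREFIX course: <http://purl.org/vocab/aiiso/schema#>",
        "PREFIX ou: <http://data.open.ac.uk/saou/ontology#>",
        "select distinct ?ccode ?title from <http://data.open.ac.uk/context/course> where {",
        "?x a course:Module.",
        "?x course:code ?ccode.",
        "?x course:name ?title.",
    ]
    if scqf_level is not None:
        # The level is typed as an integer, as described in the hint.
        lines.append("?x ou:SCQFLevel '%d'^^xsd:int." % scqf_level)
    # Filter on the title rather than the course code.
    lines.append("FILTER(regex(str(?title), '%s', 'i'))" % word)
    lines.append("}")
    return "\n".join(lines)


print(title_search_query("music"))
print(title_search_query("music", scqf_level=10))
```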
In the next post in this series, I’ll describe several other ways in which we can run these queries, using services such as SparqlProxy, YQL, and even Yahoo Pipes;-)