data.open.ac.uk Arrives, With Linked Data Goodness

How time flies in open data land… and how quickly things seem to get done, at least in the opening up stakes. A couple of weeks ago, @ostephens revealed that data.open.ac.uk was live as a URL, and yesterday, a tweet went out from the OU’s Mathieu d’Aquin to say that some data was there…

data.open.ac.uk first release

data.open.ac.uk is the home of open linked data from The Open University. It is a platform currently developed as part of the LUCERO JISC Project to extract, interlink and expose data available in various institutional repositories of the University and make it available openly for reuse.

It seems that there’s going to be a focus on releasing data as Linked Data™, so it’ll be interesting to see, as time goes by, just how many ways we can find of gluing things together, particularly around our crown jewels: OU course codes.

If you can’t wait to get started, the SPARQL endpoint is here: http://data.open.ac.uk/query. Datasets look like they’re organised within particular contexts (whatever that means?!), apparently accessed as follows:

– SELECT ?blah FROM <http://data.open.ac.uk/context/oro> WHERE {…}
– SELECT ?blah FROM <http://data.open.ac.uk/context/podcast> WHERE {…}

If no context is specified, the search presumably runs over everything it can?
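In SPARQL, the FROM clause names a graph to query, so a "context" here is presumably just a graph URI. As a rough sketch of how a context might be slotted into a query, the helper below builds a query string with or without a FROM clause. The function name build_select and its parameters are invented for illustration; only the context URI and the dc:title property come from the examples in this post.

```python
def build_select(variables, where_body, context=None, limit=None):
    """Assemble a SPARQL SELECT string, optionally scoped to one context."""
    parts = ["SELECT DISTINCT " + " ".join("?" + v for v in variables)]
    if context is not None:
        # Restrict the query to a single dataset, e.g. .../context/oro
        parts.append("FROM <" + context + ">")
    parts.append("WHERE {\n" + where_body.strip() + "\n}")
    if limit is not None:
        parts.append("LIMIT " + str(int(limit)))
    return "\n".join(parts)


ORO = "http://data.open.ac.uk/context/oro"
body = "?y <http://purl.org/dc/terms/title> ?title."

print(build_select(["title"], body, context=ORO, limit=10))
# No FROM clause: what gets searched is left up to the endpoint
print(build_select(["title"], body))
```

Keeping the context as an optional argument makes it easy to compare what comes back with and without it.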

So what’s there at the moment? For starters, there’s information from ORO, the OU’s eprints repository, as well as the OU’s iTunesU podcast directory.

I’m not much of a SPARQLer (though I can recommend Talis’ 2 day Introduction to the Web of Data workshop as a way of getting your head round the Linked Data world), but here are a couple of queries I managed to write just to check that the endpoint was working as advertised. I’m using SparqlProxy to run the queries because it offers lots of nice output formats. It also allows you to run queries from a URL, so if the Lucero team wanted to easily share the text of demo queries, they could just bookmark them, one to a page… [See end of post for Howto];-)

The titles of papers authored by surname “Weller”, as listed in ORO:

select distinct ?title from <http://data.open.ac.uk/context/oro> where {
?y a <http://purl.org/ontology/bibo/AcademicArticle>.
?y <http://purl.org/dc/terms/title> ?title.
?y <http://purl.org/dc/terms/creator> ?z.
?z <http://xmlns.com/foaf/0.1/family_name> "Weller"^^<http://www.w3.org/2001/XMLSchema#string>.
} LIMIT 10

SParqlproxy on data.open.ac.uk

sparqlproxy results from data.open.ac.uk query

Here are details of a few podcasts on iTunes relating to the course T209:

SELECT distinct ?title ?description WHERE {
?x <http://data.open.ac.uk/podcast/ontology/relatesToCourse> <http://data.open.ac.uk/course/t209>.
?x <http://purl.org/dc/terms/title> ?title.
?x <http://www.w3.org/TR/2010/WD-mediaont-10-20100608/description> ?description } LIMIT 10

query on data.open.ac.uk - podcasts

The data.open.ac.uk homepage also hints at a few other datasets that may shortly be making an appearance:

– Course information originating from Study at the OU
– The OU Library Catalogue, especially focusing on course material
– Open educational content available in the OpenLearn system
– Public information about staff, locations on the OU campus, etc

From a “transparency means show us where the money is” point of view, it would also be interesting to see a list of funded research projects added to the list, which might in turn provide some sort of incentive to the research councils to start releasing this data in a bit more structured way?;-)

PS just by the by, for openness folks, there’s also a fair amount of stuff on the OU’s FOI site.

PPS here’s a quick/ad hoc way of sharing queries and then running them via SPARQLProxy:

1) save the query to somewhere like Pastebin:

sparql query in pastebin

2) Grab a link to the actual text of the query. In Pastebin, “raw” actually displays the clip in HTML tags. For the really raw raw (rarrghhhh… easy, tiger;-)) you’ll need to grab the download URL, e.g. http://pastebin.com/download.php?i=tZzgFDvY

3) Use this URL in a “SPARQL query by URI” query on SPARQLProxy:

SPARQLQuery by URI on sparqlproxy

Here’s the result:
Remote query on sparqlproxy

If you create any queries on data.open.ac.uk you’d like to share, why not add them to Pastebin, and share a link back in the comments? :-)

data.open.ac.uk Linked Data Now Exposing Module Information

As HE becomes more and more corporatised, I suspect we’re going to see online supermarkets appearing that help you identify – and register on – degree courses in exchange for an affiliate/referral fee from the university concerned. For those sites to appear, they’ll need access to course catalogues, of course. UCAS currently holds the most comprehensive one that I know of, but it’s a pain to scrape and all but useless as a datasource. But if the universities publish course catalogue information themselves in a clean way (and ideally, a standardised way), it shouldn’t be too hard to construct aggregation sites ourselves…

So it was encouraging to see earlier this week an announcement that the OU’s data.open.ac.uk site has started publishing module data from the course catalogue – that is, data about the modules (as we now call them – they used to be called courses) that you can study with the OU.

The data includes various bits of administrative information about each module, the territories it can be studied in, and (most importantly?!) pricing information;-)

data.open.ac.uk - module data

You may remember that the data.open.ac.uk site itself launched a few weeks ago with the release of Linked Data sets including data about deposits in the open repository, as well as OU podcasts on iTunes (data.open.ac.uk Arrives, With Linked Data Goodness). Where podcasts are associated with a course, the magic of Linked Data means that we can easily get to the podcasts via the course/module identifier:

data.open.ac.uk

It’s also possible to find modules that bear an isSimilarTo relation to the current module, where isSimilarTo means (I think?) “was also studied by students taking this module”.

As an example of how to get at the data, here’s a Python script using the Python YQL library that lets me run a SPARQL query over the data.open.ac.uk course module data (the code includes a couple of example queries):

import yql

def run_sparql_query(query, endpoint):
    y = yql.Public()
    query='select * from sparql.search where query="'+query+'" and service="'+endpoint+'"'
    env = "http://datatables.org/alltables.env"
    return y.execute(query, env=env)

endpoint='http://data.open.ac.uk/query'

# This query finds the identifiers of postgraduate technology courses that are similar to each other
q1='''
select distinct ?x ?z from <http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/technology>.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z
} limit 10
'''

# This query finds the names and course codes of 
# postgraduate technology courses that are similar to each other
q2='''
select distinct ?code1 ?name1 ?code2 ?name2 from <http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/technology>.
?x <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name1.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z.
?z <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name2.
?x <http://purl.org/vocab/aiiso/schema#code> ?code1.
?z <http://purl.org/vocab/aiiso/schema#code> ?code2.
}
'''

# This query finds the names and course codes of 
# postgraduate courses that are similar to each other
q3='''
select distinct ?code1 ?name1 ?code2 ?name2 from <http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name1.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z.
?z <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name2.
?x <http://purl.org/vocab/aiiso/schema#code> ?code1.
?z <http://purl.org/vocab/aiiso/schema#code> ?code2.
}
'''

result=run_sparql_query(q3, endpoint)

for row in result.rows:
    for r in row['result']:
        print(r)
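The r['code1']['value'] style of access used with these results resembles the standard SPARQL JSON results layout, where each row maps a variable name to an object with a "value" field. As a minimal sketch, assuming a response in that standard layout, the function below flattens it into plain dictionaries. The function name and the sample response are invented for illustration and are not real data from the endpoint.

```python
import json


def rows_from_sparql_json(text):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, hence the .get() calls
        rows.append({v: binding.get(v, {}).get("value") for v in variables})
    return rows


# Hypothetical response, made up for this example
sample = """
{"head": {"vars": ["code1", "code2"]},
 "results": {"bindings": [
   {"code1": {"type": "literal", "value": "X100"},
    "code2": {"type": "literal", "value": "X200"}}
 ]}}
"""

print(rows_from_sparql_json(sample))
```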

I’m not sure what purposes we can put any of this data to yet, but for starters I wondered just how connected the various postgraduate courses are based on the isSimilarTo relation. Using q3 from the code above, I generated a Gephi GDF/network file using the following snippet:

# Generate a Gephi GDF file showing connections between 
# modules that are similar to each other
fname='out2.gdf'
f=open(fname,'w')

f.write('nodedef> name VARCHAR, label VARCHAR, title VARCHAR\n')
ccodes=[]
for row in result.rows:
	for r in row['result']:
		if r['code1']['value'] not in ccodes:
			ccodes.append(r['code1']['value'])
			f.write(r['code1']['value']+','+r['code1']['value']+',"'+r['name1']['value']+'"\n')
		if r['code2']['value'] not in ccodes:
			ccodes.append(r['code2']['value'])
			f.write(r['code2']['value']+','+r['code2']['value']+',"'+r['name2']['value']+'"\n')
		
f.write('edgedef> c1 VARCHAR, c2 VARCHAR\n')
for row in result.rows:
	for r in row['result']:
		#print r
		f.write(r['code1']['value']+','+r['code2']['value']+'\n')

f.close()

to produce the following graph. (Size is out degree, colour is in degree. Edges go from ?x to ?z. Layout: Fruchterman Reingold, followed by Expansion.)

OU postgrad courses in gephi

The layout style is a force directed algorithm, which in this case has had the effect of picking out various clusters of highly connected courses (so for example, the E courses are clustered together, as are the M courses, B courses, T courses and so on.)

If we run the ego filter over this network on a particular module code, we can see which modules were studied alongside it:

ego filter on course codes

Note that in the above diagram, the nodes are sized/coloured according to in-degree/out-degree in the original, complete graph. If we re-calculate those measures on just this partition, we get the following:

Recoloured course network
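To make the difference between the two sets of measures concrete, here is a small pure-Python sketch of what I take the ego filter to be doing at depth 1 (keep the chosen node, its direct neighbours, and the edges among them), followed by a recount of in-degree and out-degree on what is left. This is an assumption about the filter's behaviour rather than a description of Gephi's implementation, and the node names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) Counters for directed (source, target) edges."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return in_deg, out_deg


def ego_network(edges, ego):
    """Keep only the edges among the ego node and its direct neighbours."""
    neighbours = {ego}
    for src, dst in edges:
        if src == ego:
            neighbours.add(dst)
        elif dst == ego:
            neighbours.add(src)
    return [(s, d) for s, d in edges if s in neighbours and d in neighbours]


# Hypothetical module codes, invented for illustration
edges = [("A1", "B1"), ("A1", "C1"), ("B1", "C1"), ("C1", "D1"), ("D1", "E1")]

sub = ego_network(edges, "A1")
in_deg, out_deg = degrees(sub)
full_in, full_out = degrees(edges)

print(sub)
print("C1 out-degree, whole graph:", full_out["C1"], "ego network:", out_deg["C1"])
```

A node that links out to modules beyond the ego's neighbourhood loses those edges in the filtered view, which is why the sizes and colours shift when the measures are recalculated.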

If we return to the whole network, and run the Modularity class statistic, we can identify several different course clusters:

Modules - modularity class

Here’s one of them expanded:

A module cluster

Here are some more:

Course clusters

I’m not sure what use any of this is, but if nothing else, it shows there’s structure in that data (which is exactly what we’d expect, right?;-))

PS as to how I wrote my first query on this data, I copied the ‘postgraduate modules in computing’ example query from data.open.ac.uk:

http://data.open.ac.uk/query?query=select%20distinct%20%3Fx%20from%20%3Chttp://data.open.ac.uk/context/course%3Ewhere%20{%3Fx%20a%20%3Chttp://purl.org/vocab/aiiso/schema%23Module%3E.%0A%3Fx%20%3Chttp://data.open.ac.uk/saou/ontology%23courseLevel%3E%20%3Chttp://data.open.ac.uk/saou/ontology%23postgraduate%3E.%0A%3Fx%20%3Chttp://purl.org/dc/terms/subject%3E%20%3Chttp://data.open.ac.uk/topic/computing%3E%0A}%0A&limit=200

and pasted it into a tool that “unescapes” encoded URLs, which decodes the SPARQL query:

Unescaping text

I was then able to pull out the example query:
select distinct ?x from <http://data.open.ac.uk/context/course>
where {?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/computing>
}
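The same unescaping can be done in a couple of lines of Python using the standard library's urllib.parse. This is just a sketch: the function name extract_query is my own, and the URL is a shortened version of the example link, trimmed to keep the listing readable.

```python
from urllib.parse import urlsplit, parse_qs


def extract_query(url):
    """Return the decoded value of the 'query' parameter from an endpoint URL."""
    params = parse_qs(urlsplit(url).query)
    return params["query"][0]


# Shortened version of the example URL from data.open.ac.uk
example = (
    "http://data.open.ac.uk/query?query=select%20distinct%20%3Fx%20from%20"
    "%3Chttp://data.open.ac.uk/context/course%3Ewhere%20{%3Fx%20a%20"
    "%3Chttp://purl.org/vocab/aiiso/schema%23Module%3E.%0A}%0A&limit=200"
)

print(extract_query(example))
```

Using parse_qs rather than decoding the whole URL also separates the query text from the trailing limit parameter.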

Just by the by, there’s a host of other handy text tools at Text Mechanic.

Getting Started With data.open.ac.uk Course Linked Data

What OU courses/modules contain the word ‘music’ in the title? Can we use the OU’s Linked Data to find out?

Earlier this week, an OU press release proudly announced that “[t]he Open University (OU) is the first university in the UK to open up access to online data from across the institution as part of the Linked Open Data Movement.” Several other HEIs have also been exploring Linked Data releases ([lazyweb]: please feel free to add a link if that includes your institution in the comments to this post, and I’ll then collate them here…) but the OU’s LUCERO project is looking across units, and getting real data out, which I think is where the novelty lies?

“What this means,” explained David Matthewman, Chief Information Officer at The Open University, “is that members of the public, students, researchers and organisations will be able to easily search, extract and, more importantly, reuse The Open University’s information and data.”

Having the data available is one thing, of course, but what might it actually be useful for? And how easy (or difficult) is it to get started with accessing this data?

In this post, I’ll try to give some sample queries on the courses data that you can use as a basis for your own queries. This will get a little technical, and it will involve writing queries using a language called SPARQL, so if you have no idea what that is, you’d better read this first: My Understanding of SPARQL, the First Attempt…

Interrogating Linked Data such as that published at data.open.ac.uk requires three things:

– a query that states in a particular way the question we want to ask of the database and the result fields we want returned;
– the specification of a query endpoint, which tells the thing making the query where the query interface to the database to be queried lives;
– some sort of engine or mechanism for actually making the query and displaying the results.
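Those three ingredients can be sketched in a few lines of Python, with the standard library standing in for the "engine". The "query" parameter name is taken from the example URLs published on data.open.ac.uk; the function names are invented here, and run_query is defined but not called because it needs network access and I am not assuming anything about the response format.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import urlopen

# Ingredient 2: the endpoint
ENDPOINT = "http://data.open.ac.uk/query"

# Ingredient 1: the query
query = "select distinct ?x where { ?x a <http://purl.org/vocab/aiiso/schema#Module> } limit 5"


def request_url(endpoint, query_text):
    """Combine the endpoint and the query text into a single GET URL."""
    return endpoint + "?" + urlencode({"query": query_text})


def run_query(endpoint, query_text):
    """Ingredient 3: the 'engine'. Fetches the raw response body as text."""
    with urlopen(request_url(endpoint, query_text)) as response:
        return response.read().decode("utf-8")


url = request_url(ENDPOINT, query)
print(url)

# Round-trip check: decoding the URL gives back the original query text
print(parse_qs(urlsplit(url).query)["query"][0] == query)
```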

The query form at http://data.open.ac.uk/query handles the second and third of these for us, which means all we need to do is provide the query itself:

Sparql query interface at data.open.ac.uk/query

As is the way of most Linked Data tutorials I’ve seen, I’m probably now going to scare you off… I had wanted to start with a really, really simple and plausibly useful query (?!), but the most obvious query I can think of – looking up the name of a course from its course code – requires some syntactic clutter that makes the query look harder to read than it actually is… such is life!

At its heart, the query we want to ask is something like the following (T215 is the course code of interest):

select ?title where {
?x a course:Module.
?x course:code 'T215'.
?x course:name ?title.
}

This query says something along the following lines: “for something we’ll call ?x, that we require to be a course:Module, and that has the course code ‘T215’, find me a value of ?title corresponding to its course:name”.

Unfortunately, if we run the query as stated above, it won’t work, because the facts that are encoded in the database that satisfy the above query look like this:

<http://data.open.ac.uk/course/t215> a <http://purl.org/vocab/aiiso/schema#Module>

<http://data.open.ac.uk/course/t215> <http://purl.org/vocab/aiiso/schema#code> 'T215'^^<http://www.w3.org/2001/XMLSchema#string>

<http://data.open.ac.uk/course/t215> <http://purl.org/vocab/aiiso/schema#name> "Communication and information technologies"^^xsd:string

In order to make the query work, we need to write it something like this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>

select ?title where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
}

(The ^^xsd:string monstrosity is essentially a type declaration, in this case marking the value as a string. As with many programming languages, data can be typed in various ways – sets of characters as strings, as in this example, or numbers as integers (^^xsd:int), for example. If you get the type wrong, and it’s required, the match won’t work…)
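One way to picture why the type matters is to treat a literal as a pair: the characters plus an optional datatype. The toy model below is only meant to illustrate the matching behaviour described here, not how any particular triple store is implemented (some stores may well treat plain and xsd:string literals as equivalent). The names Literal and to_sparql are invented for this sketch.

```python
from collections import namedtuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# A literal is a lexical form plus an optional datatype URI
Literal = namedtuple("Literal", ["lexical", "datatype"])


def to_sparql(lit):
    """Render a Literal the way it would be written in a query."""
    text = "'" + lit.lexical + "'"
    if lit.datatype is None:
        return text
    return text + "^^<" + lit.datatype + ">"


stored = Literal("T215", XSD + "string")  # what the datastore holds
plain = Literal("T215", None)             # what the untyped query asks for

print(to_sparql(stored))
print(to_sparql(plain))
# Same characters, different terms: in this model the untyped pattern would not match
print(stored == plain)
```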

If you copy and paste that query into the form at http://data.open.ac.uk/query, you should get the following result when you submit the form:

SPARQL result

Of course, if you’re a real Linked Data person, then you’d probably write the query in the following equivalent, but difficult to read way!:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

select ?title from
<http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://purl.org/vocab/aiiso/schema#code> 'T215'^^xsd:string.
?x <http://purl.org/vocab/aiiso/schema#name> ?title.
}

If the above freaks you out/does your head in, it does the same to mine;-) Just remember, a lot of that stuff is syntactic baggage that’s just required to make things work…

Also note that most LD folk wouldn’t have used “course” in the following prefix:

PREFIX course: <http://purl.org/vocab/aiiso/schema#>

Instead, I suspect they’d have gone for something like aiiso:, (unless a conventional prefix exists for that vocabulary?); but I was trying not to frighten you off too early…!;-)
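The prefix label itself is arbitrary: all a PREFIX declaration does is let a short name stand in for the front part of a URI. A minimal sketch of that substitution, using a made-up helper called expand (real SPARQL parsers handle far more cases than this):

```python
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "course": "http://purl.org/vocab/aiiso/schema#",
}


def expand(term, prefixes=PREFIXES):
    """Turn a prefixed name such as course:Module into a full <URI>."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return "<" + prefixes[prefix] + local + ">"
    # Leave variables, literals and unknown prefixes untouched
    return term


for term in ["course:Module", "course:code", "xsd:string", "?title"]:
    print(term, "->", expand(term))

# Swapping the label for aiiso: expands to exactly the same URI
alt = {"aiiso": "http://purl.org/vocab/aiiso/schema#"}
print(expand("aiiso:code", alt) == expand("course:code"))
```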

As I mentioned above, the OU is looking to open up data as Linked Data in the same database from across the university. I think the following modified query tries to help the database out a little by telling it what sort of data we’re interested in, so it has a head start on knowing to look for course related data, compared to open repository/ORO reference data, which also exists in the datastore, for example:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>

select ?title from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
}

How would you look up the title of the course with code T151? Or M366?
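If you find yourself swapping course codes in and out by hand, the query can be treated as a template. The sketch below is one way of doing that in Python; the function name title_query is invented, and upper-casing the code is an assumption based on the codes appearing in upper case in the data shown so far.

```python
import re

TEMPLATE = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>

select ?title from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code '%s'^^xsd:string.
?x course:name ?title.
}"""


def title_query(code):
    """Fill a course code into the query, rejecting anything but letters and digits."""
    if not re.fullmatch(r"[A-Za-z0-9]+", code):
        raise ValueError("unexpected course code: %r" % code)
    # Assumption: codes are stored in upper case, as with 'T215'
    return TEMPLATE % code.upper()


print(title_query("T151"))
```

Checking the code before substituting it keeps stray quote characters from breaking (or rewriting) the query.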

Having got a single query going, how can we start to elaborate on it? If you look at the location http://data.open.ac.uk/page/course/t215.html (and I know this page exists because there’s a crib/link to it on the data.open.ac.uk homepage;-), you can see a view over the data in the datastore corresponding to the course with course code T215:

Course info on data.open.ac.uk

The things down the left hand side are links that associate particular sorts of data with that course.

So for example, the OUCourseLevel property is described by the link <http://data.open.ac.uk/saou/ontology#OUCourseLevel>, which means that there is a fact in the database corresponding to something like:

<http://data.open.ac.uk/course/t215> <http://data.open.ac.uk/saou/ontology#OUCourseLevel> 3

(Actually, that character 3 is probably typed, for example as something like “3”^^xsd:string)

Can you work out how to take that information and find the OU level (as well as the title) of a course from its course code?

How about this way?

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>

select ?title ?ouLevel from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
?x <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?ouLevel.
}

If you look at the course data information page, you’ll see a variety of other linked properties there. Just follow the approach above to spot the links you want and generate your own queries. Here’s another example which you can try in the SPARQL form (or delete the lines corresponding to properties you aren’t interested in (remember to remove the query variable from the select statement at the start of the query)):

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
PREFIX ou: <http://data.open.ac.uk/saou/ontology#>

select distinct ?title ?ccode ?eucredits ?oulevel ?scqflevel ?fheqlevel from <http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:code ?ccode.
?x course:name ?title.
?x ou:FHEQLevel ?fheqlevel.
?x ou:OUCourseLevel ?oulevel.
?x ou:SCQFLevel ?scqflevel.
?x ou:eu-number-of-credits ?eucredits.
}

Note that we can also return the course code we embedded in our search query as one of the selected arguments by also putting in a variable (?ccode) associated with the course:code for the course.

Another way of finding a course by course code is to find all the courses that match the arguments specified, and then filter the results. I suspect the following query makes the server do a lot of work finding information for all manner of courses, before filtering the results to find the details for the course we want… (or maybe the query engine can optimise its way out of some of the apparent inefficiencies of this query?)

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
PREFIX ou: <http://data.open.ac.uk/saou/ontology#>

select distinct ?title ?ccode ?eucredits ?oulevel ?scqflevel ?fheqlevel from <http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code ?ccode.
?x course:name ?title.
?x ou:FHEQLevel ?fheqlevel.
?x ou:OUCourseLevel ?oulevel.
?x ou:SCQFLevel ?scqflevel.
?x ou:eu-number-of-credits ?eucredits.
FILTER(regex(str(?ccode), 'T215'))
}

How do you think you could tweak the above query to search for courses that contain ‘music’ in the ?title/course:name?

Could you tweak it further to find courses with music in the title at SCQF level 10? (Hint: level 10 is encoded in the database as '10'^^xsd:int. Which is to say, ?x ou:SCQFLevel '10'^^xsd:int. will identify courses where ?x is at ou:SCQFLevel 10…;-))
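One thing worth knowing when tweaking the FILTER(regex(…)) line is that the regex test is a search rather than a whole-string match, and that SPARQL's regex function accepts an optional third argument of flags, with 'i' meaning ignore case. The Python stand-in below only approximates that behaviour (SPARQL uses XPath-style regular expressions, which differ from Python's in some details), and the codes and titles are hypothetical.

```python
import re


def sparql_regex(text, pattern, flags=""):
    """Rough stand-in for SPARQL's regex(): unanchored search, 'i' to ignore case."""
    return re.search(pattern, text, re.IGNORECASE if "i" in flags else 0) is not None


# Hypothetical codes and titles, invented for illustration
codes = ["T215", "XT215", "T2150", "M366"]
print([c for c in codes if sparql_regex(c, "T215")])    # matches anywhere in the string
print([c for c in codes if sparql_regex(c, "^T215$")])  # anchored to the whole string

titles = ["Start listening to Music", "An example module about data"]
print([t for t in titles if sparql_regex(t, "music")])       # case-sensitive
print([t for t in titles if sparql_regex(t, "music", "i")])  # case-insensitive
```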

In the next post in this series, I’ll describe several other ways in which we can run these queries, using services such as SparqlProxy, YQL, and even Yahoo Pipes;-)