Archive for November 2010
Blogged Elsewhere: Getting Started with COINS Gov Spending Linked Data
By me, but published over on the WhereDoesMyMoneyGo blog: In at the deep end: how to get started with COINS Linked Data
[My first stumbling attempts at exploring the structure of, and trying to write queries into, the COINS Linked Data store. In this first post, I didn't actually manage to find any of the money...]
Subscription Models for Lifelong Students
Earlier today, the OU VC and other assorted dignitaries took part in a session on Preventing a funding crisis in higher education: addressing the outcomes of the spending review and Browne review. A live stream of the event was available, and encouraged the participation of a small number of interested parties on the #unifundingdebate Twitter hashtag backchannel. About two thirds of the way through the event, I managed to grab this overview of the Twitter echo-chamber that resulted:
The nodes represent individuals who used the hashtag and the lines correspond to arrows that go from an individual to the people they follow on Twitter. The nodes are sized according to the number of other hashtaggers following them; the colour (blue to red) shows how many other hashtaggers that person follows. So small red means you follow lots but aren’t followed by many, and large blue suggests lots follow you but you aren’t following them back.
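For anyone wanting to reproduce the sizing and colouring, here is a minimal Python sketch of the two counts involved. It assumes you already have a list of (follower, followed) pairs restricted to people who used the hashtag; the function name and the sample names are made up for illustration, and this is not the tool that drew the picture above.

```python
from collections import defaultdict

def hashtag_network_degrees(follows):
    """follows: iterable of (follower, followed) pairs among hashtag users.

    Returns, per person, the number of other hashtaggers following them
    (used for node size) and the number of hashtaggers they follow
    (used for node colour, blue to red).
    """
    followed_by = defaultdict(set)  # in-links: who follows this person
    following = defaultdict(set)    # out-links: who this person follows
    for follower, followed in follows:
        if follower == followed:
            continue  # ignore self-links
        following[follower].add(followed)
        followed_by[followed].add(follower)
    people = set(following) | set(followed_by)
    return {
        p: {"size": len(followed_by[p]), "colour": len(following[p])}
        for p in people
    }

# Hypothetical sample: cat follows ann and bob, ann follows bob.
sample = [("ann", "bob"), ("cat", "bob"), ("cat", "ann")]
degrees = hashtag_network_degrees(sample)
```

In this toy network bob would be a large blue node (followed by two, following none) and cat a small red one.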
Anyway, the debate (not that it really was a debate, from what I could tell?), touched in part on the role of both widening participation, and part-time study. And whilst there was some consideration of how the part-time, distance ed approach represented an alternative way of doing a degree, there seemed to be widespread, implicit agreement that the current degree model is still an appropriate one.
I’m not so sure…
The following is a bit of a ramble, and is largely off the top of my head… but it’s something that’s been in mind for a bit, and I need to start trying to articulate it and develop it a little further than random thoughts on occasional dog walks….one of which I just happen to have come back from…
A model I’m trying to pull together at the moment is based more on a situation where a student spends one or two years of quite intense, formal study getting into the swing of what independent learning might mean, albeit independent learning in the sense of no-one making you work through structured teaching materials, rather than folk learning informally from unstructured materials in an autodidactic way.
This short, intense period gives the provider an opportunity to hook the student into a subscription package (for a fee…) that will continue to provide them with educational support, training, and current awareness about dominant trends and ideas in a subject area for the rest of their career, and maybe beyond.
Other packages might support the serious hobbyist, or ‘leisure learner’, particularly in subjects such as arts appreciation (art history, for example) or ‘expert amateur’ subject areas such as astronomy.
The package will draw mainly on open content, but with added value from the way it is packaged and how it can be made available in a timely way for the subscriber. The content will also flow to the subscriber through the subscription channel from the host institution. Subscription upgrades will provide the subscriber with additional benefits, such as access to structured/vertical search contexts, or commercial academic content.
Top-up learning delivery then creates a new distribution context to support learning, where the academic passes on current awareness knowledge about an area that keeps the subscriber student informed about their subject area. It helps them keep on top of it and up to date with it. The university subscription becomes a channel for the professional updating of subscribers. Like Independent Radio News syndicates news to commercial radio stations, professional societies might also push content through the universities’ subscription channels.
Supporting such a model would require a shift in the way that the academy engages with perpetual (‘lifelong learning’?!) subscription students. Distance education models can help, but there is also the opportunity for bespoke events (lectures, public talks, privileged access to university members, evenings with…) and specialist conferences. It is noticeable that media providers such as the Guardian are already exploring such models in the news media domain.
Which is maybe where the sort of thing I try to do every day fits in? I live, by and large, through a screen, with occasional presentations at workshops, conferences and hackdays. Every day, I try to learn something new, or ask a not-obvious-to-me/Emperor’s-new-clothes type question of something I’ve come across, and use my notebook blog to record what I’ve learned.
Two or three times a week, I get a comment, tweet or email about an old post, or a current one, from someone who’s tried to make something of what I’ve done, or someone asking for an opinion about, or guidance on, something loosely related to the topics covered in OUseful.info. These requests may come from within the OU, or without. Sometimes, these are quickly answered, sometimes they set a new challenge or puzzle to solve, which in turn results in another blog post/notebook entry. By posting in public, these posts become discoverable by other people who are interested in similar topics, or are faced with similar problems/puzzles.
Through services like Twitter – which is usually open on my desktop most of the day – I often (several times a week) participate in what might be termed “distributed flash consultancy”, picking up on “anyone know…?” type questions, or coming together for a short amount of time with two or three others to tackle some easily stated puzzle or problem. I am an active participant in the LazyWeb. I treat my cognitive as surplus. Of course, my employer may take issue with that. But whenever I see emails coming round about things like “knowledge transfer”, I take heart, because I engage with knowledge transfer, albeit at the microtransaction level, several times a day, with folk from diverse professional and governmental, as well as academic, communities. This always makes people laugh, but I think it’s true: to paraphrase William Gibson, I believe that I catch glimpses of the future that is already around us (and maybe even invent additional tiny pieces of it), and try to help distribute it a little more evenly.
Thinking back to subscription models of continual top-up, drip fed education, that’s likely to be at the micro-level too. Not the macro-degree level. Maybe not even the meso-course/module level, though the provision of 20-100 hour top-up courses, or evening, half-day, or one- or two-day training courses might also fit into this scale. But the micro-level. The question you get stuck on you need the lecturer’s help on. The question other people might get stuck on. The question the academic might turn their “always on cognitive surplus” to, to help you solve your problem, and in return generate another example, another case, for a growing body of really fine grained open-educational materials. (The micro-meso boundary can blur, of course… some problems may take a day or two to work through.) And for the subscriber, after a few years of engaging with drips in a trackable way, maybe their undergraduate degree can get an upgrade (for a small, additional fee) to a Masters? (Hmmm, now where does that remind me of…?)
So here’s what I do… I participate in a knowledge sharing, creating and sharing again network. Every day. An open network. I help solve problems, and I also create new ones (often with the intent that their solutions might become good teaching or learning examples). It’s just that the transactions I do in the network, I haven’t found a way of monetising yet, which is what I suspect my employer will be increasingly keen on… But in the world of open knowledge, cash isn’t the only indicator, is it? (Journal publications are another, for example… hmmm ;-)
For all the talk in the funding crisis debate today about what new models might emerge around university funding, I think the point that there’s a global network of knowledgeable folk and open information resources has been missed. I’m not a team player, I’m a network player. And whilst some might argue that we may always need teams, I think we’ll increasingly make use of networks and ad hoc comings together too. The thing is, our higher ed institutions haven’t yet figured out how to play a pivotal role in the distribution of sound academic knowledge in a network that’s open to all. Remember when the libraries were the access point for knowledge…? I wonder when someone will come along to give HE a similar wake up call…? By which time, of course, it will be too late…
PS in passing, I just spotted this: Adrian Hon on Why free online lectures will destroy universities – unless they get their act together fast
Dewey Fencing, and Getting Started With Bibliographica Linked Data
If you follow @redjets on Twitter, you can follow the departure and arrival of the Red Funnel catamarans as they travel between Southampton and Cowes.
The service, put together by @andysc, tracks shipping movements on the Solent (AIS is the keyword, if you want to find out how;-) and watches out for the RedJet identifiers entering or leaving geographical areas around the ports. These virtual, digitally created boundaries are often referred to as geofences.
So for example, my ‘find folk tweeting near a location’ hack uses the Twitter search API to construct a geofenced locale and then search for tweets within that boundary.
But what does that have to do with Dewey? Simply this: if we have easy access to standardised classmarks, such as Dewey Decimal Classification ranges, or sets of Dewey classifications, we can create “Dewey fences” to search for books on a particular topic. (Nothing novel in this of course, but it provides a weak context for the rest of this post;-)
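To make the “Dewey fence” idea a little more concrete, here is a minimal Python sketch of testing whether a classmark falls inside a range. I’m assuming that the slashes that turn up in some classmarks (as in 591.1/5/1) are segmentation marks that can simply be dropped before comparing; the function names and the example fence are invented for illustration.

```python
from decimal import Decimal

def dewey_number(classmark):
    """Turn a classmark such as '591.1/5/1' into a comparable number.

    Assumption: '/' is a segmentation mark and can be removed.
    """
    return Decimal(classmark.replace("/", ""))

def in_dewey_fence(classmark, fence):
    """fence is a (low, high) pair of strings; low is inside, high is outside."""
    low, high = fence
    return Decimal(low) <= dewey_number(classmark) < Decimal(high)

def books_inside(books, fence):
    """books: iterable of (title, classmark) pairs; keep those inside the fence."""
    return [title for title, mark in books if in_dewey_fence(mark, fence)]

# A hypothetical fence covering the 590s.
ZOOLOGY_FENCE = ("590", "600")
```

Using Decimal rather than float avoids any rounding surprises at the edges of a fence.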
Last night, I came across a post announcing that the British Library [had] Release[d] 3 Million Records to the JISC funded OpenBib project. These records had been added to the bibliographica.org service, which just happens to have a Linked Data SPARQL endpoint. Because I’m in a phase of learning how to make sense of new (to me) datastores, I thought I’d see what I could get it to do.
The SPARQL page itself gives an example query, which identifies several of the key vocabularies (ontologies? I’m still trying to get the languaging right…;-) used in the datastore:
Here’s the example:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?book ?title ?name ?description
WHERE {
?book a bibo:Book .
?book dc:title ?title . ?title bif:contains "Edinburgh" .
OPTIONAL { ?book dc:description ?description } .
OPTIONAL {
?book dc:contributor ?author . ?author foaf:name ?name
}
} GROUP BY ?book LIMIT 50
If we look at an actual book record, we get something like this book record:
The book record page gives us the links we need in order to piece together our own queries, using the example query as an additional crib. So for example, by inspecting the link that specifies an ISBN relation on the book record page (http://purl.org/ontology/bibo/isbn; we have already declared PREFIX bibo: <http://purl.org/ontology/bibo/>, so we can write the isbn relation as bibo:isbn), I can tweak the original example query to also report the ISBN:
SELECT DISTINCT ?book ?title ?name ?description ?isbn
WHERE {
?book a bibo:Book .
# etc
?book bibo:isbn ?isbn.
# etc
}
To search for a book by ISBN, and then return its title, we might use a query like this one:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT DISTINCT ?book ?title
WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn <urn:isbn:019857519x>.
}
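If you want to generate that lookup for arbitrary ISBNs, a small Python helper can fill in the template. This is only a sketch: the helper name is made up, and I’m assuming the store keeps ISBNs as unhyphenated, lower-case urn:isbn: URIs, because that is the form used in the query above.

```python
import re
from string import Template

# The title-by-ISBN query from above, with a placeholder for the ISBN.
TITLE_BY_ISBN = Template("""PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT DISTINCT ?book ?title
WHERE {
  ?book a bibo:Book .
  ?book dc:title ?title .
  ?book bibo:isbn <urn:isbn:$isbn> .
}""")

def title_by_isbn_query(isbn):
    """Return the SPARQL text for looking up a book title by ISBN."""
    cleaned = isbn.replace("-", "").replace(" ", "").lower()
    # Accept ISBN-10 (possibly ending in x) or ISBN-13; reject anything else
    # so stray characters never end up inside the query.
    if not re.fullmatch(r"[0-9]{9}[0-9x]|[0-9]{13}", cleaned):
        raise ValueError("not an ISBN: %r" % isbn)
    return TITLE_BY_ISBN.substitute(isbn=cleaned)

query = title_by_isbn_query("0-19-857519-X")
```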
Which is where Dewey classifications make their first appearance. Looking at the book record, we see there may be several subject terms associated with a book, including what are presumably Dewey classmarks. This means that given a book by ISBN, we should be able to look up its classmark, and then other books with the same classmark. We can also look up related books using the keyword subject terms, which may or may not conform to controlled vocabulary terms. In terms of fencing, we might also be able to take sets of books – such as books on a reading list – to create topical “Dewey fenced” areas that define a set of classmarks that are all associated in some way (e.g. the topic that forms the area of study for a given reading list). I’m not sure if these sets are likely to be useful in any way, but they’d allow us to ask the question about the extent to which a reading list models a Dewey-style view of the world, or whether it is “multi-disciplinary” (at least, according to Dewey…;-) The reason why this is interesting (to me, at least) is because to a certain extent, physical libraries are serendipity engines as well as discovery engines, based on the way books are physically laid out and associated (or not) with one another; and one of the things I’m interested in is useful, serendipitous discovery…
Anyway, back to code geekery… Looking at the book page, we can find out how to grab a list of subject terms – http://purl.org/dc/terms/subject (which we can write as dc:subject given the PREFIXes already declared) is the relation we want. Unfortunately, it’s not that simple, because the subject term doesn’t always map directly to the values displayed in the subject area of the book page. The following query tries to unpick just what the displayed subject terms refer to:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?book ?title ?subject ?label ?property ?value
WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn <urn:isbn:019857519x>.
?book dc:subject ?subject.
OPTIONAL { ?subject rdfs:label ?label }
OPTIONAL { ?subject ?property ?value }
} GROUP BY ?book order by ?subject LIMIT 50
And here’s the result:
We can also see how messy things are by looking at one of the other representations of the book page which unpacks the dc:subject values as follows:
dc:subject [ a skos:Concept;
skos:inScheme ;
skos:prefLabel "Genetics." ],
[ a skos:Concept;
skos:inScheme ;
skos:prefLabel "Evolution (Biology)" ],
[ a skos:Concept;
skos:inScheme ;
skos:prefLabel "Behavior genetics." ],
[ rdfs:label "Animals" ],
[ rdfs:label "Behaviour" ],
[ rdfs:label "expounded by" ],
[ rdfs:label "theories of survival of species" ],
[ rdfs:label "Animals" ],
[ rdfs:label "Genes" ],
[ a skos:Concept;
skos:inScheme ;
skos:notation "591.5"^^ ],
[ a skos:Concept;
skos:inScheme ;
skos:notation "591.1/5/1"^^ ],
[ a skos:Concept;
skos:inScheme ;
skos:notation "591.5"^^ ],
[ a skos:Concept;
skos:inScheme ;
skos:notation "591.1/5"^^ ];
So by my reckoning, this query should get us the Dewey Decimal classification(s) for a book given its ISBN:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?book ?title ?classmark
WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn <urn:isbn:019857519x>.
?book dc:subject ?subject.
?subject a skos:Concept;
skos:inScheme <http://dewey.info/scheme/e18>;
skos:notation ?classmark
}
which gives as a result:
We should now be able to find other books with the same classmark by extending the query, I’m guessing like this????
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?othertitle ?otherisbn
WHERE {
?book a bibo:Book .
?book bibo:isbn <urn:isbn:019857519x>.
?book dc:subject ?subject.
?subject a skos:Concept; skos:inScheme <http://dewey.info/scheme/e18>; skos:notation ?classmark.
?otherbook a bibo:Book .
?otherbook dc:subject ?subject.
?otherbook dc:title ?othertitle.
?otherbook bibo:isbn ?otherisbn.
} LIMIT 50
(Hmmm… how do I say ?book is not equal to ?otherbook?)
…only, I can’t check it right now because bibliographica has stopped playing again….:-( Which is a lesson to be learned, I guess… If you’re running Linked Data queries across multiple different services, if one of those services goes down, things can break…
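On the “not equal” question: standard SPARQL has a FILTER clause for this, so FILTER (?book != ?otherbook) should do it. The Python sketch below wraps a variant of the query in a helper (the helper name is invented). Two caveats: it has not been run against bibliographica, and it joins the two books on the ?classmark value rather than on a shared ?subject node, on the assumption that the subject entries are anonymous nodes (as the bracketed n3 listing suggests) and so would never be shared between books.

```python
from string import Template

SAME_CLASSMARK = Template("""PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?othertitle ?otherisbn
WHERE {
  ?book a bibo:Book ;
        bibo:isbn <urn:isbn:$isbn> ;
        dc:subject ?subject .
  ?subject skos:inScheme <http://dewey.info/scheme/e18> ;
           skos:notation ?classmark .
  ?otherbook a bibo:Book ;
             dc:subject ?othersubject ;
             dc:title ?othertitle ;
             bibo:isbn ?otherisbn .
  # Join on the classmark value, not on the subject node itself.
  ?othersubject skos:notation ?classmark .
  # Exclude the book we started from.
  FILTER (?book != ?otherbook)
} LIMIT 50""")

def same_classmark_query(isbn):
    """Return SPARQL text for finding other books sharing a Dewey classmark."""
    return SAME_CLASSMARK.substitute(isbn=isbn)

query_text = same_classmark_query("019857519x")
```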
PS that query didn’t work in the end… in the meantime, here’s a query for looking up books by classmark:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?title ?isbn
WHERE {
?book a bibo:Book .
?book dc:title ?title.
?book bibo:isbn ?isbn.
?book dc:subject ?subject.
?subject a skos:Concept; skos:inScheme <http://dewey.info/scheme/e18>; skos:notation '591.5'^^<ddc:Notation>.
} LIMIT 50
(I found the ^^<ddc:Notation> crib in the n3 version of the book information page… I’m not sure what the ddc prefix is supposed to be, although the query seems to run without me having to declare it explicitly?
[ a skos:Concept;
skos:inScheme <http://dewey.info/scheme/e19>;
skos:notation "591.5"^^<ddc:Notation> ],
UPDATE: doh! the < .. > means it doesn’t need unpacking, right?)
Should Academic Journal Papers Have Video Trailers?
I don’t read academic journal papers very much any more, partly because folk rarely link to them, but today I read a paper (“Narrative Visualization: Telling Stories with Data”, Edward Segel, Jeffrey Heer, IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis), 2010) in response to this video trailer that brought it to my attention (Journalism in the Age of Data, Ch. 3: Telling “Data Stories”):
I encourage you to watch the video – not necessarily for what it’s about, but for the way that a journal article is used to hold bits of the video together. Note that the video is not just about the paper, but it’s not hard to see how a video could be made that was just about the paper…
So I wonder: should we be making voiced over “papercasts” of academic papers to provide a quick summary of what they contain, and maybe also enriching them with photos and footage relating to what the content of the paper is about? (I know this might not make sense for the subject matter of every paper, but if a journal paper is about a particular online tool, for example, here would be an opportunity to show a few seconds of the tool in use, and contextualise it/demonstrate it a little more interestingly than a single, simple screenshot can convey?)
UPDATE: @der_no tweets: “Always enjoyed technical papers preview @ #SIGGRAPH (esp considering many of actual papers are beyond me)” See an example conference papers trailer here – SIGGRAPH 2010 : Technical Papers Trailer:
If the conference matter is appropriate (robotics related conferences come to my mind, for example), couldn’t this sort of approach provide an additional legacy resource that can continue to give an event life after the fact?
PS I believe that several of the OpenLearn folk are also looking at ways of pulling together video and audio in the way they package their material, for example looking at the use of Xtranormal videos, or Slideshare slidecasts. (Note that it’s easy (or used to be!) to publish Xtranormal clips into Youtube, and Youtube clips can also be embedded in Slideshare presentations, so all manner of fusions of content become possible!)
PPS Very, very loosely related to the above is another thread I want to link in to, here. That is, the extent to which academics might take up various sorts of (“new”) media training to explore different ways of engaging with (and maybe helping reinvent?) scientific communication. For example, a recent initiative in the OU has seen more than a few brave academic volunteers engaging in podcast training as part of Martin’s Podstars project (I couldn’t find a better link?!).
Running parallel to this, the OBU’s media training team have been helping other academics put together short showreels that have since been published on the OU podcast site – OU Experts:
In terms of finding training materials that are already out there, it struck me that the BBC College of Journalism might be a good start, particularly in the skills area?
New “Study at OU” iOS app for iPhone, iPad and iPod Touch
I haven’t managed to find an official announcement of this yet, but it seems as if there’s a new addition to the Apple App Store in the form of a “Study at the OU” app: StudyAtOU.
So what does it do? Essentially, it seems to provide an app way in to the Open University course catalogue. Here’s the opening screen:
(Err – wot? No OU logo? Is it not an *official* app then?! Though the iTunes does carry branding… If it is an official app, how well does it sit with OU ice/brand police guidelines, I wonder…?;-)
The app provides a convenient way of navigating through the OU’s course catalogue, providing a brief overview of subject areas:
qualifications:
and research degrees:
Clicking through on individual course links also leads through to a description of the corresponding course.
Looking through some of the descriptions, it seems as if there isn’t any information about forthcoming presentation dates, course fees, and other ‘administrative’ information (such as level information), nor does there appear to be a ‘click here to register’ option. (Hmm… I don’t think the OU course registration system accepts Paypal yet? If it did, I guess something like the PayPal Mobile Payments library would allow this to be integrated into the StudyAtOU app?)
If you want to share a link to a course with other people, there are several ways of doing this:
Facebook and Twitter based sharing requires you share access to those services, of course. Here’s the prompt you get when you try to share a link using Facebook:
and the corresponding prompt from Twitter:
(By the by, seeing this app reminds me of my old, old WAP demo for navigating Relevant Knowledge short course descriptions… TSCP Experimental WAP Service;-)
Also in passing, I asked where the data that feeds the app comes from. It seems to be an XML source that also feeds the data.open.ac.uk Linked Data store, rather than being fed from the Linked Data Store itself. Whilst the Linked Data store was presumably not available whilst the app was being developed, it would be nice to think that services such as the app might call on the datastore and actually start using the data contained within it.
One thing worth noting about the app is that it is not self-contained. The qualification, subject area and course descriptions are all pulled in live from the web, which means if you lose your network connection, you get this:
In that sense, the StudyAtOU app is a hybrid app, with a small amount of offline functionality, complemented by network sourced content. I guess there’s a trade-off going on here between connectivity/bandwidth requirements and memory requirements/app size.
If the app was displaying course start dates/pricing, then I could see why there might be a good reason for the app to want to display ‘live’ data, but for course descriptions? (I guess you could argue that you always want the catalogue to only list current courses, but that could easily be managed with a time to live field associated with each course on the app, given the review dates for the expiry of courses are well known?)
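As a rough illustration of the time-to-live idea, here is a Python sketch of an on-device cache in which each course description carries an expiry date. The class name, the description text and the dates are all hypothetical; the real app may work quite differently.

```python
from datetime import date

class CourseCache:
    """Keeps course descriptions locally until their expiry date passes."""

    def __init__(self):
        self._entries = {}  # course code -> (description, expires_on)

    def store(self, code, description, expires_on):
        self._entries[code] = (description, expires_on)

    def lookup(self, code, today=None):
        """Return the cached description, or None if missing or expired."""
        today = today or date.today()
        entry = self._entries.get(code)
        if entry is None:
            return None
        description, expires_on = entry
        if today >= expires_on:
            # Past its time to live: drop it so the course is no longer listed.
            del self._entries[code]
            return None
        return description

cache = CourseCache()
# Hypothetical expiry date, standing in for a course's known review date.
cache.store("T215", "Communication and information technologies", date(2011, 1, 1))
```

A lookup that returns None would be the cue to go back to the network, so the app only needs a connection when something is missing or stale.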
Anyway, the app’s there, and usable, and available for download, and offers a great starting point…but will it be allowed to evolve, I wonder, in an agile fashion, now that it’s there…?
PS just because, I wondered where http://www.open.ac.uk/studyatou might point to. At the moment, it seems to resolve to http://www3.open.ac.uk/study/explained/how-do-i-apply.shtml:
The top-level “Study at OU” web presence, and the one you get to from the OU website’s Study at the OU top level navigation actually seems to have its home at http://www3.open.ac.uk/study/. Just sayin’…;-)
Time, Yet, for Twitter Captions on BBC iPlayer Content?
A couple of days ago, the Guardian reported a quote from Dimblebobs about Question Time being bigger than X-Factor on Twitter (How Question Time got as big as The X Factor on Twitter); so when are we going to see optional Twitter captions made available, either in real time, or on catch-up services such as iPlayer? (If you haven’t been keeping up: Twitter captions/subtitles are captions generated as an overlay for a video based on tweets from members of a particular Twitter list, or using a particular hashtag. (In the future, it might also be worth considering the capture of tweets based on location?) Martin Hawksey has been developing several tools in this area: Twitter subtitling. His most recent demonstration – iTitle: Full circle with Twitter subtitle playback in YouTube (ALT-C 2010 Keynotes) – describes how videos of the ALT-C 2010 keynotes have been recently republished along with searchable Twitter captions).
As Martin hinted at in What they were saying: Leaders debate on BBC iPlayer with twitter subtitles from parliamentary candidates and in the comments to that post, the volume and rate of production of tweets for a popular live event may be too great to display them all via the caption feed and still give the viewer time to read them. Which means, for heavy volume backchannels, tweets need filtering or sampling (ideally in a way that avoids undue bias?) in order to limit the number (and quality?) of tweets that are actually displayed as captions. So what are the options?
First of all, we should distinguish whether we intend to work on a live feed, or an archive feed. An archive feed means that samples or filters can be in part tuned according to a post hoc analysis of all the tweets; whereas the live feed may either work in a stateless way, judging whether or not to show any individual tweet based solely on its own merits, (for example, showing any particular tweet with given, fixed probability p), or based at least in part on the history of tweets already observed.
I think we should also distinguish between sampling of Tweets, versus filtering them. By sampling, I mean selecting each individual tweet according to probability p independently of any other information; by filtering, I mean selecting a tweet based on it or its metadata containing a particular term (for example: only selecting tweets from certain individuals, block tweets starting RT, and so on). Note that both sampling and filtering may feature in the selection of tweets for display, in either order (sample, then filter, or filter then sample), or in more elaborate combinations (sample, filter, sample, for example).
So what strategies are there..? Note that this isn’t a very principled list (been a long day!), and it is likely to be far from complete, but it’s a start, and something to mull over at least…
Sampling
- display every n‘th tweet;
- display the most recently received tweet in the last x seconds every y seconds;
- display any given tweet with fixed probability, p.
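The three sampling strategies can be sketched in a few lines of Python. The function names are mine, and the third function is just one reading of “most recent tweet in the last x seconds every y seconds”; it assumes tweets arrive as (seconds since start, text) pairs already sorted by time.

```python
import random

def every_nth(tweets, n):
    """Display every n'th tweet."""
    return [t for i, t in enumerate(tweets, start=1) if i % n == 0]

def with_probability(tweets, p, rng=None):
    """Display each tweet independently with fixed probability p."""
    rng = rng or random.Random()
    return [t for t in tweets if rng.random() < p]

def latest_in_window(timed_tweets, x, y, duration):
    """Every y seconds, display the most recent tweet from the last x seconds.

    timed_tweets: list of (seconds, text) pairs sorted by time.
    Returns (display_time, text) pairs; ticks with no recent tweet show nothing.
    """
    shown = []
    for tick in range(y, duration + 1, y):
        recent = [text for (ts, text) in timed_tweets if tick - x < ts <= tick]
        if recent:
            shown.append((tick, recent[-1]))
    return shown

# Hypothetical timed tweets for illustration.
timed = [(1, "a"), (4, "b"), (9, "c"), (21, "d")]
```

Note that only the second of these is sampling in the strict “independent probability p” sense; the other two are deterministic thinning rules.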
Historyless Filtering
- filter out retweets (items starting RT);
- filter out tweets sent to a person (tweets starting @). (Note that this does mean we limit the extent to which conversations might be displayed);
- filter tweets based on some function of the number of friends and/or followers a sender has;
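Because historyless filters look at one tweet at a time, they are easy to sketch. In the Python below, a tweet is assumed to be a dict with a "text" field and a "followers" count, and the follower rule is a simple minimum threshold; both the structure and the threshold are illustrative assumptions, not anything Twitter or iTitle prescribes.

```python
def is_retweet(text):
    """True if the tweet starts with the token RT (or RT:)."""
    words = text.split()
    return bool(words) and words[0].rstrip(":") == "RT"

def is_directed(text):
    """True if the tweet is sent to a person, i.e. starts with @."""
    return text.lstrip().startswith("@")

def keep(tweet, min_followers=0):
    """Decide whether to display a tweet using only the tweet itself."""
    return (
        not is_retweet(tweet["text"])
        and not is_directed(tweet["text"])
        and tweet["followers"] >= min_followers
    )

# Hypothetical tweets for illustration.
tweets = [
    {"text": "RT @x: good point", "followers": 10},
    {"text": "@bob hello", "followers": 10},
    {"text": "Interesting point about fees", "followers": 10},
]
displayed = [t["text"] for t in tweets if keep(t, min_followers=5)]
```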
History-based Filtering
- filter based on the number of tweets the user has already sent;
- filter based on properties of the hashtag network (for example, the number of hashtaggers following an individual). I have classed this as a history-based filter because we need some knowledge of the hashtag community, generated from a history of tweets, in order to calculate hashtag network metrics;
- filter based on the extent to which tweets are apparently part of a conversation thread (e.g. construct a conversation graph in which @a mentions @b and @b mentions @a, and select all conversations greater than a particular length). Note that we might combine this condition with other conditions, such as “where @a and @b share more than m common followers”.
Note that the filtering approach may be used to either filter out tweets and prevent them from being displayed, or select tweets according to a particular set of criteria that means they should be displayed. In addition, filtering may be deterministic or combined with a probabilistic sampling mechanism. For example, we may choose to display a tweet with probability p where p is a function of some ranking factor with value f. An alternative approach might be to generate a score for each tweet based on one or more ranking factors as described in the filter considerations above, rank the tweets by score, and then display the one with the highest score at any given time.
The history based approach may be used in real time, making selections based on the tweets observed (and/or maybe just the tweets displayed) so far (until now history), or, in cases where a Twitter caption file is being generated after the fact, through analysis of the whole hashtag archive corpus (total archive). So for example, it might be that the caption file is generated after the event for use only by catch-up viewers, with the expectation that live viewers would be able to entertain themselves directly from a live Twitter feed in their own client.
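As a sketch of the conversation idea, the Python below builds a “who mentions whom” map from the tweets seen so far and picks out pairs who have mentioned each other. It only detects mutual mentions; measuring conversation length or common followers would need more data. The function name and sample tweets are invented.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")

def mutual_mention_pairs(tweets):
    """tweets: iterable of (sender, text) pairs seen so far.

    Returns the set of pairs {a, b} where a has mentioned b and b has
    mentioned a, which is a crude signal that they are in conversation.
    """
    mentions = defaultdict(set)
    for sender, text in tweets:
        for name in MENTION.findall(text):
            mentions[sender.lower()].add(name.lower())
    pairs = set()
    for a, targets in mentions.items():
        for b in targets:
            if a != b and a in mentions.get(b, ()):
                pairs.add(frozenset((a, b)))
    return pairs

# Hypothetical history of tweets.
history = [
    ("a", "@b what do you think?"),
    ("b", "@a not sure"),
    ("c", "@a hello"),
]
conversations = mutual_mention_pairs(history)
```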
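Both ideas in that paragraph can be sketched in Python: turning a ranking factor f into a display probability p, and picking the top scoring tweet from the current candidates. The linear scaling of f is purely an assumption for illustration (any monotonic function would do), as are the names and the "score" field.

```python
import random

def display_probability(f, f_max):
    """Map a ranking factor f onto a probability p in [0, 1].

    Assumption: simple linear scaling against a maximum value f_max.
    """
    if f_max <= 0:
        return 0.0
    return max(0.0, min(1.0, f / f_max))

def maybe_display(tweet, f_max, rng):
    """Probabilistic filter: show the tweet with probability p(f)."""
    return rng.random() < display_probability(tweet["score"], f_max)

def top_scoring(candidates):
    """Deterministic alternative: show the highest scoring candidate."""
    return max(candidates, key=lambda t: t["score"]) if candidates else None

# Hypothetical scored tweets.
candidates = [{"text": "a", "score": 1}, {"text": "b", "score": 3}]
best = top_scoring(candidates)
```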
Getting Started With data.open.ac.uk Course Linked Data
What OU courses/modules contain the word ‘music’ in the title? Can we use the OU’s Linked Data to find out?
Earlier this week, an OU press release proudly announced that “[t]he Open University (OU) is the first university in the UK to open up access to online data from across the institution as part of the Linked Open Data Movement.” Several other HEIs have also been exploring Linked Data releases ([lazyweb]: please feel free to add a link if that includes your institution in the comments to this post, and I’ll then collate them here…) but the OU’s LUCERO project is looking across units, and getting real data out, which I think is where the novelty lies?
“What this means,” explained David Matthewman, Chief Information Officer at The Open University, “is that members of the public, students, researchers and organisations will be able to easily search, extract and, more importantly, reuse The Open University’s information and data.”
Having the data available is one thing, of course, but what might it actually be useful for? And how easy (or difficult) is it to get started with accessing this data?
In this post, I’ll try to give some sample queries on the courses data that you can use as a basis for your own queries. This will get a little technical, and it will involve writing queries using a language called SPARQL, so if you have no idea what that is, you’d better read this first: My Understanding of SPARQL, the First Attempt…
Interrogating Linked Data such as that published at data.open.ac.uk requires three things:
- a query that states in a particular way the question we want to ask of the database and the result fields we want returned;
- the specification of a query endpoint, which tells the thing making the query where the query interface to the database to be queried lives;
- some sort of engine or mechanism for actually making the query and displaying the results.
The query form at http://data.open.ac.uk/query handles the second and third of these for us, which means all we need to do is provide the query itself:
As is the way of most Linked Data tutorials I’ve seen, I’m probably now going to scare you off… I had wanted to start with a really, really simple and plausibly useful query (?!), but the most obvious query I can think of – looking up the name of a course from its course code – requires some syntactic clutter that makes the query look harder to read than it actually is… such is life!
At its heart, the query we want to ask is something like the following (T215 is the course code of interest):
select ?title where {
?x a course:Module.
?x course:code 'T215'.
?x course:name ?title.
}
This query says something along the following lines: “for something we’ll call ?x, that we require to be a course:Module, and that has the course code ‘T215’, find me a value of ?title corresponding to its course:name”
Unfortunately, if we run the query as stated above, it won’t work, because the facts that are encoded in the database that satisfy the above query look like this:
<http://data.open.ac.uk/course/t215> a <http://purl.org/vocab/aiiso/schema#Module>
<http://data.open.ac.uk/course/t215> <http://purl.org/vocab/aiiso/schema#code> 'T215'^^<http://www.w3.org/2001/XMLSchema#string>
<http://data.open.ac.uk/course/t215> <http://purl.org/vocab/aiiso/schema#name> "Communication and information technologies"^^xsd:string
In order to make the query work, we need to write it something like this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
select ?title where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
}
(The ^^xsd:string monstrosity is essentially a datatype label attached to the value, in this case saying that it’s a string. As with many programming languages, data can be typed in various ways – sets of characters as strings, as in this example, or numbers as integers (^^xsd:int), for example. If you get the type wrong, and it’s required, the match won’t work…)
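If it helps, here's a toy Python model of why the untyped query comes back empty. It's only a sketch: it assumes the store treats a plain 'T215' and a 'T215'^^xsd:string as two different values, which is what the failed query suggests. (Later versions of the RDF spec fold the two together, so not every store will behave like this.) The Literal type and the matches function are my own inventions, not part of any library.

```python
from collections import namedtuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# A literal is a lexical form plus an optional datatype URI
Literal = namedtuple("Literal", ["lexical", "datatype"])

# The fact as stored: the course code is an xsd:string typed literal
stored_code = Literal("T215", XSD + "string")

# What the two versions of the query ask for
plain_code = Literal("T215", None)            # 'T215'
typed_code = Literal("T215", XSD + "string")  # 'T215'^^xsd:string

def matches(pattern_value, stored_value):
    """Toy matcher: both the characters and the datatype must agree."""
    return pattern_value == stored_value

print(matches(plain_code, stored_code))  # False
print(matches(typed_code, stored_code))  # True
```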
If you copy and paste that query into the form at http://data.open.ac.uk/query, you should get the following result when you submit the form:
Of course, if you’re a real Linked Data person, then you’d probably write the query in the following equivalent, but difficult to read way!:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?title from
<http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://purl.org/vocab/aiiso/schema#code> 'T215'^^xsd:string.
?x <http://purl.org/vocab/aiiso/schema#name> ?title.
}
If the above freaks you out/does your head in, it does the same to mine;-) Just remember, a lot of that stuff is syntactic baggage that’s just required to make things work…
Also note that most LD folk wouldn’t have used “course” in the following prefix:
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
Instead, I suspect they’d have gone for something like aiiso:, (unless a conventional prefix exists for that vocabulary?); but I was trying not to frighten you off too early…!;-)
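To see that a prefix really is just shorthand, here's a small Python sketch of the expansion the query engine does for us. The expand function is a made-up helper for illustration, not anything the datastore provides.

```python
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "course": "http://purl.org/vocab/aiiso/schema#",
}

def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as course:Module into a full <URI>."""
    prefix, _, local = term.partition(":")
    if prefix not in prefixes:
        raise KeyError("No PREFIX declared for '%s:'" % prefix)
    return "<" + prefixes[prefix] + local + ">"

print(expand("course:Module"))
print(expand("course:code"))

# The label is arbitrary: swap in aiiso: and the expansion is identical
assert expand("aiiso:name", {"aiiso": PREFIXES["course"]}) == expand("course:name")
```

So whether you call the prefix course: or aiiso:, the query that actually gets run is the same.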
As I mentioned above, the OU is looking to open up data as Linked Data in the same database from across the university. I think the following modified query tries to help the database out a little by telling it what sort of data we’re interested in, so it has a head start on knowing to look for course related data, compared to open repository/ORO reference data, which also exists in the datastore, for example:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
select ?title from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
}
How would you look up the title of the course with code T151? Or M366?
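If you'd rather not hand-edit the query each time, one option is to treat it as a template and drop the course code in. Here's a rough Python sketch; the title_query function is hypothetical, and I'm assuming course codes are stored in upper case (as T215 is) and contain only letters and digits.

```python
QUERY_TEMPLATE = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
select ?title from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code '%s'^^xsd:string.
?x course:name ?title.
}"""

def title_query(course_code):
    """Return the title lookup query for one course code."""
    # Assumption: codes are stored in upper case, like T215
    code = course_code.strip().upper()
    # Refuse anything that isn't a plain alphanumeric code
    if not code.isalnum():
        raise ValueError("Unexpected course code: %r" % course_code)
    return QUERY_TEMPLATE % code

for code in ["T151", "M366"]:
    print(title_query(code))
    print()
```

The check on the code is there so that stray quote marks can't break the query, or rewrite it into something else.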
Having got a single query going, how can we start to elaborate on it? If you look at the location http://data.open.ac.uk/page/course/t215.html (and I know this page exists because there’s a crib/link to it on the data.open.ac.uk homepage;-), you can see a view over the data in the datastore corresponding to the course with course code T215:
The things down the left hand side are links that associate particular sorts of data with that course.
So for example, the OUCourseLevel property is described by the link <http://data.open.ac.uk/saou/ontology#OUCourseLevel>, which means that there is a fact in the database corresponding to something like:
<http://data.open.ac.uk/course/t215> <http://data.open.ac.uk/saou/ontology#OUCourseLevel> 3
(Actually, that character 3 is probably typed, for example as something like “3”^^xsd:string.)
Can you work out how to take that information and find the OU level (as well as the title) of a course from its course code?
How about this way?
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
select ?title ?ouLevel from
<http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:name ?title.
?x <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?ouLevel.
}
If you look at the course data information page, you’ll see a variety of other linked properties there. Just follow the approach above to spot the links you want and generate your own queries. Here’s another example which you can try in the SPARQL form; delete the lines corresponding to any properties you aren’t interested in, remembering to remove the matching query variable from the select statement at the start of the query:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
PREFIX ou: <http://data.open.ac.uk/saou/ontology#>
select distinct ?title ?ccode ?eucredits ?oulevel ?scqflevel ?fheqlevel from <http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code 'T215'^^xsd:string.
?x course:code ?ccode.
?x course:name ?title.
?x ou:FHEQLevel ?fheqlevel.
?x ou:OUCourseLevel ?oulevel.
?x ou:SCQFLevel ?scqflevel.
?x ou:eu-number-of-credits ?eucredits.
}
Note that we can also return the course code we embedded in our search query as one of the selected arguments by also putting in a variable (?ccode) associated with the course:code for the course.
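Keeping the select variables and the property lines in step by hand is easy to get wrong, so here's a Python sketch that generates both from a single list of properties. The course_query function and the WANTED dictionary are my own scaffolding; the property names are the ones used in the query above.

```python
# Properties we might ask for, keyed by the variable that receives each value.
# Delete an entry and it vanishes from both the select and the where clause.
WANTED = {
    "title": "course:name",
    "ccode": "course:code",
    "eucredits": "ou:eu-number-of-credits",
    "oulevel": "ou:OUCourseLevel",
    "scqflevel": "ou:SCQFLevel",
    "fheqlevel": "ou:FHEQLevel",
}

HEADER = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
PREFIX ou: <http://data.open.ac.uk/saou/ontology#>"""

def course_query(course_code, wanted=WANTED):
    """Build a query returning the wanted properties for one course code."""
    variables = " ".join("?" + name for name in wanted)
    lines = [
        HEADER,
        "select distinct %s from <http://data.open.ac.uk/context/course> where {" % variables,
        "?x a course:Module.",
        "?x course:code '%s'^^xsd:string." % course_code,
    ]
    lines += ["?x %s ?%s." % (prop, name) for name, prop in wanted.items()]
    lines.append("}")
    return "\n".join(lines)

# Just the title and OU level this time
print(course_query("T215", {"title": "course:name", "oulevel": "ou:OUCourseLevel"}))
```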
Another way of finding a course by course code is to find all the courses that match the arguments specified, and then filter the results. I suspect the following query makes the server do a lot of work finding information for all manner of courses, before filtering the results to find the details for the course we want… (or maybe the query engine can optimise its way out of some of the apparent inefficiencies of this query?)
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX course: <http://purl.org/vocab/aiiso/schema#>
PREFIX ou: <http://data.open.ac.uk/saou/ontology#>
select distinct ?title ?ccode ?eucredits ?oulevel ?scqflevel ?fheqlevel from <http://data.open.ac.uk/context/course> where {
?x a course:Module.
?x course:code ?ccode.
?x course:name ?title.
?x ou:FHEQLevel ?fheqlevel.
?x ou:OUCourseLevel ?oulevel.
?x ou:SCQFLevel ?scqflevel.
?x ou:eu-number-of-credits ?eucredits.
FILTER(regex(str(?ccode), 'T215'))
}
How do you think you could tweak the above query to search for courses that contain ‘music’ in the ?title/course:name?
Could you tweak it further to find courses with music in the title at SCQF level 10? (Hint: level 10 is encoded in the database as '10'^^xsd:int. Which is to say ?x ou:SCQFLevel '10'^^xsd:int. will identify courses where ?x is at ou:SCQFLevel 10…;-)
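One thing worth knowing before you start tweaking: the regex filter matches anywhere in the string, and it's case sensitive unless you pass an "i" flag as a third argument. Here's a rough Python stand-in that shows the behaviour. Python's regular expressions aren't identical to the ones SPARQL uses, so treat this as an approximation, and note that the codes and titles are made up for illustration.

```python
import re

def sparql_regex(text, pattern, flags=""):
    """Rough stand-in for SPARQL's regex(text, pattern, flags)."""
    return re.search(pattern, text, re.IGNORECASE if "i" in flags else 0) is not None

# Made-up course codes and titles, purely for illustration
print(sparql_regex("T215", "T215"))        # True
print(sparql_regex("XT2150", "T215"))      # True: a match anywhere counts
print(sparql_regex("XT2150", "^T215$"))    # False: anchored to the whole code
print(sparql_regex("Music in context", "music"))       # False: case matters
print(sparql_regex("Music in context", "music", "i"))  # True with the i flag
```

In query terms, that suggests a filter along the lines of FILTER(regex(str(?title), 'music', 'i')) if you want to catch Music as well as music.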
In the next post in this series, I’ll describe several other ways in which we can run these queries, using services such as SparqlProxy, YQL, and even Yahoo Pipes;-)
All Jobs Digital at the OU…
Is something in the air?
Currently advertised on the OU jobs website are the following:
- Information Technology, Assistant Director IT Programmes
What’s it worth? Senior Salary band 3 £79,171- £91,167
This is a challenging time for the Higher Education sector which is seeing significant changes in state funding. The entry of more private providers has also increased the consumer power of fee-paying students, placing a premium on service delivery and cost efficiency.
The University recently appointed its first Chief Information Officer and he is now building the Systems Futures programme and the IT executive team which will lead the work in delivering the systems to enable the University to achieve its ambitions. The OU is well placed to respond to these challenges and this team will play a crucial role in enabling the University to adapt and be successful in this changing environment.
We are seeking a dynamic, innovative and value focused individual to join this team and support the development of robust, enterprise-scale IT Systems to lead the University into its next chapter.
You will play a key role in the transformation of the OU’s systems and capability. You will achieve this by leading the Systems Futures programme which will establish the future organisational process and service needs and deliver the systems to meet them.
- Information Technology, Assistant Director – IT Service Delivery
What’s it worth? Senior Salary band 2 £69,574 – £81,570
We are seeking a dynamic, tenacious and value & service focused individual to join this team and support the development and operational delivery of robust, enterprise-scale IT services to lead the University into its next chapter.
You will play a key role in the transformation of the OU’s IT services and capability. You will achieve this by championing and leading the drive for service excellence and the introduction of world class methods and standards.
- Assistant Director IT Development, Information Technology
What’s it worth? £79,171- £91,167
We are seeking a dynamic, innovative and value focused individual to join this team and support the development of robust, enterprise-scale IT Systems to lead the University into its next chapter.
You will play a key role in the transformation of the OU’s systems and capability as well as providing effective business-as-usual delivery using a mix of bespoke software and package solutions. You will achieve this by establishing the IT Development Strategy and take advantage of industry best practice.
- Media Project Manager, Learning and Teaching Solutions (LTS)
What’s it worth? Temporary contract until 31 July 2011, £36,715 – £43,840
Do you have an excellent record of managing media production and building successful relationships with clients? Would you enjoy leading a team of media specialists working on a portfolio of distance-teaching modules?
As Media Project Manager, you will manage the development, production and maintenance of teaching materials across a range of media, from early planning through to the finished product. You will work closely with academic and media production staff on specifications, product development and delivery of module materials to quality, time and budget.
- Research Assistant/Research Associate in Digital Scholarship, Institute of Education Technology
What’s it worth? £27,319 – £35,646 p.a, Temporary until 31 July 2011, Part-time and job share applications considered
We wish to develop work aimed at understanding the changes in communication and publication practices of academic researchers in higher education due to the impact of the information age. The impact consists of a changed landscape which offers researchers new ways of working and offers new kinds of academic output for educators to use in their teaching. The focus therefore is a programme of research to explore the ways that changes in technology are changing the discourse and practices of academic researchers, and the consequences that these changes have for professional practices of educators.
- Two Research Fellowships in Technology Enhanced Learning, Institute of Educational Technology
What’s it worth? Temporary posts for 24 months, £36,715-£43,840 p.a.
Research Fellowship in Technology Enhanced Learning (Educational Futures)
Research Fellowship in Technology Enhanced Learning (Pervasive, Ubiquitous and Ambient Computing)
Applications are sought for two independent researchers to conduct a programme of research linking the work of the Institute of Educational Technology (IET) to colleagues in the Centre for Education and Educational Technology. The general themes of the research would be ‘Educational Futures’ and ‘Pervasive, Ubiquitous and Ambient Computing’ and candidates would be expected to specify a 2 year programme of work. The proposal should demonstrate how this work would be relevant to at least one of the Digital Scholarship, Next Generation Distance Learning or Learning in an Open World research programs running in IET. Researchers would be required to demonstrate a strong record of publication and a track record of successful external funding bids.
Hmmm… where did it all go so wrong for me…?!;-)
PS what’s a finder’s fee on a £90k job? If by chance you saw the ad here, go for the post as a result, and get it, you owe me a coffee. Every time I’m in MK… And I will maybe see if I can get promoted into HR…
PPS Oops – forgot this one… Make a difference, protect the vision – join the OU Council
If You’re Going to Republish Data, Try To Be Consistent…
A passing observation, not meant as a gripe, but a good example of how slight inconsistencies can make life hard in data land…
[UPDATE: I think the inconsistencies in the Guardian Datastore spreadsheets mentioned below have now all been addressed. Which I guess makes this an impactful post?;-)]
Early today/last night, the Guardian Datablog did a great job republishing government spending data in a series of spreadsheets organised by Government Department. I cobbled together a tool to run queries against each spreadsheet separately, and thought I’d have a look at doing one that could run the same query against all the sheets at the same time, and aggregate the results.
To simplify matters, it looks as if the Google Charts Visualisation Query Language that I use to interrogate the spreadsheets allows you to pass column names in the query (documentation). This is handy, because it gets round possible inconsistencies across the spreadsheets arising from the different ordering of columns. However, to run the same query across multiple spreadsheets, it does require that the columns are named in exactly the same way. But at the moment, they aren’t:
So for example, most of the spreadsheets have column names Department Family and Entity, except for the FCO and BIS, which use Dept family and Dept entity respectively. System Transaction Number is almost consistent except for at the FCO and BIS (System transaction no) and the Attorney General’s Office (Transaction number), with the Treasury sheet using Transaction Number; those four departments also use Date rather than Payment Date. FCO and BIS also differ from the norm in the way they capitalise Expense type (every other department uses Expense Type). Environment, Food and Rural Affairs is also out of kilter using YEAR rather than Year.
(I’m not sure if there are other differences? If you spot one, please note it in a comment.)
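One way of papering over the differences, at least once the data has been pulled down as CSV, is to map each variant heading onto the majority spelling. Here's a minimal Python sketch using just the variants listed above. I'm assuming that the various transaction number headings all refer to the same column, and that Date always means Payment Date; the normalise function is made up for illustration.

```python
# Variant column names mentioned above, mapped to the majority spelling.
# Assumption: the transaction number variants all mean the same column.
CANONICAL = {
    "Dept family": "Department Family",
    "Dept entity": "Entity",
    "System transaction no": "System Transaction Number",
    "Transaction number": "System Transaction Number",
    "Transaction Number": "System Transaction Number",
    "Date": "Payment Date",
    "Expense type": "Expense Type",
    "YEAR": "Year",
}

def normalise(fieldnames):
    """Return the column names with known variants replaced."""
    return [CANONICAL.get(name.strip(), name.strip()) for name in fieldnames]

print(normalise(["Dept family", "Dept entity", "Date", "Expense type"]))
```

This only helps after the event, of course: the query sent to each spreadsheet would still need to use that spreadsheet's own column names.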
Just by the by, here’s the script I used to grab the column names. It’s written in Python, so could form part of a stub for a Scraperwiki script that runs CSV-returning queries across the Guardian Datastore spreadsheets and aggregates the results.
import csv, urllib.request

# (department name, spreadsheet key, worksheet gid)
spreadsheets=[("Treasury","tVryVDy3K3O6kfV7vt0SdSg","0"),("Transport","thgmwL0KV4fX4g5Kr4rdM-w","0"),("Revenue and Customs","0AonYZs4MzlZbdHJLWXJfS1diemlnN084YlNSU0RjNWc","0"),("Justice","tWJDJMgScG8KKpiwVShSjMw","0"),("International Development","0AonYZs4MzlZbdFdjaGVOVFAydm5sUUlTb09JUzFaNXc","0"),("Home Office","tnBJa5GzHGs6BdHL2K-5N9w","0"),("Health","tPdRIE1Dtovo5adgjt_D5rA","0"),("Government Equalities Office","tKUQuJiBekBQxlDcasQFFaQ","0"),("Foreign and Commonwealth Office","tk38yFwkcFwIiV1Xa5N7ZDw","0"),("Environment, Food and Rural Affairs","tNHaggN5kosvBHIU0pCs4fw","0"),("Energy and Climate Change","tuYwPXA9dma-Y87Pw8yQVjw","0"),("Education","tCieqjFZQ9LKQEZxEJX7IXA","0"),("Defence","tBA1bSiVf7y_no4upsfl8yw","0"),("Culture, Media and Sport","tKlkQ_ocEHaUa3RDgIee5sQ","0"),("Communities and Local Government","tGjVskINaH4X5crKIF_7KaA","0"),("Cabinet Office","toK-_5Qg_7QW8fuEJH5vdgw","0"),("Business, Innovation and Skills","tR6ec9fJYbVpLOlH3X4TjlQ","1"),("Attorney General's Office","0AonYZs4MzlZbdHJKWFMtaGlsaTdDMm5QaFNWWWd0QWc","0")]

def getURL(key,gid):
    # Ask for a single row as CSV; gid selects the worksheet
    # (the original version accepted gid but never added it to the URL)
    return "http://spreadsheets.google.com/tq?tqx=out:csv&tq=select%20*%20limit%201&key="+str(key)+"&gid="+str(gid)

for dept,key,gid in spreadsheets:
    url = getURL(key,gid)
    # Fetch the CSV and hand it to the reader as a list of text lines
    lines = urllib.request.urlopen(url).read().decode("utf-8").splitlines()
    data = csv.DictReader(lines)
    # fieldnames is taken from the first (header) row
    print(dept, data.fieldnames)
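Eyeballing eighteen lists of column names gets tedious, so a possible next step is to let the machine flag the headings that only a minority of departments use. Here's a Python sketch of that idea. The odd_ones_out function is hypothetical, and the sample header rows are made up in the spirit of the real ones rather than copied from the spreadsheets.

```python
from collections import Counter

def odd_ones_out(headers_by_dept):
    """Map each department to the column names that few others share."""
    # Count how many departments use each column name
    counts = Counter(
        name for names in headers_by_dept.values() for name in set(names)
    )
    majority = len(headers_by_dept) / 2.0
    return {
        dept: [name for name in names if counts[name] <= majority]
        for dept, names in headers_by_dept.items()
        if any(counts[name] <= majority for name in names)
    }

# Made-up header rows in the spirit of the real ones
sample = {
    "Health": ["Department Family", "Entity", "Payment Date"],
    "Justice": ["Department Family", "Entity", "Payment Date"],
    "Education": ["Department Family", "Entity", "Payment Date"],
    "Foreign and Commonwealth Office": ["Dept family", "Dept entity", "Date"],
}
print(odd_ones_out(sample))
```

A column name counts as odd here if no more than half the departments use it, which is a fairly arbitrary threshold.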
Government Spending Data Explorer
So… the UK Gov started publishing spending data for at least those transactions over £25,000. Lots and lots of data. So what? My take on it was to find a quick and dirty way to cobble a query interface around the data, so here’s what I spent an hour or so doing in the early hours of last night, and a couple of hours this morning… tinkering with a Gov spending data spreadsheet explorer:
The app is a minor reworking of my Guardian datastore explorer, which put something of a query front end onto the Guardian Datastore’s Google spreadsheets. Once again, I’m exploiting the work of Simon Rogers and co. at the Guardian Datablog, and reusing the departmental spreadsheets they posted last night. I bookmarked the spreadsheets to delicious (here) and use that feed to populate a spreadsheet selector:
When you select a spreadsheet, you can preview the column headings:
Now you can write queries on that spreadsheet as if it was a database. So for example, here are Department for Education spends over a hundred million:
The query is built up in part by selecting items from lists of options – though you can also enter values directly into the appropriate text boxes:
You can bookmark and share queries in the datastore explorer (for example, Education spend over 100 million), and also get URLs that point directly to CSV and HTML versions of the data via Google Spreadsheets.
Several other example queries are given at the bottom of the data explorer page.
For certain queries (e.g. two column ones with a label column and an amount column), you can generate charts – such as Education spends over 250 million:
Here’s how we construct the query:
If you do use this app, and find some interesting queries, please bookmark them and tag them with wdmmg-gde10, or post a link in a comment below, along with a description of what the query is and why it’s interesting. I’ll try to add interesting examples to the app’s list of example queries.
Notes: the datastore explorer is an example of a single web page application, though it draws on several other external services – delicious for the list of spreadsheets, Google spreadsheets for the database and query engine, Google charts for the charts and styled tabular display. The code is really horrible (it evolved as a series of bug fixes on bug fixes;-), but if anyone would like to run with the idea, start coding afresh maybe, and perhaps make a production version of the app, I have a few ideas I could share;-)