Over the last few weeks, I’ve been trying to look at the Linked Data project from different points of view. Part of the reason for this is to try to find one or more practical ways in that minimise the need to engage too much with the scary-looking syntax. (Whether it really is scary or not, I think the fact that it looks scary makes it hard for novices to see past.)
Here’s my latest attempt, which uses Yahoo Pipes (sigh, yes, I know…) and BBC programme pages to engage with the BBC programme Linked Data: iPlayer clips and videos from OU/BBC co-pros
In particular, a couple of hacks that demonstrate how to:
– find all the clips associated with a particular episode of a BBC programme;
– find all the programmes associated with a particular series;
– find all the OU/BBC co-produced programmes that are currently available on iPlayer.
Rather than (or maybe as well as?) dumping all the programme data into a single Linked Data triple store, the data is exposed via programme pages on the BBC website. As well as HTML versions of each programme page (that is, a page for each series, each episode in a series, and each clip from a programme), the BBC also publish RDF and XML views over the data represented in each page. This machine-readable data is all linked, so for example, a series page includes well defined links to the programme pages for each episode included in that series.
The RDF and XML views over the data (just add .rdf or .xml respectively as a suffix on the end of a programme page URL) are slightly different in content (I think), with the added difference that the XML view is naturally in a hierarchical/tree-like structure, whereas the RDF defines a more loosely structured graph. [A JSON representation of the data is also available – just add .json]
So for example, to get the XML version of the series page http://www.bbc.co.uk/programmes/b006mvlc add the suffix .xml to give http://www.bbc.co.uk/programmes/b006mvlc.xml.
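The suffix convention is simple enough to capture in a one-line helper; the function name here is my own, but the URL pattern is the one described above:

```python
# Build the machine-readable variant of a BBC programme page URL by
# appending a data-format suffix (.xml, .json or .rdf), as described above.
def data_url(programme_url, fmt="xml"):
    return programme_url.rstrip("/") + "." + fmt

print(data_url("http://www.bbc.co.uk/programmes/b006mvlc"))
# http://www.bbc.co.uk/programmes/b006mvlc.xml
```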
In the following demos, I’m going to make use of the XML rather than RDF expression of the data, partly to demonstrate that the whole linked data thing can work without RDF as well as without SPARQL…
There are also a couple of other URL mappings that can be useful, as described on the BBC Programmes Developers’ site:
– episodes available on iPlayer:
http://www.bbc.co.uk/programmes/b006mvlc/episodes/player
– upcoming episodes:
http://www.bbc.co.uk/programmes/b006mvlc/episodes/upcoming
Again, the .xml suffix can be used to get the XML version of the page.
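Those two mappings follow the same pattern, so a small helper (again, the name is mine) can build either URL from a series code:

```python
# Build the /episodes/player or /episodes/upcoming URL for a series,
# optionally with a .xml/.json data suffix, per the mappings listed above.
def episodes_url(series_pid, view="player", fmt=None):
    url = "http://www.bbc.co.uk/programmes/%s/episodes/%s" % (series_pid, view)
    return url + "." + fmt if fmt else url

print(episodes_url("b006mvlc"))
# http://www.bbc.co.uk/programmes/b006mvlc/episodes/player
```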
So – let’s start with looking at the details of a series, such as @liamgh’s favourite – Coast, and pulling out the episodes that are currently available on iPlayer:
We can use a Yahoo Pipes Fetch Data block to get the XML from the Coast episodes/player page:
The resulting output is a feed of episodes of Coast currently available. The link can be easily rewritten from a programme page form (e.g. http://www.bbc.co.uk/programmes/b00lyljl) so that it points to the iPlayer page for the episode (e.g. http://www.bbc.co.uk/iplayer/episode/b00lyljl). If the programme is not available on iPlayer, I think the iPlayer link redirects to the programme page?
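The rewrite is just a substitution on the URL path; something like this minimal sketch would do it outside of Pipes:

```python
# Rewrite a programme page URL into the corresponding iPlayer episode URL,
# as described above: swap /programmes/ for /iplayer/episode/ in the path.
def iplayer_url(programme_url):
    return programme_url.replace("/programmes/", "/iplayer/episode/")

print(iplayer_url("http://www.bbc.co.uk/programmes/b00lyljl"))
# http://www.bbc.co.uk/iplayer/episode/b00lyljl
```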
This extended pipe will accept a BBC series code, look for current episodes on iPlayer, and then link to the appropriate iPlayer page. Subscribing to the RSS output of the pipe should work in the Boxee RSS app. You should also be able to compile a standalone Python runnable version of the Pipe using Pipe2Py.
Now let’s look at some linked data action..(?!) Err… sort of…
Here’s the front half of a pipe that grabs the RDF version of a series page and extracts the identifiers of available clips from the series:
By extracting the programme identifier from the list of programme clips, we can generate the URL of the programme page for that programme (as well as the URI for the XML version of the page); or we might call on another pipe that grabs “processed” data from the clip programme page:
Here’s the structure of the subpipe – it pulls together some details from the programme episode page:
To recap – given a programme page identifier (in this case for a series), we can get a list of clips associated with the series; for each clip, we can then pull in the data version of the clip’s programme page.
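In Python rather than Pipes, the clip-extraction step might look something like the following. Note that the XML snippet is a simplified, illustrative stand-in for the real series page XML – the element and attribute names here are assumptions, not the exact BBC schema:

```python
import xml.etree.ElementTree as ET

# Illustrative sample only - NOT the actual BBC programmes XML schema.
sample = """
<programme>
  <links>
    <link type="clip" pid="p0000001"/>
    <link type="clip" pid="p0000002"/>
  </links>
</programme>
"""

def clip_pids(xml_text):
    """Extract the programme identifiers of clips from a series page XML document."""
    root = ET.fromstring(xml_text)
    return [el.get("pid") for el in root.findall(".//link") if el.get("type") == "clip"]

# For each clip, build the URL of the data version of its programme page.
for pid in clip_pids(sample):
    print("http://www.bbc.co.uk/programmes/%s.xml" % pid)
```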
We can also use this compound pipe within another pipe that contains series programme codes. For example, I have a list of OU/BBC co-produced series bookmarked on delicious (OU/BBC co-pros). If I pull this list into a pipe via a delicious RSS feed, and extract the series/programme ID from each URL, we can then go and find the clips…
Which is to say: grab a feed from delicious, extract programme/series IDs, lookup clips for each series from the programme page for the series, then for each clip, lookup clip details from the clip’s programme page.
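The ID-extraction step from the bookmarked URLs can be sketched with a simple regular expression. BBC PIDs are short lowercase alphanumeric identifiers, so the pattern below is a loose approximation rather than an official definition:

```python
import re

# Loose pattern for pulling a programme/series PID out of a bookmarked
# programme page URL - an approximation, not the official PID grammar.
PID_RE = re.compile(r"/programmes/([a-z0-9]+)")

def extract_pid(url):
    m = PID_RE.search(url)
    return m.group(1) if m else None

print(extract_pid("http://www.bbc.co.uk/programmes/b006mvlc"))
# b006mvlc
```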
And if the dependence on Yahoo Pipes is too much for you, there’s always pipe2py, which can compile it to a Python equivalent.
PS hmm, as well as the pipe2py approach, maybe I should set up a scraperwiki page to do a daily scrape?
PPS see also Visualising OU Academic Participation with the BBC’s “In Our Time”, which maybe provides an opportunity for a playback channel component relating to broadcast material featuring OU academics?