If you’re a regular listener of BBC Radio 4, you will almost certainly have come across In Our Time, a weekly, single-topic discussion programme (with a longstanding archive of listen again material) hosted by Melvyn Bragg on matters scientific, philosophical, historical and cultural. In certain respects, In Our Time may be thought of as a discussion-based audio encyclopedia. The format sees a panel of three experts (made up of academics, commentators and critics knowledgeable on the topic for that week) teaching the host about the topic. A diligent student, he will of course have done some background reading, and will have posted links to the references consulted on the programme’s web page.
I’ve already had a quick play with the In Our Time data, looking to see how easy it is to relate programmes to expert academics from various UK universities (Visualising OU Academic Participation with the BBC’s “In Our Time”). I also wondered whether it would be possible to do anything with the book references, such as using them to identify courses that may be related to a particular programme. (This is reminiscent of a couple of MOSAIC competition entries that looked at ways of recommending books based on courses, and courses based on books, using @daveyp’s data from Huddersfield University library, which associated course codes with the books borrowed by students taking those courses.)
Being a lazy sort, I posted an idea to the OKF Ideas Incubator suggesting that it might be worth considering extracting references from In Our Time programme pages and then reconciling them with Linked Data representations of the corresponding book data.
And then, as if by magic, a solution appeared, from Orangeaurochs: “In Our Time” booklist, which describes a method for parsing out the book data and then getting a Linked Data resource reference back from Bibliographica.
The original recipe suggested screen-scraping the raw book references from the page HTML, but I posted a comment (at the time of writing, still in the moderation queue) suggesting:
Great to see you taking this challenge on. Re your step 2 – obtaining the reading list – a possibly more structured way of doing this is to get the appropriate section out of the xml or json representation of the programme page (eg http://www.bbc.co.uk/programmes/b00xhz8d.xml or http://www.bbc.co.uk/programmes/b00xhz8d.json).
I wonder if the BBC will start to structure the data even more – for example by adding explicitly marked up biblio data to book references?
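As a rough sketch of the idea in that comment, the Python below derives the .xml and .json addresses from a programme page URL and then trawls a parsed JSON document for any text that mentions a keyword. The function names are my own, and I am making no assumption about which field of the BBC data actually holds the reading list, which is why the search simply walks the whole structure. The sample dictionary is made up for illustration and is not real BBC output.

```python
import re

# Programme identifiers appear after /programmes/ in the page URL.
PID_PATTERN = re.compile(r"/programmes/([a-z0-9]+)")


def representation_urls(programme_url):
    """Return the .xml and .json addresses for a BBC programme page URL."""
    match = PID_PATTERN.search(programme_url)
    if match is None:
        raise ValueError("No programme identifier found in: " + programme_url)
    base = "http://www.bbc.co.uk/programmes/" + match.group(1)
    return {"xml": base + ".xml", "json": base + ".json"}


def find_strings(data, keyword):
    """Recursively collect string values that contain keyword (case-insensitive).

    Works on the nested dicts and lists produced by json.loads(), so it does
    not depend on knowing where the reading list lives in the document.
    """
    found = []
    if isinstance(data, dict):
        for value in data.values():
            found.extend(find_strings(value, keyword))
    elif isinstance(data, list):
        for item in data:
            found.extend(find_strings(item, keyword))
    elif isinstance(data, str) and keyword.lower() in data.lower():
        found.append(data)
    return found


# Hypothetical sample, standing in for a parsed programme document.
sample = {"programme": {"title": "Example", "notes": ["Intro", "READING LIST: A Book"]}}
urls = representation_urls("http://www.bbc.co.uk/programmes/b00xhz8d")
matches = find_strings(sample, "reading list")
```

To try it against the live data, fetch `urls["json"]` with whatever HTTP library you prefer, parse it with `json.loads()`, and pass the result to `find_strings()`.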
Anyway, you can see an example of the results at pages with URLs of the form http://www.aurochs.org/inourtime_booklist/inourtime_booklist_v1.php?http://www.bbc.co.uk/programmes/b00xhz8d – just add the appropriate IOT programme page URL to extract the data from it.
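The URL pattern above is simple enough to wrap in a couple of lines of Python. This is only a sketch of the string handling: the function and constant names are mine, and it assumes the service keeps accepting the bare programme URL after the question mark, as in the example.

```python
# Address of the booklist service, as given in the example URL above.
BOOKLIST_SERVICE = "http://www.aurochs.org/inourtime_booklist/inourtime_booklist_v1.php"


def booklist_url(programme_url):
    """Build a booklist URL by appending the programme page URL as the query."""
    return BOOKLIST_SERVICE + "?" + programme_url


example = booklist_url("http://www.bbc.co.uk/programmes/b00xhz8d")
```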
There are a few hits and misses, but it’s a great start, and something that can be used as a starting point for thinking about how to annotate programme related booklists with structured bibliographic data and exploring what that might mean in a world of linked educational resources that can also reference linked BBC content… :-)