It turns out that part of the grief I encountered here in trying to access OpenLearn XML content was easily resolved (check the comments: mechanise did the trick…), though I’ve still to try to sort out a workaround for accessing OpenLearn images (a problem described here)), but at least now I have another stepping stone: a database of some deconstructed OpenLearn content.
Using Scraperwiki to pull down and parse the OpenLearn XML files, I’ve created some database tables that contain the following elements scraped from across the OpenLearn units by this OpenLearn XML Processor:
- glossary items;
- learning objectives;
- figure captions and descriptions.
You can download CSV data files corresponding to the tables, or the whole SQLite database. (Note that there is also an “errors” table that identifies units that threw an error when I tried to grab, or parse, the OpenLearn XML.)
Unfortunately, I haven’t had a chance yet to pop up a view over the data (I tried, briefly, but today was another of those days where something that’s probably very simple and obvious prevented me from getting the code I wanted to write working; if anyone has an example Scraperwiki view that chucks data into a sortable HTML table or a Simile Exhibit searchable table, please post a link below; or even better, add a view to the scraper:-)
So in the meantime, if ypu want to have a play, you need to make use of the Scraperwiki API wizard.
Here are some example queries:
- a search for figure descriptions containing the word “communication” – select * from `figures` where desc like ‘%communication%’: try it
- a search over learning outcomes that include the phrase how to followed at some point by the word data – select * from `learningoutcomes` where lo like ‘%how to%data%’: try it
- a search of glossary items for glossary terms that contain the word “period” or a definition that contains the word “ancient” – select * from `glossary` where definition like ‘%ancient%’ or term like ‘%period%’: try it
- find figures with empty captions – select * from `figures` where caption==”: try it
I’ll try to add some more examples when I get a chance, as well as knocking up a more friendly search interface. Unless you want to try…?!;-)