My ILI2012 Presentation – Derived Products from OpenLearn/OU XML Documents

FWIW, a copy of the slides I used in my ILI2012 presentation earlier this week – Making the most of structured content:data products from OpenLearn XML:

I guess this counts as a dissemination activity for my related eSTEeM project on course related custom search engines, since the work(?!) sort of evolved out of that idea…

The thesis is this:

  1. Course Units on OpenLearn are available as XML docs – a URL pointing to the XML version of a unit can be derived from the Moodle URL for the HTML version of the course; (the same is true of “closed” OU course materials). The OU machine uses the XML docs as a feedstock for a publication process that generates HTML views, ebook views, etc, etc of a course.
  2. We can treat XML docs as if they were database records; sets of structured XML elements can be viewed as if they define database tables; the values taken by the structured elements are like database table entries. Which is to say, we can treat each XML docs as a mini-database, or we we can trivially extract the data and pop it into a “proper”/”real” database.
  3. given a list of courses we can grab all the corresponding XML docs and build a big database of their contents; that is, a single database that contains records pulled from course XML docs.
  4. the sorts of things that we can pull out of a course include: links, images, glossary items, learning objectives, section and subsection headings;
  5. if we mine the (sub)section structure of a course from the XML, we can easily provide an interactive treemap version of the sections and subsections in a course; generating a Freemind mindmap document type, we can automatically generate course-section mindmap files that students can view – and annotate – in Freemind. We can also generate bespoke mindmaps, for example based on sections across OpenLearn courses that contain a particular search term.
  6. By disaggregating individual course units into “typed” elements or faceted components, and then reaggreating items of a similar class or type across all course units, we can provide faceted search across, as well as university wide “meta” view over, different classes of content. For example:
    • by aggregating learning objectives from across OpenLearn units, we can trivially create a search tool that provides a faceted search over just the learning objectives associated with each unit; the search returns learning outcomes associated with a search term and links to course units associated with those learning objectives; this might help in identifying reusable course elements based around reuse or extension of learning outcomes;
    • by aggregating glossary items from across OpenLearn units, we can trivially create a meta glossary for the whole of OpenLearn (or similarly across all OU courses). That is, we could produce a monolithic OpenLearn, or even OU wide, glossary; or maybe it’s useful to have redefine the same glossary terms using different definitions, rather than reuse the same definition(s) consistently across different courses? As with learning objectives, we can also create a search tool that provides a faceted search over just the glossary items associated with each unit; the search returns glossary items associated with a search term and links to course units associated with those glossary items;
    • by aggregating images from across OpenLearn units, we can trivially create a search tool that provides a faceted search over just the descriptions/captions of images associated with each unit; the search returns the images whose description/captions are associated with the search term and links to course units associated with those images. This disaggregation provides a direct way of search for images that have been published through OpenLearn. Rights information may also be available, allowing users to search for images that have been rights cleared, as well as openly licensed images.
  7. the original route in was the extraction of links from course units that could be used to seed custom search engines that search over resources referenced from a course. This could in principle also include books using Google book search.

I also briefly described an approach for appropriating Google custom search engine promotions as the basis for a search engine mediated course, something I think could be used in a sMoocH (search mediated MOOC hack). But then MOOCs as popularised have f**k all to do with innovation, don’t they, other than in a marketing sense for people with very little imagination.

During questions, @briankelly asked if any of the reported dabblings/demos (and there are several working demo) were just OUseful experiments or whether they could in principle be adopted within the OU, or even more widely across HE. The answers are ‘yes’ and ‘yes’ but in reality ‘yes’ and ‘no’. I haven’t even been able to get round to writing up (or persuading someone else to write up) any of my dabblings as ‘proper’ research, let alone fight the interminable rounds of lobbying and stakeholder acquisition it takes to get anything adopted as a rolled out as adopted innovation. If any of the ideas were/are useful, they’re Googleable and folk are free to run with them…but because they had no big budget holding champion associated with their creation, and hence no stake (even defensively) in seeing some sort of use from them, they unlikely to register anywhere.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: