So it seems that there’s a new internal project running over the next few weeks scoping out how a metadata approach might allow us to exploit our teaching content as a corpus of assets we can search and reuse in order to save time and money in production. Or something.
When OU-XML was first mooted as a gold master document standard for representing course materials, I naturally thought we might use it as the basis of a searchable repository and did some early demos of searching content scraped from OpenLearn OU-XML docs, revisiting related ideas several times over the years: a searchable image gallery, a meta-glossary (scraped from glossary items in all OpenLearn materials), a directory of learning objectives and the OpenLearn units they related to, and so on. I also did various mindmap style navigational surfaces for navigating a single module, or dynamically created to link to related items from multiple units. All the obvious stuff. More recently, I explored converting OU-XML to markdown, and even started sketching out (hard to use!) automated republishing workflows (for more recent thinking on authoring environments, see for example here). (It looks like there may be strike days soon again, so that may provide an opportunity to revisit that stuff…) Anyway, no one thought enough of any of that stuff to think it might be usefully innovative, so it was all just a(nother) OUse
So here we are again…
Another kick off meeting…
Anyway, quick impressions off the back of it.
OU materials are often narrative based and items can be tightly embedded in a strong narrative. This can make literal reuse, reuse without modification, really hard. It really is often easier to rewrite stuff from scratch. Another issue that is easily forgotten if you are trying to reuse text is voice matching. Whilst it is possible to write in a voice that removes all sense of who the author is, I prefer to write course materials using a particular conversational voice. Other authors have their own voices. So literally reusing content that someone else has written might be jarring to the reader.
Here’s my current take on some of the things that are usefully reusable: often very limited in scope, but things that take time to get right. Which is to say, they are very granular but they take a long time to produce. It can be really quick and easy to generate 500 words of blah text, but it can be really difficult and time consuming to get a figure or figure description right; it can be really fiddly marking up a complicated equation or getting the appropriate steps of a proof in place; and so on.
So when it comes to reuse, what are the granular things that take time to produce, that can be hard to get right, and that someone may have already done? And what other spin-offs or benefits might there be from being able to reuse the asset?
- learning outcomes (hard to get right);
- glossary items (fiddly to write);
- images / figures (often require an artist)
- already rights cleared
- may be generic
- already annotated with figure descriptions (hard to write well)
- equations (Maths, Chemistry)
- already marked up eg using LaTeX (may be hard to get right)
- proofs and derivations (often tricky to sequence to support learning)
- activities and exercises (may be hard to make interesting and relevant)
- activity statement
- example solution and commentary
- completion time (hard to estimate)
- (assets associated with activity/exercise)
- (may be linked to particular learning objective)
- SAQs, interactive questions
- example solution
- good quality linked web resources (hard to find; need maintaining, in the sense of checking them over time)
- readings / Library items (hard to discover)
- reading time (needs estimating)
- Library availability (time/cost)
- rights clearance
- datasets (often hard to find ones that tick all the boxes)
- simple to understand
- good basis for an example
- (rights cleared)
- examples of use
- (usefully dirty)
- higher level design patterns (learning design)
- structured sequences (text, reading, activity)
- animations and screencasts
- audio and video material (expensive to produce)
As well as these atomic items, there might be paired assets where each component takes time to produce and they also need to be compatible:
- figure + equation
- data + figure / chart
So those are some of the things it may be useful to discover and potentially reuse.
OU-XML is structured and allows us to create concordances (metadata) to help us discover or search through a lot of these things. For a quick example, see this interactive Jupyter notebook demo (or this static demo) of scraping images and figure descriptions from an OpenLearn OU-XML document to support search (click the Open demo directly button).
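To give a flavour of the sort of scrape involved, here’s a minimal sketch in Python. The element names (`Figure`, `Image`, `Caption`, `Description`) are my assumptions for illustration rather than a faithful rendering of the OU-XML schema, so treat them as placeholders to be checked against real documents:

```python
import xml.etree.ElementTree as ET

def extract_figures(ouxml_text):
    """Pull (image src, caption, description) records out of an OU-XML string.

    NB: the tag names used here are assumed for the sake of the sketch;
    the actual OU-XML schema should be checked before relying on them.
    """
    root = ET.fromstring(ouxml_text)
    figures = []
    for fig in root.iter("Figure"):
        image = fig.find("Image")
        caption = fig.find("Caption")
        desc = fig.find("Description")
        figures.append({
            "src": image.get("src") if image is not None else None,
            "caption": caption.text if caption is not None else "",
            "description": desc.text if desc is not None else "",
        })
    return figures

# A toy document standing in for a scraped OpenLearn OU-XML file
sample = """<Item>
  <Figure>
    <Image src="fig1.png"/>
    <Caption>A demo figure</Caption>
    <Description>Long description for accessibility.</Description>
  </Figure>
</Item>"""

for fig in extract_figures(sample):
    print(fig["src"], "-", fig["caption"])
```

Once you have records like these, indexing the caption and description text for free text search is the easy part; the point is that the structure is already there to be mined.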
However, OU-XML tagging is often used inconsistently and may not be complete so the metadata or structured items we can (could) extract may not be as high quality as we might hope. Note that it may be possible to automate / bootstrap some quality improvement, but manual annotations/retagging of legacy materials may also be required to get most benefit from them in terms of discoverability, reuse etc.
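One obvious way to bootstrap that quality improvement is a simple audit pass over the corpus, flagging items where expected tagging is missing. Again, a hedged sketch only, with the same caveat that the element names are assumptions rather than the actual schema:

```python
import xml.etree.ElementTree as ET

def audit_figures(ouxml_text, doc_id="unknown"):
    """Report Figure elements missing a Caption or Description.

    NB: element names are assumptions for the sketch; check them
    against the actual OU-XML schema before trusting the output.
    """
    root = ET.fromstring(ouxml_text)
    issues = []
    for i, fig in enumerate(root.iter("Figure")):
        missing = [tag for tag in ("Caption", "Description")
                   if fig.find(tag) is None
                   or not (fig.find(tag).text or "").strip()]
        if missing:
            issues.append((doc_id, i, missing))
    return issues

# Toy document: the second figure has no long description
sample = """<Item>
  <Figure><Caption>Labelled</Caption><Description>Described.</Description></Figure>
  <Figure><Caption>No description here</Caption></Figure>
</Item>"""

print(audit_figures(sample, "demo-unit"))
```

A report like this at least tells you where the manual retagging effort needs to go, even if it can’t do the retagging for you.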
On the question of reuse, direct reuse will often be difficult if not impossible because things may not be exactly suited to the purpose required. In the case of instructional software screencasts, software updates may invalidate the video component, but the shooting script or transcript might be reusable. However, reuse-by/with-modification may offer huge efficiencies in terms of production. But there’s a but. In order to support reuse-with-modification, or “derived re-production”, we ideally need an “editable” form of the asset. We might discover an image by searching through caption text, and may like what we see, almost, but if we need a tweak to the image, how easy is that to achieve? If it requires redrawing from scratch, that will take more time than opening up a source file and changing a text label. If an image is a statistical chart derived from data, or an equation, and we want to tweak a small part of it, do we have access to the equation, or the original data and chart plotting script, for example?
So as well as finding the “finished” published asset, we also need to find the “source” of the asset. Which is one reason I like “generative” production so much, where our gold master document also includes the means of production of the assets that are rendered in the final published document. For examples of what I mean by generative production, see for example Subject matter Authoring Using Jupyter Notebooks.
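As a toy illustration of the generative idea, here the “figure” is just the output of a few lines of code living alongside the data in the source document, so tweaking a label or a value is a one-line edit rather than a redraw. (The function and the SVG styling here are entirely my own made-up example, not anything from an actual OU workflow.)

```python
def bar_chart_svg(data, width=300, bar_height=20, gap=5):
    """Render a minimal horizontal bar chart as SVG from (label, value) pairs.

    Because the chart is regenerated from the data each time the document
    is built, changing a label or value is a source edit, not a redraw.
    """
    max_val = max(v for _, v in data)
    rows = []
    for i, (label, value) in enumerate(data):
        y = i * (bar_height + gap)
        w = int(width * value / max_val)
        rows.append(f'<rect x="0" y="{y}" width="{w}" '
                    f'height="{bar_height}" fill="steelblue"/>')
        rows.append(f'<text x="{w + 4}" y="{y + bar_height - 5}">'
                    f'{label}: {value}</text>')
    height = len(data) * (bar_height + gap)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width + 100}" '
            f'height="{height}">' + "".join(rows) + "</svg>")

# The data lives in the gold master alongside the code that renders it
svg = bar_chart_svg([("A", 10), ("B", 25), ("C", 15)])
print(svg[:60])
```

In a notebook-based workflow the same idea applies with matplotlib or similar: the published document carries the means of production of its own assets.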