Quick Way in to Hacking Legacy OU Course Materials Using Markdown

By some arcane process, OU course materials, typically authored in MS Word, are converted to an XML format (OU-XML) and then rendered variously to HTML for the Moodle website, ebook formats, and perhaps PDF (we don’t want to make it too easy for students to print off the materials…).

An internal project that ran for a couple of years (maybe a bit more) looking at more direct authoring workflows was shelved earlier this year. (I was banned from blogging about it whilst it was under development, so I’m afraid I don’t have screen shots to show what it looked like from the time I was given preview access.) As far as I know, the authoring tool was completely distinct from the one developed by the OU’s bastard offspring that is FutureLearn. Nowt like sharing.

One of the things I’m slated to do over the next few months is update, or possibly rewrite, a unit in a first year equivalent module.

My preferred way of authoring for some time has been to keep it simple and just use markdown.

So that’s what I’m probably going to do.

If there’s any griping or sniping that it doesn’t fit the OU workflow, I’ll just run it through pandoc to generate an MS Word docx version and hand that over.
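As a rough sketch of that fallback, the conversion could be scripted from Python by shelling out to pandoc. The filenames and the helper names (`build_pandoc_command`, `convert`) are purely illustrative, and pandoc itself is assumed to be installed and on the PATH:

```python
import shutil
import subprocess


def build_pandoc_command(md_path, docx_path):
    """Build the pandoc call for a markdown -> docx conversion (hypothetical helper)."""
    return ["pandoc", md_path, "--from", "markdown", "--to", "docx", "--output", docx_path]


def convert(md_path, docx_path):
    """Run pandoc, failing early with a clear message if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(build_pandoc_command(md_path, docx_path), check=True)


# Show the command that would be run for an illustrative filename
print(" ".join(build_pandoc_command("week1.md", "week1.docx")))
```

Calling `convert("week1.md", "week1.docx")` would then write the docx file next to the markdown source, assuming both paths are valid.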

(I’ve been saying *for years* we should have pandoc read/write filters for OU-XML (the most recent notes are here). It would have been a damn sight cheaper than the aborted authoring tool project and would have allowed authors to explore a whole range of tools for creating their warez, with pandoc handling the conversion to OU-XML. And yes, I f**king know that some hand cleaning of the OU-XML would almost certainly have been required, but we’d have got a far better feeling for what sorts of document structures folk produce if they were allowed to use the tools that suit them. And authors’ shonky mark-up (including my own) *always* needs some fettling anyway: we already know that…)

So… markdown…

If I’m going to revise the current materials, I need to get them out of the current format and into markdown. I’ve previously started looking at an XSLT to convert OU-XML to markdown, eg as described in Fragment – OpenLearn Jupyter Books Remix; a copy of the current-ish XSLT, and some code fragments to grab and convert an example OU-XML document, can be found here.

But today, I thought of an even scruffier and quicker way…

Within the VLE, a single OU-XML source document is rendered across multiple HTML pages, along with a navigation index.

A single HTML page view (for easier printing) is also available… Hmmm…there are plenty of HTML2markdown converters out there, aren’t there?

#!pip3 install markdownify
from bs4 import BeautifulSoup
from markdownify import markdownify as md

with open('Robotics study week 1 – Introduction_ View as single page.html', 'r') as f:
    # Let's just grab the HTML body...
    tree = BeautifulSoup(f.read(), 'lxml')
    body = tree.body
    txt = md(str(body))
    
with open('week1-markdownify.md', 'w') as f:
    # There'll still be script tag cruft, videos won't be embedded / linked etc
    # but it's enough to get started with and the diffs should be easy to see...
    f.write(txt)

The output is a bit flaky in parts, but most of the stuff I need is there. Certainly, there’s more than enough of it in usable form for me to start using as an outline. Indeed, much of the work will be ripping out and replacing the huge chunks of content that are now rather dated.
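The script tag cruft mentioned in the code comments could probably be trimmed before the conversion step. The following is a minimal sketch rather than anything I’ve run against the real VLE pages: `strip_cruft` is a made-up helper name, the sample HTML is invented, and the list of tags to drop is a guess at what counts as cruft.

```python
from bs4 import BeautifulSoup


def strip_cruft(html):
    """Return the page body as an HTML string with script/style/noscript tags removed."""
    # html.parser ships with Python; the 'lxml' parser would also work if installed
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        # decompose() removes the tag and everything inside it
        tag.decompose()
    return str(soup.body if soup.body is not None else soup)


# Invented sample page, standing in for the saved single page view
sample = "<html><body><h1>Week 1</h1><script>var x = 1;</script><p>Hello</p></body></html>"
cleaned = strip_cruft(sample)
print(cleaned)
```

The cleaned string could then be handed to `md()` in place of `str(body)`.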

I can also edit the markdown in a notebook environment using Jupytext, using cell metadata to tag certain blocks of content with additional structural or semantic information. That metadata is saved into the markdown document, from where it could be processed. (I’m not sure how it would turn up if the enhanced markup were converted to docx using pandoc, for example.)

From what I saw of the aborted OpenCreate editor, it used a block/cell style metaphor for creating separate content elements within a page, so it’d also be interesting to compare the jupytext/metadata enhanced markdown, or even the notebook ipynb output format, with the OpenCreate document format / representation to see whether there are similarities in the block level semantic / structural markup.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

2 thoughts on “Quick Way in to Hacking Legacy OU Course Materials Using Markdown”

  1. Is there any positive effect on the output if the content is created using markdown, or is it solely down to your personal preference as an author to write in markdown rather than use a word processor? (Which is a reasonable thing; I am just not clear.)

    As an OU student, the thing that frustrates me about the current course templates is that the course content is basically just a digital version of a book and DVDs, with some links added. I am not sure if it is a limitation of how it is compiled, or more that Word is primarily a print content creation tool. It just feels pretty weird to have everything separated out, although it does allow doing the online things separately all together and reading the paper book without screens.

    It’s interesting you mention FutureLearn because I do like how it combines different types of content on a page, including allowing for discussion right there with the content and allows much greater granularity for checking things off. But in both I don’t like the overly-linear approach.

    Courses I have taken in person tend to follow a much more mindmap style approach to each topic. If you look at a syllabus for a course you have key topics and concepts and required and recommended reading and resources, as well as assignments (usually with some choice) and deadlines. This allows you to delve deeper into certain areas, find different authors and styles and engage with the learning.

    I think online courses can do this even better, as they could allow learners to customise their own route through the course and change the views to suit them (like how some mindmapping tools have an outline view). Popular EdTech tools like Popplet, Blendspace, Pearltrees and Padlet could be seen to take this approach to varying degrees, and it would be great to see the OU move beyond putting Word documents online for the majority of the course content.

    1. @Ruth My reason for using markdown is that it’s quick and easy for me to author in; also, I’m exploring workflows that allow code and code outputs to be used in support of the generation of materials using Jupytext, which can run code embedded in a markdown document as if the document were a Jupyter notebook. This approach can also be used to generate static HTML documents with interactive widgets, such as maps, embedded within them.

      Some context here: https://blog.ouseful.info/2019/05/17/fragment-openlearn-jupyter-books-remix/

      This approach also supports the creation of interactive elements within pages. For example: https://www.nbinteract.com/examples/examples_empirical_distributions.html

      There are other tools in the Jupyter ecosystem that allow editable code to be executed from within a web page.

      As far as mindmaps go, I experimented in the past with various ways of rendering mindmap style navigational surfaces from OU-XML eg https://blog.ouseful.info/2012/05/04/generating-openlearn-navigation-mindmaps-automagically/ but no-one was ever very interested in them.
