OER Text Publishing Workflows Rooted on OpenLearn OU-XML Via Github, CircleCI and Github Pages Using Jupytext and nbSphinx

Slowly, slowly, my recipes are coming together for generating markdown from OU-XML sourced, variously, from modules on the OU VLE and units on OpenLearn.

The code needs a couple more passes through but at some point I should be able to pull a simple CLI together (hopefully! I’m still manually running some handcranked steps spread across a couple of notebooks at the moment:-(

So… where am I currently at?

First up, I have chunks of code that can generate markdown from OU-XML, sort of. The XSLT is still a bit ropey (lists are occasionally broken[FIXED], for example, repeating the text) and the image link reconciliation for OpenLearn images doesn’t work, although I may have a way of accessing the images directly from the OU-XML image paths. (There could still be image rights issues if I was archiving the images in my own repo, which perhaps counts as a redistribution step…?)

The markdown can be handled in various ways.

Firstly, it can be edited/viewed as markdown. Chatting to colleague Jon Rosewell the other day, I realised that JupyterLab provides one way of editing and previewing markdown: in the JupyterLab file browser, right click on an .md file and you should be able to preview it:

There is also a WYSIWYG editor extension for JupyterLab (which looks like it may enter core at some point): Jupyter Scribe / jupyterlab-richtext-mode.

If you have Jupytext installed, then clicking on an .md file in the notebook tree browser opens the document into a Jupyter notebook editor, where markdown and code cells can be edited separately. An .ipynb file can then be downloaded from the notebook editor, and/or Jupytext can be used to pair markdwon and .ipynb docs from the notebook file menu if you install the Jupytext notebook extension. Jupytext can also be called on the command line to convert .md to .ipynb files. If the markdown file is prefaced with Jupytext YAML metadata (i.e. if the markdown file is a “Jupytext markdown” file, then notebook metadata (which includes cell tags, for example) is preserved in the markdown and can be used for round-tripping between markdown and notebook document formats. (This is handy for RISE slideshows, for example; the slide tags are preserved in the markdown so you can edit a RISE slideshow as a markdown document and then present it via Jupytext and a notebook server.)

In a couple of simple tests I tried, the .ipynb generated from markdown using Jupytext seemed to open okay in the new Netflix Polynote notebook application (early review). This is handy, because Polynote has a WYSIWYG markdown editor… So for anyone who gripes that notebooks are too hard because writing markdown is too hard, this provides an alternative.

I also note that the wrong code language has been selected (presumably the default in the absence of any specified language? So I need to make sure I do tag code cells with a default language somehow… I wonder if Jupytext can do that?).

Having a bunch of markdown documents, or notebooks derived from markdown documents using Jupytext is one thing, providing as it does a set of documents that can be easily edited and interacted with, albeit in the context of a Jupyter notebook server.

However, we can also generate HTML websites based on those documents using tools such as Jupyter Book and nbsphinx. Jupyter Book uses a Jekyll engine to build HTML sites, which is a bit of a pain (I noted a demo here that used CircleCI to build a site from notebooks and md using Jupyter Book) but the nbsphinx Python package that extends the (also pip installable) Sphinx documentation engine is a much easier propostion…

As a proof-of-concept demo, the ouseful-oer/openlearn-learntocode repo contains markdown files generated from the OpenLearn Learn to code for data analysis course.

Whenever the master branch on the repository is updated, CircleCI kicks in and uses nbsphinx to build a documentation site from the markdown docs and pushes them to the repository’s gh-pages branch, which makes the site available via Github Pages: “Learn To Code…” on Github Pages.

What this means is that I should be able to edit the markdown directly via the Github website, or using an online editor such as prose.io connected to my Github account, commit changes and then let CircleCI rebuild the site for me.

(I’m pretty sure I haven’t set things up as efficiently I could in terms of CI; what I would like is for only things that have changed to be rebuilt, but as it is, everything gets rebuilt (although the installed Python environment should be cached?) Hints / tips / suggestions about improving my CircleCI config.yml file would be much appreciated…

At the moment, nbsphinx is set up to run .md files through Jupytext to convert them to .ipynb, which nbsphinx then eventually churns back to HTML. I’ve also disabled code cell execution in the current set up (which means the routing through .ipynb in this instance is superfluous – the site could just be generated from the .md files). But the principle is there for a flick of a switch meaning that the code cells could be executed and their outputs immortalised in the punlished site HTML.

So… what next?

I need to automate the prodcution of the root index file (index.rst) so that the table of contents are built from the parsed OU-XML. I think Sphinx handles navigation menu nesting based on header levels, which is a bit of a pain in the demo site. (It would be nice if there were a Sphinx trick that lets me increase the de facto heading level for files in a subdirectory so that in the navigation sidebar menu each week’s content could be given its own heading and then the week’s pages listed as child pages within that. Is there such a trick?)

Slowly, slowly, I can see the pieces coming together. A tool chain looks possible that will:

  • download OU-XML;
  • generate markdown;
  • optionally, cast markdown as notebook files (via jupytext);
  • publish markdown / (un)executed notebooks (via nbsphinx).

A couple of next steps I want tack on to the end as and when I get a chance and top up my creative energy levels: firstly, a routine that will wrap the published pages in an electron app for different platforms (Mac, Windows, Linux); secondly, publishing the content to different formats (for example, PDF, ebook) as well as HMTL.

I also need to find a way of adding interaction — as Jupyter Book does — integrating something like ThebeLab or nbinteract buttons to support in-page code execution (ThebeLab) and interactive widgets (nbinteract).

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...