Appropriating OpenLearn Content and Republishing Edited Versions Of It Via a “Simple” Automated Text Blogging Workflow

I had intended on using my (unpaid) strike days to catch up with some books and harp practice, and maybe even the garden, and keep away from the keyboard; or failing that, to have a push on my rally data tinkering and get another LeanPub book started to try to reboot the £50 a quarter or so my previous publication (Wrangling F1 Data With R) generated, which keeps things like recurring Dropbox and Flickr etc etc charges covered (no-one has ever bought me a KoFi, as far as I can tell…).

And I was determined not to do any of the mounting workload associated with the day job, no matter how much fun some it is likely to be (like getting Ev3devSim working as an ipywidget in Jupyter notebooks).

Whilst I did manage to stick to the determined not to path, I never even really started down the intended one, instead spending hours and hours in front of keyboard trying to hack something together around my OpenLearn publishing workflow.

So here’s what I’ve come up with…

An OpenLearn Unit Text Publishing Thing

Firstly, it’s a thing that lets you grab the “source” content of an OpenLearn unit (at least, some of it; I still haven’t got round to grabbing things like video files or audio files, or scraping PDFs etc.) and churn it into a simple text format, markdown, which looks like this:

Headers are prefixed with a #, you can emphasise things by wrapping it in * characters, eg *italics* -> italics, or **double them up** for strong emphasis. Embedding links — [link text](link/path/file.html) — and images — ![Alt text](path/to/image.file) — is also pretty easy when you get the hang of it.

So how do you get started? First, you need a Github account (sign up here; just get one: you’re not going to have to do any hard Github stuff, you’re just making use of their free hosting). Get one, and sign in.

Second, visit my demo repo — psychemedia/openlearn-publish-test — (the URL will change at some point, but I’ll archive the original and link the new address from it…) and grab a copy of your own repo from mine by clicking the big green Use this template button:

You’ll be presented with a form:

Give your repo a name (no spaces). Optionally add a description. Keep the repo public. And click the big green Create repository from template button.

Things will churn for a moment or two:

And then you’ll have your own repo, containing a copy of the files in mine:

Behind the scenes, there is work going on…

Click the Actions tab on your copy of the repo to see what…

At first, it may look like nothing… but wait a moment or two and refresh the page:

A couple of actions will start running to initialise, and customise, your repo for you.

When the actions are done, you’ll be informed… (you shouldn’t have to refresh the page, the status indicators should update when things are done…):

If you go back to your repo homepage, you’ll see it’s been updated with a new README that’s slightly different to the original copy from my repo, and that has been personalised to yours:

So… now you can grab some OpenLearn content into your repo.

Click on the file link in your repo:

You will be presented with a list of units on OpenLearn.

Find one you like the look of and click the Grab Unit into this repo link:

This will open a new issue for you in the Issues tab of your repo, and prepopulate it with a title that will tell a Github Action you want to grab some OpenLearn content, and an issue body that tells the action where the unit can be found.

Click the Submit new issue button to get things started.

Back in the Actions tab, you can see the helper elves have started doing their thing again…

If you click on a running Action, you can check its progress in more detail:

Click through on the actual job name to see what’s happening inside:

You can expand a step by clicking the arrow to see what each step is doing or has already done…

If you read through the steps, you’ll see several things are done: for example, we grab some OUXML (the OpenLearn content), convert to markdown, build some HTML files (these are what gets published), and deploy them, then build a LaTeX version of the material (which is used to generate a PDF), and an ePub ebook. (The LaTeX step takes some time; I should perhaps simplify things so that only the HTML build is done by default.)

When the Actions are green circled / green ticked and done (which may take a few minutes…), or at least, when the Deploy HTML to gh-pages step has run, go back to the repo home page, where you should see a new commit has been made to your repo:

If you click into the content folder you’ll see one or more session folders:

If you click into a session folder, you see some markdown files:

If you click on one of those, you’ll see some scraped and converted OpenLearn content:

So the content has been grabbed from OpenLearn and saved to your repo.

But that’s not all.

If you scroll down on your README page (I really should make this link more prominent in the README…) you’ll see a link to a site published from your repo:

Click it…

If you see a “404”, page not found, don’t panic

On the repo home page, select the Settings tab:

and scroll down to the Github Pages area:

Change the Source from gh-pages branch to master branch:

And then, select the master branch:

And set the Source back to the gh-pages branch:

When you see something like this, you knows all good to go:

Note that cacheing of a previous build of the site may last for up to 10 minues, so grab yourself a cup of tea, or perhaps look through the markdown files in the content directory, or even go back to the Actions tab and, if the actions have completed.

If the Actions have completed, select the OpenLearnXML2 (or a completed nbsphinx publisher action if you have committed your own changes to the markdown files) and you should see and the availability of an Artifacts download.

Down load and unzip the artifacts file. If the build process has been able to build a PDF file and/or an ePub file from the content, it will be found in the unzipped downloaded directory.

Right… time to try your site link again:

An OpenLearn Editing Thing

This will have to be in a part two to this post… I’ve run out of time for now and need to get back to the day job…

If you are itching to get started, this may work, if I’ve got my autopublishing things fixed…

In the content folder (on the default master branch of the repo),  find the markdown file you want to edit, and click on the pencil icon to open the editor:


Edit the file / make the changes you want, and commit it (you may want to set a meaningful commit message title summarising the chnages, and perhaps even a longer description about the motivation for the changes, but both are optional…):


Click the big green Commit changes button to commit the changes. If you look in the Actions tab, you should see that an nbsphinx publisher action has started that should publish your changes to your site.


Note that even when the publishing action has generated and pushed updated site pages  to where they need to be, the site may take a few minutes to update because of page cacheing on the Github site.

The Future

One of the spinoffs of this for me was the realisation that I could use Github Actions to run arbitrary code in response to particular events, such as.. commits or issue postings. The current machinery uses a Sphinx / nbSphinx publishing route, but I’ve also started exploring a recipe for Jekyll based Jupyter Book publishing. (Next on the to do list will be an Executable Book project / MyST workflow](; I also need to split out the workflows into actions of their own, but I haven’t figured out how to do that for myself yet.) It strikes me that I could bundle all these in the same repo with some way of flagging which build process I want to use. This would allow the user to then republish their material using the publishing tool, and its various peculiarities, customisationa and affordances, of their choice.

Immediate to dos, that may not happen because I’m the only user, I know it’s possible, and I’m not that interested, are to: make the Github Pages / pubished site link more prominent in the README; get movie and audio downloads and embeds working. Also a way of handling PDFs linked from the OpenLearn materials, and perhaps extracting text from those, even, to support republishing…

On the publish side, it would be useful to be able to publis to HTML only by default, with some optional way of invoking the PDF and ePub builds. The ePub build also needs things like title and author setting. The PDF build sometimes breaks, eg due to the inability to detect a bounding box size round a gif image. I maybe need to use another PDF generator, eg some hints here.

I also need to refactor the code, two ways: firstly, a simplification, that uses the bare minimum of packages and just churns the markdown direct from XML in one simple step. Secondly, fixing the current workflow, which stages the XML in a SQLite database, so that the database can properly handly content from multiple unitis and I can reliably churn the md from the database for any single unit. At the moment, I think things pretty much assume there’s content from just a single unit in the database. Putting the md into the db might be useful too… Then I could imagine a datasette powered publishing route too…

As a recent tweet from Martin Hawksey reveals, he’s been blogging about how we can turn Google’s App Script to our own purposes as a hosted code runner, and I think Github now provides a similar opportunity for anyone who wants to appropriate it to that end…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: