Using Github For Editing Course Notebooks

One of the great things about working on the TM351 module is that the module team are generally up for trying stuff out. Over the last year or two, we’ve been fumbling towards a way of working with Github for managing module notebooks.

The latest spurt of activity has been around updating Jupyter notebooks relating to the relational databases part of the course, which has involved re-writing notebook activities from scratch. (We’ve also added a couple of tools to try to help students see what changes they’re making to a database, which I’ll post about later.)

Part of the strength of the OU course production route is that it involves peer review, critical reading and editing of materials before they go live to students (this also partly explains the length of time it takes to get a new course out…) The review process provides opportunities for exploring and developing pedagogy (that word has to appear in every discussion of OU course material production!), as well as learning from each other about different ways of approaching this online teaching stuff.

Posting some final edits to a reworking by a colleague of a notebook I originally drafted (such is the process), I was struck again by how a Github workflow can help us capture our thinking about certain course design decisions, as well as argue through different approaches.

In our current experiment, notebooks are being revised in an 18j-notebooks-updates branch. (The OU uses BBC year-month numbering: J is October, and 18J refers to a module presentation starting in October 2018. For any OU readers: yes, this does mean we are editing materials for a module that has already started the presentation for which they are intended. Agile, innit… ;-)

I’m making my edits in sub-branches derived from that branch, one sub-branch per notebook. Associated with each notebook is a separate issue in the Github issue tracker where we can discuss issues relating to the notebook.
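As a rough sketch of the one-sub-branch-per-notebook convention, the following Python builds the git commands for starting a sub-branch from the update branch. The naming scheme, the function names and the example notebook filename are illustrative assumptions, not the conventions the module team actually uses. The commands are only built as argument lists here, not run.

```python
from pathlib import Path
import re

BASE_BRANCH = "18j-notebooks-updates"  # the update branch named in the post


def branch_name_for(notebook_path, base=BASE_BRANCH):
    """Derive a sub-branch name from a notebook filename (hypothetical scheme)."""
    stem = Path(notebook_path).stem.lower()
    # Replace anything awkward in a branch name with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    # Use a hyphen rather than "/" as the separator: git cannot hold both
    # a branch "x" and a branch "x/y" at the same time.
    return f"{base}-{slug}"


def start_branch_commands(notebook_path, base=BASE_BRANCH):
    """Return the git commands (as argument lists) to start a sub-branch."""
    return [
        ["git", "checkout", base],
        ["git", "checkout", "-b", branch_name_for(notebook_path, base)],
    ]
```

Each returned list could be passed to subprocess.run(cmd, check=True) from inside a clone of the repo, or simply typed at the command line.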

I’ve started trying to group edits made in the sub-branch so that each commit relates to a particular sort of change:

Clicking on the commit identifier allows for inspection, as well as general comment and review, and point by point comment and review (click the + on a particular change), of the changes made as part of that commit.

The commits I’m making are being made via the Mac Github desktop client. I find I’m making multiple passes over the notebook, but also collecting multiple changes to it at the same time that fit best in separate commits.

The following isn’t the best example, but it makes the point:

I can select (highlight dark blue) various changes in the notebook file and just add those to a particular commit. Clicking on the bar within a set of colour highlighted changed rows lets you select all contiguous changed lines that make up that change.

(The nbdime tool allows you to see differences in the rendered notebook, but I’ve found that if you’re working with a notebook with cleared output cells, and commit changes reasonably regularly, it’s easy enough to identify and add the changes you want in a particular commit.)
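Clearing output cells can be done from the notebook menu, but it can also be scripted. The following is a minimal sketch using only the Python standard library, assuming the notebook file uses the nbformat 4 layout (a top-level "cells" list); the function names are made up for illustration.

```python
import json


def clear_outputs(notebook):
    """Strip outputs and execution counts from code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path):
    """Rewrite the .ipynb file at `path` with its outputs cleared."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(clear_outputs(notebook), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

With outputs removed, the line-level diff shown by the desktop client only contains changes to cell source and metadata, which makes it easier to pick out the lines that belong in each commit.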

When I’ve committed all the changes I want to suggest for a particular notebook, I can make a pull request (PR) onto the edit branch I forked from, with a Fix: ISSUE comment that associates the PR with the issue related to the notebook. If the PR is accepted, the issue is automatically closed and the sub-branch can be automatically deleted. (Our repo is in a bit of a mess with its branches at the moment and needs some serious gardening!)

You might also notice that the branch has been set up so that a review is required before a PR can be merged. This is one of the checks and balances we’ve added to try to make sure the (team) workflow doesn’t get upset by someone mistakenly deleting everything, merging something into the wrong branch, or merging files that don’t really belong in the repo (or a particular branch at least…)

None of us are particularly expert at using git or Github, but I think we’re slowly working towards a workflow that allows us to discuss (and effect) changes to materials, keep track of them, and also keep track of our reflections around them. As problems are discovered, and then resolved, we can capture that learning so we don’t make the same mistakes again, or if we do, we can look up how to resolve them (or at least, how we resolved them previously). This is something that often gets lost in the editing process.

The low-level change control that we can manage through commit reviews, along with the comments and the commit and acceptance messages we can associate with them, is far richer, and at better levels of granularity both up and down the scale, than track changes and comments in a Word document, which is the traditional way of making edits to documents in the OU.

One thing the Github process does force on us, though, is the requirement to work at the text level. That said, the OU’s Word document workflow does end up in a text format – OU-XML – which could be managed (and edited) at the text/XML level (but oh, so the claim goes, how academics would complain…)

Personally, I think we should allow Markdown and LaTeX document creation, or authoring direct in Jupyter notebooks (with an exporter to OU-XML; unfortunately, no-one in the OU admits to experience in Jinja templating or is willing to learn enough to create an nbconvert template to render OU-XML from notebook JSON/ipynb. (It’s been on my to do list for what feels like forever!)) If we did that, we’d be able to manage all our TM351 module materials and edits, not just the notebooks, using our emerging Github workflow.
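To give a feel for what a notebook-to-XML exporter has to do, here is a minimal sketch that walks the cells of a notebook and emits a flat XML document. It is not an nbconvert/Jinja template: it uses the standard library's ElementTree instead. The element names (Section, Paragraph, ProgramListing) are placeholders chosen for illustration and should not be read as the actual OU-XML schema.

```python
import xml.etree.ElementTree as ET


def notebook_to_xml(notebook):
    """Map notebook cells onto a flat XML tree (placeholder element names)."""
    root = ET.Element("Section")
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source as either a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        # Code cells become listings; everything else is treated as prose.
        tag = "ProgramListing" if cell.get("cell_type") == "code" else "Paragraph"
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")
```

Markdown cells are passed through as raw Markdown text rather than being rendered, so a real exporter would also need to convert headings, lists and emphasis into the corresponding XML elements.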

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
