Preparing Jupyter Notebooks for Release to Students

Over the years, I’ve sketched various tools to support the release of notebooks to students, but as I’m not the person who prepares and distributes the releases, they never get used (“Tony hacking crap again” etc.;-).

Anyway, on the basis that the tools aren’t completely crap, and may be of use to others, perhaps even folk working on other modules internally that make use of notebooks and are using them for the first time this presentation, I’ll post a quick summary of some of them here. (And if they are broken, a little use and testing by not-me could well provide the bug reports and motivation I need to fix them to a level of slightly less possible brokenness.)

The package that bundles the tools can be found here: innovationOUtside/nb_workflow_tools.

First up, tm351nbtest is a tool that helps check whether the notebooks run correctly in the latest environment.

The notebooks we save to the private module team repo all have their cells run, in part so that we can review what the expected outputs are. (When checking in notebooks, the tm351nbrun --file-processor runWithErrors . command can be used to ensure all notebooks in the specified path have their cells run.) The nbval package is a handy package that runs the notebooks in the current environment and checks that the contents of each new output cell match those of the previously saved output cell. (I keep thinking that jupyter-cache might also be handy here?) Cells that are known to generate an error can be ignored by tagging them with the raises-exception tag, and cells whose output you want to ignore can be tagged with the nbval-ignore-output tag. Running the tool generates a report identifying each notebook and each cell where the outputs don’t match.
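As a rough illustration of what that tagging amounts to at the file level: tags are stored as a list under each cell's metadata in the notebook JSON. The sketch below uses only the standard library and a toy, hand-built notebook structure; the add_tag helper is hypothetical and is not part of nb_workflow_tools.

```python
import json


def add_tag(cell, tag):
    """Add a tag to a cell's metadata, unless it is already present."""
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if tag not in tags:
        tags.append(tag)
    return cell


# Toy notebook in nbformat v4 style, for illustration only
nb = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "1 / 0",
         "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {}, "source": "import random\nrandom.random()",
         "outputs": [], "execution_count": None},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# The first cell is expected to fail; the second has non-repeatable output
add_tag(nb["cells"][0], "raises-exception")
add_tag(nb["cells"][1], "nbval-ignore-output")

print(json.dumps([cell["metadata"] for cell in nb["cells"]]))
```

With nbval installed, the comparison itself is typically run through pytest, for example pytest --nbval . from the notebook directory.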

The next tool, nb_collapse_activities, checks that our activity blocks all have their answers precollapsed. Activities are tagged and coloured using the innovationOUtside/nb_extension_empinken extension; activities with answers use the classic notebook collapsible headings extension to collapse the cells beneath an activity answer heading block (all cells are collapsed up to the next cell with a header at the same level as, or a higher level than, the collapsed answer cell header). The nb_collapse_activities utility tries to identify answer heading cells and, whenever it finds one, adds heading_collapsed: true metadata.
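A minimal sketch of that idea, working directly on the notebook JSON, might look like the following. The rule used to spot an answer heading (a markdown heading whose text contains the word "answer") is an assumption made for illustration; the actual nb_collapse_activities heuristic may well differ, and collapse_answer_headings is a hypothetical name.

```python
import re

# Assumed pattern: a markdown heading line that mentions "answer"
ANSWER_HEADING = re.compile(r"^#{1,6}\s+.*\banswer\b", re.IGNORECASE)


def collapse_answer_headings(nb):
    """Set heading_collapsed: true on cells that look like answer headings.

    Returns the number of cells that were marked.
    """
    count = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        lines = source.strip().splitlines()
        first_line = lines[0] if lines else ""
        if ANSWER_HEADING.match(first_line):
            cell.setdefault("metadata", {})["heading_collapsed"] = True
            count += 1
    return count


# Toy notebook, for illustration only
demo_nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "## Activity 1\nTry it."},
        {"cell_type": "markdown", "metadata": {}, "source": "#### Our answer"},
        {"cell_type": "code", "metadata": {}, "source": "print('answer')",
         "outputs": [], "execution_count": None},
    ]
}

marked = collapse_answer_headings(demo_nb)
print(marked)
```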

The third tool also processes the notebooks for release: tm351nbrun --file-processor clearOutput clears the outputs of every code cell and essentially resets each notebook to an unrun state.
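For anyone curious what "unrun" means in terms of the notebook file, here is a small sketch, assuming the standard nbformat v4 layout in which each code cell carries an outputs list and an execution_count. The clear_outputs function is hypothetical and only illustrates the effect; it is not the tm351nbrun implementation.

```python
def clear_outputs(nb):
    """Reset every code cell to an unrun state: no outputs, no execution count."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


# Toy notebook with one run code cell, for illustration only
run_nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Intro"},
        {"cell_type": "code", "metadata": {}, "source": "1 + 1",
         "execution_count": 3,
         "outputs": [{"output_type": "execute_result", "execution_count": 3,
                      "data": {"text/plain": "2"}, "metadata": {}}]},
    ]
}

clear_outputs(run_nb)
print(run_nb["cells"][1]["outputs"], run_nb["cells"][1]["execution_count"])
```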

A fourth tool, nbzip, can be used to zip required notebook folders for release to students.
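If nbzip isn't to hand, the same sort of thing can be roughed out with the standard library. This is only a sketch under my own assumptions (zip just the .ipynb files, skip the .ipynb_checkpoints folders); zip_notebooks is a hypothetical helper and makes no claim to match what nbzip actually does.

```python
import zipfile
from pathlib import Path


def zip_notebooks(folder, zip_path):
    """Zip every .ipynb file under folder, skipping checkpoint copies.

    Returns the list of archive member names that were written.
    """
    folder = Path(folder)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*.ipynb")):
            if ".ipynb_checkpoints" in path.parts:
                continue  # autosave copies should not go to students
            arcname = path.relative_to(folder).as_posix()
            zf.write(path, arcname)
            added.append(arcname)
    return added
```

A call such as zip_notebooks("notebooks", "release.zip") would then bundle the notebooks found under a (hypothetical) notebooks directory.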

A sort of release process could then work something like this. In the environment you want to test in:

# Install package
pip3 install --upgrade git+https://github.com/innovationOUtside/nb_workflow_tools

# When checking in notebooks, ensure cells are run
# Ensure that all cells are run even in presence of errors
tm351nbrun --file-processor runWithErrors .

# Test notebooks
tm351nbtest .

# Quality reports
## Whatever...

# Clear outputs
tm351nbrun --file-processor clearOutput .

# Collapse activity answers
nb_collapse_activities .

# Spell check
## However... Or run earlier before output cells cleared

# Zip files
# Whichever...

In passing, the nb_workflow_tools package also includes some other utilities not directly relevant to release, but occasionally handy during production: nb_merge to merge two or more notebooks, and nb_split to split a notebook into two or more notebooks.

I’ve also been exploring various approaches to spell-checking notebooks. These are currently being collected in innovationOUtside/nb_spellchecker and the various issues attached to that repo. When I have something reliable, I’ll add it to innovationOUtside/nb_workflow_tools. Another set of quality tools I had been working on but halted due to a universal “why would we want to know anything comparative about the contents of our notebooks” can be found in innovationOUtside/nb_quality_profile. At some point I’ll revisit this and then try to bundle them up into a simple CLI tool I can also add to nb_workflow_tools.

In passing, and for completeness, I’ve also started sketching some innovationOUtside/ou-jupyter-book-tools. The idea of these is that they can provide an intermediate publishing step, where necessary, that maps from cell tags, for example, to complementary Jupyter Book / MyST / reStructuredText markup.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
