Displaying Differences in Jupyter Notebooks – nbdime / nbdiff

One of the challenges of working with Jupyter notebooks to date has been the question of diffing, spotting the differences between two versions of the same notebook. This made collaborative authoring and reviewing of notebooks a bit tricky. It also acted as a brake on using notebooks for student assessment. It’s easy enough to to set an exercise using a templated notebook and then get students to work through it, but marking the completed notebook in return can be a bit fiddly. (The nbgrader system addresses this in part but at the expense of the overhead in terms of having to use additional nbgrader formatting and markup.)

However, there’s ongoing effort now around nbdime (docs). Given past success in getting Jupyter notebook previews displayed in Github, it wouldn’t be unreasonable to think that the diff view too might make it into that environment at some point too…

See also: Sensible Diff-ing of Jupyter Notebook ipynb Documents Using VS Code

At the moment, nbdime works from the command line. It can produce a text diff in the console, or launch a notebook viewer in the browser that shows differences between two notebooks.

The differ works on a cell by cell basis and highlights changes and addtions. (Extra emphasis on the changed text in a markdown cell doesn’t seem to work at the moment?)

nbdime_-_diff_and_merge_your_jupyter_notebooks

If you change the contents of a code cell, or the outputs of a code cell have changed, those differences are identified too. (Note the extra emphasis in the code cell on the changed text, but not in the output.)

nbdime_-_diff_and_merge_your_jupyter_notebooks2

To improve readability, you can collapse the display of changed code cell output.

nbdime_-_diff_and_merge_your_jupyter_notebooks3

Where cell outputs include graphical objects, differences to these are highlighted too.

nbdime_-_diff_and_merge_your_jupyter_notebooks4

(Whilst I note that Github has various tools for exploring the differences between two versions of the same image, I suspect that sort of comparison will be difficult to achieve inline in the notebook differencer.)

I suspect one common way of using nbdime will be to compare the current state of a notebook with a checkpointed version. (Jupyter notebooks autosave the current state of the notebook quite regulalry. If you force a save, the current state is saved but a “checkpoint” version of the notebook is also saved to a hidden folder. If things go really wrong with your current notebook, you can restore it to the checkpointed version.)

If you’ve saved a checkpoint of a notebook, and want to compare the current (autosaved) version with it, you need to point to the checkpointed file in the checkpoint folder: nbdiff-web .ipynb_checkpoints/MY_FILE-checkpoint.ipynb MY_FILE.ipynb. It’d be nice if a switch could handle this automatically, eg nbdime_web --compare-checkpoint MY_FILE.ipynb (It would also be nice if the nbdiff command could force the notebook to autosave before a diff is run, but I’m not sure how that could be achieved?)

It also strikes me that when restoring from a checkpoint, it might be possible to combine the restoration action with the differencer view so that you can decide which bits of the current notebook you might want to keep (i.e. essentially treat the differences between the current and checkpointed version as conflicts that need to be resolved?)

This is probably pushing things a bit far, but I also wonder if lightweight, inline, cell level differencing would be possible, given that each cell in at running notebook has an undo feature that goes back multiple streps?

Finally, a note about using the differencer to support marking. The differencer view is an HTML file, so whilst you can compare a student’s notebook with the orignal  you can’t edit their notebook directly in the differencer to add marks or feedback. (I really do need to have another play with nbgrader, I think…)

PS It’s also worth noting that SageMathCloud has a history slider that lets you run over different autosaved versions of a notebook, although differences are not highlighted.

PPS Thinks: what I’d like is a differencer that generates a new notebook with addition/deletion cells highlighted and colour styled so that I could retain – or delete – the cell and add cells of my own… Something akin to track changes, for example. That way I could run different cells, add annotations, etc etc (related issue).

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: