Sensible Diff-ing of Jupyter Notebook ipynb Documents Using VS Code

One of the major ARRGHHHs of working with Jupyter notebooks in a git / GitHub context is that if you are in the habit of checking in notebook ipynb files, particularly notebook ipynb files with code cells run, the differencing experience sucks…

And if the metadata changes you can get loads of diffs to skim through…

There are tools available for viewing differences between rendered Jupyter notebooks (for example, the nbdime Jupyter extension or the ReviewNB GitHub application), but to my knowledge we’ve left these largely unexplored in an internal context and, despite their availability for many years, I don’t see them talked about that much (maybe everyone uses them silently?).

Anyway, recent updates to the VS Code editor provide a huge leap forward, with the off-the-shelf inclusion in the Jupyter extension of a sensible diff viewer (VS Code – Custom notebook diffing).

Git diff of Jupyter notebooks in VS Code

The differencing gives you diff views at the cell input, metadata and output levels, as required. Where code cell outputs are images, you have the images presented side by side.
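To get a feel for what diffing at those three levels means, here is a rough Python sketch that compares two notebooks cell by cell. It is not how the VS Code diff viewer is implemented; it just leans on the fact that an ipynb file is JSON with a list of cells, each carrying source, metadata and (for code cells) outputs. The function names and the two toy notebooks are made up for illustration, and pairing cells by position is a naive assumption that falls over as soon as a cell is inserted or deleted.

```python
import difflib
import json

LEVELS = ("source", "metadata", "outputs")


def cell_lines(cell, level):
    """Return one level of a notebook cell as a list of text lines."""
    if level == "source":
        src = cell.get("source", "")
        # The notebook format allows source as a string or a list of strings.
        text = src if isinstance(src, str) else "".join(src)
        return text.splitlines()
    # Metadata and outputs are JSON structures: pretty-print them with sorted
    # keys so that the line-based diff is stable.
    default = {} if level == "metadata" else []
    return json.dumps(cell.get(level, default), indent=1, sort_keys=True).splitlines()


def diff_notebooks(old_nb, new_nb):
    """Return a list of (cell_index, level, diff_lines) for each changed level.

    Assumption: cells are paired by position, which is a simplification.
    """
    changes = []
    for i, (old, new) in enumerate(zip(old_nb["cells"], new_nb["cells"])):
        for level in LEVELS:
            a, b = cell_lines(old, level), cell_lines(new, level)
            if a != b:
                diff = list(difflib.unified_diff(a, b, lineterm="", n=0))
                changes.append((i, level, diff))
    return changes


# Two toy notebooks (hypothetical content) that differ in one code cell.
old_nb = {"cells": [{
    "cell_type": "code",
    "source": ["print(1)\n"],
    "metadata": {},
    "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
}]}
new_nb = {"cells": [{
    "cell_type": "code",
    "source": ["print(2)\n"],
    "metadata": {},
    "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
}]}

changes = diff_notebooks(old_nb, new_nb)
for index, level, diff in changes:
    print(f"cell {index}, {level}:")
    print("\n".join(diff))
```

For real files you would load each notebook with `json.load()` first. The point of separating the levels is that a changed output or a metadata tweak no longer drowns out the change to the code itself.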

With the addition of the VS Code GitLens extension (via this SO answer), you can trivially compare files across different branches of a repo via the Search and Compare / Compare References… option:

Just choose the branch you’re interested in, and the branch you want to compare to:

And the comparison is launched and the side by side view rendered in the main panel:

You can also connect to a remote or containerised kernel from the command palette by specifying the URL (and any necessary auth token) for the Jupyter server you want to connect to.
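As a small illustration of what that URL tends to look like: a Jupyter server started with token auth usually advertises itself with the token passed as a `token` query parameter. The helper below is a hypothetical sketch for assembling such a URL, and the host, port and token values are placeholders rather than anything from a real server.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def jupyter_server_url(base_url, token=None):
    """Build a Jupyter server URL, optionally carrying an auth token.

    Assumption: the server accepts the token as a `token` query parameter.
    """
    parts = urlsplit(base_url)
    path = parts.path or "/"
    query = urlencode({"token": token}) if token else ""
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


# Placeholder values for a locally running server.
url = jupyter_server_url("http://localhost:8888", token="abc123")
print(url)
```

Treat any URL with a token in it like a password: anyone holding it can run code on that server.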

A little bit of me suspects this might not actually work when trying to connect to an institutional server hidden behind institutional auth because, obvs, IT security policies are all designed to prevent folk accessing internal compute…!;-)

That said, it is possible to run VS Code in a browser via Jupyter server proxy using code-server, so that represents another possible solution for institutionally hosted environments: run VS Code and Jupyter notebooks in the same server and diff using VS Code (or even a pre-installed and pre-enabled nbdime).

Anyway, the DIFFing is really exciting… Now let’s see if I actually start using it!

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...