Pondering a Jupyter Notebook “Diff”er Extension and Its Use as a Marking Tool

One of the problems with naively applying simple text-based differencing tools to Jupyter notebooks is that the differencing is based on the content of the JSON file used to represent the notebook, rather than the content of the notebook cells themselves.
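To make that concrete, here is a minimal sketch, assuming notebooks in the nbformat 4 layout (a top-level `cells` list whose `source` may be a string or a list of strings). The helper names `cellSources` and `sameCellContent` are made up for illustration and are not part of nbdime or Jupyter.

```javascript
// Return the source text of each cell in a parsed .ipynb (nbformat 4) object.
// A cell's source may be stored as a single string or as a list of lines.
function cellSources(notebook) {
  return notebook.cells.map(function (cell) {
    return Array.isArray(cell.source) ? cell.source.join('') : cell.source;
  });
}

// True when two notebooks have the same cells with the same source text,
// ignoring outputs, execution counts and metadata.
function sameCellContent(a, b) {
  var sa = cellSources(a);
  var sb = cellSources(b);
  return sa.length === sb.length && sa.every(function (src, i) {
    return src === sb[i];
  });
}

// Two saves of the "same" notebook: only the execution count and outputs differ.
var firstSave = {
  cells: [{ cell_type: 'code', execution_count: 1, outputs: [],
            source: ['x = 1\n', 'print(x)'] }]
};
var secondSave = {
  cells: [{ cell_type: 'code', execution_count: 7,
            outputs: [{ output_type: 'stream', name: 'stdout', text: ['1\n'] }],
            source: ['x = 1\n', 'print(x)'] }]
};

// The serialised JSON differs, so a plain text diff reports changes...
console.log(JSON.stringify(firstSave) === JSON.stringify(secondSave)); // false
// ...even though the cell content is identical.
console.log(sameCellContent(firstSave, secondSave)); // true
```

Simply re-running a cell is enough to change the file, which is why a line-based diff of the raw `.ipynb` tends to be noisy.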

The nbdime tools (which I’ve commented on before) address this problem by reporting differences at the cell content level. (Hmm, I wonder: would a before/after image comparison view for charts also be useful (example)?)

I still don’t really understand how Jupyter notebook checkpointing works (e.g. I think I’d like it to work so that I can force a save to a checkpoint, whilst the autosave updates the current notebook file) but I started wondering yesterday about a simple Jupyter extension that would compare the current (saved) notebook with the checkpointed version.

My first attempt can be found in this gist:

// call "nbdime" and compare current version with new version in new tab
define([
    'base/js/namespace',
    'jquery',
    'base/js/events',
    'base/js/utils'
], function(IPython, $, events, utils) {
    "use strict";

    /**
     * Call nbdiff-web with current notebook against checkpointed version
     */
    var nbCheckpointDiffView = function () {
        var kernel = IPython.notebook.kernel;
        var name = IPython.notebook.notebook_name;
        var path = IPython.notebook.notebook_path;
        var base_url = window.location.protocol + '//' + window.location.hostname;
        // Settings specific to my VM setup
        var abspath = '/vagrant/notebooks';  // notebook root directory
        var hostport = '35101';              // host port that guestport is mapped to
        var guestport = '8899';              // port nbdiff-web runs on inside the guest VM
        // Compare the checkpointed copy (remote) with the current saved notebook (base)
        var url = base_url + ':' + hostport + '?remote=' + abspath + '/.ipynb_checkpoints/' + utils.splitext(name)[0] + '-checkpoint.ipynb&base=' + abspath + '/' + path;
        // Ask the notebook's kernel (assumed to be Python) to start nbdiff-web
        var command = 'import subprocess; subprocess.run(\'nbdiff-web --port=' + guestport + ' --ip=* \', shell=True)';
        kernel.execute(command);
        window.open(url);
        // Remove focus from the toolbar button once it has been clicked
        $('#doCheckpointDiffView').blur();
    };

    // Add a toolbar button that triggers the diff view
    var load_ipython_extension = function() {
        IPython.toolbar.add_buttons_group([
            {
                id: 'doCheckpointDiffView',
                label: 'Display nbdiff to checkpointed version',
                icon: 'fa-list-alt',
                callback: nbCheckpointDiffView
            }
        ]);
    };

    return {
        load_ipython_extension : load_ipython_extension
    };
});

(main.js, above)


Type: IPython Notebook Extension
Name: Checkpointdiffer
Description: Calls nbdiff with current notebook and checkpointed version
Link: readme.md
Compatibility: 4.x

(nbdiffex.yaml, above)

Clicking the appropriate button (I need to find a better icon?) launches the nbdiff-web service on a specified port and then pops open a tab comparing the current saved notebook file with the checkpointed version. Note there are several settings specific to my setup where the notebooks are running inside the TM351 VM:

  • guestport is the port inside the guest VM that the nbdiff-web service runs on
  • hostport is the port on the host machine that the nbdiff-web service port is mapped to
  • abspath is the path to the notebook root directory; ideally, this should be picked up by the script.
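As a rough sketch of how those settings combine into the URL that gets opened, the hypothetical helpers below (`checkpointPath` and `buildDiffUrl` are illustrative names, not part of the gist or of nbdime) rebuild the same `remote` and `base` query parameters used in main.js. Two assumptions are worth flagging: that the default file-based checkpointing is in use, so the `.ipynb_checkpoints` directory sits alongside the notebook even when the notebook is in a subdirectory, and that the service decodes URL-encoded query values. The `week1/intro.ipynb` path is just an example.

```javascript
// Hypothetical helper: derive the checkpoint path for a notebook path given
// relative to the notebook root, e.g. 'week1/intro.ipynb'.
// Assumes default file-based checkpoints, saved next to the notebook as
// '.ipynb_checkpoints/<name>-checkpoint.ipynb'.
function checkpointPath(notebookPath) {
  var slash = notebookPath.lastIndexOf('/');
  var dir = slash === -1 ? '' : notebookPath.slice(0, slash + 1);
  var file = notebookPath.slice(slash + 1);
  var dot = file.lastIndexOf('.');
  var stem = dot > 0 ? file.slice(0, dot) : file;
  var ext = dot > 0 ? file.slice(dot) : '';
  return dir + '.ipynb_checkpoints/' + stem + '-checkpoint' + ext;
}

// Hypothetical helper: build the diff URL from the three settings plus the
// base URL of the host, using the same query parameters as main.js.
function buildDiffUrl(settings, notebookPath) {
  var root = settings.abspath.replace(/\/+$/, ''); // drop any trailing slash
  var base = root + '/' + notebookPath;
  var remote = root + '/' + checkpointPath(notebookPath);
  return settings.baseUrl + ':' + settings.hostport +
    '?remote=' + encodeURIComponent(remote) +
    '&base=' + encodeURIComponent(base);
}

var exampleSettings = {
  baseUrl: 'http://localhost',
  hostport: '35101',            // host port mapped to the guest port
  abspath: '/vagrant/notebooks' // notebook root inside the VM
};

console.log(checkpointPath('week1/intro.ipynb'));
// week1/.ipynb_checkpoints/intro-checkpoint.ipynb
console.log(buildDiffUrl(exampleSettings, 'week1/intro.ipynb'));
```

Keeping the path logic in small functions like these would also make it easier to swap the hard-coded `abspath` for a value picked up at runtime.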

In testing, my setup seems to save checkpoints regularly and automatically, which means the diffing is not that useful… Maybe I need a “backup-save” or “version-save” option somewhere to force saves at particular times and then compare to those?

The code used for the extension was inspired by the printview extension code. Looking at that, I note how the defining YAML file makes it easy to set up the extension configuration options.
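On the JavaScript side, one way the hard-coded values might then be replaced is to overlay whatever options the user has configured on top of sensible defaults. The sketch below is only that: the `checkpointdiffer_` option prefix and the `resolveSettings` helper are invented for illustration, and how the configured values actually reach the extension is left open.

```javascript
// Defaults mirror the values hard-coded in main.js.
var DEFAULT_SETTINGS = {
  abspath: '/vagrant/notebooks',
  hostport: '35101',
  guestport: '8899'
};

// Hypothetical option prefix; a real extension would choose its own naming.
var OPTION_PREFIX = 'checkpointdiffer_';

// Overlay user-supplied options (e.g. { checkpointdiffer_hostport: '8080' })
// on the defaults. Options that are not set fall back to the default value,
// and keys that do not correspond to a known setting are ignored.
function resolveSettings(defaults, configData) {
  var settings = {};
  Object.keys(defaults).forEach(function (key) {
    var supplied = configData ? configData[OPTION_PREFIX + key] : undefined;
    settings[key] = supplied !== undefined ? supplied : defaults[key];
  });
  return settings;
}

// Only the host port is overridden here; the other settings keep their defaults.
console.log(resolveSettings(DEFAULT_SETTINGS, { checkpointdiffer_hostport: '8080' }));
```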

Which got me wondering… one of the issues we have had with marking student work specified in and completed as Jupyter notebooks is helping markers quickly navigate to the cells that students have modified so they can be marked. (Finding an efficient way of allowing markers to annotate and return scripts is another issue. Note that we still haven’t really experimented with nbgrader either.) So I wonder if the differencer can help?

For example, we could bundle a set of ‘provided’ notebooks in which assessments are defined within a marking extension, and then allow a marker to compare the student-returned version of the notebook with the provided copy, using the option to hide unchanged cells and just highlight the ones that have been changed to include the student’s answers. Ideally, we’d also want the marker to be able to annotate those cells with marks and feedback, and return the annotated script to the student, who could then diff back to their submitted script to see what the marker had to say?
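As a very crude stand-in for what a proper differ such as nbdime would do, the sketch below picks out the cells in a returned notebook that do not appear in the provided one, which is roughly the list a marker would want to jump to. It assumes nbformat 4 style notebooks, and `changedCells` is a made-up name for illustration. It matches cells by exact type and source only, so it says nothing about how a cell was changed.

```javascript
// Source text of a single cell (stored as a string or a list of lines).
function cellText(cell) {
  return Array.isArray(cell.source) ? cell.source.join('') : cell.source;
}

// Key used to match cells between notebooks: cell type plus exact source.
function cellKey(cell) {
  return cell.cell_type + '\u0000' + cellText(cell);
}

// Return the cells of `submitted` that have no exact match in `provided`,
// along with their position in the submitted notebook.
function changedCells(provided, submitted) {
  // Count provided cells so that repeated cells are matched one-for-one.
  var counts = new Map();
  provided.cells.forEach(function (cell) {
    var key = cellKey(cell);
    counts.set(key, (counts.get(key) || 0) + 1);
  });

  var changed = [];
  submitted.cells.forEach(function (cell, index) {
    var key = cellKey(cell);
    var remaining = counts.get(key) || 0;
    if (remaining > 0) {
      counts.set(key, remaining - 1); // unchanged cell: hide it
    } else {
      changed.push({ index: index, cell_type: cell.cell_type, source: cellText(cell) });
    }
  });
  return changed;
}

// Illustrative notebooks: the student has replaced the placeholder answer cell.
var providedNotebook = {
  cells: [
    { cell_type: 'markdown', source: 'Q1. Write a function that doubles a number.' },
    { cell_type: 'code', source: '# YOUR ANSWER HERE' }
  ]
};
var submittedNotebook = {
  cells: [
    { cell_type: 'markdown', source: 'Q1. Write a function that doubles a number.' },
    { cell_type: 'code', source: ['def double(x):\n', '    return 2 * x'] }
  ]
};

// Reports a single changed cell, at index 1 of the submitted notebook.
console.log(changedCells(providedNotebook, submittedNotebook));
```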

PS via @biztechpm, How to grade programming assignments on GitHub

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
