One of the problems with naively using simple text-based differencing tools on Jupyter notebooks is that the differencing is based on the content of the JSON file used to represent the notebook, rather than the content of the notebook cells themselves.
The nbdime tools (which I’ve commented on before) address this problem by reporting differences at the cell content level. (Hmm, I wonder: would a before/after image comparison view for charts also be useful (example)?)
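As a rough illustration of the JSON-versus-cell-content distinction (this is not how nbdime works internally, just a minimal sketch), the snippet below reduces a notebook to its cell types and sources before comparing. It assumes nbformat 4 style notebooks with a top-level `cells` list; the function names and the example data are made up for the illustration.

```javascript
// Minimal sketch: compare two notebooks by cell content rather than raw JSON.
// Assumes nbformat 4 style notebooks: a top-level "cells" array whose items
// have "cell_type" and "source" (a string or a list of strings).

// Normalise a cell's source to a single string.
function cellSource(cell) {
  return Array.isArray(cell.source) ? cell.source.join('') : (cell.source || '');
}

// Reduce a notebook to just the parts a reader would call "content".
function cellContents(notebook) {
  return (notebook.cells || []).map(function (cell) {
    return { type: cell.cell_type, source: cellSource(cell) };
  });
}

// True if the two notebooks have the same cells in the same order,
// ignoring outputs, execution counts and metadata.
function sameCellContent(a, b) {
  return JSON.stringify(cellContents(a)) === JSON.stringify(cellContents(b));
}

// Hypothetical example data: the same notebook before and after a re-run.
var before = {
  cells: [{ cell_type: 'code', execution_count: 1, outputs: [], source: ['print("hi")'] }]
};
var after = {
  cells: [{ cell_type: 'code', execution_count: 7, outputs: [], source: 'print("hi")' }]
};

// The raw JSON differs (execution count, source encoding)...
console.log(JSON.stringify(before) === JSON.stringify(after));
// ...but the cell content is the same.
console.log(sameCellContent(before, after));
```

A line-based diff of the two files would flag the re-run as a change, whereas the cell-level comparison does not.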
I still don’t really understand how Jupyter notebook checkpointing works (e.g. I think I’d like it to work so that I can force a save to a checkpoint, whilst the autosave updates the current notebook file) but I started wondering yesterday about a simple Jupyter extension that would compare the current (saved) notebook with the checkpointed version.
My first attempt can be found in this gist:
nbdiffex – Jupyter notebook checkpoint differ
Install with:
jupyter nbextension install ./nbdiffex/
jupyter nbextension enable nbdiffex/main
systemctl restart jupyter
Notes on possible use here: Pondering a Jupyter Notebook “Diff”er Extension and Its Use as a Marking Tool.
// Call "nbdime" and compare the current version with the checkpointed version in a new tab
define([
    'base/js/namespace',
    'jquery',
    'base/js/events',
    'base/js/utils'
], function(IPython, $, events, utils) {
    "use strict";

    /**
     * Call nbdiff-web with current notebook against checkpointed version
     */
    var nbCheckpointDiffView = function () {
        var kernel = IPython.notebook.kernel;
        var name = IPython.notebook.notebook_name;
        var path = IPython.notebook.notebook_path;
        var base_url = window.location.protocol + '//' + window.location.hostname;

        // Settings specific to my VM setup
        var abspath = '/vagrant/notebooks';
        var hostport = '35101';
        var guestport = '8899';

        // Compare the checkpoint file (remote) with the current saved notebook (base)
        var url = base_url + ':' + hostport +
            '?remote=' + abspath + '/.ipynb_checkpoints/' + utils.splitext(name)[0] + '-checkpoint.ipynb' +
            '&base=' + abspath + '/' + path;

        // Start the nbdiff-web server from the notebook kernel.
        // Note: subprocess.run waits for the command to finish, so the kernel
        // stays busy for as long as the nbdiff-web server is running.
        var command = 'import subprocess; subprocess.run(\'nbdiff-web --port=' + guestport + ' --ip=* \', shell=True)';
        kernel.execute(command);

        window.open(url);
        $('#doCheckpointDiffView').blur();
    };

    // Add the toolbar button when the extension is loaded
    var load_ipython_extension = function() {
        IPython.toolbar.add_buttons_group([
            {
                id: 'doCheckpointDiffView',
                label: 'Display nbdiff to checkpointed version',
                icon: 'fa-list-alt',
                callback: nbCheckpointDiffView
            }
        ]);
    };

    return {
        load_ipython_extension : load_ipython_extension
    };
});
Type: IPython Notebook Extension
Name: Checkpointdiffer
Description: Calls nbdiff with current notebook and checkpointed version
Link: readme.md
Compatibility: 4.x
Clicking the appropriate button (I need to find a better icon?) launches the nbdiff-web service on a specified port and then pops open a tab comparing the current saved notebook file with the checkpointed version. Note there are several settings specific to my setup, where the notebooks are running inside the TM351 VM:
guestport is the port inside the guest VM that the nbdiff-web service runs on
hostport is the port on the host machine that the nbdiff-web service port is mapped to
abspath is the path to the notebook root directory; ideally, this should be picked up by the script.
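One way to keep those machine-specific values in one place is to pass them in as a settings object and build the diff URL from that. The following is only a sketch: the function name `checkpointDiffUrl` and the `baseUrl` setting are hypothetical, the query string layout simply mirrors the extension code, and the URL-encoding of the paths is my own addition. It also assumes the default behaviour of keeping checkpoints in a `.ipynb_checkpoints` folder alongside the notebook, so that notebooks in subdirectories are handled too.

```javascript
// Sketch only: derive the nbdiff-web URL from a settings object.
// Setting names mirror the variables used in the extension; the function
// name and the baseUrl setting are hypothetical.
function checkpointDiffUrl(settings, notebookPath) {
  // Split "dir/sub/name.ipynb" into directory and file name.
  var slash = notebookPath.lastIndexOf('/');
  var dir = slash === -1 ? '' : notebookPath.slice(0, slash + 1);
  var file = notebookPath.slice(slash + 1);

  // Split "name.ipynb" into stem and extension.
  var dot = file.lastIndexOf('.');
  var stem = dot > 0 ? file.slice(0, dot) : file;
  var ext = dot > 0 ? file.slice(dot) : '';

  // Assumption: checkpoints live in .ipynb_checkpoints next to the notebook.
  var checkpoint = settings.abspath + '/' + dir +
    '.ipynb_checkpoints/' + stem + '-checkpoint' + ext;
  var current = settings.abspath + '/' + notebookPath;

  // Same parameter roles as the extension: checkpoint as remote, current as base.
  return settings.baseUrl + ':' + settings.hostport +
    '?remote=' + encodeURIComponent(checkpoint) +
    '&base=' + encodeURIComponent(current);
}

// Example using the values from my VM setup.
var vmSettings = {
  baseUrl: 'http://localhost',
  hostport: '35101',
  abspath: '/vagrant/notebooks'
};
console.log(checkpointDiffUrl(vmSettings, 'demo.ipynb'));
```

In the extension itself, `baseUrl` would come from `window.location` and the notebook path from `IPython.notebook.notebook_path`, as in the code above.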
In testing, my setup seems to have checkpoints being saved regularly and automatically, which means the diffing is not that useful… Maybe I need to have a “backup-save” or “version-save” option somewhere to force saves at particular times and then compare to those?
The code used for the extension was inspired by the printview extension code. Looking at that, I note how the defining YAML file makes it easy to set up the extension configuration options.
Which got me wondering… one of the issues we have had with marking student work specified in and completed as Jupyter notebooks is helping markers quickly navigate to the cells that students have modified so they can be marked. (Finding an efficient way of allowing markers to annotate and return scripts is another issue. Note that we still haven’t really experimented with nbgrader either.) So I wonder if the differencer can help?
For example, we could bundle a set of ‘provided’ notebooks in which assessments are defined within a marking extension, and then allow a marker to compare the student returned version of the notebook with the provided copy, using the option to hide unchanged cells and just highlight the ones that have been changed to include the student’s answers. Ideally, we’d also want the marker to be able to annotate those cells with marks and feedback, and return the annotated script to the student – who could then diff back to their submitted transcript to see what the marker had to say?
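To give a feel for the “hide unchanged cells” idea, here is a minimal sketch of how the cells a marker needs to look at might be picked out. It assumes both the provided and the submitted notebook carry stable per-cell `id` fields (as in nbformat 4.5), which may well not hold for older notebooks; the function names and the example data are hypothetical, and a real tool would more likely lean on nbdime for the comparison.

```javascript
// Sketch: list the cells a marker needs to look at.
// Assumes every cell has a stable "id" field shared between the provided
// notebook and the student's returned copy (an assumption, not a given).

// Normalise a cell's source to a single string.
function sourceText(cell) {
  return Array.isArray(cell.source) ? cell.source.join('') : (cell.source || '');
}

function cellsToMark(provided, submitted) {
  // Index the provided cells by id for quick lookup.
  var original = {};
  provided.cells.forEach(function (cell) {
    original[cell.id] = sourceText(cell);
  });

  var result = [];
  submitted.cells.forEach(function (cell, index) {
    var text = sourceText(cell);
    if (!Object.prototype.hasOwnProperty.call(original, cell.id)) {
      // The student added a cell that was not in the provided notebook.
      result.push({ index: index, id: cell.id, status: 'added', source: text });
    } else if (original[cell.id] !== text) {
      // The student edited a provided cell, e.g. to fill in an answer.
      result.push({ index: index, id: cell.id, status: 'modified', source: text });
    }
    // Unchanged cells are skipped: this is the "hide unchanged cells" view.
  });
  return result;
}

// Hypothetical example: one question cell, one answer cell to complete.
var provided = {
  cells: [
    { id: 'q1', cell_type: 'markdown', source: 'Q1: set x to 1' },
    { id: 'a1', cell_type: 'code', source: '' }
  ]
};
var submitted = {
  cells: [
    { id: 'q1', cell_type: 'markdown', source: ['Q1: set x to 1'] },
    { id: 'a1', cell_type: 'code', source: 'x = 1' }
  ]
};
console.log(cellsToMark(provided, submitted));
```

The returned indexes could then be used to jump the marker straight to the modified cells, or to attach marks and feedback to them.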
PS via @biztechpm, How to grade programming assignments on GitHub