Using Jupyter Notebooks For Assessment – Export as Word (.docx) Extension

One of the things we still haven’t properly worked out in our Data management and analysis (TM351) course is how best to handle Jupyter notebook based assignments. Each assignment is set as a notebook that describes the tasks, and the student completes the tasks in that same notebook. We then need some mechanism for:

  • students to submit the assessment electronically;
  • markers to mark assessments for their students: if the document contains a lot of OU text, it can be hard for the marker to locate the student text;
  • markers to provide on-script feedback: this means the marker needs to be able to edit the document and make changes/annotations;
  • markers to return scripts to students;
  • students to read feedback: they need to be able to locate and distinguish the marker feedback within the document.

One Frankenstein process we tried was for students to save a Jupyter notebook file as a Markdown or HTML document and then convert it to a Microsoft Word document using pandoc.

This document could then be submitted and marked in a traditional way, with markers using comments and track changes to annotate the student script. Unfortunately, our original 32-bit VM meant we had to use an old version of pandoc, with the result that tabular data was not handled at all well in the conversion-to-Word process.

Updating to a 64-bit virtual machine means we can update pandoc, and the Word document conversion is now much smoother. However, the conversion process still requires students to export the notebook as HTML and then use pandoc to convert the HTML to the Microsoft Word .docx format. (The Jupyter nbconvert utility does not currently export to Word.)
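The manual two-step route can be sketched in Python as follows. This is a minimal sketch rather than anything we give to students: it assumes `jupyter nbconvert` and `pandoc` are both on the path, and the function names `build_commands` and `notebook_to_docx` are made up for illustration.

```python
import os
import subprocess


def build_commands(notebook_path, out_dir):
    """Return the nbconvert and pandoc commands as argument lists."""
    stem = os.path.splitext(os.path.basename(notebook_path))[0]
    html_path = os.path.join(out_dir, stem + ".html")
    docx_path = os.path.join(out_dir, stem + ".docx")
    # Step 1: notebook -> HTML
    to_html = ["jupyter", "nbconvert", "--to", "html", notebook_path,
               "--output-dir", out_dir]
    # Step 2: HTML -> Word .docx
    to_docx = ["pandoc", "-s", html_path, "-o", docx_path]
    return to_html, to_docx, docx_path


def notebook_to_docx(notebook_path, out_dir="."):
    """Run both steps; raises CalledProcessError if either tool fails."""
    to_html, to_docx, docx_path = build_commands(notebook_path, out_dir)
    subprocess.check_call(to_html)
    subprocess.check_call(to_docx)
    return docx_path
```

Passing the commands as argument lists, rather than as a single shell string, means notebook names containing spaces or quotes need no extra escaping.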

So to make things a little easier, here’s my first attempt at a Download Jupyter Notebook as Word (.docx) extension to do just that. It makes use of the Jupyter notebook custom bundler extensions API, which allows you to add additional options to the notebook File -> Download menu. The code was largely cribbed from the dashboards_bundlers package, which converts a notebook to a dashboard and then downloads it.


# Copyright (c) The Open University, 2017
# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.

# Custom bundler extensions API:
# http://jupyter-notebook.readthedocs.io/en/latest/extending/bundler_extensions.html
# Requires: _jupyter_bundlerextension_paths(), bundle()
# Based on: https://github.com/jupyter-incubator/dashboards_bundlers/

import os
import subprocess
import shlex
import shutil
import tempfile


def _jupyter_bundlerextension_paths():
    '''API for notebook bundler installation on notebook 5.0+'''
    return [{
        'name': 'wordexport_wordexport',
        'label': 'MS Word (.docx)',
        'module_name': 'wordexport.wordexport',
        'group': 'download'
    }]


def bundle(handler, model):
    '''
    Downloads a notebook as a Microsoft Word .docx document after converting to HTML
    '''
    # Based on https://github.com/jupyter-incubator/dashboards_bundlers
    abs_nb_path = os.path.join(
        handler.settings['contents_manager'].root_dir,
        model['path']
    )
    notebook_basename = os.path.basename(abs_nb_path)
    notebook_name = os.path.splitext(notebook_basename)[0]
    tmp_dir = tempfile.mkdtemp()

    # Generate HTML version of file
    cmd = 'jupyter nbconvert --to html "{abs_nb_path}" --output-dir "{tmp_dir}"'.format(abs_nb_path=abs_nb_path, tmp_dir=tmp_dir)
    #os.system(cmd)
    subprocess.check_call(cmd, shell=True)

    # Path of the converted file in the temporary directory, minus its extension
    staged = os.path.join(tmp_dir, notebook_name)

    # Convert to MS Word .docx
    cmd = 'pandoc -s "{staged}.html" -o "{staged}.docx"'.format(staged=staged)
    #os.system(cmd)
    subprocess.check_call(cmd, shell=True)

    # Send the .docx back to the browser as a file download
    handler.set_header('Content-Disposition', 'attachment; filename="%s"' % (notebook_name + '.docx'))
    handler.set_header('Content-Type', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document')
    with open("{staged}.docx".format(staged=staged), 'rb') as bundle_file:
        handler.write(bundle_file.read())
    handler.finish()

    # We read and send synchronously, so we can clean up safely after finish
    shutil.rmtree(tmp_dir, True)

[There’s now a GitHub repo: innovationOUtside/nb_extension_wordexport]
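To see what the last part of bundle() does without a running notebook server, the response step can be exercised against a stand-in handler that just records the calls made on it. This is only a sketch: `FakeHandler`, `send_docx` and `demo` are hypothetical names, and the "document" sent is a few placeholder bytes rather than a real .docx file.

```python
import os
import tempfile

DOCX_MIME = ("application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document")


class FakeHandler:
    """Hypothetical stand-in that records what a real handler would send."""

    def __init__(self):
        self.headers = {}
        self.body = b""
        self.finished = False

    def set_header(self, name, value):
        self.headers[name] = value

    def write(self, chunk):
        self.body += chunk

    def finish(self):
        self.finished = True


def send_docx(handler, docx_path, download_name):
    """Mirror the response step of bundle(): headers, body, then finish."""
    handler.set_header("Content-Disposition",
                       'attachment; filename="%s"' % download_name)
    handler.set_header("Content-Type", DOCX_MIME)
    with open(docx_path, "rb") as docx_file:
        handler.write(docx_file.read())
    handler.finish()


def demo():
    """Send placeholder bytes through the fake handler and return it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo.docx")
        with open(path, "wb") as placeholder:
            placeholder.write(b"not a real docx")
        handler = FakeHandler()
        send_docx(handler, path, "demo.docx")
    return handler
```

The Content-Disposition header is what makes the browser offer the response as a file download with the given filename, rather than trying to display it.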

One thing the extension doesn’t handle at the moment is embedded interactive content, such as interactive maps. I’ve previously come up with a workaround for generating static images of interactive maps created using the folium package, by using selenium to render the map and grab a screenshot of it; I’m not sure whether that would work in our headless VM, though. (One to try, I guess.) There’s also a related thread in the folium repo issue tracker.
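The general shape of that screenshot workaround is something like the following. This is a sketch of the idea rather than the code I actually used: it assumes selenium, Firefox and geckodriver are installed, the `--headless` option is a guess at what a VM with no display might need, and `file_url` and `map_to_png` are made-up names.

```python
import pathlib
import time


def file_url(path):
    """Turn a local path into a file:// URL that a browser can open."""
    return pathlib.Path(path).resolve().as_uri()


def map_to_png(folium_map, html_path="map.html", png_path="map.png", delay=5):
    """Save a folium map as HTML, render it in a browser and screenshot it."""
    # Imported here so file_url() can be used without selenium installed
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    folium_map.save(html_path)

    options = Options()
    options.add_argument("--headless")  # assumption: no display available
    browser = webdriver.Firefox(options=options)
    try:
        browser.set_window_size(800, 600)
        browser.get(file_url(html_path))
        time.sleep(delay)  # crude wait to give the map tiles time to load
        browser.save_screenshot(png_path)
    finally:
        browser.quit()
    return png_path
```

Given a folium map object `m`, calling `map_to_png(m)` should leave a map.png file that could be embedded in the notebook in place of the interactive map before the notebook is converted.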

The above script is placed in a wordexport folder inside a package folder containing a simple setup.py script:

from setuptools import setup

setup(name='wordexport',
      version='0.0.1',
      description='Export Jupyter notebook as .docx file',
      author='Tony Hirst',
      author_email='tony.hirst@open.ac.uk',
      license='MIT',
      packages=['wordexport'],
      zip_safe=False)

The package can be installed, and the extension enabled, using something along the lines of the following commands:

echo "...wordexport install..."
#Install the wordexport (.docx exporter) extension package
pip3 install --upgrade --force-reinstall ${THISDIR}/jupyter_custom_files/nbextensions/wordexport

#Enable the wordexport extension
jupyter bundlerextension enable --py wordexport.wordexport --sys-prefix
echo "...wordexport done"

Restart the Jupyter server after enabling the extension, and the result should be a new MS Word (.docx) option in the notebook File -> Download menu.
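If the menu option doesn’t appear, one quick sanity check is to confirm that the installed module exposes the two functions the bundler API requires, and that each metadata entry has the four keys used in the script above. The following is a minimal sketch; `check_bundler_module` is a hypothetical helper, not part of Jupyter.

```python
REQUIRED_KEYS = {"name", "label", "module_name", "group"}


def check_bundler_module(module):
    """Return a list of problems found in a bundler module (empty if none)."""
    problems = []
    if not callable(getattr(module, "bundle", None)):
        problems.append("missing bundle(handler, model)")
    paths_func = getattr(module, "_jupyter_bundlerextension_paths", None)
    if not callable(paths_func):
        problems.append("missing _jupyter_bundlerextension_paths()")
        return problems
    # Each entry describes one menu option offered by the module
    for entry in paths_func():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append("entry lacks keys: " + ", ".join(sorted(missing)))
    return problems
```

With the package installed, running `import wordexport.wordexport as we` followed by `check_bundler_module(we)` should return an empty list.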

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...