Authoring Multiple Docs from a Single IPython Notebook

It’s my not-OU day today, and whilst I should really be sacrificing it to work on some content for a FutureLearn course, I thought I’d instead tinker with a workflow tool related to the production process we’re using.

The course will be presented as a set of HTML docs on FutureLearn, supported by a set of IPython notebooks that learners will download and execute themselves.

The handover resources will be something like:

– a set of IPython notebooks;
– a Word document for each week containing the content to appear online. (This document will be used as the basis for multiple pages on the course website. The content is entered into the FutureLearn system by someone else as markdown, though I’m not sure what flavour.);
– for each video asset, a Word document containing the script;
– ?separate image files (the images will also be in the Word doc).

Separate webpages provide teaching that leads into a linked IPython notebook. (Learners will be running IPython via Anaconda on their own desktops – which means tablet/netbook users won’t be able to do the interactive activities as currently delivered; we looked at using Wakari, but didn’t go with it; offering our own hosted solution or tmpnb server was considered out of scope.)

The way I have authored my week is to create a single IPython document that proceeds in a linear fashion, with “FutureLearn webpage” content authored as markdown, along with executed code cells, followed by “IPython notebook” activity content relating to the previous “webpage”. Each “IPython notebook” section is preceded by a markdown cell containing a START NOTEBOOK statement, and closed with a markdown cell containing an END NOTEBOOK statement.
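
Because everything downstream depends on those delimiter cells being paired up properly, a quick sanity check before splitting can save some confusion. The following is only a minimal sketch, not part of the original workflow: it assumes the cells are available as plain dicts (as in the notebook JSON), uses the marker strings that the script later in the post matches on (`START NOTEBOOK` and `END NOTEBOOK`), and the function name `check_markers` is made up for illustration.

```python
def check_markers(cells, start="START NOTEBOOK", end="END NOTEBOOK"):
    """Return a list of problems found with the delimiter cells (empty if none)."""
    problems = []
    open_at = None  # index of the currently open START cell, if any
    for idx, cell in enumerate(cells):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # Notebook JSON on disk may store the source as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if start in source:
            if open_at is not None:
                problems.append(
                    "cell {}: START inside block opened at cell {}".format(idx, open_at))
            open_at = idx
        elif end in source:
            if open_at is None:
                problems.append("cell {}: END without a matching START".format(idx))
            open_at = None
    if open_at is not None:
        problems.append("cell {}: START never closed".format(open_at))
    return problems
```

An empty list means every START has a matching END; anything else is worth fixing in the source notebook before running the splitter.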

I then run a simple script that:

  • generates one IPython notebook per “IPython notebook” section;
  • creates a monolithic notebook containing all, but just, the “FutureLearn webpage” content;
  • generates a markdown version of that monolithic notebook;
  • uses pandoc to convert the monolithic markdown doc to a Microsoft Word/docx file.


Note that it would be easy enough to render each “FutureLearn webpage” doc as markdown directly from the original notebook source, into its own file that could presumably be added directly to FutureLearn, but that was seen as being overly complex compared to the original “copy rendered markdown from notebook into Word and then somehow generate markdown to put into FutureLearn editor” route.
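
For what it’s worth, that per-page route might look something like the following. This is a rough sketch under a few assumptions rather than anything we actually used: cells are treated as plain dicts, a “webpage” is taken to be the run of cells between one notebook block and the next, code cells are rendered as indented code blocks without their outputs, and the function name `webpages_as_markdown` is hypothetical.

```python
def webpages_as_markdown(cells, start="START NOTEBOOK", end="END NOTEBOOK"):
    """Return one markdown string per run of "webpage" cells."""
    pages = []
    current = []   # markdown fragments for the page being built
    innb = False   # are we inside a standalone notebook block?
    for cell in cells:
        source = cell.get("source", "")
        # Notebook JSON on disk may store the source as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        kind = cell.get("cell_type")
        if kind == "markdown" and start in source:
            # A notebook block begins, so the preceding webpage is complete
            innb = True
            if current:
                pages.append("\n\n".join(current))
                current = []
        elif kind == "markdown" and end in source:
            innb = False
        elif innb:
            continue  # notebook activity content is not part of any webpage
        elif kind == "markdown":
            current.append(source)
        elif kind == "code":
            # Render code as a markdown indented code block
            current.append("\n".join("    " + line for line in source.splitlines()))
    if current:
        pages.append("\n\n".join(current))
    return pages
```

Each returned string could then be written out to its own file (for example page1.md, page2.md, and so on) ready for pasting into whatever editor accepts markdown.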

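import io, sys
#Note: from IPython 4/Jupyter onwards these modules live in the standalone nbformat package
#(import nbformat as nb; import nbformat.v4.nbbase as nb4)
import IPython.nbformat as nb
import IPython.nbformat.v4.nbbase as nb4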

#Are we in a notebook segment?
innb=False

#Quick and dirty count of notebooks
c=1

#The monolithic notebook holds everything except the standalone notebook content
monolith=nb4.new_notebook()

#Load the original doc in
mynb=nb.read('ORIGINAL.ipynb',nb.NO_CONVERT)

#For each cell in the original doc:
for i in mynb['cells']:
    if (i['cell_type']=='markdown'):
        #See if we can spot a standalone notebook delimiter
        if ('START NOTEBOOK' in i['source']):
            #At the start of a block, create a new notebook
            innb=True
            test=nb4.new_notebook()
        elif ('END NOTEBOOK' in i['source']):
            #At the end of the block, save the code to a new standalone notebook file
            innb=False
            nb.write(test,'test{}.ipynb'.format(c))
            c=c+1
        elif (innb):
            test.cells.append(nb4.new_markdown_cell(i['source']))
        else:
            monolith.cells.append(nb4.new_markdown_cell(i['source']))
    elif (i['cell_type']=='code'):
        #For the code cells, preserve any output text
        cc=nb4.new_code_cell(i['source'])
        for o in i['outputs']:
            cc['outputs'].append(o)
        #Route the code cell as required...
        if (innb):
            test.cells.append(cc)
        else:
            monolith.cells.append(cc)

#Save the monolithic notebook
nb.write(monolith,'monolith.ipynb')

#Convert it to markdown
#(on newer Jupyter installs the equivalent command is: jupyter nbconvert)
!ipython nbconvert --to markdown monolith.ipynb

##On a Mac, I got pandoc via:
#brew install pandoc

#Generate a Microsoft .docx file from the markdown
!pandoc -o monolith.docx -f markdown -t docx monolith.md
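
The two `!` lines above only work inside IPython. If the script were to be run as a plain Python file, the same commands could be issued via `subprocess`. The following is a sketch of that idea rather than the code I ran: the helper names `conversion_commands` and `run_conversions` are hypothetical, the commands are simply the ones used above with the filename stem made a parameter, and it assumes the `ipython` and `pandoc` executables are on the path.

```python
import subprocess

def conversion_commands(stem="monolith"):
    """Build the nbconvert and pandoc commands for a notebook called <stem>.ipynb."""
    return [
        # Notebook -> markdown (newer installs would use "jupyter" in place of "ipython")
        ["ipython", "nbconvert", "--to", "markdown", stem + ".ipynb"],
        # Markdown -> Microsoft Word .docx
        ["pandoc", "-o", stem + ".docx", "-f", "markdown", "-t", "docx", stem + ".md"],
    ]

def run_conversions(stem="monolith", runner=subprocess.check_call):
    """Run each command in turn; check_call raises if a command fails."""
    for cmd in conversion_commands(stem):
        runner(cmd)
```

Passing a different `runner` (for example a list’s `append` method) makes it possible to inspect the commands without actually executing them.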

What this means is that I can author a multi-chapter, multi-notebook minicourse within a single IPython notebook, then segment it into a set of standalone files in a variety of document types.

Of course, what I really should have been doing was working on the course material… but then again, it was supposed to be my not-OU day today…;-)

PS The actual workflow, of course, turned out to be more traditional. Content for the FutureLearn website was copied from the notebooks into a Word document, edited there, and then somehow converted to markdown for entry into FutureLearn. (I haven’t seen what the FutureLearn content entry forms look like – anyone got a user guide or screenshots they could share?) Which caused all sorts of fun with the tables and code styling…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...