Rapid ipywidgets Prototyping Using Third Party Javascript Packages in Jupyter Notebooks With jp_proxy_widget

Just before the break, I came across a rather entrancing visualisation of Jean Michel Jarre’s Oxygene album in the form of an animated spectrogram.

Time is along the horizontal x-axis, and frequency along the vertical y-axis. The bright colours show the presence, and volume, of each frequency as the track plays out.

Such visualisations can help you hear-by-seeing the structure of the sound as the music plays. So I wondered… could I get something like that working in a Jupyter notebook….?

And it seems I can, using the rather handy jp_proxy_widget, which provides a way of easily loading jQueryUI components, as well as the require.js module, to load and run Javascript widgets.

Via this StackOverflow answer, which shows how to embed a simple audio visualisation into a Jupyter notebook using the Wavesurfer.js package, I note that Wavesurfer.js also supports spectrograms. The example page docs are a bit ropey, but a look at the source code and the plugin docs revealed what I needed to know…

#%pip install --upgrade ipywidgets
#!jupyter nbextension enable --py widgetsnbextension

#%pip install jp_proxy_widget

import jp_proxy_widget

widget = jp_proxy_widget.JSProxyWidget()

js = "https://unpkg.com/wavesurfer.js"
#The spectrogram plugin is loaded separately
js2 = "https://unpkg.com/wavesurfer.js/dist/plugin/wavesurfer.spectrogram.min.js"
url = "https://ia902606.us.archive.org/35/items/shortpoetry_047_librivox/song_cjrg_teasdale_64kb.mp3"

widget.load_js_files([js, js2])


widget.js_init("""
element.wavesurfer = WaveSurfer.create({
    container: element[0],
    waveColor: 'violet',
    progressColor: 'purple',
    loaderColor: 'purple',
    cursorColor: 'navy',
    minPxPerSec: 100,
    scrollParent: true,
    plugins: [
        WaveSurfer.spectrogram.create({
            wavesurfer: element.wavesurfer,
            container: element[0],
            labels: true
        })
    ]
});

element.wavesurfer.load(url);

element.wavesurfer.on('ready', function () {
    element.wavesurfer.play();
});
""", url=url)


#Playback can be started and stopped from Python via the proxy:
#widget.element.wavesurfer.play()
#widget.element.wavesurfer.pause()

#It would probably make sense to wire up these commands to ipywidgets buttons...

The code is also saved as a gist here and can be run on MyBinder (the dependencies should be automatically installed):

Here’s what it looks like (It may take a moment or two to load when you run the code cell…)

It doesn’t seem to work in JupyterLab though… [UPDATE: following recent patches to jp_proxy_widget, it may well work now…]

It looks like the full ipywidgets machinery is supported, so we can issue start and stop commands from the Python notebook environment that control the widget Javascript.

So now I’m wondering what other Javascript apps are out there that might be interesting in a Jupyter notebook context, and how easy it’d be to get them running…?

It might also be interesting to try to construct an audio file within the notebook and then visualise it using the widget.
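As a crude proof of concept for that, here's a minimal sketch, using just the Python standard library, of how a simple sine wave audio file might be synthesised from code (the filename and tone parameters are arbitrary choices of mine):

```python
import math
import struct
import wave

def make_sine_wav(fn='test_tone.wav', freq=440, duration=2, rate=44100):
    """Write a mono, 16 bit WAV file containing a sine wave tone."""
    n_frames = int(duration * rate)
    # Pack each sample as a little-endian signed 16 bit integer
    frames = b''.join(struct.pack('<h',
                                  int(32767 * math.sin(2 * math.pi * freq * i / rate)))
                      for i in range(n_frames))
    with wave.open(fn, 'wb') as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16 bit samples
        w.setframerate(rate)
        w.writeframes(frames)

make_sine_wav()
```

The resulting WAV file could then be passed to the widget in place of the remote mp3 URL.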

PS ipywidget slider cross-referencing the wavesurfer.js playhead: https://gist.github.com/scottire/654019e88e6225c15a68006ab4a3ba98 h/t @_ScottCondron

Dockerising / Binderising the TM351 Virtual Machine

Just before the Christmas break, I had a go at recasting the TM351 VM as a Docker container built from a Github repository using MyBinder (which is to say: I had a go at binderising the VM…). Long time readers will know that this virtual machine has been used to deliver a computing environment to students on the OU TM351 Data management and Analysis course since 2016. The VM itself is built using Virtualbox, provisioned using vagrant, and then distributed originally via a mailed out USB stick, or alternatively (which is to say, unofficially; though my preferred route) as a download from VagrantCloud.

The original motivation for using Vagrant was a hope that we’d be able to use a range of provisioners to construct VM images for a range of virtualisation platforms, but that’s never happened. We still ship a Virtualbox image that causes problems for a small number of Windows users each year, rather than a native HyperV image, because: a) I refuse to buy a Windows machine so I can build the HyperV image myself; b) no-one else sees benefit from offering multiple images (perhaps because they don’t provide the tech support…).

For all our supposed industrial scale at delivering technology backed “solutions”, the VM is built, maintained and supported on a cottage industry basis from within the course team.

For a scaleable solution that would work:

a) within a module presentation;
b) across module presentations;
c) across modules

I think we should be looking at some sort of multi-user hosted service, with personal accounts and persistent user directories. There are various ways I can imagine delivering this, each of which creates its own issues as well as solving particular problems.

As a quick example, here are two possible extremes:

1) one JupyterHub server to rule them all: every presentation, every module, one server. JupyterHub can be configured to use the DockerSpawner to present different kernel container options to the user, (although I’m not sure if this can be personalised on a per user basis? If not, that feature would make for a useful contribution back…), so a student could be presented with a list of containers for each of their modules.

2) one JupyterHub server per module per presentation: this requires more admin and means servers everywhere, but it separates concerns…

The experimental work on a “persistent Binderhub deployment” also looks interesting, offering the possibility of launching arbitrary environments (as per Binderhub) against personally mounted file area (as per JupyterHub).

Providing a “takeaway” service is also one of my red lines: a student should be free to take away any computing environment we provide them with. One in-testing hosted version of the TM351 VM comes, I believe, with centralised Postgres and MongoDB servers that students have accounts on and must log in to. Providing a multi-user service, rather than a self-contained personal server, raises certain issues regarding support, but also denies the student the ability to take away the database service and use it for their own academic, personal or even work purposes. A fundamentally wrong approach, in my opinion. It’s just not open.

So… binderising the VM…

When Docker containers first made their appearance, best practice seemed to be to have one service per container, and then wire containers together using docker-compose to provide a more elaborate environment. I have experimented in the past with decoupling the TM351 services into separate containers and then launching them using docker-compose, but it’s never really gone anywhere…

In the communities of practice that I come across, more emphasis now seems to be on putting everything into a single container. Binderhub is also limited to launching a single container (I don’t think there is a Jupyter docker-compose provisioner yet?) so that pretty much seals it… All in one…

A proof-of-concept Binderised version of the TM351 VM can be found here: innovationOUtside/tm351vm-binder.

It currently includes:

  • an OU branded Jupyter notebook server running jupyter-server-proxy;
  • the TM351 Python environment;
  • an OpenRefine server proxied using jupyter-server-proxy;
  • a Postgres server seeded (I think? Did I do that yet?!) with the TM351 test db (if I haven’t set it up as per the VM, the code is there that shows how to do it…);
  • a MongoDB server serving the small accidents dataset that appears in the TM351 VM.

What is not included:

  • the sharded Mongo DB activity; (the activity it relates to, as presented at the moment, is largely pointless, IMHO; we could demonstrate the sharding behaviour with small datasets, and if we did want to provide queries over the large dataset, that might make sense as something we host centrally and let students log in to query. Which would also give us another teaching point.)

The Binder configuration is provided in the binder/ directory. An Anaconda binder/environment.yml file is used to install packages that are complicated to build or install otherwise, such as Postgres.

The binder/postBuild file is run as a shell script responsible for:

  • configuring the Postgres server and seeding its test database;
  • installing and seeding the MongoDB database;
  • installing OpenRefine;
  • installing Python packages from binder/requirements.txt (the requirements.txt is not otherwise automatically handled by Binderhub — it is trumped by the environment.yml file);
  • enabling required Jupyter extensions.

If any files handled via postBuild need to be persisted, they can be written into $CONDA_DIR.

(As a reference, I have also created some simple standalone template repos showing how to configure Postgres and MongoDB in Binderhub/repo2docker environments. There’s also a neo4j demo template too.)

The binder/start file is responsible for:

  • defining environment variables and paths required at runtime;
  • starting the PostgreSQL and MongoDB database services.

(OpenRefine is started by the user from the notebook server homepage or JupyterLab. There’s a standalone OpenRefine demo repo too…)

Launching the repo using MyBinder will build the TM351 environment (if a Binder image does not already exist) and start the required services. The repo can also be used to build an environment locally using repo2docker.

As well as building a docker image within the Binderhub context, the repo is also automated with a Github Action that is used to build release commits using repo2docker and then push the resulting container to Docker Hub. The action can be found in the .github/workflows directory. The container can be found as ousefuldemos/tm351-binderised:latest. When running a container derived from this image, the Jupyter notebook server runs on the default port 8888 inside the container, the OpenRefine application is proxied through it, and the database services should autostart. The notebook server is started with a token required, so you need to spot the token from the start up logs of the container – which means you shouldn’t run it with the -d flag. A variant of the following command should work (I’m not sure how you reliably specify the correct $PWD (present working directory) mount directory from a Windows command prompt):

docker run --name tm351test --rm -p 8895:8888 -v $PWD/notebooks:/notebooks -v $PWD/openrefine_projects:/openrefine_projects ousefuldemos/tm351-binderised:latest

Probably easier is to use the Kitematic-inspired containds “personal Binderhub” app, which can capture and handle the token automatically and let you click straight through into the running notebook server. Either use containds to build the image locally by providing the repo URL, or select a new image and search for tm351: the ousefuldemos/tm351-binderised image is the one you want. When prompted, select the “standard” launch route, NOT the ‘Try to start Jupyter notebook’ route.

Although I’ve yet to try it (I ran out of time before the break), I’m hopeful that the prebuilt container should work okay with JupyterHub. If it does, this means the innovationOUtside/tm351vm-binder repo can serve as a template for building images that can be used to deploy authenticated OU computing environments via an OU authenticated and OU hosted JupyterHub server (one can but remain hopeful!).

If you try out the environment, either using MyBinder, via repo2docker, or from the pre-built Docker image, please let me know either here, via the repo issues, or howsoever: a) whether it worked; b) whether it didn’t; c) whether there were any (other) issues. Any and all feedback would be much appreciated…

Simple Rule Based Approach in Python for Generating Explanatory Texts from pandas Dataframes

Many years ago, I used to use rule based systems all the time, first as a postdoc, working with the Soar rule based system to generate “cognitively plausible agents”, then in support of the OU course T396 Artificial Intelligence for Technology.

Over the last couple of years, I’ve kept thinking that a rule based approach might make sense for generating simple textual commentaries from datasets. I had a couple of aborted attempts around this last year using pytracery (eg here and here) but the pytracery approach was a bit too clunky.

One of the tricks I did learn at the time was that things could be simplified by generating data truth tables that encode the presence of particular features in “enrichment” tables that could be used to trigger particular rules.

These tables would essentially encode features that could be usefully processed in simple commentary rules. For example, in rally reporting, something like “X took stage Y, his third stage win in a row, increasing his overall lead by P seconds to QmRs” could be constructed from an appropriately defined feature table row.
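As a hypothetical sketch of the sort of thing I mean (the field names and values in the feature row are invented for illustration), a sentence of that form can be filled directly from such a row:

```python
# A hypothetical "enrichment" feature table row: all field names and values
# are invented for illustration.
feature_row = {'crew': 'X', 'stage': 'Y', 'consecutive_wins': 3,
               'lead_increase_s': 23, 'overall_lead': '4m17s'}

# Simple ordinal word lookup for small counts
ordinals = {1: 'first', 2: 'second', 3: 'third', 4: 'fourth'}

commentary = (f"{feature_row['crew']} took stage {feature_row['stage']}, "
              f"his {ordinals[feature_row['consecutive_wins']]} stage win in a row, "
              f"increasing his overall lead by {feature_row['lead_increase_s']} seconds "
              f"to {feature_row['overall_lead']}.")
```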

I’m also reminded that I started to explore using symbolic encodings to try to encode simple feature runs as strings and then use regular expressions to identify richer features within them (for example, Detecting Features in Data Using Symbolic Coding and Regular Expression Pattern Matching).
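For example, here's a minimal sketch of that symbolic coding idea (the coding scheme itself is made up):

```python
import re

# Made-up symbolic coding: w = stage win, p = podium place, o = other result
results = ['win', 'win', 'win', 'podium', 'other']
coded = ''.join({'win': 'w', 'podium': 'p'}.get(r, 'o') for r in results)

# A run of three or more win symbols reads as a "hat-trick" style feature
hat_trick = re.search(r'w{3,}', coded) is not None
```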

Anyway, a brief exchange today about possible PhD projects for faculty-funded PhD studentships, starting in October (the project will be added here at some point…) got me thinking again about this… So as the Dakar rally is currently running, and as I’ve been scraping the results, I wondered how easy it would be to pull an off-the-shelf python rules engine, erm, off the shelf, and create a few quick rally commentary rules…

And the answer is, surprisingly easy…

Here’s a five minute example of some sentences generated from a couple of simple rules using the durable_rules rules engine.

The original data looks like this:

and the generated sentences look like this: JI. CORNEJO FLORIMO MONSTER ENERGY HONDA TEAM 2020 were in fifth position, 11 minutes and 19 seconds behind the first placed HONDA.

Let’s see how the various pieces fit together…

For a start, here’s what the rules look like:

from durable.lang import *
import inflect

#Set up an inflect engine for generating natural language ordinals etc.
p = inflect.engine()

txts = []

with ruleset('test1'):
    #Display something about the crew in first place
    @when_all(m.Pos == 1)
    def whos_in_first(c):
        """Generate a sentence to report on the first placed vehicle."""
        #We can add additional state, accessible from other rules
        #In this case, record the Crew and Brand for the first placed crew
        c.s.first_crew = c.m.Crew
        c.s.first_brand = c.m.Brand
        #Python f-strings make it easy to generate text sentences that include data elements
        txts.append(f'{c.m.Crew} were in first in their {c.m.Brand} with a time of {c.m.Time_raw}.')
    #This just checks whether we get multiple rule fires...
    @when_all(m.Pos == 1)
    def whos_in_first2(c):
        txts.append('we got another first...')
    #We can be a bit more creative in the other results
    @when_all(m.Pos > 1)
    def whos_where(c):
        """Generate a sentence to describe the position of each other placed vehicle."""
        #Use the inflect package to natural language textify position numbers...
        nth = p.number_to_words(p.ordinal(c.m.Pos))
        #Use various probabilistic text generators to make a comment for each other result
        first_opts = [c.s.first_crew, 'the stage winner']
        if c.m.Brand==c.s.first_brand:
            first_opts.append(f'the first placed {c.m.Brand}')
        t = pickone_equally([f'with a time of {c.m.Time_raw}',
                             f'{sometimes(f"{display_time(c.m.GapInS)} behind {pickone_equally(first_opts)}")}'],
                           prefix=', ')
        #And add even more variation possibilities into the returned generated sentence
        txts.append(f'{c.m.Crew} were in {nth}{sometimes(" position")}{sometimes(f" representing {c.m.Brand}")}{t}.')

Each rule in the ruleset is decorated with a conditional test applied to the elements of a dict passed in to the ruleset. Rules can also set additional state, which can be tested by, and accessed from within, other rules.

Rather than printing out statements in each rule, which was the approach taken in the original durable_rules demos, I instead opted to append generated text elements to an ordered list (txts), that I could then join and render as a single text string at the end.

(We could also return a tuple from a rule, eg (POS, TXT) that would allow us to re-order statements when generating the final text rendering.)
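A minimal sketch of that re-ordering idea, with made-up sentence fragments:

```python
# Collect (position, text) tuples rather than bare strings...
txt_tuples = [(3, 'Third place commentary.'),
              (1, 'First place commentary.'),
              (2, 'Second place commentary.')]

# ...then sort on position before rendering the final report text
report = ' '.join(t for _, t in sorted(txt_tuples))
```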

The data itself was grabbed from my Dakar scrape database into a pandas dataframe using a simple SQL query:

q = f"SELECT * FROM ranking WHERE VehicleType='{VTYPE}' AND Type='general' AND Stage={STAGE} AND Pos<={MAXPOS}"
#VTYPE, STAGE and MAXPOS (the lowest ranked position to report) are set elsewhere
tmpq = pd.read_sql(q, conn)

A display_time() helper converts a gap in seconds into a natural language time string:

intervals = (('weeks', 604800),
             ('days', 86400),
             ('hours', 3600),
             ('minutes', 60),
             ('seconds', 1))

def display_time(t, granularity=2, andword='and', units='seconds', intify=True):
    """Take a time in seconds and return a natural language style time string."""

    def nl_join(l):
        """Join a list of strings as a natural language list."""
        if len(l)>2:
            return ', '.join(l[:-1]) + f' {andword} {str(l[-1])}'
        elif len(l)==2:
            return f' {andword} '.join(l)
        return l[0]

    result = []

    if intify:
        t = int(t)

    #Need better handling for arbitrary time strings
    #Perhaps parse into a timedelta object
    # and then generate NL string from that?
    if units=='seconds':
        for name, count in intervals:
            value = t // count
            if value:
                t -= value * count
                if value == 1:
                    name = name.rstrip('s')
                result.append("{} {}".format(value, name))

        return nl_join(result[:granularity])

To add variety to the rule generated text, I played around with some simple randomisation features when generating commentary sentences. I suspect there’s a way of doing things properly “occasionally” via the rules engine, but that could require some clearer thinking (and reading the docs…), so it was easier to create some simple randomising functions that I could call on within a rule to create statements “occasionally” as part of the rule code.

So for example, the following functions help with that, returning strings probabilistically.

import random

def sometimes(t, p=0.5):
    """Return the provided string with probability p, else return an empty string."""
    if random.random() <= p:
        return t
    return ''

def occasionally(t):
    """Sometimes return a string passed to the function."""
    return sometimes(t, p=0.2)

def rarely(t):
    """Rarely return a string passed to the function."""
    return sometimes(t, p=0.05)

def pickone_equally(l, prefix='', suffix=''):
    """Return an item from a list,
       selected at random with equal probability."""
    t = random.choice(l)
    if t:
        return f'{prefix}{t}{suffix}'
    return suffix

def pickfirst_prob(l, p=0.5):
    """Select the first item in a list with the specified probability,
       else select an item, with equal probability, from the rest of the list."""
    if len(l)>1 and random.random() >= p:
        return random.choice(l[1:])
    return l[0]

The rules handler doesn’t seem to like the numpy typed numerical objects that the pandas dataframe provides [UPDATE: it turns out this is a python json library issue: it doesn’t like np.int64s…], but if we cast the dataframe values to JSON and then back to a Python dict, everything seems to work fine.

import json
#This handles numpy types that ruleset json serialiser doesn't like
tmp = json.loads(tmpq.iloc[0].to_json())

One nice thing about the rules engine is that you can apply statements that are processed by the rules in a couple of ways: as events and as facts.

If we post a statement as an event, then only a single rule can be fired from it. For example:

post('test1', tmp)
generates a sentence along the lines of R. BRABEC MONSTER ENERGY HONDA TEAM 2020 were in first in their HONDA with a time of 10:39:04.

We can create a function that can be applied to each row of a pandas dataframe that will run the contents of the row, expressed as a dict, through the ruleset:

def rulesbyrow(row, ruleset):
    row = json.loads(json.dumps(row.to_dict()))
    post(ruleset, row)

Capture the text results generated from the ruleset into a list, and then display the results.

tmpq.apply(rulesbyrow, ruleset='test1', axis=1)
print('\n\n'.join(txts))

The sentences generated each time (apart from the sentence generated for the first position crew) contain randomly introduced elements even though the rules are applied deterministically.

R. BRABEC MONSTER ENERGY HONDA TEAM 2020 were in first in their HONDA with a time of 10:39:04.

K. BENAVIDES MONSTER ENERGY HONDA TEAM 2020 were in second representing HONDA.


J. BARREDA BORT MONSTER ENERGY HONDA TEAM 2020 were in fourth, with a time of 10:50:06.

JI. CORNEJO FLORIMO MONSTER ENERGY HONDA TEAM 2020 were in fifth, 11 minutes and 19 seconds behind the stage winner.

We can evaluate a whole set of events passed as list of events using the post_batch(RULESET,EVENTS) function. It’s easy enough to convert a pandas dataframe into a list of palatable dicts…

def df_json(df):
    """Convert rows in a pandas dataframe to a JSON string.
       Cast the JSON string back to a list of dicts
       that are palatable to the rules engine."""
    return json.loads(df.to_json(orient='records'))

Unfortunately, the post_batch() route doesn’t look like it necessarily commits the rows to the ruleset in the provided row order? (Has the dict lost its ordering?)


post_batch('test1', df_json(tmpq))

R. BRABEC MONSTER ENERGY HONDA TEAM 2020 were in first in their HONDA with a time of 10:39:04.

X. DE SOULTRAIT MONSTER ENERGY YAMAHA RALLY TEAM were in tenth position, with a time of 10:58:59.

S. SUNDERLAND RED BULL KTM FACTORY TEAM were in ninth, with a time of 10:56:14.

P. QUINTANILLA ROCKSTAR ENERGY HUSQVARNA FACTORY RACING were in eighth position representing HUSQVARNA, 15 minutes and 40 seconds behind R. BRABEC MONSTER ENERGY HONDA TEAM 2020.

We can also assert the rows as facts rather than running them through the ruleset as events. Asserting a fact adds it as a persistent fact to the rule engine, which means that it can be used to trigger multiple rules, as the following example demonstrates (check the ruleset definition to see the two rules that match on the first position condition).

Once again, we can create a simple function that can be applied to each row in the pandas dataframe / table:

def factsbyrow(row, ruleset):
    row = json.loads(json.dumps(row.to_dict()))
    assert_fact(ruleset, row)

In this case, when we assert the fact, rather than post a once-and-once-only resolved event, the fact is retained even if it matches a rule, so it gets a chance to match other rules too…

tmpq.apply(factsbyrow, ruleset='test1', axis=1);

R. BRABEC MONSTER ENERGY HONDA TEAM 2020 were in first in their HONDA with a time of 10:39:04.

we got another first…

K. BENAVIDES MONSTER ENERGY HONDA TEAM 2020 were in second, with a time of 10:43:47.

M. WALKNER RED BULL KTM FACTORY TEAM were in third representing KTM.

J. BARREDA BORT MONSTER ENERGY HONDA TEAM 2020 were in fourth representing HONDA, with a time of 10:50:06.

JI. CORNEJO FLORIMO MONSTER ENERGY HONDA TEAM 2020 were in fifth position, 11 minutes and 19 seconds behind the first placed HONDA.

The rules engine is much richer in what it can handle than I’ve shown above (the reference docs provide more examples, including how you can invoke state machine and flowchart behaviours, for example in a business rules / business logic application) but even used in my simplistic way, it still offers quite a lot of promise for generating simple commentaries, particularly if I also make use of enrichment tables and symbolic strings (the rules engine supports pattern matching operations in the conditions).

In passing, I also note a couple of minor niggles. Firstly, you can’t seem to clear the ruleset, which means in a Jupyter notebook environment you get an error if you try to update a ruleset and run that code cell again. Secondly, if you reassert the same facts into a ruleset context, an error is raised that also borks running the ruleset again. (That latter one might make sense depending on the implementation, although the error is handled badly? I can’t think through the consequences… The behaviour I’d expect from reasserting a fact is for that fact to be retracted and then reasserted… UPDATE: retract_fact() lets you retract a fact.)

FWIW, the code is saved as a gist here, although with the db it’s not much use directly…

Installing Applications via postBuild in MyBinder and repo2docker

A note on downloading and installing things into a Binderised repo, or a container built using repo2docker.

If you save files into $HOME as part of the container build process, and then try to use the image outside of MyBinder, you may find that your saved files are clobbered when storage volumes or local directories are mounted onto $HOME.

The MyBinder / repo2docker build is pretty limiting in terms of the permissions the default jovyan user has over the file system. $HOME is one place you can write to, but if you need somewhere outside that path, then $CONDA_DIR (which defaults to /srv/conda) is handy…

For example, I just tweaked my neo4j binder repo to install a downloaded neo4j server into that path.

Fragment – Metrics for Jupyter Notebook Based Educational Materials

Complementing the Jupyter notebook visualisations described in the previous post, I’ve also started dabbling with notebook metrics. These appear to be going down spectacularly badly with colleagues, but I’m going to carry on poking a stick at them nevertheless. (When I get a chance, I’ll also start applying them across various courses to content in OU-XML documents that drives our online and print course materials… I’d be interested to know if folk in IET already do this sort of thing, since they do love talking about things like reading rates and learning designs, and automation provides such an easy way of generating huge amounts of stats and data entrails to pore over…)

The original motivation was to try to come up with some simple metrics that could be applied over a set of course notebooks. This might include things like readability metrics (are introductory notebooks easier to read, in terms of common readability scores, than teaching notebooks, for example?) and code complexity measures (do these give any insight into how hard a code cell might be to read and make sense of, for example?). The measures might also help us get a feel for which notebooks might be overloaded in terms of estimated reading time, and potentially in need of some attention on that front in our next round of continual updates.

I also wanted to start building some tools to help me explore how the course notebooks we have developed to date are structured, and whether we might be able to see any particular patterns or practice developing in our use of them that a simple static analysis might reveal.

I might also have been influenced in starting down this route by a couple of papers I mentioned in a recent edition of the Tracking Jupyter newsletter (#TJ24 — Notebook Practice) that had reviewed the “quality” (and code quality) of notebooks linked from publications and on Github.

Estimating workload as a time measure is notoriously tricky and contentious, for all manner of reasons:

  • what do we mean by workload? “Reading time” is easy enough to measure, but how does this differ from “engagement time” if we want students to “engage” or interact with our materials and not just skim over them?
  • different learners study at different rates; learners may also be pragmatic and efficient, using the demands of continuous assessment material to focus their attention on certain areas of the course material;
  • reading time estimates are based on assumed word-per-minute (wpm) rates (in the OU, our rules of thumb are 35 wpm (~2000 words per hour) for challenging texts, 70 wpm (~4k wph) for medium texts, and 120 wpm (~7k wph) for easy texts) and assume that students read every word and don’t skim. It’s likely that many students do skim read, though, flipping through pages of print material to spot headings, images (photos, diagrams, etc.) that grab attention, and exercises or self-assessment questions, so an estimate of “skim time” might also be useful. Skimming is harder to do in online environments, particularly where the user interface requires a button click at the bottom of the page to move to the next page (if the button is not in the same place on the screen for each consecutive page, and there is no keyboard shortcut, you have to focus on moving the mouse to generate the next-page button click…). So, for online rather than print users, should we give them a single page view they can skim over? (OU VLE materials do have this feature at a unit (week of study) level, via the “print as single page” option);
  • activities and exercises often don’t have a simple mapping from word count to exercise completion time; a briefly stated activity may require 15 mins of student activity, or even an hour. Activity text may state “you should expect to spend about X mins on this activity”, and structured activity texts may present expected activity time in a conventional way (identifiable metadata, essentially); when estimating the time of such activities, if we can identify the expected time, we might use this as a guide, possibly on top of the time estimated to actually read the activity text…
  • some text may be harder to read than other text, which we can model via reading time; but how do we know how hard a text is to read? Or do we just go with the most conservative reading rate estimate? Several readability metrics exist, so these could be used to analyse different blocks of text and estimate reading rates relative to the calculated readability of each block in turn;
  • for technical materials, how do we calculate reading rates associated with reading computer code, or mathematical or chemical equations? In the arts, how long does it take to look at a particular artwork? In languages, how long to read a foreign text?
  • when working with code or equations, do we want the student to read the equation or code as text or engage with it more deeply, for example by executing the code, looking at the output, perhaps making a modification to the code and then executing it again to see how the output differs? For a mathematical equation, do we want students to run some numbers through the equation, or manipulate the equation?
  • code and equations are line based, so should we use line based, rather than word based, calculations to estimate reading — or engagement — time? For example, X seconds per line, with an additional amount per cell chunk / block for environments like Jupyter notebooks, where a code chunk in a single cell often produces a single output that we might expect the student to inspect?
  • as with using readability measures to tune reading rate parameters, we might be able to use code complexity measures to generate different code appreciation rates based on code complexity;
  • again, in Jupyter notebooks, we might distinguish between code in a markdown cell, which is intended to be read but not executed, and code in a code cell, which we do expect to be executed. The code itself may also have an execution time associated with it: for example, a single line of code to train a neural network model or run a complex statistical analysis, or even a simple analysis or query over a large dataset, may take several seconds, if not minutes, to run.
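The OU rule-of-thumb reading rates mentioned above can be wrapped as a trivial estimator; a minimal sketch (the function name and difficulty banding are my own):

```python
# OU rule-of-thumb reading rates (words per minute), as quoted above
WPM = {'challenging': 35, 'medium': 70, 'easy': 120}

def reading_time_mins(word_count, difficulty='medium'):
    """Estimate the reading time in minutes for a block of text."""
    return word_count / WPM[difficulty]
```

Readability metrics could then be used to pick the difficulty band for each block of text in turn.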

And yes, I know, there is probably a wealth of literature out there about this, and some of it has probably even been produced by the OU. If you can point me to things you think I should read, and/or that put me right about things that are obviously nonsense that I’ve claimed above, please post some links or references in the comments…:-)

At this point, we might say it’s pointless trying to capture any sort of metric based on a static analysis of course materials, compared to actually monitoring student study times. Instead, we might rely on our own rules of thumb as educators: if it takes me, as an “expert learner”, X minutes to work through the material, then it will take students 3X minutes (or perhaps, 4X minutes if I work through my own material, which I am familiar with, or 3X when I work through yours, which I am less familiar with); alternatively, based on experience, I may know that it typically takes me three weeks of production time to generate one week of study material, and use that as a basis for estimating the likely study time of a text based on how long I have spent trying to produce it. Different rules of thumb for estimating different things: how long does it take me to produce X hours of study material, how long does it take students to study Y amount of material.

Capturing actual study time is possible; for our Jupyter notebooks, we could instrument them with web analytics to capture webstats about how students engage with notebooks as if they were web pages, and we could also capture Jupyter telemetry for analysis. For online materials, we can capture web stats detailing how long students appear to spend on each page of study material before clicking through to the next, and so on.

So what have I been looking at? As well as the crude notebook visualisations, my reports are in the early stages, taking the following form at the current time:

In directory `Part 02 Notebooks` there were 6 notebooks.

– total markdown wordcount 5573 words across 160 markdown cells
– total code line count of 390 lines of code across 119 code cells
– 228 code lines, 137 comment lines and 25 blank lines

Estimated total reading time of 288 minutes.

The estimate is open to debate and not really something I’ve spent much time thinking about yet (I was more interested in getting the notebook parsing and report generating machinery working): it’s currently a function of a wpm (words per minute) reading rate applied to text and a “lines of code per minute” rate applied to code. But it’s not intended to be accurate, per se, and it’s definitely not intended to be precise; it’s just intended to provide a relative estimate of how long one notebook full of text may take to study compared to one that contains text and code. The idea is to calculate the numbers for all the notebooks across all the weeks of a course; then, if we do manage to get a good idea of how long it takes a student to study one particular notebook, or one particular week, we can try to use structural similarities across other notebooks to get hopefully more accurate estimates out.
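For what it’s worth, that calculation is essentially just the following (the rates below are illustrative placeholders for the sort of parameters involved, not the values actually used):

```python
#Crude study time estimate: a words-per-minute rate for markdown text,
#a lines-of-code-per-minute rate for code.
#The rates here are illustrative guesses, not calibrated values.
WPM = 150          #assumed reading rate for study material
CODE_LPM = 10      #assumed "lines of code appreciated per minute" rate

def estimate_minutes(md_wordcount, code_linecount,
                     wpm=WPM, code_lpm=CODE_LPM):
    """Estimate notebook study time in minutes from gross counts."""
    return md_wordcount / wpm + code_linecount / code_lpm

#e.g. applied to the counts in the report above
print(round(estimate_minutes(5573, 390)))  # → 76
```

Changing the rate parameters, or the shape of the function itself, is then a one-line edit.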

The estimate is also derived in code, and it’s easy enough to change the parameters (such as reading rates, lines of code engagement rates, etc) in the current algorithm, or the algorithm itself, to generate alternative estimates. (In fact, it might be interesting to generate several alternative forms and then compare them to see how they feel, and whether the ranked estimates and normalised estimates across different notebooks stay roughly the same, or whether they give different relative estimates.)

The report itself is generated from a template fed values from a pandas dataframe cast to a dict (that is, a Python dictionary). The templates take the form:

The bracketed items refer to columns in a feedstock dataframe and templated text blocks are generated a block at a time from individual rows of the dataframe passed to the template as a feedstock dict using a construction of the form:
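To give the flavour without the screenshots, here’s a minimal stand-in (the template text and column names are invented for illustration; with pandas, the feedstock dicts would come from something like `df.to_dict(orient='records')`):

```python
#Generate a report block per row by filling a templated string.
#The template and field names here are invented for illustration.
TEMPLATE = ("In directory `{path}` there were {n_nb} notebooks.\n"
            "- total markdown wordcount {n_words} words across {n_md_cells} markdown cells\n"
            "- total code line count of {n_loc} lines of code across {n_code_cells} code cells")

#Stand-in for df.to_dict(orient='records') on the feedstock dataframe
rows = [{'path': 'Part 02 Notebooks', 'n_nb': 6, 'n_words': 5573,
         'n_md_cells': 160, 'n_loc': 390, 'n_code_cells': 119}]

report = '\n\n'.join(TEMPLATE.format(**row) for row in rows)
print(report)
```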


Robot journalism ftw… meh…

The actual metrics collected are more comprehensive, including:

  • readability measures for markdown text (flesch_kincaid_grade_level, flesch_reading_ease, smog_index, gunning_fog_index, coleman_liau_index, automated_readability_index, lix, gulpease_index, wiener_sachtextformel), as well as simple structural measures (word count, sentence count, average words per sentence (mean, median and sd), number of paragraphs, etc);
  • simple code analysis (lines of code, comment lines, blank lines) and some experimental code complexity measures.
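As a rough sketch of what’s involved, stdlib-only stand-ins for one of the readability measures and the raw code counts might look something like this (the ARI formula is the published one, but the tokenisation is deliberately naive; a proper readability package does this better):

```python
import re

def automated_readability_index(text):
    """Naive ARI: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    words = re.findall(r"[\w'-]+", text)
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

def classify_code_lines(code):
    """Count code / comment / blank lines in a code cell's source."""
    counts = {'code': 0, 'comment': 0, 'blank': 0}
    for line in code.splitlines():
        stripped = line.strip()
        if not stripped:
            counts['blank'] += 1
        elif stripped.startswith('#'):
            counts['comment'] += 1
        else:
            counts['code'] += 1
    return counts

print(classify_code_lines("x = 1\n#comment\n"))  # → {'code': 1, 'comment': 1, 'blank': 0}
```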

I’ve also started experimenting with tagging markdown with automatically extracted acronyms and “subject terms”, and exploring things like identifying the Python packages imported into each notebook. Previous experiments include grabbing text headings out of notebooks, which may be useful when generating summary reports over sets of notebooks for review purposes.

Something I haven’t yet done is explore ways in which metrics evolve over time, for example as materials are polished and revised during a production or editorial process.

Reaction internally to my shared early doodlings has so far been pretty much universally negative, although varied: folk may be happy with their own simple metrics (reading rates applied to word counts), or totally in denial about the utility of any form of static analysis, depending on the intended study / material use model. As with many analytics, there are concerns that measures are okay if authors can use them as a tool to support their own work, but may not be appropriate for other people to make judgements from or about them. (This is worth bearing in mind when we talk about using metrics to monitor students, or computational tools to automatically grade them, but then shy away from applying similar techniques to our own outputs…)

You can find the code as it currently exists, created as a stream of consciousness notebook, in this gist. Comments and heckles welcome. As with any dataset, the data I’m producing is generated: a) because I can generate it; b) as just another opinion…

PS once I’ve gone through the notebook a few more times, building up different reports, generating different reading-time and engagement measures, coming up with a commandline interface to make it easier for folk to run against their own notebooks, etc, I think I’ll try to do the same for OU-XML materials… I already have OU-XML to markdown converters, so running the notebook profiler over that material is easy enough, particularly if I use Jupytext to transform the md to notebooks. See also the PS to the notebook visualisation post for related thoughts on this.

PPS The demo notebooks in this repository look like they could be interesting for eg code analysis. And this interactive DAG visualisation tool might also be interesting when it comes to viewing generated graphs.

PPPS This could be an interesting approach for building up a set of tools for checking student code: writing your own static code analysis code checks.

Fragment: Visualising Jupyter Notebook Structure

Over the weekend, I spent some time dabbling with generating various metrics over Jupyter notebooks (more about that in a later post…). One of the things I started looking at were tools for visualising notebook structure.

In the first instance, I wanted a simple tool to show the relative size of notebooks, as well as the size and placement of markdown and code cells within them.

The following is an example of a view over a simple notebook; the blue denotes a markdown cell, the pink a code cell, and the grey separates the cells. (The colour of the separator is controllable, as well as its size, which can be 0.)

When visualising multiple notebooks, we can also display the path to the notebook:

The code can be found in this gist.

The size of the cells in the diagram is determined as follows:

  • for markdown cells, the number of “screen lines” taken up by the markdown when presented on a screen with a specified screen character width;
        import textwrap

        LINE_WIDTH = 160

        def _count_screen_lines(txt, width=LINE_WIDTH):
            """Count the number of screen lines that an overflowing text line takes up."""
            ll = txt.split('\n')
            _ll = []
            for l in ll:
                #Model screen flow: split a line if it is more than `width` characters long
                #Keep blank lines: textwrap.wrap() returns an empty list for an empty string
                _ll = _ll + (textwrap.wrap(l, width) or [''])
            n_screen_lines = len(_ll)
            return n_screen_lines

  • for code cells, the number of lines of code; (long lines are counted over multiple lines as per markdown lines)

In parsing a notebook, we consider each cell in turn, capturing its cell type and screen line length, returning a cell_map as a list of (cell_size, cell_type) tuples:

    import os
    import nbformat

    VIS_COLOUR_MAP  = {'markdown':'cornflowerblue','code':'pink'}

    def _nb_vis_parse_nb(fn):
        """Parse a notebook and generate the nb_vis cell map for it."""

        cell_map = []

        _fn, fn_ext = os.path.splitext(fn)
        if not fn_ext=='.ipynb' or not os.path.isfile(fn):
            return cell_map

        with open(fn, 'r') as f:
            nb = nbformat.reads(f.read(), as_version=4)

        for cell in nb.cells:
            cell_map.append((_count_screen_lines(cell['source']), VIS_COLOUR_MAP[cell['cell_type']]))

        return cell_map

The following function handles single files or directory paths and generates a cell map for each notebook as required:

    def _dir_walker(path, exclude = 'default'):
        """Profile all the notebooks in a specific directory and in any child directories."""

        if exclude == 'default':
            exclude_paths = ['.ipynb_checkpoints', '.git', '.ipynb', '__MACOSX']
        else:
            #If we set exclude, we need to pass it as a list
            exclude_paths = exclude

        nb_multidir_cell_map = {}
        for _path, dirs, files in os.walk(path):
            #Start walking...
            #If we're in a directory that is not excluded...
            if not set(exclude_paths).intersection(set(_path.split('/'))):
                #Profile that directory...
                for _f in files:
                    fn = os.path.join(_path, _f)
                    cell_map = _nb_vis_parse_nb(fn)
                    if cell_map:
                        nb_multidir_cell_map[fn] = cell_map

        return nb_multidir_cell_map

The following function is used to grab the notebook file(s) and generate the visualisation:

    def nb_vis_parse_nb(path, img_file='', linewidth = 5, w=20, **kwargs):
        """Parse one or more notebooks on a path."""
        if os.path.isdir(path):
            cell_map = _dir_walker(path)
        else:
            cell_map = _nb_vis_parse_nb(path)
        nb_vis(cell_map, img_file, linewidth, w, **kwargs)

So how is the visualisation generated?

A plotter function generates the plot from a cell_map:

    import matplotlib.pyplot as plt

    #Note: gap, gap_colour and linewidth are assumed to be available
    #from the enclosing scope when plotter() is called
    def plotter(cell_map, x, y, label='', header_gap = 0.2):
        """Plot visualisation of gross cell structure for a single notebook."""

        #Plot notebook path
        plt.text(y, x, label)
        x = x + header_gap

        for _cell_map in cell_map:

            #Add a coloured bar between cells
            if y > 0:
                if gap_colour:
                    plt.plot([y, y+gap], [x, x], gap_colour, linewidth=linewidth)

                y = y + gap

            _y = y + _cell_map[0] + 1 #Make tiny cells slightly bigger
            plt.plot([y, _y], [x, x], _cell_map[1], linewidth=linewidth)

            y = _y

The gap can be automatically calculated relative to the longest notebook we’re trying to visualise which sets the visualisation limits:

    import math

    def get_gap(cell_map):
        """Automatically set the gap value based on overall length"""
        def get_overall_length(cell_map):
            """Get overall line length of a notebook."""
            overall_len = 0
            for i, (l, t) in enumerate(cell_map):
                #i is number of cells if that's useful too?
                overall_len = overall_len + l
            return overall_len

        max_overall_len = 0
        #If we are generating a plot for multiple notebooks, get the largest overall length
        if isinstance(cell_map, dict):
            for k in cell_map:
                _overall_len = get_overall_length(cell_map[k])
                max_overall_len = _overall_len if _overall_len > max_overall_len else max_overall_len
        else:
            max_overall_len = get_overall_length(cell_map)

        #Set the gap at 1% of the overall length
        return math.ceil(max_overall_len * 0.01)

The nb_vis() function takes the cell_map, either as a single cell map for a single notebook, or as a dict of cell maps for multiple notebooks, keyed by the notebook path:

    def nb_vis(cell_map, img_file='', linewidth = 5, w=20, gap=None, gap_boost=1, gap_colour='lightgrey'):
        """Visualise notebook gross cell structure."""

        x = y = 0

        #If we have a single cell_map for a single notebook
        if isinstance(cell_map, list):
            gap = gap if gap is not None else get_gap(cell_map) * gap_boost
            fig, ax = plt.subplots(figsize=(w, 1))
            plotter(cell_map, x, y)
        #If we are plotting cell_maps for multiple notebooks
        elif isinstance(cell_map, dict):
            gap = gap if gap is not None else get_gap(cell_map) * gap_boost
            fig, ax = plt.subplots(figsize=(w, len(cell_map)))
            for k in cell_map:
                plotter(cell_map[k], x, y, k)
                x = x + 1

        if img_file:
            plt.savefig(img_file)
The function will render the plot in a Jupyter notebook, or can be called to save the visualisation to a file.

This was just done as a quick proof of concept, so comments welcome.

On the to do list is to create a simple CLI (command line interface) for it, as well as explore additional customisation support (eg allow the cell colours to be specified). I also need to account for other cell types. An optional legend explaining the colour map would also make sense.

On the longer to do list is a visualiser that supports within cell visualisation. For example, headers, paragraphs and code blocks in markdown cells; comment lines, empty lines, code lines, magic lines / blocks, shell command lines in code cells.
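A first pass at the code cell side of that might just classify each line by its leading sigil (a naive sketch; it doesn’t handle multi-line strings or the bodies of cell magics):

```python
def classify_line(line):
    """Crudely classify a single line from a notebook code cell."""
    stripped = line.strip()
    if not stripped:
        return 'blank'
    if stripped.startswith('%'):    #covers both % line magics and %% cell magics
        return 'magic'
    if stripped.startswith('!'):
        return 'shell'
    if stripped.startswith('#'):
        return 'comment'
    return 'code'

cell_src = "%matplotlib inline\n!pip install nbformat\n#setup\nimport nbformat\n"
print([classify_line(l) for l in cell_src.splitlines()])
# → ['magic', 'shell', 'comment', 'code']
```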

In OU notebooks, being able to identify areas associated with activities would also be useful.

Supporting the level of detail required in the visualisation may be tricky, particularly in long notebooks. A vertical, multi-column format is probably best, showing eg an approximate “screen’s worth” of content in a column, then the next “scroll” down displayed in the next column along.

Something else I can imagine is a simple service that would let you pass a link to an online notebook and get a visualisation back, or a link to a Github repo that would give you a visualisation back of all the notebooks in the repo. This would let you embed a link to the visualisation, for example, in the repo README. On the server side, I guess this means something that could clone a repo, generate the visualisation and return the image. To keep the workload down, the service would presumably keep a hash of the repo and the notebooks within the repo, and if any of those had changed, regenerate the image, else re-use a cached one. (It might also make sense to cache images at a notebook level to save having to reparse all the notebooks in a repo where only a single notebook has changed, and then merge those into a single output image?)
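The cache check itself could be as simple as hashing each notebook’s content and only regenerating when the digest changes; a minimal sketch (with a dict standing in for a real cache store, and `render` standing in for the visualiser):

```python
import hashlib

def nb_digest(nb_bytes):
    """Content hash of a notebook file's bytes."""
    return hashlib.sha256(nb_bytes).hexdigest()

_cache = {}  #digest -> rendered image; stand-in for a persistent cache

def get_or_render(nb_bytes, render):
    """Return a cached rendering, regenerating only if the content changed."""
    key = nb_digest(nb_bytes)
    if key not in _cache:
        _cache[key] = render(nb_bytes)
    return _cache[key]
```

Hashing per notebook rather than per repo gives the notebook-level caching mentioned above for free.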

PS this has also got me thinking about simple visualisers over XML materials too… I do have an OU-XML to ipynb route (as well as OU-XML2md2html, for example), but a lot of the meaningful structure from the OU-XML would get lost on a trivial treatment (eg activity specifications, multimedia use, etc). I wonder if it’d make more sense to create an XSLT to generate a summary XML document and then visualise from that? Or create Jupytext md with lots of tags (eg tagging markdown cells as activities etc) that could be easily parsed out in a report? Hmmm… now that may make a lot more sense…

Fragment: Code Complexity in Notebooks — I’m Obviously Not Wily Enough

Following on from Thinking About Things That Might Be Autogradeable or Useful for Automated Marking Support, via Chris Holdgraf I get something else that might be worth considering both for profiling notebooks as well as assessing code.

The response came following an idle tweet I’d posted wondering “If folk can read 600wpm (so 10wps), what’s a reasonable estimate for reading/understanding code blocks eg in jupyter notebook?”; if you’re trying to make sense of a code chunk in a notebook, I’m minded to assume that the number of lines may have an effect, as well as the line length.

Context for this: I’ve started mulling over a simple tool to profile / audit our course notebooks to try to get a baseline for how long it might reasonably take for a student to work through them. We could instrument the notebooks (eg using the nbgoogleanalytics or jupyter-analytics extensions to inject Google Analytics tracking codes into notebooks) and collect data on how long it actually takes, but we don’t. And whilst our course compute environment is on my watch, we won’t (at least, not using a commercial analytics company, even if their service is “free”, even though it would be really interesting…). If we were to explore logging, it might be interesting to add an open source analytics engine like Matomo (Piwik, as was) to the VM and let students log their own activity… Or maybe explore jupyter/telemetry collection with a local log analyser that students could look at…

So, Chris’ suggestion pointed me towards wily, “an application for tracking, reporting on timing and complexity in Python code”. Out of the can, wily can be used to analyse and report on the code complexity of a git repo over a period of time. It also looks like it can cope with notebooks: “Wily will detect and scan all Python code in .ipynb files automatically”. It also seems like there’s the ability to “disable reporting on individual cells”, so maybe I can get reports on a per notebook or per cell basis?

My requirement is much simpler than the evolution of the code complexity over time, however: I just want to run the code complexity tools over a single set of files, at one point in time, and generate reports on that. (Thinks: letting students plot the complexity of their code over time might be interesting, eg in a mini-project setting?) However, from the briefest of skims of the wily docs, I can’t fathom out how to do that (there is support for analysing across the current filesystem rather than a git repo, but that doesn’t seem to do anything for me… Is it looking to build a cache and search for diffs? I DON’T WANT A DIFF! ;-)

There is an associated blog post that builds up the rationale for wily here — Refactoring Python Applications for Simplicity — so maybe by reading through that and perhaps poking through the wily repo I will be able to find an easy way of using wily, somehow, to profile my notebooks…

But the coffee break I gave myself to look at this and give it a spin has run out, so it’s consigned back to the back of the queue I’ve started for this side-project…

PS From a skim of the associated blog post, wily‘s not the tool I need: radon is, “a Python tool which computes various code metrics, including raw metrics (SLOC (source lines of code), comment lines, blank lines, etc.), Cyclomatic Complexity (i.e. McCabe’s Complexity), Halstead metrics (all of them), the Maintainability Index (a Visual Studio metric)”. So I’ll be bumping that to the head of the queue…
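radon does this properly; but to give a feel for what a McCabe-style measure is counting, here’s a naive stdlib approximation that just tallies branch points in the AST (this is emphatically not radon’s actual algorithm):

```python
import ast

#Node types that introduce an extra execution path (roughly)
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)

def crude_complexity(src):
    """Very rough cyclomatic-complexity-like score: 1 + number of branch points."""
    tree = ast.parse(src)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

print(crude_complexity("x = 1"))  # → 1
print(crude_complexity("for i in range(3):\n    if i:\n        x = i"))  # → 3
```

Run over each code cell in turn, this would give a per-cell complexity profile for a notebook.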

Thinking About Things That Might Be Autogradeable or Useful for Automated Marking Support

One of the ideas we keep floating but never progressing is how we might make use of nbgrader. My feeling is we could start to make use of it now on an optional, individual tutor-marker basis. The current workflow is such that students submit assessments centrally and work is then sent to assigned markers; markers mark the work and then return it centrally, whence it is dispatched back to students.

Whilst there has been a recent procurement exercise looking at replacing the central assignment handling system, I doubt that nbgrader even featured as a side note; although it can be used to release work to students, collect it from them, manage its allocation to markers, etc, I suspect the chance is vanishingly small of the institution tolerating more than one assignment handling system, and I very much doubt that nbgrader would be that system.

Despite that, individual working is still a possibility and it requires the smallest of tweaks. Our data course currently distributes continuous assignments as Jupyter notebooks, and students have been encouraged to return their work as completed notebooks, although they may also return notebooks converted to Word docs, for example. So if we just marked up the notebook with each test cell marked as a manually graded assignment, or manually graded task, markers could individually decide to use the nbgrader tools to support their marking and feedback.

(We could also use the nbgrader system to generated the released-to-student notebooks and make sure we have stripped the answers out of them…Erm…)

When it comes to automated grading, lots of the questions we ask are not ideally suited to autograding, although with a few tweaks we could make them testable.

The nbgrader docs provides some good advice on writing good test cases, including examples of using mocking to help test whether functions were called or not called, as well as grading charts / plots using plotchecker.

As someone who doesn’t write tests, I started to explore for myself examples of things we can test for autograding and auto-feedback. Note the auto-feedback reference there: one of the things that started to interest me was not the extent to which we could use automated tests to generate a mark per se, but how we could use tests to provide more general and informative forms of feedback.

True, a score is a form of feedback, but quite a blunt one, and may suffer from false positives or, more likely, false negatives. So could we instead explore how tests can be used to provide more constructive feedback; cf the use of linters in this respect (for example, Nudging Student Coders into Conforming with the PEP8 Python Style Guide Using Jupyter Notebooks, flake8 and pycodestyle_magic Linters). And rather than using autograders as a be-all and end-all, could we use them as feedback generators and as a support tool for markers, making mark suggestions rather than official scores?

Once you start thinking about an autograder as a marker support tool, rather than a marker in its own right, it reduces the need for the autograder to be right… that can be left to the judgement of the human marker. All that we would require is that it is mostly useful/helpful, or at least, more helpful/useful than it is a hindrance.

Here’s another example of how we might generate useful feedback, this time as part of a grader that is capable of assigning partial credit: generating partial credit.

As an example, I wrote up some notes on the crudest of marking support tools for marking free text answers against a specimen answer. I know very little about NLP (natural language processing) and even less about automated marking of free text answers, but I think I can see some utility even with a crappy similarity matcher from an off-the-shelf NLP package (spacy).
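For comparison, even a stdlib bag-of-words overlap gives a (very) crude similarity score to put alongside spacy’s vector-based similarity; a sketch:

```python
import re

def jaccard_similarity(answer, specimen):
    """Crude token-overlap score between a student answer and a specimen answer."""
    tokens = lambda t: set(re.findall(r"[a-z']+", t.lower()))
    a, s = tokens(answer), tokens(specimen)
    if not a and not s:
        return 0.0
    return len(a & s) / len(a | s)

print(round(jaccard_similarity("the model overfits the data",
                               "the model overfits training data"), 2))
# → 0.8
```

A marker support tool would surface the score as a hint, not a mark, for exactly the reasons discussed above.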

PS in passing, I also noticed this tip for nbgrader autograding in a Docker container using envkernel, a tool that can wrap a docker container so you can launch it as a notebook kernel. (I haven’t managed to get this working yet; I didn’t spot a demo that “just works”, so I figure I need to actually read the docs, which I haven’t made time to do yet… So if you do have a baby steps example that does work, please share it via the comments… Or submit it as a PR to the official docs…)

Accessing MyBinder Kernels Remotely from IPython Magic and from VS Code

One of the issues facing us as a distance learning organisation is how to support the computing needs of distance learning students, on a cross-platform basis, and over a wide range of computer specifications.

The approach we have taken for our TM351 Data Management and Analysis course is to ship students a Virtualbox virtual machine. This mostly works. But in some cases it doesn’t. So in the absence of an institutional online hosted notebook, I started wondering about whether we could freeload on MyBinder as a way of helping students run the course software.

I’ve started working on an image here, though it’s still divergent from the shipped VM (I need to sort out things like database seeding, and maybe fix some of the package versions…), but that leaves open the question of how students would then access the environment.

One solution would be to let students work on MyBinder directly, but this raises the question of how to get the course notebooks into the Binder environment (the notebooks are in a repo, but it’s a private repo) and out again at the end of a session. One solution might be to use a Jupyter github extension, but this would require students setting up a Github repository, installing and configuring the extension, remembering to sync (unless auto-save-and-commit is available, or could be added to the extension) and so on…

An alternative solution would be to find a way of treating MyBinder like an Enterprise Gateway server, launching a kernel via MyBinder from a local notebook server extension. But I don’t know how to do that.

Some fragments I have had laying around for a bit were the first fumblings towards a Python MyBinder client API, based on the Sage Cell client for running a chunk of code on a remote server… So I wondered whether I could do another pass over that code to create some IPython magic that lets you create a MyBinder environment from a repo and then execute code against it from a magicked code cell. Proof of concept code for that is here: innovationOUtside/ipython_binder_magic.

One problem is that the connection seems to time out quite quickly. The code is really hacky and could probably be rebuilt from functions in the Jupyter client package, but making sense of that code is beyond my limited cut-and-paste abilities. But: it does offer a minimal working demo of what such a thing could be like. At a push, a student could install a minimal Jupyter server on their machine, install the magic, and then write notebooks using magic to run the code against a Binder kernel, albeit one that keeps dying. Whilst this would be inconvenient, it’s not a complete catastrophe because the notebook would still be saved to the student’s local machine.

Another alternative struck me today when I saw that Yuvi Panda had posted to the official Jupyter blog a recipe on how to connect to a remote Jupyterhub from Visual Studio Code. The mechanics are quite simple — I posted a demo here about how to connect from VS Code to a remote Jupyter server running on Digital Ocean, and the same approach works for connecting to our VM notebook server, if you tweak the VM notebook server’s access permissions — but it requires you to have a token. Yuvi’s post says how to find that from a remote JupyterHub server, but can we find the token for a MyBinder server?

If you open your browser’s developer tools and watch the network traffic as you launch a MyBinder server, then you can indeed see the URL used to launch the environment, along with the necessary token:

But that’s a bit of a faff if we want students to launch a Binder environment, watch the network traffic, grab the token and then use that to create a connection to the Binder environment from VS Code.

Searching the contents of pages from a running Binder environment, it seems that the token is hidden in the page:

And it’s not that hard to find… it’s in the link from the Jupyter log. The URL needs a tiny bit of editing (cut the /tree path element) but then the URL is good to go as the kernel connection URL in VS Code:
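That tidying step is mechanical enough to script; a sketch using a made-up example URL:

```python
from urllib.parse import urlparse, parse_qs

def kernel_connection_url(launch_url):
    """Strip the /tree path element and extract the token from a Jupyter URL."""
    parts = urlparse(launch_url)
    path = parts.path.replace('/tree', '').rstrip('/')
    token = parse_qs(parts.query).get('token', [''])[0]
    return f"{parts.scheme}://{parts.netloc}{path}/?token={token}"

#Hypothetical MyBinder launch URL for illustration
print(kernel_connection_url(
    "https://hub.gke.mybinder.org/user/example-abc123/tree?token=secret123"))
# → https://hub.gke.mybinder.org/user/example-abc123/?token=secret123
```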

Then you can start working on your notebook in VS Code (open a new notebook from the settings menu), executing the code against the MyBinder environment.

You can also see the notebooks listed in the remote MyBinder environment.

So that’s another way… and now it’s got me thinking… how hard would it be to write a VS Code extension to launch a MyBinder container and then connect to it?

PS by the by, I notice that developer tools in Firefox became increasingly useful with the Firefox 71 release in the form of a websocket inspector.

This lets you inspect traffic sent across a websocket connection. For example, if we force a page reload on a running Jupyter notebook, we can see a websocket connection:

We can then click on that connection and monitor the messages being passed over it…

I thought this might help me debug / improve my Binder magic, but it hasn’t. The notebook looks like it sends an empty ping as a heartbeat (as per the docs), but if I try to send an empty message from the magic it closes the connection? Instead, I send a message to the heartbeat channel…

PS sort of related, binderbot, “A simple CLI to interact with binder, eg run local notebooks on a remote binder”.

vagrant share – sharing a vagrant launched headless VM service on the public interwebz

Lest I forget (which I had…):

vagrant share lets you launch a VM using vagrant and share the environment using ngrok in three ways:

  • via public URLs (expose your http ports to the web, rather than locally);
  • via ssh;
  • via vagrant connect (connect to any exposed VM port from a remote location).

So this could be handy for remote support with students… If we tell them to install the vagrant share plugin, then we can offer remote support…