Fragment: Structural Testing of Jupyter Notebook Cell Outputs With nbval

In an attempt to automate a bit more of our educational notebook testing and release process, I’ve started looking again at nbval [repo]. This package takes a set of previously run notebooks, re-runs them, and compares the new cell outputs with the original cell outputs.

This allows for the automated testing of notebooks whenever our distributed code execution environment is updated, letting us check for code that has stopped working for whatever reason, as well as pick up new warning notices, such as deprecation notices.

It strikes me that it would also be useful to generate a report for each notebook that captures the notebook execution time. Which makes me think, is there also a package that profiles notebook execution time on a per cell basis?

The basis of comparison I’ve been looking at is a string match on each code cell output area and on each code cell stdout (print) area. In several of the notebooks I’m interested in checking in the first instance, we are raising what are essentially false positive errors in certain cases:

  • printed outputs that have a particular form (for example, a printed output at each iteration of a loop) but where the printed content may differ within a line;
  • database queries that return pandas dataframes with a fixed shape but variable content, or Python dictionaries with a particular key structure but variable values;
  • %%timeit queries that return different times each time the cell is run.
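For the dictionary case, the sort of structural comparison required (matching the key structure whilst ignoring the values) can be sketched with a simple helper; this is a hypothetical illustration, not part of nbval:

```python
def key_structure(d):
    """Recursively extract the nested key structure of a dict, ignoring values."""
    if isinstance(d, dict):
        return {k: key_structure(v) for k, v in d.items()}
    return None

# Two results with the same shape but different values compare as equal
a = {"status": "ok", "result": {"count": 3}}
b = {"status": "error", "result": {"count": 7}}
key_structure(a) == key_structure(b)  # True
```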

For the timing errors, nbval does support the use of regular expressions to rewrite cell output before comparing it. For example:

[regex1]
regex: CPU times: .*
replace: CPU times: CPUTIME

[regex2]
regex: Wall time: .*
replace: Wall time: WALLTIME

[regex3]
regex: .* per loop \(mean ± std. dev. of .* runs, .* loops each\)
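Under the hood, such sanitisers amount to little more than a sequence of regular expression substitutions applied to cell output text before the comparison is made; in Python terms, something like:

```python
import re

# (pattern, replacement) pairs mirroring the sanitiser config above
sanitizers = [
    (r"CPU times: .*", "CPU times: CPUTIME"),
    (r"Wall time: .*", "Wall time: WALLTIME"),
]

def sanitize(text):
    """Normalise variable timing output so reference and test outputs can match."""
    for pattern, replacement in sanitizers:
        text = re.sub(pattern, replacement, text)
    return text

sanitize("CPU times: user 3.2 ms")  # 'CPU times: CPUTIME'
```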

In a fork of the nbval repo, I’ve added these as a default sanitisation option, although it strikes me it might also be useful to capture timing reports and then raise an error if the times are significantly different (for example, an order of magnitude difference either way). This would then also start to give us some sort of quality of service test as well.
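An order-of-magnitude check on captured timings might look something like the following (a hypothetical helper, not something nbval currently provides):

```python
def times_comparable(ref_seconds, test_seconds, factor=10):
    """Return True if the test timing is within `factor` of the reference, either way."""
    return ref_seconds / factor <= test_seconds <= ref_seconds * factor

times_comparable(1.0, 5.0)   # True: same order of magnitude
times_comparable(1.0, 20.0)  # False: more than 10x slower
```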

For the dataframes, we can grab the dataframe table output from the text/html cell output data element and parse it back into a dataframe using the pandas pd.read_html() function. We can then compare structural elements of the dataframe, such as its size (number of rows and columns) and the column headings. In my hacky code, this behaviour is triggered using an nbval-test-df cell tag:

    def compare_dataframes(self, item, key="data", data_key="text/html"):
        """Test outputs for dataframe comparison."""
        df_test = False
        test_out = ()
        if "nbval-test-df" in self.tags and key in item and data_key in item[key]:
            df = pd.read_html(item[key][data_key])[0]
            df_test = True
            test_out = (df.shape, df.columns.tolist())
        return df_test, data_key, test_out

The error report separately reports on shape and column name mismatches:

    def format_output_compare_df(self, key, left, right):
        """Format a dataframe output comparison for printing."""
        cc = self.colors

        self.comparison_traceback.append(
            cc.OKBLUE
            + "dataframe mismatch from parsed '%s'" % key
            + cc.FAIL)

        size_match = left[0] == right[0]
        cols_match = left[1] == right[1]
        if size_match:
            self.comparison_traceback.append(
                cc.OKBLUE
                + f"df size match: {size_match} [{left[0]}]" + cc.FAIL)
        else:
            self.comparison_traceback.append("df size mismatch")
            self.fallback_error_report(left[0], right[0])
        if cols_match:
            self.comparison_traceback.append(
                cc.OKBLUE
                + f"df cols match: {cols_match} [{left[1]}]" + cc.FAIL)
        else:
            self.comparison_traceback.append("df cols mismatch")
            self.fallback_error_report(left[1], right[1])

In passing, I also extended the reporting for mismatched output fields to highlight what output was either missing or added:

        missing_output_fields = ref_keys - test_keys
        unexpected_output_fields = test_keys - ref_keys

        if missing_output_fields:
            self.comparison_traceback.append(
                cc.FAIL
                + "Missing output fields from running code: %s"
                % (missing_output_fields)
                + '\n' + '\n'.join([f"\t{k}: {reference_outs[k]}" for k in missing_output_fields])
                + cc.ENDC)
            return False
        elif unexpected_output_fields:
            self.comparison_traceback.append(
                cc.FAIL
                + "Unexpected output fields from running code: %s"
                % (unexpected_output_fields)
                + '\n' + '\n'.join([f"\t{k}: {testing_outs[k]}" for k in unexpected_output_fields])
                + cc.ENDC)
            return False

For printed output, we can grab the stdout cell output element and run a simple test to check that the broad shape of the output is similar, at least in terms of line count.

    def compare_print_lines(self, item, key="stdout"):
        """Test line count similarity in print output."""
        linecount_test = False
        test_out = None
        if "nbval-test-linecount" in self.tags and key in item:
            test_out = (len(item[key].split("\n")))
            linecount_test = True
        return linecount_test, test_out

The report is currently just a simple “mismatch” error message:

            for ref_out, test_out in zip(ref_values, test_values):
                # Compare the individual values
                if ref_out != test_out:
                    if df_test:
                        self.format_output_compare_df(key, ref_out, test_out)
                    if linecount_test:
                        self.comparison_traceback.append(
                            cc.OKBLUE
                            + "linecount mismatch '%s'" % key
                            + cc.FAIL)
                    if not df_test and not linecount_test:
                        self.format_output_compare(key, ref_out, test_out)
                    return False

I also added support for some convenience tags: nb-variable-output and folium-map both suppress the comparison of outputs of cells in a behaviour that currently models the NBVAL_IGNORE_OUTPUT case, but with added semantics. (My thinking is this should make it easy to improve the test coverage of notebooks as I figure out how to sensibly test different things, rather than just “escaping” problematic false positive cells with the nbval-ignore-output tag.)

Fragment: Opportunities for Using vscode.dev (Hosted VS Code in the Browser) in Distance Education

Although browser accessible versions of VS Code have been available for some time, for example in the form of cdr/code-server or as a Jupyter server-proxied application, there is now an “official” hosted in-browser VS Code editing environment in the form of vscode.dev [announcement].

My personal opinion is that this could be useful for open distance education, at least in situations where users have access to a network connection, not least because it provides a reasonable balance between who provides what resource (storage, bandwidth, compute).

To begin with, the editor provides integration with a local file system, allowing you to open and save files to your own desktop. (If the local filesystem support does not work for some reason, you can also upload/download files.) This means that students can keep work on files stored on their own desktop, albeit with the requirement of a network connection to access the online environment.

Even though VS Code is perhaps best known as a code development environment, it can also be used as a text editor for editing and previewing markdown documents. Whilst the full range of VS Code extensions that support rich media markdown authoring are not currently available (in many cases, this will only be a matter of time…), some of the extensions are already available, such as the Markdown Preview Mermaid Support extension.

This extension lets you use fenced code blocks for describing mermaid.js diagrams that can be previewed in the editor:
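For example, the body of a ```mermaid fenced code block using basic mermaid.js flowchart syntax is rendered as a diagram in the markdown preview (the node labels here are purely illustrative):

```mermaid
graph TD;
    A[Edit markdown] --> B[Preview rendered diagram];
    B --> C[Commit to repo];
```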

Being able to generate Javascript rendered diagrams from text in markdown documents is now quite a widely recognised pattern (for example, here’s a recipe for creating IPython magics to pipe diagram code to javascript renderers). That said, it’s perhaps not widely adopted in institutionally provided authoring systems that tend to lag behind the feature offerings of contemporary authoring environments by several years.

Admittedly, the VS Code user interface is rather complicated, and not just for novice users, although it is possible to customise the view somewhat.

Exploring different editor configuration views would probably be worthwhile, in terms of how different layouts may be suited to different sorts of workflow.

As well as editing and previewing rendered markdown documents, we can also edit and execute HTML+Javascript+CSS applications (as you might expect) and Python code (which you might not expect).

In-browser code execution support for Python code is provided by the joyceerhl.vscode-pyodide extension [repo], a WASM powered environment that bundles a scipy stack.

Usefully for education purposes, the pyodide Python environment is available as a code execution environment for Jupyter notebooks, which can also be edited in the environment:

This means we can run Jupyter notebooks with a Python kernel purely within the browser without the need for a local or remote Python environment. But it does mean you need a browser that’s powerful enough to run the pyodide environment. And a network connection to access the environment.

The ms-toolsai.jupyter-renderers extension is also supported, which means rich outputs can be rendered inline in edited and executed Jupyter notebooks:

Some of the backend renderers may not be preinstalled as part of the default pyodide environment. However, additional Python packages can be installed using micropip. For example:

import micropip
await micropip.install("plotly")

To prove it works, here’s an example of a notebook that installs the plotly package and then renders a rich plotly chart output, inline, in the browser:

If it’s anything like jupyterlite, there may be some issues trying to read data files into pandas. If so, some of the tricks in ouseful_jupyterlite_utils might help…

For files opened from a local filesystem, edits will be saved back to the file system. But what about users who are working across several machines? In this case, it might make sense to edit files against an online file source. Once again, vscode.dev provides one possible solution, this time in the form of its integration with Github: simply add a path to one of your github repos and you will be prompted to grant permissions to the application:

Once in, you will be able to open, edit, and commit back changes to files contained in the repository:

This means that a user can edit, preview, and potentially run files stored in one of their Github repositories (public or private) from vscode.dev. Once again, a network connection is required, but this time the editor and the storage are both provided remotely, with editing and code execution, for example using pyodide, provided in the browser.

In addition, using Github automation, committed changes can be acted upon automatically, for example using a jupyter-book publishing Github Action.

From this briefest of plays, I can see myself starting to edit and check simple code execution using vscode.dev opened over a Github repo.

Of course, at least two big questions arise:

  • what user interface layout(s) will best support users for different sorts of workflow (we don’t want to bewilder users or scare them away by the complexity of the UI);
  • how would course materials need tuning to make them useable in a VS Code environment.

In terms of distance education use, the overhead of requiring a network connection is offset by removing the need to install any software locally, or provide a remote code execution service. Networked file persistence is available via a (free) Github account. In order to run code in browser, for example Python code using the pyodide environment, loading the environment may take some time (I’m not sure if it gets cached in the browser to remove the requirement for repeatedly downloading it? Similarly any additionally installed packages?); the pyodide environment also requires a reasonably powerful and recent browser in order to work smoothly.

Towards One (or More!) Institutionally Hosted Jupyter Services

When we first presented a module to students that used Jupyter notebooks (which is to say, “IPython notebooks” as they were still referred to back in early 2016) we were keen that students should also be able to access an online hosted solution as an alternative. Last year, we provided an on-request hosted service for the data management module, but it was far from ideal, running as it did an environment that differed from the environment we provided to students via a Docker container.

This year, we are offering a more complete hosted service for the same module that makes use of pretty much the same Dockerised environment that the students can run locally (the difference is that the hosted solution also bundles the JupyterHub package and a shim to provide a slightly different start-up sequence).

The module is hosted by the school’s specialist IT service using a customised zero2kubernetes deployment route developed by an academic colleague, Mark Hall, as part of an internal scholarly research project exploring the effectiveness of using a hosted environment for a web technology course. (Note that: the developer is an academic…)

The student user route is to click on an LTI auth link in the Moodle VLE that takes the students to an “available images” list:

On starting an image, an animated splash screen shows the deployment progress. This can take a couple of minutes, depending on the deployment policy used (e.g. whether a new server needs to be started, or whether the container can be squeezed onto an already running server).

Cost considerations play a part here in determining resource availability, although the data management module runs on local servers rather than commercial cloud servers.

Once the environment has started up and been allocated, a startup sequence checks that everything is in place. The container is mounted against a persistent user filespace and can deploy files stashed inside the container to that mounted volume on first run.

Once again, there may be some delay as any required files are copied over and the actual user environment services are started up:

When the user environment is available, for example, a Jupyter notebook environment, the student is automatically redirected to it.

The Control Panel link takes the student back to the server image selection page.

In terms of user permissions, the containers are intended to run the user as a permission limited generic user (i.e. not root). This has some downsides, not least in terms of slightly increasing the complexity of the environment to ensure that permissions are appropriately set on the user. Also note that the user is not a unique user (e.g. all users might be user oustudent inside their own container, rather than being their own user as set using their login ID).
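A Dockerfile sketch of that pattern might look as follows (the oustudent user name is taken from above; the UID and paths are purely illustrative):

```dockerfile
# Create a permission-limited generic user rather than running as root
RUN useradd --create-home --uid 1000 --shell /bin/bash oustudent \
 && chown -R oustudent:oustudent /home/oustudent

# Ensure subsequent commands and the running container use that user
USER oustudent
WORKDIR /home/oustudent
```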

The Docker image used for the data management course is an updated version of the image used for October 2020.

A second module I work on is using the same deployment framework but hosted by the academic developer using Azure servers. The image for that environment is an opinionated one constructed using a templated image building tool, mmh352/ou-container-builder. The use of this tool is intended to simplify image creation and management. For example, the same persistent user file volume is used for all launched computational environments, so care needs to be taken that an environment used for one course doesn’t clobber files or environment settings used by another course in the persistent file volume.

One thing that we haven’t yet bundled in our Jupyter containers is a file browser; but if students are mounting persisting files against different images, and do want to be able to easily upload and download files, I wonder if it makes sense to start adding such a service via a Jupyter server proxy wrapper. For example, imjoy-team/imjoy-elfinder [repo] provides such a proxy off-the-shelf:

The above approach, albeit variously hosted and resourced, uses JupyterHub to manage the (containerised) user environments. A second approach has been taken by central IT in their development of a generic Docker container launching service.

I note that you can use JupyterHub to manage arbitrary services running in containers, not just Jupyter notebook server environments, but from what I can tell, I don’t think that central IT even evaluated that route.

As with the previous approach, the student user route in is via an LTI authentication link in the browser. The landing page sets up the expectation of a long wait…

Telling a student who may be grabbing a quick 30 minute study session over lunch that they must wait up to 10 minutes for the environment to appear is not ideal… Multiply that wait by 1000 students on a course, maybe two or three times a week for 20+ weeks, and that is a lot of lost study time… But then, as far as IT goes, I guess you cost it as an “externality”…

To add to the pain, different containers are used for different parts of the module, or at least, different containers are used for teaching and assessment. Since a student can only run one container at a time, if you start the wrong container (and wait 5+ minutes for it to start up) and then have to shut it down to start the container you meant to start, I imagine this could be very frustrating…

As well as the different Jupyter environment containers, a file manager container is also provided (I think this can be run at the same time as one of the Jupyter container images). Rather than providing a container image selection UI on an integrated image launcher page, separate image launching links are provided (and need to be maintained) in the VLE:

The file browser service is a containerised version of TinyFileManager [repo]:

In the (unbranded) Jupyter containers, the environment is password protected using a common Jupyter token:

The (unbranded) environment is wrapped in an IFrame by default:

However, if you click on the Jupyter link you can get a link to the notebook homepage without the IFrame wrapper:

In passing, I note that in this IT maintained image, the JupyterLab UI is not installed, which means students are required to use the notebook UI.

It’s also worth noting that these containers are running on GPU enabled servers. Students are not provided with a means of running environments locally because some of the activities require a GPU if they are to complete in a timely fashion.

In passing, I note that another new module that started this year that also used Jupyter notebooks does not offer a hosted solution but instead instructs students to download and install Anaconda and run a Jupyter server that way. This module has very simple Python requirements and I suspect that most, if not all, the code related activities could be executed using a JupyterLite/Pyodide kernel run via WASM in the browser. In many cases, the presentation format of the materials (a code text book, essentially) are also suggestive of a Jupyter Book+Thebe code execution model to me, although the ability for students to save any edited code to browser storage, for example, would probably be required [related issue].

One final comment to make about the hosted solutions is that the way they are accessed via an LTI authenticated link using institutional credentials prevents a simple connection to the Jupyter server as a remote kernel provider. For example, it is not possible to trivially launch a remote kernel on the hosted server via a locally running VS Code environment. In the case of the GPU servers, this would be really useful because it would allow students to run local servers most of the time and then only access a GPU powered server when required for a particular activity.

Using R To Calculate A Simple Speed Model For Rally Stage Routes

A few months ago, I started hacking together an online e-book on Visualising WRC Rally Stages With rayshader and R. One of the sections (Estimating speeds) described the construction of a simple speed model based around the curvature of the stage route.

As part of another sprint into some rally data tinkering, this time focusing on Visualising WRC Telemetry Data With R, I’ve extracted just the essential code for creating the speed model and split it into a more self-contained extract: Creating a Route Speed Model. The intention is that I can use this speed model to help improve interpolation within a sparse telemetry time series.

Also on the to do list is to see if I can validate – or not! – the speed model using actual telemetry.

The recipe for building the model builds up from a boundary convexity tool (bct()) that can be found in the rLFT linear feature processing R package. This tool provides a handy routine for measuring the convexity at each point along a route, a process that also returns the co-ordinates of a center of curvature for each segment. A separate function, inspired by the pracma::circlefit() function, then finds the radius.

Because I don’t know how to write vectorised functions properly, I use the base::Vectorize() function to do the lifting for me around a simpler, non-vectorised function.


# The curvature function takes an arc defined over
# x and y coordinate lists

# circlefit, from pracma::circlefit()
circlefit = function (xp, yp, fast = TRUE) {
    if (!is.vector(xp, mode = "numeric") || !is.vector(yp, mode = "numeric")) 
        stop("Arguments 'xp' and 'yp' must be numeric vectors.")
    if (length(xp) != length(yp)) 
        stop("Vectors 'xp' and 'yp' must be of the same length.")
    if (!fast) 
        warning("Option 'fast' is deprecated and will not be used!", 
            call. = FALSE, immediate. = TRUE)
    n <- length(xp)
    p <- qr.solve(cbind(xp, yp, 1), matrix(xp^2 + yp^2, ncol = 1))
    v <- c(p[1]/2, p[2]/2, sqrt((p[1]^2 + p[2]^2)/4 + p[3]))
    rms <- sqrt(sum((sqrt((xp - v[1])^2 + (yp - v[2])^2) - v[3])^2)/n)
    #cat("RMS error:", rms, "\n")
    v
}

curvature = function(x, y){
  #729181.8, 729186.1, 729190.4
  #4957667 , 4957676, 4957685
  tryCatch({
      # circlefit gives an error if we pass a straight line
      # Also hide the print statement in circlefit
      # circlefit() returns the x and y coords of the circle center
      # as well as the radius of curvature
      # We could then also calculate the angle and arc length
      circlefit(x, y)[3]
    },
    error = function(err) { 
      # For a straight, return the first co-ord and Inf diameter
      # Alternatively, pass zero diameter?
      c(x[1], y[1], Inf)[3]})
}

curvature2 = function(x1, x2, x3, y1, y2, y3){
  curvature(c(x1, x2, x3), c(y1, y2, y3))
}
# The base::Vectorize function provides a lazy way of 
# vectorising a non-vectorised function
curvatures = Vectorize(curvature2)

# The Midpoint values are calculated by rLFT::bct()
route_convexity$radius = curvatures(lag(route_convexity$Midpoint_X),
                                    route_convexity$Midpoint_X,
                                    lead(route_convexity$Midpoint_X),
                                    lag(route_convexity$Midpoint_Y),
                                    route_convexity$Midpoint_Y,
                                    lead(route_convexity$Midpoint_Y))

A corner speed model then bins each segment into a corner type. This is inspired by the To See The Invisible rally pacenotes tutorial series by David Nafría, which uses a simple numerical value to categorise the severity of each corner, as well as identifying a nominal target speed for each corner category.

corner_speed_model = function(route_convexity){
  invisible_bins = c(0, 10, 15, 20, 27.5, 35,
                     45, 60, 77.5, 100, 175, Inf)

  route_convexity$invisible_ci = cut(route_convexity$radius,
                                     breaks = invisible_bins,
                                     labels = 1:(length(invisible_bins)-1),
                                     ordered_result = TRUE)

  # Speeds in km/h
  invisible_speeds = c(10, 40, 50, 60, 70, 80,
                       95, 110, 120, 130, 145)

  route_convexity$invisible_sp = cut(route_convexity$radius,
                                     breaks = invisible_bins,
                                     labels = invisible_speeds,
                                     ordered_result = TRUE)

  # Cast speed as factor, via character, to integer
  route_convexity$invisible_sp = as.integer(as.character(route_convexity$invisible_sp))

  route_convexity
}

We can now build up the speed model for the route. At each step we accelerate towards a nominal sector target speed (the invisible_sp value). We can’t accelerate infinitely fast, so our actual target accumulated speed for the segment, acc_sp, is a simple function of the current speed and the notional target speed. We can then calculate the notional time to complete that segment, invisible_time.

acceleration_model = function(route_convexity, stepdist=10){
  # Acceleration model
  sp = route_convexity$invisible_sp
  # Nominal starting target speed
  # If we don't set this, we don't get started moving
  sp[1] = 30 

  # Crude acceleration / brake weights
  acc = 1
  dec = 1
  for (i in 2:(length(sp)-1)) {
    # Simple linear model - accumulated speed is based on
    # the current speed and the notional segment speed
    # Accelerate up
    if (sp[i-1]<=sp[i]) sp[i] = (sp[i-1] + acc * sp[i]) / (1+acc)

    # Decelerate down
    if (sp[i]>sp[i+1]) sp[i] = (dec * sp[i] + sp[i+1]) / (1+dec)
  }

  route_convexity$acc_sp = sp
  route_convexity$acc_sp[length(sp)] = route_convexity$invisible_sp[length(sp)]

  # New time model
  # Also get speed in m/s for time calculation
  meters = 1000
  seconds_per_hour = 3600 # 60 * 60
  kph_unit = meters / seconds_per_hour
  route_convexity = route_convexity %>% 
                      mutate(segment_sp = route_convexity$acc_sp * kph_unit,
                             invisible_time = dist/segment_sp,
                             acc_time = cumsum(invisible_time))

  # So now we need to generate kilometer marks
  route_convexity$kmsection = 1 + trunc(route_convexity$MidMeas/1000)
  # We can use this to help find the time over each km

  route_convexity
}


With the speed model, we can then generate a simple plot of the anticipated speed against distance into route:

We can also plot the accumulated time into the route:

Finally, a simple cumulative sum of the time taken to complete each segment gives us an estimate of the stage time:

anticipated_time = function(route_convexity) {
  anticipated_time = sum(route_convexity$invisible_time[1:(nrow(route_convexity)-1)])
  cat(paste0("Anticipated stage time: ", anticipated_time %/% 60,
             'm ', round(anticipated_time %% 60, 1), 's'))
}


# Anticipated stage time: 8m 40.3s

Next on my to do list is to generate an “ideal” route from a collection of telemetry traces from different drivers on the same stage.

If we know the start and end of the route are nominally at the same location, we can normalise the route length of multiple routes, map equidistant points onto each other, and then take the average.
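A minimal sketch of that normalise-and-average approach in Python (assuming numpy, and routes given as lists of (x, y) points):

```python
import numpy as np

def resample(route, n=100):
    """Resample a route at n points equally spaced along its cumulative length."""
    route = np.asarray(route, dtype=float)
    seg = np.hypot(*np.diff(route, axis=0).T)    # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative distance along route
    t = np.linspace(0.0, s[-1], n)               # equidistant sample distances
    return np.column_stack([np.interp(t, s, route[:, 0]),
                            np.interp(t, s, route[:, 1])])

def mean_route(routes, n=100):
    """Average several resampled routes pointwise to give a 'mean' route."""
    return np.mean([resample(r, n) for r in routes], axis=0)

# Two slightly different traces of the same nominal route
r1 = [(0, 0.0), (1, 0.0), (2, 0.0)]
r2 = [(0, 0.2), (2, 0.2)]
mean_route([r1, r2], n=5)  # points along y = 0.1
```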

I was rather surprised, though, not to find a related function in one of the ecology / animal tracking R packages that would, for example, pull out a “mean” route based on a collection of locations from a tracked animal or group of animals following the same(ish) path over a period of time. Or maybe I just didn’t spot it? (If you know of just such a function I can reuse, please let me know via the comments…)

Quick Play: Using Grafana For Motorsport Telemetry Dashboards

In passing, I came across the Grafana dashboarding platform, which is also available as an open source, self-hosted app.

Grabbing the Docker container and installing the Grafana SQLite datasource plugin, I had a quick go at visualising some motorsport telemetry, in this case WRC rally telemetry.

The way I’m using the dashboard is to run queries against the SQLite datasource, which contains some preprocessed telemetry data. The data source includes time into stage, lat/lon co-ordinates, distance into stage (-ish!), and various telemetry channels, such as speed, throttle, brake, etc.

Here’s a quick view of a simple dashboard:

At the top of the dashboard are controls for selecting particular fields. These are defined using dashboard variables and can be used to set parameter values used by other queries in the dashboard.
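For example, a driver selector can be populated from a simple variable query onto the telemetry table (the table and column names here are as per my own preprocessed dataset):

```sql
SELECT DISTINCT(driver) FROM full_telem2 ORDER BY driver
```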

Changing one of the selector fields updates all queries that call on the variable.

Variables can also be used in the definition of other variables, which means that selector fields will update responsively to changes in other fields.

In the dataset I am using, telemetry is not available on all stages for all drivers, so we can limit which stages can be selected for each driver based on the selected driver.

The timeseries chart display panels are also defined against a query. A time field is required. In the data I am using, the time is just a delta, starting at zero and then counting in seconds (and milliseconds). This time seems to be parsed as a timedelta from Unix basetime, which is okay, although it does mean you have to be careful selecting the period you want to display the chart for…

SELECT displacementTime as time, throttle, brk FROM full_telem2 WHERE driver="${driver}" AND stage="${stage}"

One thing I didn’t explore was dual axis charts, but these do appear to be available.

I appreciate that grafana is intended for displaying real time streaming data, but for analysing this sort of historical data it would be so much easier if you could not only specify the input time as a delta, but also have the display window automatically sized to show the full width of the data.

If you have multiple time series charts, clicking and dragging over a horizontal region of one of the time series charts allows you to zoom in on that chart and automatically zoom the other time series chart displays. This is great for zooming in, but I didn’t find a way to reset the zoom level other than by trying to use the browser back button. This really sucks.

The map display uses another plugin, pr0ps-trackmap-panel. Shift-clicking and dragging to select a particular area of the map also zooms the other displays, which means you can check the telemetry for a particular part of the stage (it helps if you can remember which way it flows!). It would be handy to be able to put additional specific coloured marker points onto the map, for example to indicate the start and end of the route with different coloured markers.

The map definition is a bit odd: you have to specify two separate queries:

SELECT displacementTime as time, lat from full_telem2 where driver="${driver}" and stage="${stage}";

SELECT displacementTime as time, lon from full_telem2 where driver="${driver}" and stage="${stage}";

You also need to disable the autozoom in to prevent the view zooming back to show the whole route after shift-click-dragging to zoom to a particular area of it. (That said, disabling autozoom seems to break the display of the map for me when you open the dashboard?)

The stage selector variable, for example, is populated by a query keyed on the selected driver:

SELECT DISTINCT(stage) FROM full_telem2 WHERE driver="$driver" ORDER BY stage

The final chart on my demo dashboard tries to capture the stage time for a particular distance into the stage. The time series chart requires a time for the x-coordinate, so I pretend that my stage distance is a time delta, then map the time into stage at that distance onto the y-axis.

This isn’t ideal because the x-axis units are converted to times (minutes and seconds) and I couldn’t see how to force the display of raw “time” in seconds to better indicate the actual value (of distance in meters). The y-axis is a simple seconds count, but it would be handy if that were displayed as a time!

What would be useful would be the ability to at least transpose the x and y-axis of the time series chart, although a more general chart where you can plot what you like on the x-and y-axes would be even nicer.

Which makes me think… is there a plugin for that? Or perhaps a scatter plot plugin with a line facility?

At first glance, a scatter plot may not seem that useful for this dataset. But a couple of the telemetry traces capture the lateral and longitudinal accelerations… So let’s see how they look using the michaeldmoore-scatter-panel.

This is okay insofar as it goes, but it doesn’t appear to be responsive to the selection of particular areas of the other charts, nor does it appear to let you click and drag to highlight areas and then automatically set a similar time scope in the other charts. It would also be handy if you could set colour thresholds. Taking a hint from the map plugin, which required multiple queries, I did try using multiple queries to give me positive and negative accelerations that I thought I could then colour separately, but only data from the first query appeared to be displayed.

The scatter plot also gives us a route to a time versus distance chart, but again there’s no zooming or linking.

It’s possible to create your own extensions, so maybe at some point I could have a look at writing one.

For some of my other recent rally telemetry doodles, see Visualising WRC Telemetry With R.

PS I did also wonder whether there would be a route to using pyodide (WASM powered Python) in a grafana extension, which would then potentially give you access to a full scipy stack when creating widgets…

Fragment: Generating Natural Language Text Commentaries

I didn’t get to tinker with any of WRC rally review code this weekend (Dawners gig, so the following crew got back together…), but I’ve been pondering where to try to take the automated text generation next.

At the moment, commentary is generated from simple rules using the durable-rules Python package. The rules are triggered in part from state recorded in a set of truth tables:

These aren’t very efficient, but they are a handy lookup and it makes it relatively easy to write rules…

At the moment, I generate sentences that get added to a list, then all the generated sentences are displayed, which is a really naive solution.

What I wonder is: could I generate simple true phrases and then pass them to a thing that would write a paragraph for me derived from those true statements? What would happen if I generated lots of true statements and then ran a summariser over them all to generate a shorter summary statement? Would truth be preserved in the statements? Could I have an “importance” or “interestingness” value associated with each statement that identifies which sentences should be more likely to appear in the summary?
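As a toy sketch of the “interestingness” idea (sentences and scores invented for illustration), a summary could simply keep the highest-scoring true statements, in which case truth is trivially preserved because sentences are only selected, never rewritten:

```python
def summarise(statements, k=2):
    """Keep the k most 'interesting' statements.
    `statements` is a list of (sentence, interestingness) pairs."""
    ranked = sorted(statements, key=lambda s: s[1], reverse=True)
    return " ".join(sentence for sentence, _ in ranked[:k])

# Invented example facts with invented interestingness scores
facts = [("Evans takes the stage win.", 0.9),
         ("Ogier retains the overall lead.", 0.8),
         ("Stage 4 was 12.4km long.", 0.2)]
print(summarise(facts))
```

A learned summariser obviously wouldn’t behave this simply, which is exactly the truth-preservation question.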

I also wonder about some “AI” approaches, eg generating texts from data tables (for example, wenhuchen/LogicNLG). I suspect that one possible issue with using texts generated using pretrained text models is the risk of non-truthy statements being generated from a table. This makes me wonder whether a two part architecture makes sense, where generated sentences are also parsed and then checked back against a truth table (essentially, an automated fact checking step).

So, that makes three possible approaches to explore:

  1. generate true statements and then find a way to make (true) paragraphs from them;
  2. generate loads of true statements and then summarise them (and retain truth);
  3. use a neural model to generate who knows what from the data table, and then try to parse the generated sentences and run them through a fact checker, only then letting the true sentences through.

Alternatively, I continue with my crappy rules, and try to learn how to compound them properly, so one rule can write state, instead of or as well as generating text, that other rules can pull on. (I should probably figure out how to do this anyway…)
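For what it’s worth, the compounding idea can be sketched in plain Python (this is not durable-rules syntax, just an illustration of one rule writing state that a second rule pulls on):

```python
facts = {"gap_to_leader": 2.5}  # invented example state
sentences = []

def rule_close_battle(f):
    # Rule 1 writes derived state rather than generating text...
    f["close_battle"] = f["gap_to_leader"] < 5

def rule_commentary(f):
    # ...which rule 2 pulls on to generate a sentence.
    if f.get("close_battle"):
        sentences.append("It's a close battle at the front!")

for rule in (rule_close_battle, rule_commentary):
    rule(facts)
```

In durable-rules proper, the first rule would assert a new fact that triggers the second.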

Fragment: Basic Computing Concepts for Students – File Paths

A widely shared blog post – File not found – has been doing the rounds lately that describes how for a large proportion of students, the question of “where” a file is on their computer is a meaningless question.

We see this from time to time in a couple of modules I work on, even amongst third year undergrad students, where the problem of not being able to locate a file manifests itself when students have to open files from code by referring to the file’s address or file path. “Path? Wossat then?”

Orienting yourself to the current working directory (“the what?”) and then using either relative or absolute addressing (“either what or what?”) on the filesystem (“the f…?”) to move or view the contents of another directory are just not on the radar…

In the courses I work on, the problems are compounded by having two file systems in place, one for the student’s desktop, one inside a local docker container, with some directories mounted from the host onto a path inside the virtual environment as a shared folder. (If none of that makes sense, but you laughed at the thought of students not being able to reference file locations, then in the world we now live in: that… And: you…)

I used to think that one of the ways in to giving students some sort of idea about what file paths are was to get them to hack URLs, but I could never convince any of my colleagues that was a sensible approach.

I still think they’re wrong, but I’m not sure I’m right any more… I’ve certainly stopped hacking URLs so much myself in recent years, so I figure that URL patterns are perhaps changing for the worse. Or I’m just not having to hack my own way around websites so much any more?

Anyway… the blog post immediately brought to mind a book that championed folksonomies, back in the day, a decade or so ago, when tagging was all the rage amongst the cool library kids, and if you knew what faceted browsing was and knew how to wield a ctrl-F, you were ahead of the curve. (I suspect many folk have no idea what any of that meant…). The book was David Weinberger’s Everything is Miscellaneous, and pretty much everyone who was into that sort of thing would have read it, or heard one of his presentations about it:

I would have linked to it on Hive or, but a quick naive search on those sites (which are essentially the same site in terms of catalogue, I think?) didn’t find the book…

Anyway, I think getting students to read that book, or watch something like the embedded video above, where David Weinberger gives the book talk in a Talks at Google session from a hundred years ago, might help…

See also: Anatomy for First Year Computing and IT Students, another of those crazy ideas that I could never persuade any of my colleagues about.

A Simple Pattern for Embedding Third Party Javascript Generated Graphics in Jupyter Notebooks

Part of the rationale for this blog is to capture and record my ignorance. That’s why there’s often a long rambling opening and the lede is buried way down the post: the rambling is contextualisation for what follows. It’s the reflective practice bit.

If you want to cut to the chase, scroll down for examples of how to embed mermaid.js, wavedrom.js, flowchart.js and wavesurfer.js diagrams in a notebook from a Python function call or by using IPython magic.

The Rambling Context Bit

Much teaching material, as well as many research papers, is revisionist. Researchers battle their way to a result after many false starts, misunderstandings, dead ends, lots of backtracking, and unwarranted assumptions that cause them to try that one ridiculous thing that turns out to be the right thing. Then, from the summit of their successful result, they look down the mountain, see a route that looks like it would have been an easier climb, follow that back down the mountain, and then write up that easier journey, the reverse ascent essentially, as the method.

Educators have it even worse. Not only do they write from a position of knowledge, there is also the temptation to teach as they were taught. There is also a canon of “stuff that must be taught” (based largely on what they were taught) which further limits the teaching tour, and hence the learning journey.

The “expert’s dilemma” takes many forms…

So, in this blog, as well as trying to capture recipes that work (paths up the mountain), I also try to capture the original forward path, with all the false steps, ignorance, and gaps in my own understanding as I try the climb for the first time, or the second, in a spirit of exploration rather than knowledge.

That is to say, this blog is, as much as anything, a learning diary. And at times it’s knowledge-reflective too, as I try to identify what I didn’t know when I started that I did at the end, knowledge gaps or misapprehensions that perhaps made the journey harder than it needed to be.

I’m reminded of a site that a few of us, who’d put together the original, unofficial OU Facebook app (is there still an OU Facebook app?), mulled over building, which we monikered “Kwink: Knowing What I Now Know”. The site would encourage folk to openly reflect on their own journey and to share the misunderstandings they had before a moment of revelation, then the thing they learned, the trick, that solved a particular problem or opened a particular pathway. But it never went past the whiteboard stage. Sites like Stack Overflow achieve a similar effect in another way: there is the naive question from the position of ignorance or confusion, then the expert answer related in a teaching style that starts from the point of the questioner’s ignorance and/or confusion and then tries to present a solution in a way the questioner will understand.

So the position of ignorance that this post describes relates to my complete lack of understanding about how to load and access arbitrary Javascript packages in a classic Jupyter notebook (let alone the more complex JupyterLab environment) in an attempt to identify some of the massive gaps in understanding a have-a-go tinkerer might have compared to someone happy to work as a “proper developer” in either of those environments.

The Problem – Rendering Javascript Created Assets in Jupyter Notebooks

The basic problem statement is a general one: given a third party Javascript package that generates a diagram or interactive application based on some provided data, typically provided as a chunk of JSON data, how do we write a simple Python package that will work in a Jupyter notebook context to render the Javascript rendered diagram from data that currently sits in a Python object?

There are a few extensions that we might also add to the problem:

  • the ability to add multiple diagrams to a notebook at separate points, to only render each one once, and to have no interference between diagrams if we render more than one;
  • if multiple diagrams are loaded in the same notebook, ideally we only want to load the Javascript packages once;
  • if there are no diagrams generated in the notebook, we don’t want to load the packages at all;
  • once the image is created, how do we save it to disk as an image file we can reuse elsewhere.
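On the “load the Javascript packages only once” point, one naive pattern is a session-level sentinel (a sketch only; in practice the returned script tag would be injected via IPython.display.HTML):

```python
_LOADED_PKGS = set()

def load_js_once(pkg_url):
    """Return a script tag the first time a package URL is requested
    in the session, and an empty string on subsequent requests."""
    if pkg_url in _LOADED_PKGS:
        return ""
    _LOADED_PKGS.add(pkg_url)
    return f'<script src="{pkg_url}"></script>'
```

This also means nothing gets loaded at all if no diagrams are generated.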

One solution I have used before is to wrap the Javascript application using Aaron Watter’s jp_proxy_widget as an ipywidget. This provides convenient tools for:

  • loading in the required Javascript packages either from a remote URL or a file bundled into the package;
  • for passing state from Python to Javascript, which means you can pass the Javascript the data it needs to generate the diagram, for example; and
  • for passing state from Javascript to Python, which means you can pass the image data back from Javascript to Python and let the Python code save to disk, for example.

It may be that it is easy enough to create your own ipywidget around the Javascript package, but I found the jp_proxy_widget worked when I tried it, it had examples I could crib from, and I don’t recall getting much of a sense of knowing what I was doing or why when I’d tried looking at the ipywidgets docs (this was several years ago now: things may have changed…).

But the jp_proxy_widget has overheads in terms of loading things, you can only have one widget in the same notebook, and (but I need to check this) I don’t think the widgets rendered in a notebook will directly render in a Jupyter Book version of a notebook.

Another solution is to load the Javascript app into another HTML page and then embed it as an IFrame in the notebook. The folium (interactive maps) and nicolaskruchten/jupyter_pivottablejs packages both take this approach, I think. This has the advantage of being relatively easy to do, but it complicates generating an output image. One approach I have used to grab an image of an interactive created this way is to take the generated HTML page, render it in a headless browser using selenium, and then grab a screenshot. Another approach might be to render the page using selenium and then scrape a generated image from it.

Rendering Javascript Generated Assets in Jupyter Notebooks Using Embedded IFrames

So here’s the pattern; the code is essentially cribbed from jupyter_pivottablejs.

import io
import uuid
from pathlib import Path
from IPython.display import IFrame

def js_ui(data, template, out_fn = None, out_path='.',
          width="100%", height="", **kwargs):
    """Generate an IFrame containing a templated javascript package."""
    if not out_fn:
        out_fn = Path(f"{uuid.uuid4()}.html")
    # Generate the path to the output file
    out_path = Path(out_path)
    filepath = out_path / out_fn
    # Check the required directory path exists
    filepath.parent.mkdir(parents=True, exist_ok=True)

    # The open "wt" parameters are: write, text mode
    with open(filepath, 'wt', encoding='utf8') as outfile:
        # The data is passed in as a dictionary so we can pass different
        # arguments to the template
        outfile.write(template.format(**data))

    return IFrame(src=filepath, width=width, height=height)

One of the side effects of the above approach is that we generate an HTML file that is saved to disk and then loaded back in to the page. This may be seen as a handy side effect, or it may be regarded as generating clutter.

If we had access to the full HTML iframe API, we would be able to pass in the HTML data using the srcdoc attribute, rather than an external file reference, but the IPython IFrame() display function doesn’t support that.
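As a sketch of that alternative, we could build the iframe tag ourselves and hand the string to IPython.display.HTML() instead; escaping the page so it can sit inside an attribute is the fiddly bit:

```python
import html

def iframe_srcdoc(page, width="100%", height=300):
    """Build an <iframe> carrying the page inline via the srcdoc
    attribute, avoiding the temporary HTML file. Wrap the returned
    string in IPython.display.HTML() to render it in a notebook."""
    return (f'<iframe srcdoc="{html.escape(page)}" '
            f'width="{width}" height="{height}" frameborder="0"></iframe>')
```

This avoids the on-disk clutter, at the cost of baking a potentially large escaped page into the notebook output.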


We can use that function to render objects from a wide variety of packages. For example, a flowchart.js flowchart:

TEMPLATE_FLOWCHARTJS = """<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
        <script type="text/javascript" src=""></script>
        <script type="text/javascript" src=""></script>
    </head>
    <body>
        <div id="diagram"></div>
        <script>
            var diagram = flowchart.parse(`{src}`);
            diagram.drawSVG('diagram');
        </script>
    </body>
</html>"""


Note that the template is rather sensitive when it comes to braces ({}). A single brace is used for template substitution, so if the template code has a { } in it, you need to double them up as {{ }}. This is a real faff… There must be a better way?
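One possible “better way” is to sidestep str.format entirely and use the standard library’s string.Template, whose $-substitution leaves literal braces alone (a sketch, not what the code above actually does):

```python
from string import Template

# Braces in the Javascript no longer need doubling, because
# substitution points are marked by $ rather than { }.
TEMPLATE = Template("""<script>
    var diagram = flowchart.parse(`$src`);
    function noop() { return; }
</script>""")

page = TEMPLATE.substitute(src="st=>start: Start")
```

The trade-off is that any literal `$` in the Javascript then needs escaping as `$$` instead.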

Here’s an example:

st=>start: Start
e=>end: End
op1=>operation: Generate
op2=>parallel: Evaluate
op2(path1, top)->op1
op2(path2, right)->e

Or how about a wavedrom/wavedrom timing diagram:

TEMPLATE_WAVEDROM = """<html>
    <head>
        <meta charset="UTF-8">
        <script src="" type="text/javascript"></script>
        <script src="" type="text/javascript"></script>
    </head>
    <body onload="WaveDrom.ProcessAll()">
        <script type="WaveDrom">
{src}
        </script>
    </body>
</html>"""

If you’re wondering where the template code comes from, it’s typically a copy of the simplest working example I can find on the original Javascript package’s documentation website. Note also that you often get simple minimal example code fragments in the original GitHub repository README that don’t appear in the docs.

Here’s some example wavedrom source code…

wcode = """{ signal : [
  { name: "clk",  wave: "p......" },
  { name: "bus",  wave: "x.34.5x",   data: "head body tail" },
  { name: "wire", wave: "0.1..0." },
]}"""

The mermaid-js package supports several diagram types including flowcharts, sequence diagrams, state diagrams and entity relationship diagrams:

TEMPLATE_MERMAIDJS = """<html>
    <body>
        <script src=""></script>
        <script>
            mermaid.initialize({{ startOnLoad: true }});
        </script>
        <div class="mermaid">
{src}
        </div>
    </body>
</html>"""

For example, a flow chart:

mcode = """
graph TD;
    A-->B;
    A-->C;
"""

Or a sequence diagram:

sequenceDiagram
    Alice->>John: Hello John, how are you?
    John-->>Alice: Great!
    Alice-)John: See you later!

Note to self: create a Jupyter notebook server proxy package for the mermaid.js server and for wavesurfer.js.

The wavedrom and mermaid templates actually allow multiple charts to be rendered in the same page as long as they are in their own appropriately classed div element, so we could tweak the template pattern to support that if passed multiple chart source data objects…
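A minimal sketch of that tweak for the mermaid case (the div wrapper and the joining logic here are my assumption, not lifted from the actual template):

```python
def mermaid_divs(sources):
    """Wrap each chart's source in its own classed div so several
    diagrams can be rendered in a single templated page."""
    return "\n".join(f'<div class="mermaid">\n{src}\n</div>'
                     for src in sources)

body = mermaid_divs(["graph TD;\n    A-->B;", "graph LR;\n    C-->D;"])
```

The joined body would then be substituted into the template where the single `{src}` div currently sits.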

Here’s another example: the wavesurfer-js package that provides a whole range of audio player tools, including spectrograms:

TEMPLATE_WAVESURFERJS = """<html>
    <body>
        <script src=""></script>
        <div id="wavesurfer">
            <div id="waveform"></div>
            <div class="controls">
                <button class="btn btn-primary" data-action="play">
                    <i class="glyphicon glyphicon-play"></i>
                    <i class="glyphicon glyphicon-pause"></i>
                </button>
            </div>
        </div>
        <script>
            var GLOBAL_ACTIONS = {{ // eslint-disable-line
                play: function() {{
                    wavesurfer.playPause();
                }},
                back: function() {{
                    wavesurfer.skipBackward();
                }},
                forth: function() {{
                    wavesurfer.skipForward();
                }},
                'toggle-mute': function() {{
                    wavesurfer.toggleMute();
                }}
            }};

            // Bind actions to buttons and keypresses
            document.addEventListener('DOMContentLoaded', function() {{
                document.addEventListener('keydown', function(e) {{
                    let map = {{
                        32: 'play', // space
                        37: 'back', // left
                        39: 'forth' // right
                    }};
                    let action = map[e.keyCode];
                    if (action in GLOBAL_ACTIONS) {{
                        if (document == e.target || document.body == e.target || e.target.attributes["data-action"]) {{
                            e.preventDefault();
                        }}
                        GLOBAL_ACTIONS[action](e);
                    }}
                }});

                [].forEach.call(document.querySelectorAll('[data-action]'), function(el) {{
                    el.addEventListener('click', function(e) {{
                        let action = e.currentTarget.dataset.action;
                        if (action in GLOBAL_ACTIONS) {{
                            GLOBAL_ACTIONS[action](e);
                        }}
                    }});
                }});
            }});

            var wavesurfer = WaveSurfer.create({{
                container: '#waveform',
                waveColor: 'violet',
                backend: 'MediaElement',
                progressColor: 'purple'
            }});
            wavesurfer.load('{src}');
        </script>
    </body>
</html>"""
We can pass a local or remote (URL) path to an audio file into the player:

wscode = ""

The wavesurfer.js template would probably benefit from some elaboration to allow configuration of the player from passed-in parameters.
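For example (a sketch using an invented, cut-down template; the real template would need matching placeholders), player options could be passed through as extra template fields:

```python
# Hypothetical minimal template with {wave_color} / {progress_color}
# placeholders; literal braces are doubled for str.format.
PLAYER_TEMPLATE = ('<script>WaveSurfer.create({{container: "#waveform", '
                   'waveColor: "{wave_color}", '
                   'progressColor: "{progress_color}"}});</script>')

def player_html(wave_color="violet", progress_color="purple"):
    """Render the player script with configurable colours."""
    return PLAYER_TEMPLATE.format(wave_color=wave_color,
                                  progress_color=progress_color)
```

The same pattern extends to any of the WaveSurfer.create options.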

Do It By Magic

It’s easy enough to create some magic to allow diagramming from block magicked code cells:

from IPython.core.magic import Magics, magics_class, cell_magic, line_cell_magic
from IPython.core import magic_arguments
from pyflowchart import Flowchart

@magics_class
class JSdiagrammerMagics(Magics):
    """Magics for Javascript diagramming."""
    def __init__(self, shell):
        super(JSdiagrammerMagics, self).__init__(shell)

    @magic_arguments.magic_arguments()
    @magic_arguments.argument(
        "--file", "-f", help="Source for audio file."
    )
    @line_cell_magic
    def wavesurfer_magic(self, line, cell=None):
        "Send code to wavesurfer.js."
        args = magic_arguments.parse_argstring(self.wavesurfer_magic, line)
        if not args.file:
            return
        return js_ui({"src": args.file}, TEMPLATE_WAVESURFERJS, height=200)

    @magic_arguments.magic_arguments()
    @magic_arguments.argument(
        "--height", "-h", default="300", help="IFrame height."
    )
    @cell_magic
    def mermaid_magic(self, line, cell):
        "Send code to mermaid.js."
        args = magic_arguments.parse_argstring(self.mermaid_magic, line)
        return js_ui({"src": cell}, TEMPLATE_MERMAIDJS, height=args.height)

def load_ipython_extension(ip):
    """Load the extension in IPython."""
    ip.register_magics(JSdiagrammerMagics)

ip = get_ipython()
ip.register_magics(JSdiagrammerMagics)

Here’s how it works…

General Ignorance

So, the pattern is simple, but there’s a couple of things that would make it a lot more useful. At the moment, it requires loading the Javascript in from a remote URL. It would be much more useful if we could bundle the Javascript so the package could be used offline, but I don’t know how to do that. (The files can be bundled in a Python package easily enough, but what URL would they be loaded in from in the IFrame and how would I generate such a URL?) I guess one way is to create an extension that would load the Javascript files in when the notebook loads, and then embed code into the notebook using an IPython.display.HTML() wrapper rather than using an IPython.display.IFrame()?

UPDATE: here’s another way… read the script in from a bundled js file then add it to the notebook UI via an IPython.display.HTML() call. The same developer also has a crazy hack for one-time execution of notebook Javascript code…

# Via

import os

# Locate input_table package directory
mydir = os.path.dirname(__file__)  # absolute path to directory containing this file

# Load Javascript file
with open(os.path.join(mydir, 'javascript', 'input_table.js')) as tempJSfile:
    tmp = f'<script type="text/javascript">{tempJSfile.read()}</script>'

As mentioned previously, there’s also no obvious way of accessing the created diagram so it can be saved to a file, unless we perhaps add some logic into the template to support downloading the created asset? Another route would be to load the HTML into a headless browser and then either screenshot it (as for example here), or scrape the asset from it.

In terms of ignorance lessons, the above recipe shows a workaround for not having any clue about how to properly load Javascript into a notebook and access it (let alone JupyterLab). It doesn’t require a development environment (all the above was created in a single notebook), and it doesn’t require knowledge of require or async or frameworks. It does require some simple knowledge of HTML and writing templates, and a bit of knowledge, or at least cut-and-paste skills, in creating the magics.

Pondering: Stories Not Accounts

Chatting with Island Storytellers convener Sue Bailey last week, I commented that I really need to put more time into getting the end of the stories I try to tell much tighter. If you have a good opening, and a really strong ending, then you can generally get from one to the other. But if the ending isn’t as solid as it should be, you start to worry about how to close as you get closer to it, and may even finish without anyone realising.

A good title can also help, and often captures a key scene or storypoint from somewhere in the middle.

We also chatted about sourcing stories from local histories. I’ve started trying to pull together a small set of Island history stories I can tell, and I’ve also got a longer tale about the Yorkshire Luddites; but as much as anything, they’re told as accounts rather than stories (thanks to Sue for introducing the distinction of an account into the discussion).

So pondering a handful of stories I’ve not quite got round to pulling together yet into a form that I can tell, I think I’m going to have a go at quickly summarising first the account, which will give me lots of facts and footnotes and depth behind the tale, and then try to turn them into stories by taking a particular perspective – a particular person, or place, or animal, for example – whose journey we can follow (because a good story is often about something or someone).

For my own reference (so I can keep track of progress, as much as anything), the tales I’ll work on are:

  • the wrecking of the St Mary and the building of The Pepperpot by Walter de Godeton; this could be told from de Godeton’s perspective, but it might also be interesting to try it as a story about a barrel of communion wine, or the Abbey that was expecting the wine…; [story and account]
  • when the Island invaded France, a tale of Sir Edward Woodville (probably; or maybe a story about Diccon Cheke, the sole returning survivor…);
  • The Worsley trial: it could be interesting to be able to tell this two ways: as a story about Richard Worsley, or a story about Seymour Fleming;
  • Odo’s gold, perhaps from the perspective of Odo, perhaps from the perspective of Carisbrooke Castle…

I’m also going to have a go at recasting my Yorkshire Luddite account, I think as a story about George Mellor…

Electron Powered Desktop Apps That Bundle Python — Datasette and JupyterLab

You wait for what feels like years, then a couple of in-the-wild releases appear that put the same pattern into practice within a few weeks of each other!

A couple of years ago, whilst pondering ways of bundling Jupyter Book content in a desktop electron app to try to get round the need for a separate webserver to serve the content, I observed:

Looking around, a lot of electron apps seem to require the additional installation of Python environments on host that are then called from the electron app. Finding a robust recipe for bundling Python environments within electron apps would be really useful I think?

Fragment – Jupyter Book Electron App,, May, 2019

And suddenly, in a couple of the projects I track, this now seems to be A Thing.

For example, datasette-app (Mac only at the moment?) bundles a Python environment and a datasette server in an electron app to give you a handy cross-desktop application for playing with datasette.

I need to do a proper review of this app…

Simon Willison’s extensive developer notes, tracked in Github Issues and PRs, tell the tale. For example:

And from a repo that appeared to have gone stale, jupyterlab-app gets an initial release as an electron app bundling an Anaconda environment with some handy packages pre-installed (announcement; cross-platform (Mac, Linux, Windows)).

Naively double-clicking the downloaded JupyterLab-app installer to open it raises an unhelpful dialog:

Signing issue with JupyterLab app on Mac

To make progress, you need to locate the app in a Mac Finder window, right-click it and select Open, at which point you get a dialog that is a little more helpful:

The download for the JupyterLab-app (Mac) installer was about 300MB, which expands to an installed size of at least double that:

The installation (which requires an admin user password?) takes some time, so I’m wondering if a load of other things get downloaded and installed as part of the installation process…

Hmmm… on a first run, the app opens up some notebooks that I think I may have been running in another JupyterLab session from a local server – has it actually picked up that JupyterLab workspace context? The announcement post said it shipped with its own conda environment? So where has it picked up a directory path from?

Hmmm… it seems to have found my other kernels too… But I don’t see a new one for the environment, and kernel, it ships with?

Opening the Python 3 (ipykernel) environment appears to give me a kernel that has shipped with the application:

import site

I wonder if, where there is a name clash with a pre-existing external kernel, the app uses the kernel it ships with, and otherwise it uses the other kernels it can find?

Hmmm… seems like trying to run the other Python kernels gets stuck trying to connect and then freezes the app. But I can connect to the R kernel, which looks like it’s the “external” kernel based on where it thinks the packages are installed (retrieved via the R .libPaths() lookup):

Something else that might be handy would be the ability to connect to a remote server and launch a remote kernel, or launch and connect to a MyBinder kernel…

I also note that if I open several notebooks in the app, then in my browser launch a new JupyterLab session, the open notebooks in the app appear as open notebooks in the browser JupyterLab UI: so workspaces are shared? That really really sucks. Sucks double: not only are the workspaces messing each other up, but the same notebook is open in two environments (causing write conflicts), and those environments are also using different kernels for the same notebook. Wrong. Wrong. Wrong!;-)

But a handy start… and possibly a useful way of shipping simple environments to students, at least, once the signing issues are fixed. (For a related discussion on signing Mac apps, see @simonw’s TIL tale, “Signing and notarizing an Electron app for distribution using GitHub Actions” and this related JupyterLab-app issue.)

I also wonder: as the electron app bundles conda, could it also ship postgres as a callable database inside the app? Postgres is available via conda, after all..

Hmm… thinks… did I also see somewhere discussion about a possible JupyterLite app that would use WASM kernels rather than eg bundling conda or accessing local Jupyter servers? And I wonder… how much of the JupyterLab-app repo could be lifted and used to directly wrap RetroLab? Or maybe the JupyterLab app could have a button to switch to RetroLab view, and vice versa?

Hmm… I wonder… the jupyter-server/jupyter_releaser package looks to be providing support for automating builds and release generators for templated Jupyter projects… How tightly are apps bound into electron packages? How easy would it be to have a releaser that wraps an app in an electron package with a specified conda distribution (an “electron-py-app proxy” into which you could drop your jupyter framework app)?

PS whilst tinkering with the fragment describing a Jupyter Book Electron App, I wrapped a Jupyter book in an electron app to remove the need for a web server to serve the book. That post also briefly explored the possibility of providing live code execution via a thebe connected local server, as well as mooting the possibility of executing the code via pyodide. I wonder if the JupyterLab-app, and/or the datasette-app, have electron+py components cleanly separated from the app code components. How easy would it be to take one or other of those components to create an electron app bundled with a py/jupyter server that could support live code execution from the Jupyter Book also inside the electron app?