From Charts to Interactive Apps With Jupyter Notebooks and IPywidgets…

One of the nice things about Jupyter notebooks is that once you’ve got some script in place to generate a particular sort of graphic, you can very easily turn it into a parameterised, widgetised app that lets you generate chart views at will.

For example, here’s an interactive take on a WRC chartable(?!) I’ve been playing with today. Given a function that generates a table for a given stage, rebased against a specified driver, it takes only a few lines of code and some very straightforward widget definitions to create an interactive custom chart generating application around it:

In this case, I am dynamically populating the drivers list based on which class is selected. (Unfortunately, it only seems to work for RC1 and RC4 at the moment. But I can select drivers and stages within that class…)
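To give a flavour of the pattern, here’s a minimal sketch rather than my actual notebook code: the get_stage_table() function, the class/driver lookups and the RC4 driver codes below are all illustrative stand-ins.

import pandas as pd
from ipywidgets import interact, Dropdown

# Illustrative lookups - in the real notebook these are pulled from the WRC results data
drivers_by_class = {'RC1': ['LAP', 'MIK', 'LOE'], 'RC4': ['DR1', 'DR2']}
stages = ['SS1', 'SS2', 'SS3']

def get_stage_table(stage, rebase):
    # Stand-in for the real table generating function, which returns a
    # styled pandas table rebased against the selected driver
    return pd.DataFrame({'stage': [stage], 'rebased_on': [rebase]})

classes = Dropdown(options=list(drivers_by_class.keys()), description='Class')
drivers = Dropdown(options=drivers_by_class[classes.value], description='Driver')
stage = Dropdown(options=stages, description='Stage')

# Repopulate the drivers list whenever the class selection changes
def on_class_change(change):
    drivers.options = drivers_by_class[change['new']]

classes.observe(on_class_change, names='value')

def show_table(rallyclass, driver, stage):
    return get_stage_table(stage, rebase=driver)

interact(show_table, rallyclass=classes, driver=drivers, stage=stage)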

It also struck me that we can add further controls to select which columns are displayed in the output chart:
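Continuing the sketch above (again, the column names here are just placeholders), a SelectMultiple widget is one way of doing that:

from ipywidgets import SelectMultiple

# Hypothetical column names - the real ones come from the generated stage table
columns = SelectMultiple(options=['roadPos', 'previous', 'split_1', 'split_2', 'overall'],
                         value=('split_1', 'split_2'), description='Columns')

def show_table_cols(rallyclass, driver, stage, cols):
    df = get_stage_table(stage, rebase=driver)
    # Keep only the selected columns that actually exist in the table
    return df[[c for c in cols if c in df.columns]]

interact(show_table_cols, rallyclass=classes, driver=drivers, stage=stage, cols=columns)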

What this means is that we can easily create simple applications capable of producing a wide variety of customised chart outputs. Such a tool might be useful for a sports journalist wanting to use different sorts of table to illustrate different sorts of sports report.

Tinkering with Stage Charts for WRC Rally Sweden

Picking up on some doodles I did around the Dakar 2019 rally, a quick review of a couple of chart types I’ve been tweeting today…

First up is a chart showing the evolution of the rally over the course of the first day.

[Chart: overall_lap_ss8 – evolution of the rally over the first day, rebased to Lappi]

This chart mixes metaphors a little…

The graphics relate directly to the driver identified by the row. The numbers are rebased relative to a particular driver (Lappi, in this case).

The first column, the stepped line chart, tracks overall position over the stages; the vertical bar chart next to it identifies the gap to the overall leader at the end of each stage (that is, how far behind the overall leader each driver is at the end of each stage). The green dot highlights that the driver was in the overall lead of the rally at the end of that stage.

The SS_N_overall numbers represent rebased overall times. So we see that at the end of SS2, MIK was 9s ahead of LAP overall, and LAP was 13.1 seconds ahead of LOE. The stagePosition stepped line shows how the driver specified by each row fared on each stage. The second vertical bar chart shows the time that driver lost compared to the stage winner; again, a green dot highlights a nominal first position, in this case stage wins. The SS_N numbers are once again rebased times, this time showing how much time the rebased driver gained (green) or lost (red) compared to the driver named on that row.

I still need to add a few channels into the stage report. The ones I have for WRC are still the basic ones without any inline charts, but the tables are a bit more normalised and I’d need to sit down and think through what I need to pull from where to best generate the appropriate rows and from them the charts…

Here’s a reminder of what a rebased single stage chart looks like. The first column is road position, the second the overall gap at the end of the previous stage. The first numeric columns show how far the rebased driver was ahead of (green) or behind (red) each other driver at each split. The Overall* column is the gap at the end of the stage (I should rename this, dropping the Overall* or maybe replacing it with Final); then come overall position and the overall rally time delta (i.e. the column that takes on the role of the Previous column in the next stage). The DN columns are the time gained/lost going between split points, which often highlights any particularly good or bad parts of the stage. For example, in the chart above, rebased on Lappi, the first split was dreadful, but then he was fastest going between splits 1 and 2, and fared well 2-3 and 3-4.
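As a rough sketch of the sort of calculation involved (the column names, driver codes and times here are made up, not taken from the WRC timing data), the rebasing and between-split deltas come down to a couple of pandas operations:

import pandas as pd

# Toy elapsed times (seconds) at each split point, one row per driver
splits = pd.DataFrame({'split_1': [210.3, 212.1, 209.8],
                       'split_2': [405.6, 409.9, 406.2],
                       'split_3': [602.2, 608.4, 604.0]},
                      index=['LAP', 'MIK', 'LOE'])

def rebase(df, driver):
    # Positive values: the rebased driver is ahead of the row's driver at that split
    return df.sub(df.loc[driver], axis='columns')

rebased = rebase(splits, 'LAP')

# Time gained/lost between consecutive splits (the DN style columns);
# the first column is just the gap at the first split point
deltas = rebased.diff(axis=1).fillna(rebased)
print(rebased)
print(deltas)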

Connecting to a Remote Jupyter Notebook Server Running on Digital Ocean from Microsoft VS Code

Despite seeing talk of Jupyter notebook integration in Microsoft Visual Studio (VS) Code, I didn’t do much more than pass it on (via the Tracking Jupyter newsletter) because I thought it was part of a heavyweight Visual Studio IDE.

Not so.

Microsoft Visual Studio Code is an Electron app, reminiscent-ish of the Atom editor (maybe?), that’s available as a quite compact download for Windows, Mac and Linux.

Navigating the VS Code UI is probably the hardest part of connecting it to a Jupyter kernel, remote or local, so let’s see what’s involved.

If you haven’t got VS Code installed, you’ll need to download and install it.

Install the Python extension and reload…

Now let’s go hunting for the connection dialogue…

From the Command Palette, search for Python: Specify Jupyter server URI (there may be an easier way: I’ve spent all of five minutes with this environment!):

You’ll be prompted with another dialogue. Select the Type in the URI to connect to a running Jupyter server option:

and you’ll be prompted for a URI. But what URI?

Let’s launch a Digital Ocean server.

If you don’t have a Digital Ocean account you can create one here and get $100 free credit, which is way more than enough for this demo.

Creating a server is quite straightforward. There’s an example recipe here. You’ll need to create a Docker server as a one-click app, select your region and server size (a cheap 2GB server will be plenty), and then enter the following into the User data area:

#!/bin/bash
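# Run a minimal Jupyter notebook container, publishing the notebook server's
# port 8888 on host port 80 and setting the login token to letmein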

docker run -d --rm -p 80:8888 -e JUPYTER_TOKEN=letmein jupyter/minimal-notebook

You can now create your server (optionally naming it for convenience):

The server will be launched and after a moment or two it will be assigned a public IP address. Copy this address and paste it into a browser location bar; this is just to help us monitor when the Jupyter server is ready (it will probably take a minute or two to download the notebook container onto the server and start it).

When you see the notebook server (no need to log in, unless you want to; the token is letmein, or whatever you set it to in the User data form), you can enter the following into the VS Code server URI form using the IP address of your server:

http://IPADDRESS?token=letmein

In VS Code, raise the Command Palette… again and start to search for Python: Show Python Interactive window.

When you select it, a new interactive Python tab will be opened, connected to the remote server.

You should now be able to interact with your remote IPython kernel running on a Digital Ocean server.
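As a quick sanity check that the Interactive window really is talking to the remote kernel rather than a local one, you can run something like the following in it:

# This should report the remote container's hostname and Linux platform details,
# not those of your local machine
import platform
print(platform.node(), platform.platform())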

See Working with Jupyter Notebooks in Visual Studio Code for some ideas of what to do next… (I should probably work through this too…)

If you want to change the remote Jupyter kernel URL, you either need to quit VS Code, restart it, and go through the adding a connection URI process again, or dip into the preferences (h/t Nick H. in the TM351 forums for that spot):

When you’re done, go back to the Digital Ocean control panel and destroy the droplet you created. If you don’t, you’ll continue to be billed at its hourly rate for each hour, or part thereof, that you keep it around (switched on or not, there’s still a rental charge). If you treat the servers as temporary servers, and destroy them when you’re done, your $100 can go a long way…

Quick Review – Jupyter Multi Outputs Notebook Extension

This post is a quick review of the multi-outputs Jupyter notebook extension.

The extension is one of a series of extensions developed by the Japanese National Institute of Informatics (NII) Literate Computing for Reproducible Infrastructure project.

My feeling is that some of these notebook extensions may also be useful in an educational context for supporting teaching and learning activities within Jupyter notebooks, and I’ll try to post additional reviews of some of the other extensions.

So what does the multi-outputs extension offer?

We can also save the output of a cell into a tab identified by the cell execution number. Once the cell is run, click on the pin item in the left hand margin to save that cell output:

The output is saved into a tab numbered according to the cell execution count number. You can now run the cell again:

and click on the previously saved output tab. You may notice that when you select a previous output tab, a left/right arrow “show differences” icon appears:

Click on that icon to compare the current and previous outputs:

(I find the graphic display a little confusing, but that’s typical of many differs! If you look closely, you may see green (addition) and red (deletion) highlighting.)

The differ display also supports simple search (you need to hit Return to register the search term as such.)

The saved output is actually saved as notebook metadata associated with the cell, which means it will persist when the notebook is closed and restarted at a later date.
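If you’re curious where those pinned outputs actually live, you can poke at the cell metadata with nbformat; I haven’t checked which metadata key the extension uses, so treat this as a way of finding out (example.ipynb stands in for whatever notebook you’ve been pinning outputs in):

import nbformat

# List each code cell's metadata keys - the multi-outputs extension's saved
# outputs should show up under one of them
nb = nbformat.read('example.ipynb', as_version=4)
for i, cell in enumerate(nb.cells):
    if cell.cell_type == 'code':
        print(i, list(cell.metadata.keys()))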

One of the hacky tools I’ve got in tm351_utils (which really needs some example notebooks…) is a simple differencing display. I’m not sure if any of the TM351 notebooks I suggested during the last round of revisions that used the differ made it into the finally released notebooks, but it might be worth comparing that approach, of diffing across the outputs of two cells, with this approach, of diffing between two outputs from the same cell run at different times/with different parameters/state.
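For what it’s worth, the guts of that sort of two-output differ can be sketched in a few lines using difflib (this isn’t the actual tm351_utils code, just a stand-in to show the idea):

from difflib import HtmlDiff
from IPython.display import HTML

def diff_outputs(a, b):
    # Render a side-by-side HTML diff of two chunks of text output
    return HTML(HtmlDiff().make_table(a.splitlines(), b.splitlines(),
                                      fromdesc='first run', todesc='second run'))

diff_outputs('apples\npears\nplums', 'apples\npeaches\nplums')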

Config settings appear to be limited to the maximum number of saved / historical tabs per cell:

So, useful? I’ll try to work up some education related examples. (If you have any ideas for some, or have already identified and/or demonstrated some, please let me know via the comments.)

OpenRefine Hangs On Start…

If you’re testing OpenRefine on things like Digital Ocean, use at least a 3GB server, ideally more.

If you use a 2GB server and default OpenRefine start settings, you may find it stalls on start and just hangs, particularly if you are running it via a Docker container.

(I just wasted four, going on five, hours trying to debug what I thought were other issues, when all the time it was a poxy memory issue.)

So be warned: when testing Java apps in Docker containers / Docker Compose configurations, use max spec, not min spec machines.

I waste hours of my evenings and weekends on this sort of crap so you don’t have to… #ffs

Note to OUseful.Info Blog Email Subscribers…

I’m not sure how the email subscription works, but just so you know: the content of OUseful.info blog posts can go through multiple edits, not just for typos, but also rejoinders and clarifications, in the 10-30 minutes after a post is first published. So it’s worth treating the email post you’re actually sent as a rough first draft, and clicking through to see the actual latest version. My blog, my rules!;-)

Research into Practice?

Picking up on So I was Wrong… Someone Does Look at the Webstats…, and the second part of the title of the Martin Weller blog post that prompted it (Learning design – the long haul of institutional change), the question: how do we shorten the feedback loop so that data can be used by course teams?

IET are a research wing who do their own thing for academic research credit and also contribute to internal innovation and change. (UPDATE: or not.. see @R3beccaF’s comment…) The research I referred to in the previous post drew on institutionally sourced data that looked like it required some sort of project in place in order to have it collected and is not something (I think) I have direct access to.

I get the need for research and folk to do stats and etc etc, but I also believe that folk with an interest can use often quite scruffy data to provide anecdotal evidence about what’s working and what isn’t (the “first draft” of more formal research perhaps?).

So for example, I’ve been interested (casually) in this for years, but never done much more than play around the edges. The only time I’ve managed to get access to reasonable granularity page level tracking data was several years ago, when I persuaded someone to pop a Google tracking code I had dashboard access to onto a set of course pages for a course I was sole author on. More recently, I’ve struggled to find many VLE stats on course pages I am still involved with (maybe I should check again; it’s been a while…).

On the other hand, I have a modicum of data skills, data storytelling / exploratory analysis skills, and end user app developer skills. And I’m interested in rapidly prototyping tools that may help make the data useful.