Custom Charts – RallyDataJunkie Stage Table, Part 1

Over the last few evenings, I’ve been tinkering a bit more with my stage table report for displaying stage split times, using the Dakar Rally 2019 timing data as a motivator; this is a useful data set to try the table out with, not least because the Dakar stages are long, with multiple waypoints (that is, splits) along each stage.

There are still a few columns I want to add to the table, but for now, here’s a summary of how to start reading the table.

Here’s stage 3, rebased on Sebastien Loeb; the table is ordered according to stage rank:

The first part of the chart has the Road Position (that is, stage start order), using a scaled palette so that drivers running out of start order in the ranking are highlighted. The name of the Crew and vehicle Brand follow, along with a small inline step chart that shows the evolution of the Waypoint Rank of each crew (that is, their rank in terms of stage time to that point, at each waypoint). The upper grey bar shows podium ranks 1 to 3, the lower grey line is tenth. If a waypoint returns an NA time, we get a break in the line.

Much of the rest of the chart relies on “rebased” times. So what do I mean by “rebased”?

One of the things the original data gives us is the stage time it took each driver to get to each waypoint.

For example, it took Loeb 18 minutes dead to get to waypoint 1, and Peterhansel 17m 58s. Rebasing this relative to Loeb suggests Loeb lost 2s to Peterhansel on that split. On the other hand, Coronel took 22m 50s, so Loeb gained 290s.

Rebasing times relative to a particular driver finds the time difference (delta) between that driver and all the other drivers at that timing point. The rebased times shown for each driver other than the target driver are thus the deltas between their times and the time recorded for the target driver. The rebased time display was developed to be primarily useful to the driver with reference to whom the rebased times are calculated.
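By way of illustration, here’s a minimal pandas sketch of the rebasing calculation, using hypothetical column and driver labels, with one row per crew and one column of accumulated stage seconds per waypoint:

import pandas as pd

# Accumulated stage time in seconds at each waypoint
# (illustrative values: 18:00 = 1080s, 17:58 = 1078s, 22:50 = 1370s)
times = pd.DataFrame({'wp1': [1080, 1078, 1370],
                      'wp2': [2160, 2150, 2750]},
                     index=['LOEB', 'PETERHANSEL', 'CORONEL'])

def rebase(df, target):
    '''Delta between the target driver's accumulated stage time and each
       other driver's at each waypoint; positive means the target was
       that much slower than the row's driver.'''
    return df.loc[target] - df

print(rebase(times, 'LOEB'))
# PETERHANSEL wp1:    2  (Loeb lost 2s on the run to waypoint 1)
# CORONEL     wp1: -290  (Loeb gained 290s)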

So what’s going on in the other columns? Let’s rebase relative to Loeb.

Here’s what it looks like, again:

The left hand and middle parts of the table/chart show the time taken in making progress between waypoints.

To start with we have the Stage Gap of each driver relative to Loeb. This is intended to be read from the target driver’s perspective, so where a driver made time over the target driver, we colour it red to show our target lost time relative to that driver. If a driver was slower than the target driver (the target made up time), we colour it green.

The Stage Gap is incremental, based on differences between drivers’ total time in stage at each waypoint. In the above case, Loeb was losing out slightly to the first two drivers at the first couple of waypoints, but was ahead of the third placed driver. Then something went bad and a large amount of time was lost.

But how much time? That’s what the inline bar chart cells show: the time gained / dropped going from one waypoint to the next. The D0_ times capture differences in the time taken going from one split/waypoint to the next. The horizontal bar chart x-axis limits are set on a per column basis, so you need to look at the numbers to get a sense of how much time gained/lost they represent. The numbers are time deltas in seconds. I ummed and ahhed about the sign of these. At the moment, a positive time means the target (Loeb) was that much time slower (extra, plus) than the driver indicated by the row.
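Continuing the hypothetical sketch above, the D0_ style split deltas fall out of the rebased accumulated times by differencing along the waypoint columns:

rebased = rebase(times, 'LOEB')

# Time gained/lost within each split: the change in accumulated delta
# between consecutive waypoints; the first split's value is just the
# accumulated delta at the first waypoint
d0 = rebased.diff(axis=1)
d0['wp1'] = rebased['wp1']
print(d0)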

Finally, the Pos column is rank position at the end of the stage.

If we look down the table, around Loeb, we see how Loeb’s times compare to those of the drivers who finished just ahead — and behind — him. For drivers ahead in the ranking, their Stage Gap will end up red at the end of the stage; for drivers behind, it’ll be green (look closely!)

Scanning the D0_ bars within a column, it’s obvious in which parts of the stage Loeb made, and dropped, time.

The right hand side of the figure considers the stage evolution as a whole.

The Gap to Leader column shows how much time each driver was behind the stage leader at each waypoint (that is, at each waypoint, rank the drivers to see who was quickest getting to that point).
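In terms of the earlier sketch, this is just each crew’s accumulated time minus the fastest accumulated time recorded at each waypoint:

# Gap to the split leader at each waypoint: delta to the minimum
# accumulated stage time recorded at that point
gap_to_leader = times - times.min()
print(gap_to_leader)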

Along with the Waypoint Rank and the Road Position, the Gap to Leader is the only aspect of the table that is relative to the driver associated with that row: it helps our target (Loeb) put each other driver’s performance on the stage in the context of the overall stage rankings. The dot marker indicates the gap to leader at the end of the stage.

The 0N_ columns show the time delta on stage between each driver and Loeb, which is to say, the delta between the accumulated stage time for each driver and for Loeb at each waypoint. The final column records the amount of time, in seconds, gained or lost by Loeb relative to each driver in the final stage ranking (penalties excepted).

Looking at the table around Loeb, we see the column entries are empty except for the Gap to Leader evolution.

The original version of this chart, which I was working up around WRC 2018, also includes a couple more columns relating to overall rally position at the start and end of the stage. Adding those is part of my weekend playtime homework!

Converting Pandas Generated HTML Data Tables to PNG Images

Over the weekend, I noticed the Dakar 2019 rally was on, which resulted in my spending Sunday evening putting a scraper together to grab timing data down from the official website (notebook code here).

The data on its own is all a bit “so what?”; it only comes alive when you start playing with it. One of the displays I was still tinkering with at the end of last year’s WRC season was a tabular stage report that tries to capture a chunk of information from stage timing, such as split times, so it made sense to start riffing on that.

The Rally Dakar timing screen presents split times like this:

You can get a view with either the running time in stage at each split / waypoint, or the gap, which is to say, the time difference to the person who was fastest at each split. (I think sometimes Gap times may report the time difference to the person who ranked first on the stage overall, rather than the gap to the person ranked first at a particular split.)

One of the things that interests me (from a data storytelling point of view) is how much time a driver gains, or loses, within a split compared to other drivers. We can use this to spot parts of the stage where a driver has hit a problem, or pushed hard.

The sort of display I’ve been working up looks, at least with the Dakar data, like this so far (there are a few columns missing compared to my WRC tables, but there’s also an extra one: the in-line bimodal sparkline chart).

This particular view displays split times rebased relative to Peterhansel (it’s easy enough to generate views rebased relative to any other specified driver). That is, the table shows how much time Peterhansel gained/lost relative to each other driver at each waypoint. The table is ordered by stage rank. The columns on the left show how much time was gained/lost going from one waypoint to the next. The columns on the right show how the gap relative to each driver evolved over the stage. The inline chart tracks the gap evolution.

The table is a styled pandas table, rendered as HTML. After applying styling, you can get a preview in a notebook using something of the form:

from IPython.display import display, HTML
display( HTML( df.style.render() ) )

I’ve previously posted a recipe for Grabbing Screenshots of folium Produced Choropleth Leaflet Maps from Python Code Using Selenium, so here’s the latest iteration of my code fragment (which builds on the previous example) for taking a chunk of HTML, using selenium to open it in a browser, and grabbing a screenshot of it.

The code is h/t to several Stack Overflow posts.

import os
import time
from selenium import webdriver

#Via https://stackoverflow.com/a/52572919/454773
def setup_screenshot(driver,path):
    ''' Grab screenshot of browser rendered HTML.
        Ensure the browser is sized to display all the HTML content. '''
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    driver.find_element_by_tag_name('body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])

def getTableImage(url, fn='dummy_table', basepath='.', path='.', delay=5, height=420, width=800):
    ''' Render HTML file in browser and grab a screenshot. '''
    browser = webdriver.Chrome()

    browser.get(url)
    #Give the html some time to load
    time.sleep(delay)
    imgpath='{}/{}.png'.format(path,fn)
    imgfn = '{}/{}'.format(basepath, imgpath)
    imgfile = '{}/{}'.format(os.getcwd(),imgfn)

    setup_screenshot(browser,imgfile)
    browser.quit()
    os.remove(imgfile.replace('.png','.html'))
    #print(imgfn)
    return imgpath

def getTablePNG(tablehtml, basepath='.', path='testpng', fnstub='testhtml'):
    ''' Save HTML table as: {basepath}/{path}/{fnstub}.png '''
    # Check for the directory under basepath, matching where we create it
    if not os.path.exists('{}/{}'.format(basepath, path)):
        os.makedirs('{}/{}'.format(basepath, path))
    fn='{cwd}/{basepath}/{path}/{fn}.html'.format(cwd=os.getcwd(), basepath=basepath, path=path,fn=fnstub)
    tmpurl='file://{fn}'.format(fn=fn)
    with open(fn, 'w') as out:
        out.write(tablehtml)
    return getTableImage(tmpurl, fnstub, basepath, path)

#call as: getTablePNG(s)
#where s is a string containing html, eg s = df.style.render()

The png is saved as an image file that can be embedded in other HTML pages, shared via soshul meeja, etc…

Docker Housekeeping – Removing Old Images and Containers

Some handy commands for tidying up old Docker images and containers…

Remove Dangling (Untagged) Images


docker rmi `docker images --filter 'dangling=true' -q --no-trunc`

Remove Images With a Particular Name Pattern

Ish via here.

For example, removing repo2docker default named images, whose names start with r2d:


docker images | awk '{ print $1,$2, $3 }' | grep r2d | awk '{print $3 }' | xargs -I {} docker rmi {}

For added aggression, use rmi -f {}.

Remove Containers With a Particular Name Pattern

Via here.


docker ps -a | awk '{ print $1,$2 }' | grep r2d | awk '{print $1 }' | xargs -I {} docker rm {}

Remove Exited Containers


docker ps -a | grep Exit | cut -d ' ' -f 1 | xargs docker rm
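If you’d rather script this sort of housekeeping from Python, the docker SDK (docker-py) offers rough equivalents to the one-liners above; a sketch (untested, and just as aggressive, so use with care):

import docker

client = docker.from_env()

# Remove dangling (untagged) images, cf. the docker rmi one-liner above
client.images.prune(filters={'dangling': True})

# Remove stopped / exited containers
client.containers.prune()

# Remove images whose repo name starts with r2d
for image in client.images.list():
    if any(tag.startswith('r2d') for tag in image.tags):
        client.images.remove(image.id, force=True)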

PS lots of handy tips here: https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes

AutoStarting A Headless OpenRefine Server in MyBinder Using Repo2Docker and a start Config File

When I first started using MyBinder in 2015, it came with the option of autostarting selectable services — PostgreSQL and a Spark server — within the container along with the Jupyter notebook server (early review). I’m not sure when it disappeared (the Github repo commit history should show it, if anyone’s feeling forensically investigative and wants to let me know via a comment) but for some time I’ve been wondering how to start my own services, such as a database server, or OpenRefine server, in a Binderised container.

I guess I shoulda read the docs again…

…although the pace of change with: a) documented features; b) undocumented features, means it can be hard to keep up with what’s possible in the Jupyterverse (I’m nowhere close to cracking that with TrackingJupyter in its current formulation…).

So via this issue, some handy leads…

repo2docker start

Binderised containers are built using repo2docker. A new-to-me config file for repo2docker is the start file, “a script that can contain simple commands to be run at runtime. If you want this to be a shell script, make sure the first line is #!/bin/bash. The last line must be exec "$@" or equivalent”.

So for example, if we want to autorun a headless OpenRefine instance in MyBinder that we can access via a notebook using an OpenRefine Python client (see an example notebook here), we can just add the following start file to the repo:

#!/bin/bash

#Start OpenRefine
OPENREFINE_DIR="$HOME/openrefine"
mkdir -p $OPENREFINE_DIR
nohup openrefine-2.8/refine -p 3333 -d $OPENREFINE_DIR > /dev/null 2>&1 &

#Do the normal Binder start thing here...
exec "$@" 
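As a quick liveness check from a notebook in the running container, you can poke OpenRefine’s HTTP API directly; a minimal sketch using requests, with the port matching the start script above:

import requests

# OpenRefine exposes a simple JSON-over-HTTP API; get-version makes a
# handy "is the server up yet?" check
r = requests.get('http://localhost:3333/command/core/get-version')
print(r.json())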

Demo here: Binder

PS thanks as ever to the ever helpful Jupyter devs for putting up with my “issues”…

PPS I got stuck trying to generalise the auto-starting arbitrary services thing any further. You can see the state of my ignorance here. As is often the case, “permissions” makes me realise I don’t really understand how any of this stuff actually works. That’s why I tend to work: a) locally, b) as root…!

More Than Ten Free Hosted Jupyter Notebook Environments You Can Try Right Now

Looking for a free, online Jupyter notebook server? Here are some you can try right now…

Generic Environments

Research / Publishing Environments

DataSci / ML Environments

Training / Education Environments

There are also a couple of temporary notebook servers provided by the Jupyter core development team:

If you’re looking for some consultancy support / help in getting a notebook server environment up and running, the folk at Hub Hero may be able to help…

Afterword

My old, old post on Seven Ways of Running IPython / Jupyter Notebooks is getting rather long in the tooth — I really should cast it as a WordPress page, rather than a post, and keep it up to date — and it doesn’t seem to get much traffic anymore. So this post is set to serve as a quick round up of several services I’m aware of that offer a free plan for hosted Jupyter notebooks, saddled with a crappy post title that may or may not generate traffic…

If you’re aware of any more, please let me know… This post won’t be actively maintained, but I will try to do the odd bit of gardening around it as and when I come across additional services and/or become aware of services shutting down, closing their free plans, etc etc… If you come across more services, please add them via the comments (ideally with a brief description of what they offer) and I’ll plant them in the page each time I pass by… If you want to comment on any of the services listed, eg on the services they offer, the extent of the free plan, or how quickly pricing kicks in, please feel free to do that too…

I’ll also try to annotate entries with brief descriptions of what’s available as and when I get a chance…

Initial Notes… Jupyter Notebook To OU-XML

Scrabbly notes on generating OU-XML from Jupyter notebooks. There may well be spin-offs, such as a route for producing OU-XML from markdown. (The OU has been using OU-XML for years, so surely there’s a Markdown to OU-XML engine somewhere that someone has written?)

With several courses looking at the possibility of using notebooks, having a workflow that supports authoring as well as interactive content delivery using notebooks or a Jupyter markdown / Rmd equivalent could make for a much lighter workflow than the current one.

I think the internal OpenCreate project has been shelved for now (anyone fancy FOI-ing how much was spent on it?;-) so I’m not sure if there are any bits of code or novel process/workflow that are salvageable or reusable from that, particularly insofar as they relate to facilitating more agile workflows?

One of the issues we’ve found on TM351 is that notebook maintenance using Github is possible, but can be a bit of a pain when trying to check in or compare run notebooks. One thing I intend to explore is whether we could just as easily create a workflow around linked Jupytext documents that would allow us to author in notebooks and check in dual markdown documents, as sketched below.
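As a sketch of what that might look like, assuming a recent jupytext and its Python API (the filenames here are illustrative):

import jupytext

# Read the notebook and write out a markdown twin; the .md file diffs
# cleanly under version control, unlike run .ipynb JSON
nb = jupytext.read('notebook.ipynb')
jupytext.write(nb, 'notebook.md')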

Something else to note is the extent to which the notebook .ipynb JSON format might be useful as a feed for accessibility tools. The cell structure and markdown code should presumably provide an easier to parse source for such tools than the mass of styled HTML that the notebook interface provides? I keep thinking that the Calysto nbplayer, which plays through a notebook a step at a time on the command line, might be useful as a route towards an accessible notebook player, perhaps with added support from pindent (code; see also these other random accessibility thoughts)? (By the by, other accessibility features we could explore include automatically generating image descriptions or sonifications from data driven charts.)

In terms of what needs doing, Jupyter nbconvert uses mistune (docs) to convert markdown to HTML, which looks like it should be easy to subclass, or fork if needs be, to generate a markdown to OU-XML converter. There are also some contributed packages which might provide some handy references…
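For example, here’s a minimal sketch of subclassing the mistune 0.x renderer that nbconvert used at the time; the OU-XML element names are illustrative stand-ins rather than the real schema:

import mistune  # assumes the mistune 0.8.x API

class OUXMLRenderer(mistune.Renderer):
    # Element names here are placeholders, not actual OU-XML tags
    def paragraph(self, text):
        return '<Paragraph>{}</Paragraph>\n'.format(text)

    def header(self, text, level, raw=None):
        return '<Heading level="{}">{}</Heading>\n'.format(level, text)

markdown = mistune.Markdown(renderer=OUXMLRenderer())
print(markdown('# A title\n\nSome *emphasised* text.'))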

The Jupyter nbconvert package also demonstrates how to extend mistune to render maths content.  The HTML templates look like they should provide a good basis for a full OU-XML template.

Jupyter is Not Just Notebooks

Last week, I filled an hour in a department seminar showing ways in which we could use Jupyter notebooks to support the creation and use of interactive educational materials.

I’ve no idea if it converted anyone to the cause.

I could have done any number of other talks — about the architecture of the Jupyter ecosystem more widely (at least, insofar as I understand it), or the way in which Jupyter makes sense for reproducible research and how it fits into a containerised / virtualised way of working.

Because Jupyter is not just about notebooks.

It’s also about string and glue.

Here’s something I suddenly grokked the other day whilst chatting to somebody about different ways of accessing applications that have a graphical UI (on a desktop; on a desktop in a VM; via X11 (“what’s that?” they asked… sigh…); via a browser if it has an HTML UI; via novnc in a browser window if it doesn’t (albeit w/ borked audio support); note to self – try out this novnc Jupyter extension): if you wrap an application that has a command line interface using metakernel, you can access it in a notebook, or JupyterLab.

Obvious, right? But that means I can also access it via a web page using something like ThebeLab (or Juniper, or nbinteract), run via a container launched using Binderhub.
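To make that a bit more concrete, here’s a rough, untested sketch of what a metakernel wrapper around a command line REPL can look like, assuming metakernel’s ProcessMetaKernel / REPLWrapper machinery (the wrapped tool, prompt and metadata are all illustrative):

from metakernel import ProcessMetaKernel
from metakernel.replwrap import REPLWrapper

class CLIWrapperKernel(ProcessMetaKernel):
    # Metadata Jupyter uses to identify the kernel (illustrative values)
    implementation = 'cli_wrapper_kernel'
    implementation_version = '0.1'
    language = 'gnuplot'
    banner = 'A sketch of a kernel wrapping a command line REPL'
    language_info = {'name': 'gnuplot',
                     'mimetype': 'text/plain',
                     'file_extension': '.gp'}

    def makeWrapper(self):
        # Hand metakernel a wrapper around the CLI REPL; metakernel then
        # passes cell contents in and scrapes the output back
        return REPLWrapper('gnuplot', 'gnuplot> ', None)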

This is all tied up with a couple of the Big Ideas that underlie Jupyter: firstly, that it supports the read/write web; secondly, that it supports remote code execution (and as such enables the read/write/execute web).

So for example, one of the many metakernel based kernels is the gnuplot_kernel that lets you run Gnuplot commands from a notebook code cell and display the generated figure in a notebook. Here’s a forked version with the repo tweaked so it runs on MyBinder.

Using a gnuplot_kernel enabled Binder repo, we can now run Gnuplot commands via a web browser using the ThebeLab Javascript package, for example, and display the result in the same web page. The container on the back end is fired up in response to the first command issued from the page, which may take up to a minute or two, and will be used for future commands issued from the page in the same session.

Here’s what it looks like:

(The Gnuplot code is ripped from an example in the Gnuplot docs / gallery.)

The code seems to be repeated in the output, but I guess a tweak to the ThebeLab settings, or code, may fix that. Or maybe the kernel needs a tweak. But the proof of concept is there…

Here’s the code for the web page (image file, sorry… WordPress.com editor’n’sourcecode support sucks and I get fed up faffing around with tag brackets each time I re-edit the page):

That source code image does make a second point, though… Look closely, and compare the URLs in the two images above: I can edit an HTML file via the Jupyter notebook text file editor, and also render the page as a served HTML file.

So that’s a couple more things for my colleagues to say “ah, but it won’t work for my course because…”

Bring it on…

PS the code as a gist:

PPS Interested in keeping up to date with Jupyter news? Sign up to the Tracking Jupyter weekly newsletter.