I So Want to Try a Docker/Kitematic ContainerBook…

So it seems that Chrome OS joins forces with VMware to accelerate the adoption of Chromebooks in the enterprise.

From a quick skim, it seems as if VMware’s Workspace ONE product, which is at the heart of the announcement, provides a secure online environment for launching managed, personally contextualised, virtualised services.

Which is just pushing more and more stuff to the web, and more and more requiring that always-on network connection.

What I keep thinking I’d like to have is a containerbook, rather than netbook. Think: Docker + Kitematic + Docker Compose + a browser.

Kitematic UI

Services/apps in the container(s) then either run as headless/machine accessed services, or expose an HTML UI accessed via a browser.

Which reminds me: Kitematic still doesn’t support Docker Compose, does it? (Is Panamax still a thing in this regard?)

PS another take would be a browser that had VirtualBox built in that could be used to run containers, or could otherwise access desktop virtualisation. This could all get a bit messy though… Cf. also things like Windows 10S won’t run Chrome, or the Chrome O/S requirement to use the browser that is the O/S, rather than installing your own browser – such as Microsoft Edge, for example.

Visualising WRC Rally Stages With Relive?

A few days ago, via Techcrunch, I came across the Relive application that visualises GPS traces of bike rides using 3D Google Earth style animations using a range of map data sources.

Data is uploaded using GPX, TCX, or FIT formatted data – all of which are new to me. Standard KML uploads don’t work – time stamps are required for each waypoint.
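
As a rough sketch of what a timestamped upload file might look like, the following Python builds a minimal GPX 1.1 track in which every track point carries a time element. The function name, the creator string and the sample coordinates are all made up for illustration, and I haven’t checked exactly which GPX fields Relive itself requires.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

GPX_NS = "http://www.topografix.com/GPX/1/1"


def make_gpx(points, creator="example-script"):
    """Build a GPX 1.1 document from (lat, lon, datetime) tuples.

    Hypothetical helper for illustration; not part of any Relive API.
    """
    ET.register_namespace("", GPX_NS)
    gpx = ET.Element("{%s}gpx" % GPX_NS, {"version": "1.1", "creator": creator})
    trk = ET.SubElement(gpx, "{%s}trk" % GPX_NS)
    seg = ET.SubElement(trk, "{%s}trkseg" % GPX_NS)
    for lat, lon, when in points:
        pt = ET.SubElement(seg, "{%s}trkpt" % GPX_NS,
                           {"lat": "%.6f" % lat, "lon": "%.6f" % lon})
        # The per-point timestamp is the bit a plain KML path lacks
        t = ET.SubElement(pt, "{%s}time" % GPX_NS)
        t.text = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return ET.tostring(gpx, encoding="unicode")


# Made-up sample: three points, ten seconds apart
start = datetime(2017, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
sample = [(50.69 + i * 0.001, -1.30, start + timedelta(seconds=10 * i))
          for i in range(3)]
gpx_text = make_gpx(sample)
```

For a rally stage, the points would presumably come from a timed GPS trace of the stage rather than a hand-built list like this one.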

Along the route, photographic waypoints can be added to illustrate the journey, which got me thinking: this could be a really neat addition to the Rally-maps.com website, annotating stage maps after a race with:

  • photographs from various locations on the stage;
  • images at each split point showing the leaderboard and time splits from each stage;
  • pace info, showing the relative pace across each stage, perhaps captured from a reconnaissance vehicle or zero car.

Alternatively, it might be something that the WRC – or Red Bull TV, who are providing online and TV coverage of this year’s rallies – could publish?

And if they want to borrow some of my WRC chart styles for waypoint images, I’m sure something could be arranged:-)

From Points to (Messy) Lines

A week or so ago, I came up with a new chart type – race concordance charts – for looking at a motor circuit race from the on-track perspective of a particular driver. Here are a couple of examples from the 2017 F1 Grand Prix:

The gap is the time to the car on track ahead (negative gap, to the left) or behind (to the right). The colour indicates whether the car is on the same lap (light blue),  on the lap behind (orange to red), or a lap ahead (dark blue).

In the dots, we can “see” lines relating to the relative progress of particular cars. But what if we actually plot the progress of each of those other cars as a line? The colours represent different cars.

 

bot_conc_line, HUL_conc_line

Here’s another view of the track from Hulkenberg’s perspective with a wider window, which by comparison with the previous chart suggests I need to better handle cars that do not drop off the track but do fall out of the display window… (At the moment, I only grab data for cars in the specified concordance window):

HUL-conc_line2

Note that we need to do a little bit of tidying up of the data so that we don’t connect lines for cars that flow off the left hand edge, for example, and then return several laps later from the right hand edge:

library(sqldf)
library(plyr)
library(ggplot2)

#Get the data for the cars, as before
inscope=sqldf(paste0('SELECT l1.code as code,l1.acctime-l2.acctime as acctimedelta,
                       l2.lap-l1.lap as lapdelta, l2.lap as focuslap
                       FROM lapTimes as l1 join lapTimes as l2
                       WHERE l1.acctime < (l2.acctime + ', abs(limits[2]), ') AND l1.acctime > (l2.acctime - ', abs(limits[1]),')
                       AND l2.code="',code,'";'))

#Make sure the rows for each driver are in focuslap order before looking for gaps
inscope=arrange(inscope,code,focuslap)
#If consecutive rows for the same driver are more than one focuslap apart, break the line
inscope=ddply(inscope,.(code),transform,g=cumsum(c(0,diff(focuslap)>1)))
#Continuous line segments have the same driver code and "group" number

g = ggplot(inscope)

#The interaction splits up the groups based on code and the contiguous focuslap group number
#We also need to ensure we plot acctimedelta relative to increasing focuslap
g=g+geom_line(aes(x=focuslap, y=acctimedelta, col=code,group=interaction(code, g)))
#...which means we then need to flip the axes
g=g+coord_flip()
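
The line-breaking trick in the cumsum(c(0,diff(focuslap)>1)) step can be sketched in plain Python as follows. This is just an illustration of the idea rather than a port of the chart code: the function name is made up, and it assumes the laps for a single driver are already sorted in increasing order.

```python
def segment_ids(focuslaps):
    """Number runs of consecutive laps for one driver.

    A jump of more than one lap between neighbouring rows starts a new
    segment, mirroring cumsum(c(0, diff(focuslap) > 1)) in the R code.
    Assumes focuslaps is sorted in increasing order.
    """
    ids = []
    current = 0
    for i, lap in enumerate(focuslaps):
        if i > 0 and lap - focuslaps[i - 1] > 1:
            current += 1
        ids.append(current)
    return ids


# A car in the window on laps 3-5, gone for a while, back on 12-13, then 20
groups = segment_ids([3, 4, 5, 12, 13, 20])
```

Each distinct (driver, segment number) pair then gets its own line, which is what the interaction(code, g) grouping does in the ggplot call.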

There may still be some artefacts in the line plotting based on lapping… I can’t quite think this through at the moment:-(

So here’s my reading:

  • near horizontal lines that go slightly up and to the right, and where a lot of places in the window are lost in a single lap, are the result of a pit stop by the car that lost the places; if we have access to pit information, we could perhaps dot these lines?
  • the “waist” in the chart for HUL shows cars coming together for a safety car, and then HUL losing pace to some cars whilst making advances on others;
  • lines with a constant gradient show a  consistent gain or loss of time, per lap, over several laps;
  • a near vertical line shows a car keeping pace, and neither making nor losing time compared to the focus car.

Local Election Fragments

Reusing stuff from before, a notebook with code to scrape Local Election Notice of Poll PDFs. Includes scripts for geocoding addresses, trying to find whether candidates live in ward or out of ward, searches for possible directorships of locally registered companies amongst the candidates:

[https://gist.github.com/psychemedia/f611f36dbdae5e744a434216690d6c47]

Other things that come to mind, with a bit more data:

  • is a candidate standing for re-election?
  • has a candidate stood previously (and for which party), and/or previously been a councillor?
  • how might committee membership change if a councillor loses their seat?
  • which seats are vulnerable based on previous voting numbers?
  • what are the demographics of council wards?

Figure Aesthetics or Overlays?

Tinkering with a new chart type over the weekend, I spotted something rather odd in my F1 track history charts – what look to be outliers in the form of cars that hadn’t been lapped on that lap appearing behind the lap leader of the next lap, on track.

If you count the number of cars on that leadlap, it’s also greater than the number of cars in the race on that lap.

How could that be? Cars being unlapped, perhaps, and so “appearing twice” on a particular leadlap – that is, recording two laptimes between consecutive passes of the start/finish line by the race leader?

My fix for this was to add an “unlap” attribute that detects whether a car has completed more than one lap within a given leadlap:

#Count the laps completed by each car within each leadlap
lapTimes=ddply(lapTimes,.(leadlap,code),transform,unlap= seq_along(leadlap))

This groups by leadlap and car, and counts 1 for each occurrence. So if the unlap count is greater than 1, a car has completed more than 1 lap in a given leadlap.
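
For anyone more at home in Python, here’s a minimal sketch of the same running count using plain dicts. The row layout (dicts with leadlap and code keys) and the function name are assumptions made for the example, not the structure of my actual lapTimes dataframe.

```python
from collections import defaultdict


def add_unlap(rows):
    """Add a running 'unlap' count per (leadlap, code) pair.

    rows is a list of dicts with at least 'leadlap' and 'code' keys,
    in the order the laps were recorded. The first lap a car completes
    within a leadlap gets unlap=1, a second gets unlap=2, and so on.
    """
    seen = defaultdict(int)
    out = []
    for row in rows:
        key = (row["leadlap"], row["code"])
        seen[key] += 1
        out.append(dict(row, unlap=seen[key]))
    return out


# Made-up data: RAI records two laptimes during the leader's lap 11
laps = [
    {"leadlap": 10, "code": "RAI"},
    {"leadlap": 11, "code": "RAI"},
    {"leadlap": 11, "code": "VET"},
    {"leadlap": 11, "code": "RAI"},
]
counted = add_unlap(laps)
unlapping = [r for r in counted if r["unlap"] > 1]
```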

My first thought was to add this as an overprint on the original chart:

#Overprint unlaps
g = g + geom_point(data = lapTimes[lapTimes['unlap']>1,],
                   aes(x = trackdiff, y = leadlap, col=(leadlap-lap)), pch = 0)

This renders as follows:

Whilst it works, as an approach it is inelegant, and had me up in the night pondering the use of overlays rather than aesthetics.

Because we can also view the fact that the car was on its second pass of the start/finish line for a given lead lap as a key property of the car and depict that directly via an aesthetic mapping of that property onto the symbol type:

  g = g + geom_point(aes( x = trackdiff, y = leadlap,
                          col = (lap == leadlap),
                          pch= (unlap==1) ))+scale_shape_identity()

This renders just a single mark on the chart, depicting the diff to the leader *as well as* the unlapping characteristic, rather than the two marks used previously: one for the diff, and a second, overprinted, mark to depict the unlapping nature of that mark.

So now I’m wondering – when would it make sense to use multiple marks by overprinting?

Here’s one example where I think it does make sense: where I pass an argument into the chart plotter to highlight a particular driver by infilling a marker with a symbol to identify that driver.

#Drivers of interest passed in using construction: code=list(c("STR","+"),c("RAI","*"))
#is.na() on a list returns one value per element, so collapse it with all() before testing
if (!all(is.na(code))){
  for (t in code) {
    g = g + geom_point(data = lapTimes[lapTimes['code'] == t[1], ],
                       aes(x = trackdiff, y = leadlap),
                       pch = t[2])
  }
}

In this case, the + symbol is not a property of the car, it is an additional information attribute that I want to add to that car, but not the other cars. That is, it is a property of my interest, not a property of the car itself.

Race Track Concordance Charts

Since getting started with generating templated R reports a few weeks ago, I’ve started spending the odd few minutes every race weekend looking at ways of automating the generation of F1 qualifying and race reports.

In yesterday’s race, some of the commentary focussed on whether MAS had given BOT an “assist” in blocking VET, which got me thinking about better ways of visualising whether drivers are stuck in traffic or not.

The track position chart makes a start at this, but it can be hard to focus on a particular driver (identified using a particular character to infill the circle marker for that driver). The race leader’s track position ahead is identified from the lap offset race leader marker at the right hand side of the chart.

One way to help keep track of things from the perspective of a particular driver, rather than the race leader, is to rebase the origin of the x-axis relative to that driver.

In my track chart code, I use a dataframe that has a trackdiff column that gives a time offset on track to race leader for each lead lap.

track_encoder=function(lapTimes){
  #Find the accumulated race time at the start of each leader's lap
  lapTimes = ddply(lapTimes, .(leadlap), transform, lstart = min(acctime))

  #Find the on-track gap to leader
  lapTimes['trackdiff'] = lapTimes['acctime'] - lapTimes['lstart']
  lapTimes
}

Rebasing for a particular driver simply means resetting the origin with respect to that time, using the trackdiff time for one driver as an offset for the others, to create a new trackdiff2 for use on the x-axis.

#I'm sure there must be a more idiomatic way of doing this?
rebase=lapTimes[lapTimes['code']==code,c('leadlap','trackdiff')]
rebase=rename(rebase,c('trackdiff'='trackrebase'))
lapTimes=merge(lapTimes,rebase,by='leadlap')
lapTimes['trackdiff2']=lapTimes['trackdiff']-lapTimes['trackrebase']
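
The two steps (find the gap to the leader, then rebase on a chosen driver) can be sketched in plain Python like this. The function names and the dict-based rows are made up for the example. One difference from the R merge is noted in the comments: if the focus driver has two rows in a leadlap, the dict lookup keeps only the last one, whereas merge() would duplicate rows.

```python
def track_encode(rows):
    """Add 'trackdiff': time behind the first car to start each leadlap."""
    lstart = {}
    for r in rows:
        lap = r["leadlap"]
        lstart[lap] = min(lstart.get(lap, r["acctime"]), r["acctime"])
    return [dict(r, trackdiff=r["acctime"] - lstart[r["leadlap"]]) for r in rows]


def rebase(rows, code):
    """Add 'trackdiff2': trackdiff measured from the given driver instead.

    Leadlaps with no row for the focus driver are dropped, as with merge().
    Unlike merge(), a driver with two rows in one leadlap contributes only
    the last of them as the offset.
    """
    offsets = {r["leadlap"]: r["trackdiff"] for r in rows if r["code"] == code}
    return [dict(r, trackdiff2=r["trackdiff"] - offsets[r["leadlap"]])
            for r in rows if r["leadlap"] in offsets]


# Made-up accumulated times (seconds) on the leader's first lap
laps = [
    {"leadlap": 1, "code": "BOT", "acctime": 100},
    {"leadlap": 1, "code": "MAS", "acctime": 103},
    {"leadlap": 1, "code": "VET", "acctime": 105},
]
rebased = rebase(track_encode(laps), "MAS")
```

Cars ahead of the focus driver on track end up with a negative trackdiff2, and cars behind with a positive one.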

Here’s how it looks for MAS:

But not so useful for BOT, who led much of the race:

This got me thinking about text concordances. In the NLTK text analysis package, the text concordance function allows you to display a search term centred in the context in which it is found:

concordance

The concordance view finds the location of each token and then displays the search term surrounded by tokens in neighbouring locations, within a particular window size.
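
To make the idea concrete, here’s a toy concordance in plain Python. It is not how NLTK implements it, just a minimal sketch under my own assumptions: the function name is made up, the text is already split into tokens, and the window is counted in tokens either side of the match.

```python
def concordance(tokens, term, window=3):
    """Find each occurrence of term and return it with its neighbours.

    Returns a list of (left_context, match, right_context) tuples, where
    each context holds up to `window` tokens. Matching ignores case.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits


text = "the car ahead pits and the car behind closes the gap".split()
hits = concordance(text, "car", window=2)

# Print with the search term lined up in a column, concordance style
for left, tok, right in hits:
    print("%20s  %s  %s" % (" ".join(left), tok, " ".join(right)))
```

Swap tokens for cars, position in the text for accumulated race time, and the search term for the focus driver, and you have the race version.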

I spent a chunk of time wondering how to do this sensibly in R, struggling to identify what it was I actually wanted to do: for a particular driver, find the neighbouring cars in terms of accumulated laptime on each lap. After failing to see the light for more than an hour or so, I thought of it in terms of an SQL query, and the answer fell straight out – for the specified driver on a particular leadlap, get their accumulated laptime and the rows with accumulated laptimes in a window around it.

inscope=sqldf(paste0('SELECT l1.code as code,l1.acctime-l2.acctime as acctimedelta,
l2.lap-l1.lap as lapdelta, l2.lap as focuslap
FROM lapTimes as l1 join lapTimes as l2
WHERE l1.acctime < (l2.acctime + ', abs(limits[2]), ') AND l1.acctime > (l2.acctime - ', abs(limits[1]),')
AND l2.code="',code,'";'))
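
The same windowed self-join can be tried out in Python using the standard library sqlite3 module. This is a sketch rather than a port of my R code: the function name, the (code, lap, acctime) tuple layout and the sample times are made up, and I’ve used query parameters rather than pasting values into the SQL string.

```python
import sqlite3


def cars_in_window(rows, code, before, after):
    """Return cars within a time window around a focus driver on each lap.

    rows: iterable of (code, lap, acctime) tuples.
    before/after: window size in seconds behind/ahead of the focus
    driver's accumulated time (signs are ignored, as in the R version).
    Result rows are (code, acctimedelta, lapdelta, focuslap).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lapTimes (code TEXT, lap INTEGER, acctime REAL)")
    con.executemany("INSERT INTO lapTimes VALUES (?, ?, ?)", rows)
    query = """
        SELECT l1.code AS code, l1.acctime - l2.acctime AS acctimedelta,
               l2.lap - l1.lap AS lapdelta, l2.lap AS focuslap
        FROM lapTimes AS l1 JOIN lapTimes AS l2
        WHERE l1.acctime < l2.acctime + ?
          AND l1.acctime > l2.acctime - ?
          AND l2.code = ?
        ORDER BY l2.lap, acctimedelta
    """
    result = con.execute(query, (abs(after), abs(before), code)).fetchall()
    con.close()
    return result


# Made-up accumulated times for three cars over two laps
sample = [
    ("BOT", 1, 90.0), ("VET", 1, 91.5), ("MAS", 1, 95.0),
    ("BOT", 2, 180.0), ("VET", 2, 181.0), ("MAS", 2, 200.0),
]
nearby = cars_in_window(sample, "VET", before=2, after=2)
```

With a two second window either side of VET, BOT shows up on both laps and MAS on neither.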

Plotting against the accumulated laptime delta on the x-axis gives a chart like this:

If we add in horizontal rules to show laps where the specified driver pitted and vertical bars to show pit windows, we get a much richer picture of the race from the point of view of the driver.

Here’s how it looks from the perspective of BOT, who led most of the race:

Different symbols inside the markers can be used to track different drivers (in the above charts, BOT and VET are highlighted). The colours identify whether cars are on the same lap as the specified driver (light blue), on laps ahead (shades of blue, then green, as per “blue flag”), or on increasing laps behind (orange to red, i.e. backmarkers from the perspective of the specified driver). If a marker is light blue, that car is on the same lap and you’re racing…

All in all, I’m pretty chuffed (for now!) with how that chart came together.

And a new recipe to add to the Wrangling F1 Data With R book, I guess..

PS in response to [misunderstanding…] a comment from @sidepodcast, we also have control over the concordance window size, and the plotsize:

concordresize

Generating hi-res versions in other file formats is also possible.

Just got to wrap it all up in a templated report now…

PPS On the track position charts, I just noticed that where cars are lapped, they fall off the radar… so I’ve added them in behind the leader to keep the car count correct for each leadlap…

trackposrebaselapped

 

PS See also: A New Chart Type – Race Concordance Charts, which also includes examples of “line chart” renderings of the concordance charts so you can explicitly see the progress of each individually highlighted driver on track.

Creating a Jupyter Bundler Extension to Download Zipped Notebook and HTML Files

In the first version of the TM351 VM, we had a simple toolbar extension that would download a zipped ipynb file, along with an HTML version of the notebook, so it could be uploaded and previewed in the OU Open Design Studio. (Yes, I know, it would have been much better to have an nbviewer handler as an ODS plugin, but then we don’t do that sort of tech innovation, apparently…)

Looking at updating the extension today for the latest version of Jupyter notebooks, I noticed the availability of custom bundler extensions that allow you to add additional tools to support notebook downloads and deployment (I’m not sure what deployment relates to?). Adding a new download option allows it to be added to the notebook File -> Download as menu:

The extension is created as a python package:

# odszip/setup.py
from setuptools import setup

setup(name='odszip',
      version='0.0.1',
      description='Save Jupyter notebook and HTML in zip file with .nbk suffix',
      author='',
      author_email='',
      license='MIT',
      packages=['odszip'],
      zip_safe=False)

# odszip/odszip/download.py

# Copyright (c) The Open University, 2017
# Copyright (c) Jupyter Development Team.

# Distributed under the terms of the Modified BSD License.
# Based on: https://github.com/jupyter-incubator/dashboards_bundlers/

import os
import shutil
import subprocess
import tempfile

#THIS IS A REQUIRED FUNCTION
def _jupyter_bundlerextension_paths():
    '''API for notebook bundler installation on notebook 5.0+'''
    return [{
                'name': 'odszip_download',
                'label': 'ODSzip (.nbk)',
                'module_name': 'odszip.download',
                'group': 'download'
            }]


def make_download_bundle(abs_nb_path, staging_dir, tools):
    '''
    Copies the notebook into a staging directory, adds an HTML rendering
    of it, and returns the path to a zip file bundling the two.
    :param abs_nb_path: The path to the notebook
    :param staging_dir: Temporary work directory, created and removed by the caller
    :param tools: Bundler tools object passed in by the caller (unused here)
    '''

    # Clean up bundle dir if it exists
    shutil.rmtree(staging_dir, True)
    os.makedirs(staging_dir)

    # Get name of notebook from filename
    notebook_basename = os.path.basename(abs_nb_path)

    # Add the notebook
    shutil.copy2(abs_nb_path, os.path.join(staging_dir, notebook_basename))

    # Include HTML version of file
    # Passing the arguments as a list avoids shell quoting problems with odd filenames
    subprocess.run(['jupyter', 'nbconvert', '--to', 'html', abs_nb_path,
                    '--output-dir', staging_dir])

    # The archive is written alongside the staging directory as <staging_dir>.zip
    zip_file = shutil.make_archive(staging_dir, format='zip', root_dir=staging_dir, base_dir='.')
    return zip_file

#THIS IS A REQUIRED FUNCTION       
def bundle(handler, model):
    '''
    Downloads a notebook as an HTML file and zips it with the notebook
    '''

    # Based on https://github.com/jupyter-incubator/dashboards_bundlers

    abs_nb_path = os.path.join(
        handler.settings['contents_manager'].root_dir,
        model['path']
    )

    notebook_basename = os.path.basename(abs_nb_path)
    notebook_name = os.path.splitext(notebook_basename)[0]

    tmp_dir = tempfile.mkdtemp()

    output_dir = os.path.join(tmp_dir, notebook_name)
    bundle_path = make_download_bundle(abs_nb_path, output_dir, handler.tools)

    handler.set_header('Content-Disposition', 'attachment; filename="%s"' % (notebook_name + '.nbk'))
    handler.set_header('Content-Type', 'application/zip')

    with open(bundle_path, 'rb') as bundle_file:
        handler.write(bundle_file.read())

    handler.finish()

    # We read and send synchronously, so we can clean up safely after finish
    shutil.rmtree(tmp_dir, True)
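
As a quick sanity check on how the zip gets laid out, the following standalone sketch stages a couple of placeholder files and archives them in the same way as the bundler code. The file names and contents are dummies for illustration, and no notebook server or nbconvert is involved.

```python
import os
import shutil
import tempfile
import zipfile

# Stage two placeholder files where the notebook and its HTML would go
tmp_dir = tempfile.mkdtemp()
staging_dir = os.path.join(tmp_dir, "demo")
os.makedirs(staging_dir)
for name in ("demo.ipynb", "demo.html"):
    with open(os.path.join(staging_dir, name), "w") as f:
        f.write("placeholder")

# Same call as in the bundler: the zip is written next to the staging
# directory (as demo.zip), so it does not end up inside itself
zip_path = shutil.make_archive(staging_dir, format="zip",
                               root_dir=staging_dir, base_dir=".")

# List the files in the archive, ignoring any directory entries
with zipfile.ZipFile(zip_path) as zf:
    names = sorted(os.path.basename(n) for n in zf.namelist()
                   if not n.endswith("/"))

# Clean up the temporary directory, including the zip file
shutil.rmtree(tmp_dir, True)
```

The bundle handler then serves that zip file under a .nbk filename via the Content-Disposition header.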

We can then create the python package and install the extension, remembering to restart the Jupyter server for the extension to take effect.

#Install the ODSzip extension package
pip3 install --upgrade --force-reinstall ./odszip

#Enable the ODSzip extension
jupyter bundlerextension enable --py odszip.download  --sys-prefix