Fragments – Looking for Ways of Illustrating Rally Stage Stories

A couple of weekends ago, I spent a chunk of time hacking various bits and pieces around the live data feeds I could find supporting the WRC Rally of Sardinia. There’s still a bit to do to tidy up what I learned to make it properly useful, but I learned a lot and now just need to find the time to write some of it up.

One of the things I doodled was a take on a couple of the official live timing screens, specifically one that shows stage results next to overall results at the end of the stage:

and one that shows stage splits:

My table combines these, and rebases the times relative to a particular driver of interest. The layout isn’t ideal, but I was evolving the chart stage on stage, and this is about as far as I got:

The table shows the time deltas with respect to running/stage time at each split on the left, and within each split on the right. Road position, stage position and overall position at the end of the stage are all captured. (The data in the SS19 Overall column really should be centre aligned, to give space to the left and right and to distinguish the grouped split times from the overall stage delta…)

The table cells are coloured using a pandas table styler. One of the issues with the table is that the numbers don’t necessarily add up – I was using the Python round function, which rounds to the nearest tenth rather than down to the lower tenth, which I need to correct…
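Here’s a minimal sketch of the fix. It assumes the raw times are held as integer milliseconds. Each cumulative time is truncated down to whole tenths with integer floor division, and only then are the differences taken. That keeps everything as integers, so the per-sector deltas sum exactly to the overall delta. The function names are made up for illustration.

```python
def to_tenths(ms: int) -> int:
    """Truncate a non-negative time in milliseconds down to whole tenths of a second.

    Integer floor division avoids float rounding: 61_399 ms -> 613 tenths (61.3s),
    whereas round(61.399, 1) would give 61.4.
    """
    return ms // 100


def fmt_tenths(tenths: int) -> str:
    """Format a count of tenths of a second as seconds, e.g. 613 -> '61.3'."""
    sign = "-" if tenths < 0 else ""
    t = abs(tenths)
    return f"{sign}{t // 10}.{t % 10}"


def split_deltas(target_ms, other_ms):
    """Rebase one car's cumulative split times (ms) against a target car, in tenths.

    Times are truncated *before* differencing, so the per-sector deltas
    telescope to the final cumulative delta exactly (all values are ints).
    """
    cum = [to_tenths(o) - to_tenths(t) for t, o in zip(target_ms, other_ms)]
    sectors = [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]
    return cum, sectors
```

For example, `split_deltas([30_049, 61_351, 95_999], [30_151, 61_402, 96_050])` gives cumulative deltas of one tenth at every split. The sector deltas are `[1, 0, 0]`, which sum to the overall one tenth.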

Since the rally, I’ve poked around with styles a bit more. One possible modification is to colour cells directly, either using a diverging palette, for example to try to spot patterns in road order, or to embed a bar chart in each cell. Also note the highlighted cell showing the overall rally leader at the end of the stage.

One of the problems with the bar chart is that the range is set automatically for each column; in this case, it would be nice to be able to use the same range for the bars in each column so we could more easily identify the splits within which most time was lost. Another issue is the display of the nan strings: it would be handy if the styler could replace these with an empty string.
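Both niggles look fixable with the pandas Styler. `Styler.bar()` accepts `vmin` and `vmax`, so a single range can be computed across all the split columns and passed in. `Styler.format()` takes an `na_rep` argument for missing values. This is a sketch, assuming a dataframe of rebased split deltas with hypothetical column names. Rendering the styled table needs jinja2 installed.

```python
import pandas as pd


def shared_range(df: pd.DataFrame, cols) -> tuple:
    """One (vmin, vmax) pair covering every listed column, symmetric about zero.

    NaNs are skipped by max(), so missing splits don't affect the range.
    """
    m = df[cols].abs().max().max()
    return (-m, m)


def style_split_deltas(df: pd.DataFrame, cols):
    """Bar-chart the listed delta columns on a common scale and blank out NaNs."""
    vmin, vmax = shared_range(df, cols)
    return (
        df.style
        # Same vmin/vmax for every column, so bar lengths are comparable across splits;
        # with a 2-item colour list, the first colour is used for negative values
        .bar(subset=cols, align="mid", vmin=vmin, vmax=vmax, color=["#5fba7d", "#d65f5f"])
        # Show missing values as empty strings rather than "nan"
        .format("{:.1f}", subset=cols, na_rep="")
    )
```

In a notebook, `style_split_deltas(df, ["split1", "split2"])` as the last expression in a cell should display the styled table.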

As well as tinkering with the table, I also spent a chunk of time thinking about stage maps. Looking around for precedents, I found a couple of really nice examples this evening from a company called GeoRacing. The first is an animated route (the full glory of it can be seen in their Tour de Corse 2015 route preview):

The Leaflet Polyline snake animation plugin looks like it might provide a way of achieving that sort of effect, at least on a 2D map, given a GPS/geojson route map?

The second graphic was a really neat virtual timing “race view” that ghosted cars along the route to show their relative positions. Also note the rebased time graphic showing deltas between the highlighted driver and the other drivers:

You get a much better idea of how exciting – and good for storytelling – this “race view” is in the full GeoRacing Tour de Corse 2015 showreel.

Maybe I could use the Leaflet.MovingMarker plugin to animate separate markers to achieve a similar effect to the virtual cars in that example?

Other forms of racing also provide a showcase for geotemporal data visualisation. For example, Crossbox Laptiming provide an app for motocross riders that also makes use of a geochronometric (?) display to compare lap times and lines:

By the by, it looks as if one of the official technology providers for WRC for 2018-2022 is an Australian company called Status Awareness Systems, with their RallySafe product. The original FIA tender is still up on the FIA tendering website: Timing, Tracking and Connectivity Solution for FIA WRC – Call for expressions of interest and selection process [PDF].

It was quite interesting to see how the tender split out core/required and optional services:


(It’s actually quite fascinating looking around the web for other providers of rally monitoring systems. For example, Nam System, a Czech company that provides asset and fleet monitoring services, also uses its ONI system for rally management. It’s not just F1 that can be used to drive innovation and showcase technologies in the context of motorsport…)

PS see also this round up from a year or two ago on my once again lapsed Digital Worlds blog: Augmented TV Sports Coverage & Live TV Graphics.

PPS Also related, Visualising WRC Rally Stages With Relive?.

Some More Rally Result Chart Sketches

Some more sketches, developing / updating one of the charts I first played with last year (the stage chart), and tinkering with something new.

First the stage chart – I’ve started pondering a couple of things with this chart to try to get the information density up a bit.

As a first attempt at updating the chart, I’ve started to look at adding additional marginal layers. In the example above:

  • vertical dashed lines separate out the different legs. As soon as I get the data to hand, I think it could make sense to use something like a solid line to show service, maybe a double solid line to show *parc fermé*; I’m not sure about additionally separating the days? (They’re perhaps implied by *parc fermé*? I need to check that…)
  • I added stage names *above* the chart – this has the benefit of identifying stages that are repeated;
  • stage distances are added *below* the chart. I’ve also been wondering about adding the transit distances in *between* the stages;
  • driver labels – and positions – are shown to the left and the right.

As a second attempt, I started zooming in to just the stages associated with a particular leg. This encouraged me to start adding more detailed layers. These can be applied to the whole chart, but it may start to get a bit cluttered.

Here’s an example of a chart that shows three stages that make up a notional leg:

You’ll notice several additions to the chart:

  • the labels to the left identify the driver associated with each line. The number is currently the overall position of the driver at the end of the first stage in the leg, but I’m not sure if it should be the position at the end of the previous stage so it carries more information. The time is the gap to the overall leading driver at the end of the first stage;
  • the labels to the right show the overall positions and gap to overall leader at the end of the leg. The position label is in bold font if the driver position has improved over the leg (a switch lets you select whether this is a class rank improvement or an overall position improvement). Thinking about it, I could use italics for class improvement and bold for overall improvement to carry both pieces of information in the same label. The position is actually redundant (you can count…) so maybe it’d make more sense to give a position delta from the start of the leg (that is, the position at the end of the stage prior to the first stage shown in the current leg). The time delta is given in bold if it is better than at the start of the leg.
  • the red dots depict that the gap to the overall leader had *increased* for a driver by the end of the stage compared to the end of the previous stage. So a red dot means the car is further behind the leader at the end of the stage than they were at the end of the previous stage; this indicator could be rebased to show deltas between a target (“hero”) car and the other cars on the stage. The green dot shows that the time to the leader did not increase;
  • the grey labels at the top are a running count of the number of “wins in a row” a driver has had. There are options to choose other running counts (eg stage wins so far), and flags available for colouring things like “took lead”, “retained lead”, “lost lead”.
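Here’s a rough sketch of how the dot colours and the wins-in-a-row count might be computed. It assumes a hypothetical wide dataframe with one row per driver and one column per stage. The red/green rule follows the description above: red if the gap to the overall leader grew since the previous stage, green otherwise.

```python
import numpy as np
import pandas as pd


def gap_change_colours(overall: pd.DataFrame) -> pd.DataFrame:
    """overall: rows = drivers, columns = stages in running order,
    values = accumulated overall rally time (ms) at the end of each stage."""
    gap = overall - overall.min()        # gap to the overall leader at the end of each stage
    change = gap.diff(axis=1)            # change in that gap since the previous stage
    colours = pd.DataFrame(np.where(change > 0, "red", "green"),
                           index=change.index, columns=change.columns)
    return colours.where(change.notna())  # no colour for the first stage shown


def win_streaks(stage_times: pd.DataFrame) -> list:
    """(stage winner, consecutive stage wins so far) for each stage column.

    Ties go to whichever driver appears first, since idxmin() returns the first minimum.
    """
    streaks, prev, run = [], None, 0
    for winner in stage_times.idxmin():
        run = run + 1 if winner == prev else 1
        streaks.append((winner, run))
        prev = winner
    return streaks
```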

As well as the stage chart, I started wondering about an “ultimate stage report” for each stage, showing the delta between each driver and the best time achieved in a sector (that is, the time spent between two splits).

Here’s what I came up with at a first attempt. Time delta is on the bottom. The lower level grey bar indicates the time a driver lost relative to the “ultimate” stage. (The bar maxes out at the upper limit of the chart to indicate “more than” – I maybe need to indicate this visually eg with a dashed / broken line at the end of a maxed out bar.)

Within each driver area is a series of lollipop style charts. These indicate the gap between a driver and the best time achieved on the sector (first sector at the top of the group, last at the bottom). The driver label indicates the driver who achieved the best sector time. This chart could be rebased to show other gaps, but I need to think about that… The labels are coloured to indicate sector, and transparent to cope with some of the overlapping issues.
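The numbers behind this chart can be sketched as follows. The sketch assumes (hypothetically) cumulative split times per driver plus the final stage time, all in ms. Sector times are the differences between consecutive splits. The “ultimate” stage is the sum of the best sector times, so a driver’s total loss is the sum of their per-sector gaps.

```python
import pandas as pd


def ultimate_stage(splits: pd.DataFrame, stage_time: pd.Series):
    """splits: rows = drivers, columns = cumulative split times (ms) in stage order;
    stage_time: final stage time (ms) per driver.

    Returns (per-sector gaps to the best sector time, who set each best sector,
    total time lost relative to the "ultimate" stage)."""
    cumulative = splits.copy()
    cumulative["finish"] = stage_time
    sectors = cumulative.diff(axis=1)             # time between consecutive splits...
    sectors.iloc[:, 0] = cumulative.iloc[:, 0]    # ...with the first sector running from the start
    best = sectors.min()                          # best time on each sector
    best_by = sectors.idxmin()                    # driver who set it (first one, if tied)
    gaps = sectors - best                         # each driver's gap on each sector (lollipops)
    lost = gaps.sum(axis=1)                       # total loss to the ultimate stage (grey bar)
    return gaps, best_by, lost
```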

It’s also possible to plot this chart using a log scale:

This makes it easier to see the small gaps, as well as giving a far wider range on the delta. However, log scales are harder to read for folk not familiar with them. It might be handy to put in a vertical dashed line at each power of 10 (so a dashed line at 1s and 10s; the limit is 100s). It might also make sense to add a label to the right of the total delta bar to show the actual delta time.
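Here’s a matplotlib sketch of the log-scale version, with the suggested dashed guides and value labels. The 100s cap matches the limit mentioned above. The 0.1s floor (a log axis can’t show zero) and the function name are my assumptions. The sketch only draws the grey total bars, not the lollipops.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so this also runs without a display
import matplotlib.pyplot as plt


def plot_time_lost(drivers, lost_s, limit=100.0, floor=0.1):
    """Horizontal bars of time lost (s) on a log axis, with dashed guides at 1s and 10s.

    Bars are capped at `limit` to signal "more than"; zero or tiny values are lifted
    to `floor` because a log axis cannot show zero.
    """
    fig, ax = plt.subplots()
    ys = list(range(len(drivers)))
    shown = [min(max(v, floor), limit) for v in lost_s]
    ax.barh(ys, [s - floor for s in shown], left=floor, color="lightgrey")
    ax.set_xscale("log")
    ax.set_xlim(floor, limit)
    for p in (1, 10):                                  # one guide per power of ten
        ax.axvline(p, linestyle="--", color="grey", linewidth=0.8)
    for y, v, s in zip(ys, lost_s, shown):             # label with the actual delta
        ax.text(s, y, f" {v:.1f}s", va="center")
    ax.set_yticks(ys)
    ax.set_yticklabels(drivers)
    ax.invert_yaxis()                                  # first driver at the top
    ax.set_xlabel("Time lost to ultimate stage (s, log scale)")
    return fig, ax
```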

So… tinkering… I was hoping to start to pull all the chart types I’ve been playing with together in a Leanpub book, but Leanpub is no longer free to play unless you have generated over $10k of royalties (which I haven’t…). I’ve started looking at GitBook, but that’s new to me, so I need to spend some time getting a feel for how to use it and coming up with a workflow/toolchain around it.

Rally Stage Sector Charts, Overlaid On Stage Progress Charts

One of the last rally charts I sketched last year was a dodged bar chart showing “sector” times within a stage for each driver, either as absolute times or rebased relative to a particular target driver. This represented an alternative to the “driver subseries” line charts e.g. as shown here.

Re: the naming of driver subseries charts – this is intended to be reminiscent of seasonal subseries charts.

The original split time data on the WRC site looks like this:

Taking the raw sector (split) times, we can rebase the times relative to a particular driver. In this case, I have rebased relative to OGI, so the sector times are as shown in the table above. The colour basis is the opposite of the basis used in the chart, because I am trying to highlight to the target driver where they lost time rather than where the others gained time. (It may be that the chart makes more sense to professionals if I change the colour basis in the chart below to use green to show that the driver made up that amount of time on the target driver.)

The dodged bar charts are ordered with the first split time at the top of the set for each driver. The overall stage time is the lower bar in each driver group.

Here’s how it looks using the other colour basis:

Hmm… maybe that is better…

Note that the above charts also differ from the WRC split times results table in their ordering. The results table orders results by start order, whereas the above charts rank by stage position.

To generate the “sector” times, we can find the difference between each split for a particular driver, and between the final (overall) stage time and the final split time. As before, we can then rebase these times relative to a specific target driver.

This chart shows how the target driver compared to each of the other drivers on each of the “sectors” between each split. So we see OGI dropped quite a lot of time relative to other drivers on the fourth “sector” of the stage, between splits 3 and 4. He made up time against MEE on the first and second parts of the stage.

As well as just showing the split times, we can find the total time delta between the target driver and each of the other drivers on the stage as the sum of the sector times. We can use a lower graphic layer to underplot this total beneath the dodged bars for each driver.
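Here’s a sketch of the sector times, their rebasing and the total for the underplotted bar. It assumes cumulative split times and final stage times per driver, in ms. The dataframe layout and driver codes are hypothetical.

```python
import pandas as pd


def rebased_sectors(splits: pd.DataFrame, stage_time: pd.Series, target: str) -> pd.DataFrame:
    """splits: rows = drivers, columns = cumulative split times (ms) in stage order;
    stage_time: overall stage time (ms) per driver; target: driver code to rebase against.

    Each value is (driver's sector time - target's sector time): positive means the
    target was quicker on that sector. 'total' is the sum over sectors, which equals
    the difference in overall stage time."""
    cumulative = splits.copy()
    cumulative["finish"] = stage_time                # treat the finish as a final split
    sectors = cumulative.diff(axis=1)                # time between consecutive splits...
    sectors.iloc[:, 0] = cumulative.iloc[:, 0]       # ...first sector runs from the start line
    rebased = sectors - sectors.loc[target]          # subtract the target's row from every row
    rebased["total"] = rebased.sum(axis=1)           # length of the grey underplot bar
    return rebased
```

Because the sector times telescope, `total` always matches the plain difference in stage times. This is why the dodged bars “stack” to the grey bar.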

The grey bars show the total time gained / lost by the target driver on the stage relative to the other drivers.

In a similar way, we can also overplot the dodged bars on top of a stage progress chart, recolouring slightly.

This increases the information density of the stage progress chart even further, and provides additional “delta” signals in keeping with the deltascope inspiration / basis for that chart type.

Again, this sort of fits with the warped hydraulic model: the dodged bars can be stacked to give the length of the lighter coloured bar underneath them.

(I’m not sure I’ve ordered the drivers correctly in the above chart – it was generated from two discordantly arranged datasets. The final chart will be generated from a single, consistent dataframe.)

PS it strikes me that the dodged bars need to be transparent to show a solid bar underneath that doesn’t extend as far as the dodged bars. This might need some careful colour / transparency balancing.

Sketching – WRC Stage Progress Chart

Picking up on some sketches I started doing around WRC (World Rally Championship) results data last year, and after rewriting pretty much everything following an update to WRC’s website that means results data can now be accessed via JSON data feeds, here are some notes on a new chart type, which I’m calling a Stage Progress Chart for now.

In contrast to one of my favourite chart forms, the macroscope, which is intended to visualise the totality of a dataset in a single graphic, this one is developed as a deltascope, where the intention is to visualise a large number of differences in the same chart.

In particular, this chart is intended to show:

  • the *time difference* between a given car and a set of other cars at the *start* of a stage;
  • the *time difference* between that same given car and the same set of other cars at the *end* of a stage;
  • any *change in overall ranking* between the start and end of the stage;
  • the *time gained* or the *time lost* by the given car relative to each of the other cars over the stage.

Take 1

The data is presented on the WRC website as a series of data tables. For example, for SS4 of WRC Rally Sweden 2018 we get the following table:

The table on the left shows the stage result, the table on the right the overall position at the end of the stage. DIFF PREV is the time difference to the car ahead, and DIFF 1st the difference to the first ranked driver in the table, howsoever ranked: the rank for the left hand table is the stage position, the rank on the right hand table is overall rally position at the end of the stage.

One thing the table does not show is the start order.

The results table for SS4 is shown below. SS4 is the stage that forms the focus for this walkthrough.

Data is obtained by scraping the WRC results JSON data feeds and popping it into a SQLite3 database. A query returns the data for a particular stage:
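The actual schema isn’t shown here, so this is a sketch against a hypothetical `overall` table. It has one row per driver per stage, holding the accumulated rally time in ms. The table, its columns and the made-up numbers are illustrative assumptions, not the WRC feed format.

```python
import sqlite3

import pandas as pd


def stage_overall(conn: sqlite3.Connection, stages) -> pd.DataFrame:
    """Accumulated overall times for the listed stages from a hypothetical
    overall(stage, driver, total_ms) table."""
    placeholders = ",".join("?" for _ in stages)     # one qmark per stage; values passed as params
    query = f"SELECT stage, driver, total_ms FROM overall WHERE stage IN ({placeholders})"
    return pd.read_sql_query(query, conn, params=list(stages))


# Tiny in-memory database standing in for the scraped one (illustrative numbers)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE overall (stage TEXT, driver TEXT, total_ms INTEGER)")
conn.executemany("INSERT INTO overall VALUES (?, ?, ?)", [
    ("SS3", "TAN", 1_000_000), ("SS3", "NEU", 998_000),
    ("SS4", "TAN", 1_600_000), ("SS4", "NEU", 1_583_000),
])
df = stage_overall(conn, ["SS3", "SS4"])
```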

We can pivot the data to get the total accumulated times (in ms) at the end of each stage for the current (SS4) and previous (SS3) stage for each driver onto the same row:
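With the data in long form, a pandas `pivot` does the job. Here’s a sketch, assuming hypothetical `stage`, `driver` and `total_ms` columns. Dropping drivers who lack either stage is my choice.

```python
import pandas as pd


def pivot_stage_pair(long_df: pd.DataFrame, prev_stage: str, stage: str) -> pd.DataFrame:
    """One row per driver, with accumulated times (ms) for the previous and current
    stage side by side."""
    pair = long_df[long_df["stage"].isin([prev_stage, stage])]
    wide = pair.pivot(index="driver", columns="stage", values="total_ms")
    return wide[[prev_stage, stage]].dropna()   # drop drivers missing either stage
```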

Obtaining the times for a target driver allows us to rebase the times relative to that driver as a Python dict:

Rebasing is simply a matter of relativising times with respect to the target driver times:
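Here’s a sketch of both steps, using a hypothetical wide layout with one row per driver and a column per stage. The sign convention (car time minus target time, so negative means the car is ahead) is my choice.

```python
import pandas as pd


def rebase_to_target(wide: pd.DataFrame, target: str, prev_stage: str, stage: str) -> pd.DataFrame:
    """wide: one row per driver, accumulated times (ms) for prev_stage and stage.

    'prev' and 'overall' are car time minus target time at the end of the previous
    and current stage (negative: car ahead); 'delta' is the time the target gained
    (positive) or lost (negative) to each car on the stage itself."""
    target_times = wide.loc[target].to_dict()      # e.g. {'SS3': ..., 'SS4': ...}
    rebased = pd.DataFrame(index=wide.index)
    rebased["prev"] = wide[prev_stage] - target_times[prev_stage]
    rebased["overall"] = wide[stage] - target_times[stage]
    rebased["delta"] = rebased["overall"] - rebased["prev"]
    return rebased
```

For example, take TAN as the target and NEU entering the stage 2s ahead and leaving it 17s ahead. Then `prev` is -2000, `overall` is -17000 and `delta` is -15000, meaning TAN lost 15s to NEU on the stage.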

The rebased accumulated rally times for each stage are relative to the accumulated rally time for the target driver at the end of the corresponding stage. The rebased delta gives the time that the target driver either made up, or lost against, each other car on the stage. The deltaxx value is “a thing” used to support the chart plotting. It’s part of the art… ;-)

UPDATE: it strikes me that thing() is actually returning the value closest to zero if the signs are the same:

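def thing(overall, delta):
    # samesign() inlined, assuming it means: both values >= 0, or both < 0
    if (overall >= 0) == (delta >= 0): return min([overall, delta], key=abs)
    return delta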

So, that’s the data wrangling bits… here’s what we can produce from them:

The chart is rebased according to a particular driver, in this case Ott Tänak. It’s overloaded and you need to learn to read it. It’ll be interesting to see whether it comes naturally to read with some practice – if it doesn’t, then it’s not that useful; if it does, it may be handy for power users.

First up, what do you see?

The bars are ordered and have positive (left) and negative (right) values, with names down the side. The names are driver labels, in this case for WRC RC1 competitors. The ordering is in overall rally class rank order at the end of the stage identified in the title (SS4). So at the end of stage SS4, NEU was in the lead of the rally class, with MIK in second.

The chart times are rebased relative to TÄN – at the moment this is not explicitly identified, though it can be read from the chart – the bars for TÄN are all set to 0.

Rebasing means that the times are normalised (rebased) relative to a particular driver (the “target driver”).

The times that are used for the rebasing are:

  • the difference between the accumulated rally stage time of the target driver and each other driver at the end of the *previous* stage;
  • the difference between the accumulated rally stage time of the target driver and each other driver at the end of the specified stage (in this case, SS4).

From these differences, calculated relative to the target driver, we can calculate the time gained (or lost) on the stage by the target driver relative to each of the other drivers.

The total length of each contiguous coloured bar indicates the time delta on the stage between the target driver and each other car (that is, the time gained/lost by the target driver relative to each other car). Red means time was lost, green means time was gained.

So TÄN lost about 15s (red) to NEU on the stage, and gained about 50s (green) on AL. He also lost about 3s (pink) to MEE and about 15s to BRE (pink plus red).

The furthest extent of the light grey bar, or the solid (red / green) bar shows the overall difference to that driver from the target at the end of the stage. So NEU is 17s or so ahead and EVA is about 80s behind. Of the 80s or so EVA is behind, TÄN made about 50s of those (green) on the current stage. Of the 17s or so TÄN is behind NEU, NEU made up about 15s on the current stage, and went into the stage ahead (grey to the right) of TÄN by about 2s.

If a pastel (pink or light green) colour totally fills a bar to the left or right, that is a “false time”. It doesn’t indicate the overall time at the end of the stage (the grey bar on the other side does that), but it does allow you to read off how much time was gained / lost.

The dashed bar indicates where the driver started the stage relative to the target driver. So SUN started the stage about 15s behind and made up about 14s (the pink bar to the right). The grey bar to the left shows SUN finished the stage about 1s behind TÄN. PAD also started about 15s behind and made up about 15s, finishing just a fraction of a second behind (below) TÄN at the end of the stage. If you look very closely, you might just see a tiny sliver of grey to the left for PAD.

Where coloured bars straddle the zero line, that shows the target driver has gone from being ahead of (or behind) a driver at the start of the stage to being behind (or ahead of) them at the end of the stage. So MIK, LAP, OST, BRE and LAT all started the stage behind TÄN but finished overall ahead at the end. (If you look at the WRC overall result table on the right for SS3 (the previous stage) you’ll see TÄN was second overall. At the end of SS4, he was in seventh, as the stage progress chart shows.)

Here’s the chart for the same stage rebased relative to PAD.

Here we see that PAD made up time on TÄN and LAT, remaining a couple of seconds behind LAT (the light green that fills the bar to the left shows this is a “false” time) and not quite getting ahead of TÄN overall (TÄN is above PAD). SUN held steady relative to PAD, but PAD took good chunks of time away from MEE all the way down (green). PAD also lost (red) a second or two to each of NEU, LAP and OST, and lost most to MIK.

Take 2

The interpretation of the pink/light green bars is a bit confusing, so let’s simplify a little: use the solid red/green colours to show times that should be added, and the pastel colours purely for false times.

We can also simplify the chart symbols further by removing the dashed line and just using a solid line for bars.

Now the solid colour clearly shows the times should be added together to give the overall delta on the stage between the rebase target car and the other cars.

One thing missing from the chart is the start order, which could be added as part of the y-axis label. The driver the chart is rebased relative to should also be identified clearly, even if just in the chart title.

As to understanding how the chart works, one way is to think of it as using a warped hydraulic model.

For example:

  • imagine that a grey bar to the right shows how far behind a particular car the target driver is at the start of the stage; being ahead, the bar is above that of the target car:
    • the car extends its lead over the target driver during the stage, so the bar plunger is pulled to the right and it fills with red (time lost) fluid at the base. The amount of red fluid is the additional time the target driver lost to that car on the stage;
    • the car reduces its overall lead over the target driver during the stage, which is to say the target driver gains time. The plunger is pushed to the left and fills with notional light green time, to the left, representing the time gained by the target driver; the actual time the target driver is behind the car is still shown by the solid grey bar to the right; the white gap at the end of the bar (similar in size to the notional light green extruded to the left) shows how much of the lead the car had at the start of the stage has been lost during the stage. Reading off the width of the white area is hard to do, which is why we map it to the notional light green time to the left;
    • the car loses all its lead and falls behind the target car, overall, by the end of the stage. In this case, the grey plunger to the right is pushed all the way down, past zero, which fills the whole bar with solid green; the bar is further pushed to the left by the amount of time the car is now behind the target car, overall, at the end of the stage. The total width of the solid green bar is the total amount of time the target car gained on the stage. The vertical positioning of the bar also falls below that of the target car to show it is now behind. That the bar is filled green to the right also indicates that the car was ahead of the target car at the end of the previous stage;
  • imagine that a grey bar to the left shows how far ahead of a particular car the target driver is at the start of the stage; being behind, the bar is below that of the target car:
    • the car loses further ground on the stage, so the bar is extended to the left and fills with green “time gained by target car” fluid at the base;
    • the car makes some time back on the target car on the stage, but not enough to get ahead overall; the bar is pushed to the right, leaving a white space to the left indicative of the time the target car lost. The amount by which the grey still extends to the left is the time the target driver is still ahead of the car at the end of the stage. A notional pink time (the same width as the white space created on the left) is shown to the right;
    • the car makes back all the time it was behind at the start of the stage, and then some; the bar moves above the target car on the vertical dimension. The plunger is pushed so far from the left that it passes zero and fills the bar with solid red (time lost by target car). The amount the bar extends to the right is the time the car is ahead of the target car, overall, at the end of the stage. The total width of the completely solid red bar is the total time the target driver lost relative to that car on the stage. The fact that the bar is above the target driver and has a solid red bar that extends to the left as well as the right further shows that the car was behind the target at the start of the stage.
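The six cases above can be sketched as a small classifier. This is a hypothetical helper, not the plotting code itself. It assumes rebased gaps of car time minus target time (negative means the car is ahead). It returns which case applies and the colour the description above associates with it.

```python
def stage_case(prev_gap: float, curr_gap: float) -> tuple:
    """Classify one car against the target using the hydraulic model cases.

    prev_gap / curr_gap: car's overall time minus the target's at the start / end of
    the stage (negative: car ahead). Returns (case, colour). Exact ties (zero gaps)
    are lumped in with 'behind', which is an arbitrary choice."""
    stage_delta = curr_gap - prev_gap      # > 0: target gained on the stage; < 0: target lost
    if stage_delta == 0:
        return ("no change on stage", "none")
    if prev_gap < 0:                       # car started the stage ahead of the target
        if curr_gap >= 0:
            return ("target got ahead", "solid green across zero")
        if stage_delta < 0:
            return ("car extended its lead", "solid red")
        return ("car's lead reduced", "notional light green")
    if curr_gap < 0:                       # car started behind but finished ahead
        return ("car got ahead", "solid red across zero")
    if stage_delta > 0:
        return ("car fell further behind", "solid green")
    return ("car closed the gap", "notional pink")
```

For example, `stage_case(15000, 1000)` (a car starting 15s behind and finishing 1s behind, like SUN above) falls into the notional pink case.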

I’m still feeling my way with this one… as mentioned at the start, it’ll be interesting to see how natural it feels – or not – after some time spent trying to use it.

Visualising WRC Rally Stages With Relive?

A few days ago, via Techcrunch, I came across the Relive application, which visualises GPS traces of bike rides as 3D Google Earth style animations, using a range of map data sources.

Data is uploaded using GPX, TCX, or FIT formatted data – all of which are new to me. Standard KML uploads don’t work – time stamps are required for each waypoint.

Along the route, photographic waypoints can be added to illustrate the journey, which got me thinking: this could be a really neat addition to the Rally-maps.com website, annotating stage maps after a race with:

  • photographs from various locations on the stage;
  • images at each split point showing the leaderboard and time splits from each stage;
  • pace info, showing the relative pace across each stage, perhaps captured from a reconnaissance vehicle or zero car.

Alternatively, it might be something that the WRC – or Red Bull TV, who are providing online and TV coverage of this year’s rallies – could publish?

And if they want to borrow some of my WRC chart styles for waypoint images, I’m sure something could be arranged:-)

Grouping Numbers that are Nearly the Same – Casual Clustering

There were a couple of reasons for tinkering with WRC rally data this year, over and above the obvious one of wanting to find a way to engage with motorsport at a data level. Specifically, I wanted a context for thinking a bit more about ways of generating (commentary) text from timing data, as well as a “safe” environment in which I could look for ways of identifying features (or storypoints) in the data that might provide a basis for making interesting text comments.

One way in to finding features is to look at visual representations of the data (that is, just look at charts) and see what jumps out… If anything does, then you can ponder ways of automating the detection or recognition of those visually compelling features, or things that correspond to them, or proxy for them, in some way. I’ll give an example of that in the next post in this series, but for now, let’s consider the following question: how can we group numbers that are nearly the same? For example, if I have a set of stage split times, how can I identify groups of drivers that have recorded exactly, or even just nearly, the same time?

Via StackOverflow, I found the following handy fragment:

def cluster(data, maxgap):
    '''Arrange data into groups where successive elements
       differ by no more than *maxgap*

        >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
        [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

        >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
        [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    '''
    data = sorted(data)  # sort a copy rather than mutating the caller's list
    groups = [[data[0]]]
    for x in data[1:]:
        # Extend the current group if x is close enough to its last member...
        if abs(x - groups[-1][-1]) <= maxgap:
            groups[-1].append(x)
        else:
            # ...otherwise start a new group
            groups.append([x])
    return groups

print(cluster([2.1,7.4,3.9,4.6,2.5,2.4,2.52],0.35))
[[2.1, 2.4, 2.5, 2.52], [3.9], [4.6], [7.4]]

It struck me that a tweak to the code could limit the range of any grouping relative to a maximum distance between the first and the last number in any particular grouping – maybe I don’t want a group to have a range more than 0.41 for example (that is, strictly more than a dodgy floating point 0.4…):

def cluster2(data, maxgap, maxrange=None):
    data = sorted(data)
    groups = [[data[0]]]
    for x in data[1:]:
        # Only let x into the current group if it is within maxrange of the group's first member
        inmaxrange = True if maxrange is None else abs(x-groups[-1][0]) <=maxrange
        if abs(x - groups[-1][-1]) <= maxgap and inmaxrange:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

print(cluster2([2.1,7.4,3.9,4.6,2.5,2.4,2.52],0.35,0.41))
[[2.1, 2.4, 2.5], [2.52], [3.9], [4.6], [7.4]]

A downside of this is that we might mistakenly omit a number that is very close to the last number in the previous group and should rightfully have been included, because it’s not really very far away from a number that is close to the group range threshold value…

In which case, we might pull back into a group any numbers that are really close to the current last member of the group, irrespective of whether we have passed the originally specified group range:

def cluster3(data, maxgap, maxrange=None, maxminrange=None):
    data.sort()
    groups = [[data[0]]]
    for x in data[1:]:
        inmaxrange = True if maxrange is None else abs(x-groups[-1][0])<=maxrange
        inmaxminrange = False if maxminrange is None else abs(x-groups[-1][-1])<=maxminrange
        if (abs(x - groups[-1][-1]) <= maxgap and inmaxrange) or inmaxminrange:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

print(cluster3([2.1,7.4,3.9,4.6,2.5,2.4,2.52],0.35,0.41,0.25))
[[2.1, 2.4, 2.5, 2.52], [3.9], [4.6], [7.4]]

With these simple fragments, I can now find groups of times that are reasonably close to each other.

I can also look for times that are close to other times:

trythis = [x for x in cluster3([2.1,7.4,3.9,4.6,2.5,2.4,2.52],0.35,0.41,0.25) if 2.4 in x]
trythis[0] if len(trythis) else ''
[2.1, 2.4, 2.5, 2.52]
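Since the original question was about groups of *drivers*, a small variant of `cluster()` that carries a label with each time might be handier. This is a sketch, and the driver codes and times used with it are illustrative.

```python
def cluster_labelled(items, maxgap):
    """Like cluster(), but for (label, value) pairs, e.g. (driver, split time),
    so each group says *who* recorded nearly the same time."""
    items = sorted(items, key=lambda kv: kv[1])     # order by time, keeping labels attached
    groups = [[items[0]]]
    for label, value in items[1:]:
        if value - groups[-1][-1][1] <= maxgap:     # close to the group's latest member
            groups[-1].append((label, value))
        else:
            groups.append([(label, value)])
    return groups
```

For example, `cluster_labelled([("OGI", 2.1), ("TAN", 7.4), ("NEU", 2.4), ("MEE", 2.5)], 0.35)` groups OGI, NEU and MEE together and leaves TAN on his own.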

PS I think the following vectorised pandas fragments assign group numbers to rows based on the near matches of numerics in a specified column:

import pandas as pd

def numclustergroup(x,col,maxgap):
    x=x.sort_values(col)
    # Start a new cluster whenever the gap to the previous value reaches maxgap
    x['cluster'] = (x[col].diff()>=maxgap).cumsum()
    return x

def numclustergroup2(x,col,maxgap,maxrange):
    x=x.sort_values(col)
    x['cluster'] = (x[col].diff()>=maxgap).cumsum()
    # Within-cluster differences; their running sum is the distance from the cluster's first member
    x['cdiff']=x.groupby('cluster')[col].diff()
    x['cluster'] = ((x.groupby('cluster')['cdiff'].cumsum()>maxrange) | (x[col].diff()>=maxgap)).cumsum()
    return x.drop(columns='cdiff')  # keyword form; positional axis is not accepted by pandas 2

def numclustergroup3(x,col,maxgap,maxrange,maxminrange):
    x=x.sort_values(col)
    x['cluster'] = (x[col].diff()>=maxgap).cumsum()
    x['cdiff']=x.groupby('cluster')[col].diff()
    # As above, but only split where the gap to the previous value also exceeds maxminrange
    x['cluster'] = (((x.groupby('cluster')['cdiff'].cumsum()>maxrange) | (x[col].diff()>=maxgap)) & (x[col].diff()>maxminrange) ).cumsum()
    return x.drop(columns='cdiff')

#Test
uu=pd.DataFrame({'x':list(range(0,8)),'y':[1.3,2.1,7.4,3.9,4.6,2.5,2.4,2.52]})
print(numclustergroup(uu,'y',0.35))
print(numclustergroup2(uu,'y',0.35,0.41))
print(numclustergroup3(uu,'y',0.35,0.41,0.25))

The basic idea is to generate logical tests that evaluate as True whenever you want to increase the group number.
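To see that idea in isolation, here’s the flag-and-cumsum trick on a few illustrative sorted values. This sketch uses `>` rather than `>=`, so that, as in `cluster()`, a gap of exactly `maxgap` stays in the same group.

```python
import pandas as pd

# Illustrative sorted values; a True flag marks "start a new group here"
s = pd.Series([1.3, 2.1, 2.4, 2.5, 2.52, 3.9])
flags = s.diff() > 0.35        # gap to previous value exceeds 0.35 (the first diff is NaN -> False)
groups = flags.cumsum()        # each True bumps the running group number by one
print(pd.DataFrame({"value": s, "new_group": flags, "group": groups}))
```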