Custom Charts – RallyDataJunkie Stage Table, Part 1

Over the last few evenings, I’ve been tinkering a bit more with my stage table report for displaying stage split times, using the Dakar Rally 2019 timing data as a motivator; this is a useful data set to try the table out with, not least because the Dakar stages are long, with multiple waypoints (that is, splits) along each stage.

There are still a few columns I want to add to the table, but for now, here’s a summary of how to start reading the table.

Here’s stage 3, rebased on Sebastien Loeb; the table is ordered according to stage rank:

The first part of the chart shows the Road Position (that is, stage start order), using a scaled palette so that drivers who appear out of start order in the ranking are highlighted. The name of the Crew and vehicle Brand follow, along with a small inline step chart that shows the evolution of the Waypoint Rank of each crew (that is, their rank in terms of stage time to that point, at each waypoint). The upper grey bar spans the podium ranks 1 to 3, the lower grey line marks tenth. If a waypoint returns an NA time, we get a break in the line.

Much of the rest of the chart relies on “rebased” times. So what do I mean by “rebased”?

One of the things the original data gives us is the stage time it took each driver to get to each waypoint.

For example, it took Loeb 18 minutes dead to get to waypoint 1, and Peterhansel 17m 58s. Rebasing this relative to Loeb suggests Loeb lost 2s to Peterhansel on that split. On the other hand, Coronel took 22m 50s, so Loeb gained 290s.

Rebasing times relative to a particular driver finds the time difference (delta) between that driver and all the other drivers at that timing point. The rebased times shown for each driver other than the target driver are thus the deltas between their times and the time recorded for the target driver. The rebased time display is intended to be primarily useful to the driver with reference to whom the rebased times are calculated.
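To make the arithmetic concrete, here’s a minimal sketch in R using the waypoint 1 times quoted above; the driver codes and the single-column data frame are purely illustrative, and the sign convention and presentation in the actual table may differ:

# Rebasing sketch using the waypoint 1 times quoted above (in seconds)
wp_times = data.frame(row.names = c("LOE", "PET", "COR"),
                      wp1 = c(18 * 60, 17 * 60 + 58, 22 * 60 + 50))

# Rebase relative to Loeb: subtract Loeb's time from each driver's time;
# here a negative value means that driver was quicker than Loeb at that point.
# With more waypoint columns, subtract Loeb's whole row, e.g. via sweep()
rebased = wp_times - wp_times["LOE", "wp1"]
rebased  # LOE 0, PET -2, COR +290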

So what’s going on in the other columns? Let’s rebase relative to Loeb.

Here’s what it looks like, again:

The left hand and middle parts of the table/chart show the time taken in making progress between waypoints.

To start with we have the Stage Gap of each driver relative to Loeb. This is intended to be read from the target driver’s perspective, so where a driver made time over the target driver, we colour it red to show our target lost time relative to that driver. If a driver was slower than the target driver (the target made up time), we colour it green.

The Stage Gap is incremental, based on differences between drivers’ total time in stage at each waypoint. In the above case, Loeb was losing out slightly to the first two drivers at the first couple of waypoints, but was ahead of the third placed driver. Then something went bad and a large amount of time was lost.

But how much time? That’s what the inline bar chart cells show: the time gained / dropped going from one waypoint to the next. The D0_ times capture differences in the time taken going from one split/waypoint to the next. The horizontal bar chart x-axis limits are set on a per column basis, so you need to look at the numbers to get a sense of how much time gained/lost they represent. The numbers are time deltas in seconds. I ummed and ahhed about the sign of these. At the moment, a positive time means the target (Loeb) was that much time slower (extra, plus) than the driver indicated by the row.
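As a rough sketch of that calculation (the waypoint 1 times are those quoted earlier; the waypoint 2 times are made up for illustration, and the sign convention follows the description above):

# Hypothetical accumulated waypoint times (seconds), one row per driver
wp_acc = data.frame(row.names = c("LOE", "PET", "COR"),
                    wp1 = c(1080, 1078, 1370),
                    wp2 = c(2210, 2195, 2805))

# Time taken to get from one waypoint to the next
sections = t(apply(wp_acc, 1, function(x) diff(c(0, x))))

# D0_ style deltas: the target's (Loeb's) section time minus each driver's
# section time, so a positive value means Loeb was that much slower there
d0 = t(apply(sections, 1, function(row) sections["LOE", ] - row))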

Finally, the Pos column is rank position at the end of the stage.

If we look down the table, around Loeb, we see how Loeb’s times compare to the drivers who finished just ahead — and behind — him. For drivers ahead in the ranking, their Stage Gap will end up red at the end of the stage; for drivers behind, it’ll be green (look closely!)

Scanning the D0_ bars within a column, it’s obvious in which bits of the stage Loeb made, and dropped, time.

The right hand side of the figure considers the stage evolution as a whole.

The Gap to Leader column shows how much time each driver was behind the stage leader at each waypoint (that is, at each waypoint, rank the drivers to see who was quickest getting to that point).

Along with the Waypoint Rank and the Road Position, the Gap to Leader is the only aspect of the table that is relative to the driver associated with that row rather than rebased against the target driver: it helps our target (Loeb) put each other driver’s performance on the stage in the context of the overall stage rankings. The dot marker indicates the gap to leader at the end of the stage.

The 0N_ columns show the time delta on stage between each driver and Loeb, which is to say, the delta between the accumulated stage time for each driver and that of Loeb at each waypoint. The final column records the amount of time, in seconds, gained or lost by Loeb relative to each driver in the final stage ranking (penalties excepted).

Looking at the table around Loeb, we see the column entries are empty except for the Gap to Leader evolution.

The original version of this chart, which I was working up around WRC 2018, also includes a couple more columns relating to overall rally position at the start and end of the stage. Adding those is part of my weekend playtime homework!

Visualising Rally Route Stages (with help from rayshader and some ecologists…)

Inspired by some 3D map views generated using the rayshader and rgl R packages, I wondered how easy it would be to render some 3D maps of rally stages.

It didn’t take too long to get a quick example up and running but then I started wondering what else I could do with route and elevation data. And it turns out, quite a lot.

The result of my tinkerings to date is at rallydatajunkie.com/visualising-rally-stages. It concentrates solely on a "static analysis" of rally routes: no results, no telemetry, just the route.

Along the way, it covers the following topics:

  • using R spatial (sp) and simple features (sf) packages to represent routes;
  • using the leaflet, mapview and ggplot2 packages to render routes;
  • annotating and splitting routes / linestrings;
  • downloading elevation rasters using elevatr;
  • using the raster package to work with elevation rasters;
  • a review of map projections;
  • exploring various ways of rendering rasters and annotating them with derived terrain features;
  • rendering elevation rasters in 2D using rayshader;
  • an aside on converting images to elevation rasters;
  • rendering and cropping elevation rasters in 3D using rayshader;
  • rendering shadows for particular datetimes at specific locations (suncalc);
  • stage route analysis: using animal movement trajectory analysis tools (trajr, amt, rLFT) to characterise stage routes;
  • stage elevation visualisation and analysis (including elevation analysis using slopes);
  • adding OpenStreetMap data including highways and buildings to maps (osmdata);
  • steps towards creating a road book / tulip map by mapping stage routes onto OSM road networks (sfnetworks, dodgr).

Along the way, I had to learn various knitr tricks, eg for rendering images, HTML and movies in the output document.

The book itself was written using Rmd and then published via bookdown and Github Pages. The source repo is on Github at RallyDataJunkie/visualising-rally-stages.

Automatically Detecting Corners on Rally Stage Routes Using R

One of the things I’ve started pondering with my rally stage route metrics is the extent to which we might be able to generate stage descriptions of the sort you might find on the It Gets Faster Now blog. The idea wouldn’t necessarily be to create finished stage descriptions, more a set of notes that a journalist or fan could use as a prompt to create a more relevant description. (See these old Notes on Robot Churnalism, Part I – Robot Writers for a related discussion.)

So here’s some sketching related to that: identifying corners.

We can use the rLFT (Linear Feature Tools) R package to calculate a convexity measure at fixed sample points along a route (for a fascinating discussion of the curvature/convexity metric, see Albeke, S.E. et al., "Measuring boundary convexity at multiple spatial scales using a linear 'moving window' analysis: an application to coastal river otter habitat selection", Landscape Ecology 25 (2010): 1575-1587).

By filtering on high absolute convexity sample points, we can do a little bit of reasoning around the curvature at each point to make an attempt at identifying the start of a corner:

library(rLFT)

stepdist = 10
window = 20
routeConvTable <- bct(utm_routes[1,],
                      # distance between measurements
                      step = stepdist,
                      # width of the moving window over which convexity is measured
                      window = window, ridName = "Name")

head(routeConvTable)

We can then filter on the convexity index to highlight the sample points where its absolute value is high:

corner_conv = 0.1

tight_corners = routeConvTable[abs(routeConvTable$ConvexityIndex)>corner_conv,]
tight_corners_zoom1 = tight_corners$Midpoint_Y>4964000 & tight_corners$Midpoint_Y<4965000

library(ggplot2)

ggplot(data=trj[zoom1, ],
       aes(x=x, y=y)) + geom_path(color='grey') + coord_sf() +
  geom_text(data=tight_corners[tight_corners_zoom1,],
                           aes(label = ConvexityIndex,
                               x=Midpoint_X, y=Midpoint_Y),
                           size=2) +
  geom_point(data=tight_corners[tight_corners_zoom1,],
             aes(x=Midpoint_X, y=Midpoint_Y,
                 color= (ConvexityIndex>0) ), size=1) +
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45))
High convexity points along a route

We can now do a bit of reasoning to find the start of a corner (see Automatically Generating Stage Descriptions for more discussion about the rationale behind this):

cornerer = function (df, slight_conv=0.01, closeby=25){
  df %>%
    mutate(dirChange = sign(ConvexityIndex) != sign(lag(ConvexityIndex))) %>%
    mutate(straightish =  (abs(ConvexityIndex) < slight_conv)) %>%
    mutate(dist =  (lead(MidMeas)-MidMeas)) %>%
    mutate(nearby =  dist < closeby) %>%
    mutate(firstish = !straightish & 
                        ((nearby & !lag(straightish) & lag(dirChange)) |
                        # We don't want the previous node nearby
                        (!lag(nearby)) )  & !lag(nearby) )
}

tight_corners = cornerer(tight_corners)

Let’s see how it looks, labeling the points as we do so with the distance to the next sample point:

ggplot(data=trj[zoom1,],
       aes(x=x, y=y)) + geom_path(color='grey') + coord_sf() +
  ggrepel::geom_text_repel(data=tight_corners[tight_corners_zoom1,],
                           aes(label = dist,
                               x=Midpoint_X, y=Midpoint_Y),
                           size=3) +
  geom_point(data=tight_corners[tight_corners_zoom1,],
             aes(x=Midpoint_X, y=Midpoint_Y,
                 color= (firstish) ), size=1) +
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45))
Corner entry

In passing, we note we can identify the large gap distances as "straights" (and then perhaps look for lower convexity index corners along the way that we could label as "flowing" corners).
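As a rough sketch of that idea, using the dist column computed by cornerer() above and an arbitrary, purely illustrative 200m threshold:

# Flag long runs between high-convexity points as candidate "straights";
# the 200m threshold is an arbitrary choice for illustration
straight_threshold = 200

tight_corners$straight_after = !is.na(tight_corners$dist) &
                                 tight_corners$dist > straight_threshold

# Candidate straights: distance into stage at the exit point, and straight length
tight_corners[tight_corners$straight_after, c("MidMeas", "dist")]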

Something else we might do is number the corners:

Numbered corners
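A minimal sketch of the numbering, assuming the firstish flag from cornerer() marks corner entry points; the resulting numbers could then be used as labels in a plot like the one above:

# Number the candidate corners in route order
is_corner = !is.na(tight_corners$firstish) & tight_corners$firstish
tight_corners$corner_num = NA
tight_corners$corner_num[is_corner] = seq_len(sum(is_corner))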

There’s all sorts of fun to be had here, I think!

Idle Reflections on Sensemaking Around Sporting Events, Part 1: Three Phases of Sports Event Journalism

Tinkering with motorsport data again has, as is the way of these things, also got me thinking about (sports) journalism again. In particular, a portion of what I’m tinkering with relates to ideas associated with "automated journalism" (aka "robot journalism"), a topic that I haven’t been tracking so much over the last couple of years and I should probably revisit (for a previous consideration, see Notes on Robot Churnalism, Part I – Robot Writers).

But as well as that, it’s also got me thinking more widely about what sort of a thing sports journalism is, the sensemaking that goes on around it, and how automation might be used to support that sensemaking.

My current topic of interest is rallying, most notably the FIA World Rally Championship (WRC), but also rallying in more general terms, including, but not limited to, the Dakar Rally, the FIA European Rally Championship (ERC), and various British rallies that I follow, whether as a fan, spectator or marshal.

This post is the first in what I suspect will be an ad hoc series of posts following a riff on the idea of a sporting event as a crisis situation in which fans want to make sense of the event and journalists mediate, concentrate and curate information release and help to interpret the event. In an actual crisis, the public might want to make sense of an event in order to moderate their own behaviour or inform actions they should take, or they may purely be watching events unfold without any requirement to modify their behaviour.

So how does the reporting and sensemaking unfold?

Three Phases of Sports Event Journalism

I imagine that "event" journalism is a well categorised thing amongst communications and journalism researchers, and I should probably look up some scholarly references around it, but it seems to me that there are several different ways in which a sports journalist could cover a rally event and the sporting context it is situated in, such as a championship, or even a wider historical context ("best rallies ever", "career history" and so on).

In Seven characteristics defining online news formats: Towards a typology of online news and live blogs, Digital Journalism, 6(7), pp.847-868, 2018, Thorsen, E. & Jackson, D. characterise live event coverage in terms of "the vernacular interaction audiences would experience when attending a sporting event (including build-up banter, anticipation, commentary of the event, and emotive post-event analysis)".

More generally, it seems to me that there are three phases of reporting: pre-event, on-event, and post-event. And it also seems to me that each one of them has access to, and calls on, different sorts of dataset.

In the run up to an event, a journalist may want to set the championship and historical context, reviewing what has happened in the season to date, what changes might result to the championship standings, and how a manufacturer or crew have performed on the same rally in previous years; they may want to provide a technical context, in terms of recent updates to a car, or a review of how the environment may affect performance (for example, How very low ambient temperatures impact on the aero of WRC cars); or they may want to set the scene for the sporting challenge likely to be provided by the upcoming event — in the case of rallying, this is likely to include a preview of each of the stages (for example, Route preview: WRC Arctic Rally, 2021), as well as the anticipated weather! (A journalist covering an international event may also consider a wider social or political view around, or potential economic impact on, the event location or host country, but that is out-of-scope for my current consideration.)

Once the event starts, the sports journalist may move into live coverage as well as rapid analysis, and, for multi-day events, backward looking session, daily or previous day reviews, and forward looking next day / later today previews. For WRC rallies, live timing gives updates to timing and results data as stages run, with split times appearing on a particular stage as they are recorded, along with current stage rankings and time gaps. Stage level timing and results data from a large range of international and national rallies is more generally available, in near real-time, from the ewrc-results.com rally results database. For large international rallies, live GPS traces, with updates refreshing every few seconds on the WRC+ live tracker map, also provide a source of near real time location data. In some cases, "championship predictions" will be available showing what the championship status would be if the event were to finish with the competitors in their current positions. One other feature of WRC and ERC events is that drivers often give short, to-camera interviews at the end of each stage, as well as more formal "media zone" interviews after each loop. Often, the drivers or co-drivers themselves, or their social media teams, will post social media updates, as will the official teams. Fans on-stage may also post social media footage and commentary in near real-time.

The event structure also allows for review and preview opportunities throughout the event. Each day of a stage rally tends to be segmented into loops, each typically of three or four stages. Loops are often repeated, typically with a service or other form of regroup (including tyre and light fitting regroups) in-between. This means that the same stages are often run twice, although in many cases the state of the surface may have changed significantly between loops. (Gravel roads start off looking like tarmac; they end up being completely shredded, with twelve inch deep and twelve inch wide ruts carved into what looks like a black pebble beach…)

In the immediate aftermath of the event, a complete set of timing and results data will be available, along with crew and team boss interviews and updated championship standings. At this point, there is an opportunity for a quick to press event review (in Formula One, the Grand Prix + magazine is published within a few short hours of the end of the race), followed by more leisurely analysis of what happened during the event, along with counterfactual speculation about what could have happened if things had gone differently or different choices had been made, in the days following the event.

Throughout each phase, explainer articles may also be used as fillers to raise general background understanding of the sport, as well as specific understanding of the generics of the sport that may be relevant to an actual event (for example, for a winter rally, an explainer article on studded snow tyres).

Fractal Reporting and the Macroscopic View

One thing that is worth noting is that the same reporting structures may appear at different scales in a multi-day event. The review-preview-live-review model works at the overall event level, (previous event, upcoming event, on-event, review event), day level (previous event, upcoming day, on-day, review day), intra-day level (previous loop, upcoming loop, on-loop, review loop), intra-session level (previous stage, upcoming stage, on-stage, review stage) and intra-stage level (previous driver, upcoming driver, on-driver, review driver).

One of the graphical approaches I value for exploring datasets is the ability to take a macroscopic view, where you can zoom out to get an overall view of an event as well as being able to zoom in to a particular part of the event.

My own tinkerings with rally timing and results information have the intention not only of presenting the information as a glanceable summary, but also of presenting the material in a way that supports story discovery using macroscope style tools that work at different levels.

By making certain things pictorial, a sports journalist may scan the results table for potential story points, or even story lines: what happened to driver X in stage Y? See how driver Z made steady progress from a miserable start to end up finishing well? And so on.

Rally timing and stage results review chartable.

The above chart summarises timing data at an event level, with the evolution of the rally positions tracked at the stage level. Where split times exist within a stage, a similar sort of chartable can be used to summarise evolution within a stage by tracking times at the splits level.

These "fractal" views thus provide the same sort of view over an event but at different levels of scale.

What Next?

Such are the reporting phases available to the sports journalist; but as I hope to explore in future posts, I believe there is also a potential for crossover in the research or preparation that journalists, event organisers, competitors and fans alike might indulge in, or benefit from when trying to make sense of an event.

In the next post in this series, I’ll explore in more detail some of the practices involved in each phase, and start to consider how techniques used for collaborative sensemaking and developing situational awareness in a crisis might relate to making sense of a sporting event.

Thinks: Symbolic Dynamics for Categorising Rally Stage Wiggliness?

Many years ago, I had the privilege of attending a month long complex systems summer school organised by the Santa Fe Institute. One of the lecture series presented was by Michael Jordan and from it I remember a couple of really powerful concepts, if not the detail. One was the Bayes Ball, and the other was symbolic dynamics.

I’ve briefly tinkered with a very simple symbolic dynamics representation before, in an attempt to come up with signatures for identifying different sorts of simple dynamics for summarising a driver’s performance in rally stages (e.g. Detecting Features in Data Using Symbolic Coding and Regular Expression Pattern Matching), and I’ve started wondering again about whether the approach might also be useful in trying to capture something of the wiggliness of rally stage routes.

To this end, the following quote looks relevant, even if it does come from a paper on heart rate dynamics in rats:

Symbolic Dynamics

The symbolic dynamics method, proposed by Porta, aims to convert the CI and SAP series in a sequence of symbols and evaluates the dynamics of each three consecutive symbols (words). First, a procedure known as uniform quantization is applied to the CI or SAP series, where the full range of values is divided into six equal levels. Each quantization level is represented by a symbol (0 to 5) and all points within the same level will be assigned the same symbol. Next, sequences of three consecutive symbols (words) are evaluated and classified according to its variation pattern: zero variation (0V), one variation (1V), two like variations (2LV) or two unlike variations (2UV).

The 0V family comprises words where there is no variation between symbols, i.e., all symbols are equal. The sequences {0,0,0} and {3,3,3} are examples of sequences from this class. The 1V family represents words that have only one variation from one symbol to another, i.e. sequences with two consecutive equal symbols and one different. Examples of sequences of this family are {5,2,2} and {0,0,1}. The 2LV family is composed of words containing three different symbols but with the same variations direction, i.e. in ascending or descending order. Examples of sequences of this family are {1,2,5} and {3,2,1}. Lastly, 2UV family comprises sequences that form a peak or a valley, i.e. with two different variations, in opposite directions. The sequences {2,4,2} and {3,0,1} are examples of this family.

Once this classification is made for the entire series, the percentage of patterns classified in each family is used for analysis.

Silva, L.E.V., Geraldini, V.R., de Oliveira, B.P. et al. Comparison between spectral analysis and symbolic dynamics for heart rate variability analysis in the rat. Sci Rep 7, 8428 (2017). https://doi.org/10.1038/s41598-017-08888-w

So, something to play with there: three tuple sequences and the changes within them, which could perhaps be useful for identifying right-left-right / left-right-left sections in a route etc. Hmm…
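As a rough sketch of how that word classification might look in R, the function below quantises an arbitrary numeric signal (for example, the convexity index sampled along a route) into six levels and tallies the 0V/1V/2LV/2UV families, following the description in the quote; the function name and the six-level choice are just illustrative:

# Minimal sketch of the symbolic dynamics word classification described above,
# applied to an arbitrary numeric signal sampled along a route
symbolic_words = function(x, n_levels = 6) {
  # uniform quantisation of the full range into n_levels symbols (0 .. n_levels-1)
  symbols = cut(x, breaks = n_levels, labels = FALSE, include.lowest = TRUE) - 1

  # classify each sequence of three consecutive symbols (a "word")
  classify = function(a, b, c) {
    d1 = b - a; d2 = c - b
    if (d1 == 0 && d2 == 0) "0V"            # no variation
    else if (d1 == 0 || d2 == 0) "1V"       # one variation
    else if (sign(d1) == sign(d2)) "2LV"    # two like variations
    else "2UV"                              # two unlike variations (peak / valley)
  }

  words = sapply(seq_len(length(symbols) - 2), function(i)
    classify(symbols[i], symbols[i + 1], symbols[i + 2]))

  # percentage of words in each family
  round(100 * table(words) / length(words), 1)
}

# e.g. symbolic_words(routeConvTable$ConvexityIndex)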

Thinks Another: Using Spectrograms to Identify Stage Wiggliness?

Last night I started wondering about ways in which I might be able to use signal processing (Fourier analysis) or symbolic dynamics (eg Thinks: Symbolic Dynamics for Categorising Rally Stage Wiggliness?) to help categorise the nature of rally stage twistiness.

Over a morning coffee break, I reminded myself of spectrograms, graphical devices that chunk a time series into a sequence of steps, and then display a frequency plot of each part. Which got me wondering: could I use a spectrogram to segment a stage route and analyse the spectrum of some signal taken along the route to identify wiggliness at that part of the stage?

If I’m reading it right [I wasn’t… the distances were wrong for a start: note to self – check the default parameter settings!], I think the following spectrogram does show some possible differences in wiggliness for different segments along the stage?

Spectrogram of a signal taken along the stage route

The question then becomes: what signal (as a function of distance along line) to use? The above spectrogram is based on the perpendicular distance of the route from the straight line connecting the start and end points of the route.

library(sf)     # st_linestring, st_sfc, st_distance, st_crs
library(dplyr)  # %>% pipe
library(trajr)  # TrajRediscretize

# trj is a trajr route
straight = st_linestring(data.matrix(rbind(head(trj[,c('x','y')], 1),
                                           tail(trj[,c('x','y')], 1))))

straight_sf = st_sfc(straight,
                     crs=st_crs(utm_routes))

trj_d = TrajRediscretize(trj, 10)
utm_discretised = trj_d %>% 
                    sf::st_as_sf(coords = c("x","y")) %>% 
                    sf::st_set_crs(st_crs(utm_routes[route_index,]))

# Get the rectified distance from the midline
# Can we also get whether it's to left or right?
perp_distances = data.frame(d_ = st_distance(utm_discretised,
                                             straight_sf))
# Returned distance is given as units
perp_distances$d = as.integer(perp_distances$d_)

perp_distances$i = 10 * (1:nrow(perp_distances))
#perp_distances$i = units::set_units(10 * (1:nrow(perp_distances)), 'm')

We can then apply something like a high pass filter:

library(signal)

# High pass filter
bf <- butter(2, 0.9, type="high")
perp_distances$d_hi <- filter(bf, perp_distances$d)

and generate the spectrogram shown above:

# We could just plot this direct
spec = specgram(perp_distances$d_hi)

# Or make pretty
# Via:https://hansenjohnson.org/post/spectrograms-in-r/
library(oce)
# discard phase information
P = abs(spec$S)

# normalize
P = P/max(P)

# convert to dB
P = 10*log10(P)

# config time axis
t = spec$t

# plot spectrogram
imagep(x = t,
       y = spec$f,
       z = t(P),
       col = oce.colorsViridis,
       ylab = 'Frequency [Hz]',
       xlab = 'Time [s]',
       drawPalette = T,
       decimate = F
)

However, it would possibly make more sense to use something like the angle of turn, convexity index, or radius of curvature at each 10m step as the signal…
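For example, here’s a minimal sketch using the turning angle at each step of the rediscretised trajectory (TrajAngles() is from the trajr package) as the signal, fed into the same spectrogram pipeline; whether this is a better choice than the perpendicular distance is exactly the open question:

# One possible alternative signal: the turning angle at each (10m) step of the
# rediscretised trajectory, rather than the perpendicular distance from the midline
turn_angles = TrajAngles(trj_d)   # turning angle (radians) between consecutive segments

# feed the angle signal into the same spectrogram pipeline as before
spec_angles = specgram(turn_angles)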

Hmmm…


Related: Rapid ipywidgets Prototyping Using Third Party Javascript Packages in Jupyter Notebooks With jp_proxy_widget (example of a wavesurfer.js spectrogram js app widgetised for use in Jupyter notebooks).

If you listen to that track it’s really interesting seeing how the imagery maps onto the sound. Eg in the above image you can see a lag in an edge between right and left channels towards the end of the trace, which translates to hearing an effect in the left channel echoed a moment later in the right.

Which makes me think: could I use telemetry from two drivers as left and right stereo tracks and try to sonify the telemetry differences between them using distance along stage as the x axis value and some mapping of different telemetry channels onto frequency…? For example, brake on the bass, throttle at the top, and lateral acceleration in the mid-range?

From Visual Impressions to Visual Opinions

In The Analytics Trap I scribbled some notes on how I like using data not as a source of "truth", but as a lens, or a perspective, from a particular viewpoint.

One idea I’ve increasingly noticed being talked about explicitly across various software projects I follow is the idea of opinionated software and opinionated design.

According to the Basecamp bible, Getting Real, [th]e best software takes sides. … [Apps should] have an attitude. This seems to lie at the heart of opinionated design.

A blog post from 2015, The Rise of Opinionated Software, presents a widely shared definition: Opinionated Software is a software product that believes a certain way of approaching a business process is inherently better and provides software crafted around that approach. Other widely shared views relate to software design: opinionated software should have "a view" on how things are done and should enforce that view.

So this idea of opinion is perhaps one we can riff on.

I’ve been playing with data for years, and one of the things I’ve believed, throughout, in my opinionated way, is that it’s an unreliable and opinionated witness.

In the liminal space between wake and sleep this morning, I started wondering about how visualisations in particular could range from providing visual impressions to visual opinions.

For example, here’s a view of a rally stage, overlaid onto a map:

This sort of thing is widely recognisable to anybody who has used an online map, or anyone who has seen a printed map and drawn a route on it.

Example interactive map view

Here’s a visual impression of just the route:

View of route

Even this view is opinionated because the co-ordinates are projected to a particular co-ordinate system, albeit the one we are most familiar with when viewing online maps; but other projections are available.

Now here’s a more opinionated view of the route, with it cut into approximately 1km segments:

Or the chart can express an opinion about where it thinks significant left and right hand corners are:

The following view has strong opinions about how to display each kilometre section: not only does it make claims about where it thinks significant right and left corners are, it also rotates each segment so that the start and end points of the section lie on the same horizontal line:

Another viewpoint brings in another dimension: elevation. It also transforms the flat 2D co-ordinates of each point along the route to a 1-D distance-along-route measure allowing us to plot the elevation against a 1-D representation of the route in a 2D (1D!) line chart.

Again, the chart expresses an opinion about where the significant right and left corners are. The chart also chooses not to be as helpful as it could be: if vertical grid lines corresponded to the start and end distance-into-stage values of the segmented plots, it would be easier to see how this chart relates to the 1km segmented sections.

At this point, you may say that the points are "facts" from the data, but again, they really aren’t. There are various ways of trying to define the intensity of a turn, and there may be various ways of calculating any particular measure that give slightly different results. Many definitions rely on particular parameter settings (for example, if you measure radius of curvature from three points on a route, how far apart should those points be? 1m? 10m? 20m? 50m?).
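By way of illustration, here’s one possible radius of curvature calculation, the circumradius of the circle through three points along the route; the helper function and the example points are hypothetical, and the answer depends directly on how far apart the three sample points are:

# One possible "radius of curvature" calculation: the circumradius of the circle
# through three points along the route; the spacing between the points is the
# parameter choice discussed above
radius_of_curvature = function(p1, p2, p3) {
  a = sqrt(sum((p2 - p3)^2))  # side lengths of the triangle p1-p2-p3
  b = sqrt(sum((p1 - p3)^2))
  c = sqrt(sum((p1 - p2)^2))
  s = (a + b + c) / 2
  area = sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
  if (area == 0) return(Inf)                    # collinear points: a "straight"
  (a * b * c) / (4 * area)                      # circumradius
}

# e.g. three points sampled 10m apart along an easting/northing route
radius_of_curvature(c(0, 0), c(10, 1), c(20, 0))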

The "result" is only a "fact" insofar as it represents the output of a particular calculation of a particular measure using a particular set of parameters, things that are typically not disclosed in chart labels, often aren’t mentioned in chart captions, and may or may not be disclosed in the surrounding text.

On the surface, the chart is simply expressing an opinion about how tight any of the particular corners are. If we take it at face value, and trust that its opinion is based on reasonable foundations, then we can accept (or not accept) the chart’s opinion about where the significant turns are.

If we were really motivated to understand the chart’s opinion further, and if we had access to the code that generated it, we could start to probe its definition of "significant curvature" to see if we agree with the principles on which the chart has based its opinion. But in most cases, we don’t do that. We take the chart for what it is, typically accept it for what it appears to say, and ascribe some sort of truth to it.

But at the end of the day, it’s just an opinion.

The charts were generated using R based on ideas inspired by Visualising WRC Rally Stages With rayshader and R [repo].

When Less is More: Data Tables That Make a Difference

In the previous post, From Visual Impressions to Visual Opinions, I gave various examples of charts that express opinions. In this post, I’ll share a few examples of how we can take a simple data table and derive multiple views from it that each provide a different take on the same story (or does that mean, tells different stories from the same set of "facts"?)

Here’s the original, base table, showing the recorded split times from a single rally stage. The time is the accumulated stage time to each split point (i.e. the elapsed stage time you see for a driver as they reach each split point):

From this, we immediately note the ordering (more on this in another post), which doesn’t seem particularly useful. It is, in fact, the road order (i.e. the order in which each driver started the stage).

We also note that the final split is not the actual final stage time: the final split in this case was a kilometer or so before the stage end. So from the table, we can’t actually determine who won the stage.

Making a Difference

The times presented are the actual split times. But one thing we may be more interested in is the differences, to see how far ahead or behind one driver another driver was at a particular point. We can subtract one driver’s time from another’s to find this difference. For example, how did the times at each split compare to first on road Ogier’s (OGI)?

Note that we can “rebase” the table relative to any driver by subtracting the required driver’s row from every other row in the original table.

From this “rebased” table, which has fewer digits (less ink) in it than the original, we can perhaps more easily see who was in the lead at each split, specifically, the person with the minimum relative time. The minimum value is trivially the most negative value in a column (i.e. at each split), or, if there are no negative values, the zero value.
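Here’s a minimal sketch of that rebasing arithmetic; the driver codes and times are made up purely for illustration:

# Hypothetical accumulated split times (seconds), one row per driver; the
# split_N column stands in for the final stage time. Numbers are made up.
splits_wide = data.frame(row.names = c("OGI", "TAN", "SOL"),
                         split_1 = c(145.0, 144.2, 146.1),
                         split_2 = c(290.3, 288.0, 289.5),
                         split_N = c(440.8, 437.5, 438.9))

# Rebase relative to a chosen driver by subtracting their row from every row
rebased = sweep(splits_wide, 2, as.numeric(splits_wide["OGI", ]), "-")

# The leader at each split is the row with the minimum (most negative, or zero) value
leaders = rownames(rebased)[apply(rebased, 2, which.min)]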

As well as subtracting one row from every other row to find the differences relative to a specified driver, we can also subtract the first column from the second, the second from the third, and so on, to find the time it took to get from one split point to the next (we subtract 0 from the first split point time, since the elapsed time into stage at the start of the stage is 0 seconds).

The above table shows the time taken to traverse the distance from one split point to the next; the extra split_N column is based on the final stage time. Once again, we could subtract one row from all the other rows to rebase these times relative to a specified driver, showing the difference in time it took each driver to traverse each split section.
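A rough sketch of that column differencing, reusing the hypothetical splits_wide frame from the sketch above (where the split_N column stands in for the final stage time):

# Accumulated split times -> split section (between-split) times:
# difference between consecutive columns, with 0 subtracted from the first split
section_times = as.data.frame(t(apply(splits_wide, 1, function(x) diff(c(0, x)))))

# Rebase the section times relative to a chosen driver, e.g. OGI
rebased_sections = sweep(section_times, 2, as.numeric(section_times["OGI", ]), "-")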

As well as rebasing relative to an actual driver, we can also rebase relative to variously defined “ultimate” drivers. For example, if we find the minimum of each of the “split traverse” table columns, we create a dummy driver whose split section times represent the ultimate quickest times taken to get from one split to the next. We can then subtract this dummy row from every row of the split section times table:

In this case, the 0 in the first split tells us who got to the first split first, but then we lose information (without further calculation) about anything other than relative performance on each split section traverse. Zeroes in the other columns tell us who completed that particular split section traverse in the quickest time.

Another class of ultimate time dummy driver is the accumulated ultimate section time driver. That is, take the ultimate split sections then find the cumulative sum of them. These times then represent the dummy elapsed stage times of an ultimate driver who completed each split in the fastest split section time. If we rebase against that dummy driver:

In this case, there may be only a single 0, specifically at the first split.

A third possible ultimate dummy driver is the one who “as if” recorded the minimum actual elapsed time at each split. Again, we can rebase according to that driver:

In this case, there will be at least one zero in each column (for the driver who recorded the fastest elapsed time at that split).
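As a rough sketch of the three “ultimate” dummy drivers, again using the hypothetical splits_wide and section_times frames from the earlier sketches:

# 1. Ultimate split section driver: fastest traverse of each split section
ultimate_sections = apply(section_times, 2, min)
rebased_vs_ultimate_sections = sweep(section_times, 2, ultimate_sections, "-")

# 2. Accumulated ultimate section time driver: cumulative sum of the fastest sections
ultimate_accumulated = cumsum(ultimate_sections)
rebased_vs_ultimate_accumulated = sweep(splits_wide, 2, ultimate_accumulated, "-")

# 3. Fastest actual elapsed time at each split
ultimate_elapsed = apply(splits_wide, 2, min)
rebased_vs_ultimate_elapsed = sweep(splits_wide, 2, ultimate_elapsed, "-")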

Visualising the Difference

Viewing the above tables as purely numerical tables is fine as far as it goes, but we can also add visual cues to help us spot patterns, and different stories, more readily.

For example, looking at times rebased to the ultimate split section dummy driver, we get the following:

We see that SOL was flying from the second split onwards, getting from one split to another in pretty much the fastest time after a relatively poor start.

The variation in columns may also have something interesting to say. SOL somehow made time against pretty much everyone between splits 4 and 5, but in the other sections (apart from the short last section to the finish), there is quite a lot of variability. Checking this view against a split sectioned route map might help us understand whether there were particular features of the route that might explain these differences.

How about if we visualise the accumulated ultimate split section time dummy driver?

Here, we see that TAN was recording the best time compared to the ultimate time as calculated against the sum of best split section times, but was still off the ultimate pace: it was his first split that made the difference.

How about if we rebase against the dummy driver that represents the driver with the fastest actual recorded accumulated time at each split:

Here, we see that TAN led the stage at each split point based on actual accumulated time.

Remember, all these stories were available in the original data table, but sometimes it takes a bit of differencing to see them clearly…