# Getting My Eye In Around F1 Quali Data – Parallel Coordinate Plots, Sort of…

Looking over the sector times from the qualifying session for tomorrow’s Hungarian Grand Prix, I noticed that Vettel was only fastest in one of the sectors.

I was looking for an easy way of shaping an R data frame so that I could plot the categorical values sector1, sector2, sector3 on the x-axis, with a line for each driver showing their time in each sector on the y-axis. (I still haven’t worked out how to do that. Any hints? Add them to the comments please… ;-) Along the way, I came across a variant of the parallel coordinate plot hidden away in the lattice package:

What this plot does, for each row (i.e. each driver), is take the values from separate columns (i.e. the times from each sector), normalise them column by column, and then plot lines between the normalised values, one “axis” per column; each row is drawn as a separate line.
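To make the normalisation step concrete, here is a minimal sketch. The data frame `a`, the driver labels, and the times in it are all made up for illustration, and `rescale01` is a hypothetical helper, not something from lattice. It rescales each sector column to the range 0 to 1, which is my understanding of what the plot does by default, and then draws the plot itself using `parallelplot()`, the name newer versions of lattice use for `parallel()`.

```r
library(lattice)

# Made-up sector times (seconds) for three hypothetical drivers
a <- data.frame(
  driver  = c("A", "B", "C"),
  sector1 = c(28.6, 28.9, 29.1),
  sector2 = c(29.4, 29.2, 29.9),
  sector3 = c(22.4, 22.6, 22.5)
)
sectors <- c("sector1", "sector2", "sector3")

# Hypothetical helper: rescale a numeric vector so min -> 0 and max -> 1
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Normalise each sector column independently
normed <- as.data.frame(lapply(a[sectors], rescale01))
print(normed)

# One line per row (driver), one axis per column (sector)
p <- parallelplot(~ a[sectors])
print(p)
```

Because each column is rescaled on its own, a gap of a few hundredths in one sector can look as large as a gap of several tenths in another.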

The normalisation obviously hides the magnitude of the differences between the time deltas in each sector (the min-max range might be hundredths in one sector, tenths in another), but this plot does show us that there are different groupings of cars – there is clear(?!) white space in the diagram:

Whilst the parallel co-ordinate plot helps identify groupings of cars, and shows where they may have similar performance, it isn’t so good at helping us get an idea of which sector had most impact on the final lap time. For this, I think we need to have a single axis in seconds showing the delta from the fastest time in the sector. That is, we should have a parallel plot where the parallel axes share the same scale but, in terms of sector time, each has its own floating origin (so e.g. the origin for one sector might be 28.6s and for another, 22.4s). For convenience, I’d also like to see the deltas shown on the y-axis, and the categorical ranges arranged on the x-axis (in contrast to the diagrams above, where the different ranges appear along the y-axis).
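One possible way of getting at that sort of chart, sketched in base R rather than lattice: subtract the fastest time in each sector from every time in that sector, then plot the deltas with the sectors along the x-axis and one line per driver. The data frame `a` and the values in it are made up for illustration; this is a sketch under those assumptions rather than a finished chart.

```r
# Made-up sector times (seconds); real values would come from the timing sheets
a <- data.frame(
  driver  = c("A", "B", "C"),
  sector1 = c(28.6, 28.9, 29.1),
  sector2 = c(29.4, 29.2, 29.9),
  sector3 = c(22.4, 22.6, 22.5)
)
sectors <- c("sector1", "sector2", "sector3")

times <- as.matrix(a[sectors])
rownames(times) <- a$driver

# The fastest time in each sector acts as that axis's floating origin
fastest <- apply(times, 2, min)

# Delta to the fastest time, column by column; all columns now share a scale
deltas <- sweep(times, 2, fastest, "-")

# matplot draws one line per matrix column, so transpose to get one per driver
matplot(t(deltas), type = "b", lty = 1, pch = 19, xaxt = "n",
        xlab = "", ylab = "Delta to fastest in sector (s)")
axis(1, at = seq_along(sectors),
     labels = paste0(sectors, " (", fastest, "s)"))
legend("topleft", legend = rownames(deltas),
       col = seq_len(nrow(deltas)), lty = 1, pch = 19)
```

Every sector axis now starts at zero and is measured in seconds, so a steep line between two sectors corresponds to a real change in the time lost rather than an artefact of rescaling.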

PS I also wonder to what extent we can identify signatures for the different teams? E.g. the fifth and sixth slowest cars in sector 1 have the same signature across all three sectors and come from the same team; and the third and fourth slowest cars in sector 2 have a very similar signature (and again, represent the same team).

Where else might we look for signatures? In the speed traps maybe? Here’s what the parallel plot for the speed traps looks like:

(In this case, max is better = faster speed.)

To combine the views (timings and speed), we might use a formulation of the flavour:

    library(lattice)  # parallel() was later renamed parallelplot()
    parallelplot(~data.frame(a$sector1, a$sector2, a$sector3, -a$inter1, -a$inter2, -a$finish, -a$trap))

This is a bit too cluttered to pull much out of though? I wonder if changing the order of parallel axes might help, e.g. by trying to come up with an order that minimises the number of crossed lines?
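As a rough sketch of what searching for such an order might look like: two lines cross between a pair of adjacent axes when the two cars swap order from one axis to the next, so we can count the swaps for each candidate ordering and keep the ordering with the fewest. Everything here is an assumption for illustration: the data frame `d` and its values are made up, and `crossings`, `total_crossings` and `perms` are hypothetical helpers. It simply tries every permutation, which is fine for a handful of axes but not for many.

```r
# Made-up values for four hypothetical cars on four hypothetical axes
d <- data.frame(
  w = c(1, 2, 3, 4),
  x = c(4, 3, 2, 1),
  y = c(1, 2, 4, 3),
  z = c(4, 3, 1, 2)
)

# Crossings between two adjacent axes: pairs of rows whose order on one
# axis is reversed on the other (ties are not counted)
crossings <- function(u, v) {
  pairs <- combn(length(u), 2)
  sum((u[pairs[1, ]] - u[pairs[2, ]]) * (v[pairs[1, ]] - v[pairs[2, ]]) < 0)
}

# Total crossings for a given left-to-right ordering of column names
total_crossings <- function(d, ord) {
  sum(vapply(seq_len(length(ord) - 1),
             function(i) crossings(d[[ord[i]]], d[[ord[i + 1]]]),
             numeric(1)))
}

# All permutations of a vector (brute force, small inputs only)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

orders <- perms(names(d))
scores <- vapply(orders, function(ord) total_crossings(d, ord), numeric(1))

# First ordering that achieves the minimum number of crossings
best <- orders[[which.min(scores)]]
print(best)
print(min(scores))
```

Columns that are plotted negated, as the speeds are in the call above, would need to be negated before counting, since flipping an axis changes which lines cross.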

And if we colour lines by team, can we see any characteristics?

Using a dashed, rather than solid, line makes the chart a little easier to read (more white space). Using a thicker line also helps bring out the colours.

    # parallelplot() is the newer name for lattice's parallel()
    parallelplot(~data.frame(a$sector1, -a$inter1, -a$inter2, a$sector2, a$sector3, -a$finish, -a$trap), col=a$team, lty=2, lwd=2)
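If the team colours need to be controlled more explicitly, one option is to build the colour vector by hand, one entry per row. The sketch below also greys out every team except the ones we want to pick out, which might make individual signatures easier to see. The data frame, team names, and colour choices are all made up for illustration, and I'm assuming, as in the call above, that `col` is applied row by row.

```r
library(lattice)

# Made-up data: six hypothetical cars from three hypothetical teams
a <- data.frame(
  team    = c("T1", "T1", "T2", "T2", "T3", "T3"),
  sector1 = c(28.6, 28.7, 28.9, 29.0, 29.3, 29.4),
  sector2 = c(29.2, 29.3, 29.6, 29.5, 30.0, 30.1),
  sector3 = c(22.4, 22.5, 22.6, 22.6, 22.9, 23.0)
)

# Teams to pick out; every other team is drawn in light grey
highlight <- c("T2")

# Hypothetical colour for each team, looked up by team name
team_cols <- c(T1 = "red", T2 = "blue", T3 = "darkgreen")
teams <- as.character(a$team)
line_cols <- ifelse(teams %in% highlight, team_cols[teams], "grey80")

# One colour per row, with the dashed, thicker lines used above
p <- parallelplot(~ a[c("sector1", "sector2", "sector3")],
                  col = line_cols, lty = 2, lwd = 2)
print(p)
```

Changing `highlight` to a different team, or to a pair of teams, redraws the same chart with a different set of traces picked out.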

Here’s another ordering of the axes:

Here are the sector times ordered by team (min is better):

Here are the speeds by team (max is better):

Again, we can reorder this to try to make it easier(?!) to pull out team signatures:

(I wonder – would it make sense to try to order these based on similarity, e.g. as derived from a circuit guide?)

Hmmm… I need to ponder this…

## Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

## 2 thoughts on “Getting My Eye In Around F1 Quali Data – Parallel Coordinate Plots, Sort of…”

1. Graphical representations – very good. Up to a point. When the data becomes sufficiently complex, the eye is unable to pick out key patterns. At that point, is it better to resort to numerical summaries – e.g. median sector times by driver?

1. @alan agreed – it would be good if I could find a way of greying out traces from all but one or two teams. (What I was exploring was the extent to which we could identify team signatures in the time and speed data…)

BTW, do you have any idea how I could do the floating origin/same scale parallel chart in R?