Detecting Undercuts in F1 Races Using R

One of the things that’s been on my to-do list for some time is identifying tactical or strategic events within a race that might be detected automatically. One such event is the undercut, which F1 journalist James Allen describes in the following terms (The secret of undercut and offset):

An undercut is where Driver A leads Driver B, but Driver B turns into the pits before Driver A and changes to new tyres. As Driver A is ahead, he’s unaware that this move is coming until it’s too late to react and he has passed the pit lane entry.
On fresh tyres, Driver B then drives a very fast “Out” lap from the pits. Driver A will react to the stop and pit on the next lap, but his “In” lap time will have been set on old tyres, so will be slower. As he emerges from the pit lane after his stop, Driver B is often narrowly ahead of him into the first corner.

In logical terms, we might characterise this as follows:

    • two drivers, d1 and d2: d1 != d2;
    • d1 pits on lap X, and drives an outlap on lap X+1;
    • d1’s position on their pitlap (lap X) is greater than d2’s position on the same lap X (that is, d1 is behind d2);
    • d2 pits on lap X+1, with an outlap on lap X+2;
    • d2’s position on their outlap (lap X+2) is greater than d1’s position on the same lap X+2 (that is, d2 is now behind d1).

We can generalise this formulation, and try to make it more robust, by comparing positions on the lap prior to d1’s stop (lap A) with the positions on d2’s outlap (lap B):

    • two drivers, d1 and d2: d1 != d2;
    • d1 pits on lap A+1;
    • d1’s position on their “prelap” (lap A), the lap prior to their pitlap (lap A+1), is greater than d2’s position on lap A; this condition tries to ensure that d1 is behind d2 as they enter the pit stop phase, but it misses the effect of any first lap stops (unless we add a lap 0 containing the grid positions);
    • d1’s outlap is on lap A+2;
    • d2 pits on lap B-1, within the inclusive range [lap A+2, lap A+1+N] with N>=1 (that is, within N laps of d1’s stop), with an outlap on lap B; the parameter N allows us to test for changes of position within a pit stop window, rather than requiring that d2 pits on the lap immediately following d1’s stop;
    • d2’s position on their outlap (lap B, in the inclusive range [lap A+3, lap A+2+N]) is greater than d1’s position on the same lap B.
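As a rough illustration of these conditions, here is a minimal procedural sketch in base R. The function name is_undercut, the toy laps data frame and the pit_laps vector are all invented for this example; only the column names (driverId, lap, position) follow the lap data used later in the post.

```r
# Hypothetical toy data: positions per lap for two drivers (values invented).
laps <- data.frame(
  driverId = rep(c("d1", "d2"), each = 6),
  lap      = rep(1:6, times = 2),
  position = c(2, 2, 2, 3, 1, 1,   # d1: starts behind d2, ends ahead
               1, 1, 1, 1, 2, 2)   # d2: starts ahead of d1, ends behind
)
# Hypothetical pit laps: d1 stops on lap 3 (A+1), d2 on lap 4 (B-1).
pit_laps <- c(d1 = 3, d2 = 4)

# Check the generalised conditions for one ordered pair of drivers.
is_undercut <- function(laps, pit_laps, d1, d2, N = 5) {
  if (d1 == d2) return(FALSE)
  A <- pit_laps[[d1]] - 1   # d1's prelap
  B <- pit_laps[[d2]] + 1   # d2's outlap
  # d2's outlap must fall in the inclusive range [A+3, A+2+N]
  if (A < 1 || B < A + 3 || B > A + 2 + N) return(FALSE)
  pos <- function(d, l) laps$position[laps$driverId == d & laps$lap == l]
  p <- c(pos(d1, A), pos(d2, A), pos(d1, B), pos(d2, B))
  if (length(p) != 4) return(FALSE)   # a lap is missing for one driver
  # d1 behind d2 on lap A, and ahead of d2 on lap B
  p[1] > p[2] && p[3] < p[4]
}

is_undercut(laps, pit_laps, "d1", "d2")   # TRUE
is_undercut(laps, pit_laps, "d2", "d1")   # FALSE
```

Checking every ordered pair of drivers this way would need a double loop, which is part of the appeal of expressing the same conditions as a single declarative query.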

One way of implementing these constraints is to write a declarative style query that specifies the conditions we want the solution to meet, rather than writing a procedural program to find such an answer. Using the sqldf package, we can use a SQL query to achieve just this result.

One way of writing the query is to create two situations, a and b, where situation a corresponds to the lap before d1 stops (lap A), and situation b corresponds to d2’s outlap (lap B). We then capture the data for each driver in each situation, to give four data states: d1a, d1b, d2a, d2b. These states are then subjected to the conditions specified above (using N=5).

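#First get laptime data from the ergast API (2015, round 9: the British Grand Prix)
#lapsData.df() and pitsData.df() are helper functions that are not defined in this post and must be loaded first
lapTimes=lapsData.df(2015,9)

#Now find pit times
p=pitsData.df(2015,9)

#Merge pit data with lap data, keeping every lap (left join)
lapTimesp=merge(lapTimes, p, by = c('lap','driverId'), all.x=TRUE)

#Flag pit laps: a lap is a pit lap if the merged pit record is present
lapTimesp$ps = !is.na(lapTimesp$milliseconds)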

#Ensure laps for each driver are sorted
library(plyr)
lapTimesp=arrange(lapTimesp, driverId, lap)

#do an offset on the laps that are pitstops for each driver
#to set outlap flags for each driver
lapTimesp=ddply(lapTimesp, .(driverId), transform, outlap=c(FALSE, head(ps,-1)))

#identify lap before pit lap by reverse sorting
lapTimesp=arrange(lapTimesp, driverId, -lap)
#So we can do an offset going the other way
lapTimesp=ddply(lapTimesp, .(driverId), transform, prelap=c(FALSE, head(ps,-1)))

#tidy up
lapTimesp=arrange(lapTimesp,acctime)

#Now we can run the SQL query
library(sqldf)
ss=sqldf('SELECT d1a.driverId AS d1, d2a.driverId AS d2, \
            d1a.lap AS A, d1a.position AS d1posA, d2a.position AS d2posA, \
            d2b.lap AS B, d1b.position AS d1posB, d2b.position AS d2posB \
          FROM lapTimesp d1a, lapTimesp d1b, lapTimesp d2a, lapTimesp d2b \
          WHERE d1a.driverId=d1b.driverId AND d2a.driverId=d2b.driverId \
            AND d1a.driverId!=d2a.driverId \
            AND d1a.prelap AND d1a.lap=d2a.lap AND d2b.outlap AND d2b.lap=d1b.lap \
            AND (d1a.lap+3<=d1b.lap AND d1b.lap<=d1a.lap+2+5) \
            AND d1a.position>d2a.position AND d1b.position < d2b.position')
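The outlap and prelap flags used in the query are built by shifting each driver’s pit flag by one lap in either direction. The sketch below shows the same idea on invented data, using base R’s ave() rather than plyr, so it is an alternative formulation and not the code used above. The toy data frame and its values are assumptions made purely for illustration.

```r
# Hypothetical toy data: two drivers, five laps each, one pit stop apiece.
toy <- data.frame(
  driverId = rep(c("a", "b"), each = 5),
  lap      = rep(1:5, times = 2),
  ps       = c(FALSE, FALSE, TRUE,  FALSE, FALSE,   # driver a pits on lap 3
               FALSE, TRUE,  FALSE, FALSE, FALSE)   # driver b pits on lap 2
)

# Laps must be in ascending order within each driver before shifting.
toy <- toy[order(toy$driverId, toy$lap), ]

# Outlap: the pit flag shifted forward by one lap, per driver.
toy$outlap <- ave(toy$ps, toy$driverId,
                  FUN = function(x) c(FALSE, head(x, -1)))

# Prelap: the pit flag shifted back by one lap, per driver.
# Shifting in this direction avoids the need to reverse sort.
toy$prelap <- ave(toy$ps, toy$driverId,
                  FUN = function(x) c(tail(x, -1), FALSE))

toy[toy$ps | toy$outlap | toy$prelap, ]
```

For driver a, who pits on lap 3, the prelap flag lands on lap 2 and the outlap flag on lap 4; for driver b, who pits on lap 2, they land on laps 1 and 3.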

For the 2015 British Grand Prix, here’s what we get:

          d1         d2  A d1posA d2posA  B d1posB d2posB
1  ricciardo      sainz 10     11     10 13     12     13
2     vettel      kvyat 13      8      7 19      8     10
3     vettel hulkenberg 13      8      6 20      7     10
4      kvyat hulkenberg 17      6      5 20      9     10
5   hamilton      massa 18      3      1 21      2      3
6   hamilton     bottas 18      3      2 22      1      3
7     alonso   ericsson 36     11     10 42     10     11
8     alonso   ericsson 36     11     10 43     10     11
9     vettel     bottas 42      5      4 45      3      5
10    vettel      massa 42      5      3 45      3      4
11     merhi    stevens 43     13     12 46     12     13

With a five-lap window, we have evidence that supports successful undercuts in several cases, including VET taking KVY and HUL with his early stop at lap 13+1 (KVY pitting on lap 19-1 and HUL on lap 20-1), and MAS and BOT both being taken first by HAM’s stop at lap 18+1 and then by VET’s stop at lap 42+1.
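The five-lap window is hard-coded in the query above as 2+5. If we wanted to experiment with other window sizes, one option is to build the query text from a parameter. This is only a sketch: undercut_query is a hypothetical helper name, and it assumes the lapTimesp data frame and its prelap and outlap columns exist as constructed earlier.

```r
# Hypothetical helper: returns the undercut query text for a window of N laps.
undercut_query <- function(N = 5) {
  stopifnot(is.numeric(N), length(N) == 1, N >= 1)
  sprintf("SELECT d1a.driverId AS d1, d2a.driverId AS d2,
             d1a.lap AS A, d1a.position AS d1posA, d2a.position AS d2posA,
             d2b.lap AS B, d1b.position AS d1posB, d2b.position AS d2posB
           FROM lapTimesp d1a, lapTimesp d1b, lapTimesp d2a, lapTimesp d2b
           WHERE d1a.driverId=d1b.driverId AND d2a.driverId=d2b.driverId
             AND d1a.driverId!=d2a.driverId
             AND d1a.prelap AND d1a.lap=d2a.lap
             AND d2b.outlap AND d2b.lap=d1b.lap
             AND (d1a.lap+3<=d1b.lap AND d1b.lap<=d1a.lap+2+%d)
             AND d1a.position>d2a.position AND d1b.position<d2b.position",
          as.integer(N))
}

# Build the query for a three-lap window.
q3 <- undercut_query(3)

# To run it (requires the sqldf package and the lapTimesp data frame):
# library(sqldf)
# ss3 <- sqldf(q3)
```

A narrower window should return fewer candidate pairs, and those that remain are more likely to reflect a direct response by d2 to d1’s stop.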

To make the results easier to read, we could instead have the query return the pit laps directly, selecting d1a.lap+1 AS d1Pitlap and d2b.lap-1 AS d2Pitlap.
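The same pit lap columns can also be derived in R after the query has run. The sketch below uses two rows copied from the results table above (VET against KVY and HUL); the data frame name ss_example is just a stand-in for the real result ss.

```r
# Two rows copied from the results table above.
ss_example <- data.frame(
  d1 = c("vettel", "vettel"),
  d2 = c("kvyat", "hulkenberg"),
  A  = c(13, 13),   # d1's prelap
  B  = c(19, 20)    # d2's outlap
)

# d1 pits on the lap after their prelap; d2 pits on the lap before their outlap.
ss_example$d1Pitlap <- ss_example$A + 1
ss_example$d2Pitlap <- ss_example$B - 1

ss_example[, c("d1", "d2", "d1Pitlap", "d2Pitlap")]
```

This gives VET pitting on lap 14, with KVY following on lap 18 and HUL on lap 19.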

The query doesn’t guarantee that the pit stop was responsible for the change in order, but it does at least give us some prompts as to where we might look.
