OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Posts Tagged ‘f1datajunkie

Paths to the F1 2012 Championship Based on How They Might Finish in the US Grand Prix

If you haven’t already seen it, one of the breakthrough visualisations of the US elections was the New York Times Paths to the Election scenario builder. With the F1 drivers’ championship in the balance this weekend, I wondered what chances were of VET claiming the championship this weekend. The only contender is ALO, who is currently ten points behind.

A quick Python script shows the outcome depending on the relative classification of ALO and VET at the end of today’s race. (If the drivers are 25 points apart, and ALO then wins in Brazil with VET out of the points, I think VET will win on countback based on having won more races.)

#The current points standings
vetPoints=255
aloPoints=245

#The points awarded for each place in the top 10; 0 points otherwise
points=[25,18,15,12,10,8,6,4,2,1,0]

#Print a header row (there's probably a more elegant way of doing this...;-)
for x in ['VET\ALO',1,2,3,4,5,6,7,8,9,10,'11+']: print str(x)+'\t',
print ''

#I'm going to construct a grid, VET's position down the rows, ALO across the columns
for i in range(len(points)):
	#Build up each row - start with VET's classification
	row=[str(i+1)]
	#Now for the columns - that is, ALO's classification
	for j in range(len(points)):
		#Work out the points if VET is placed i+1  and ALO at j+1 (i and j start at 0)
		#Find the difference between the points scores
		#If the difference is >= 25 (the biggest points diff ALO could achieve in Brazil), VET wins
		if ((vetPoints+points[i])-(aloPoints+points[j])>=25):
			row.append("VET")
		else: row.append("?")
	#Print the row a slightly tidier way...
	print '\t'.join(row)

(Now I wonder – how would I write that script in R?)

And the result?

VET\ALO	1	2	3	4	5	6	7	8	9	10	11+	
1	?	?	?	?	VET	VET	VET	VET	VET	VET	VET
2	?	?	?	?	?	?	?	?	VET	VET	VET
3	?	?	?	?	?	?	?	?	?	?	VET
4	?	?	?	?	?	?	?	?	?	?	?
5	?	?	?	?	?	?	?	?	?	?	?
6	?	?	?	?	?	?	?	?	?	?	?
7	?	?	?	?	?	?	?	?	?	?	?
8	?	?	?	?	?	?	?	?	?	?	?
9	?	?	?	?	?	?	?	?	?	?	?
10	?	?	?	?	?	?	?	?	?	?	?
11	?	?	?	?	?	?	?	?	?	?	?

Which is to say, VET wins if:

  • VET wins the race and ALO is placed 5th or lower;
  • VET is second in the race and ALO is placed 9th or lower;
  • VET is third in the race and ALO is out of the points (11th or lower)

We can also look at the points differences (define a row2 as row, then use row2.append(str((vetPoints+points[i])-(aloPoints+points[j])))):

VET\ALO	1	2	3	4	5	6	7	8	9	10	11+	
1	10	17	20	23	25	27	29	31	33	34	35
2	3	10	13	16	18	20	22	24	26	27	28
3	0	7	10	13	15	17	19	21	23	24	25
4	-3	4	7	10	12	14	16	18	20	21	22
5	-5	2	5	8	10	12	14	16	18	19	20
6	-7	0	3	6	8	10	12	14	16	17	18
7	-9	-2	1	4	6	8	10	12	14	15	16
8	-11	-4	-1	2	4	6	8	10	12	13	14
9	-13	-6	-3	0	2	4	6	8	10	11	12
10	-14	-7	-4	-1	1	3	5	7	9	10	11
11	-15	-8	-5	-2	0	2	4	6	8	9	10

We could then do a similar exercise for the Brazil race, and essentially get all the information we need to do a scenario builder like the New York Times election scenario builder… Which I would try to do, but I’ve had enough screen time for the weekend already…:-(

PS FWIW, here’s a quick table showing the awarded points difference between two drivers depending on their relative classification in a race:

A\B	1	2	3	4	5	6	7	8	9	10	11+
1	X	7	10	13	15	17	19	21	23	24	25
2	-7	X	3	6	8	10	12	14	16	17	18
3	-10	-3	X	3	5	7	9	11	13	14	15
4	-13	-6	-3	X	2	4	6	8	10	11	12
5	-15	-8	-5	-2	X	2	4	6	8	9	10
6	-17	-10	-7	-4	-2	X	2	4	6	7	8
7	-19	-12	-9	-6	-4	-2	X	2	4	5	6
8	-21	-14	-11	-8	-6	-4	-2	X	2	3	4
9	-23	-16	-13	-10	-8	-6	-4	-2	X	1	2
10	-24	-17	-14	-11	-9	-7	-5	-3	-1	X	1
11	-25	-18	-15	-12	-10	-8	-6	-4	-2	-1	X

Here’s how to use this chart in association with the previous. Looking at the previous chart, if VET finishes second and ALO third, the points difference is 13 in favour of VET. Looking at the chart immediately above, if we let VET = A and ALO = B, then the columns correspond to ALO’s placement, and the rows to VET. VET (A) needs to lose 14 or more points to lose the championship (that is, we’re looking for values of -14 or less). In particular, ALO (B, columns) needs to finish 1st with VET (A) 5th or worse, 2nd with A 8th or worse, or 3rd with VET 10th or worse.

And the script:

print '\t'.join(['A\B','1','2','3','4','5','6','7','8','9','10','11+'])
for i in range(len(points)):
	row=[str(i+1)]
	for j in range(len(points)):
		if i!=j:row.append(str(points[i]-points[j]))
		else: row.append('X')

And now for the rest of the weekend…

Written by Tony Hirst

November 18, 2012 at 12:59 pm

Posted in Infoskills, Tinkering

Tagged with ,

The Race to the F1 2012 Drivers’ Championship – Initial Sketches

In part inspired by the chart described in The electoral map sans the map, I thought I’d start mulling over a quick sketch showing the race to the 2012 Formula One Drivers’ Championship.

The chart needs to show tension somehow, so in this first really quick and simple rough sketch, you really do have to put yourself in the graph and start reading it from left to right:

The data is pulled in from the Ergast API as JSON data, which is then parsed and visualised using R:

require(RJSONIO)
require(ggplot2)

#initialise a data frame
champ <- data.frame(round=numeric(),
                 driverID=character(), 
                 position=numeric(), points=numeric(),wins=numeric(),
                 stringsAsFactors=FALSE)

#This is a fudge at the moment - should be able to use a different API call to 
#get the list of races to date, rather than hardcoding latest round number
for (j in 1:18){
  resultsURL=paste("http://ergast.com/api/f1/2012/",j,"/driverStandings",".json",sep='')
  print(resultsURL)
  results.data.json=fromJSON(resultsURL,simplify=FALSE)
  rd=results.data.json$MRData$StandingsTable$StandingsLists[[1]]$DriverStandings
  for (i in 1:length(rd)){
    champ=rbind(champ,data.frame(round=j, driverID=rd[[i]]$Driver$driverId,
                               position=as.numeric(as.character(rd[[i]]$position)),
                                points=as.numeric(as.character(rd[[i]]$points)),
                                                  wins=as.numeric(as.character(rd[[i]]$wins)) ))
  }
}
champ

#Horrible fudge - should really find a better way of filtering?
test2=subset(champ,( driverID=='vettel' | driverID=='alonso' | driverID=='raikkonen'|driverID=='webber' | driverID=='hamilton'|driverID=='button' ))

#Really rough sketch, in part inspired by http://junkcharts.typepad.com/junk_charts/2012/11/the-electoral-map-sans-the-map.html
ggplot(test2)+geom_line(aes(x=round,y=points,group=driverID,col=driverID))+labs(title="F1 2012 - Race to the Championship")

#I wonder if it would be worth annotating the chart with labels explaining any DNF reasons at parts where points stall?

So, that’s the quickest and dirtiest chart I could think of – where to take this next? One way would be to start making the chart look cleaner; another possibility would be to start looking at adding labels, highlights, and maybe pushing all but ALO and VET into the background? (GDS do some nice work in this vein, eg Updating the GOV.UK Performance Dashboard; this StoryTellingWithData post on stacked bar charts also has some great ideas about how to make simple, clean and effective use of text and highlighting…).

Let’s try cleaning it up a little, and then highlight the championship contenders?

test3=subset(test,( driverID=='vettel' | driverID=='alonso' ))
test4=subset(test,( driverID=='raikkonen'|driverID=='webber' | driverID=='hamilton'|driverID=='button' ))

ggplot(test4) + geom_line(aes(x=round,y=position,group=driverID),col='lightgrey') + geom_line(data=test3,aes(x=round,y=position,group=driverID,col=driverID)) + labs(title="F1 2012 - Race to the Championship")

Hmm… I’m not sure about those colours? Maybe use Blue for VET and Red for ALO?

I really hacked the path to this – there must be a cleaner way?!

ggplot(test4)+geom_line(aes(x=round,y=points,group=driverID),col='lightgrey') + geom_line(data=subset(test3,driverID=='vettel'),aes(x=round,y=points),col='blue') + geom_line(data=subset(test3,driverID=='alonso'),aes(x=round,y=points),col='red') + labs(title="F1 2012 - Race to the Championship")

Other chart types are possible too, I suppose? Such as something in the style of a lap chart?

ggplot(test2)+geom_line(aes(x=round,y=position,group=driverID,col=driverID))+labs(title="F1 2012 - Race to the Championship")

Hmmm… Just like the first sketch, this one is cluttered and confusing too… How about if we clean it as above to highlight just the contenders?

ggplot(test4) + geom_line(aes(x=round,y=points,group=driverID),col='lightgrey') + geom_line(data=test3,aes(x=round,y=points,group=driverID,col=driverID)) + labs(title="F1 2012 - Race to the Championship")

A little cleaner, maybe? And with the colour tweak:

ggplot(test4) + geom_line(aes(x=round,y=position,group=driverID),col='lightgrey') + geom_line(data=subset(test3,driverID=='vettel'),aes(x=round,y=position),col='blue') + geom_line(data=subset(test3,driverID=='alonso'),aes(x=round,y=position),col='red') + labs(title="F1 2012 - Race to the Championship")

Something that really jumps out at me in this chart are the gridlines – they really need fixing? But what would be best to show?

Hmm, before we do that, how about an animation? (Does WordPress.com allow animated gifs?)

Here’s the code (it requires the animation package):

library(animation)
race.ani= function(...) {
  for (i in 1:18) {
    g=ggplot(subset(test3, round<=i)) + geom_line(aes(x=round,y=position,group=driverID),col='lightgrey')+geom_line(data=subset(test3,driverID=='vettel' & round<=i),aes(x=round,y=position),col='blue')+geom_line(data=subset(test3,driverID=='alonso' & round <=i),aes(x=round,y=position),col='red')+labs(title="F1 2012 - Race to the Championship")+xlim(1,18)
    print(g)
  }
}
saveMovie(race.ani(), interval = 0.4, outdir = getwd())

And for the other chart:

Hmmm…

How’s about another sort of view – the points difference between VET and ALO?

alo=subset(test3,driverID=='alonso')
vet=subset(test3,driverID=='vettel')
colnames(vet)=c("round","driverID","vposition","vpoints","vwins")
colnames(alo)=c("round","driverID","aposition","apoints","awins")
cf= merge(alo,vet,by=c('round'))
ggplot(cf) + geom_bar( aes(x=round,y=vpoints-apoints,fill=(vpoints-apoints)>0), stat='identity') + labs(title="F1 2012 Championship - VET vs ALO")

Written by Tony Hirst

November 16, 2012 at 11:59 pm

Posted in Rstats

Tagged with ,

F1 2012 Mid-Season Review

Rather belatedly, I got around to posting a series of posts summarising the Formula One season to date:

It’s also worth comparing the charts to the F1 2011 Season Review charts.

The R-code used to generate the graphics can be found here: F1 2012 Mid-Season Review – R Markdown.

Comments/suggestions/code improvements and extensions etc all welcome…

Written by Tony Hirst

August 30, 2012 at 8:26 am

Posted in Anything you want, Rstats

Tagged with

F1 Championship Points as a d3.js Powered Sankey Diagram

d3.js crossed my path a couple of times yesterday: firstly, in the form of an enquiry about whether I’d be interested in writing a book on d3.js (I’m not sure I’m qualified: as I responded, I’m more of a script kiddie who sees things I can reuse, rather than have any understanding at all about how d3.js does what it does…); secondly, via a link to d3.js creator Mike Bostock’s new demo of Sankey diagrams built using d3.js:

Hmm… Sankey diagrams are good for visualising flow, so to get to grips myself with seeing if I could plug-and-play with the component, I needed an appropriate data set. F1 related data is usually my first thought as far as testbed data goes (no confidences to break, the STEM/innovation outreach/tech transfer context, etc etc) so what things flow in F1? What quantities are conserved whilst being passed between different classes of entity? How about points… points are awarded on a per race basis to drivers who are members of teams. It’s also a championship sport, run over several races. The individual Driver Championship is a competition between drivers to accumulate the most points over the course of the season, and the Constructor Chanmpionship is a battle between teams. Which suggests to me that a Sankey plot of points from races to drivers and then constructors might work?

So what do we need to do? First up, look at the source code for the demo using View Source. Here’s the relevant bit:

Data is being pulled in from a relatively addressed file, energy.json. Let’s see what it looks like:

Okay – a node list and an edge list. From previous experience, I know that there is a d3.js JSON exporter built into the Python networkx library, so maybe we can generate the data file from a network representation of the data in networkx?

Here we are: node_link_data(G) “[r]eturn data in node-link format that is suitable for JSON serialization and use in Javascript documents.”

Next step – getting the data. I’ve already done a demo of visualising F1 championship points sourced from the Ergast motor racing API as a treemap (but not blogged it? Hmmm…. must fix that) that draws on a JSON data feed constructed from data extracted from the Ergast API so I can clone that code and use it as the basis for constructing a directed graph that represents points allocations: race nodes are linked to driver nodes with edges weighted by points scored in that race, and driver nodes are connected to teams by edges weighted according to the total number of points the driver has earned so far. (Hmm, that gives me an idea for a better way of coding the weight for that edge…)

I don’t have time to blog the how to of the code right now – train and boat to catch – but will do so later. If you want to look at the code, it’s here: Ergast Championship nodelist. And here’s the result – F1 Chanpionship 2012 Points as a Sankey Diagram:

See what I mean about being a cut and paste script kiddie?!;-)

Written by Tony Hirst

May 24, 2012 at 8:39 am

Visualising F1 Telemetry Data and Plotting Latitude and Longitude with ggplot Map Projections in R

Why don’t X-Y plots of latitude and longitude data look “right” compared to traditional map views?

For example, here’s an X-Y scatterplot of some of Jenson Button’s McLaren telemetry data from the 2010 Australian Formula One Grand Prix:

The image was generated, from a data file hosted on Google Spreadsheets, using the following R script, and the ggplot2 library:

require(ggplot2)
require(RCurl)

#gsqAPI is a helper function that loads data in from a shared as public Google Spreadsheet.
gsqAPI = function(key,query,gid=0){ return( read.csv( paste( sep="",'http://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid) ) ) }

#Provide the spreadsheet key
#Data was originally grabbed from the McLaren F1 Live Dashboard during the race and is Copyright (�) McLaren Marketing Ltd 2010 (I think? Or possibly Vodafone McLaren Mercedes F1 2010(?)). I believe that speed, throttle and brake data were sponsored by Vodafone.
key='0AmbQbL4Lrd61dER5Qnl3bHo4MkVNRlZ1OVdicnZnTHc'
#We can write SQL like queries over the spreadsheet, as described in http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/
q='select *'

#Run the query on the database
df=gsqAPI(key,q)

#Sanity check - preview the imported data
head(df)

#Example circuit map - sort of - showing the gLat (latitudinal 'g-force') values around the circuit (point size is absolute value of gLat, colour has two values, one for + and one for - values (swing to left and swing to right)).
g=ggplot(df) + geom_point(aes(x=NGPSLongitude,y=NGPSLatitude,col=sign(gLat),size=abs(gLat)))
print(g)

What’s lacking is a projection from the everyday Cartesian coordinate system to something like a Mercator based projection. Fortunately, the Grammar of Graphics model that underpins ggplot allows us to write the necessary co-ordinate system transformation into our chart generating command:

ggplot(df) + geom_point(aes(x=NGPSLongitude,y=NGPSLatitude,col=sign(gLat),size=abs(gLat))) + coord_map(project="mercator")

Here’s the result:

(Note: I haven’t totally got my head round what the different co-ordinate transforms do, or how they relate to any sort of ‘reality’! But they’re another thing I’m now aware of…;-)

As to what the chart shows? It’s a plot of how the latitudinal (left-right) ‘g-force’ acts on Button as he tours the circuit. The points are coloured according to whether the force acts from the left or the right (+ or -) and sized according to the magnitude of the force (away from normal?). So it points out left and right hand corners and how tight they are, essentially;-)

To show just how easy it is to write simple graphics and even statistical charts using ggplot, here are a few more examples:

#Example "driver DNA" trace, showing low gear  throttle usage (distance round track on x-axis, lap number on y axis, node size is inversely proportional to gear number (low gear, large point size), colour relativ to throttlepedal depression
g=ggplot(df) + geom_point(aes(x=sLap,y=Lap,col=rThrottlePedal,size=-NGear)) + scale_colour_gradient(low='red',high='green')
print(g)

I started calling things like the above chart “Driver DNA” charts – the x-axis represents distance round the track, the y-axis is lap number. In this case, nodes are sized inversely proportionally to the gear (so low gear, large pointsize) and coloured by throttle pedal pressure. You’ll notice how consistent Button is lap on lap. The idea behind the colouring/sizing for this chart was that it would provide a glimpse into behaviour around low gear turns.

#Example of gear value around the track
g=ggplot(df) + geom_line(aes(x=sLap,y=NGear))
print(g)

This chart soooo reminds me of simple op art…:-) I’m not sure how useful it is for showing gear selection according to distance round the track, but I just love the line of it:-)

#We can also show a trace for a single lap, such as speed coloured by gear
g=ggplot(subset(df,Lap==22)) + geom_line(aes(x=sLap,y=vCar,colour=NGear))
print(g)

ggplot can, of course, do line charts. In the above example, I make an intitial exploration into how line segment colour can be used to highlight gear selection that allows the car to reach the speed (y-axis) it does as it goes round the circuit (x is distance round lap).

#We can also do statistical graphics - like a boxplot showing the distribution of speed values by gear
g = ggplot(df) + geom_boxplot(aes(factor(NGear),vCar))
print(g)

ggplot isn’t just about literal graphing from data elements directly to marks on a canvas. It can also do stats as part of the mapping; in this case, we generate a boxplot that summarises the range of speeds achieved for different gear values.

#Footwork - brake and throttle pedal depression based on gear
g = ggplot(df) + geom_jitter(aes(factor(NGear),rThrottlePedal),colour='darkgreen') + geom_jitter(aes(factor(NGear),pBrakeF),colour='darkred')
print(g)

Some more dots… in the above case, I try to explore Button;s footwork in a little more detail, seeing how he applies brake and throttle pressure according to gear selection (throttle depression is green, brake is red).

#Forces on the driver
#gLong by brake and gear
g = ggplot(df) + geom_jitter(aes(factor(NGear),gLong,col=pBrakeF)) + scale_colour_gradient(low='red',high='green')
print(g)

In the diagrams immediately above and below, I try to show what sorts of longitudinal forces are typically experienced by Button according to gear selection, the idea being that we may get to see whether gears are used for linear acceleration or deceleration.

#gLong by throttle and gear
g = ggplot(df) + geom_jitter(aes(factor(NGear),gLong,col=rThrottlePedal)) + scale_colour_gradient(low='red',high='green')
print(g)

#gLong boxplot
ggplot(df) + geom_boxplot(aes(factor(NGear),gLong))+ geom_jitter(aes(factor(NGear),gLong),size=1)

Here, I use a boxplot to to try to see whether or not the longitudinal g-force is typically experienced under acceleration or braking by gear. Note that the points are scattered according to random jitter about their actual, integer values.

Finally, here’s a look at how engine RPM and the car speed relate to gear selection. Would you be able to work out how to write this diagram?

Here’s how I did it…

#How do engine revs and speed relate to gear selction?
ggplot(df)+geom_point(aes(x=nEngine,y=vCar,col=factor(NGear)))

Hopefully what this quick tour of ggplot has illustrated is how easy it can be to generate a wide range of charts from the same data set. May all your charts be written, and then generated directly from their source data;-)

PS I did try to generate these images via CloudStat, but for some reason the images didn’t generate properly [WORKS NOW - THANKS @cloudstatorg] (it seemed to work fine when I tried just a couple? Here’s the link anyway: Cloudstat: F1 telemetry demo. The plots were generated in RStudio and saved using ggsave()

As to how this fits in with other things? Regular readers may remember the occasional rant I’ve had about the importance of providing the queries that map from data sets onto summary data tables so that the means by which the summaries were generated from open raw data are made transparent. In a similar way, data cleansing tools such as Google Refine or Stanford Data Wrangler allow you to log the transformations applied to a messy raw data set in order to get it into a state where you can actually work with it. The same is true of images. It’s far too easy to generate complex graphics from even more complex datasets, and then forget how the image was actually created, maybe what it even represents. By writing the diagram, essentially generating a query that maps from the data onto the visual representation provided by the generated diagram or chart, we preserve the audit trail from data to the chart output.

In the same way we might imagine the word equation:

DATA + QUERY = REPORT

we might also imagine:

DATA + GRAPHER = CHART

(You get the idea? The words are probably not the right words, but the sentiment is there…)

In an academic research setting, where it’s common to find lists of figures presented separately in books or theses, it would also make sense to include a ‘figure appendix’ which gives, for example, the ggplot or ggplot commands required to generate each statistical graphic presented from the actual data source.

Sigh…if only my first name was Damien and I could persuade my minions to do the Lichtenstein thing and paint-by-numbers over some large projections of some of the more aesthetically interesting charts;-)

Written by Tony Hirst

March 14, 2012 at 2:14 pm

Posted in Rstats, Visualisation

Tagged with

Over on F1DataJunkie, 2011 Season Review Doodles…

Things have been a little quiet, post wise here, of late, in part because of the holiday season… but I have been posting notes on a couple of charts in progress over on the F1DataJunkie blog. Here are links to the posts in chronological order – they capture the evolution of the chart design(s) to date:

You can find a copy of the data I used to create the charts here: F1 2011 Year in Review spreadsheet.

I used R to generate the charts (scripts are provided and/or linked to from the posts, or included in the comments – I’ll tidy them and pop them into a proper Github repository if/when I get a chance), loading the data in to RStudio using this sort of call:

require(RCurl)

gsqAPI = function(key,query,gid=0){ return( read.csv( paste( sep="",'http://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', curlEscape(gid) ), na.strings = "null" ) ) }

key='0AmbQbL4Lrd61dEd0S1FqN2tDbTlnX0o4STFkNkc0NGc'
sheet=4

qualiResults2011=gsqAPI(key,'select *',sheet)

If any other folk out there are interested in using R to wrangle with F1 data, either from 2011 or looking forward to 2012, let me know and maybe we could get a script collection going on Github:-)

Written by Tony Hirst

December 30, 2011 at 3:59 pm

Posted in Data, digital storytelling, Rstats

Tagged with ,

Getting My Eye In Around F1 Quali Data – Parallel Coordinate Plots, Sort of…

Looking over the sector times from the qualifying session for tomorrow’s Hungarian Grand Prix, I noticed that Vettel was only fastest in one of the sectors.

Whilst looking for an easy way of shaping an R data frame so that I could plot categorical values sector1, sector2, sector3 on the x-axis, and then a line for each driver showing their time in the sector on the y-axis (I still haven’t worked out how to do that? Any hints? Add them to the comments please…;-), I came across a variant of the parallel coordinate plot hidden away in the lattice package:

f1 2011 hun quali sector times parralel coordinate plot

What this plot does is for each row (i.e. each driver) take values from separate columns (i.e. times from each sector), normalise them, and then plot lines between the normalised value, one “axis” per column; each row defines a separate category.

The normalisation obviously hides the magnitude of the differences between the time deltas in each sector (the min-max range might be hundredths in one sector, tenths in another), but this plot does show us that there are different groupings of cars – there is clear(?!) white space in the diagram:

Clusters in sectors times

Whilst the parallel co-ordinate plot helps identify groupings of cars, and shows where they may have similar performance, it isn’t so good at helping us get an idea of which sector had most impact on the final lap time. For this, I think we need to have a single axis in seconds showing the delta from the fastest time in the sector. That is, we should have a parallel plot where the parallel axes have the same scale, but in terms of sector time, a floating origin (so e.g. the origin for one sector might be 28.6s and for another, 22.4s). For convenience, I’d also like to see the deltas shown on the y-axis, and the categorical ranges arranged on the x-axis (in contrast to the diagrams above, where the different ranges appear along the y-axis).

PS I also wonder to what extent we can identify signatures for the different teams? Eg the fifth and sixth slowest cars in sector 1 have the same signature across all three sectors and come from the same team; and the third and fourth slowest cars in sector 2 have a very similar signature (and again, represent the same team).

Where else might we look for signatures? In the speed traps maybe? Here’s what the parallel plot for the speed traps looks like:

SPeed trap parallel plot

(In this case, max is better = faster speed.)

To combine the views (timings and speed), we might use a formulation of the flavour:

parallel(~data.frame(a$sector1,a$sector2,a$sector3, -a$inter1,-a$inter2,-a$finish,-a$trap))

Combined parallel plot - times and speeds

This is a bit too cluttered to pull much out of though? I wonder if changing the order of parallel axes might help, e.g. by trying to come up with an order than minimises the number of crossed lines?

And if we colour lines by team, can we see any characteristics?

Looking for patterns across teams

Using a dashed, rather than solid, line makes the chart a little easier to read (more white space). Using a thinking line also helps bring out the colours.

parallel(~data.frame(a$sector1,-a$inter1,-a$inter2,a$sector2,a$sector3, -a$finish,-a$trap),col=a$team,lty=2,lwd=2)

Here’s another ordering of the axes:

ANother attempt at ordering axes

Here are the sector times ordered by team (min is better):

Sector times coloured by team

Here are the speeds by team (max is better):

Speeds by team

Again, we can reorder this to try to make it easier(?!) to pull out team signatures:

Reordering speed traps

(I wonder – would it make sense to try to order these based on similarity eg derived from a circuit guide?)

Hmmm… I need to ponder this…

Written by Tony Hirst

July 30, 2011 at 11:45 pm

Marussia Virgin Racing F1 Factory Visit

Yesterday, I had the good fortune to visit the F1 Marussia Virgin Racing factory at Dinnington, near Sheffield, as a result of “winning” a luck dip competition run via GoMotorSport (part of a series of National Motorsport week promotions being run by the F1 teams based in the UK).

Marussia Virgin F1 Factory
[Thanks to @markhendy for the pic...]

Thanks to Finance Director Mark Hendy and engineer Shakey for the insight into the team’s operations:-)

Over the next few days and weeks, I’ll try to pick up on a few of the things I learned from the tour on the F1DataJunkie blog, tying them in to the corresponding technical regulations and other bits and pieces, but for now, here are some of the noticings I came away with…

- the engines aren’t that big, weighing 90kg or so and looking small than the engine in my own car…

- wheels are slotted onto the axles using a 3 pin mount on the front and a six(?) pin mount on the rear. (The engines are held on using a 6(?) point fixing.)

- the drivers aren’t that heavy either, weight wise (not that we met either of the drivers: neither Timo Glock nor Jerome D’Ambrosio are frequent visitors to the Dinnington factory, where the team’s cars are prepared fro before, and overhauled after, each race…): 70 kg or so. With cars prepared to meet racing weight regulations to a tolerance of 0.5kg or so, a large mixed grill and a couple of pints can make a big difference… (Hmm, I guess it would be easy enough to calculate the “big dinner weight effect” penalty on laptime?!)

I’m not sure if this was a “right-handed vs left-handed spanner” remark, but a comment was also made that the adhesive sponsor sticker can have a noticeable effect on the car’s aerodynamics as the corners become unstuck and start to flap. (Which made me wonder, of that is the case, is the shape of stickers taken into account? Is a leading edge on a label with a point/right angled corner rather than a smooth curve likely to come unstuck more easily, for example?!) Cars also need repainting every few races (stripping back to the carbon, and repainting afresh) because of pitting and chipping and other minor damage than can affect smooth airflow.

- side impact tubes are an integral part of the safety related design of the car:

- to track the usage of tyres during a race weekend, an FIA official scans a barcode on each tyre as it is used on the car:

The data junkie in me in part wonders whether this data could be made available in a timely fashion via the Pirelli website (or a Pirelli gadget on each team’s website) – or would that me giving away too much race intelligence to the other teams? That way, we could get an insight into the tyre usage over the course weekend…

- IT plays an increasingly important part of the the pit garage setup; local area networks (cabled and wifi?) are set up by each team for the weekend, the data engineers sitting behind the screen and viewing area in the garage (rather than having a fixed set up in one of the 5(?) trucks that attends each race.).

- the cars are rigged up with 60 or sensors; there is only redundancy on throttle and clutch sensors. Data analysis is in part provided through engineers provided by parts suppliers (McLaren Electronics, who supply the car’s ECU (and telemetry box(?)) provide a dedicated person(?) to support the team; data analysis is, in part, carried out using the Atlas (9?) Advanced Telemetry Linked Acquisition System from McLaren Electronic Systems. Data collected during a stint is transmitted under encryption back to the the pits, as well as being logged on the car itself. A full data dump is available to the team and the FIA scrutineers via an umbilical/wired connection when the car is pitted.

UST Global, one of the teams partners, also provide 3(?) data analysts to support the team during a race (presumably using UST Global’s “Race Management System”?).

- for design and testing, weekly reporting is required that conforms to a trade-off between the number of hours per week that each team can spend on wind tunnel testing (60 hours per week) and and CFD (“can’t find downforce”;-) simulation (40 teraflops per week). My first impression there was that efficient code could effectively mean more simulation testing?! (CFD via CSC? CSC expands relationship with Marussia Virgin Racing, doubling computing power for the team’s 2011 formula 1 season, or are things set to change with the replacement of Nick Wirth by Pat Symonds…?)

- the resource restriction agreement also limits the number of people who can work on the chassis. For a race weekend, teams are limited to 50 (47?) people. We were given a quick run down of at least (8?) engineer roles assigned to each car, but I forget them…

So – that’s a quick summary of some of the things I can remember off the top of my head…

…but here are a couple of other things to note that may be of interest…

Marussia Virgin are making the most of their Virgin partnership over the Silverstone race weekend with a camping party/Virgin Experience at Stowe School (Silverstone Weekend) and a hook-up with Joe Saward’s “An Audience With Joe“… (If you don’t listen to @sidepodcast’s An Aside With Joe podcast series, you should…;-)

The team has also got en education thing going with race ticket sweeteners for folk signing up to the course: Motorsport Management Online Course.

I can’t help thinking there may be a market for a “hardcore fans” course on F1 that could run over a race season and run as an informal, open online course… I still don’t really know how a car works, for example ;-)

Anyway – that’s by the by: thanks again to the GoMotorsport and the Marussia Virgin Racing team (esp. Mark Hendy and Shakey) for a great day out :-)

PS I think the @marussiavirgin team are trying to build up their social media presence too… to see who they’re listening to, here’s how their friends connect:

How friends of @marussiavirgin connect

;-)

Written by Tony Hirst

July 2, 2011 at 1:55 pm

Visualising Sports Championship Data Using Treemaps – F1 Driver & Team Standings

I *love* treemaps. If you’re not familiar with them, they provide a very powerful way of visualising categorically organised hierarchical data that bottoms out with a quantitative, numerical dimension in a single view.

For example, consider the total population of students on the degrees offered across UK HE by HESA subject code. As well as the subject level, we might also categorise the data according to the number of students in each year of study (first year, second year, third year).

If we were to tabulate this data, we might have columns: institution, HESA subject code, no. of first year students, no. of second year students, no. of third year students. We could also restructure the table so that the data was presented in the form: institution, HESA subject code, year of study, number of students. And then we could visualise it in a treemap… (which I may do one day… but not now; if you beat me to it, please post a link in the comments;-)

Instead, what I will show is how to visualise data from a sports championship, in particular the start of the Formula One 2011 season. This championship has the same entrants in each race, each a member of one of a fixed number of teams. Points are awarded for each race (that is, each round of the championship) and totalled across rounds to give the current standing. As well as the driver championship (based on points won by individual drivers) is the team championship (where the points contribution form drivers within a team is totalled).

Here’s what the results from the third round (China) looks like:

Driver Team Points
Lewis Hamilton McLaren-Mercedes 25
Sebastian Vettel RBR-Renault 18
Mark Webber RBR-Renault 15
Jenson Button McLaren-Mercedes 12
Nico Rosberg Mercedes 10
Felipe Massa Ferrari 8
Fernando Alonso Ferrari 6
Michael Schumacher Mercedes 4
Vitaly Petrov Renault 2
Kamui Kobayashi Sauber-Ferrari 1
Paul di Resta Force India-Mercedes 0
Nick Heidfeld Renault 0
Rubens Barrichello Williams-Cosworth 0
Sebastien Buemi STR-Ferrari 0
Adrian Sutil Force India-Mercedes 0
Heikki Kovalainen Lotus-Renault 0
Sergio Perez Sauber-Ferrari 0
Pastor Maldonado Williams-Cosworth 0
Jarno Trulli Lotus-Renault 0
Jerome d’Ambrosio Virgin-Cosworth 0
Timo Glock Virgin-Cosworth 0
Vitantonio Liuzzi HRT-Cosworth 0
Narain Karthikeyan HRT-Cosworth 0
Jaime Alguersuari STR-Ferrari 0

F1 2011 Results – China, © 2011 Formula One World Championship Ltd

We can represent data from across all the races using a table of the form:

Driver Team Points Race
Lewis Hamilton McLaren-Mercedes 25 China
Sebastian Vettel RBR-Renault 18 China
Felipe Massa Ferrari 10 Malaysia
Fernando Alonso Ferrari 8 Malaysia
Kamui Kobayashi Sauber-Ferrari 6 Malaysia
Michael Schumacher Mercedes 0 Australia
Pastor Maldonado Williams-Cosworth 0 Australia
Michael Schumacher Mercedes 0 Australia
Pastor Maldonado Williams-Cosworth 0 Australia
Narain Karthikeyan HRT-Cosworth 0 Australia
Vitantonio Liuzzi HRT-Cosworth 0 Australia

Sample of F1 2011 Results 2011, © 2011 Formula One World Championship Ltd

I’ve put a copy of the data to date at Many Eyes, IBM’s online interactive data visualisation site: F1 2011 Championship Points

Here’s what it looks like when we view it in a treemap visualisation:

The size of the boxes is proportional to the (summed) values within the hierarchical categories. In the above case, the large blocks are the total points awarded to each driver across teams and races. (The team field might be useful if a driver were to change team during the season.)

I’m not certain, but I think the Many Eyes treemap algorithm populates the map using a sorted list of summed numerical values taken through the hierarchical path from left to right, top to bottom. Which means top left is the category with the largest summed points. If this is the case, in the above example we can directly see that Webber is in fourth place overall in the championship. We can also look within each blocked area for more detail: for example, we can see Hamilton didn’t score as many points in Malaysia as he did in the other two races.

One of the nice features about the Many Eyes treemap is that it allows you to reorder the levels of the hierarchy that is being displayed. So for example, with a simple reordering of the labels we can get a view over the team championship too:

The Many Eyes treemap can be embedded in a web page (it’s a Java applet), although I’m not sure what, if any, licensing restrictions apply (I do know that the Guardian datastore blog embeds Many Eyes widgets on that site, though). Other treemap widgets are available (for example, Protovis and JIT both offer javascript enabled treemap displays).

What might be interesting would be to feed Protovis or the JIT with data dynamically form a Google Spreadsheet, for example, so that a single page could be used to display the treemap with the data being maintained in a spreadsheet.

Hmm, I wonder – does Google spreadsheets have a treemap gadget? Ooh – it does: treemap-gviz. It looks as if a bit of wrangling may be required around the data, but if the display works out then just popping the points data into a Google spreadsheet and creating the gadget should give an embeddable treemap display with no code required:-) (It will probably be necessary to format the data hierarchy by hand, though, requiring differently layed out data tables to act as source for individual and team based reports.)

So – how long before we see some “live” treemap displays for F1 results on the F1 blogs then? Or championship tables from other sports? Or is the treemap too confusing as a display for the uninitiated? (I personally don’t think so.. but then, I love macroscopic views over datasets:-)

PS see also More Olympics Medal Table Visualisations which includes a demonstration of a treemap visualisation over Olympic medal standings.

Written by Tony Hirst

April 28, 2011 at 11:38 am

Follow

Get every new post delivered to your Inbox.

Join 762 other followers