OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Posts Tagged ‘f1datajunkie

Visualising Sports Championship Data Using Treemaps – F1 Driver & Team Standings

I *love* treemaps. If you’re not familiar with them, they provide a very powerful way of visualising categorically organised hierarchical data that bottoms out with a quantitative, numerical dimension in a single view.

For example, consider the total population of students on the degrees offered across UK HE by HESA subject code. As well as the subject level, we might also categorise the data according to the number of students in each year of study (first year, second year, third year).

If we were to tabulate this data, we might have columns: institution, HESA subject code, no. of first year students, no. of second year students, no. of third year students. We could also restructure the table so that the data was presented in the form: institution, HESA subject code, year of study, number of students. And then we could visualise it in a treemap… (which I may do one day… but not now; if you beat me to it, please post a link in the comments;-)

Instead, what I will show is how to visualise data from a sports championship, in particular the start of the Formula One 2011 season. This championship has the same entrants in each race, each a member of one of a fixed number of teams. Points are awarded for each race (that is, each round of the championship) and totalled across rounds to give the current standing. As well as the driver championship (based on points won by individual drivers) is the team championship (where the points contribution form drivers within a team is totalled).

Here’s what the results from the third round (China) looks like:

Driver Team Points
Lewis Hamilton McLaren-Mercedes 25
Sebastian Vettel RBR-Renault 18
Mark Webber RBR-Renault 15
Jenson Button McLaren-Mercedes 12
Nico Rosberg Mercedes 10
Felipe Massa Ferrari 8
Fernando Alonso Ferrari 6
Michael Schumacher Mercedes 4
Vitaly Petrov Renault 2
Kamui Kobayashi Sauber-Ferrari 1
Paul di Resta Force India-Mercedes 0
Nick Heidfeld Renault 0
Rubens Barrichello Williams-Cosworth 0
Sebastien Buemi STR-Ferrari 0
Adrian Sutil Force India-Mercedes 0
Heikki Kovalainen Lotus-Renault 0
Sergio Perez Sauber-Ferrari 0
Pastor Maldonado Williams-Cosworth 0
Jarno Trulli Lotus-Renault 0
Jerome d’Ambrosio Virgin-Cosworth 0
Timo Glock Virgin-Cosworth 0
Vitantonio Liuzzi HRT-Cosworth 0
Narain Karthikeyan HRT-Cosworth 0
Jaime Alguersuari STR-Ferrari 0

F1 2011 Results – China, © 2011 Formula One World Championship Ltd

We can represent data from across all the races using a table of the form:

Driver Team Points Race
Lewis Hamilton McLaren-Mercedes 25 China
Sebastian Vettel RBR-Renault 18 China
Felipe Massa Ferrari 10 Malaysia
Fernando Alonso Ferrari 8 Malaysia
Kamui Kobayashi Sauber-Ferrari 6 Malaysia
Michael Schumacher Mercedes 0 Australia
Pastor Maldonado Williams-Cosworth 0 Australia
Michael Schumacher Mercedes 0 Australia
Pastor Maldonado Williams-Cosworth 0 Australia
Narain Karthikeyan HRT-Cosworth 0 Australia
Vitantonio Liuzzi HRT-Cosworth 0 Australia

Sample of F1 2011 Results 2011, © 2011 Formula One World Championship Ltd

I’ve put a copy of the data to date at Many Eyes, IBM’s online interactive data visualisation site: F1 2011 Championship Points

Here’s what it looks like when we view it in a treemap visualisation:

The size of the boxes is proportional to the (summed) values within the hierarchical categories. In the above case, the large blocks are the total points awarded to each driver across teams and races. (The team field might be useful if a driver were to change team during the season.)

I’m not certain, but I think the Many Eyes treemap algorithm populates the map using a sorted list of summed numerical values taken through the hierarchical path from left to right, top to bottom. Which means top left is the category with the largest summed points. If this is the case, in the above example we can directly see that Webber is in fourth place overall in the championship. We can also look within each blocked area for more detail: for example, we can see Hamilton didn’t score as many points in Malaysia as he did in the other two races.

One of the nice features about the Many Eyes treemap is that it allows you to reorder the levels of the hierarchy that is being displayed. So for example, with a simple reordering of the labels we can get a view over the team championship too:

The Many Eyes treemap can be embedded in a web page (it’s a Java applet), although I’m not sure what, if any, licensing restrictions apply (I do know that the Guardian datastore blog embeds Many Eyes widgets on that site, though). Other treemap widgets are available (for example, Protovis and JIT both offer javascript enabled treemap displays).

What might be interesting would be to feed Protovis or the JIT with data dynamically form a Google Spreadsheet, for example, so that a single page could be used to display the treemap with the data being maintained in a spreadsheet.

Hmm, I wonder – does Google spreadsheets have a treemap gadget? Ooh – it does: treemap-gviz. It looks as if a bit of wrangling may be required around the data, but if the display works out then just popping the points data into a Google spreadsheet and creating the gadget should give an embeddable treemap display with no code required:-) (It will probably be necessary to format the data hierarchy by hand, though, requiring differently layed out data tables to act as source for individual and team based reports.)

So – how long before we see some “live” treemap displays for F1 results on the F1 blogs then? Or championship tables from other sports? Or is the treemap too confusing as a display for the uninitiated? (I personally don’t think so.. but then, I love macroscopic views over datasets:-)

PS see also More Olympics Medal Table Visualisations which includes a demonstration of a treemap visualisation over Olympic medal standings.

Written by Tony Hirst

April 28, 2011 at 11:38 am

Marussia Virgin Racing F1 Factory Visit

Yesterday, I had the good fortune to visit the F1 Marussia Virgin Racing factory at Dinnington, near Sheffield, as a result of “winning” a luck dip competition run via GoMotorSport (part of a series of National Motorsport week promotions being run by the F1 teams based in the UK).

Marussia Virgin F1 Factory
[Thanks to @markhendy for the pic...]

Thanks to Finance Director Mark Hendy and engineer Shakey for the insight into the team’s operations:-)

Over the next few days and weeks, I’ll try to pick up on a few of the things I learned from the tour on the F1DataJunkie blog, tying them in to the corresponding technical regulations and other bits and pieces, but for now, here are some of the noticings I came away with…

- the engines aren’t that big, weighing 90kg or so and looking small than the engine in my own car…

- wheels are slotted onto the axles using a 3 pin mount on the front and a six(?) pin mount on the rear. (The engines are held on using a 6(?) point fixing.)

- the drivers aren’t that heavy either, weight wise (not that we met either of the drivers: neither Timo Glock nor Jerome D’Ambrosio are frequent visitors to the Dinnington factory, where the team’s cars are prepared fro before, and overhauled after, each race…): 70 kg or so. With cars prepared to meet racing weight regulations to a tolerance of 0.5kg or so, a large mixed grill and a couple of pints can make a big difference… (Hmm, I guess it would be easy enough to calculate the “big dinner weight effect” penalty on laptime?!)

I’m not sure if this was a “right-handed vs left-handed spanner” remark, but a comment was also made that the adhesive sponsor sticker can have a noticeable effect on the car’s aerodynamics as the corners become unstuck and start to flap. (Which made me wonder, of that is the case, is the shape of stickers taken into account? Is a leading edge on a label with a point/right angled corner rather than a smooth curve likely to come unstuck more easily, for example?!) Cars also need repainting every few races (stripping back to the carbon, and repainting afresh) because of pitting and chipping and other minor damage than can affect smooth airflow.

- side impact tubes are an integral part of the safety related design of the car:

- to track the usage of tyres during a race weekend, an FIA official scans a barcode on each tyre as it is used on the car:

The data junkie in me in part wonders whether this data could be made available in a timely fashion via the Pirelli website (or a Pirelli gadget on each team’s website) – or would that me giving away too much race intelligence to the other teams? That way, we could get an insight into the tyre usage over the course weekend…

- IT plays an increasingly important part of the the pit garage setup; local area networks (cabled and wifi?) are set up by each team for the weekend, the data engineers sitting behind the screen and viewing area in the garage (rather than having a fixed set up in one of the 5(?) trucks that attends each race.).

- the cars are rigged up with 60 or sensors; there is only redundancy on throttle and clutch sensors. Data analysis is in part provided through engineers provided by parts suppliers (McLaren Electronics, who supply the car’s ECU (and telemetry box(?)) provide a dedicated person(?) to support the team; data analysis is, in part, carried out using the Atlas (9?) Advanced Telemetry Linked Acquisition System from McLaren Electronic Systems. Data collected during a stint is transmitted under encryption back to the the pits, as well as being logged on the car itself. A full data dump is available to the team and the FIA scrutineers via an umbilical/wired connection when the car is pitted.

UST Global, one of the teams partners, also provide 3(?) data analysts to support the team during a race (presumably using UST Global’s “Race Management System”?).

- for design and testing, weekly reporting is required that conforms to a trade-off between the number of hours per week that each team can spend on wind tunnel testing (60 hours per week) and and CFD (“can’t find downforce”;-) simulation (40 teraflops per week). My first impression there was that efficient code could effectively mean more simulation testing?! (CFD via CSC? CSC expands relationship with Marussia Virgin Racing, doubling computing power for the team’s 2011 formula 1 season, or are things set to change with the replacement of Nick Wirth by Pat Symonds…?)

- the resource restriction agreement also limits the number of people who can work on the chassis. For a race weekend, teams are limited to 50 (47?) people. We were given a quick run down of at least (8?) engineer roles assigned to each car, but I forget them…

So – that’s a quick summary of some of the things I can remember off the top of my head…

…but here are a couple of other things to note that may be of interest…

Marussia Virgin are making the most of their Virgin partnership over the Silverstone race weekend with a camping party/Virgin Experience at Stowe School (Silverstone Weekend) and a hook-up with Joe Saward’s “An Audience With Joe“… (If you don’t listen to @sidepodcast’s An Aside With Joe podcast series, you should…;-)

The team has also got en education thing going with race ticket sweeteners for folk signing up to the course: Motorsport Management Online Course.

I can’t help thinking there may be a market for a “hardcore fans” course on F1 that could run over a race season and run as an informal, open online course… I still don’t really know how a car works, for example ;-)

Anyway – that’s by the by: thanks again to the GoMotorsport and the Marussia Virgin Racing team (esp. Mark Hendy and Shakey) for a great day out :-)

PS I think the @marussiavirgin team are trying to build up their social media presence too… to see who they’re listening to, here’s how their friends connect:

How friends of @marussiavirgin connect

;-)

Written by Tony Hirst

July 2, 2011 at 1:55 pm

Getting My Eye In Around F1 Quali Data – Parallel Coordinate Plots, Sort of…

Looking over the sector times from the qualifying session for tomorrow’s Hungarian Grand Prix, I noticed that Vettel was only fastest in one of the sectors.

Whilst looking for an easy way of shaping an R data frame so that I could plot categorical values sector1, sector2, sector3 on the x-axis, and then a line for each driver showing their time in the sector on the y-axis (I still haven’t worked out how to do that? Any hints? Add them to the comments please…;-), I came across a variant of the parallel coordinate plot hidden away in the lattice package:

f1 2011 hun quali sector times parralel coordinate plot

What this plot does is for each row (i.e. each driver) take values from separate columns (i.e. times from each sector), normalise them, and then plot lines between the normalised value, one “axis” per column; each row defines a separate category.

The normalisation obviously hides the magnitude of the differences between the time deltas in each sector (the min-max range might be hundredths in one sector, tenths in another), but this plot does show us that there are different groupings of cars – there is clear(?!) white space in the diagram:

Clusters in sectors times

Whilst the parallel co-ordinate plot helps identify groupings of cars, and shows where they may have similar performance, it isn’t so good at helping us get an idea of which sector had most impact on the final lap time. For this, I think we need to have a single axis in seconds showing the delta from the fastest time in the sector. That is, we should have a parallel plot where the parallel axes have the same scale, but in terms of sector time, a floating origin (so e.g. the origin for one sector might be 28.6s and for another, 22.4s). For convenience, I’d also like to see the deltas shown on the y-axis, and the categorical ranges arranged on the x-axis (in contrast to the diagrams above, where the different ranges appear along the y-axis).

PS I also wonder to what extent we can identify signatures for the different teams? Eg the fifth and sixth slowest cars in sector 1 have the same signature across all three sectors and come from the same team; and the third and fourth slowest cars in sector 2 have a very similar signature (and again, represent the same team).

Where else might we look for signatures? In the speed traps maybe? Here’s what the parallel plot for the speed traps looks like:

SPeed trap parallel plot

(In this case, max is better = faster speed.)

To combine the views (timings and speed), we might use a formulation of the flavour:

parallel(~data.frame(a$sector1,a$sector2,a$sector3, -a$inter1,-a$inter2,-a$finish,-a$trap))

Combined parallel plot - times and speeds

This is a bit too cluttered to pull much out of though? I wonder if changing the order of parallel axes might help, e.g. by trying to come up with an order than minimises the number of crossed lines?

And if we colour lines by team, can we see any characteristics?

Looking for patterns across teams

Using a dashed, rather than solid, line makes the chart a little easier to read (more white space). Using a thinking line also helps bring out the colours.

parallel(~data.frame(a$sector1,-a$inter1,-a$inter2,a$sector2,a$sector3, -a$finish,-a$trap),col=a$team,lty=2,lwd=2)

Here’s another ordering of the axes:

ANother attempt at ordering axes

Here are the sector times ordered by team (min is better):

Sector times coloured by team

Here are the speeds by team (max is better):

Speeds by team

Again, we can reorder this to try to make it easier(?!) to pull out team signatures:

Reordering speed traps

(I wonder – would it make sense to try to order these based on similarity eg derived from a circuit guide?)

Hmmm… I need to ponder this…

Written by Tony Hirst

July 30, 2011 at 11:45 pm

Over on F1DataJunkie, 2011 Season Review Doodles…

Things have been a little quiet, post wise here, of late, in part because of the holiday season… but I have been posting notes on a couple of charts in progress over on the F1DataJunkie blog. Here are links to the posts in chronological order – they capture the evolution of the chart design(s) to date:

You can find a copy of the data I used to create the charts here: F1 2011 Year in Review spreadsheet.

I used R to generate the charts (scripts are provided and/or linked to from the posts, or included in the comments – I’ll tidy them and pop them into a proper Github repository if/when I get a chance), loading the data in to RStudio using this sort of call:

require(RCurl)

gsqAPI = function(key,query,gid=0){ return( read.csv( paste( sep="",'http://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', curlEscape(gid) ), na.strings = "null" ) ) }

key='0AmbQbL4Lrd61dEd0S1FqN2tDbTlnX0o4STFkNkc0NGc'
sheet=4

qualiResults2011=gsqAPI(key,'select *',sheet)

If any other folk out there are interested in using R to wrangle with F1 data, either from 2011 or looking forward to 2012, let me know and maybe we could get a script collection going on Github:-)

Written by Tony Hirst

December 30, 2011 at 3:59 pm

Posted in Data, digital storytelling, Rstats

Tagged with ,

Visualising F1 Telemetry Data and Plotting Latitude and Longitude with ggplot Map Projections in R

Why don’t X-Y plots of latitude and longitude data look “right” compared to traditional map views?

For example, here’s an X-Y scatterplot of some of Jenson Button’s McLaren telemetry data from the 2010 Australian Formula One Grand Prix:

The image was generated, from a data file hosted on Google Spreadsheets, using the following R script, and the ggplot2 library:

require(ggplot2)
require(RCurl)

#gsqAPI is a helper function that loads data in from a shared as public Google Spreadsheet.
gsqAPI = function(key,query,gid=0){ return( read.csv( paste( sep="",'http://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid) ) ) }

#Provide the spreadsheet key
#Data was originally grabbed from the McLaren F1 Live Dashboard during the race and is Copyright (�) McLaren Marketing Ltd 2010 (I think? Or possibly Vodafone McLaren Mercedes F1 2010(?)). I believe that speed, throttle and brake data were sponsored by Vodafone.
key='0AmbQbL4Lrd61dER5Qnl3bHo4MkVNRlZ1OVdicnZnTHc'
#We can write SQL like queries over the spreadsheet, as described in http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/
q='select *'

#Run the query on the database
df=gsqAPI(key,q)

#Sanity check - preview the imported data
head(df)

#Example circuit map - sort of - showing the gLat (latitudinal 'g-force') values around the circuit (point size is absolute value of gLat, colour has two values, one for + and one for - values (swing to left and swing to right)).
g=ggplot(df) + geom_point(aes(x=NGPSLongitude,y=NGPSLatitude,col=sign(gLat),size=abs(gLat)))
print(g)

What’s lacking is a projection from the everyday Cartesian coordinate system to something like a Mercator based projection. Fortunately, the Grammar of Graphics model that underpins ggplot allows us to write the necessary co-ordinate system transformation into our chart generating command:

ggplot(df) + geom_point(aes(x=NGPSLongitude,y=NGPSLatitude,col=sign(gLat),size=abs(gLat))) + coord_map(project="mercator")

Here’s the result:

(Note: I haven’t totally got my head round what the different co-ordinate transforms do, or how they relate to any sort of ‘reality’! But they’re another thing I’m now aware of…;-)

As to what the chart shows? It’s a plot of how the latitudinal (left-right) ‘g-force’ acts on Button as he tours the circuit. The points are coloured according to whether the force acts from the left or the right (+ or -) and sized according to the magnitude of the force (away from normal?). So it points out left and right hand corners and how tight they are, essentially;-)

To show just how easy it is to write simple graphics and even statistical charts using ggplot, here are a few more examples:

#Example "driver DNA" trace, showing low gear  throttle usage (distance round track on x-axis, lap number on y axis, node size is inversely proportional to gear number (low gear, large point size), colour relativ to throttlepedal depression
g=ggplot(df) + geom_point(aes(x=sLap,y=Lap,col=rThrottlePedal,size=-NGear)) + scale_colour_gradient(low='red',high='green')
print(g)

I started calling things like the above chart “Driver DNA” charts – the x-axis represents distance round the track, the y-axis is lap number. In this case, nodes are sized inversely proportionally to the gear (so low gear, large pointsize) and coloured by throttle pedal pressure. You’ll notice how consistent Button is lap on lap. The idea behind the colouring/sizing for this chart was that it would provide a glimpse into behaviour around low gear turns.

#Example of gear value around the track
g=ggplot(df) + geom_line(aes(x=sLap,y=NGear))
print(g)

This chart soooo reminds me of simple op art…:-) I’m not sure how useful it is for showing gear selection according to distance round the track, but I just love the line of it:-)

#We can also show a trace for a single lap, such as speed coloured by gear
g=ggplot(subset(df,Lap==22)) + geom_line(aes(x=sLap,y=vCar,colour=NGear))
print(g)

ggplot can, of course, do line charts. In the above example, I make an intitial exploration into how line segment colour can be used to highlight gear selection that allows the car to reach the speed (y-axis) it does as it goes round the circuit (x is distance round lap).

#We can also do statistical graphics - like a boxplot showing the distribution of speed values by gear
g = ggplot(df) + geom_boxplot(aes(factor(NGear),vCar))
print(g)

ggplot isn’t just about literal graphing from data elements directly to marks on a canvas. It can also do stats as part of the mapping; in this case, we generate a boxplot that summarises the range of speeds achieved for different gear values.

#Footwork - brake and throttle pedal depression based on gear
g = ggplot(df) + geom_jitter(aes(factor(NGear),rThrottlePedal),colour='darkgreen') + geom_jitter(aes(factor(NGear),pBrakeF),colour='darkred')
print(g)

Some more dots… in the above case, I try to explore Button;s footwork in a little more detail, seeing how he applies brake and throttle pressure according to gear selection (throttle depression is green, brake is red).

#Forces on the driver
#gLong by brake and gear
g = ggplot(df) + geom_jitter(aes(factor(NGear),gLong,col=pBrakeF)) + scale_colour_gradient(low='red',high='green')
print(g)

In the diagrams immediately above and below, I try to show what sorts of longitudinal forces are typically experienced by Button according to gear selection, the idea being that we may get to see whether gears are used for linear acceleration or deceleration.

#gLong by throttle and gear
g = ggplot(df) + geom_jitter(aes(factor(NGear),gLong,col=rThrottlePedal)) + scale_colour_gradient(low='red',high='green')
print(g)

#gLong boxplot
ggplot(df) + geom_boxplot(aes(factor(NGear),gLong))+ geom_jitter(aes(factor(NGear),gLong),size=1)

Here, I use a boxplot to to try to see whether or not the longitudinal g-force is typically experienced under acceleration or braking by gear. Note that the points are scattered according to random jitter about their actual, integer values.

Finally, here’s a look at how engine RPM and the car speed relate to gear selection. Would you be able to work out how to write this diagram?

Here’s how I did it…

#How do engine revs and speed relate to gear selction?
ggplot(df)+geom_point(aes(x=nEngine,y=vCar,col=factor(NGear)))

Hopefully what this quick tour of ggplot has illustrated is how easy it can be to generate a wide range of charts from the same data set. May all your charts be written, and then generated directly from their source data;-)

PS I did try to generate these images via CloudStat, but for some reason the images didn’t generate properly [WORKS NOW - THANKS @cloudstatorg] (it seemed to work fine when I tried just a couple? Here’s the link anyway: Cloudstat: F1 telemetry demo. The plots were generated in RStudio and saved using ggsave()

As to how this fits in with other things? Regular readers may remember the occasional rant I’ve had about the importance of providing the queries that map from data sets onto summary data tables so that the means by which the summaries were generated from open raw data are made transparent. In a similar way, data cleansing tools such as Google Refine or Stanford Data Wrangler allow you to log the transformations applied to a messy raw data set in order to get it into a state where you can actually work with it. The same is true of images. It’s far too easy to generate complex graphics from even more complex datasets, and then forget how the image was actually created, maybe what it even represents. By writing the diagram, essentially generating a query that maps from the data onto the visual representation provided by the generated diagram or chart, we preserve the audit trail from data to the chart output.

In the same way we might imagine the word equation:

DATA + QUERY = REPORT

we might also imagine:

DATA + GRAPHER = CHART

(You get the idea? The words are probably not the right words, but the sentiment is there…)

In an academic research setting, where it’s common to find lists of figures presented separately in books or theses, it would also make sense to include a ‘figure appendix’ which gives, for example, the ggplot or ggplot commands required to generate each statistical graphic presented from the actual data source.

Sigh…if only my first name was Damien and I could persuade my minions to do the Lichtenstein thing and paint-by-numbers over some large projections of some of the more aesthetically interesting charts;-)

Written by Tony Hirst

March 14, 2012 at 2:14 pm

Posted in Rstats, Visualisation

Tagged with

F1 Championship Points as a d3.js Powered Sankey Diagram

d3.js crossed my path a couple of times yesterday: firstly, in the form of an enquiry about whether I’d be interested in writing a book on d3.js (I’m not sure I’m qualified: as I responded, I’m more of a script kiddie who sees things I can reuse, rather than have any understanding at all about how d3.js does what it does…); secondly, via a link to d3.js creator Mike Bostock’s new demo of Sankey diagrams built using d3.js:

Hmm… Sankey diagrams are good for visualising flow, so to get to grips myself with seeing if I could plug-and-play with the component, I needed an appropriate data set. F1 related data is usually my first thought as far as testbed data goes (no confidences to break, the STEM/innovation outreach/tech transfer context, etc etc) so what things flow in F1? What quantities are conserved whilst being passed between different classes of entity? How about points… points are awarded on a per race basis to drivers who are members of teams. It’s also a championship sport, run over several races. The individual Driver Championship is a competition between drivers to accumulate the most points over the course of the season, and the Constructor Chanmpionship is a battle between teams. Which suggests to me that a Sankey plot of points from races to drivers and then constructors might work?

So what do we need to do? First up, look at the source code for the demo using View Source. Here’s the relevant bit:

Data is being pulled in from a relatively addressed file, energy.json. Let’s see what it looks like:

Okay – a node list and an edge list. From previous experience, I know that there is a d3.js JSON exporter built into the Python networkx library, so maybe we can generate the data file from a network representation of the data in networkx?

Here we are: node_link_data(G) “[r]eturn data in node-link format that is suitable for JSON serialization and use in Javascript documents.”

Next step – getting the data. I’ve already done a demo of visualising F1 championship points sourced from the Ergast motor racing API as a treemap (but not blogged it? Hmmm…. must fix that) that draws on a JSON data feed constructed from data extracted from the Ergast API so I can clone that code and use it as the basis for constructing a directed graph that represents points allocations: race nodes are linked to driver nodes with edges weighted by points scored in that race, and driver nodes are connected to teams by edges weighted according to the total number of points the driver has earned so far. (Hmm, that gives me an idea for a better way of coding the weight for that edge…)

I don’t have time to blog the how to of the code right now – train and boat to catch – but will do so later. If you want to look at the code, it’s here: Ergast Championship nodelist. And here’s the result – F1 Chanpionship 2012 Points as a Sankey Diagram:

See what I mean about being a cut and paste script kiddie?!;-)

Written by Tony Hirst

May 24, 2012 at 8:39 am

F1 2012 Mid-Season Review

Rather belatedly, I got around to posting a series of posts summarising the Formula One season to date:

It’s also worth comparing the charts to the F1 2011 Season Review charts.

The R-code used to generate the graphics can be found here: F1 2012 Mid-Season Review – R Markdown.

Comments/suggestions/code improvements and extensions etc all welcome…

Written by Tony Hirst

August 30, 2012 at 8:26 am

Posted in Anything you want, Rstats

Tagged with

Follow

Get every new post delivered to your Inbox.

Join 815 other followers