Trying to find useful things to do with emerging technologies in open education and data journalism. Snarky and sweary to anyone who emails to offer me content for the site.
Just a quick post (that I could actually have published 20 mins or so ago), showing a couple of graphics generated from my scrape of the 2011 China Formula One Grand Prix timing data (via FIA press releases).
The representations I have used to date are graph based, with each node corresponding to a particular lap performance by a particular driver, and edges connecting consecutive laps.
**If you want to play along, you’ll need to download Gephi and this data file: F1 timing, Malaysia 2011 (NB it’s not thoroughly checked… glitches may have got through in the scraping process :-( )**
The nodes carry the following data, as specified using the GDF format:
name VARCHAR: the ID of each node, given as driverNumber_lapNumber (e.g. 12_43)
label VARCHAR: the name of the driver (e.g. S. VETTEL)
driverID INT: the driver number (e.g. 7)
driverNum VARCHAR: an ID for the driver of the lap (e.g. driver_12)
team VARCHAR: the team name (e.g. Vodafone McLaren Mercedes)
lap INT: the lap number (e.g. 41)
pos INT: the position at the end of the lap (e.g. 5)
pitHistory INT: the number of pitstops to date (e.g. 2)
pitStopThisLap DOUBLE: the duration of any pitstop this lap, else 0 (e.g. 12.321)
laptime DOUBLE: the laptime, in seconds (e.g. 72.125)
lapdelta DOUBLE: the difference between the current laptime and the previous laptime (e.g. 1.327)
elapsedTime DOUBLE: the summed laptime to date (e.g. 1839.021)
elapsedTimeHun DOUBLE: the elapsed time divided by a hundred (e.g. 18.39021, for an elapsedTime of 1839.021)
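For anyone who wants to roll their own file rather than use mine, here is a minimal sketch of writing lap records out as GDF, with one node per driver lap and an edge between consecutive laps. It only carries a handful of the columns listed above, the `write_gdf` helper and the sample records are made up for illustration, and it is not the script I actually used.

```python
import io

# A subset of the node columns described above, as (name, GDF type) pairs.
NODE_COLUMNS = [
    ("name", "VARCHAR"), ("label", "VARCHAR"), ("driverID", "INT"),
    ("lap", "INT"), ("pos", "INT"), ("laptime", "DOUBLE"),
]


def write_gdf(laps, out):
    """Write one node per driver lap, then edges joining consecutive laps."""
    out.write("nodedef>" + ",".join(f"{n} {t}" for n, t in NODE_COLUMNS) + "\n")
    for rec in laps:
        node_id = f"{rec['driverID']}_{rec['lap']}"  # driverNumber_lapNumber
        # Assumes labels contain no commas; GDF would need them quoted otherwise.
        out.write(
            f"{node_id},{rec['label']},{rec['driverID']},"
            f"{rec['lap']},{rec['pos']},{rec['laptime']}\n"
        )
    out.write("edgedef>node1 VARCHAR,node2 VARCHAR\n")
    seen = {(rec["driverID"], rec["lap"]) for rec in laps}
    for driver, lap in sorted(seen):
        if (driver, lap + 1) in seen:  # only link laps that both exist
            out.write(f"{driver}_{lap},{driver}_{lap + 1}\n")


# Hypothetical sample records (values invented for the example).
sample = [
    {"driverID": 1, "label": "S. VETTEL", "lap": 1, "pos": 1, "laptime": 100.5},
    {"driverID": 1, "label": "S. VETTEL", "lap": 2, "pos": 1, "laptime": 99.8},
    {"driverID": 3, "label": "L. HAMILTON", "lap": 1, "pos": 2, "laptime": 101.2},
]
buf = io.StringIO()
write_gdf(sample, buf)
gdf_text = buf.getvalue()
```

Saving `gdf_text` to a file with a `.gdf` extension should give something Gephi can open directly.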
Using the geolayout with an equirectangular (presumably this means Cartesian?) layout, we can generate a range of charts simply by selecting suitable co-ordinate dimensions. For example, if we select the laptime as the y (“latitude”) co-ordinate and x (“longitude”) as the lap, filtering out the nodes with a null laptime value, we can generate a graph of the form:
We can then tweak this a little – e.g. colour the nodes by driver (using a Partition based colouring), and edges according to node, resize the nodes to show the number of pit stops to date, and then filter to compare just a couple of drivers:
This sort of lap time comparison is all very well, but it doesn’t necessarily tell us relative track positions. If we size the nodes non-linearly according to position, with a larger size for the “smaller” numerical position (so first is less than second, and hence first is sized larger than second), we can see whether the relative positions change (in this case, they don’t…)
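As a rough illustration of what "non-linear sizing by position" might mean, here is one possible mapping in which size falls off with the reciprocal of the position. The function name and the size limits are my own assumptions for the sketch; Gephi's ranking tools let you shape the curve interactively and need not use this formula.

```python
def node_size(pos, min_size=4.0, max_size=40.0):
    """Map a race position (1 = leader) to a node size; better positions are larger."""
    if pos < 1:
        raise ValueError("position must be 1 or greater")
    # Reciprocal fall-off: big steps between the front runners, small ones at the back.
    return min_size + (max_size - min_size) / pos


sizes = {pos: node_size(pos) for pos in (1, 2, 3, 24)}
```

The reciprocal exaggerates the differences at the front of the field, which is where position swaps are usually most interesting.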
Another sort of chart we might generate will be familiar to many race fans, with a tweak – simply plot position against lap, colour according to driver, and then size the nodes according to lap time:
Again, filtering is trivial:
If we plot the elapsed time against lap, we get a view of separations (deltas between cars are available in the media centre reports, but I haven’t used this data yet…):
In this example, lap count increases up the graph, and elapsed time increases left to right. Nodes are coloured by driver, and sized according to position. If a driver has a higher lap count and a lower total elapsed time than a driver on the previous lap, then they have lapped that car… Within a lap, we also see the separation of the various cars. (This difference should be the same as the deltas that are available via FIA press releases.)
If we zoom into a lap, we can better see the separation between cars. (Using the data I have, I’m hoping I haven’t introduced any systematic errors arising from essentially dead reckoning the deltas between cars…)
Also note that where lines between two laps cross, we have a change of position between laps.
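The same two readings (separation within a lap, and position changes where lines cross) can be sketched in code from nothing more than the summed lap times. This is a minimal sketch using made-up records of the form (driver, lap, elapsedTime); ranking by elapsed time only makes sense for cars that have completed the same lap, and it inherits any dead-reckoning errors in the elapsed times.

```python
from collections import defaultdict


def _times_by_lap(records):
    """Group (driver, lap, elapsed) records into {lap: {driver: elapsed}}."""
    by_lap = defaultdict(dict)
    for driver, lap, elapsed in records:
        by_lap[lap][driver] = elapsed
    return by_lap


def gaps_to_leader(records):
    """For each lap, seconds between each car and the first car to complete it."""
    return {
        lap: {d: round(t - min(times.values()), 3) for d, t in times.items()}
        for lap, times in _times_by_lap(records).items()
    }


def position_changes(records):
    """List (lap, driver, old_pos, new_pos) where a rank differs from the lap before."""
    ranks = {
        lap: {d: i + 1 for i, d in enumerate(sorted(times, key=times.get))}
        for lap, times in _times_by_lap(records).items()
    }
    changes = []
    for lap in sorted(ranks):
        prev = ranks.get(lap - 1, {})
        for driver, pos in sorted(ranks[lap].items()):
            if driver in prev and prev[driver] != pos:
                changes.append((lap, driver, prev[driver], pos))
    return changes


# Invented elapsed times: HAM passes VET on lap 2.
records = [("VET", 1, 100.0), ("HAM", 1, 101.5), ("VET", 2, 201.0), ("HAM", 2, 200.5)]
gaps = gaps_to_leader(records)
changes = position_changes(records)
```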
[ADDED] Here’s another view, plotting elapsed time against itself to see where folk are on the track-as-laptime:
Okay, that’s enough from me for now.. Here’s something far more beautiful from @bencc/Ben Charlton that was built on top of the McLaren data…
First up, a 3D rendering of the lap data:
And then a rather nice lap-by-lap visualisation:
So come on F1 teams – give us some higher resolution data to play with and let’s see what we can really do… ;-)
PS I see that Joe Saward is a keen user of Lap charts…. That reminds me of an idea for an app I meant to do for race days that makes grabbing position data as cars complete a lap as simple as clicking…;-) Hmmm….
PPS for another take on visualising the timing data/timing stats, see Keith Collantine/F1Fanatic’s Malaysia summary post.
Last year, I popped up an occasional series of posts visualising captures of the telemetry data that was being streamed by the Vodafone McLaren F1 team (F1 Data Junkie).
I’m not sure what I’m going to do with the data this year, but being a lazy sort, it struck me that I should be able to visualise the data using Gephi (using in particular the geo layout that lets you specify which node attributes should be used as x and y co-ordinates when placing the nodes).
Taking a race’s worth of data, and visualising each node as follows (size as throttle value, colour as brake) we get something like this:
(Note that the resolution of the data is 1Hz, which explains the gaps…)
It’s possible to filter the data to show only a lap’s worth:
We could also filter out the data to only show points where the throttle value is above a certain value, or the lateral acceleration (“G-force”) and so on… or a combination of things (points where throttle and brake are applied, for example). I’ll maybe post examples of these using data from this year’s races…. err..?;-)
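Outside Gephi, the same sort of filtering is only a few lines of code. Here is a minimal sketch, assuming each 1Hz sample is a dict keyed by the channel names `rThrottlePedal` and `pBrakeF` (both as percentages); the `select_samples` helper and the sample values are invented for the example.

```python
def select_samples(samples, min_throttle=None, min_brake=None):
    """Keep the samples that meet every threshold that has been set."""
    kept = []
    for s in samples:
        if min_throttle is not None and s["rThrottlePedal"] < min_throttle:
            continue
        if min_brake is not None and s["pBrakeF"] < min_brake:
            continue
        kept.append(s)
    return kept


# Made-up samples: flat out, braking, and both pedals applied at once.
samples = [
    {"t": 0, "rThrottlePedal": 100.0, "pBrakeF": 0.0},
    {"t": 1, "rThrottlePedal": 0.0, "pBrakeF": 85.0},
    {"t": 2, "rThrottlePedal": 40.0, "pBrakeF": 20.0},
]
full_throttle = select_samples(samples, min_throttle=100.0)
both_pedals = select_samples(samples, min_throttle=1.0, min_brake=1.0)
```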
For now though, here’s a little video tour of Gephi in action on the data:
What I’d like to be able to do is animate this so I could look at each lap in turn (or maybe even animate an onion skin of the “current” point and a couple of previous ones) but that’s a bit beyond me… (for now….?!;-) If you know how, maybe we should talk?!:-)
[Thanks to McLaren F1 for streaming this data. Data was captured from the McLaren F1 website in 2010. I believe the speed, throttle and brake data were sponsored by Vodafone.]
PS If McLaren would like to give me some slightly higher resolution data, maybe from an old car on a test circuit, I’ll see what I can do with it… Similarly, any other motor racing teams in any other formula who have data they’d like to share, I’m happy to have a play… I’m hoping to go to a few of the BTCC races this year, so I’d particularly like to hear from anyone from any of those teams, or teams in the supporting races:-) If a Ginetta Junior team is up for it, we might even be able to get an education/outreach thing going into school maths, science, design and engineering clubs…;-)
If you want F1 summary timing data from practice sessions, qualifying and the race itself, you might imagine that the FIA Media Centre is the place to go:
Hmm… PDFs…
Some of the documents provide all the results on a single page in a relatively straightforward fashion:
Others are split into tables over multiple pages:
Following the race, the official classification was available as a scrapable PDF in preliminary form, but the final result – with handwritten signature – looked to be a PDF of a photocopy, and as such defies scraping without an OCR pass first… which I didn’t try…
I did consider setting up separate scrapers for each timing document, and saving the data into a corresponding Scraperwiki database, but a quick look at the license conditions made me a little wary…
No part of these results/data may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording, broadcasting or otherwise without prior permission of the copyright holder except for reproduction in local/national/international daily press and regular printed publications on sale to the public within 90 days of the event to which the results/data relate and provided that the copyright symbol appears together with the address shown below …
Instead, I took the scrapers just far enough that I (that is, me ;-) could see how I would be able to get hold of the data without too much additional effort, but I didn’t complete the job… there’s partly an ulterior motive for this too… if anyone really wants the data, then you’ll probably have to do a bit of delving into the mechanics of Scraperwiki;-)
(The other reason for my not spending more time on this at the moment is that I was looking for a couple of simple exercises to get started with grabbing data from PDFs, and the FIA docs seemed quite an easy way in… Writing the scrapers is also a bit like doing Sudoku, or Killer, which is one of my weekend pastimes…;-)
To use the scrapers, you need to open up the Scraperwiki editor, and do a little bit of configuration:
(Note that the press releases may disappear a few days after the race – I’m not sure how persistent the URLs are?)
When you’ve configured the scraper, run it…
The results of the scrape should now be displayed…
Scraperwiki does allow scraped data to be deposited into a database, and then accessed via an API, or other scrapers, or uploaded to Google Spreadsheets. However, my code stops at the point of getting the data into a Python list. (If you want a copy of the code, I posted it as a gist: F1 timings – press release scraper; you can also access it via Scraperwiki, of course).
Note that so far I’ve only tried the docs from a single race, so the scrapers may break on the releases published for future (or previous) races… Such is life when working with scrapers… I’ll try to work on robustness as the races go by. (I also need to work on the session/qualifying times and race analysis scrapers… they currently report unstructured data and also display an occasional glitch that I need to handle via a post-scrape cleanser.)
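To give a flavour of what a post-scrape cleanser might do, here is a minimal sketch that turns a scraped lap time string into seconds and returns `None` for anything that doesn't look like a time. The `m:ss.sss` format, the regular expression and the `laptime_seconds` name are all my assumptions for the example, not something lifted from the gist.

```python
import re

# Assumed format: optional minutes, then seconds with a decimal part, e.g. "1:12.125".
LAPTIME_RE = re.compile(r"^(?:(\d+):)?(\d{1,2}\.\d{1,3})$")


def laptime_seconds(text):
    """Convert a scraped lap time to seconds, or None if it is not a lap time."""
    match = LAPTIME_RE.match(text.strip())
    if match is None:
        return None  # a glitch, a pit marker, a stray header cell...
    minutes = int(match.group(1) or 0)
    return round(minutes * 60 + float(match.group(2)), 3)


cleaned = [laptime_seconds(cell) for cell in ["1:12.125", " 1:40.5 ", "P", "59.9"]]
```

Rows that come back as `None` can then be logged and checked by hand rather than silently ending up in the data file.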
If you want to use the scraper code as a starting point for building a data grabber that publishes the timing information as data somewhere, that’s what it’s there for (please let me know in the comments;-)
Although F1 circuit maps are provided by the FIA for each Formula One Grand Prix, they don’t give elevation data. Nor did a quick web search turn up any elevation maps.
In several previous posts on the topic of visualising F1 telemetry data I’ve plotted various map views, so I wondered whether I could generate elevation maps too… and it appears I can, using the Google elevation API:
The x-axis is distance round the lap, so if data from multiple laps is captured, we can start to get a more complete set of altitude data round the circuit.
Simply pass the API one or more sets of latitude/longitude co-ordinates, and I can get back elevation data. So for example, the above data (captured from the Vodafone McLaren Live website) shows a lap by Jenson Button of the Malaysia circuit earlier this year.
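For the curious, a request to the elevation service looks something like the following. This is a minimal sketch that only builds the URL and unpacks a response, without making a network call; the `key` parameter reflects how the API works today rather than what I did at the time, and the co-ordinates and the response payload are made-up values roughly in the right part of the world.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ELEVATION_ENDPOINT = "https://maps.googleapis.com/maps/api/elevation/json"


def elevation_url(points, api_key):
    """Build a request URL for a list of (lat, lng) pairs, separated by '|'."""
    locations = "|".join(f"{lat:.6f},{lng:.6f}" for lat, lng in points)
    return ELEVATION_ENDPOINT + "?" + urlencode({"locations": locations, "key": api_key})


def elevations(payload):
    """Pull the elevation values (in metres) out of a decoded JSON response."""
    if payload.get("status") != "OK":
        raise ValueError(f"elevation request failed: {payload.get('status')}")
    return [result["elevation"] for result in payload["results"]]


# Illustrative points only, not real telemetry samples.
points = [(2.7600, 101.7300), (2.7610, 101.7400)]
url = elevation_url(points, "YOUR_API_KEY")
query = parse_qs(urlsplit(url).query)

# Hand-written stand-in for a decoded response, trimmed to the fields used here.
fake_payload = {
    "status": "OK",
    "results": [{"elevation": 30.5}, {"elevation": 31.0}],
}
heights = elevations(fake_payload)
```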
So here’s what I’m thinking: how about a set of telemetric circuit guides derived from the telemetry data?
To do this, I think it would make sense to use data captured across the whole of a race, putting it into meter wide sLap bins (or maybe moving average bins 3 meters wide?) and then recording for each bin:
– the average latitude and longitude, to try to generate some idea of racing line;
– the most common (mode) gear setting
– the most common “g-force” values to show directional forces on the car; (or maybe we could derive an angle of travel based on current and next, or current and previous locations?);
– the average (mean) speed (maybe with outliers removed?)
– some average of brake and throttle values?
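The binning itself might look something like this. It is a minimal sketch, assuming each sample is a dict with the distance round the lap in `sLap` (metres) plus `lat`, `lng`, `gear` and `speed` keys; apart from `sLap`, those key names are placeholders of mine, and the sample values are invented.

```python
from collections import Counter, defaultdict
from statistics import mean


def bin_samples(samples, bin_width=1.0):
    """Group samples into sLap bins and summarise each bin."""
    bins = defaultdict(list)
    for s in samples:
        bins[int(s["sLap"] // bin_width)].append(s)
    summary = {}
    for idx, group in sorted(bins.items()):
        summary[idx] = {
            "lat": mean(s["lat"] for s in group),    # rough racing line
            "lng": mean(s["lng"] for s in group),
            "gear": Counter(s["gear"] for s in group).most_common(1)[0][0],  # mode
            "speed": mean(s["speed"] for s in group),
        }
    return summary


# Made-up samples from two laps that happen to fall in the same metre of track.
samples = [
    {"sLap": 10.2, "lat": 2.7601, "lng": 101.7301, "gear": 3, "speed": 100.0},
    {"sLap": 10.7, "lat": 2.7603, "lng": 101.7303, "gear": 3, "speed": 110.0},
    {"sLap": 11.1, "lat": 2.7605, "lng": 101.7305, "gear": 4, "speed": 120.0},
]
summary = bin_samples(samples)
```

With one sample a second, most metre-wide bins will stay empty even over a whole race, which is one argument for wider (or moving average) bins.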
As to how to display the map – the use of Google elevation data requires that a Google map is also displayed, so using something like this map/scatterplot mashup technique might be appropriate?
Having good (average) resolution data for lat/long around the circuit as a whole also means we should be able to generate a reasonable KML tour to view a lap animation in Google Earth (e.g. reusing this Google Earth path/tour simulator (Silverstone data, F1 car kmz model)).
The only thing is – I can’t get myself motivated to hack the code required to do this today:-(
PS hmmm… seems like racing line data is available on the Racecar Engineering website (e.g. Formula 1 2010: Round 5 Barcelona tech data). There’s also this interesting article on GPS data – it’s just a shame that the resolution we get on a per lap basis from the McLaren site (one sample per second) is too poor to be able to do anything really interesting with it…
Another Formula One Grand Prix race weekend, another chance to tinker with some F1 visualisations, this time from China. Not much new this week – it’s been more a case of me making a start on tidying up my scripts, but I have started trying to think about driver comparisons.
So for example, on the driver DNA charts, we see a difference in gear change behavior between Button and Hamilton about 40% of the way into the track:
We can see this a little more clearly on a geographical projection of the gear change data (the tracks are offset from each other to aid visualisation):
We can also see some difference in rThrottlePedal behaviour:
Okay, that’s enough for just now – back to coding up some KML views for Google Earth ;-)
Although I missed the live race for the second time in a row, and didn’t get a chance to play with the data as quickly as I would have liked to, I did spend some of my time away wondering how to plot all the telemetry data for a driver captured during a race in a single graphic.
The single lap view, like this one from one of Button’s laps at the 2010 Malaysian Grand Prix:
is all very well, but if we overlay traces from each lap onto the distance labeled x-axis, the charts just become messy to read.
So how about this instead. On the x-axis, we have the distance traveled round the track per lap. The drivers are pretty consistent in the lines they take, so the overall distance is pretty consistent. If we have a 4km track, and a chart that’s 400 pixels wide, each pixel corresponds to 10m resolution of track distance. For the y-axis, we use the lap number. And to plot the actual value of a telemetry measurement, let’s use colour. Put these together, and we come up with some driver DNA charts – the ones below are from Hamilton:
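The pixel arithmetic can be sketched as follows: 4000m over 400 pixels gives 10m per pixel, so a sample 25m into the lap lands in column 2. This is a minimal sketch rather than the code behind the charts; the `dna_grid` and `intensity` helpers, the `lap`/`sLap`/`value` keys and the sample values are all assumptions made for the example.

```python
def dna_grid(samples, track_length_m=4000.0, width_px=400):
    """Rows are laps, columns are distance bins; a cell holds the last value seen there."""
    metres_per_px = track_length_m / width_px
    laps = sorted({s["lap"] for s in samples})
    row_of = {lap: i for i, lap in enumerate(laps)}
    grid = [[None] * width_px for _ in laps]
    for s in samples:
        # Clamp so a sample right at the end of the lap stays inside the chart.
        col = min(int(s["sLap"] / metres_per_px), width_px - 1)
        grid[row_of[s["lap"]]][col] = s["value"]
    return grid


def intensity(value, max_value):
    """Scale a value to a 0-255 colour intensity; empty cells stay dark."""
    return 0 if value is None else round(255 * value / max_value)


# Two invented gear samples on different laps.
samples = [
    {"lap": 1, "sLap": 25.0, "value": 3},
    {"lap": 2, "sLap": 3995.0, "value": 7},
]
grid = dna_grid(samples)
```

Each row of `grid` can then be painted as a one pixel high strip, using `intensity` to pick the shade.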
So how do you read these? Each strip is a different measure. The colour intensity increases with increasing value up to the maximum recorded value. Within each strip, time flows down the strip.
The top, blue strip shows the gear (1 to 7); the green strip shows the throttle pedal depression (0-100%), and the red strip shows the brake (0-100%). The light blue strip is a composite of the previous three strips. The whiter the pixel, the closer it is to 100% throttle in 7th gear with no braking.
The bottom two traces show the longitudinal and lateral g-force respectively. For the longitudinal trace, red shows braking – being forced into the steering wheel; green shows acceleration – being forced back into your seat. You’ll see the greatest g-force under braking occurs when the brakes are slapped full on… (the red bits in the third and fifth traces line up). For the lateral g-force, the red shows the driver being flung to the left (i.e. right hand corner), the green shows them being pushed out to the right.
I’m slowly pulling enough tools together to be able to start telling some stories… so stay tuned ;-)
I seem to have no free time to do anything this week, or next, or the week after, so this is just another placeholder – a couple of quick views over some of Hamilton’s telemetry from the Australian Grand Prix.
First, a quick Google Earth to check the geodata looks okay – the number labels on the pins show the gear the car is in:
Next, a quick look over the telemetry video (hopefully I’ll have managed to animate this properly for the next race…)
And finally, a Google map shows the locations where the pBrakeF (brake pedal force?) is greater than 10 %.
Oh to have some time to play with this more fully…;-)
Another scheduled eye candy tease of a post… this time, visualising the braking zone ( > 5% brake force) over several tours of the Bahrain circuit:
These markers really need colouring into bins (e.g. 5-10%, 10-20%, 21-40%, 41-60%, 61-80%, >80%) or treating via a heat map to show the brake going on/coming off, if we can assume that Hamilton is doing pretty much the same thing each lap… (What we’re doing is essentially trying to create a fine degree of resolution in space by taking samples over returns to the same space over multiple laps.)
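Assigning a bin to each sample is simple enough. Here is a minimal sketch using half-open bins (so 10% exactly falls into the 10-20% bin), which tidies up the slightly overlapping edges I listed above; the labels, the `brake_bin` helper and the sample values are made up for the example.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges in percent brake; bins are half-open, e.g. "5-10%" means 5 <= p < 10.
BIN_EDGES = [5, 10, 20, 40, 60, 80]
BIN_LABELS = ["<5%", "5-10%", "10-20%", "20-40%", "40-60%", "60-80%", ">=80%"]


def brake_bin(p_brake):
    """Return the label of the bin a brake percentage falls into."""
    return BIN_LABELS[bisect_right(BIN_EDGES, p_brake)]


# Invented brake readings from one braking zone over several laps.
readings = [4.0, 7.5, 10.0, 55.0, 95.0, 96.5]
counts = Counter(brake_bin(p) for p in readings)
```

Each label could then be given its own marker colour, which gets part of the way towards a heat map.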
By way of comparison, here’s where Hamilton is full on with the throttle (throttle at 100%):
Gulp… remember when the Hamster tried to do that?
Sheesh… so when you hear him talking about the braking forces, here’s what it looks like in the 4G (longitudinal, very heavy braking) and above bin:
Okay – I know I said that the next post in this series would start looking at the stories the McLaren F1 telemetry data is telling us, but I’m away this weekend so I thought I’d schedule another eye-candy post…
So here you go – a couple of ways of looking at the data in Google Earth by popping the data into a Google spreadsheet, grabbing the CSV data out and pushing it through a Yahoo Pipe, which helpfully generates a KML file for us. Firstly, a simple tour with speed labels on the markers:
Then it struck me – if I do a couple of minor tweaks to the KML, I can produce some coloured markers. So for example, here we have a view where the marker colour represents the gear Hamilton’s car is in during a race-day single lap of the 2010 Bahrain Grand Prix circuit:
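The tweak amounts to giving each placemark its own icon colour. Here is a minimal sketch that writes the KML directly in Python rather than via the spreadsheet and pipe route I actually used; the gear palette, the helper names and the co-ordinates (roughly in the right part of Bahrain) are invented for the example. Note that KML colours are `aabbggrr` hex strings, and co-ordinates are written longitude first.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical palette: alpha, blue, green, red (aabbggrr), one colour per gear.
GEAR_COLOURS = {1: "ff0000ff", 2: "ff0080ff", 3: "ff00ffff", 4: "ff00ff00",
                5: "ffffff00", 6: "ffff0000", 7: "ffff00ff"}


def placemark(lat, lng, gear):
    """Return one coloured marker, labelled with the gear number."""
    colour = GEAR_COLOURS.get(gear, "ffffffff")  # white if the gear is unknown
    return (
        "<Placemark>"
        f"<name>{escape(str(gear))}</name>"
        f"<Style><IconStyle><color>{colour}</color></IconStyle></Style>"
        f"<Point><coordinates>{lng},{lat},0</coordinates></Point>"
        "</Placemark>"
    )


def kml_document(samples):
    """Wrap a placemark per (lat, lng, gear) sample in a KML document."""
    body = "".join(placemark(lat, lng, gear) for lat, lng, gear in samples)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<kml xmlns="{KML_NS}"><Document>{body}</Document></kml>'
    )


# Illustrative samples only.
kml = kml_document([(26.0325, 50.5106, 3), (26.0331, 50.5112, 4)])
root = ET.fromstring(kml.encode("utf-8"))  # check the output is well-formed
marks = root.findall(f".//{{{KML_NS}}}Placemark")
```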
The tilted view of the first image is far more appealing, don’t you think?
(What I really want to do is a heat map, but I think that’ll take a couple of hours the first time round, which I just don’t have at the moment…)