Spectator Centric Motor Racing Circuit Commentary

A bit over a decade ago, and several times since, I’ve idly wondered about being able to compete virtually in a replay of an actual sporting event (Re:Play – The Future of Sports Gaming? “I’ll Take it From Here…”). Every so often, the idea pops up again (for example, Real racing in the virtual world), but now, it seems that real time gaming against live F1 racers [is] “only two years away”:

“We launched our virtual Grand Prix channel this year, which gives us the platform to produce a fully virtual version of the race live using the data,” said Morrison [John Morrison, Chief Technical Officer, Formula One Management]. “The thing we have to crack is we have to produce accurate positioning.
“Then we can do the gaming stuff and you can be in the car racing against other drivers. I reckon we are about two years away from that. We need accuracy to the nearest centimetre, so cars aren’t touching when they shouldn’t be touching. Right now we are more at 100-200mm accuracy.”


With multiple cameras offering 360° views, there are increasing opportunities for providing customised viewing perspectives using real footage. But simulated views from arbitrary viewpoints are also possible. For example, think of the virtual camera views that can be generated by Hawk-Eye over a snooker table and then apply the same thing to 3D rendered models of F1 cars as they drive round a circuit (which has also been lidar scanned):

But that’s video… What about providing audio commentaries for spectators at a circuit that are created specifically for the listener according to where they are on the circuit?

For example, as a particular car goes by, I want my personal commentary to tell me what position they are in, as well as having bits of more general commentary about what’s going on elsewhere on the circuit. Through knowing the position of the cars on the circuit, and the position of the listener on the circuit (for example, based on wifi hotspot triangulation), we should be able to automatically generate a textual commentary that passes on information about the cars that the spectator can see from their current location, and then render that commentary to audio via a text to speech service.
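As a sketch of the idea, the core logic is just a visibility filter over positioned cars plus some templated text. All the names, coordinates and the visibility radius below are made up for illustration; a real system would use trackside positioning data and hand the generated text to a text-to-speech service:

```python
import math

# Hypothetical data: car and spectator positions as (x, y) metres in a
# common circuit coordinate frame; the visibility radius is a guess.
VISIBLE_RANGE_M = 150.0

cars = {
    "HAM": {"pos": (120.0, 40.0), "race_position": 1},
    "VET": {"pos": (610.0, 300.0), "race_position": 2},
}
spectator = (100.0, 50.0)

def local_commentary(cars, spectator):
    """Return commentary snippets for cars currently visible to the spectator."""
    lines = []
    for name, info in sorted(cars.items(), key=lambda kv: kv[1]["race_position"]):
        if math.dist(info["pos"], spectator) <= VISIBLE_RANGE_M:
            lines.append(f"{name}, currently P{info['race_position']}, is passing you now.")
    return lines

print(local_commentary(cars, spectator))
```

The general commentary about events elsewhere on the circuit could then be interleaved with these location-specific snippets before rendering to audio.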

Increasingly, I think there is a market in the automated generation of sports commentaries from sports data; it’s just that I hadn’t thought about generating commentaries from a particular perspective to support the viewing of a live event from a particular location (“location specific” or “location sensitive” commentary).

The Associated Press (AP) would perhaps agree, aspiring as they are to the automation of 80 percent of their content production by 2020 (The AP wants to use machine learning to automate turning print stories into broadcast ones). They’re also looking at generating multiple versions of the same story, appropriate for different formats, from a single source.

Apparently, [o]n average, when an AP sportswriter covers a game, she produces eight different versions of the same story. Aside from writing the main print story, she has to write story summaries, separate ledes for both teams, convert the story to broadcast format, and more. How much easier it would be to just write one version and then generate the alternative presentations from it, which leads to this:

… a cross-sectional team of five AP staffers has been working on developing a framework to automate the process of converting print stories to broadcast format.

The team built a prototype that just identifies elements in print stories that need to be altered for broadcast. (Stories are shorter, sentences are more concise, attribution comes at the beginning of a sentence, numbers are rounded, and more.)
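As a toy illustration of one of those rules, here’s a hedged sketch of broadcast-style number rounding; the regex and the rounding policy are my own guesses, not anything from AP’s actual framework:

```python
import re

def round_numbers(text, sig=2):
    """Round multi-digit figures for broadcast style, e.g. 48,217 -> 'about 48,000'."""
    def repl(m):
        n = int(m.group(0).replace(",", ""))
        if n < 1000:
            return m.group(0)  # small numbers read fine as-is
        magnitude = 10 ** (len(str(n)) - sig)
        approx = round(n / magnitude) * magnitude
        return f"about {approx:,}"
    return re.sub(r"\d[\d,]*\d|\d", repl, text)

print(round_numbers("A crowd of 48,217 watched the race."))
```

The other transformations (shortening sentences, moving attribution to the front) would need rather more linguistic machinery, of course.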

Hmmm… for location specific commentaries, I see another possibility: a generic commentary about events happening across a motor-racing circuit, intercut with live, custom commentary relating to what the spectator can actually see in front of them at that time, as if the commentator were sat by their side.

Related: eg in terms of automatically generating race commentaries from data – Detecting Undercuts in F1 Races Using R.

Crowd-Sourced Social Media Subtitling of the F1 Video Archive – Or Not…

Together with Martin Hawksey, I put in a team entry to the third Tata F1 Connectivity Prize Challenge, which was to catalogue the F1 video archive. We didn’t win (prizewinning entries here), but here’s the gist of our entry… You may recognise it as something we’d bounced ideas around with before…

Social Media Subtitling of F1 Races

Every race weekend, a multitude of F1 fans informally index live race coverage. Along with broadcasters and F1 teams, audiences use Twitter and other social media platforms to generate real-time metadata which could be used to index video footage. The same approach can be used to index the 60,000 hours of footage dating back to 1981.

We propose an approach that combines the collection of race commentaries with the promotion of the race being watched, an approach referred to as social media subtitling. Social media style updates collected while a race is being watched are harvested and used to provide commentary-like subtitles for each race. These subtitles can then be used to index and search into each race video.


Annotating Archival Footage

Audiences log in to an authenticated area using an account linked to one or more of their social media profiles. They select a video to watch and are presented with a DRM-enabled embedded streaming video player. An associated text editor can be used to create the social media subtitles. On starting to type into the text editor, a timestamp is grabbed from the video for a few seconds before the typing started (so a replay of the commented event can be seen) and associated with the text entry. On posting the subtitle, it is placed into a timestamped comment database. Optionally, the comment can be published via a public social media account with an appropriate hashtag and a link to the timestamped part of the video. The link could lead to an authentication page to gain access to the video and the commentary client, or it may lead to a teaser video clip containing a second or two of the commented-upon scene.
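As a rough sketch of that timestamping behaviour (the class and method names are invented for illustration), the client just needs to remember the video time a few seconds before the first keypress:

```python
LEAD_SECONDS = 5  # assumed rewind, so a replay clip includes the commented event

class SubtitleSession:
    """Sketch of the subtitle editor's timestamping behaviour (names invented)."""
    def __init__(self):
        self.subtitles = []          # the timestamped comment "database"
        self._typing_started = None

    def on_first_keypress(self, video_time_s):
        # Grab the video time a few seconds before typing began.
        self._typing_started = max(0, video_time_s - LEAD_SECONDS)

    def on_post(self, text):
        self.subtitles.append({"t": self._typing_started, "text": text})
        self._typing_started = None

session = SubtitleSession()
session.on_first_keypress(video_time_s=312)  # viewer starts typing at 5:12
session.on_post("Great overtake into turn 3!")
print(session.subtitles)
```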


Examples of the evolution of the original iTitle Twitter subtitler by M. Hawksey, showing: timestamped social media subtitle editor linked to video player; searching into a video using timestamped social media updates; video transcript from social media harvested updates collected in realtime.

The subtitles can be searched and used to act as a timestamped text index describing the video. Subtitles can also be used to generate commentary transcripts.

If a fan watches a replay of a race and comments on it using their own social media account, a video start time tweet could be sent from the player when they start watching the video (“I’ve just started watching the XXXX #F1 race on #TataF1.tv [LINK]”). This tweet then acts to timestamp their social media updates relative to the corresponding video timestamp as well as publicising the video.
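Reconciling the update times is then simple arithmetic: subtract the time of the start tweet from each update’s wall-clock time. A minimal sketch, with invented times:

```python
from datetime import datetime

def video_timestamp(update_time, start_tweet_time):
    """Map a social media update's wall-clock time to a video-relative timestamp."""
    return (update_time - start_tweet_time).total_seconds()

start = datetime(2011, 4, 17, 14, 0, 0)     # time of the "just started watching" tweet
update = datetime(2011, 4, 17, 14, 23, 30)  # a comment posted during the replay
print(video_timestamp(update, start))       # seconds into the video
```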

Subtitle Quality Control

The quality of subtitles can be controlled in several ways:

  • a stream of subtitles can be played alongside the video and liked (positively scored) or disliked (negatively scored) by a viewer. (There is a further opportunity here for liked comments to be shared to public social media (along with a timestamped link into the video).) This feedback can also be used to generate trust ratings for commenters (someone whose comments are “liked” by a wide variety of people may be seen as providing trusted commentary);
  • text mining / topic modeling of aggregated comments around the same time can be used to identify crowd consensus topic or keywords.
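As a crude sketch of the second approach, simple term counting over a time-bucketed batch of comments already surfaces consensus keywords; the stopword list and comments below are invented, and a real system might use proper topic modelling:

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "into", "at", "on", "in", "is", "for"}

def consensus_keywords(comments, top_n=3):
    """Find crowd-consensus keywords among comments posted around the same time."""
    words = Counter()
    for c in comments:
        words.update(w for w in re.findall(r"[a-z']+", c.lower()) if w not in STOPWORDS)
    return [w for w, _ in words.most_common(top_n)]

comments = [
    "Hamilton pits for softs",
    "pit stop for Hamilton",
    "HAM in the pits, new softs",
]
print(consensus_keywords(comments))
```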

If available, historical race timing data may be used to confirm certain sorts of information. For example, from timing sheets we can get data about pitstops, or the laps on which cars exited a race through accident or mechanical failure. This information can be matched to the racetime timestamp of a comment; if comment topics match events identified from the timing data at about the right time, those comments can automatically be rated positively.
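A minimal sketch of that matching logic, with the tolerance window, comment and event data all invented for illustration:

```python
TOLERANCE_S = 30  # assumed window for matching a comment to a timing-sheet event

def auto_rate(comments, events):
    """Positively rate comments whose topic matches a nearby timing-data event.

    comments: list of {"t": race_time_s, "text": ...}
    events:   list of {"t": race_time_s, "keyword": ...} derived from timing sheets
    """
    rated = []
    for c in comments:
        score = 0
        for e in events:
            if abs(c["t"] - e["t"]) <= TOLERANCE_S and e["keyword"] in c["text"].lower():
                score += 1
        rated.append({**c, "score": score})
    return rated

comments = [{"t": 1805, "text": "Button pits from the lead"}]
events = [{"t": 1798, "keyword": "pit"}]  # pit stop recorded on the timing sheet
print(auto_rate(comments, events))
```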

Making Use of Public Social Media Updates

For current and future races, logging social media updates around live races provides a way of bootstrapping the comment database. (Timestamps would be taken from the realtime updates, although offsets to account for the several-second delays in digital TV feeds, for example, would need to be applied.) Feeds from known F1 journalists, race teams etc. would be taken as trusted feeds. Harvesting hashtagged feeds from the wider F1 audience would allow the collection of race comment social media updates more widely.


Social media updates can also be harvested in real time around live races or replayed races if we know the video start time.

For recent historical races, archived social media updates, as for example collected by Datasift, could be purchased and used to bootstrap the social media subtitle database.

Race Club

Social media subtitling provides a great opportunity for social activity. Groups of individuals can choose to watch a race at the same time, commenting to each other either through the bespoke subtitler client or by using public social media updates and an appropriate hashtag. If a user logs in to the video playback area, timestamps of concurrent updates from their linked public social media accounts can be reconciled with timestamps associated with the streamed video they are watching in the authenticated race video area.

In the off-season, or in the days leading up to a particular race, “Historic Race Weekend” videos could be shown, perhaps according to a streamed broadcast model. That is, a race is streamed from within the authenticated area at a particular set time. Fans watch this scheduled event (under authentication) but comment on it in public using social media. These updates are harvested and the timestamps reconciled with the streamed video.


Social media subtitling draws on the idea that social media updates can be used to provide race commentary. Live social media comments collected around live events can be used to bootstrap a social media commentary database. Replayed streamed events can be annotated by associating social media update timestamps with known start/stop times of video replays. A custom client tied to a video player can be used to enter commentary directly to the database as well as issuing it as a social media update.

Team entry: Tony Hirst and Martin Hawksey

PS Rather than referring to social media subtitles and social media subtitling, I think social media captions and social media captioning is more generic?

Learning around F1…?!;-) learndirect Sponsor Marussia F1 Team

I often get quizzical looks when I drop F1 related visualisations into random presentations (“Tony slacking around again”), whereas if I said “Raspberry Pi” then it would somehow be rather more legitimate… However, one of the ways I see it is that I’m trying to engage in an informal way with a large audience in a target demographic, a significant proportion of which are prequalified as ‘interested in STEM’. I’m also trying to engage, albeit slackly, in some sort of weak knowledge transfer (hey, motor racing folk: you increasingly haz data, and maybe there are ways of visualising it to try and gain value from it that you haven’t really thought about yet…)

In case you didn’t already know, motorsport is worth shedloads* to quite a lot* of UK companies in both domestic and export sales and employs probably more than seven* people. (*Official trade association stats.)

Anyway, what prompted this post? This did:

Check out the sponsor...

learndirect as sponsor of the Marussia F1 Team?!

I have to admit, for some reason I associate learndirect with DirectGov, the government’s one-stop shop (will gov.uk be rebranded as DirectGov when it comes out of beta, I wonder? Or will DirectGov go the way of open2.net and be quietly run down and then out?!)… but the truth of the matter is that learndirect is a private equity-operated outfit, “the UK’s leading online learning provider”, apparently, “[acquired in] October 2011 [by LDC] … in a transaction valued in the order of £40 million.” LDC Portfolio: learndirect.

Ah, here’s where my memory tricked me (like it does with supermarket and bank “promises”…): “LDC bought learndirect by acquiring its parent Ufi Limited from the Ufi Charitable Trust (UCT). UCT, a registered charity, was set up in 1998 to use new technology to transform the delivery of learning and skills.” Ufi, of course, was the University for Industry, an ill-fated government venture that, I seem to remember, the OU partnered with to a certain extent…

So why would LDC be splashing the learndirect brand all over the Marussia F1 racing car? Aside from the fact that learndirect’s owners LDC also have a stake in the Marussia F1 team (one aim of which is to “meet our latent sponsorship potential”, which presumably means getting sponsorship mileage for other LDC companies?), there is also at least one person on both the learndirect and Marussia Virgin (?, or should that be F1?) Racing boards…

And there was me thinking there were absolutely no opportunities for wrangling F1 freebies, seeing as I am stuck in the education sector… Hmmm… time to dig out some of my old science, technology, engineering and maths outreach pitches, maybe…?! (If anyone at the Marussia F1 Racing team fancies chatting about exploring the use of data visualisation either for outreach, or maybe in research, please feel free to get in touch…:-) The (nearby, Milton Keynes based) OU also has various lab facilities and experience in instrumentation (including space flown instruments – so good on the heat, mass, volume and vibration front, I’m guessing…?), materials and CFD (though I suspect too much CFD may be something of a sore point!?), and I’ll happily put you in touch with folk who can tell you more if you’re interested…;-) There’s also some experience in Twitter audience interest profiling, heh heh;-)

PS MarussiaF1 also happen to have appointed a female test driver, Maria de Villota, which may or may not also be a good thing as far as WISE-like initiatives go (I know the drivers aren’t engineers, but it’s an aspiration-related funnel thing; see also James Allen on Why aren’t there more women engineers in F1, where he writes: “F1 in Schools has a very high ratio of female competitors, around 35%, and all-girl teams are quite common. And yet when they get to around 15 years of age, the numbers fall away and few girls pursue engineering degrees.”)

PPS During National Motorsport Week last year, I won a trip round the Marussia(-Virgin, as it then was) F1 factory in Dinnington, near Sheffield (it’s since moved to Banbury; the factory, that is, not Dinnington…;-). Here’s the obligatory blog post: Marussia Virgin Racing F1 Factory Visit. Btw, National Motorsport Week runs again this year too: National Motorsport Week 2012.

PPPS this reminds me of a noticing by @barnstormed (?) a couple of weeks ago that the OU had an ad on ?rotating digital hoardings during Six Nations rugby? (Confirmed by @stuartbrown: “the ou was advertising on boards during scot vs england in the 6 nations rugby”. Photo of that anyone?) Anyone got other examples of education related orgs sponsoring sports to a significant extent?

Marussia Virgin Racing F1 Factory Visit

Yesterday, I had the good fortune to visit the F1 Marussia Virgin Racing factory at Dinnington, near Sheffield, as a result of “winning” a lucky dip competition run via GoMotorSport (part of a series of National Motorsport Week promotions being run by the F1 teams based in the UK).

Marussia Virgin F1 Factory
[Thanks to @markhendy for the pic…]

Thanks to Finance Director Mark Hendy and engineer Shakey for the insight into the team’s operations:-)

Over the next few days and weeks, I’ll try to pick up on a few of the things I learned from the tour on the F1DataJunkie blog, tying them in to the corresponding technical regulations and other bits and pieces, but for now, here are some of the noticings I came away with…

– the engines aren’t that big, weighing 90kg or so and looking smaller than the engine in my own car…

– wheels are slotted onto the axles using a 3 pin mount on the front and a six(?) pin mount on the rear. (The engines are held on using a 6(?) point fixing.)

– the drivers aren’t that heavy either, weight wise (not that we met either of the drivers: neither Timo Glock nor Jerome D’Ambrosio are frequent visitors to the Dinnington factory, where the team’s cars are prepared for before, and overhauled after, each race…): 70 kg or so. With cars prepared to meet racing weight regulations to a tolerance of 0.5kg or so, a large mixed grill and a couple of pints can make a big difference… (Hmm, I guess it would be easy enough to calculate the “big dinner weight effect” penalty on laptime?!)
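For what it’s worth, here’s a back-of-the-envelope version of that calculation, assuming a commonly quoted rule of thumb of roughly 0.03 s per lap per kg of extra mass (the figure is an assumption on my part, not an official number):

```python
LAPTIME_PENALTY_S_PER_KG = 0.03  # assumed rule-of-thumb sensitivity, not official

def big_dinner_penalty(extra_mass_kg, laps_remaining):
    """Estimate the total race-time cost of carrying extra driver mass."""
    return extra_mass_kg * LAPTIME_PENALTY_S_PER_KG * laps_remaining

# A 1.5 kg mixed grill and a couple of pints, over a 56-lap race:
print(f"{big_dinner_penalty(1.5, 56):.1f} s")
```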

I’m not sure if this was a “right-handed vs left-handed spanner” remark, but a comment was also made that an adhesive sponsor sticker can have a noticeable effect on the car’s aerodynamics as the corners become unstuck and start to flap. (Which made me wonder, if that is the case, is the shape of stickers taken into account? Is a leading edge on a label with a point/right angled corner rather than a smooth curve likely to come unstuck more easily, for example?!) Cars also need repainting every few races (stripping back to the carbon, and repainting afresh) because of pitting and chipping and other minor damage that can affect smooth airflow.

– side impact tubes are an integral part of the safety related design of the car:

– to track the usage of tyres during a race weekend, an FIA official scans a barcode on each tyre as it is used on the car:

The data junkie in me in part wonders whether this data could be made available in a timely fashion via the Pirelli website (or a Pirelli gadget on each team’s website) – or would that be giving away too much race intelligence to the other teams? That way, we could get an insight into the tyre usage over the course of the weekend…

– IT plays an increasingly important part in the pit garage setup; local area networks (cabled and wifi?) are set up by each team for the weekend, the data engineers sitting behind the screen and viewing area in the garage (rather than having a fixed set up in one of the 5(?) trucks that attends each race).

– the cars are rigged up with 60 or so sensors; there is only redundancy on the throttle and clutch sensors. Data analysis support is in part provided through engineers from parts suppliers: McLaren Electronics, who supply the car’s ECU (and telemetry box(?)), provide a dedicated person(?) to support the team; data analysis is, in part, carried out using the Atlas (9?) Advanced Telemetry Linked Acquisition System from McLaren Electronic Systems. Data collected during a stint is transmitted under encryption back to the pits, as well as being logged on the car itself. A full data dump is available to the team and the FIA scrutineers via an umbilical/wired connection when the car is pitted.

UST Global, one of the team’s partners, also provides 3(?) data analysts to support the team during a race (presumably using UST Global’s “Race Management System”?).

– for design and testing, weekly reporting is required that conforms to a trade-off between the number of hours per week that each team can spend on wind tunnel testing (60 hours per week) and CFD (“can’t find downforce”;-) simulation (40 teraflops per week). My first impression there was that efficient code could effectively mean more simulation testing?! (CFD via CSC? CSC expands relationship with Marussia Virgin Racing, doubling computing power for the team’s 2011 formula 1 season, or are things set to change with the replacement of Nick Wirth by Pat Symonds…?)

– the resource restriction agreement also limits the number of people who can work on the chassis. For a race weekend, teams are limited to 50 (47?) people. We were given a quick run down of at least (8?) engineer roles assigned to each car, but I forget them…

So – that’s a quick summary of some of the things I can remember off the top of my head…

…but here are a couple of other things to note that may be of interest…

Marussia Virgin are making the most of their Virgin partnership over the Silverstone race weekend with a camping party/Virgin Experience at Stowe School (Silverstone Weekend) and a hook-up with Joe Saward’s “An Audience With Joe“… (If you don’t listen to @sidepodcast’s An Aside With Joe podcast series, you should…;-)

The team has also got an education thing going with race ticket sweeteners for folk signing up to the course: Motorsport Management Online Course.

I can’t help thinking there may be a market for a “hardcore fans” course on F1 that could run over a race season and run as an informal, open online course… I still don’t really know how a car works, for example ;-)

Anyway – that’s by the by: thanks again to the GoMotorsport and the Marussia Virgin Racing team (esp. Mark Hendy and Shakey) for a great day out :-)

PS I think the @marussiavirgin team are trying to build up their social media presence too… to see who they’re listening to, here’s how their friends connect:

How friends of @marussiavirgin connect


Visualising Sports Championship Data Using Treemaps – F1 Driver & Team Standings

I *love* treemaps. If you’re not familiar with them, they provide a very powerful way of visualising categorically organised hierarchical data that bottoms out with a quantitative, numerical dimension in a single view.

For example, consider the total population of students on the degrees offered across UK HE by HESA subject code. As well as the subject level, we might also categorise the data according to the number of students in each year of study (first year, second year, third year).

If we were to tabulate this data, we might have columns: institution, HESA subject code, no. of first year students, no. of second year students, no. of third year students. We could also restructure the table so that the data was presented in the form: institution, HESA subject code, year of study, number of students. And then we could visualise it in a treemap… (which I may do one day… but not now; if you beat me to it, please post a link in the comments;-)
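That wide-to-long restructuring is mechanical enough to sketch in a few lines of Python (the column names and student counts below are invented; pandas’ melt does the same job at scale):

```python
def wide_to_long(rows, id_cols, value_cols, var_name, value_name):
    """Restructure wide-format records into long format, one row per (id, variable)."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            new_row = {k: row[k] for k in id_cols}
            new_row[var_name] = col
            new_row[value_name] = row[col]
            long_rows.append(new_row)
    return long_rows

# One wide record: an institution/subject with per-year student counts.
wide = [{"institution": "OU", "subject": "G100", "year1": 120, "year2": 90, "year3": 75}]
long_form = wide_to_long(wide, ["institution", "subject"],
                         ["year1", "year2", "year3"], "year_of_study", "students")
print(long_form)
```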

Instead, what I will show is how to visualise data from a sports championship, in particular the start of the Formula One 2011 season. This championship has the same entrants in each race, each a member of one of a fixed number of teams. Points are awarded for each race (that is, each round of the championship) and totalled across rounds to give the current standing. As well as the driver championship (based on points won by individual drivers), there is the team championship (where the points contributions from drivers within a team are totalled).

Here’s what the results from the third round (China) look like:

Driver Team Points
Lewis Hamilton McLaren-Mercedes 25
Sebastian Vettel RBR-Renault 18
Mark Webber RBR-Renault 15
Jenson Button McLaren-Mercedes 12
Nico Rosberg Mercedes 10
Felipe Massa Ferrari 8
Fernando Alonso Ferrari 6
Michael Schumacher Mercedes 4
Vitaly Petrov Renault 2
Kamui Kobayashi Sauber-Ferrari 1
Paul di Resta Force India-Mercedes 0
Nick Heidfeld Renault 0
Rubens Barrichello Williams-Cosworth 0
Sebastien Buemi STR-Ferrari 0
Adrian Sutil Force India-Mercedes 0
Heikki Kovalainen Lotus-Renault 0
Sergio Perez Sauber-Ferrari 0
Pastor Maldonado Williams-Cosworth 0
Jarno Trulli Lotus-Renault 0
Jerome d’Ambrosio Virgin-Cosworth 0
Timo Glock Virgin-Cosworth 0
Vitantonio Liuzzi HRT-Cosworth 0
Narain Karthikeyan HRT-Cosworth 0
Jaime Alguersuari STR-Ferrari 0

F1 2011 Results – China, © 2011 Formula One World Championship Ltd

We can represent data from across all the races using a table of the form:

Driver Team Points Race
Lewis Hamilton McLaren-Mercedes 25 China
Sebastian Vettel RBR-Renault 18 China
Felipe Massa Ferrari 10 Malaysia
Fernando Alonso Ferrari 8 Malaysia
Kamui Kobayashi Sauber-Ferrari 6 Malaysia
Michael Schumacher Mercedes 0 Australia
Pastor Maldonado Williams-Cosworth 0 Australia
Narain Karthikeyan HRT-Cosworth 0 Australia
Vitantonio Liuzzi HRT-Cosworth 0 Australia

Sample of F1 2011 Results 2011, © 2011 Formula One World Championship Ltd

I’ve put a copy of the data to date at Many Eyes, IBM’s online interactive data visualisation site: F1 2011 Championship Points

Here’s what it looks like when we view it in a treemap visualisation:

The size of the boxes is proportional to the (summed) values within the hierarchical categories. In the above case, the large blocks are the total points awarded to each driver across teams and races. (The team field might be useful if a driver were to change team during the season.)

I’m not certain, but I think the Many Eyes treemap algorithm populates the map using a sorted list of summed numerical values taken through the hierarchical path from left to right, top to bottom. Which means top left is the category with the largest summed points. If this is the case, in the above example we can directly see that Webber is in fourth place overall in the championship. We can also look within each blocked area for more detail: for example, we can see Hamilton didn’t score as many points in Malaysia as he did in the other two races.
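To make the proportional-area idea concrete, here’s a minimal slice-and-dice layout in Python, using the China points from the table above for the top four drivers (production treemap widgets typically use a squarified algorithm instead, but the proportional-area principle is the same):

```python
def slice_and_dice(items, x, y, w, h):
    """Lay out (label, value) pairs left to right within a rectangle,
    each box's width (and hence area) proportional to its value."""
    total = sum(v for _, v in items)
    rects = []
    for label, value in sorted(items, key=lambda kv: -kv[1]):
        box_w = w * value / total
        rects.append((label, x, y, box_w, h))
        x += box_w
    return rects

# Points from the China round, as tabulated above (top four drivers only).
standings = [("Hamilton", 25), ("Vettel", 18), ("Webber", 15), ("Button", 12)]
for rect in slice_and_dice(standings, 0, 0, 100, 100):
    print(rect)
```

Nesting the same partitioning step within each box (e.g. splitting a driver’s box by race) gives the hierarchical view.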

One of the nice features about the Many Eyes treemap is that it allows you to reorder the levels of the hierarchy that is being displayed. So for example, with a simple reordering of the labels we can get a view over the team championship too:

The Many Eyes treemap can be embedded in a web page (it’s a Java applet), although I’m not sure what, if any, licensing restrictions apply (I do know that the Guardian datastore blog embeds Many Eyes widgets on that site, though). Other treemap widgets are available (for example, Protovis and JIT both offer javascript enabled treemap displays).

What might be interesting would be to feed Protovis or the JIT with data dynamically from a Google Spreadsheet, for example, so that a single page could be used to display the treemap with the data being maintained in a spreadsheet.

Hmm, I wonder – does Google spreadsheets have a treemap gadget? Ooh – it does: treemap-gviz. It looks as if a bit of wrangling may be required around the data, but if the display works out then just popping the points data into a Google spreadsheet and creating the gadget should give an embeddable treemap display with no code required:-) (It will probably be necessary to format the data hierarchy by hand, though, requiring differently laid out data tables to act as source for individual and team based reports.)

So – how long before we see some “live” treemap displays for F1 results on the F1 blogs then? Or championship tables from other sports? Or is the treemap too confusing as a display for the uninitiated? (I personally don’t think so.. but then, I love macroscopic views over datasets:-)

PS see also More Olympics Medal Table Visualisations which includes a demonstration of a treemap visualisation over Olympic medal standings.

Visualising China 2011 F1 – Timing Charts

Just a quick post (that I could actually have published 20 mins or so ago), showing a couple of graphics generated from my scrape of the 2011 China Formula One Grand Prix timing data (via FIA press releases).

First up, the race to the podium:

China F1 2011 - the race to the podium
Data © 2011 Formula One World Championship Ltd, 6 Princes Gate, London, SW7 1QJ, England

The full lap chart, with pit stops:

China F1 2011 pit stops
Data © 2011 Formula One World Championship Ltd, 6 Princes Gate, London, SW7 1QJ, England

Both the above graphics were generated using data scraped from press releases published on the FIA media centre website. You can find the data in the GDF format I used to generate the images using Gephi here (howto).

PS @bencc has also been on the case, visualising telemetry data from Vodafone McLaren Mercedes. For example, Hamilton’s tour and Button’s tour.

PPS which reminds me – here’s an example of how to use Gephi to visualise telemetry data captured from the McLaren website: Visualising Vodafone Mclaren F1 Telemetry Data in Gephi

Visualising F1 Timing Sheet Data

Putting together a couple of tricks from recent posts (Visualising Vodafone Mclaren F1 Telemetry Data in Gephi and PDF Data Liberation: Formula One Press Release Timing Sheets), I thought I’d have a little play with the timing sheet data in Gephi…

The representations I have used to date are graph based, with each node corresponding to a particular lap performance by a particular driver, and edges connecting consecutive laps.

**If you want to play along, you’ll need to download Gephi and this data file: F1 timing, Malaysia 2011 (NB it’s not thoroughly checked… glitches may have got through in the scraping process:-()**

The nodes carry the following data, as specified using the GDF format:

  • name VARCHAR: the ID of each node, given as driverNumber_lapNumber (e.g. 12_43)
  • label VARCHAR: the name of the driver (e.g. S. VETTEL)
  • driverID INT: the driver number (e.g. 7)
  • driverNum VARCHAR: an ID for the driver of the lap (e.g. driver_12)
  • team VARCHAR: the team name (e.g. Vodafone McLaren Mercedes)
  • lap INT: the lap number (e.g. 41)
  • pos INT: the position at the end of the lap (e.g. 5)
  • pitHistory INT: the number of pitstops to date (e.g. 2)
  • pitStopThisLap DOUBLE: the duration of any pitstop this lap, else 0 (e.g. 12.321)
  • laptime DOUBLE: the laptime, in seconds (e.g. 72.125)
  • lapdelta DOUBLE: the difference between the current laptime and the previous laptime (e.g. 1.327)
  • elapsedTime DOUBLE: the summed laptime to date (e.g. 1839.021)
  • elapsedTimeHun DOUBLE: the elapsed time divided by a hundred (e.g. 18.39021)
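For reference, generating GDF node definitions like these is just string formatting; here’s a sketch covering a subset of the fields (the lap data is invented):

```python
def gdf_nodes(laps):
    """Emit GDF node definitions of the form used above (a subset of the fields)."""
    lines = ["nodedef>name VARCHAR,label VARCHAR,lap INT,pos INT,laptime DOUBLE"]
    for lap in laps:
        # Node name is driverNumber_lapNumber, as described above.
        lines.append("{driver}_{lap},{label},{lap},{pos},{laptime}".format(**lap))
    return "\n".join(lines)

laps = [
    {"driver": 12, "label": "L. HAMILTON", "lap": 41, "pos": 5, "laptime": 72.125},
    {"driver": 12, "label": "L. HAMILTON", "lap": 42, "pos": 4, "laptime": 71.983},
]
print(gdf_nodes(laps))
```

Edges connecting consecutive laps would be declared in a following `edgedef>` section in the same file.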

Using the geolayout with an equirectangular (presumably this means Cartesian?) layout, we can generate a range of charts simply by selecting suitable co-ordinate dimensions. For example, if we select the laptime as the y (“latitude”) co-ordinate and the lap as the x (“longitude”) co-ordinate, filtering out the nodes with a null laptime value, we can generate a graph of the form:

We can then tweak this a little – e.g. colour the nodes by driver (using a Partition based colouring), and edges according to node, resize the nodes to show the number of pit stops to date, and then filter to compare just a couple of drivers:

This sort of lap time comparison is all very well, but it doesn’t necessarily tell us relative track positions. If we size the nodes non-linearly according to position, with a larger size for the “smaller” numerical position (so first is less than second, and hence first is sized larger than second), we can see whether the relative positions change (in this case, they don’t…)

Another sort of chart we might generate will be familiar to many race fans, with a tweak – simply plot position against lap, colour according to driver, and then size the nodes according to lap time:

Again, filtering is trivial:

If we plot the elapsed time against lap, we get a view of separations (deltas between cars are available in the media centre reports, but I haven’t used this data yet…):

In this example, lap time flows up the graph, elapsed time increases left to right. Nodes are coloured by driver, and sized according to position. If a driver has a higher lap count and a lower total elapsed time than a driver on the previous lap, then they have lapped that car… Within a lap, we also see the separation of the various cars. (This difference should be the same as the deltas that are available via FIA press releases.)

If we zoom into a lap, we can better see the separation between cars. (Using the data I have, I’m hoping I haven’t introduced any systematic errors arising from essentially dead reckoning the deltas between cars…)
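The dead-reckoned separations are just differences in summed lap times; a minimal sketch, with the elapsed times invented:

```python
def lap_separations(elapsed_times):
    """Compute within-lap gaps (in seconds) to the leader from elapsed race times.

    elapsed_times: {driver: elapsed_time_s} for cars at the end of a given lap.
    """
    leader_time = min(elapsed_times.values())
    return {d: round(t - leader_time, 3)
            for d, t in sorted(elapsed_times.items(), key=lambda kv: kv[1])}

# Invented elapsed times at the end of lap 41:
lap_41 = {"VET": 3610.2, "HAM": 3613.9, "WEB": 3621.4}
print(lap_separations(lap_41))
```

Any systematic error in the scraped lap times accumulates through the summation, of course, which is where the dead-reckoning worry comes in.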

Also note that where lines between two laps cross, we have a change of position between laps.

[ADDED] Here’s another view, plotting elapsed time against itself to see where folk are on the track-as-laptime:

Okay, that’s enough from me for now… Here’s something far more beautiful from @bencc/Ben Charlton that was built on top of the McLaren data…

First up, a 3D rendering of the lap data:

And then a rather nice lap-by-lap visualisation:

So come on F1 teams – give us some higher resolution data to play with and let’s see what we can really do… ;-)

PS I see that Joe Saward is a keen user of Lap charts…. That reminds me of an idea for an app I meant to do for race days that makes grabbing position data as cars complete a lap as simple as clicking…;-) Hmmm….

PPS for another take of visualising the timing data/timing stats, see Keith Collantine/F1Fanatic’s Malaysia summary post.

PDF Data Liberation: Formula One Press Release Timing Sheets

If you want F1 summary timing data from practice sessions, qualifying and the race itself, you might imagine that the FIA Media Centre is the place to go:

Hmm… PDFs…

Some of the documents provide all the results on a single page in a relatively straightforward fashion:

Others are split into tables over multiple pages:

Following the race, the official classification was available as a scrapable PDF in preliminary form, but the final result – with handwritten signature – looked to be a PDF of a photocopy, and as such defies scraping without an OCR pass first… which I didn’t try…

I did consider setting up separate scrapers for each timing document, and saving the data into a corresponding Scraperwiki database, but a quick look at the license conditions made me a little wary…

No part of these results/data may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording, broadcasting or otherwise without prior permission of the copyright holder except for reproduction in local/national/international daily press and regular printed publications on sale to the public within 90 days of the event to which the results/data relate and provided that the copyright symbol appears together with the address shown below …

Instead, I took the scrapers just so far such that I (that is, me ;-) could see how I would be able to get hold of the data without too much additional effort, but I didn’t complete the job… there’s partly an ulterior motive for this too… if anyone really wants the data, then you’ll probably have to do a bit of delving into the mechanics of Scraperwiki;-)

(The other reason for my not spending more time on this at the moment is that I was looking for a couple of simple exercises to get started with grabbing data from PDFs, and the FIA docs seemed quite an easy way in… Writing the scrapers is also a bit like doing Sudoku, or Killer, which is one of my weekend pastimes…;-)

The scraper I set up is here: F1 Timing Scraperwiki

To use the scrapers, you need to open up the Scraperwiki editor, and do a little bit of configuration:

(Note that the press releases may disappear a few days after the race – I’m not sure how persistent the URLs are?)

When you’ve configured the scraper, run it…

The results of the scrape should now be displayed…

Scraperwiki does allow scraped data to be deposited into a database, and then accessed via an API, or other scrapers, or uploaded to Google Spreadsheets. However, my code stops at the point of getting the data into a Python list. (If you want a copy of the code, I posted it as a gist: F1 timings – press release scraper; you can also access it via Scraperwiki, of course).
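To give a flavour of what “getting the data into a Python list” looks like, here’s a hedged sketch of the sort of post-scrape parsing step involved. The line format below is illustrative only, not the exact FIA timing-sheet layout:

```python
# Hedged sketch: turn lines of text pulled from a timing-sheet PDF into a
# Python list of (position, driver, laptime-in-seconds) tuples.
# The line format here is made up for illustration.

import re

raw_lines = [
    "1 S. VETTEL 1:34.456",
    "2 L. HAMILTON 1:34.789",
]

row = re.compile(r"^(\d+)\s+(.+?)\s+(\d+):(\d+\.\d+)$")

results = []
for line in raw_lines:
    m = row.match(line)
    if m:
        pos, name, mins, secs = m.groups()
        # Convert m:ss.fff to seconds for easier comparison later.
        results.append((int(pos), name, int(mins) * 60 + float(secs)))
```

The real scrapers need a per-document regex (and the odd post-scrape cleanse), but the shape of the job is the same: match, group, coerce types, append.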

Note that so far I’ve only tried the docs from a single race, so the scrapers may break on the releases published for future (or previous) races… Such is life when working with scrapers… I’ll try to work on robustness as the races go by. (I also need to work on the session/qualifying times and race analysis scrapers… they currently report unstructured data and also display an occasional glitch that I need to handle via a post-scrape cleanser.)

If you want to use the scraper code as a starting point for building a data grabber that publishes the timing information as data somewhere, that’s what it’s there for (please let me know in the comments;-)

PS by the by, Mercedes GP publish an XML file of the latest F1 Championship Standings. They also appear to be publishing racetrack information in XML form using URLs of the form http://assets.mercedes-gp.com/—9—swf/assets/xml/race_23_en.xml. Presumably the next race will be 24?
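An XML feed like that is straightforward to consume with the Python standard library. To be clear, the element and attribute names in this sketch are guesses, not the actual schema of the Mercedes GP feed – check the real file before relying on any of this:

```python
# Hedged sketch: parse a championship-standings XML feed. The <standings>/
# <driver name= points=> structure here is an assumption for illustration.

import xml.etree.ElementTree as ET

sample = """<standings>
  <driver name="Vettel" points="93"/>
  <driver name="Hamilton" points="59"/>
</standings>"""

root = ET.fromstring(sample)
table = [(d.get("name"), int(d.get("points"))) for d in root.findall("driver")]
# A list of (driver, points) tuples, ready for sorting or charting.
```

For the live feed you’d fetch the URL first (e.g. with urllib) and pass the response body to `ET.fromstring` in place of the sample string.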

If you know of any other “data” sources or machine readable, structured/semantic data relating to F1, please let me know via a comment below:-)

F1 Pit Stop Strategist (What I’d Like to See): Post Pitstop Re-entry Points

Why oh why doesn’t F1 get into the spirit of releasing live timing data in API form during the race?

Here’s something I’d like to build, based on track position graphics:

F1 driver track position

The ability to play along as a pit lane strategist looking for opportunities about when to pit….

For example, I’d select my driver, then using a model of how long it takes to pit, how far behind the traffic is, and how the time difference maps onto distance round the track, we could pop up a graphic showing the window the pitting car would be looking to rejoin in…
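The core of the re-entry calculation is simple: any car whose gap behind is smaller than the total time lost to the stop is a car you’ll rejoin behind. Here’s a minimal sketch of that idea – the pit-loss figure and gaps are invented numbers, and a real model would also account for the traffic’s own lap-time evolution:

```python
# Hedged sketch of the pit-window idea: given a total pit-stop time loss and
# the gaps (in seconds) to the cars behind, work out which cars the pitting
# driver would rejoin behind. All numbers are made up for illustration.

def reentry_window(pit_loss, gaps_behind):
    """gaps_behind: list of (driver, gap_in_seconds), nearest car first.
    Returns the drivers the pitting car would drop behind after the stop."""
    return [driver for driver, gap in gaps_behind if gap < pit_loss]

gaps = [("ALO", 4.2), ("WEB", 11.8), ("MAS", 27.5)]
print(reentry_window(21.0, gaps))  # ['ALO', 'WEB'] -- rejoins ahead of MAS
```

Mapping that time window onto a position round the track outline is then just a matter of converting seconds of gap into fractions of a lap at current pace.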

F1 track position

Post hoc timing data is available, I suppose, so I guess I could model what this might look like anyway…?

F1 Data Junkie – What Does This Data Point Refer To Again?

The countdown is on to my first post unpicking some of the telemetry data grabbed from the McLaren F1 site during the Bahrain Grand Prix, and then maybe this weekend’s race, but first, here’s another tease…

One of the problems I’ve found from a data-based (groan…) storytelling perspective is relating what the data’s telling us to what we know the car is doing from where it is on the track. As I/we refine our data analysis skills we’ll be able just to look at the data and work out what the likely features of the track are at the point the data was collected; but as novice data engineers, we need all the cribs we can get. Which is why I had a little play with my Processing code and built an interactive data explorer that looks something like this:

The idea is that I can easily select a data trace, or a location on the track, and get a snapshot of the data collected at that point in the context of the other data points. That is, this data navigator allows me to expose the data collected in a single sample, in the context of the position of the car on the track, and given the state of the other data values at the same point in time, as well as immediately before and immediately after.
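The “click on the track, see the sample” interaction boils down to a nearest-point lookup. The explorer itself is written in Processing, but the idea sketches easily in Python (field names and values here are assumptions, not the actual telemetry schema):

```python
# Hedged sketch: find the telemetry sample nearest a selected (x, y) track
# location, then grab its immediate neighbours for context.

samples = [
    {"t": 0.0, "x": 0.0,  "y": 0.0, "speed": 280},
    {"t": 0.1, "x": 5.0,  "y": 1.0, "speed": 275},
    {"t": 0.2, "x": 10.0, "y": 4.0, "speed": 250},
]

def nearest_sample(samples, x, y):
    """Index of the sample closest (by squared distance) to the clicked point."""
    return min(range(len(samples)),
               key=lambda i: (samples[i]["x"] - x) ** 2 + (samples[i]["y"] - y) ** 2)

i = nearest_sample(samples, 6.0, 2.0)
context = samples[max(0, i - 1):i + 2]  # the sample plus its neighbours
```

A linear scan is fine at telemetry sample rates; for much bigger traces you’d swap in a spatial index, but the interaction is the same.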

I’ll post a version of this data explorer somewhere when I post the first data analysis post proper, but for now, you’ll just have to make do with the video…;-)

PS As to where the data came from, that story is described here: F1 Data Junkie – Looking at What’s There