d3.js crossed my path a couple of times yesterday: firstly, in the form of an enquiry about whether I’d be interested in writing a book on d3.js (I’m not sure I’m qualified: as I responded, I’m more of a script kiddie who sees things I can reuse than someone with any real understanding of how d3.js does what it does…); secondly, via a link to d3.js creator Mike Bostock’s new demo of Sankey diagrams built using d3.js.
Hmm… Sankey diagrams are good for visualising flow, so to see whether I could plug-and-play with the component, I needed an appropriate data set. F1 related data is usually my first thought as far as testbed data goes (no confidences to break, the STEM/innovation outreach/tech transfer context, etc.), so what things flow in F1? What quantities are conserved whilst being passed between different classes of entity? How about points… Points are awarded on a per race basis to drivers, who are members of teams. It’s also a championship sport, run over several races. The individual Driver Championship is a competition between drivers to accumulate the most points over the course of the season, and the Constructor Championship is a battle between teams. Which suggests to me that a Sankey plot of points flowing from races to drivers and then on to constructors might work?
So what do we need to do? First up, look at the source code for the demo using View Source. Here’s the relevant bit:
Data is being pulled in from a relatively addressed file, energy.json. Let’s see what it looks like:
Okay – a node list and an edge list. From previous experience, I know that there is a d3.js JSON exporter built into the Python networkx library, so maybe we can generate the data file from a network representation of the data in networkx?
Next step – getting the data. I’ve already done a demo of visualising F1 championship points as a treemap (but not blogged it? Hmmm…. must fix that), which draws on a JSON data feed constructed from data extracted from the Ergast motor racing API. So I can clone that code and use it as the basis for constructing a directed graph that represents points allocations: race nodes are linked to driver nodes with edges weighted by the points scored in that race, and driver nodes are connected to team nodes by edges weighted according to the total number of points the driver has earned so far. (Hmm, that gives me an idea for a better way of coding the weight for that edge…)
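As a rough sketch of that graph construction, here is a standard-library-only Python version that builds the race-to-driver and driver-to-team edges and writes them out as a node list plus an edge list. Note the assumptions: the race, driver and team names and the points values are made-up placeholders rather than Ergast data, the function name build_sankey_data is hypothetical, and the "nodes"/"links" layout with index-based "source"/"target" fields is my understanding of what the demo's energy.json looks like. This is not the networkx-based code linked from this post.

```python
import json

# Placeholder data (not real results): (race, driver, points scored).
race_points = [
    ("Race 1", "Driver A", 25),
    ("Race 1", "Driver B", 18),
    ("Race 2", "Driver B", 25),
    ("Race 2", "Driver A", 0),
]
# Placeholder driver -> team lookup.
driver_team = {"Driver A": "Team X", "Driver B": "Team Y"}


def build_sankey_data(race_points, driver_team):
    """Return {"nodes": [...], "links": [...]} for a race -> driver -> team flow."""
    nodes = []   # node list: one {"name": ...} entry per race, driver or team
    index = {}   # name -> position in the node list

    def node_id(name):
        # Register a node the first time we see it and return its index.
        if name not in index:
            index[name] = len(nodes)
            nodes.append({"name": name})
        return index[name]

    links = []   # edge list: source and target are indices into nodes
    totals = {}  # running points total per driver
    for race, driver, points in race_points:
        if points <= 0:
            continue  # a zero-points result contributes no flow
        links.append({"source": node_id(race),
                      "target": node_id(driver),
                      "value": points})
        totals[driver] = totals.get(driver, 0) + points

    # Driver -> team edges are weighted by the driver's points total so far.
    for driver, total in totals.items():
        links.append({"source": node_id(driver),
                      "target": node_id(driver_team[driver]),
                      "value": total})
    return {"nodes": nodes, "links": links}


sankey = build_sankey_data(race_points, driver_team)
print(json.dumps(sankey, indent=2))
```

Because every point a driver scores is passed on to their team, the flow into each driver node equals the flow out of it, which is the conservation property that makes a Sankey diagram a sensible fit.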
I don’t have time to blog the how-to of the code right now – train and boat to catch – but will do so later. If you want to look at the code, it’s here: Ergast Championship nodelist. And here’s the result – F1 Championship 2012 Points as a Sankey Diagram:
See what I mean about being a cut and paste script kiddie?!;-)
Over the weekend I had a bit of a play with the Ergast Motor Racing Data API and the magical Shiny library for R, which makes building interactive, browser-based applications around R a breeze.
As this is just a quick heads-up/review post, I’ll largely limit myself to a few screenshots. When I get a chance, I’ll try to do a bit more of a write-up, though this may actually just take the form of more elaborate documentation of the app, both within the code and in the form of explanatory text in the app itself.
Here’s the initial view – the first race of the season is selected as a default and its data loaded in. The driver list covers all drivers represented during the season.
The driver selectors allow us to display traces for selected drivers only.
The Race History chart is a classic results chart. For each driver, lap by lap, it shows the difference between that driver’s race time to date and the winner’s average lap time multiplied by the lap number. (As such, this is an offline statistic – it can only be calculated once the winner’s overall average laptime is known.)
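The calculation behind that chart can be sketched in a few lines. The app itself is written in R; this is a Python illustration only, with a hypothetical function name, made-up lap times in seconds, and an assumed sign convention (positive means the driver is ahead of the winner’s average pace, negative means behind).

```python
from itertools import accumulate


def race_history(lap_times, winner):
    """lap_times maps driver -> list of lap times in seconds, lap 1 first."""
    winner_laps = lap_times[winner]
    # The winner's average lap time is only known once the race is over.
    winner_avg = sum(winner_laps) / len(winner_laps)
    history = {}
    for driver, laps in lap_times.items():
        elapsed = accumulate(laps)  # race time to date at the end of each lap
        history[driver] = [winner_avg * lap - t
                           for lap, t in enumerate(elapsed, start=1)]
    return history


# Made-up lap times, for illustration only.
lap_times = {
    "Driver A": [100.0, 98.0, 99.0],
    "Driver B": [101.0, 99.0, 100.0],
}
history = race_history(lap_times, "Driver A")
print(history)
```

By construction the winner’s trace ends at zero on the final lap, since their total race time is exactly their average lap time multiplied by the number of laps.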
Variants of the classic Race History chart are possible, for example, using different base line times, but I haven’t implemented any of them – or the necessary UI controls. Yet…
The Lap Chart is another classic:
Annotations for this chart are also supported, describing all drivers whose final status was not “Finished”.
The Lap Evolution chart shows how each driver’s laptime evolved over the course of the race compared with the fastest overall recorded laptime.
The Personal Lap Evolution chart shows how each driver’s laptime evolved over the course of the race compared with their personal fastest laptime.
The Personal Deltas Chart shows the difference between one laptime and the next for each driver.
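The three laptime charts above differ only in the baseline each lap is compared against. A minimal Python sketch (the app itself is in R) under some assumptions: “compared with” is taken to mean a simple difference in seconds rather than a ratio, the function names are hypothetical, and the lap times are made up.

```python
def lap_evolution(lap_times):
    """Each laptime minus the fastest laptime recorded by anyone in the race."""
    fastest = min(min(laps) for laps in lap_times.values())
    return {d: [t - fastest for t in laps] for d, laps in lap_times.items()}


def personal_lap_evolution(lap_times):
    """Each laptime minus that driver's own fastest laptime."""
    return {d: [t - min(laps) for t in laps] for d, laps in lap_times.items()}


def personal_deltas(lap_times):
    """Difference between each laptime and the one before it."""
    return {d: [b - a for a, b in zip(laps, laps[1:])]
            for d, laps in lap_times.items()}


# Made-up lap times in seconds, for illustration only.
laps_demo = {
    "Driver A": [100.0, 98.0, 99.0],
    "Driver B": [101.0, 99.5, 100.0],
}
evolution = lap_evolution(laps_demo)
personal = personal_lap_evolution(laps_demo)
deltas = personal_deltas(laps_demo)
print(evolution, personal, deltas, sep="\n")
```

A negative delta means the driver went quicker than on their previous lap; the deltas list is one element shorter than the lap list because the first lap has no predecessor.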
The Race Summary Chart is a chart of my own design that tries to capture notable features relating to race position: the grid position (blue circle), the final classification (red circle), and the position at the end of the first lap (the + or horizontal bar). The violin plot shows the distribution of how many laps the driver spent in each race position. Where the violin is wide, the driver spent a large number of laps in that position.
The x-axis ordering pulls out different features about how the race progressed. I need to add in a control that lets the user select different orderings.
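The data behind the Race Summary Chart amounts to three marker values per driver plus a count of laps spent in each position, which is what the violin shape is drawn from. A Python sketch with hypothetical names and made-up positions follows; note that it assumes the final classification equals the position on the driver’s last recorded lap, which need not hold for retirements or post-race penalties.

```python
from collections import Counter


def race_summary(grid, positions_by_lap):
    """grid maps driver -> grid slot; positions_by_lap maps driver -> position per lap."""
    summary = {}
    for driver, positions in positions_by_lap.items():
        summary[driver] = {
            "grid": grid[driver],    # blue circle
            "lap1": positions[0],    # the + or horizontal bar
            "final": positions[-1],  # red circle (assumption: last recorded lap)
            # Laps spent in each position: the distribution behind the violin.
            "laps_in_position": dict(Counter(positions)),
        }
    return summary


# Made-up positions, for illustration only.
grid = {"Driver A": 2, "Driver B": 1}
positions_by_lap = {
    "Driver A": [1, 1, 1, 2],
    "Driver B": [2, 2, 2, 1],
}
summary = race_summary(grid, positions_by_lap)
print(summary)
```

Sorting the drivers by any one of the "grid", "lap1" or "final" values would give one of the alternative x-axis orderings.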
Finally, the Fast Lap text scatterplot shows the fastest laptime for each driver and the lap at which they recorded it.
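The points on that scatterplot are just each driver’s minimum laptime paired with the lap on which it was set. In Python terms (hypothetical function name, made-up times, and if a driver ties their own best time the earliest such lap is reported):

```python
def fastest_laps(lap_times):
    """Return driver -> (lap number, laptime) for each driver's fastest lap."""
    result = {}
    for driver, laps in lap_times.items():
        best = min(laps)
        # list.index returns the first match, so lap numbers start at 1.
        result[driver] = (laps.index(best) + 1, best)
    return result


# Made-up lap times in seconds, for illustration only.
fast_demo = {
    "Driver A": [100.0, 98.0, 99.0],
    "Driver B": [101.0, 99.5, 100.0],
}
fast = fastest_laps(fast_demo)
print(fast)
```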
So – that’s a quick review of the app. All in all it took maybe 3 hours getting my head round the data parsing, 2-3 hours figuring out what I wanted to do and learning how to do it in Shiny, and a couple of hours doing it and starting to document/annotate it. Next time, it’ll be much quicker…