F1 Championship Points as a d3.js Powered Sankey Diagram

d3.js crossed my path a couple of times yesterday. Firstly, in the form of an enquiry about whether I’d be interested in writing a book on d3.js (I’m not sure I’m qualified: as I responded, I’m more of a script kiddie who spots things I can reuse than someone with any real understanding of how d3.js does what it does…). Secondly, via a link to d3.js creator Mike Bostock’s new demo of Sankey diagrams built using d3.js:

Hmm… Sankey diagrams are good for visualising flow, so to see whether I could plug-and-play with the component, I needed an appropriate data set. F1 related data is usually my first thought as far as testbed data goes (no confidences to break, the STEM/innovation outreach/tech transfer context, etc.), so what things flow in F1? What quantities are conserved whilst being passed between different classes of entity? How about points? Points are awarded on a per race basis to drivers, who are members of teams. It’s also a championship sport, run over several races: the individual Drivers’ Championship is a competition between drivers to accumulate the most points over the course of the season, and the Constructors’ Championship is a battle between teams. Which suggests to me that a Sankey plot of points flowing from races to drivers and then on to constructors might work?

So what do we need to do? First up, look at the source code for the demo using View Source. Here’s the relevant bit:

Data is being pulled in from a relatively addressed file, energy.json. Let’s see what it looks like:

Okay – a node list and an edge list. From previous experience, I know that there is a d3.js JSON exporter built into the Python networkx library, so maybe we can generate the data file from a network representation of the data in networkx?
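As a rough sketch of that shape (assuming the demo's format as I understand it: each node is an object with a name, and each link refers to its source and target by position in the node list and carries a value), something like the following Python would write out a compatible structure. The node names and numbers are placeholders rather than the contents of energy.json, and check_links is a hypothetical helper of my own.

```python
import json

# Placeholder nodes and links, NOT the contents of energy.json.
nodes = [{"name": "Source A"}, {"name": "Stage B"}, {"name": "Sink C"}]
links = [
    {"source": 0, "target": 1, "value": 10},  # Source A -> Stage B
    {"source": 1, "target": 2, "value": 10},  # Stage B -> Sink C
]


def check_links(nodes, links):
    """Raise ValueError if a link refers to a position outside the node list."""
    for link in links:
        for end in ("source", "target"):
            if not 0 <= link[end] < len(nodes):
                raise ValueError(f"link {link!r} has an out-of-range {end}")


check_links(nodes, links)

# Serialise in the two-key layout: a node list and a link (edge) list.
sankey_json = json.dumps({"nodes": nodes, "links": links}, indent=2)
print(sankey_json)
```

Checking the indices before serialising is just a convenience: a link that points at a missing node is an easy mistake to make when the file is generated by script.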

Here we are: node_link_data(G) “[r]eturn data in node-link format that is suitable for JSON serialization and use in Javascript documents.”

Next step – getting the data. I’ve already done a demo of visualising F1 championship points as a treemap (but not blogged it? Hmmm… must fix that), which draws on a JSON data feed constructed from data extracted from the Ergast motor racing API. So I can clone that code and use it as the basis for constructing a directed graph that represents points allocations: race nodes are linked to driver nodes by edges weighted by the points scored in that race, and driver nodes are connected to team nodes by edges weighted according to the total number of points the driver has earned so far. (Hmm, that gives me an idea for a better way of coding the weight for that edge…)
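A minimal sketch of that graph construction might look like the following. It is not the code linked from this post. The results list is made-up placeholder data rather than anything pulled from Ergast, build_points_graph and to_sankey are hypothetical helper names, and I'm assuming networkx 2.0 or later, where node_link_data reports link endpoints as node ids, so they need mapping to list positions for the Sankey demo. The sketch also assumes each driver stays with one team.

```python
import json

import networkx as nx
from networkx.readwrite import json_graph

# Placeholder (race, driver, team, points) rows, NOT real Ergast results.
results = [
    ("Race A", "Driver 1", "Team X", 25),
    ("Race A", "Driver 2", "Team X", 18),
    ("Race A", "Driver 3", "Team Y", 15),
    ("Race B", "Driver 3", "Team Y", 25),
    ("Race B", "Driver 1", "Team X", 18),
    ("Race B", "Driver 2", "Team X", 15),
]


def build_points_graph(results):
    """Races -> drivers (points per race), drivers -> teams (running total)."""
    G = nx.DiGraph()
    for race, driver, team, points in results:
        G.add_edge(race, driver, value=points)
        if G.has_edge(driver, team):
            G[driver][team]["value"] += points
        else:
            G.add_edge(driver, team, value=points)
    return G


def to_sankey(G):
    """Convert node-link data into name/index/value records."""
    data = json_graph.node_link_data(G)
    # The key holding the edge list is "links" in the releases I know of;
    # fall back to "edges" in case a newer release renames it (an assumption).
    raw_links = data.get("links", data.get("edges"))
    index = {node["id"]: i for i, node in enumerate(data["nodes"])}
    return {
        "nodes": [{"name": str(node["id"])} for node in data["nodes"]],
        "links": [
            {
                "source": index[link["source"]],
                "target": index[link["target"]],
                "value": link["value"],
            }
            for link in raw_links
        ],
    }


sankey = to_sankey(build_points_graph(results))
print(json.dumps(sankey, indent=2))
```

Because every point a driver scores is passed on to their team, the flow into each driver node equals the flow out of it, which is the conservation property a Sankey layout relies on.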

I don’t have time to blog the how-to of the code right now (train and boat to catch) but will do so later. If you want to look at the code, it’s here: Ergast Championship nodelist. And here’s the result, F1 Championship 2012 Points as a Sankey Diagram:

See what I mean about being a cut and paste script kiddie?!;-)


    • Tony Hirst

      Hi Bruce – Thanks for pointing to that post (odds on I’d have also picked it up via a trackback/clickthru if you’d linked to this post too;-)

      I don’t tend to use Excel much, but it seems like you’ve got a great set of tools built around it?

      Out of interest, do you use NodeXL at all? And/or have you tinkered with outputting various graph formats (eg GEXF, which renders quite nicely with http://sigmajs.org/ )?

  1. brucemcpherson (@brucemcpherson)

    Hi Tony
    Actually I haven’t used either of those, but I had, coincidentally, just been looking at NodeXL the other day, but work intervened as it often does. Generally speaking I prefer to work on showing ways to complement Excel whilst still recognizing the power of its legacy. I think there are something like half a billion Excel users who could really benefit from being released from the Excel shackles, so that’s my main interest and the focus of my site and blog.
    Some of your posts have really made me think about the plethora of ‘stuff’ (opportunities) out there. I don’t know how you find time to keep up (and write about it… for me the ratio of doing something to writing about it is about 1/5). Now I’m going to have to go and find out about GEXF ;)
    Best regards
