Visualising F1 Timing Sheet Data

Putting together a couple of tricks from recent posts (Visualising Vodafone Mclaren F1 Telemetry Data in Gephi and PDF Data Liberation: Formula One Press Release Timing Sheets), I thought I’d have a little play with the timing sheet data in Gephi…

The representations I have used to date are graph based, with each node corresponding a particular lap performance by a particular driver, and edges connecting consecutive laps.

**If you want to play along, you’ll need to download Gephi and this data file: F1 timing, Malaysia 2011 (NB it’s not throughly checked… glitches may have got through in the scraping process:-(**

The nodes carry the following data, as specified using the GDF format:

  • name VARCHAR: the ID of each node, given as driverNumber_lapNumber (e.g. 12_43)
  • label VARCHAR: the name of the driver (e.g. S. VETTEL
  • driverID INT: the driver number (e.g. 7)
  • driverNum VARCHAR: an ID for the driver of the lap (e.g. driver_12
  • team VARCHAR: the team name (e.g. Vodafone McLaren Mercedes)
  • lap INT: the lap number (e.g. 41)
  • pos INT: the position at the end of the lap (e.g. 5)
  • pitHistory INT: the number of pitstops to date (e.g. 2)
  • pitStopThisLap DOUBLE: the duration of any pitstop this lap, else 0 (e.g. 12.321)
  • laptime DOUBLE: the laptime, in seconds (e.g. 72.125)
  • lapdelta DOUBLE: the difference between the current laptime and the previous laptime (e.g. 1.327)
  • elapsedTime DOUBLE: the summed laptime to date (e.g. 1839.021)
  • elapsedTimeHun DOUBLE: the elapsed time divided by a hundred (e.g. )

Using the geolayout with an equirectangular (presumably this means Cartesian?) layout, we can generate a range of charts simply by selecting suitable co-ordinate dimensions. For example, if we select the laptime as the y (“latitude”) co-ordinate and x (“longitude”) as the lap, filtering out the nodes with a null laptime value, we can generate a graph of the form:

We can then tweak this a little – e.g. colour the nodes by driver (using a Partition based coluring), and edges according to node, resize the nodes to show the number of pit stops to date, and then filter to compare just a couple of drivers :

This sort of lap time comparison is all very well, but it doesn’t necessarily tell us relative track positions. If we size the nodes non-linearly according to position, with a larger size for the “smaller” numerical position (so first is less than second, and hence first is sized larger than second), we can see whether the relative positions change (in this case, they don’t…)

Another sort of chart we might generate will be familiar to many race fans, with a tweak – simply plot position against lap, colour according to driver, and then size the nodes according to lap time:

Again, filtering is trivial:

If we plot the elapsed time against lap, we get a view of separations (deltas between cars are available in the media centre reports, but I haven’t used this data yet…):

In this example, lap time flows up the graph, elapsed time increases left to right. Nodes are coloured by driver, and sized according to postion. If a driver has a hight lap count and lower total elapsed time than a driver on the previous lap, then it’s lapped that car… Within a lap, we also see the separation of the various cars. (This difference should be the same as the deltas that are available via FIA press releases.)

If we zoom into a lap, we can better see the separation between cars. (Using the data I have, I’m hoping I haven’t introduced any systematic errors arising from essentially dead reckoning the deltas between cars…)

Also note that where lines between two laps cross, we have a change of position between laps.

[ADDED] Here’s another view, plotting elapsed time against itself to see where folk are on the track-as-laptime:

Okay, that’s enough from me for now.. Here’s something far more beautiful from @bencc/Ben Charlton that was built on top of the McLaren data…

First up, a 3D rendering of the lap data:

And then a rather nice lap-by-lap visualisation:

So come on F1 teams – give us some higher resolution data to play with and let’s see what we can really do… ;-)

PS I see that Joe Saward is a keen user of Lap charts…. That reminds me of an idea for an app I meant to do for race days that makes grabbing position data as cars complete a lap as simple as clicking…;-) Hmmm….

PPS for another take of visualising the timing data/timing stats, see Keith Collantine/F1Fanatic’s Malaysia summary post.

Visualising Vodafone Mclaren F1 Telemetry Data in Gephi

Last year, I popped up an occasional series of posts visualising captures of the telemetry data that was being streamed by the Vodoafone McLaren F1 team (F1 Data Junkie).

I’m not sure what I’m going to do with the data this year, but being a lazy sort, it struck me that I should be able to visualise the data using Gephi (using in particular the geo layout that lets you specify which node attributes should be used as x and y co-ordinates when placing the nodes.

Taking a race worth of data, and visualising each node as follows (size as throttle value, colour as brake) we get something like this:

(Note that the resolution of the data is 1Hz, which explains the gaps…)

It’s possible to filter the data to show only a lap’s worth:

We could also filter out the data to only show points where the throttle value is above a certain value, or the lateral acceleration (“G-force”) and so on… or a combination of things (points where throttle and brake are applied, for example). I’ll maybe post examples of these using data from this year’s races…. err..?;-)

For now though, here’s a little video tour of Gephi in action on the data:

What I’d like to be able to do is animate this so I could look at each lap in turn, or maybe even animate an onion skin of the “current” point and a couple of previous ones) but that’s a bit beyond me… (for now….?!;-) If you know how, maybe we should talk?!:-)

[Thanks to McLaren F1 for streaming this data. Data was captured from the McLaren F1 website in 2010. I believe the speed, throttle and brake data were sponsored by Vodafone.]

PS If McLaren would like to give me some slightly higher resolution data, maybe from an old car on a test circuit, I’ll see what I can do with it… Similarly, any other motor racing teams in any other formula who have data they’d like to share, I’m happy to have a play… I’m hoping to go to a few of the BTCC races this year, so I’d particularly like to hear from anyone from any of those teams, or teams in the supporting races:-) If a Ginetta Junior team is up for it, we might even be able to get an education/outreach thing going into school maths, science, design and engineering clubs…;-)

UK Journalists on Twitter

A post on the Guardian Datablog earlier today took a dataset collected by the Tweetminster folk and graphed the sorts of thing that journalists tweet about ( Journalists on Twitter: how do Britain’s news organisations tweet?).

Tweetminster maintains separate lists of tweeting journalists for several different media groups, so it was easy to grab the names on each list, use the Twitter API to pull down the names of people followed by each person on the list, and then graph the friend connections between folk on the lists. The result shows that the hacks are follow each other quite closely:

UK Media Twitter echochamber (via tweetminster lists)

Nodes are coloured by media group/Tweetminster list, and sized by PageRank, as calculated over the network using the Gephi PageRank statistic.

The force directed layout shows how folk within individual media groups tend to follow each other more intensely than they do people from other groups, but that said, inter-group following is still high. The major players across the media tweeps as a whole seem to be @arusbridger, @r4today, @skynews, @paulwaugh and @BBCLauraK.

I can generate an SVG version of the chart, and post a copy of the raw Gephi GDF data file, if anyone’s interested…

PS if you’re interested in trying out Gephi for yourself, you can download it from gephi.org. One of the easiest ways in is to explore your Facebook network

PPS for details on how the above was put together, here’s a related approach:
Trying to find useful things to do with emerging technologies in open education
Doodlings Around the Data Driven Journalism Round Table Event Hashtag Community
.

For a slightly different view over the UK political Twittersphere, see Sketching the Structure of the UK Political Media Twittersphere. And for the House and Senate in the US: Sketching Connections Between US House and Senate Tweeps

More Pivots Around Twitter Data (little-l, little-d, again;-)

I’ve been having a play with Twitter again, looking at how we can do the linked thing without RDF, both within a Twitter context and also (heuristically) outside it.

First up, hashtag discovery from Twitter lists. Twitter lists can be used to collect together folk who have a particular interest, or be generated from lists of people who have used a particular hashtag (as Martin H does with his recipe for Populating a Twitter List via Google Spreadsheet … Automatically! [Hashtag Communities]).

The thinking is simple: grab the most recent n tweets from the list, extract the hashtags, and count them, displaying them in descending order. This gives us a quick view of the most popular hashtags recently tweeted by folk on the list: Popular recent tags from folk a twitter list

This is only a start, of course: it might be that a single person has been heavily tweeting the same hashtag, so a sensible next step would be to also take into account the number of people using each hashtag in ranking the tags. It might also be useful to display the names of folk on the list who have used the hashtag?

I also updated a previous toy app that makes recommendations of who to follow on twitter based on a mismatch between the people you follow (and who follow you) and the people following and followed by another person – follower recommender (of a sort!):

The second doodle was inspired by discussions at Dev8D relating to a possible “UK HE Developers’ network”, and relies on an assumption – that the usernames people use Twitter might be used by the same person on Github. Again, the idea is simple: can we grab a list of Twitter usernames for people that have used the dev8d hashtag (that much is easy) and then lookup those names on Github, pulling down the followers and following lists from Github for any IDs that are recognised in order to identify a possible community of developers on Github from the seed list of dev8d hashtagging Twitter names. (It also occurs to me that we can pull down projects Git-folk are associated with (in order to identify projects Dev8D folk are committing to) and the developers who also commit or subscribe to those projects.)

follower/following connections on github using twitter usernames that tweeted dev8d hashtag

As the above network shows, it looks like we get some matches on usernames…:-)

Visualising Ad Hoc Tweeted Link Communities, via BackType

So you’ve tweeted a link as part of your social media/event amplification strategy, and it’s job done, right? Or is there maybe some way you can learn something about who else found that interesting?

Notwitshtanding the appearance of yet another patent of the bleedin’ obvious, here’s one way I’ve been experimenting with for tracking informal, ad hoc communities around a link. (In part this harkens back to some of my previous “social life of a URL” doodles such as delicious URL History – Hyperbolic Tree Visualisation, More Hyperbolic Tree Visualisations – delicious URL History: Users by Tag.)

In part inspired by a comment by Chris Jobling on one of my flickr Twitter network images, here’s a recipe for identifying a core community that may be interested in a retweeted link:

– given the URL, look up who’s tweeted it via the BackType API;
– for each tweeter of the link, grab the list of people they follow (i.e. their friends);
– plot the “inner” network showing which of the people who tweeted the link the follow each other.

To explore the possible reach of the tweeted link, grab the followers of each person who tweeted the link and plot that network. This is likely to be quite a large network, so you may want to prune it a little, for example by filtering out everyone with node degree less than two.

So for example, earlier today I spotted a tweet about an OU philosophy game (To Lie or Not to Lie?), which I also retweeted. Here’s what the “inner” retweet graph looks like at the moment:

The node size is related to degree, the colour to total follower count. The graph can be used to identify a core network of folk who may be willing to promote OU activities (maybe…?!;-)

The next image shows the retweeters and their followers, filtered to show followers with degree 2 or more (the “double hit” audience). [Actually, I should filter on ((out-degree > 0) and (degree > 1 and in-degree > 1)).]

The nodes are partitioned into clusters using the Gephi modularity statistic and coloured accordingly. Node size is related to total follower count. Layout is done using an expanded Yifan Hu layout.

In a follow on post, I’ll show how we can generate network maps for people on delicious who either bookmarked a particular URL, or follows someone who did…

Dominant Tags in My Delicious Network

Following on from Social Networks on Delicious, here’s a view over my delicious network (that is, the folk I “follow” on delicious) and the dominant tags they use:

The image is created from a source file generated by:

1) grabbing the list of folk in my delicious network;
2) grabbing the tags each of them uses;
3) generating a bipartite network specification graph containing user and edge nodes, with weighted links corresponding to the number of times a user has used a particular tag (i.e. the number of bookmarks they have bookmarked using that tag).

Because the original graph is a large, sparse one (many users define lots of tags but only use them rarely), I filtered the output view to show only those tags that have been used more than 150 times each by any particular user, based on the weight of each edge (remember, the edge weight describes the number of times a used has used a particular tag). (So if every user had used the same tag up to but not more 149 times each, it wouldn’t be displayed). The tag nodes are sized according to the number of users who have used the tag 150 or more times.

I also had a go at colouring the nodes to identify tags used heavily by a single user, compared to tags heavily used by several members of my network.

Here’s the Python code:

import urllib, simplejson

def getDeliciousUserNetwork(user,network):
  url='http://feeds.delicious.com/v2/json/networkmembers/'+user
  data = simplejson.load(urllib.urlopen(url))
  for u in data:
    network.append(u['user'])
    #time also available: u['dt']
  #print network
  return network

def getDeliciousTagsByUser(user):
  tags={}
  url='http://feeds.delicious.com/v2/json/tags/'+user
  data = simplejson.load(urllib.urlopen(url))
  for tag in data:
    tags[tag]=data[tag]
  return tags

def printDeliciousTagsByNetwork(user,minVal=2):
  f=openTimestampedFile('delicious-socialNetwork','network-tags-' + user+'.gdf')
  f.write(gephiCoreGDFNodeHeader(typ='delicious')+'\n')
 
  network=[]
  network=getDeliciousUserNetwork(user,network)

  for user in network:
    f.write(user+','+user+',user\n')
  f.write('edgedef> user1 VARCHAR,user2 VARCHAR,weight DOUBLE\n')
  for user in network:
    tags={}
    tags=getDeliciousTagsByUser(user)
    for tag in tags:
      if tags[tag]>=minVal:
         f.write(user+',"'+tag.encode('ascii','ignore') + '",'+str(tags[tag])+'\n')
  f.close()

Looking at the network, it’s possible to see which members of my network are heavy users of a particular tag, and furthermore, which tags are heavily used by more than one member of my network. The question now is: to what extent might this information help me identify whether or not I am following people who are likely to turn up resources that are in my interest area, by virtue of the tags used by the members of my network.

Picking up on the previous post on Social Networks on Delicious, might it be worth looking at the tags used heavily by my followers to see what subject areas they are interested in, and potentially the topic area(s) in which they see me as acting as a resource investigator?

Social Networks on Delicious

One of the many things that the delicious social networking site appears to have got wrong is how to gain traction from its social network. As well as the incidental social network that arises from two or more different users using the same tag or bookmarking the same resource (for example, Visualising Delicious Tag Communities Using Gephi), there is also an explicit social network constructed using an asymmetric model similar to that used by Twitter: specifically, you can follow me (become a “fan” of me) without my permission, and I can add you to my network (become a fan of you, again without your permission).

Realising that you are part of a social network on delicious is not really that obvious though, nor is the extent to which it is a network. So I thought I’d have a look at the structure of the social network that I can crystallise out around my delicious account, by:

1) grabbing the list of my “fans” on delicious;
2) grabbing the list of the fans of my fans on delicious and then plotting:
2a) connections between my fans and and their fans who are also my fans;
2b) all the fans of my fans.

(Writing “fans” feels a lot more ego-bollox than writing “followers”; is that maybe one of the nails in the delicious social SNAFU coffin?!)

Here’s the way my “fans” on delicious follow each other (maybe? I’m not sure if the fans call always grabs all the fans, or whether it pages the results?):

The network is plotted using Gephi, of course; nodes are coloured according to modularity clusters, the layout is derived from a Force Atlas layout).

Here’s the wider network – that is, showing fans of my fans:

In this case, nodes are sized according to betweenness centrality and coloured according to in-degree (that is, the number of my fans who have this people as fans). [This works in so far as we’re trying to identify reputation networks. If we’re looking for reach in terms of using folk as a resource discovery network, it would probably make more sense to look at the members of my network, and the networks of those folk…)

If you want to try to generate your own, here’s the code:

import simplejson

def getDeliciousUserFans(user,fans):
  url='http://feeds.delicious.com/v2/json/networkfans/'+user
  #needs paging? or does this grab all the fans?
  data = simplejson.load(urllib.urlopen(url))
  for u in data:
    fans.append(u['user'])
    #time also available: u['dt']
  #print fans
  return fans

def getDeliciousFanNetwork(user):
  f=openTimestampedFile("fans-delicious","all-"+user+".gdf")
  f2=openTimestampedFile("fans-delicious","inner-"+user+".gdf")
  f.write(gephiCoreGDFNodeHeader(typ="min")+"\n")
  f.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
  f2.write(gephiCoreGDFNodeHeader(typ="min")+"\n")
  f2.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
  fans=[]
  fans=getDeliciousUserFans(user,fans)
  for fan in fans:
    time.sleep(1)
    fans2=[]
    print "Fetching data for fan "+fan
    fans2=getDeliciousUserFans(fan,fans2)
    for fan2 in fans2:
      f.write(fan+","+fan2+"\n")
      if fan2 in fans:
        f2.write(fan+","+fan2+"\n")
  f.close()
  f2.close()

So what”s the next step…?!