OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Uncourse’ Category

F1 Data Junkie – Getting Started

Data is a wonderful thing, isn’t it…?! Take the following image, for example:

Hamilton on the brake (red 0% blue 100%) round bahrain

It depicts the racetrack used for the Bahrain Formula One grand prix last week, and it was generated from 2 laps worth of data collected at 1 second intervals from Lewis Hamilton’s car.

Long time readers will probably know that over the last couple of years I tried to track down bits of F1 motor racing related data, but to no real avail. Last year, I had the chance to look round the Red Bull Factory a couple of times (it’s located just a few minutes walk away from the OU campus in Milton Keynes) and chat to a couple of the folk there about possible outreach activites around data among other things (I’m still hopeful that something may come of that…).

Related to that possibility, we commissioned a rather nice gadget for displaying time series data against a map, in anticipation of getting car data we could visualise.

Anyway, ever impatient, when I saw that the Mclaren F1 website included a racetime dashboard, a Dev8D payoff in the form of a quick twitter conversation with @bencc resulted in him grabbing the last 30 or so laps worth of that data… :-)

I’ve had a quick play with it to see what sorts of thing might be possible (such as the map above), and there’s a whole bundle of stories that I think the data can turn up. I intend to explore as many of these stories as I can over the next few months, hopefully aided, abetted and corrected by a colleague from my department who is far better at physics than me… because the data contains a whole raft of physics related stories (and remember, folks: physics is fun).

If this: a) sounds like a turn off, but b) you claim to be interested in things like data journalism, please try to stick with the posts in this series (who knows – it may even turn into an uncourse;-) On the other hand, if you’re an F1 junkie (i.e. you follow @sidepodcast and/or listen to the Sidepodcast podcast;-) I’ll tag the posts f1data, so you can visit the tag page or grab the feed if you want to and not have to expose yourself to the rest of the ramblings that appear on this blog…

There’ll be no magic involved, though the results may turn out to be magical if you’re into geeky F1 stuff…;-) but to try and widen the appeal I’ll try to explore what stories the data holds about what’s happening to the car and driver, reflect a bit on what we can learn about extracting stories from data, and look to try to unpick the additional knowledge we might need to bring to the data in order to extract the most meaning from it; you can then see if these lessons apply to data – and stories – that you are interested in!

PS the F1 Fanatic blog is also doing a series on visualising and making sense of F1 related data, as this post on Bahrain Grand Prix FP2 analysis demonstrates. (See my own attempts at doing similar things with timing related data from last year: Visualising Lap Time Data – Australian Grand Prix, 2009). [UPDATE: As pointed out in the comments - and how could I have forgotten this?! (doh!) - there is also the F1 Numbers blog.]

Depending on how my F1 data related posts go, I may even try to hook up with the F1 Fanatic blog and/or the Sidepodcast folk to see if we can work together on ways of presenting this data stuff in a way that your everyday F1 fan might appreciate; just like T151 Digital Worlds helps your everyday computer game player appreciate just what’s involved in the design, development, business and culture of computer gaming:-) <- that’s a shameless plug if Christine or Mr C are reading…:-)

Written by Tony Hirst

March 19, 2010 at 12:45 am

Posted in Data, Uncourse

Tagged with

F1 Data Junkie – Looking at What’s There

…and by looking, I mean looking at what’s there as raw data structure format and content, rather than looking at what stories a visual data analysis reveals, I’m afraid… (that’ll come later:-) But if you do need something visual to inspire you, here’s a tease of what the data from a single lap of Hamilton’s race day tour of the Bahrain 2010 F1 Grand Prix looks like:

Hamilton - single lap data from bahrain

In his book Visualising Data: Exploring and Explaining Data with the Processing Environment, Ben Fry describes a seven stage process for understanding data:

Acquire
Obtain the data, whether from a file on disk or a source over a network.
Parse
Provide some structure for the data’s meaning, and order it into categories.
Filter
Remove all but the data of interest.
Mine
Apply methods from statistics or data mining as a way to discern patterns or place the data in a mathematical context.
Represent
Choose a basic visual model, such as a bar graph, list or tree.
Refine
Improve the basic representation to make it clearer and more visually engaging.
Interact
Add methods for manipulating the data or controlling what features are visible.

I’m not sure I’d agree with the elements above defining a rigid linear process


(Produced using Graphviz; dot file)

I prefer to think of the way I work as something like this:


(Produced using Graphviz; dot file)

but what is clear is that we need to understand what data is available to us.

As I mentioned in F1 Data Junkie – Getting Started, the data I’m going to be playing with (at least at first) is data grabbed from the Mclaren F1 Live Dashboard (as developed by Work Club) so let’s have a look at it… (I’ll come back to how the data was acquired in a later post.)

The data – which I think is updated on the server at a rate of once per second (maybe someone form Work Club could confirm that?) – is published as JSON (Javascript Object Notation) data in a callback function (so it uses what is referred to as the JSON-P (“JSON with Padding”) convention. What this means is that the dashboard web page can call the server and get a some data back in such a way that as soon as the data is loaded into the page, a Javascript programme function can run using the data that has just been downloaded).

Here’s what the data as grabbed from the server looks like:

Dashboard.jsonCallback('{
 "drivers":{
  "HAM":{
   "code":"HAM",
   "telemetry" {
    "timestamp":"15:40:58.383",
    "nEngine":13920,
    "NGear":2,
    "rThrottlePedal":68,
    "pBrakeF":2,
    "gLat":1,
    "gLong":1,
    "sLap":3116,
    "vCar":96.9,
    "NGPSLatitude":26.03185,
    "NGPSLongitude":50.51304
   },
   "additional":{
    "lap":"17",
    "position":"5",
    "is_racing":"1"
   }
  },
  "BUT":{"code":"BUT","telemetry":{"timestamp":"15:40:58.790","nEngine":13106,"NGear":3,"rThrottlePedal":49,"pBrakeF":2,"gLat":2,"gLong":0,"sLap":2681,"vCar":140.2,"NGPSLatitude":26.0333,"NGPSLongitude":50.51635},"additional":{"lap":"17","position":"8","is_racing":"1"}}},
  "commentary":[
   {
    "name":"COM",
    "initials":"CM",
    "text":"Lewis sets the fastest lap with a 2\'00:447 that time around.",
    "timestamp":"2010-03-14 12:40:14"
   }
  ]
 }
');

So what data is there? Well, there are telemetry data fields for each driver:

  • timestamp – the time of day;
  • nEngine – the number of revs the engine is doing;
  • NGear – the gear the car is in (over the range 1 to 7);
  • rThrottlePedal – the amount of throttle depression(?) (as a percentage);
  • pBrakeF – the amount of brake depression(?) (as a percentage)
  • gLat – the lateral “g-force” (that is, the side-to-side g-force that you feel in a car when going round a corner too quickly);
  • gLong – the longitudinal “g-force” (that is, the forwards and backwards force you feel in a car that accelerates or decelerates quickly when going in a straight line);
  • sLap – the distance round the lap (in metres); this resets to zero on each tour, presumably at the start/finish line);
  • vCar – the speed of the car (km/h);
  • NGPSLatitude – the GPS identified latitude of the car;
  • NGPSLongitude – the GOS identified longitude of the car.

On some samples, there is also commentary information, but I’m going to largely ignore that..

Parsing

The data I got hold of was a bundle of files containing data in the JSONP format like that shown above, with one file containing one package of data created once a second.

In order to parse the data, I needed to decide what format I wanted it in for processing. The format I chose was CSV – comma separated variable data – that looks like this:

The first column is the original filename, the other columns correspond to data downloaded from the Mclaren site.

In order to generate the CSV data, I wrote a Python script that would:
– strip off the padding around the JSON data;
– parse the JSON using a standard Python JSON parsing library;
– (add a line to strip out escape characters that weren’t handled correctly by the parser and place it before the parsing step);
– use a CSV library to write out the data in the CSV format.

(See an example Python script here.)

I then refined my parsing script so that it would generate one CSV file per lap. To do this, the script had to:
– detect when the lap distance in one sample was less than the distance in the previous sample (i.e. the lap distance measure has been reset to zero between the two samples, using a construction of the form if oldDist > d['sLap']: where oldDist = d['sLap'] once we have written the corresponding data record to the CSV file);
– if a new lap has been started, close the old CSV file, create a new one, write the column header information into the top of the file, and then start adding the data to that file.

Having got the data into a CSV format, I could then load it in to an environment where I could start to think about visualising it. A spreadsheet for example, or Processing, (which is what I used to create the single lap view shown at the start of this post).

But to see how do that on the one hand, and what stories we can find in the data on the other, we’ll need to move on to another post…

[Reflection on this post: to get a large number of folk interested, I really need to do less of the geeky techie stuff, and more of the "what does the data say" cool viz stuff... but if I don't log the bootstrap techie stuff now before I overcomplicate it(?!) my record of the simplest file handling and parsing code will get lost...;-) In the book version, a large part of this post would be a technical appendix... But the "what data fields are available" bit would be in Chapter 2 (after a fluffy Chapter 1!).]

PS some of the technical details behind the Mclaren site have started appearing on the personal blog of one of the developers – e.g. Building McLaren.com – Part 3: Reading Telemetry. In that post it was pointed out I haven’t been adding copyright notices about the data to the posts – which I’ll happily do once I know who to acknowledge, and how… In the meantime, it appears that “the speed, throttle and brake are sponsored by Vodafone” and “McLaren are providing this data for you to view” so I should link to them: thanks McLaren :-)

Written by Tony Hirst

March 21, 2010 at 10:44 am

Posted in Data, Tinkering, Uncourse, Visualisation

Tagged with

Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I

A couple of weeks ago, I came across Gephi, a desktop application for visualising networks.

And quite by chance, a day or two after I was asked about any tools I knew of that could visualise and help analyse social network activity around an OU course… which I take as a reasonable justification for exploring exactly what Gephi can do :-)

So, after a few false starts, here’s what I’ve learned so far…

First up, we need to get some graph data – netvizz – facebook to gephi suggests that the netvizz facebook app can be used to grab a copy of your Facebook network in a format that Gephi understands, so I installed the app, downloaded my network file, and then uninstalled the app… (can’t be too careful ;-)

Once Gephi is launched (and updated, if it’s a new download – you’ll see an updates prompt in the status bar along the bottom of the Gephi window, right hand side) Open… the network file you downloaded.

NB I think the graph should probably be loaded as an undirected graph… That is, if A connects to B, B connects to A. But I’m committed to the directed version in this case, so we’ll stick with it… (The directed version would make sense for a Twitter network (which has an asymmetric friending model), where A may follow B, but B might choose not to follow A. In Facebook, friending is symmetric – A can only friend B if B friends A.

(Btw, I’ve come across a few gotchas using Gephi so far, including losing the window layout shown above. Playing with the Reset Windows from the Windows menu sometimes helps… There may be an easier way, but I haven’t found it yet…)

The graph window gives a preview of the network – in this case, the nodes are people and the edges show that one person is following another. (Remember, I should have loaded this as an undirected graph. The directed edges are just an artefact of the way the edge list that states who is connected to whom was generated by netvizz.)

Using the scroll wheel on a mouse (or two finger push on my Mac mousepad), you can zoom in and out of the network in the graph view. You can also move nodes around, view the labels, switch the edges on and off off, and recenter the view.

Not shown – but possible – is deleting nodes from the graph, as well as editing their properties.

You can also generate views of the graph that show information about the network. In the Ranking panel, if you select the Nodes tab, set the option to Degree (the number of edges/connections attached to a node) and then choose the node size button (the jewel), you can set the size of the node to be proportional to the number of connections. Tune the min and max sizes as required, then hit apply:

You can also colour the nodes according to properties:

So for example, we might get something like this:

Label size and colour can also be proportional to node attributes:

To view the labels, make sure you click on the Text labels option at the bottom of the graph panel. You may also need to tweak the label size slider that’s also on the bottom of the panel.

If you want to generate a pretty version of the graph, you need to do a couple of things. Firstly, in the layout panel, select a layout algorithm. Force Atlas is the one that the original tutorial recommends. The repulsion strength determines how dispersed the final graph will be (i.e. it sets the “repulsive force” between nodes); I set a value of 2000, but feel free to play:

When you hit Run, the button label will change to Stop and the graph should start to move and reorganise itself. Hit Stop when the graph looks a little better laid out. Remember, you can also move nodes around in the graph as show in the video above.

Having run the Layout routine, we can now generate a pretty view of the graph. In the Preview Settings panel on the left-hand side of the Gephi environment, select “Show Labels” and then hit “Refresh”:

In the Preview panel, (next tab along from Preview Settings), you should see a the prettified, 3D layout view:

Note that in this case I haven’t made much attempt at generating a nice layout, for example by moving nodes around in the graph window to better position them, but you can do… (just remember to Refresh the Preview view in the Preview Settings… (There must be a shortcut way of doing that, but I haven’t found it…!:-(

If you want to look at who any particular individual is connected to, you can go to the
Data Table panel (again in the set of panels on the right hand side, just along from the Preview tab panel) and search for people by name. Here, I’m searching the edges to see who of my Facebook friends a certain Martin W is also connected to on Facebook;

It’s easy enough to highlight/select and copy these cells and then post them into a spreadsheet if required.

So that’s step 1 of getting started with Gephi… a way of using it to explore a graph in very general terms; but that’s not where the real fun lies. That starts when you start processing the graph by running statistics and filters over it. But for that, you’ll have to wait for the next post in this series… which is here: Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters

Written by Tony Hirst

April 16, 2010 at 11:20 am

Posted in Tutorial, Uncourse, Visualisation

Tagged with ,

Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters

In Getting Started With Gephi Network Visualisation App – My Facebook Network, Part I I described how to get up and running with the Gephi network visualisation tool using social graph data pulled out of my Facebook account. In this post, I’ll explore some of the tools that Gephi provides for exploring a network in a more structured way.

If you aren’t familiar with Gephi, and if you haven’t read Part I of this series, I suggest you do so now…

…done that…?

Okay, so where do we begin? As before, I’m going to start with a fresh worksheet, and load my Facebook network data, downloaded via the netvizz app, into Gephi, but as an undirected graph this time! So far, so exactly the same as last time. Just to give me some pointers over the graph, I’m going to set the node size to be proportional to the degree of each node (that is, the number of people each person is connected to).

I can activate text labels for the nodes that are proportional to the node sizes from the toolbar along the bottom of the Graph panel:

…remembering to turn on the text labels, of course!

So – how can we explore the data visually using Gephi? One way is to use filters. The notion of filtering is incredibly powerful one, and one that I think is both often assumed and underestimated, so let’s just have a quick recap on what filtering is all about.

This maybe?

grean beans - House Of Sims (via flickr)
["green beans" by House Of Sims]

Filters – such as sieves, or colanders, but also like EQ settings and graphic, bass or treble equalisers on music players, colour filters on cameras and so on – are things that can be used to separate one thing from another based on their different properties. So for example, a colander can be used to separate green beans from the water it was boiled in, and a bass filter can be used to filter out the low frequency pounding of the bass on an audio music track. In Gephi, we can use filters to separate out parts of a network that have particular properties from other parts of the network.

The graph of Facebook friends that we’re looking at shows people I know as nodes; a line connecting two nodes (generally known as an edge) shows that that the two people represented by the corresponding nodes are also friends with each other. The size of the node depicts its degree, that is, the number of edges that are connected to it. We might interpret this as the popularity (or at least, the connectedness) of a particular person in my Facebook network, as determined by the number of my friends that they are also a friend of.

(In an undirected network like Facebook, where if A is a friend of B, B is also a friend of A, the edges are simple lines. In a directed network, such as the social graph provided by Twitter, the edges have a direction, and are typically represented by arrows. The arrow shows the direction of the relationship defined by the edge, so in Twitter an arrow going from A to B might represent that A is a follower of B; but if there is no second arrow going from B to A, then B is not following A.)

We’ve already used degree property of the nodes to scale the size of the nodes as depicted in the network graph window. But we can also use this property to filter the graph, and see just who the most (or least) connected members of my Facebook friends are. That is, we can see which people are friends of lots of the people am I friends of.

So for example – of my Facebook friends, which of them are friends of at least 35 people I am friends with? In the Filter panel, click on the Degree Range element in the Topology folder in the Filter panel Library and drag and drop it on to the Drag Filter Here

Adjust the Degree Range settings slider and hit the Filter button. The changes to allow us to see different views over the network corresponding to number of connections. So for example, in the view shown above, we can see members of my Facebook network who are friends with at least 30 other friends in my network. In my case, the best connected are work colleagues.

Going the other way, we can see who is not well connected:

One of the nice things we can do with Gephi is use the filters to create new graphs to work with, using the notion of workspaces.

If I export the graph of people in my network with more than 35 connections, it is place into a nw workspace, where I can work on it separately from the complete graph.

Navigating between workspaces is achieved via a controller in the status bar at the bottom right of the Gephi environment:

The new workspace contains just the nodes that had 35 or more connections in the original graph. (I’m not sure if we can rename, or add description information, to the workspace? If you know how to do this, please add a comment to the post saying how:-)

If we go back to the original graph, we can now delete the filter (right click, delete) and see the whole network again.

One very powerful filter rule that it’s worth getting to grips with is the Union filter. This allows you to view nodes (and the connections between them) of different filtered views of the graph that might otherwise be disjoint. So for example, if I want to look at members of my network with ten or less connections, but also see how they connect to each other to Martin Weller, who has over 60 connections, the Union filter is the way to do it:

That is, the Union filter will display all nodes, and the connections between them, that either have 10 or less connections, or 60 or more connections.

As before, I can save just the members of this subnetwork to a new workspace, and save the whole project from the File menu in the normal way.

Okay, that’s enough for now… have a play with some of the other filter options, and paste a comment back here about any that look like they might be interesting. For example, can you find a way of displaying just the people who are connected to Martin Weller?

Written by Tony Hirst

April 23, 2010 at 12:06 pm

Posted in Tutorial, Uncourse, Visualisation

Tagged with ,

Getting Started With Gephi Network Visualisation App – My Facebook Network, Part III: Ego Filters and Simple Network Stats

In a couple of previous posts on exploring my Facebook network with Gephi, I’ve shown how to plot visualise the network, and how to start constructing various filtered views over it (Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I and Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters). In this post, I’ll explore a new feature, ego filters, as well as looking at some simple social network analysis tools that can help us better understand the structure of a social network.

To start with, I’m going to load my Facebook network data (grabbed via the Netvizz app, as before) into Gephi as an undirected graph. As mentioned above, the ego network filter is a new addition to Gephi, which will show that part of a graph that is connected to a particular person. So for example, I can apply the ego filter (from the Topology folder in the list of filters) to “George Siemens” to see which of my Facebook friends George knows.

Gephi - ego filter - my Facebook friends who are friends with George Siemens

If I save this as a workspace, I can then tunnel into it a little more, for example by applying a new ego filter to the subgraph of my friends who George Siemens knows. In this case, lets add Grainne to the mix – and see who of my friends know both George Siemens and Grainne:

Ego filter applied within an ego filtered workspace

Note that I could have achieved a similar effect with the full graph by using the intersection filter (as introduced in the previous post in this series):

Seeing my facebook connections that two of my Facebook friends know

The depth of the ego filter also allows you to see who of of my friends the named individual knows either directly, or through one of my other friends. Using an ego filtered network to depth two (frined of a friend) around George Siemens, I can run some network statistics over just that group of people. So for example, if I run the Degree statistics over the network, and then set the node size according to node degree within that network this is what I get:

Running stats on the network

(I also turned node labels on and set their size proportional to node size.)

Running Network Diameter stats generates the following sorts of report:

Gephi Network diameter stats

That is:

- betweenness centrality;
– closeness centrality;
– eccentricity.

These all sound pretty technical, so what do they refer to?

Betweenness centrality is a measure based on the number of shortest paths between any two nodes that pass through a particular node. Nodes around the edge of the network would typically have a low betweenness centrality. A high betweenness centrality might suggest that the individual is connecting various different parts of the network together.

Closeness centrality is a measure that indicates how close a node is to all the other nodes in a network, whether or not the node lays on a shortest path between other nodes. A high closeness centrality means that there is a large average distance to other nodes in the network. (So a small closeness centrality means there is a short average distance to all other nodes in the network. Geddit? (I think sometimes the reciprocal of this measure is given as closeness centrality:-).

The eccentricity measure captures the distance between a node and the node that is furthest from it; so a high eccentricity means that the furthest away node in the network is a long way away, and a low eccentricity means that the furthest away node is actually quite close.

So let’s have a look at the structure of my Facebook network, as filtered according to George’s ego filter, depth 2:

Plotting size proportional to betweenness centrality, we see Martin Weller, Grainne and Stephen Downes are influential in keeping different parts of my network connected:

Betweenness centrality

As far as outliers go, we can look at the closeness centrality and eccentricity (to protect the innocent, I shall suppress the names!)

Eccentricity (size) and closeness centrality (colour) in gephi

Here, the colour field defines the closeness centrality and the size of the node the eccentricity. It’s quite easy to identify the people in this network who are not well connected and who are unlikely to be able to reach each other easily through those of my friends they know.

From nods with similar sizes and different colours, we also see how it’s quite possible for two nodes to have a similar eccentricity (similar distances to the furthest away nodes) and very different closeness centrality (that is, the node may have a small or large average distance to every other node in the graph). For example, if a node is connected to a very well connected node, it will lower the closeness centrality.

So for example, if we look at the ego network with the above netwrok based around the very well connected Martin Weller, what do we see?

Further filter

The colder, blue shaded circles (high closeness centrality) have disappeared. Being a Martin Weller friend (in my Facebook network at least) has the effect of lowering your closeness centrality, i.e. bringing you closer to all the other people in the network.

Okay, that’s definitely more than enough for now. Why not have a play looking at your Facebook network, and seeing if you can identify who the best connected folk are?

PS when plotting charts, I think Gephi uses data from the last statistics run it did, even if that was in another workspace, so it’s always worth running the statistics over the current graph if you intend to chart something based on those stats…

Written by Tony Hirst

May 10, 2010 at 1:02 pm

Posted in Tutorial, Uncourse

Tagged with ,

Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part IV

In the first two posts in this series, I described how to use Gephi to visualise various different views over a personal social network in Facebook using data pulled from a Facebook account using the Netvizz application. This was followed by a post describing how to and run some simple social network analysis statistics over the network. In this post, we’ll look at another powerful analytic tool provided by Gephi: clustering. [Note that after publishing this post, a far better way of visualising the clustered groups was suggested to me - find out more in the next post in this series.]

Clustering is a mathematical process in which different elements in a particular set are grouped together based on certain similarities between the different elements. That is, like is grouped with like. In a social network, clustering algorithms typically group together individuals who form “subnetworks” – for example, subsets of the the whole population who all know one another.

Being able to identify clusters within a social network allows you to identify “subnetworks” within that larger network. In Gephi, a graph can be clustered by running the Modularity in the Statistics panel – so here’s what I get when I run this measure over my Facebook network:

Modularity clustering in Gephi

The Modularity clustering tool identifies several different clusters, (that is, groups or classes) within the network as a whole, associating each node with one of the groups.

One you have run the tool, you can view the cluster each node has been associated with by using the Modularity Class Ranking parameter; I find that the colour mapping is the most effective:

Gephi modularity applied

You can inspect the size of the various classes in a crude fashion via the Filter panel: select the Modularity Class option from the Partition folder in the filter Library and drag the filter to the query window. If you click on the Partition column element, you will be presented with a window showing each pf the partition classes:

Modularity class partitions, gephi

If you now filter on one of the partition classes, and switch on the node names, you can see which nodes have been clustered together. Looking separately at two of the largest clusters in my Facebook network, I can see two OU clusters:

OU cluster in my Facebook network via Gephi

and:

My Facebook community - OU cluster via gephi

and a North America/Canada dominated ed-tech cluster (which also includes some BBC folk via Bill Thompson…):

Clustering my Facebook network in gephi

(Note that it is possible to highlight/filter on more than one cluster within a single filter (on a Mac, fn-click allows you to select multiple individual clusters).

The Modularity statistic thus provides us with a powerful tool for identifying subgroupings within a social network. So why not try it using your own data – are the clusters that are identified meaningful to you?

PS a far better way of visualising the clustered groups was suggested to me – find out more in the next post in this series.

Written by Tony Hirst

May 12, 2010 at 10:19 pm

Posted in Tutorial, Uncourse

Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part V

A comment from one of the Gephi developers to Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part IV, in which I described how to use the Modularity statistic to partition a network in terms of several different similar subnetwork groupings, suggested that a far better way of visualising the groups was to use the Partion parameter… and how right they were…

Running the Modularity statistic over my Facebook netwrok, as captured using Netvizz, and then refreshing the view in the Partition panel allows us to colour the netwrok using different partitions – such as the Modularity classes that the Modularity statistic generates and assigns nodes to:

Partition functions in gephi

Here’s what happens when we applying the colouring:

Partition colouring by modularity class

Selecting the Group view collects all the nodes in a partition together as a group:

Partition groups in gephi

These grouped nodes can be individually ungrouped by right-clicking on a group node and ungrouping it, or they can be expanded which maintains the group identity whilst still letting us look at the local structure:

Group node management in gephi

Here’s what the expanded view of one of the classes looks like, with text labels turned on:

Expanded group node in gephi

We see that the members of the group are visible, allowing us to explore the make-up of the subnetwork. As you might expect, we can then colour or resize nodes within the expanded group in the normal way:

Node resizing within an expanded group in gephi

To create a workspace containing just the members of a particular partition, ungroup all the nodes via the Partition module and filter on the required partition using a Modularity Class filter:

Create a workspace with just members of a given partition in gephi

The Partition module is incredibly powerful, as you can hopefully see; but it isn’t limited to dealing with just partitions created using Gephi statistics – it can also deal with partitions defined over the graph as loaded into Gephi (see the GUESS format for more details on how to structure the input file).

So for example, the most recent version of Netvizz will return additional data alongside just the identities of your friends, such as their gender (if revealed to you by their profile privacy settings) and the number of their wall posts. Loading this richer network specification into Gephi, and refreshing the Partion module settings reveals the following:

Gephi partiion over a preloaded partition

Which in turn means we can colour the graph as follows:

Gephi - partition colouring based oon pre-specified partititons

The wall count parameter is made available through the Ranking panel:

User specified Ranking parameters in Gephi

So as we can see, if you have partition data available for network members, Gephi can provide a great way of visualising it :-)

Written by Tony Hirst

May 16, 2010 at 10:12 pm

Posted in Uncourse, Visualisation

Tagged with ,

Follow

Get every new post delivered to your Inbox.

Join 821 other followers