Plotting Comment Networks in Gephi, Part II – Merging Datasets Using Google Fusion Tables

Following on from the previous post on comment graphs, here’s something a little more interesting.

The data I’m working with is a CSV file that contains a list of data pairs; for each comment on a photo, it gives the user ID of the person making the comment and the PhotoID the comment was made on. In a second file, I have pairs of photo ID and the user ID of the person who took the photo. I’m thinking it would be interesting to look directly at the edges between user IDs, using a directed graph in which the edges go from the person who made a comment to the person who uploaded the photo being commented on.

So how to generate this merged data file? One non-programmatic way that occurred to me was to use a Google Fusion Table. Simply upload the two data files, and then create a new one that merges the data around the photo ID (that is, photo ID is the common term that joins together the ID of someone commenting, and the person who uploaded the photo being commented on):

Google Fusion Table

That gives a merged data table that looks something like this:

Merged data

This data can be exported, and the photo ID column removed to give a two column CSV file that contains the user ID of someone who took a photo, paired (optionally) with the ID of a person who commented on that photo.

Where multiple people commented on the same photo, multiple rows will result.

Loading the data into Gephi, colouring nodes by out-degree (from photographer ID to commenter ID) and sizing them by in-degree (number of incoming comments) gives something like this:

comment/phographer network

If we auto-select neighbours and colour the edges according to direction, we get:

Neighbours in a directed graph (gephi)

We can also see what happens if we try to cluster the network using the Modularity filter – several partitions are identified, and expanding one lets us see which users were grouped together, presumbaly becuase there was a high degree of commenting between them:

Gephi - clustering using modularity statistic

If we now run the Network Diameter statistic, we can look at the Betweeness Centrality across the commenting network, colouring for InDegree and sizing for Betweenness:

Gephi - betweenness centrality

The resulting chart shows us which individuals are most active in terms of commenting on, and being commented upon, by other members of the network.

We can generate a similar saw of representation based on favouriting behaviour too. In this case, I started off withteh Favouriter table, and then merged in the Photo user id table – which meant that every favourite had a user ID of the photographer associated with it compared to the above case where I started with the photo table and then merged in the commenter IDs – which meant there were some rows that only had photographer ID (ie some photos had no comments… which means, if we filter the nodes in the comment graph based on in-degree zero, we can view the individuals who have received no comments? (Which may be because they didn’t upload any photos? Eg they might be moderators?)

In the following image, we see individuals whose photos have been heavily favourited. In the top left we see an individual who has favourited lots of of photos (lots of links going out) but not been favourited in return:

Favourting behaviour in gephi

If we return the Network stats, and look at Betweenness, we can see which individuals are favouriting widely across the whole userbase:

Betweenness

So there we have it. Using Google Fusion tables, we can generate a user-to-user graph that relates the IDs of users commenting on (or favouriting) each others photos, based on two separate data sets: one that relates user ID of a commenter to a photo; and a second that relates a photo to the ID of the user who uploaded the photo. The resulting graph userID to userID data allows us to use Gephi to plot diagrams that use directed edges to show person A favourited person B’s photo, or person A had a photo commented on by person B.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

One thought on “Plotting Comment Networks in Gephi, Part II – Merging Datasets Using Google Fusion Tables”

Comments are closed.