Simple Map Making With Google Fusion Tables

A quicker than quick recipe to make a map from a list of addresses in a simple text file using Google Fusion tables…

Here’s some data (grabbed from The Gravesend Reporter via this recipe) in a simple two column CSV format; the first column contains address data. Here’s what it looks like when I import it into Google Fusion Tables:

data in a fusion table

Now let’s map it:-)

First of all we need to tell the application which column contains the data we want to geocode – that is, the addrerss we want Fusion Tables to find the latitude and longitude co-ordinates for…

tweak the column

Then we say we want the column to be recognised as a column type:

change name make location

Computer says yes, highlighting the location type cells with a yellow background:

fusion table.. yellow...

As if by magic a Map tab appears (though possibly not if you are using Google Fusion Tables as apart of a Google Apps account…) The geocoder also accepts hints, so we can make life easier for it by providing one;-)

map tab...

Once the points have been geocoded, they’re placed onto a map:

mapped

We can now publish the map in preparation for sharing it with the world…

publish map

We need to change the visibility of the map to something folk can see!

privacy and link

Public on the web, or just via a shared link – your choice:

make seeable

Here’s my map:-)

The data used to generate this map was originally grabbed from the Gravesend Reporter: Find your polling station ahead of the Kent County Council elections. A walkthrough of how the data was prepared can be found here: A Simple OpenRefine Example – Tidying Cut’n’Paste Data from a Web Page.

Plotting Comment Networks in Gephi, Part II – Merging Datasets Using Google Fusion Tables

Following on from the previous post on comment graphs, here’s something a little more interesting.

The data I’m working with is a CSV file that contains a list of data pairs; for each comment on a photo, it gives the user ID of the person making the comment and the PhotoID the comment was made on. In a second file, I have pairs of photo ID and the user ID of the person who took the photo. I’m thinking it would be interesting to look directly at the edges between user IDs, using a directed graph in which the edges go from the person who made a comment to the person who uploaded the photo being commented on.

So how to generate this merged data file? One non-programmatic way that occurred to me was to use a Google Fusion Table. Simply upload the two data files, and then create a new one that merges the data around the photo ID (that is, photo ID is the common term that joins together the ID of someone commenting, and the person who uploaded the photo being commented on):

Google Fusion Table

That gives a merged data table that looks something like this:

Merged data

This data can be exported, and the photo ID column removed to give a two column CSV file that contains the user ID of someone who took a photo, paired (optionally) with the ID of a person who commented on that photo.

Where multiple people commented on the same photo, multiple rows will result.

Loading the data into Gephi, colouring nodes by out-degree (from photographer ID to commenter ID) and sizing them by in-degree (number of incoming comments) gives something like this:

comment/phographer network

If we auto-select neighbours and colour the edges according to direction, we get:

Neighbours in a directed graph (gephi)

We can also see what happens if we try to cluster the network using the Modularity filter – several partitions are identified, and expanding one lets us see which users were grouped together, presumbaly becuase there was a high degree of commenting between them:

Gephi - clustering using modularity statistic

If we now run the Network Diameter statistic, we can look at the Betweeness Centrality across the commenting network, colouring for InDegree and sizing for Betweenness:

Gephi - betweenness centrality

The resulting chart shows us which individuals are most active in terms of commenting on, and being commented upon, by other members of the network.

We can generate a similar saw of representation based on favouriting behaviour too. In this case, I started off withteh Favouriter table, and then merged in the Photo user id table – which meant that every favourite had a user ID of the photographer associated with it compared to the above case where I started with the photo table and then merged in the commenter IDs – which meant there were some rows that only had photographer ID (ie some photos had no comments… which means, if we filter the nodes in the comment graph based on in-degree zero, we can view the individuals who have received no comments? (Which may be because they didn’t upload any photos? Eg they might be moderators?)

In the following image, we see individuals whose photos have been heavily favourited. In the top left we see an individual who has favourited lots of of photos (lots of links going out) but not been favourited in return:

Favourting behaviour in gephi

If we return the Network stats, and look at Betweenness, we can see which individuals are favouriting widely across the whole userbase:

Betweenness

So there we have it. Using Google Fusion tables, we can generate a user-to-user graph that relates the IDs of users commenting on (or favouriting) each others photos, based on two separate data sets: one that relates user ID of a commenter to a photo; and a second that relates a photo to the ID of the user who uploaded the photo. The resulting graph userID to userID data allows us to use Gephi to plot diagrams that use directed edges to show person A favourited person B’s photo, or person A had a photo commented on by person B.