Another Blooming Look at Gource and the Edina OpenURL Data

Having done a first demo of how to use Gource to visualise activity around the EDINA OpenURL data (Visualising OpenURL Referrals Using Gource), I thought I’d try something a little more artistic, and use the colour features to pull out a bit more detail from the data [video].

What this one shows is how the Mendeley referrals glow bright green, which, if I’ve got my code right, suggests that a lot of e-ISSN lookups are going on (red nodes correspond to an ISSN lookup, blue to an ISBN lookup and yellow/orange to an unknown lookup). The regularity of activity around particular nodes also shows how much of the activity is driven by a few dominant services, at least during the time period I sampled to generate this video.
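One way to sanity-check that reading of the colours, and the impression that a few services dominate, is to tally the generated log directly rather than eyeballing the video. The sketch below is a minimal, hypothetical helper (`tally_log` and `top_share` are made-up names). It assumes the pipe-separated `openurlgource.txt` file written by the conversion script in this post, with the colour codes that script uses.

```python
from collections import Counter

# Colour codes used by the conversion script in this post
COLOUR_NAMES = {
    'FF0000': 'issn',
    '00FF00': 'eissn',
    '0000FF': 'isbn',
    '666600': 'unknown',
}


def tally_log(lines):
    """Count records per lookup type (by colour) and per node path.

    Each line is assumed to hold five |-separated fields:
    timestamp|username|type|file|colour
    """
    by_type = Counter()
    by_node = Counter()
    for line in lines:
        fields = line.rstrip('\r\n').split('|')
        if len(fields) != 5:
            continue  # skip blank or malformed records (e.g. quoted fields)
        path, colour = fields[3], fields[4]
        by_type[COLOUR_NAMES.get(colour, 'other')] += 1
        by_node[path] += 1
    return by_type, by_node


def top_share(counts, n=5):
    """Fraction of all records accounted for by the n busiest keys."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total
```

Against the real file, something like `with open('openurlgource.txt', encoding='latin-1') as f: by_type, by_node = tally_log(f)` followed by `top_share(by_node)` gives numbers to set beside what the video seems to show.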

So how was this visualisation created?

Firstly, I pulled out a few more data columns, specifically the issn, eissn, isbn and genre data. I then set the node colour according to whether the issn (red), eissn (green) or isbn (blue) column was populated, using a default reasoning approach: the columns are checked in that order and the first populated one sets the colour (if all three were blank, I coloured the node yellow). I then experimented with colouring the actors (I think?) according to whether the genre was article-like, book-like or unknown (mapping these onto add, modify or delete actions), before shrinking the actors right down in favour of just highlighting referrers and asset type (i.e. ISSN, e-ISSN, book or unknown).

cut -f 1,2,3,4,27,28,29,32,40 L2_2011-04.csv > openurlgource.csv
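`cut` numbers its fields from 1, splits on tabs by default, and always emits the kept fields in their original order. So the nine fields kept here become positions 0 to 8 in each row the Python script reads. Fields 27, 28, 29 and 32 (issn, eissn, isbn and genre) land at `row[4]` to `row[7]`, and field 40 lands at `row[8]`. The sketch below is a rough Python equivalent for checking that mapping. The function name `cut_fields` is hypothetical, and the sketch ignores cut's special handling of lines that contain no delimiter at all.

```python
def cut_fields(line, fields, delimiter='\t'):
    """Roughly mimic `cut -f` on one line: keep the given 1-based fields.

    As with cut, fields come out in their original order whatever order
    they were listed in, and field numbers past the end are ignored.
    """
    parts = line.rstrip('\n').split(delimiter)
    wanted = sorted(set(fields))
    return [parts[i - 1] for i in wanted if i <= len(parts)]


# Field numbers from the cut command above
FIELDS = [1, 2, 3, 4, 27, 28, 29, 32, 40]

# Position of each kept field in the script's `row` list
for position, field in enumerate(sorted(FIELDS)):
    print(f'cut field {field:>2} -> row[{position}]')
```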

When running the Python script, I got a “NULL byte” error that stopped it working (something had obviously snuck in via one of the newly added columns), so I googled around and turned up a little command-line clean-up routine for the cut data file:

tr < openurlgource.csv -d '\000' > openurlgourcenonulls.csv
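If you would rather keep everything in Python, the NUL characters can instead be stripped as the file is read. The approach wraps the file object in a generator before handing it to `csv.reader`, which accepts any iterable of strings. This is a minimal Python 3 sketch, assuming the tab-separated cut output; the helper names `strip_nuls` and `read_rows` are made up.

```python
import csv
import io


def strip_nuls(lines):
    """Yield each line with any NUL characters removed (like tr -d '\\000')."""
    for line in lines:
        yield line.replace('\0', '')


def read_rows(f):
    """Read tab-separated rows from a text file object, ignoring NULs."""
    return csv.reader(strip_nuls(f), delimiter='\t')


# Example with an in-memory "file" containing a stray NUL
sample = io.StringIO('date\ttime\n2011-04-01\t12:00:00\x00\n')
for row in read_rows(sample):
    print(row)
```

For the real file, open it with something like `open('openurlgource.csv', newline='', encoding='latin-1')` and pass the file object to `read_rows`.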

Here’s the new Python script, which shows the handling of the colour fields:

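import csv
from time import mktime, strptime

# Command line pre-processing step to handle NULL characters
#tr < openurlgource.csv -d '\000' > openurlgourcenonulls.csv
#alternatively?: sed 's/\x00//g' openurlgource.csv > openurlgourcenonulls.csv

# latin-1 maps every byte to one character and back again, so any odd
# bytes in the data pass through unchanged instead of raising decode errors
with open('openurlgourcenonulls.csv', newline='', encoding='latin-1') as f, \
        open('openurlgource.txt', 'w', newline='', encoding='latin-1') as out:
    reader = csv.reader(f, delimiter='\t')
    # Output is Gource's custom log format, timestamp|username|type|file|colour,
    # with Unix line endings rather than the csv module's default \r\n
    writer = csv.writer(out, delimiter='|', lineterminator='\n')
    headerline = next(reader)  # skip the header row

    for row in reader:
        # Skip records with nothing in the column used as the node path
        if row[8].strip() == '':
            continue
        # Date and time columns -> Unix timestamp (interpreted as local time)
        t = int(mktime(strptime(row[0] + " " + row[1], "%Y-%m-%d %H:%M:%S")))
        # Node colour by default reasoning: first populated of issn, eissn, isbn
        if row[4] != '':
            col = 'FF0000'  # red: issn
        elif row[5] != '':
            col = '00FF00'  # green: eissn
        elif row[6] != '':
            col = '0000FF'  # blue: isbn
        else:
            col = '666600'  # dark yellow: none of the three
        # Genre -> Gource action type: article-like A, book-like M, other D
        if row[7] in ('article', 'journal'):
            typ = 'A'
        elif row[7] in ('book', 'bookitem'):
            typ = 'M'
        else:
            typ = 'D'
        # Colon-separated value -> slash-separated path, so Gource builds a tree
        agent = row[8].rstrip(':').replace(':', '/')
        writer.writerow([t, row[3], typ, agent, col])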
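Each row the script writes is a record in Gource's custom log format. A record holds a Unix timestamp, a username, an action type (A, M or D for added, modified or deleted), a file path and, optionally, a colour in hex `RRGGBB` form, all separated by `|` characters. The colour attaches to the file (the node), while the action type sets the colour of the beam Gource draws from the user to that file. `csv.writer` wraps any field containing the delimiter or a quote character in quotes, which Gource would not know to strip. It may therefore be safer to build the lines by hand. The sketch below uses a made-up helper, `gource_line`, that does this and rejects fields that would break the format.

```python
import re

HEX_COLOUR = re.compile(r'[0-9A-Fa-f]{6}')


def gource_line(timestamp, username, action, path, colour=None):
    """Build one Gource custom log record: timestamp|username|type|file[|colour]."""
    if action not in ('A', 'M', 'D'):
        raise ValueError(f'action must be A, M or D, not {action!r}')
    for value in (username, path):
        # A separator inside a field would shift every field after it
        if '|' in value or '\n' in value:
            raise ValueError(f'field contains a separator: {value!r}')
    fields = [str(int(timestamp)), username, action, path]
    if colour is not None:
        if not HEX_COLOUR.fullmatch(colour):
            raise ValueError(f'colour must be RRGGBB hex, not {colour!r}')
        fields.append(colour)
    return '|'.join(fields)


print(gource_line(1301616000, 'someone', 'A', 'example/service', '00FF00'))
# prints 1301616000|someone|A|example/service|00FF00
```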

The new gource command is:

gource -s 1 --hide usernames --start-position 0.8 --stop-position 0.82 --user-scale 0.1 openurlgource.txt

and the command to generate the video:

gource -s 1 --hide usernames --start-position 0.8 --stop-position 0.82 --user-scale 0.1 -o - openurlgource.txt | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset slow -pix_fmt yuv420p -b:v 3000k -threads 0 gource.mp4

If you’ve been tempted to try Gource out yourself on some of your own data, please post a link in the comments below:-) (I wonder just how many different sorts of data we can force into the shape that Gource expects?!;-)
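As a rough template for that sort of forcing, anything that can be expressed as time-stamped events on paths in a tree will do, for example (timestamp, who, a/b/c) triples. The sketch below assumes the events arrive as Python tuples; the event shape and the name `events_to_gource` are made up for illustration. It logs the first event on a path as an add and later ones as modifications. Gource reads the log in order, so the sketch sorts the events by timestamp first as a precaution.

```python
def events_to_gource(events):
    """Turn (timestamp, user, path) events into Gource custom log lines.

    The first event on a path is logged as an add (A) and later ones as
    modifications (M). Events are sorted by timestamp before conversion.
    """
    seen = set()
    lines = []
    for timestamp, user, path in sorted(events, key=lambda e: e[0]):
        action = 'M' if path in seen else 'A'
        seen.add(path)
        lines.append(f'{int(timestamp)}|{user}|{action}|{path}')
    return lines


# Made-up events: a small category tree that grows over time
events = [
    (30, 'bob', 'fruit/apple'),
    (10, 'ann', 'fruit/pear'),
    (20, 'ann', 'fruit/apple'),
]
for line in events_to_gource(events):
    print(line)
```

Writing these lines to a file, one per line, gives something that can be passed to `gource` in the same way as `openurlgource.txt` above.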

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

3 thoughts on “Another Blooming Look at Gource and the Edina OpenURL Data”

    1. Gource (and Code Swarm) were both developed for visualising repo commits… I’ve recently started wondering what else we might be able to visualise using the same tools. If you have data that can be represented as a tree/hierarchical dataset that grows over time, it’s a good candidate….
