OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Visualising OpenURL Referrals Using Gource


Picking up on the OpenURL referrer data that I played with here, here’s a demo of how to visualise it using Gource [video]:

If you haven’t come across it before, Gource is a repository visualiser (Code Swarm is another one) that lets you visualise who has been checking documents into and out of a code repository. As the documentation describes it, “software projects are displayed by Gource as an animated tree with the root directory of the project at its centre. Directories appear as branches with files as leaves. Developers can be seen working on the tree at the times they contributed to the project.”

One of the nice things about Gource is that it accepts a simple custom log format that can be used to visualise anything you can represent as a series of actors, doing things to something that lives down a path, over time… (So for example, PyEvolve which visualises Google Analytics data to show website usage.)

In the case of the Edina OpenURL resolver, I mapped referring services onto the “flower”/file nodes, and institutional IDs onto the people. (If someone could clarify what the institutional IDs – column 4 of the log – actually refer to, I’d be really grateful?)

To generate the Gource log file – which needs to look like this:

  • timestamp – A unix timestamp of when the update occurred.
  • username – The name of the user who made the update.
  • type – initial for the update type – (A)dded, (M)odified or (D)eleted.
  • file – Path of the file updated.

That is: 1275543595|andrew|A|src/main.cpp
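As a minimal sketch of that format, here is a small helper that builds one log line and rejects values that would break it. The function name gource_line is purely illustrative and not part of Gource.

```python
def gource_line(timestamp, username, action, path):
    """Build one line of the custom log format: timestamp|username|type|file."""
    if action not in ("A", "M", "D"):
        raise ValueError("type must be one of A, M or D")
    fields = [str(int(timestamp)), str(username), action, str(path)]
    # The pipe is the field separator, so it cannot appear inside a field.
    if any("|" in field for field in fields):
        raise ValueError("fields must not contain '|'")
    return "|".join(fields)


print(gource_line(1275543595, "andrew", "A", "src/main.cpp"))
# prints 1275543595|andrew|A|src/main.cpp
```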

I used a command line trick and a Python trick:

cut -f 1,2,3,4,40 L2_2011-04.csv > openurlgource.csv
head -n 100 openurlgource.csv > openurlgource100.csv

(Taking the head of the file containing just columns 1,2,3,4 and 40 of the log data meant I could try out my test script on a small file to start with…)
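If cut and head aren't to hand, the same column extraction can be sketched in Python. This is a rough equivalent, not what was actually run: the function name extract_columns is made up, and it assumes every row really does have at least 40 tab-separated fields.

```python
from itertools import islice


def extract_columns(lines, columns=(0, 1, 2, 3, 39), limit=None):
    """Yield the chosen tab-separated fields from each line.

    columns are zero-based, so (0, 1, 2, 3, 39) matches `cut -f 1,2,3,4,40`;
    limit=100 mimics `head -n 100`, limit=None keeps every line.
    """
    for line in islice(lines, limit):
        fields = line.rstrip("\r\n").split("\t")
        yield [fields[i] for i in columns]


# A made-up 40-column row, just to show which fields survive.
demo = ["\t".join("c%d" % i for i in range(1, 41)) + "\n"]
print(list(extract_columns(demo)))
# prints [['c1', 'c2', 'c3', 'c4', 'c40']]
```

Passing an open file object as lines would stream a large log without loading it all into memory.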

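import csv
from time import mktime, strptime

# Python 3 version: read the tab-separated extract, write Gource's pipe-delimited log
with open('openurlgource.csv', newline='') as f, open('openurlgource.txt', 'w', newline='') as out:
	reader = csv.reader(f, delimiter='\t')
	writer = csv.writer(out, delimiter='|')
	headerline = next(reader)  # skip the header row
	for row in reader:
		# Only keep rows that actually name a referring service
		if row[4].strip() != '':
			# Join the date and time columns and convert to a unix timestamp (local time)
			t = int(mktime(strptime(row[0] + " " + row[1], "%Y-%m-%d %H:%M:%S")))
			# Institution ID plays the user; colons in the service ID become path separators
			writer.writerow([t, row[3], 'A', row[4].rstrip(':').replace(':', '/')])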

(Thanks to @quentinsf for the Python time handling crib:-)
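One thing to be aware of with that time handling: mktime interprets the parsed time in the local timezone of the machine running the script, so the same log can produce different timestamps on differently configured machines. The sketch below shows the local conversion alongside a UTC alternative using calendar.timegm. The function names are illustrative, and whether the log times are meant to be UTC or local is an assumption I can't confirm from the data.

```python
import calendar
from time import mktime, strptime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_unix_local(date_str, time_str):
    """Treat the log time as local time on this machine (what mktime does)."""
    return int(mktime(strptime(date_str + " " + time_str, TIME_FORMAT)))


def to_unix_utc(date_str, time_str):
    """Treat the log time as UTC, giving the same answer on any machine."""
    return calendar.timegm(strptime(date_str + " " + time_str, TIME_FORMAT))


print(to_unix_utc("2011-04-01", "00:00:04"))
# prints 1301616004
print(to_unix_local("2011-04-01", "00:00:04"))
# depends on the machine's timezone setting
```

For an animation the difference only shifts everything by a constant offset, so either is fine as long as it is applied consistently.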

This gives me log data of the required form:
1301612404|687369|A|www.isinet.com/WoK/UA
1301612413|305037|A|www.isinet.com/WoK/WOS
1301612414|117143|A|OVID/Ovid MEDLINE(R)
1301612436|822542|A|mendeley.com/mendeley
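Before handing a large file to Gource it can be worth sanity checking it. The sketch below checks that every line has four fields, a numeric timestamp and a valid type, and that timestamps never go backwards. The chronological-order check reflects my understanding that Gource expects the custom log sorted by time, which is worth verifying against the Gource documentation; the function name is made up.

```python
def check_gource_log(lines):
    """Return the number of entries, raising ValueError on the first bad line."""
    previous = None
    count = 0
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("|")
        if len(fields) != 4:
            raise ValueError("line %d: expected 4 fields, got %d" % (number, len(fields)))
        timestamp, user, action, path = fields
        if not timestamp.isdigit():
            raise ValueError("line %d: timestamp is not a number" % number)
        if action not in ("A", "M", "D"):
            raise ValueError("line %d: unknown type %r" % (number, action))
        if previous is not None and int(timestamp) < previous:
            raise ValueError("line %d: timestamp goes backwards" % number)
        previous = int(timestamp)
        count += 1
    return count


sample = [
    "1301612404|687369|A|www.isinet.com/WoK/UA",
    "1301612413|305037|A|www.isinet.com/WoK/WOS",
    "1301612414|117143|A|OVID/Ovid MEDLINE(R)",
    "1301612436|822542|A|mendeley.com/mendeley",
]
print(check_gource_log(sample))
# prints 4
```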

Running Gource uses commands of the form:

gource -s 1 --hide usernames --start-position 0.5 --stop-position 0.51 openurlgource.txt

The video was generated using ffmpeg with a piped command of the form:

gource -s 1 --hide usernames --start-position 0.5 --stop-position 0.51 -o - openurlgource.txt | ffmpeg -y -b 3000K -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 -vpre slow -threads 0 gource.mp4
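The same pipe can be driven from Python if it needs to be scripted for several log files. This is only a sketch: the flags are copied verbatim from the command above and may need adjusting for other ffmpeg versions, the function names are made up, and render assumes both gource and ffmpeg are on the PATH.

```python
import subprocess


def gource_args(log_path, start=0.5, stop=0.51):
    """Argument list for gource, writing PPM frames to stdout (-o -)."""
    return ["gource", "-s", "1", "--hide", "usernames",
            "--start-position", str(start), "--stop-position", str(stop),
            "-o", "-", log_path]


def ffmpeg_args(out_path):
    """Argument list for ffmpeg, reading PPM frames from stdin (-i -)."""
    return ["ffmpeg", "-y", "-b", "3000K", "-r", "60", "-f", "image2pipe",
            "-vcodec", "ppm", "-i", "-", "-vcodec", "libx264",
            "-vpre", "slow", "-threads", "0", out_path]


def render(log_path, out_path):
    """Pipe gource's frames into ffmpeg; returns both exit codes."""
    gource = subprocess.Popen(gource_args(log_path), stdout=subprocess.PIPE)
    ffmpeg = subprocess.Popen(ffmpeg_args(out_path), stdin=gource.stdout)
    # Close our copy of the pipe so gource notices if ffmpeg exits early.
    gource.stdout.close()
    ffmpeg_code = ffmpeg.wait()
    gource_code = gource.wait()
    return gource_code, ffmpeg_code


print(" ".join(gource_args("openurlgource.txt")))
```

Calling render("openurlgource.txt", "gource.mp4") would then reproduce the piped command.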

Note that I had to compile ffmpeg myself, which required hunting down a variety of libraries (e.g. LAME, the WebM encoder, and the x264 encoder library), compiling them as shared libraries (./configure --enable-shared) and then adding them into the build (in the end, on my MacBook Pro, I used ./configure --enable-libmp3lame --enable-shared --enable-libvpx --enable-libx264 --enable-gpl --disable-mmx --arch=x86_64 followed by the usual make and then sudo make install).

Getting ffmpeg and its dependencies configured and compiled was the main hurdle (I had an older version installed for transforming video between formats, as described in ffmpeg – Handy Hints, but needed the update), but now it’s in place, it’s yet another toy in the toybox that can do magical things when given data in the right format: gource:-)

Written by Tony Hirst

June 7, 2011 at 2:09 pm

Posted in Data, Visualisation


4 Responses


  1. [...] done a first demo of how to use Gource to visualise activity around the EDINA OpenURL data (Visualising OpenURL Referrals Using Gource), I thought I’d trying something a little more artistic, and use the colour features to try [...]

  2. >>If someone could clarify what the institutional IDs – column 4 of the log – actually refer to

    Tony – the institutionResolverID is an anonymised institutional identifier, i.e. the institutional resolver to which the request was routed. http://openurl.ac.uk/doc/data/whatare.html has more info on the fields. Sheila

    Sheila Fraser

    June 8, 2011 at 9:04 am

  3. [...] Large (ish) CSV Files, and Using Them as a Database from the Command Line: EDINA OpenURL Logs and Visualising OpenURL Referrals Using Gource), but I seem to have left it a bit let considering other work I need to get done this week… [...]

