Visualising Delicious Tag Communities Using Gephi

Years ago, I used the JavaScript InfoVis Toolkit to put together a handful of data visualisations around the idea of the “social life of a URL”: I looked up bookmarked URLs on delicious, then looked at who had bookmarked them and what tags they had used (delicious URL History – Hyperbolic Tree Visualisation, More Hyperbolic Tree Visualisations – delicious URL History: Users by Tag). Whilst playing with some Twitter hashtag network visualisations today, I wondered whether I could do something similar based around delicious bookmark tags, so here’s a first pass attempt…

As a matter of course, delicious publishes RSS and JSON feeds from tag pages, optionally containing up to 100 bookmarked entries. Each item in the response is a bookmarked URL, along with the name of the person who saved that particular bookmark and the tags they used.

That is, for a particular tag on delicious we can trivially get hold of the 100 most recent bookmarks saved with that tag and data on:

– who bookmarked it;
– what tags they used.
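To make the shape of that data concrete, here is a minimal sketch that turns feed items into the distinct user-tag pairs that become the edges of the graph. It assumes each item is a dict using the same short keys the script below reads ('u' for the URL, 'a' for the user, 't' for the list of tags, 'd' for the title); the two sample items are made up for illustration.

```python
# Hypothetical sample items, shaped like the delicious JSON feed entries
# read by the script (keys: 'u' = URL, 'a' = user, 't' = tags, 'd' = title).
sample_items = [
    {"u": "http://example.com/a", "a": "alice", "t": ["lak11", "analytics"], "d": "A"},
    {"u": "http://example.com/b", "a": "bob", "t": ["lak11", "elearning"], "d": "B"},
]

def user_tag_edges(items):
    """Return the distinct (user, tag) pairs, in first-seen order."""
    edges = []
    seen = set()
    for item in items:
        for tag in item["t"]:
            pair = (item["a"], tag)
            if pair not in seen:
                seen.add(pair)
                edges.append(pair)
    return edges

print(user_tag_edges(sample_items))
# [('alice', 'lak11'), ('alice', 'analytics'), ('bob', 'lak11'), ('bob', 'elearning')]
```

Note that the URL itself is dropped at this step, which is why the graph only ever links people to tags.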

Here’s a little script in Python to grab the user and tag data for each lak11 bookmark and generate a Gephi gdf file to represent the bipartite graph that associates users with the tags they have used:

import json
from urllib.parse import quote
from urllib.request import urlopen

def getDeliciousTagURL(tag, typ='json', num=100):
  # Build the delicious v2 feed URL for a tag (a single page of up to num items)
  # TO DO: add a pager to get data when there is more than one page of results
  return "http://feeds.delicious.com/v2/" + typ + "/tag/" + quote(tag) + "?count=" + str(num)

def getDeliciousTaggedURLDetailsFull(tag):
  durl = getDeliciousTagURL(tag)
  data = json.load(urlopen(durl))
  userTags = {}  # user -> list of distinct tags that user applied
  uniqTags = []  # every distinct tag seen in the feed
  for i in data:
    # Each item also carries the bookmarked URL (i['u']) and title (i['d']),
    # but only the user (i['a']) and tags (i['t']) are needed for this graph
    user = i['a']
    tags = i['t']
    userTags.setdefault(user, [])
    for t in tags:
      if t not in uniqTags:
        uniqTags.append(t)
      if t not in userTags[user]:
        userTags[user].append(t)

  # Write a Gephi GDF file: node definitions first, then user-tag edges
  with open('bookmarks-delicious_' + tag + '.gdf', 'w') as f:
    f.write('nodedef> name VARCHAR,label VARCHAR,type VARCHAR\n')
    for user in userTags:
      f.write(user + ',' + user + ',user\n')
    for t in uniqTags:
      f.write(t + ',' + t + ',tag\n')

    f.write('edgedef> user VARCHAR,tag VARCHAR\n')
    for user in userTags:
      for t in userTags[user]:
        f.write(user + ',' + t + '\n')

tag = 'lak11'
getDeliciousTaggedURLDetailsFull(tag)

[Note to self: this script needs updating to grab additional results pages?]
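One way to act on that note, sketched under assumptions: the paging mechanism for the delicious feeds isn't covered here, so `fetch_page` below is a hypothetical stand-in for whatever call returns one page of feed items. The loop simply accumulates items until a short or empty page comes back, with a cap on the number of pages requested.

```python
def fetch_all(fetch_page, tag, page_size=100, max_pages=10):
    """Collect items across pages using a caller-supplied fetch_page(tag, page).

    fetch_page is hypothetical: it should return a list of up to page_size
    items for the given zero-based page number.
    """
    items = []
    for page in range(max_pages):
        batch = fetch_page(tag, page)
        items.extend(batch)
        if len(batch) < page_size:
            # A short (or empty) page means there is nothing more to fetch
            break
    return items

# Fake pager for illustration only: 250 items served 100 at a time
fake_data = [{"a": "user%d" % n, "t": ["lak11"]} for n in range(250)]

def fake_fetch(tag, page, size=100):
    return fake_data[page * size:(page + 1) * size]

print(len(fetch_all(fake_fetch, "lak11")))  # 250
```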

Here’s an example of the output, in this case using the tag for Jim Groom’s Digital Storytelling course: ds106. The nodes are coloured according to whether they represent a user or a tag and sized according to degree; the layout is based on Gephi’s Force Atlas layout, with a few tweaks so that the labels can be seen clearly.

Note that the actual URLs that are bookmarked are not represented in any way in this visualisation. The network shows the connections between users and the tags they have used, irrespective of whether the tags were applied to the same or different URLs. Even if two users share common tags, they may not share any common bookmarks…
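That distinction is easy to check if the URLs are kept. A minimal sketch with made-up bookmark records of the form (user, URL, tags): the two users share both of one user’s tags but have no bookmarked URL in common.

```python
# Hypothetical bookmark records: (user, url, tags)
bookmarks = [
    ("alice", "http://example.com/1", ["lak11", "analytics"]),
    ("bob", "http://example.com/2", ["lak11", "analytics"]),
    ("bob", "http://example.com/3", ["elearning"]),
]

def tags_and_urls(records, user):
    """Return (set of tags, set of URLs) across one user's bookmarks."""
    tags, urls = set(), set()
    for who, url, tag_list in records:
        if who == user:
            tags.update(tag_list)
            urls.add(url)
    return tags, urls

a_tags, a_urls = tags_and_urls(bookmarks, "alice")
b_tags, b_urls = tags_and_urls(bookmarks, "bob")
print(sorted(a_tags & b_tags))  # ['analytics', 'lak11']  (tags in common)
print(sorted(a_urls & b_urls))  # []  (bookmarks in common)
```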

Here’s another example, this time using the lak11 tag:

Looking at these networks, a couple of things struck me:

– the most commonly used tags might express a category or concept that describes the original tag used to source the data;

– folk sharing similar tags may share similar interests.
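The first observation can be probed by counting which tags turn up on the same bookmarks as the tag used to source the data. A minimal sketch over per-bookmark tag lists (the sample lists are hypothetical), using `collections.Counter`; note this is a per-bookmark count, a slightly finer measure than the per-user tag degree in the Gephi graph.

```python
from collections import Counter

def cooccurring_tags(tag_lists, source_tag):
    """Count how often each other tag appears on bookmarks carrying source_tag."""
    counts = Counter()
    for tags in tag_lists:
        distinct = set(tags)
        if source_tag in distinct:
            distinct.discard(source_tag)
            counts.update(distinct)
    return counts

# Hypothetical per-bookmark tag lists
tag_lists = [
    ["lak11", "analytics", "learning"],
    ["lak11", "analytics"],
    ["lak11", "mooc"],
    ["analytics", "bigdata"],  # no lak11, so ignored
]
print(cooccurring_tags(tag_lists, "lak11").most_common(1))  # [('analytics', 2)]
```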

Here’s a view over part of the LAK11 network with the LAK11 tag filtered out, and the Gephi ego filter applied with depth two to a particular user, in this case delicious user rosemary20:

The filtered view shows us:

– the tags a particular user (in this case, rosemary20) has used;

– the people who have used the same tags as rosemary20; note that this does not necessarily mean that they bookmarked any of the same URLs, nor that they are using the tags to mean the same thing*…

(* delicious does allow users to provide a description of a tag, though I’m not sure if this information is generally available via a public API?)

By sizing the nodes according to degree in this subnetwork, we can readily identify the tags most commonly used alongside the tag used to source the data, and also the users who have used the largest number of the same tags as rosemary20.
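The ego filter and degree sizing can be mimicked outside Gephi. A minimal sketch, assuming the graph is held as a user -> tags dict like the script’s userTags (the sample data and the name "ego_user" are hypothetical): drop the source tag, take the ego user’s tags (depth one) and every user of those tags (depth two), then count each node’s degree within that subnetwork.

```python
from collections import Counter

def ego_degrees(user_tags, ego, drop_tag=None):
    """Node degrees in the depth-2 ego network of `ego` in a user-tag graph.

    Depth one is the ego user's own tags (minus drop_tag); depth two is every
    user who applied at least one of those tags. Only edges between those
    users and those tags are counted, as in an induced subnetwork.
    """
    ego_tags = set(user_tags[ego]) - {drop_tag}
    user_degree, tag_degree = Counter(), Counter()
    for user, tags in user_tags.items():
        for tag in ego_tags & set(tags):
            user_degree[user] += 1  # how many of the ego's tags this user shares
            tag_degree[tag] += 1    # how many users in the subnetwork use this tag
    return user_degree, tag_degree

# Hypothetical data in the same user -> tags shape as the script's userTags
user_tags = {
    "ego_user": ["lak11", "analytics", "learning", "mooc"],
    "u1": ["lak11", "analytics", "learning"],
    "u2": ["lak11", "mooc"],
    "u3": ["lak11", "bigdata"],  # shares only the filtered-out tag
}
user_deg, tag_deg = ego_degrees(user_tags, "ego_user", drop_tag="lak11")
print(user_deg.most_common())  # [('ego_user', 3), ('u1', 2), ('u2', 1)]
```

As in the filtered Gephi view, u3 drops out entirely once the source tag is removed, even though it was connected to everyone else through that tag.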

PS It struck me that a single-page web app should be quite easy to put together to achieve something similar to the above visualisations. The JSON feed from delicious is easy enough to pull into any page, and the Protovis library has a force-directed layout that works on a simple graph representation not totally dissimilar to the Gephi/GDF format.
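As a rough sketch of the data side of such an app: this assumes the nodes/links shape used in the Protovis force-directed examples, where each node carries a `nodeName` and a `group` and each link refers to nodes by their index in the `nodes` array (check the Protovis documentation before relying on the exact field names). It is written in Python for consistency with the script above and converts a user -> tags dict like userTags.

```python
import json

def to_force_json(user_tags):
    """Convert a user -> tags dict into nodes/links with index-based links."""
    nodes, index = [], {}

    def node_id(kind, name):
        # Key on (kind, name) so a user and a tag with the same name stay distinct
        key = (kind, name)
        if key not in index:
            index[key] = len(nodes)
            nodes.append({"nodeName": name, "group": 0 if kind == "user" else 1})
        return index[key]

    links = []
    for user, tags in user_tags.items():
        u = node_id("user", user)
        for tag in tags:
            links.append({"source": u, "target": node_id("tag", tag), "value": 1})
    return {"nodes": nodes, "links": links}

graph = to_force_json({"alice": ["lak11", "analytics"], "bob": ["lak11"]})
print(json.dumps(graph))
```

Keeping the group number on each node is what would let the page colour users and tags differently, as in the Gephi views.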

If I get an hour I’ll try to have a play to put a demo together. If you beat me to it, please post a link to your demo (or even fully blown app!) in the comments:-)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

7 thoughts on “Visualising Delicious Tag Communities Using Gephi”

  1. I like the graphs you created – I imported the gdf into Gephi, but didn’t get far. Could you give me a hint as to how to get from there to the graphs, please?

    1. @Tony Thanks for your answer. As I wrote, I got the data by manual work (I can explain in a private conversation). This won’t work at a bigger scale.

      I read your answer, but don’t understand all of it. Do you have tips for some more basic literature or other sources that could be useful?

      As you see, the next step would be to “plot thread connections/co-occurring users across threads”, but here too I know too little about the technology (yet).

      Yes, there can be a problem that it’s not a “large enough sample of users in a group”, but I’m actually not interested in the #Lak11 group, rather in a group used in an upper secondary biology class: http://groups.diigo.com/group/biologia.
      I think it’s a large enough sample of users in this group – what do you think?

  2. Hi Tony!

    I like your graphs and would like to do the same in diigo, especially to visualize the tag community in groups such as #lak11. Do you know if that is possible? I also wonder whether it’s possible to get the names of the websites and the names of the users in the same visualization. I’ve done a first test manually (link below); I hope it will help you understand what I would like to do.

    I’m not sure that it is useful, but I think it is an interesting start.

    (http://learning-research.blogspot.com/2011/01/looking-forward-to-learning-analytics.html)

    1. @niklas – I had a quick look at diigo: my first thought was to grab an RSS feed out of a group and use that to hook into the users and tags used in that group; the feed can also be used to plot tag co-occurrence edges. The next step would be to see what the diigo API makes possible in terms of user names and tags. I think you can pull down links by tag and/or user from the API? A big issue in doing something sensible in a group context is getting a large enough sample of users in a group (the default feed length seems quite limited in terms of the number of links it contains?).
      I notice groups also have comment threads associated with resources. This suggests you could start to plot thread connections/co-occurring users across threads?
