More Pivots Around Twitter Data (little-l, little-d, again;-)

I’ve been having a play with Twitter again, looking at how we can do the linked thing without RDF, both within a Twitter context and also (heuristically) outside it.

First up, hashtag discovery from Twitter lists. Twitter lists can be used to collect together folk who have a particular interest, or be generated from lists of people who have used a particular hashtag (as Martin H does with his recipe for Populating a Twitter List via Google Spreadsheet … Automatically! [Hashtag Communities]).

The thinking is simple: grab the most recent n tweets from the list, extract the hashtags, and count them, displaying them in descending order. This gives us a quick view of the most popular hashtags recently tweeted by folk on the list: Popular recent tags from folk on a twitter list

This is only a start, of course: it might be that a single person has been heavily tweeting the same hashtag, so a sensible next step would be to also take into account the number of people using each hashtag in ranking the tags. It might also be useful to display the names of folk on the list who have used the hashtag?
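The counting step (and the suggested refinement of ranking tags by how many different people used them) can be sketched as follows. The tweet structure here – a list of dicts with 'user' and 'text' keys – is just an illustrative assumption, not the Twitter API format:

```python
import re
from collections import Counter, defaultdict

def rank_hashtags(tweets):
    # tweets: list of {'user': ..., 'text': ...} dicts (assumed structure)
    tag_counts = Counter()        # total occurrences of each hashtag
    tag_users = defaultdict(set)  # distinct users of each hashtag
    for tweet in tweets:
        for tag in re.findall(r'#(\w+)', tweet['text'].lower()):
            tag_counts[tag] += 1
            tag_users[tag].add(tweet['user'])
    # rank by number of distinct users first, then by raw count, so a tag
    # heavily tweeted by a single person doesn't dominate the list
    return sorted(tag_counts,
                  key=lambda t: (len(tag_users[t]), tag_counts[t]),
                  reverse=True)
```

The `tag_users` sets also give us the names of folk on the list who used each tag, if we want to display those too.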

I also updated a previous toy app that makes recommendations of who to follow on twitter based on a mismatch between the people you follow (and who follow you) and the people following and followed by another person – follower recommender (of a sort!):

The second doodle was inspired by discussions at Dev8D relating to a possible “UK HE Developers’ network”, and relies on an assumption – that the usernames people use on Twitter might be used by the same people on Github. Again, the idea is simple: can we grab a list of Twitter usernames for people that have used the dev8d hashtag (that much is easy), then look those names up on Github, pulling down the followers and following lists from Github for any IDs that are recognised, in order to identify a possible community of developers on Github from the seed list of dev8d-hashtagging Twitter names? (It also occurs to me that we can pull down the projects Git-folk are associated with (in order to identify projects Dev8D folk are committing to) and the developers who also commit or subscribe to those projects.)
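A minimal sketch of the username-matching step, assuming the current GitHub users API endpoint (which may differ from the API that was available at the time); the fetch function is injectable so the matching logic can be tried without network access:

```python
import urllib.request, urllib.error

def github_user_exists(name, fetch=None):
    # Look up a username via the GitHub users API (assumed endpoint:
    # https://api.github.com/users/{name}); followers/following lists
    # could similarly be pulled from .../users/{name}/followers etc.
    url = 'https://api.github.com/users/' + name
    if fetch is None:
        def fetch(u):
            # default fetcher: a 404 means the username is unknown on Github
            try:
                return urllib.request.urlopen(u).getcode() == 200
            except urllib.error.HTTPError:
                return False
    return fetch(url)

def match_twitter_names_on_github(twitter_names, fetch=None):
    # keep only the Twitter usernames that also resolve on Github
    return [n for n in twitter_names if github_user_exists(n, fetch)]
```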

follower/following connections on github using twitter usernames that tweeted dev8d hashtag

As the above network shows, it looks like we get some matches on usernames…:-)

Visualising Ad Hoc Tweeted Link Communities, via BackType

So you’ve tweeted a link as part of your social media/event amplification strategy, and it’s job done, right? Or is there maybe some way you can learn something about who else found that interesting?

Notwithstanding the appearance of yet another patent of the bleedin’ obvious, here’s one way I’ve been experimenting with for tracking informal, ad hoc communities around a link. (In part this harks back to some of my previous “social life of a URL” doodles such as delicious URL History – Hyperbolic Tree Visualisation, More Hyperbolic Tree Visualisations – delicious URL History: Users by Tag.)

In part inspired by a comment by Chris Jobling on one of my flickr Twitter network images, here’s a recipe for identifying a core community that may be interested in a retweeted link:

– given the URL, look up who’s tweeted it via the BackType API;
– for each tweeter of the link, grab the list of people they follow (i.e. their friends);
– plot the “inner” network showing which of the people who tweeted the link follow each other.

To explore the possible reach of the tweeted link, grab the followers of each person who tweeted the link and plot that network. This is likely to be quite a large network, so you may want to prune it a little, for example by filtering out everyone with node degree less than two.
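The “inner network” step reduces to simple set logic once the list of tweeters (e.g. from BackType) and each tweeter’s friends list have been fetched. A sketch, assuming both are already available as plain Python data structures:

```python
def inner_network(tweeters, friends_of):
    # tweeters: set of screen names that tweeted the link
    # friends_of: screen name -> set of accounts that person follows
    # returns the directed follow edges amongst the tweeters only
    edges = []
    for person in tweeters:
        for friend in friends_of.get(person, set()):
            if friend in tweeters:
                edges.append((person, friend))
    return edges
```

The wider “reach” network is the same loop without the `if friend in tweeters` test, run over followers rather than friends.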

So for example, earlier today I spotted a tweet about an OU philosophy game (To Lie or Not to Lie?), which I also retweeted. Here’s what the “inner” retweet graph looks like at the moment:

The node size is related to degree, the colour to total follower count. The graph can be used to identify a core network of folk who may be willing to promote OU activities (maybe…?!;-)

The next image shows the retweeters and their followers, filtered to show followers with degree 2 or more (the “double hit” audience). [Actually, I should filter on ((out-degree > 0) and (degree > 1 and in-degree > 1)).]
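A sketch of the bracketed filter above, computing degrees from a raw (follower, followed) edge list; this is just illustrative Python, not the Gephi filter itself:

```python
from collections import Counter

def filter_double_hit(edges):
    # edges: (follower, followed) pairs from the retweeters+followers graph
    out_deg = Counter(src for src, dst in edges)
    in_deg = Counter(dst for src, dst in edges)
    keep = set()
    for n in set(out_deg) | set(in_deg):
        degree = in_deg[n] + out_deg[n]
        # the filter suggested above:
        # (out-degree > 0) and (degree > 1 and in-degree > 1)
        if out_deg[n] > 0 and degree > 1 and in_deg[n] > 1:
            keep.add(n)
    return keep
```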

The nodes are partitioned into clusters using the Gephi modularity statistic and coloured accordingly. Node size is related to total follower count. Layout is done using an expanded Yifan Hu layout.

In a follow-on post, I’ll show how we can generate network maps for people on delicious who either bookmarked a particular URL, or follow someone who did…

Dominant Tags in My Delicious Network

Following on from Social Networks on Delicious, here’s a view over my delicious network (that is, the folk I “follow” on delicious) and the dominant tags they use:

The image is created from a source file generated by:

1) grabbing the list of folk in my delicious network;
2) grabbing the tags each of them uses;
3) generating a bipartite network specification graph containing user and tag nodes, with weighted links corresponding to the number of times a user has used a particular tag (i.e. the number of bookmarks they have saved using that tag).

Because the original graph is large and sparse (many users define lots of tags but use most of them only rarely), I filtered the output view to show only those tags that have been used 150 or more times by at least one user, based on the weight of each edge (remember, the edge weight describes the number of times a user has used a particular tag). (So if every user had used the same tag no more than 149 times each, it wouldn’t be displayed.) The tag nodes are sized according to the number of users who have used the tag 150 or more times.
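The thresholding can be sketched like this, assuming the weighted user-tag edges are available as (user, tag, weight) triples:

```python
def heavy_tag_sizes(edges, threshold=150):
    # edges: (user, tag, weight) triples, weight = times the user used the tag
    # a tag survives if at least one user used it `threshold` or more times;
    # its size is the number of such heavy users
    sizes = {}
    for user, tag, weight in edges:
        if weight >= threshold:
            sizes[tag] = sizes.get(tag, 0) + 1
    return sizes
```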

I also had a go at colouring the nodes to identify tags used heavily by a single user, compared to tags heavily used by several members of my network.

Here’s the Python code:

import urllib, simplejson

def getDeliciousUserNetwork(user,network):
  #delicious JSON feed listing the members of a user's network
  #(feed URL assumed from the delicious v2 feeds API)
  url = 'http://feeds.delicious.com/v2/json/networkmembers/' + user
  data = simplejson.load(urllib.urlopen(url))
  for u in data:
    #time also available: u['dt']
    network.append(u['user'])
  #print network
  return network

def getDeliciousTagsByUser(user):
  #delicious JSON feed of a user's tags and usage counts
  url = 'http://feeds.delicious.com/v2/json/tags/' + user
  data = simplejson.load(urllib.urlopen(url))
  tags = {}
  for tag in data:
    tags[tag] = int(data[tag])
  return tags

def printDeliciousTagsByNetwork(user,minVal=2):
  #openTimestampedFile is a simple helper (not shown) that opens a dated file for writing
  f=openTimestampedFile('delicious-socialNetwork','network-tags-' + user+'.gdf')
  network = getDeliciousUserNetwork(user, [])
  f.write('nodedef> name VARCHAR\n')
  for user in network:
    f.write(user + '\n')
  f.write('edgedef> user1 VARCHAR,user2 VARCHAR,weight DOUBLE\n')
  for user in network:
    tags = getDeliciousTagsByUser(user)
    for tag in tags:
      if tags[tag]>=minVal:
         f.write(user+',"'+tag.encode('ascii','ignore') + '",'+str(tags[tag])+'\n')
  f.close()

Looking at the network, it’s possible to see which members of my network are heavy users of a particular tag, and furthermore, which tags are heavily used by more than one member of my network. The question now is: to what extent might this information help me identify whether or not I am following people who are likely to turn up resources in my interest area, by virtue of the tags used by the members of my network?

Picking up on the previous post on Social Networks on Delicious, might it be worth looking at the tags used heavily by my followers to see what subject areas they are interested in, and potentially the topic area(s) in which they see me as acting as a resource investigator?

Social Networks on Delicious

One of the many things that the delicious social networking site appears to have got wrong is how to gain traction from its social network. As well as the incidental social network that arises from two or more different users using the same tag or bookmarking the same resource (for example, Visualising Delicious Tag Communities Using Gephi), there is also an explicit social network constructed using an asymmetric model similar to that used by Twitter: specifically, you can follow me (become a “fan” of me) without my permission, and I can add you to my network (become a fan of you, again without your permission).

Realising that you are part of a social network on delicious is not really that obvious though, nor is the extent to which it is a network. So I thought I’d have a look at the structure of the social network that I can crystallise out around my delicious account, by:

1) grabbing the list of my “fans” on delicious;
2) grabbing the list of the fans of my fans on delicious and then plotting:
2a) connections between my fans and their fans who are also my fans;
2b) all the fans of my fans.

(Writing “fans” feels a lot more ego-bollox than writing “followers”; is that maybe one of the nails in the delicious social SNAFU coffin?!)

Here’s the way my “fans” on delicious follow each other (maybe? I’m not sure if the fans call always grabs all the fans, or whether it pages the results?):

The network is plotted using Gephi, of course; nodes are coloured according to modularity clusters, and the layout is derived from a Force Atlas layout.

Here’s the wider network – that is, showing fans of my fans:

In this case, nodes are sized according to betweenness centrality and coloured according to in-degree (that is, the number of my fans who have these people as fans). [This works in so far as we’re trying to identify reputation networks. If we’re looking for reach in terms of using folk as a resource discovery network, it would probably make more sense to look at the members of my network, and the networks of those folk…]

If you want to try to generate your own, here’s the code:

import urllib, simplejson

def getDeliciousUserFans(user,fans):
  #delicious JSON feed listing a user's fans
  #(feed URL assumed from the delicious v2 feeds API)
  url = 'http://feeds.delicious.com/v2/json/networkfans/' + user
  #needs paging? or does this grab all the fans?
  data = simplejson.load(urllib.urlopen(url))
  for u in data:
    #time also available: u['dt']
    fans.append(u['user'])
  #print fans
  return fans

def getDeliciousFanNetwork(user):
  #openTimestampedFile is a simple helper (not shown) that opens a dated file for writing
  f=openTimestampedFile('delicious-socialNetwork','fans-' + user + '.gdf')
  f2=openTimestampedFile('delicious-socialNetwork','fansOfFans-' + user + '.gdf')
  f.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
  f2.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
  fans = getDeliciousUserFans(user, [])
  for fan in fans:
    print "Fetching data for fan "+fan
    fans2 = getDeliciousUserFans(fan, [])
    for fan2 in fans2:
      f2.write(fan2 + ',' + fan + '\n')
      if fan2 in fans:
        f.write(fan2 + ',' + fan + '\n')
  f.close()
  f2.close()
So what’s the next step…?!

Visualising Delicious Tag Communities Using Gephi

Years ago, I used the Javascript Infovis Toolkit to put together a handful of data visualisations around the idea of the “social life of a URL” by looking up bookmarked URLs on delicious and then seeing who had bookmarked them and using what tags (delicious URL History – Hyperbolic Tree Visualisation, More Hyperbolic Tree Visualisations – delicious URL History: Users by Tag). Whilst playing with some Twitter hashtag network visualisations today, I wondered whether I could do something similar based around delicious bookmark tags, so here’s a first pass attempt…

As a matter of course, delicious publishes RSS and JSON feeds from tag pages, optionally containing up to 100 bookmarked entries. Each item in the response is a bookmarked URL, along with details of the single individual person who saved that particular bookmark and the tags they used.

That is, for a particular tag on delicious we can trivially get hold of the 100 most recent bookmarks saved with that tag and data on:

– who bookmarked it;
– what tags they used.

Here’s a little script in Python to grab the user and tag data for each lak11 bookmark and generate a Gephi gdf file to represent the bipartite graph that associates users with the tags they have used:

import simplejson
import urllib

def getDeliciousTagURL(tag,typ='json', num=100):
  #need to add a pager to get data when more than 1 page
  #(feed URL assumed from the delicious v2 feeds API)
  return "http://feeds.delicious.com/v2/" + typ + "/tag/" + tag + "?count=" + str(num)

def getDeliciousTaggedURLDetailsFull(tag):
  durl = getDeliciousTagURL(tag)
  data = simplejson.load(urllib.urlopen(durl))
  userTags = {}
  uniqTags = []
  for i in data:
    url = i['u'] #the bookmarked URL (not used in the graph)
    user = i['a']
    tags = i['t']
    if user not in userTags:
      userTags[user] = []
    for t in tags:
      if t not in uniqTags:
        uniqTags.append(t)
      if t not in userTags[user]:
        userTags[user].append(t)
  #openTimestampedFile is a simple helper (not shown) that opens a dated file for writing
  f = openTimestampedFile('delicious-tagCommunity', tag + '.gdf')
  f.write('nodedef> name VARCHAR,label VARCHAR, type VARCHAR\n')
  for user in userTags:
    f.write(user + ',' + user + ',user\n')
  for t in uniqTags:
    tn = t.encode('ascii','ignore')
    f.write('"' + tn + '","' + tn + '",tag\n')

  f.write('edgedef> user VARCHAR,tag VARCHAR\n')
  for user in userTags:
    for t in userTags[user]:
      f.write(user + ',"' + t.encode('ascii','ignore') + '"\n')
  f.close()


[Note to self: this script needs updating to grab additional results pages?]

Here’s an example of the output, in this case using the tag for Jim Groom’s Digital Storytelling course: ds106. The nodes are coloured according to whether they represent a user or a tag, and sized according to degree, and the layout is based on a force atlas layout with a few tweaks to allow us to see labels clearly.

Note that the actual URLs that are bookmarked are not represented in any way in this visualisation. The network shows the connections between users and the tags they have used, irrespective of whether the tags were applied to the same or different URLs. Even if two users share common tags, they may not share any common bookmarks…

Here’s another example, this time using the lak11 tag:

Looking at these networks, a couple of things struck me:

– the commonly used tags might express a category or conceptual tag that describes the original tag used to source the data;

– folk sharing similar tags may share similar interests.

Here’s a view over part of the LAK11 network with the LAK11 tag filtered out, and the Gephi ego filter applied with depth two to a particular user, in this case delicious user rosemary20:

The filtered view shows us:

– the tags a particular user (in this case, rosemary20) has used;

– the people who have used the same tags as rosemary20; note that this does not necessarily mean that they bookmarked any of the same URLs, nor that they are using the tags to mean the same thing*…

(* delicious does allow users to provide a description of a tag, though I’m not sure if this information is generally available via a public API?)

By sizing the nodes according to degree in this subnetwork, we can readily identify the tags commonly used alongside the tag used to source the data, and also the users who have used the largest number of identical tags.

PS it struck me that a single page web app should be quite easy to put together to achieve something similar to the above visualisations. The JSON feed from delicious is easy enough to pull in to any page, and the Protovis code library has a force directed layout package that works on a simple graph representation not totally dissimilar to the Gephi/GDF format.

If I get an hour I’ll try to have a play to put a demo together. If you beat me to it, please post a link to your demo (or even fully blown app!) in the comments:-)

Digging Deeper into the Structure of My Twitter Friends Network: Librarian Spotting

A couple of days ago, I grabbed the Twitter friends lists of all my Twitter friends (that is, lists of all the people that the people I follow on Twitter follow…) and plotted the connections between them filtered through the people I follow (Small World? A Snapshot of How My Twitter “Friends” Follow Each Other…). That is, for all of the people I follow on Twitter, I plotted the extent to which they follow each other… got that?

Running the resulting network through Gephi’s modularity statistic (some sort of clustering algorithm; I really need to find out which), several distinct clusters of people turned up: OU folk, data journalism folk, ed techies, JISC/Museums/library folk, and open gov data folk.

(Gephi allows you to export the graph file for the current project, including any annotations (such as modularity class) that are added by running Gephi’s statistics. Extracting the list of nodes (i.e. Twitter users) and filtering them by modularity class means we can create separate lists of individuals based on which cluster they appear in; which in turn means that we could generate a Twitter list from those individuals.)
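The node-extraction step might look something like this, assuming the Gephi data table has been exported as CSV with ‘Id’ and ‘Modularity Class’ columns (the exact column names depend on the Gephi version):

```python
import csv, io

def users_in_cluster(node_csv, cluster):
    # node_csv: CSV text exported from Gephi's data table; the 'Id' and
    # 'Modularity Class' column names are assumptions that may vary by
    # Gephi version
    reader = csv.DictReader(io.StringIO(node_csv))
    return [row['Id'] for row in reader
            if row['Modularity Class'] == str(cluster)]
```

The returned usernames could then be fed to the Twitter list API to populate a list automatically.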

From my “curated” list of Twitter friends, we can identify a set of “OU twitterers” through a cluster analysis of the mass action of their own friending behaviour, and I could use this to automatically generate a Twitter list of (potential) OU Twitterers that other people can follow.

Here’s the total set of my friends, coloured by modularity class and sized by in-degree (that is, the number of my friends who follow that person).

My Twitter friends, coloured by modularity class

If we filter on modularity class, we can just look at the folk in what I have labelled “OU Twitterers”. There are one or two folk in there who don’t quite fit this label (e.g. University of Leicester folk, and a handful of otherwise “disconnected” folk…), but it’s not bad.

OU Twitterers

Note that if I grab the complete friends and followers lists of these individuals, and look for users who are commonly followed, who also tend to follow back, and who don’t have huge numbers of followers (i.e. they aren’t celebrities who automatically follow back…), I may discover other OU Twitterers that I don’t follow…
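That heuristic might be sketched as follows; the thresholds (min_common, max_followers) are arbitrary illustrative values, and the friends/follower data is assumed to have been fetched already:

```python
from collections import Counter

def candidate_community_members(seed_users, friends_of, follower_count,
                                min_common=3, max_followers=5000):
    # friends_of: account -> set of accounts it follows (assumed pre-fetched)
    # follower_count: account -> total follower count
    commonly_followed = Counter()
    for u in seed_users:
        for friend in friends_of.get(u, set()):
            commonly_followed[friend] += 1
    return [a for a, n in commonly_followed.items()
            if n >= min_common                              # commonly followed
            and a not in seed_users
            and friends_of.get(a, set()) & set(seed_users)  # follows back
            and follower_count.get(a, 0) <= max_followers]  # not a celebrity
```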

If we run the modularity stat over this group of people, the “OU Twitterers” (most easily done by generating a new workspace from the filtered group), we see three more partitions fall out. Broadly, this first one corresponds to OU Library folk (ish…):

OU Library folk...

Twitterers from my faculty (several of whom rarely, if ever, tweet):

Twitterers I follow in my faculty

And the rest (the vast majority, in fact):

OU folk

(Note that a couple of folk are completely disconnected, and have nothing to do with the OU…)

Running the modularity stat over this larger group turns up nothing of interest.

So… so what? So this. Firstly, I can mine the friends lists of the friends of arbitrary people on Twitter and pull out clusters that may tell me something about the interests of those people. (For example, we might grab their twitter biography statements and run them through a word cloud as a first approximation; or grab their recent tweets and do some text mining to see if there is any common interest. Hashtag analysis might also be revealing…) Secondly, we could use the members of a cluster as a first approximation for a list of connected members of a community interested in a particular topic area; for these community members we could then pull down lists of all their friends and followers and look to see if we can grow the list through other commonly connected individuals.
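The word-cloud first approximation over biography strings is really just a frequency count; a minimal sketch (the stopword list is an arbitrary placeholder):

```python
import re
from collections import Counter

def bio_word_counts(bios, stopwords=None):
    # bios: list of Twitter biography strings; a crude first approximation
    # to a word cloud is a stopword-filtered word frequency count
    stopwords = stopwords or {'the', 'and', 'of', 'at', 'a', 'in'}
    counts = Counter()
    for bio in bios:
        for word in re.findall(r"[a-z']+", bio.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts
```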

PS after tweeting the original post, a couple of people asked if I could grab the data from their friends lists. For example, @neilkod’s turned up clusters relating to “Utah tweeps, my cycling ones, and of course data/#rstats.” So the approach appears to work in general…:-)