Fishing for OU Twitter Folk…

Just a quick observation inspired by the online “focus group” on Twitter yesterday around the #twitterou hashtag (a discussion for OU folk about Twitter usage): a few minutes into the discussion, I grabbed a list of the folk who had used the tag so far (about 10 or so people at the time), pulled down a list of the people they followed to construct a graph of hashtaggers->friends, and then filtered the resulting graph to show folk with a node degree of 5 or more.

twitterOU - folk followed by 5 or more folk using twitterou before 2.10 or so today

Because a large number of OU Twitter folk follow each other, the graph is quite dense, which means that if we take a sample of known OU users and look for people that a majority of that sample follow, we stand a reasonable chance of identifying other OU folk…
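The counting-and-filtering step behind that approach can be sketched in a few lines of Python. The data here is a toy stand-in for the real hashtagger->friends lists (which were pulled from the Twitter API), and the threshold is scaled down to suit:

```python
from collections import Counter

# Toy stand-in for the real data: each hashtag user mapped to the
# accounts they follow.
friends_of = {
    "userA": ["x", "y", "z"],
    "userB": ["x", "y"],
    "userC": ["x", "w"],
}

# Count how many of the hashtaggers follow each account...
counts = Counter(f for friends in friends_of.values() for f in friends)

# ...and keep the accounts followed by at least `threshold` of them
# (the post used a threshold of 5; scaled down here for the toy sample).
threshold = 2
candidates = [acct for acct, n in counts.items() if n >= threshold]
```

The `candidates` list then contains the accounts that a majority of the known hashtaggers follow – likely members of the same community.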

Doing a bit of List Intelligence (looking up the lists that a significant number of hashtag users were on), I identified several OU folk Twitter lists, most notably @liamgh/planetou and @guyweb/openuniversity.

Just for completeness, it’s also worth pointing out that simple community analysis of followers of a known OU person might also turn up OU clusters, e.g. as described in Digging Deeper into the Structure of My Twitter Friends Network: Librarian Spotting. I suspect if we did clique analysis on the followers, this might also identify ‘core’ members of organisational communities that could be used to seed a snowball discovery mechanism for more members of that organisation.

PS hmmm… maybe I need to do a post or two on how we might go about discovering enterprise/organisation networks/communities on Twitter…?

Charting the Social Landscape – Who’s Notable Amongst Followers of UK HE Twitter Accounts?

Over the last week or two, I’ve been playing around with a few ideas relating to where Twitter accounts are located in the social landscape. There are several components to this: who does a particular Twitter account follow, and who follows it; do the friends, or followers cluster in any ways that we can easily and automatically identify (for example, by term analysis applied to the biographies of folk in an individual cluster); who’s notable amongst the friends or followers of an individual that aren’t also a friend or follower of the individual, and so on…

Just to place a stepping stone in my thinking so far, here’s a handful of examples, showing who’s notable amongst the followers of a couple of official HE Twitter accounts but who doesn’t follow the corresponding followed_by account.

Firstly, here’s a snapshot of who followers of @OU_Community follow in significant numbers:

Positioning @ou_community

Hmmm – seems the audience are into their satire… Should the OU be making some humorous videos to tap into that interest?

Here’s who a random sample (I think!) of 250 of @UCLnews’ followers follow at the 4% or more level (that is, accounts that at least 0.04 * 250 = 10 of the sampled @UCLnews followers follow…):

positioning of @uclnews co-followed accounts

There seems to be quite a clustering of other university accounts being followed in there, but also “notable” figures and some evidence of a passing interest in serious affairs/commentators. That other UCL accounts are also being followed might be evidence that the @UCLnews account is being followed by current students?

How about the followers of @boltonuni? (Again, using a sample of 250 followers, though from a much smaller total follower population when compared to @UCLnews):

@boltonuni cofollowed

The dominance of other university accounts is noticeable here. A couple of possible reasons suggest themselves: that the sampled accounts skew towards other “professional” accounts from within the sector (or that otherwise follow it), or that the students and potential students have a less coherent (in the nicest possible sense of the word!) world view… Or maybe there are lots of potential students out there following several university Twitter accounts, trying to get a feel for what the universities are offering.

If we actually look at friend connections between the @boltonuni 250 follower sample, 40% or so are not connected to other followers (either because they are private accounts or because they don’t follow any of the other followers – as we might expect from potential students, for example?)

The connected followers split into two camps:

Tunnelling in on boltonuni follower sample

A gut reaction reading of these communities is that they represent sector and locale camps.

Finally, let’s take a look at 250 random followers of @buckssu (Buckinghamshire University student union); this time we get about 75% of followers in the giant connected component:

@buckssu follower sample

Again, we get a locale and ‘sector’ cluster. If we look at folk followed by 4% or more of the follower sample, we get this:

Folk followed by a sample of followers of @buckssu

My reading of this is that the student union accounts are pretty tightly connected (I’m guessing we’d find some quite sizeable SU account cliques), there’s a cluster of “other student society” type accounts top left, and then a bunch of celebs…

So what does this tell us? Who knows…?! I’m working on that…;-)

Circles vs Community Detection

One take on the story so far:

– Facebook supports symmetrical follows and allows you to see connections between your Facebook friends;
– Twitter supports asymmetric follows and allows you to see pretty much everyone’s friend and follower connections;
– Google+ supports asymmetric follows.

Facebook and Twitter both support lists but hardly anyone uses them. Google+ encourages you to put people into addressable circles (i.e. lists).

If you can grab a copy of connections between folk in your social network, you can run social network statistics that will partition out different social groupings:

My annotated twitter follower network

If you’re familiar with the interests of people in a particular cluster, you can label them (there are also ways you might try to do this automagically).
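One “automagic” labelling heuristic might be to pick the most frequent terms from the biographies of the folk in each cluster. Here’s a minimal sketch; the cluster data and stopword list are purely illustrative:

```python
from collections import Counter

# Illustrative data: a modularity class id mapped to member biographies.
clusters = {
    0: ["open source developer", "python developer and blogger"],
    1: ["history teacher", "teacher of geography"],
}

STOPWORDS = {"a", "an", "and", "of", "the"}

def cluster_labels(clusters, top=2):
    # Label each cluster with its most frequent non-stopword bio terms.
    labels = {}
    for cid, bios in clusters.items():
        words = Counter(
            w for bio in bios for w in bio.lower().split() if w not in STOPWORDS
        )
        labels[cid] = [w for w, _ in words.most_common(top)]
    return labels

labels = cluster_labels(clusters)
```

In this toy case the clusters come out labelled with terms like “developer” and “teacher” – crude, but often enough to jog your memory about what a cluster represents.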

Now a Facebook app, Super Friends, will help you identify – and label – clusters in your Facebook network (via ReadWriteWeb):

Super friends facebook app

This is a great feature, and something I could imagine being supported to some extent in Gephi, for example, by allowing the user to create a node attribute where the values represent label mappings from different modularity clusters (or more simply by allowing a user to add a label to each modularity class?).

The SuperFriends app also stands in contrast to the Google+ approach. I’d class SuperFriends as gardening, whereas the Google+ approach is more one of planning. The Google+ approach encourages you to think you’re in control of different parts of your network and makes your life really complicated (which circle do I put this person in; do I need a new circle for this?); the SuperFriends approach helps you realise how complicated (or not) your social circle is. In terms of filters, the Google+ approach encourages you to add your own, whereas the SuperFriends approach helps you identify structure that emerges out of network properties.

Given that in many respects Google is an AI/machine learning company, it’s odd that they’re getting the user to define circle/set membership; maybe it’d be too creepy if they automatically suggested groups? Maybe there’s too much scope for error if you don’t deliberately place people into a group yourself (and instead trust an algorithm to do it?)

Superfriends helps uncover structure, Google+ forces you to make all sorts of choices and decisions every time you “follow” another person. Google+ makes you define tags and categories to label people up front; SuperFriends identifies clusters that might be covered by an obvious tag.

Looking at my delicious bookmarks, I have almost as many tags as bookmarks… But if I ran some sort of grouping analysis (not sure what?!), maybe natural clusters – and natural tags – would emerge as a result?

Maybe I need to read Everything is Miscellaneous again…?

PS if you want to run a more hands on analysis of your Facebook network, try this: Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I

PPS here’s another Facebook app that identifies clusters: http://www.fellows-exp.com/ h/t @jacomyal

PPPS @danmcquillan also tweeted that LinkedIn InMaps do a similar clustering job on LinkedIn connections. They do indeed; and they use Gephi. I wonder if they’ve released the code that handles things from the point at which social network graph data is provided to the rendering of the map?

Dominant Tags in My Delicious Network

Following on from Social Networks on Delicious, here’s a view over my delicious network (that is, the folk I “follow” on delicious) and the dominant tags they use:

The image is created from a source file generated by:

1) grabbing the list of folk in my delicious network;
2) grabbing the tags each of them uses;
3) generating a bipartite network specification graph containing user and tag nodes, with weighted links corresponding to the number of times a user has used a particular tag (i.e. the number of bookmarks they have bookmarked using that tag).

Because the original graph is a large, sparse one (many users define lots of tags but use most of them only rarely), I filtered the output view, based on the weight of each edge (remember, the edge weight describes the number of times a user has used a particular tag), to show only those tags that have been used more than 150 times by any particular user. (So if every user had used the same tag up to, but not more than, 149 times each, it wouldn’t be displayed.) The tag nodes are sized according to the number of users who have used the tag 150 or more times.

I also had a go at colouring the nodes to identify tags used heavily by a single user, compared to tags heavily used by several members of my network.

Here’s the Python code:

import urllib, simplejson

def getDeliciousUserNetwork(user,network):
  url='http://feeds.delicious.com/v2/json/networkmembers/'+user
  data = simplejson.load(urllib.urlopen(url))
  for u in data:
    network.append(u['user'])
    #time also available: u['dt']
  #print network
  return network

def getDeliciousTagsByUser(user):
  tags={}
  url='http://feeds.delicious.com/v2/json/tags/'+user
  data = simplejson.load(urllib.urlopen(url))
  for tag in data:
    tags[tag]=data[tag]
  return tags

def printDeliciousTagsByNetwork(user,minVal=2):
  #openTimestampedFile and gephiCoreGDFNodeHeader are helpers from my own
  #utility code: open a timestamped output file, write a GDF node header
  f=openTimestampedFile('delicious-socialNetwork','network-tags-'+user+'.gdf')
  f.write(gephiCoreGDFNodeHeader(typ='delicious')+'\n')

  network=[]
  network=getDeliciousUserNetwork(user,network)

  #write a node definition row for each member of the network
  for u in network:
    f.write(u+','+u+',user\n')
  f.write('edgedef> user1 VARCHAR,user2 VARCHAR,weight DOUBLE\n')
  #write a weighted user->tag edge for each sufficiently heavily used tag
  for u in network:
    tags=getDeliciousTagsByUser(u)
    for tag in tags:
      if tags[tag]>=minVal:
        f.write(u+',"'+tag.encode('ascii','ignore')+'",'+str(tags[tag])+'\n')
  f.close()
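The two helper functions used in the script, openTimestampedFile and gephiCoreGDFNodeHeader, come from my own utility code. Minimal stand-in versions might look something like this – the exact filename convention and header columns are guesses reconstructed from how the helpers are called:

```python
import time

def openTimestampedFile(stub, fn):
    # Open a file for writing whose name combines the stub, a timestamp
    # and the filename, e.g. delicious-socialNetwork-20110101120000-network-tags-psychemedia.gdf
    ts = time.strftime("%Y%m%d%H%M%S")
    return open(stub + "-" + ts + "-" + fn, "w")

def gephiCoreGDFNodeHeader(typ="min"):
    # Return a GDF node definition header line; the 'delicious' variant
    # carries the extra label and type columns written by the tags script.
    if typ == "delicious":
        return "nodedef> name VARCHAR, label VARCHAR, ltype VARCHAR"
    return "nodedef> name VARCHAR"
```

With those in place, the snippet above should run as-is (modulo Delicious API availability, of course).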

Looking at the network, it’s possible to see which members of my network are heavy users of a particular tag, and furthermore, which tags are heavily used by more than one member of my network. The question now is: to what extent might this information help me identify whether or not I am following people who are likely to turn up resources that are in my interest area, by virtue of the tags used by the members of my network?

Picking up on the previous post on Social Networks on Delicious, might it be worth looking at the tags used heavily by my followers to see what subject areas they are interested in, and potentially the topic area(s) in which they see me as acting as a resource investigator?

Social Networks on Delicious

One of the many things that the delicious social networking site appears to have got wrong is how to gain traction from its social network. As well as the incidental social network that arises from two or more different users using the same tag or bookmarking the same resource (for example, Visualising Delicious Tag Communities Using Gephi), there is also an explicit social network constructed using an asymmetric model similar to that used by Twitter: specifically, you can follow me (become a “fan” of me) without my permission, and I can add you to my network (become a fan of you, again without your permission).

Realising that you are part of a social network on delicious is not really that obvious though, nor is the extent to which it is a network. So I thought I’d have a look at the structure of the social network that I can crystallise out around my delicious account, by:

1) grabbing the list of my “fans” on delicious;
2) grabbing the list of the fans of my fans on delicious and then plotting:
2a) connections between my fans and their fans who are also my fans;
2b) all the fans of my fans.

(Writing “fans” feels a lot more ego-bollox than writing “followers”; is that maybe one of the nails in the delicious social SNAFU coffin?!)

Here’s the way my “fans” on delicious follow each other (maybe? I’m not sure if the fans call always grabs all the fans, or whether it pages the results?):

The network is plotted using Gephi, of course; nodes are coloured according to modularity clusters, and the layout is derived from a Force Atlas layout.

Here’s the wider network – that is, showing fans of my fans:

In this case, nodes are sized according to betweenness centrality and coloured according to in-degree (that is, the number of my fans who have this person as a fan). [This works in so far as we’re trying to identify reputation networks. If we’re looking for reach in terms of using folk as a resource discovery network, it would probably make more sense to look at the members of my network, and the networks of those folk…]

If you want to try to generate your own, here’s the code:

import simplejson, urllib, time

def getDeliciousUserFans(user,fans):
  url='http://feeds.delicious.com/v2/json/networkfans/'+user
  #needs paging? or does this grab all the fans?
  data = simplejson.load(urllib.urlopen(url))
  for u in data:
    fans.append(u['user'])
    #time also available: u['dt']
  #print fans
  return fans

def getDeliciousFanNetwork(user):
  f=openTimestampedFile("fans-delicious","all-"+user+".gdf")
  f2=openTimestampedFile("fans-delicious","inner-"+user+".gdf")
  f.write(gephiCoreGDFNodeHeader(typ="min")+"\n")
  f.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
  f2.write(gephiCoreGDFNodeHeader(typ="min")+"\n")
  f2.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
  fans=[]
  fans=getDeliciousUserFans(user,fans)
  for fan in fans:
    time.sleep(1)
    fans2=[]
    print "Fetching data for fan "+fan
    fans2=getDeliciousUserFans(fan,fans2)
    for fan2 in fans2:
      f.write(fan+","+fan2+"\n")
      if fan2 in fans:
        f2.write(fan+","+fan2+"\n")
  f.close()
  f2.close()

So what’s the next step…?!

Discovering Co-location Communities – Twitter Maps of Tweets Near Wherever…

As privacy erodes further and further, and more and more people start to reveal where they are using location services, how easy is it to identify communities based on location, say, or postcode, rather than hashtag? That is, how easy is it to find people who are colocated in space, rather than topic, as in the hashtag communities? Very easy, it turns out…

One of the things I’ve been playing with lately is “community detection”, particularly in the context of people who are using a particular hashtag on Twitter. The recipe in that case runs something along the lines of: find a list of twitter user names for people using a particular hashtag, then grab their Twitter friends lists and look to see what community structures result (e.g. look for clusters within the different twitterers). The first part of that recipe is key, and generalisable: find a list of twitter user names…
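Stripped of the Twitter API specifics, that recipe can be sketched as follows – here `search_users` and `get_friends` stand in for whatever calls actually fetch the hashtag users and their friends lists:

```python
def hashtag_community(hashtag, search_users, get_friends):
    # Generic version of the recipe: get the users of a hashtag, then pull
    # each one's friends list and keep only the links back into the group.
    users = search_users(hashtag)
    edges = []
    for u in users:
        for friend in get_friends(u):
            if friend in users:
                edges.append((u, friend))
    return users, edges
```

Feeding the resulting edge list into Gephi (or any community detection routine) then surfaces the cluster structure; swapping in a different `search_users` function – one based on location, say – is all it takes to generalise the recipe.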

So, can we create a list of names based on co-location? Yep – easy: Twitter search offers a “near:” search limit that lets you search in the vicinity of a location.

Here’s a Yahoo Pipe to demonstrate the concept – Twitter hyperlocal search with map output:

Pipework for twitter hyperlocal search with map output

[UPDATE: since grabbing that screenshot, I’ve tweaked the pipe to make it a little more robust…]

And here’s the result:

Twitter local trend

It’s easy enough to generate a widget of the result – just click on the Get as Badge link to get the embeddable widget code, or add the widget direct to a dashboard such as iGoogle:

Yahoo pipes map badge

(Note that this pipe also sets the scene for a possible demo of a “live pipe”, e.g. one that subscribes to searches via pubsubhubbub, so that whenever a new tweet appears it’s pushed to the pipe, and that makes the output live, for example by using a webhook.)

You can also grab the KML output of the pipe using a URL of the form:
http://pipes.yahoo.com/pipes/pipe.run?_id=f21fb52dc7deb31f5fffc400c780c38d&_render=kml&distance=1&location=YOUR+LOCATION+STRING
and post it into a Google maps search box… like this:

Yahoo pipe in google map

(If you try to refresh the Google map, it may suffer from result caching… in which case you have to cache bust, e.g. by changing the distance value in the pipe URL to 1.0, 1.00, etc…;-)

Something else that could be useful for community detection is to search through the localised/co-located tweets for popular hashtags. Whilst we could probably do this in a separate pipe (left as an exercise for the reader), maybe by using a regular expression to extract hashtags and then the Unique block, filtering on hashtags, to count the recurrences, here’s a Python recipe:

import simplejson, urllib, re

def getYahooAppID():
  appid='YOUR_YAHOO_APP_ID_HERE'
  return appid

def placemakerGeocodeLatLon(address):
  encaddress=urllib.quote_plus(address)
  appid=getYahooAppID()
  url='http://where.yahooapis.com/geocode?location='+encaddress+'&flags=J&appid='+appid
  data = simplejson.load(urllib.urlopen(url))
  if data['ResultSet']['Found']>0:
    for details in data['ResultSet']['Results']:
      return details['latitude'],details['longitude']
  else:
    return False,False

def twSearchNear(tweeters,tags,num,place='mk7 6aa,uk',term='',dist=1):
  t=int(num/100)
  page=1
  lat,lon=placemakerGeocodeLatLon(place)
  while page<=t:
    url='http://search.twitter.com/search.json?geocode='+str(lat)+'%2C'+str(lon)+'%2C'+str(1.0*dist)+'km&rpp=100&page='+str(page)+'&q=+within%3A'+str(dist)+'km'
    if term!='':
      url+='+'+urllib.quote_plus(term)

    page+=1
    data = simplejson.load(urllib.urlopen(url))
    for i in data['results']:
     if not i['text'].startswith('RT @'):
      u=i['from_user'].strip()
      if u in tweeters:
        tweeters[u]['count']+=1
      else:
        tweeters[u]={}
        tweeters[u]['count']=1
      ttags=re.findall("#([a-z0-9]+)", i['text'], re.I)
      for tag in ttags:
        if tag not in tags:
          tags[tag]=1
        else:
          tags[tag]+=1

  return tweeters,tags

''' Usage:
tweeters={}
tags={}
num=100 #number of search results, best as a multiple of 100 up to max 1500
location='PLACE YOU WANT TO SEARCH AROUND'
term='OPTIONAL SEARCH TERM TO NARROW DOWN SEARCH RESULTS'
tweeters,tags=twSearchNear(tweeters,tags,num,location,term)
'''

What this code does is:
– use Yahoo placemaker to geocode the address provided;
– search in the vicinity of that area (the distance parameter currently defaults to 1 km);
– identify the unique twitterers, as well as counting the number of times they tweeted in the search results;
– identify the unique tags, as well as counting the number of times they appeared in the search results.

Here’s an example output for a search around “Bath University, UK”:

Having got the list of Twitterers (as discovered by a location based search), we can then look at their social connections as in the hashtag community visualisations:

Community detected around Bath U… Hmm… people there who shouldn’t be?!

And I’m wondering why the likes of @pstainthorp and @martin_hamilton appear to be in Bath? Is the location search broken, picking up stale data, or some other error…? Or is there maybe a UKOLN event on today, I wonder…?

PS Looking at a search near “University of Bath” in the web based Twitter search, it seems that: a) there aren’t many recent hits; b) the search results pull up tweets going back in time…

Which suggests to me:
1) the code really should have a time window to filter the tweets by time, e.g. excluding tweets that are more than a day or even an hour old (it would be so nice if the Twitter search API offered a since_time: limit, although I guess it does offer since_id, and the web search does offer since: and until: limits that work on date, and that could be included in the pipe…);
2) where there aren’t a lot of current tweets at a location, we can get a profile of that location based on people who passed through it over a period of time?
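The time-window filter suggested in (1) might look something like this – a sketch assuming each tweet carries a created_at timestamp in the RFC 822 style format the old search API used:

```python
import calendar, time

# e.g. "Sat, 01 Jan 2011 12:00:00 +0000"
TWITTER_TIME = "%a, %d %b %Y %H:%M:%S +0000"

def recent_only(tweets, max_age_secs=3600, now=None):
    # Drop tweets whose created_at timestamp falls outside the time window.
    now = time.time() if now is None else now
    keep = []
    for t in tweets:
        ts = calendar.timegm(time.strptime(t["created_at"], TWITTER_TIME))
        if now - ts <= max_age_secs:
            keep.append(t)
    return keep
```

Running the search results through recent_only() before counting tweeters and tags would stop stale tweets polluting the location profile.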

UPDATE: Problem solved…

The location search is picking up tweets like this:

Twitter locations...

but when you click on the actual tweet link, it’s something different – a retweet:

Twitter reweets pass through the original location

So “official” Twitter retweets appear to pass through the location data of the original tweet, rather than the person retweeting… so I guess my script needs to identify official twitter retweets and dump them…

PS if you want to see how folk tweeting around a location are socially connected (i.e. whether they follow each other), check out A Bit of NewsJam MoJo – SocialGeo Twitter Map.

Gephi Bits 2: A Further Look at Comments on Social Objects in a Closed Community

In the previous post in this set (Gephi Bits 1: Comments on Social Objects in a Closed Community), I started having a play with comment and favourites data from a series of peer review activities in the OU course Design thinking: creativity for the 21st century.

In particular, I loaded simple pairwise CSV data directly into Gephi, relating comment ids and favourite ids to photo ids. The resulting images provided a view over the photos that showed which photos were heavily commented and/or favourited. Towards the end of the post, I suggested it might be interesting to be able to distinguish between the comment and favourite nodes by colouring them somehow. So let’s start by seeing how we might achieve that…

The easiest way I can think of is to preload Gephi with a definition of each node and the assignment of a type label to each node – photo, comment or favourite. We can then partition – and colour – each node based on the type label.

To define the nodes and type labels, we can use a file defined using the GUESS .gdf format. In particular, we define the nodes as follows:

nodedef> name VARCHAR, ltype VARCHAR
p189, photo
p191, photo

c1428, comment
c1429, comment

f1005, fave
f1006, fave
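A node file like this can be generated from the raw ID lists with a few lines of Python (a sketch; the filename and IDs are illustrative):

```python
def write_typed_gdf(path, photos, comments, faves):
    # Write a GUESS .gdf node section that assigns an ltype to each node id;
    # the edge CSV files are then appended to the graph separately in Gephi.
    with open(path, "w") as f:
        f.write("nodedef> name VARCHAR, ltype VARCHAR\n")
        for ids, label in ((photos, "photo"), (comments, "comment"), (faves, "fave")):
            for node_id in ids:
                f.write(node_id + ", " + label + "\n")

# Illustrative ids only
write_typed_gdf("nodes.gdf", ["p189", "p191"], ["c1428", "c1429"], ["f1005", "f1006"])
```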

Load this file into Gephi, and then append the contents of the comment-photo and favourite-photo CSV files to the graph. We can then colour the nodes (sized according to, err, something!) according to partition:

Coloured partitions in Gephi

If we filter the network for a particular photo using an ego filter, we can get a colour coded view of the comment and favourite IDs associated with that image:

Coloured nodes and labels in Gephi

What we’ve achieved so far is a way of exploring how heavily commented or favourited a photo is, as well as picking up a tip or two about labeling and colouring nodes. But what about if we wanted a person level analysis, where we could visually identify the individuals who had posted the most images, or whose images were most heavily commented upon and favourited?

To start with, let’s capture some information about each of the nodes. In the following example, we have an identifier (for a photo, favourite or comment), followed by a user id (the person who made the comment or favourite, or who uploaded the photo), and a label (photo, comment or fave). (The ltype field also captures a sense of this.)

nodedef> name VARCHAR, username VARCHAR, ltype VARCHAR
p189,jd342,photo
p191,jd342,photo
p192,pn43,photo
..
c1189,pd73,comment
c1190,srs22,comment
..
f46,ww66,fave
f47,ee79,fave

Rather than describe edges based on connecting comment or favourite ID to photo ID, we can easily generate links of the form userID, photoID, where userID is the ID of the user making a comment or favouriting an image. However, it is possible to annotate the edges to describe whether or not the link relates to a comment or favouriting action. So for example:

edgedef> otherUser VARCHAR, photo VARCHAR, etype VARCHAR
pd73,p189,comment
srs22,p226,comment

ww66,p176,fave

Alternatively, we might just use the simpler format:
edgedef> otherUser VARCHAR, photo VARCHAR
pd73,p189
srs22,p226

ww66,p176

In this simpler case, we can just load in the node definition gdf file, and follow it by adding the actual graph edge data from CSV files, which is what I’ve done for what follows.

Firstly, here’s the partition colour palette:

Gephi - partition colours

The null entities relate to nodes that didn’t get an explicit node specification (i.e. the person nodes).

To provide a bit of flexibility over the graph, I loaded the favourites and comment edges in as directed edges from “Other user” to photo ID, where “Other user” is the user ID of the person making the comment or favourite.

If we size the graph by out-degree, we can look at which users are actively engaged in commenting on photos:

Gephi - who's commenting/favouriting

The size of the arrow depicts whether or not there are multiple edges going from one person to a photo, so we can see, for example, where someone has made multiple comments on the same photo.

If we size by in-degree, we can see which photos are popular:

Gephi - what photos are popular

If we run an ego filter over a photo id, we can see who commented on it.

However, what we would really like to be able to do is look at the connections between people via a photo (for example, to see who has favourited whose photos). If we add in another edge data file that links from a photo ID to a person ID (the person who uploaded the photo), we can start to explore these relationships.

NB the colour palette changes in what follows…

Having captured user to photo relationships based on commenting, favouriting or uploading behaviour, we can now do things like the following. Here for example is a use of a simple filter to see which of a user’s photos are popular:

Gephi - simple filter

If we run a simple ego filter, we can see the photos that a user has uploaded or commented on/favourited:

Gephi - ego filter

If we increase the depth to 2, we can see who else a particular user is connected to by virtue of a shared interest in the same photographs (I’m not sure what edge size relates to here…?):

Ego depth 2 in gephi - who connects to whom

Here, user ba49 is outsize because they uploaded a lot of the images that are shown. (The above graph shows linkage between ba49 and other users who either commented on/favourited one of ba49’s images, or who commented/favourited photo that ba49 also commented on/favourited.)

Doh – it appears I’ve crashed Gephi, so as it’s late, I’m going to stop for now! In the next post, I’ll show how we can further elaborate the nodes using extended user identifiers that designate the role a person is acting in (e.g. as a commenter, favouriter or photo uploader) to see what sorts of view this lets us take over the network.