Posts Tagged ‘ESP’
With the possibility that my effectively unlimited Twitter API key will die at some point in the Spring with the Twitter API upgrade, I’m starting to look around for alternative sources of interest signal (aka getting ready to say “bye, bye, Twitter interest mapping”). And Facebook groups look like they may offer one possibility…
Some time ago, I did a demo of how to map the common Facebook Likes of my Facebook friends (Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grabbed Using Google Refine). In part inspired by a conversation today about profiling the interests of members of particular Facebook groups, I thought I’d have a quick peek at the Facebook API to see if it’s possible to grab the membership list of arbitrary, open Facebook groups, and then pull down the list of Likes made by the members of the group.
As with my other social positioning/social interest mapping experiments, the idea behind this approach is broadly this: users express interest through some sort of public action, such as following a particular Twitter account that can be associated with a particular interest. In this case, the signal I’m associating with an expression of interest is a Facebook Like. To locate something in interest space, we need to be able to detect a set of users associated with that thing, identify each of their interests, and then find interests they have in common. These shared interests are then assumed to be representative of the interests associated with the thing, ideally once we have looked over and above a “background level of shared interest”, aka the Stephen Fry effect (from Twitter, where a large number of people in any set of people appear to follow Stephen Fry, regardless of other more pertinent shared interests that are peculiar to that set of people). In this case, the thing is a Facebook group, the users associated with the thing are the group members, and the interests associated with the thing are the things commonly liked by members of the group.
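As a rough illustration of the "find interests they have in common" step, here is a minimal Python sketch. The function name `common_interests`, the `min_share` threshold and the sample data are all made up for illustration; it simply counts how many users share each interest and makes no attempt to correct for background interest.

```python
from collections import Counter

def common_interests(likes_by_user, min_share=0.1):
    """Return (interest, share) pairs liked by at least min_share of the
    users who exposed any likes at all (hypothetical helper)."""
    # Users with no visible likes tell us nothing, so leave them out
    visible = {u: set(likes) for u, likes in likes_by_user.items() if likes}
    if not visible:
        return []
    counts = Counter()
    for likes in visible.values():
        counts.update(likes)  # a set, so each user counts once per interest
    n = len(visible)
    shared = [(name, c / n) for name, c in counts.items() if c / n >= min_share]
    # Most widely shared first, ties broken alphabetically
    return sorted(shared, key=lambda pair: (-pair[1], pair[0]))

# Invented sample data: user id -> list of liked page names
sample = {
    "u1": ["Formula 1", "Sebastian Vettel", "Top Gear"],
    "u2": ["Formula 1", "Top Gear"],
    "u3": ["Formula 1"],
    "u4": [],
}
result = common_interests(sample, min_share=0.5)
for name, share in result:
    print(name, round(share, 2))
```

The shares are computed over users with visible likes only, which is one possible choice; dividing by the full sample size would be an equally defensible one.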
So for example, here is the social interest positioning of the Red Bull Racing group on Facebook, based on a sample of 3000 members of the group. Note that a significant number of these members returned no likes, either because they haven’t liked anything, or because their personal privacy settings are such that they do not publicly share their likes.
As we might expect, the members of this group also appear to have an interest in other Formula One related topics, from F1 in general, to various F1 teams and drivers, and to motorsport and motoring in general (top half of the map). We also find music preferences (the cluster to the left of the map) and TV programmes (centre bottom of the map) that are of common interest, though I have no idea yet whether these are background radiation interests (that is, the Facebook equivalent of the Stephen Fry effect on Twitter) or are peculiar to this group. I’m not sure whether the cluster of beverage related preferences at the bottom right corner of the map is notable either?
This information is visualised using Gephi, using data grabbed via the following Python script (revised version of this code as a gist):
#This is a really simple script:
##Grab the list of members of a Facebook group (no paging as yet...)
###For each member, try to grab their Likes
#Written for Python 2 (print statements, urllib.urlopen)

import urllib,simplejson,csv,argparse

#Grab a copy of a current token from an example Facebook API call, eg from clicking a keyed link on:
#https://developers.facebook.com/docs/reference/api/examples/
#Something a bit like this:
#AAAAAAITEghMBAOMYrWLBTYpf9ciZBLXaw56uOt2huS7C4cCiOiegEZBeiZB1N4ZCqHgQZDZD

parser = argparse.ArgumentParser(description='Generate social positioning map around a Facebook group')
parser.add_argument('-gid',default='2311573955',help='Facebook group ID')
#gid='2311573955'
parser.add_argument('-FBTOKEN',help='Facebook API token')

args=parser.parse_args()
if args.gid!=None: gid=args.gid
if args.FBTOKEN!=None: FBTOKEN=args.FBTOKEN

#Quick test - output file is simple 2 column CSV that we can render in Gephi
fn='fbgroupliketest_'+str(gid)+'.csv'
writer=csv.writer(open(fn,'wb+'),quoting=csv.QUOTE_ALL)

#IDs of users whose likes have already been fetched
uids=[]

def getGroupMembers(gid):
    gurl='https://graph.facebook.com/'+str(gid)+'/members?limit=5000&access_token='+FBTOKEN
    data=simplejson.load(urllib.urlopen(gurl))
    if "error" in data:
        print "Something seems to be going wrong - check OAUTH key?"
        print data['error']['message'],data['error']['code'],data['error']['type']
        exit(-1)
    else:
        return data

#Grab the likes for a particular Facebook user by Facebook User ID
def getLikes(uid,gid):
    #Should probably implement at least a simple cache here
    lurl="https://graph.facebook.com/"+str(uid)+"/likes?access_token="+FBTOKEN
    ldata=simplejson.load(urllib.urlopen(lurl))
    print ldata
    #An error response has no 'data' key, so check for it before counting
    if 'data' in ldata and len(ldata['data'])>0:
        for i in ldata['data']:
            if 'name' in i:
                writer.writerow([str(uid),i['name'].encode('ascii','ignore')])
                #We could colour nodes based on category, etc, though would require richer output format.
                #In the past, I have used the networkx library to construct "native" graph based representations of interest networks.
                if 'category' in i:
                    print str(uid),i['name'],i['category']

#For each user in the group membership list, get their likes
def parseGroupMembers(groupData,gid):
    #x is just a fudge used in progress reporting
    x=0
    for user in groupData['data']:
        uid=user['id']
        writer.writerow([str(uid),str(gid)])
        #Prevent duplicate fetches
        if uid not in uids:
            getLikes(user['id'],gid)
            uids.append(uid)
            #Really crude progress reporting
            print x
            x=x+1
    #need to handle paging?
    #parse next page URL and recall this function

groupdata=getGroupMembers(gid)
parseGroupMembers(groupdata,gid)
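The script leaves paging as an open question. One way it might be handled is sketched below in Python 3, on the assumption that each response carries a `paging` object whose `next` value is the URL of the following page (which is how Graph API list responses were structured, as far as I know). The names `iter_pages` and `fetch` are hypothetical, and the network call is replaced here by a lookup in a dictionary of canned pages so the sketch can be run without a token.

```python
def iter_pages(first_url, fetch):
    """Yield every item from a paged response, following paging.next links.

    fetch is any callable that takes a URL and returns the decoded JSON
    response as a dict (a hypothetical hook, not part of any library).
    """
    url = first_url
    seen = set()
    while url and url not in seen:
        seen.add(url)  # guard against a next link that loops back on itself
        page = fetch(url)
        for item in page.get("data", []):
            yield item
        # Assumed response shape: {"data": [...], "paging": {"next": "..."}}
        url = page.get("paging", {}).get("next")

# Canned pages standing in for real API responses
fake_pages = {
    "page1": {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "3"}]},
}
member_ids = [m["id"] for m in iter_pages("page1", fake_pages.__getitem__)]
print(member_ids)
```

Against a live endpoint, `fetch` would be whatever function opens the URL and parses the JSON; the loop itself does not care where the pages come from.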
Note that I have no idea whether or not this is in breach of Facebook API terms and conditions, nor have I reflected on the ethical implications of running this sort of analysis, over and above remarking that it’s the same general approach I apply to mapping social interests on Twitter.
As to where next with this? It brings into focus again the question of identifying common interests pertinent to this particular group, compared to background popular interest that might be expressed by any random set of people. But having got a new set of data to play with, it will perhaps make it easier to test the generalisability of any model or technique I do come up with for filtering out, or normalising against, background interest.
Other directions this could go? Using a single group to bootstrap a walk around the interest space? For example, in the above case, trying to identify groups associated with Sebastian Vettel, or F1, and then repeating the process? It might also make sense to look at the categories of the notable shared interests (from a quick browse, these include, for example, things like Movie, Product/service, Public figure, Games/toys, Sports Company, Athlete, Interest, Sport). Is there a full vocabulary available, I wonder? How might we use this information?
Earlier this year I doodled a recipe for comparing the folk commonly followed by users of a couple of BBC programme hashtags (Social Media Interest Maps of Newsnight and BBCQT Twitterers). Prompted in part by a tweet from Michael Smethurst/@fantasticlife about generating an ESP map for UK politicians (something I’ve also doodled before – Sketching the Structure of the UK Political Media Twittersphere) I drew on the @tweetminster Twitter lists of MPs by party to generate lists of folk commonly followed by the MPs of each party.
Using the R wordcloud library commonality and comparison clouds, we can get a visual impression of folk commonly followed in significant numbers by all the MPs of the three main parties, as well as the folk the MPs of each party follow significantly and differentially to the other parties:
There’s still a fair bit to do making the methodology robust: for example, being able to cope with comparing folk commonly followed by different sets of users where the size of the set differs to a significant extent (there is a large difference between the number of tweeting Conservative and LibDem MPs, for instance). I’ve also noticed that repeatedly running the comparison.cloud code turns up different clouds, so there’s some element of randomness in there. I guess this just adds to the “sketchy” nature of the visualisation; or maybe hints at a technique akin to the way a photographer will take multiple shots of a subject before picking one or two to illustrate something in particular. Which is to say: the “truthiness” of the image reflects the message that you are trying to communicate. The visualisation in this case exposes a partial truth (which is to say, no absolute truth), or particular perspective about the way different groups differentially follow folk on Twitter.
A couple of other quirks I’ve noticed about the comparison.cloud as currently defined: firstly, very highly represented friends are sized too large to appear in the cloud (which is why very commonly followed folk across all sets, the people that appear in the commonality cloud, tend not to appear); there must be a better way of handling this? Secondly, if one person is represented so highly in one group that they don’t appear in the cloud for that group, they may appear elsewhere in the cloud.
(So for example, I tried plotting clouds for folk commonly followed by a sample of the followers of @davegorman, as well as the people commonly followed by the friends of @davegorman, and @davegorman appeared as a small label in the friends part of the comparison.cloud, notwithstanding the fact that all the followers of @davegorman follow @davegorman, but not all his friends do… What might make more sense would be to suppress the display of a label in the colour of a particular group if that label has a higher representation in any of the other groups (and isn’t displayed because it would be too large).)
That said, as a quick sketch, I think there’s some information being revealed there (the coloured comparison.cloud seems to pull out some names that make sense as commonly followed folk peculiar to each party…). I guess one way forward is to start picking apart the comparison.cloud code; another is to explore a few more comparison sets. Suggestions welcome as to what they might be…:-)
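The suppression rule suggested above could be sketched along the following lines. This is Python rather than R, it is not how comparison.cloud itself works, and the function names and all the counts are invented for illustration. Follow counts are first turned into rates (count divided by group size) so that groups of different sizes can be compared; a label is then only allowed in a group's colour if no other group follows that account at a higher rate.

```python
def follow_rates(counts_by_group, group_sizes):
    """Convert raw follow counts into per-group rates (hypothetical helper)."""
    return {
        group: {name: count / group_sizes[group] for name, count in counts.items()}
        for group, counts in counts_by_group.items()
    }

def allowed_labels(rates):
    """For each group, keep only the labels whose rate in that group is not
    exceeded by the rate in any other group."""
    allowed = {}
    for group, group_rates in rates.items():
        keep = []
        for name, rate in group_rates.items():
            others = (rates[o].get(name, 0.0) for o in rates if o != group)
            if rate >= max(others, default=0.0):
                keep.append(name)
        allowed[group] = sorted(keep)
    return allowed

# Invented numbers: how many members of each sample follow each account
counts = {
    "followers": {"@davegorman": 100, "@stephenfry": 60},
    "friends": {"@davegorman": 10, "@stephenfry": 20, "@example_account": 25},
}
sizes = {"followers": 100, "friends": 50}
labels = allowed_labels(follow_rates(counts, sizes))
print(labels)
```

With these made-up numbers @davegorman can only ever be drawn in the followers' colour, so it would no longer turn up as a small label in the friends part of the cloud. Ties are allowed to appear in both groups here; whether that is desirable is a separate question.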
PS by the by, I notice via the Guardian datablog (Church vs beer: using Twitter to map regional differences in US culture) another Twitter based comparison project – Church or Beer? Americans on Twitter – which looked at geo-coded Tweets over a particular time period on a US state-wide basis and counted the relative occurrence of Tweets mentioning “church” or “beer”…
This is a placeholder as much as anything, something I want to try out but don’t have time to do right now… The context is the social media mapping approach I’ve been doodling with for a few weeks now, where I try to position social media users in terms of who their followers follow (for example, A Couple More Social Media Positioning Maps for UK HE Twitter Accounts).
One of the problems with the approach is that you often get some of the same-old, same-old accounts appearing again and again (@stephenfry for example). So I’ve been wondering whether it might be worth generating funnel plots that plot the rate at which followers of a target account follow the other accounts identified in the positioning maps generated around the target account? On the x-axis we’d plot the total number of followers of each account, and on the y-axis the rate at which they are followed by the followers of the target account (i.e. their in-degree in the map divided by the target account follower sample size used to generate the map). We might then get useful signal from the presence of accounts that appear to be over-represented within the target account followers sample, signal that can be used to identify those accounts that are more highly associated with the target account than we might expect by chance.
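To make the funnel plot idea a little more concrete, here is a minimal Python sketch of how the points and a crude control limit might be calculated. Everything beyond the x and y definitions above is an assumption on my part: the "chance" rate is taken to be an account's total followers divided by an assumed overall user population, the limit is a simple normal approximation to the binomial with z=3, and the function name and all the numbers are invented.

```python
import math

def funnel_points(indegree, total_followers, sample_size, population, z=3.0):
    """Compute funnel plot coordinates and an over-representation flag.

    indegree: account -> number of sampled followers who follow it
    total_followers: account -> total follower count of that account
    population: assumed total number of users (a guess, not a known figure)
    """
    points = {}
    for account, degree in indegree.items():
        x = total_followers[account]
        y = degree / sample_size
        # Crude baseline: chance rate if follows were spread uniformly
        p = x / population
        # Upper control limit from the normal approximation to the binomial
        upper = p + z * math.sqrt(p * (1 - p) / sample_size)
        points[account] = {"x": x, "y": y, "expected": p,
                           "upper": upper, "over": y > upper}
    return points

# Invented figures for a sample of 1000 followers of some target account
pts = funnel_points(
    indegree={"@stephenfry": 60, "@niche_account": 40},
    total_followers={"@stephenfry": 5_000_000, "@niche_account": 20_000},
    sample_size=1000,
    population=100_000_000,
)
for account, point in pts.items():
    print(account, point["x"], round(point["y"], 3), point["over"])
```

With these made-up numbers the very popular account falls inside the limit while the small account sits well above it, which is the kind of separation the plot is hoping for. A uniform baseline is almost certainly too generous to be used as is, so the flag should be read as a starting point rather than a test.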
Another factor that I maybe need to take into account is the total number of accounts followed by the target account followers?
PS by the by, I notice that my map of folk “in the vicinity of the #gdslaunch hashtag” appears to have been posterised…:-)
(If anyone wants SVG or graphml based representations of any of the Gephi generated images I post either here or on my flickr account, it can probably be arranged;-)