With the possibility that my effectively unlimited Twitter API key will die at some point in the Spring with the Twitter API upgrade, I’m starting to look around for alternative sources of interest signal (aka getting ready to say “bye, bye, Twitter interest mapping”). And Facebook groups look like they may offer one possibility…
Some time ago, I did a demo of how to map the common Facebook Likes of my Facebook friends (Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grabbed Using Google Refine). In part inspired by a conversation today about profiling the interests of members of particular Facebook groups, I thought I’d have a quick peek at the Facebook API to see if it’s possible to grab the membership list of arbitrary, open Facebook groups, and then pull down the list of Likes made by the members of the group.
As with my other social positioning/social interest mapping experiments, the idea behind this approach is broadly this: users express interest through some sort of public action, such as following a particular Twitter account that can be associated with a particular interest. In this case, the signal I’m associating with an expression of interest is a Facebook Like. To locate something in interest space, we need to be able to detect a set of users associated with that thing, identify each of their interests, and then find the interests they have in common. These shared interests (ideally over and above a “background level of shared interest”, aka the Stephen Fry effect, from Twitter, where a large number of people in any set of people appear to follow Stephen Fry regardless of other, more pertinent shared interests that are peculiar to that set of people) are then assumed to be representative of the interests associated with the thing. In this case, the thing is a Facebook group, the users associated with the thing are the group members, and the interests associated with the thing are the things commonly liked by members of the group.
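The counting step in this approach can be sketched in a few lines of Python 3. This is a minimal illustration rather than the script used to build the map: it assumes we already hold a mapping from each member's ID to the set of things they like, and the function name `common_interests`, the `min_share` threshold and the toy data are all invented for the example.

```python
from collections import Counter

def common_interests(likes_by_user, min_share=0.5):
    """Return (interest, share) pairs liked by at least min_share of users.

    likes_by_user: dict mapping a user ID to the set of things that user likes.
    Users with no visible likes still count towards the denominator.
    """
    n_users = len(likes_by_user)
    if n_users == 0:
        return []
    counts = Counter()
    for likes in likes_by_user.values():
        counts.update(set(likes))  # count each interest once per user
    shared = [(name, count / n_users) for name, count in counts.items()
              if count / n_users >= min_share]
    # Most widely shared first, ties broken alphabetically
    return sorted(shared, key=lambda pair: (-pair[1], pair[0]))

# Toy data, invented for illustration only
example = {
    "u1": {"Formula 1", "Sebastian Vettel", "Top Gear"},
    "u2": {"Formula 1", "Top Gear"},
    "u3": {"Formula 1"},
    "u4": set(),  # private profile or no likes
}
result = common_interests(example, min_share=0.5)
```

A fixed threshold like this does nothing about the Stephen Fry effect: anything popular everywhere will pass it, so it only gives the raw "liked in common" list.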
So for example, here is the social interest positioning of the Red Bull Racing group on Facebook, based on a sample of 3000 members of the group. Note that a significant number of these members returned no likes, either because they haven’t liked anything, or because their personal privacy settings are such that they do not publicly share their likes.
As we might expect, the members of this group also appear to have an interest in other Formula One related topics, from F1 in general, to various F1 teams and drivers, and to motorsport and motoring in general (top half of the map). We also find music preferences (the cluster to the left of the map) and TV programmes (centre bottom of the map) that are of common interest, though I have no idea yet whether these are background radiation interests (that is, the Facebook equivalent of the Stephen Fry effect on Twitter) or are peculiar to this group. I’m not sure whether the cluster of beverage related preferences at the bottom right corner of the map is notable either.
The map is visualised in Gephi, using data grabbed via the following Python script (a revised version of this code is available as a gist):
#This is a really simple script (written for Python 2):
##Grab the list of members of a Facebook group (no paging as yet...)
###For each member, try to grab their Likes

import urllib,simplejson,csv,argparse

#Grab a copy of a current token from an example Facebook API call, eg from clicking a keyed link on:
#https://developers.facebook.com/docs/reference/api/examples/
#Something a bit like this:
#AAAAAAITEghMBAOMYrWLBTYpf9ciZBLXaw56uOt2huS7C4cCiOiegEZBeiZB1N4ZCqHgQZDZD

parser = argparse.ArgumentParser(description='Generate social positioning map around a Facebook group')

parser.add_argument('-gid',default='2311573955',help='Facebook group ID')
#gid='2311573955'

#The token is required: without it every API call below would fail
parser.add_argument('-FBTOKEN',required=True,help='Facebook API token')

args=parser.parse_args()
gid=args.gid
FBTOKEN=args.FBTOKEN

#Quick test - output file is simple 2 column CSV that we can render in Gephi
fn='fbgroupliketest_'+str(gid)+'.csv'
writer=csv.writer(open(fn,'wb+'),quoting=csv.QUOTE_ALL)

#IDs of users whose likes have already been fetched
uids=[]

#Grab the (first page of the) membership list for a group
def getGroupMembers(gid):
    gurl='https://graph.facebook.com/'+str(gid)+'/members?limit=5000&access_token='+FBTOKEN
    data=simplejson.load(urllib.urlopen(gurl))
    if "error" in data:
        print "Something seems to be going wrong - check OAUTH key?"
        print data['error']['message'],data['error']['code'],data['error']['type']
        exit(-1)
    else:
        return data

#Grab the likes for a particular Facebook user by Facebook User ID
def getLikes(uid,gid):
    #Should probably implement at least a simple cache here
    lurl="https://graph.facebook.com/"+str(uid)+"/likes?access_token="+FBTOKEN
    ldata=simplejson.load(urllib.urlopen(lurl))
    print ldata
    #An error response has no 'data' key, so treat it as "no likes"
    for i in ldata.get('data',[]):
        if 'name' in i:
            writer.writerow([str(uid),i['name'].encode('ascii','ignore')])
            #We could colour nodes based on category, etc, though would require richer output format.
            #In the past, I have used the networkx library to construct "native" graph based representations of interest networks.
            if 'category' in i:
                print str(uid),i['name'],i['category']

#For each user in the group membership list, get their likes
def parseGroupMembers(groupData,gid):
    #x is just a fudge used in progress reporting
    x=0
    for user in groupData['data']:
        uid=user['id']
        writer.writerow([str(uid),str(gid)])
        #Prevent duplicate fetches
        if uid not in uids:
            getLikes(user['id'],gid)
            uids.append(uid)
            #Really crude progress reporting
            print x
            x=x+1
    #need to handle paging?
    #parse next page URL and recall this function

groupdata=getGroupMembers(gid)
parseGroupMembers(groupdata,gid)
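The script stops at the first page of members, as its closing comments admit. Here is a minimal sketch of how the paging might be handled, assuming the API response carries a `paging` object with a `next` URL whenever more results are available. It is written in Python 3 rather than the Python 2 of the script, the helper names `fetch_json` and `iter_pages` are hypothetical, and it has only been exercised against canned data, not the live API.

```python
import json
import urllib.request

def fetch_json(url):
    """Fetch a URL and decode the JSON body (network access required)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def iter_pages(first_url, fetch=fetch_json, max_pages=100):
    """Yield the items from each page, following paging.next links.

    max_pages is a safety limit so a bad response cannot loop forever.
    """
    url = first_url
    seen = set()
    for _ in range(max_pages):
        if not url or url in seen:
            break
        seen.add(url)
        page = fetch(url)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "API error"))
        for item in page.get("data", []):
            yield item
        # Assumption: the next page URL, if any, lives at paging.next
        url = page.get("paging", {}).get("next")

# Offline demonstration with canned pages instead of real API calls
fake_pages = {
    "page1": {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "3"}]},
}
member_ids = [m["id"] for m in iter_pages("page1", fetch=fake_pages.get)]
```

Passing the fetch function in as a parameter keeps the loop testable without a token, and would also be a natural place to slot in the simple cache that the comment in `getLikes` asks for.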
Note that I have no idea whether or not this is in breach of Facebook API terms and conditions, nor have I reflected on the ethical implications of running this sort of analysis, over and above remarking that it’s the same general approach I apply to mapping social interests on Twitter.
As to where next with this? It brings into focus again the question of identifying common interests pertinent to this particular group, compared to background popular interest that might be expressed by any random set of people. But having got a new set of data to play with, it will perhaps make it easier to test the generalisability of any model or technique I do come up with for filtering out, or normalising against, background interest.
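One simple candidate for that normalisation is a ratio: the share of group members who like something, divided by the share of a background sample who like it. The sketch below (Python 3) assumes we have like counts for both the group and some comparison sample. The function name `interest_lift`, the crude add-one smoothing and the numbers are all assumptions made for illustration, not results from the Red Bull Racing data.

```python
def interest_lift(group_counts, group_size, background_counts,
                  background_size, smoothing=1.0):
    """Score each interest by group share divided by background share.

    A score well above 1 suggests the interest is over-represented in the
    group; a score near 1 suggests plain background popularity.
    smoothing is added to every count and size so that interests missing
    from the background sample do not cause a division by zero.
    """
    scores = {}
    for name, count in group_counts.items():
        group_share = (count + smoothing) / (group_size + smoothing)
        background = background_counts.get(name, 0)
        background_share = (background + smoothing) / (background_size + smoothing)
        scores[name] = group_share / background_share
    return scores

# Invented numbers: 99 group members, 999 people in a background sample
group_counts = {"Formula 1": 59, "Some popular TV show": 19}
background_counts = {"Formula 1": 29, "Some popular TV show": 199}
lift = interest_lift(group_counts, 99, background_counts, 999)
```

With these made-up figures the F1 interest scores roughly 20 and the TV show roughly 1, which is the kind of separation between group-specific and background interest that a filter would need. Whether a plain ratio behaves sensibly on real, sparse Like data is an open question.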
Other directions this could go? Using a single group to bootstrap a walk around the interest space? For example, in the above case, trying to identify groups associated with Sebastian Vettel, or F1, and then repeating the process? It might also make sense to look at the categories of the notable shared interests (from a quick browse, these include, for example, things like Movie, Product/service, Public figure, Games/toys, Sports Company, Athlete, Interest, Sport). Is there a full vocabulary available, I wonder? How might we use this information?
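As a first step towards using the category information, the categories of the liked things could simply be tallied. The sketch below (Python 3) assumes each Like is a dict with a `name` and an optional `category` key, which is the shape the script's `getLikes` function reads; the function name `category_counts` and the sample records are invented for illustration.

```python
from collections import Counter

def category_counts(likes):
    """Count distinct liked things per category.

    likes: iterable of dicts with 'name' and, optionally, 'category' keys.
    A thing liked by several users is only counted once.
    """
    seen = set()
    counts = Counter()
    for like in likes:
        name = like.get("name")
        if name is None or name in seen:
            continue
        seen.add(name)
        counts[like.get("category", "Unknown")] += 1
    return counts

# Sample records, invented for illustration only
sample_likes = [
    {"name": "Sebastian Vettel", "category": "Athlete"},
    {"name": "Red Bull", "category": "Product/service"},
    {"name": "Sebastian Vettel", "category": "Athlete"},  # second user, same like
    {"name": "Mark Webber", "category": "Athlete"},
    {"name": "Uncategorised page"},
]
tally = category_counts(sample_likes)
```

A tally like this would at least show which category labels actually turn up in practice, short of finding a published vocabulary.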