Emergent Social Interest Mapping – Red Bull Racing Facebook Group

With the possibility that my effectively unlimited Twitter API key will die at some point in the Spring with the Twitter API upgrade, I’m starting to look around for alternative sources of interest signal (aka getting ready to say “bye, bye, Twitter interest mapping”). And Facebook groups look like they may offer once possibility…

Some time ago, I did a demo of how to map the the common Facebook Likes of my Facebook friends (Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grabbed Using Google Refine). In part inspired by a conversation today about profiling the interests of members of particular Facebook groups, I thought I’d have a quick peek at the Facebook API to see if it’s possible to grab the membership list of arbitrary, open Facebook groups, and then pull down the list of Likes made by the members of the group.

As with my other social positioning/social interest mapping experiments, the idea behind this approach is broadly this: users express interest through some sort of public action, such as following a particular Twitter account that can be associated with a particular interest. In this case, the signal I’m associating with an expression of interest is a Facebook Like. To locate something in interest space, we need to be able to detect a set of users associated with that thing, identify each of their interests, and then find interests they have in common. These shared interests (ideally over and above a “background level of shared interest”, aka the Stephen Fry effect (from Twitter, where a large number of people in any set of people appear to follow Stephen Fry oblivious of other more pertinent shared interests that are peculiar to that set of people) are then assumed to be representative of the interests associated with the thing. In this case, the thing is a Facebook group, the users associated with the thing are the group members, and the interests associated with the thing are the things commonly liked by members of the group.

Simples.

So for example, here is the social interest positioning of the Red Bull Racing group on Facebook, based on a sample of 3000 members of the group. Note that a significant number of these members returned no likes, either because they haven’t liked anything, or because their personal privacy settings are such that they do not publicly share their likes.

rbr_fbGroup_commonLikes

As we might expect, the members of this group also appear to have an interest in other Formula One related topics, from F1 in general, to various F1 teams and drivers, and to motorsport and motoring in general (top half of the map). We also find music preferences (the cluster to the left of the map) and TV programmes (centre bottom of the map) that are of common interest, though I have no idea yet whether these are background radiation interests (that is, the Facebook equivalent of the Stephen Fry effect on Twitter) or are peculiar to this group. I’m not sure whether the cluster of beverage related preferences at the bottom right corner of the map is notable either?

This information is visualised using Gephi, using data grabbed via the following Python script (revised version of this code as a gist):

#This is a really simple script:
##Grab the list of members of a Facebook group (no paging as yet...)
###For each member, try to grab their Likes

import urllib,simplejson,csv,argparse

#Grab a copy of a current token from an example Facebook API call, eg from clicking a keyed link on:
#https://developers.facebook.com/docs/reference/api/examples/
#Something a bit like this:
#AAAAAAITEghMBAOMYrWLBTYpf9ciZBLXaw56uOt2huS7C4cCiOiegEZBeiZB1N4ZCqHgQZDZD

parser = argparse.ArgumentParser(description='Generate social positioning map around a Facebook group')

parser.add_argument('-gid',default='2311573955',help='Facebook group ID')
#gid='2311573955'

parser.add_argument('-FBTOKEN',help='Facebook API token')

args=parser.parse_args()
if args.gid!=None: gid=args.gid
if args.FBTOKEN!=None: FBTOKEN=args.FBTOKEN

#Quick test - output file is simple 2 column CSV that we can render in Gephi
fn='fbgroupliketest_'+str(gid)+'.csv'
writer=csv.writer(open(fn,'wb+'),quoting=csv.QUOTE_ALL)

uids=[]

def getGroupMembers(gid):
	gurl='https://graph.facebook.com/'+str(gid)+'/members?limit=5000&access_token='+FBTOKEN
	data=simplejson.load(urllib.urlopen(gurl))
	if "error" in data:
		print "Something seems to be going wrong - check OAUTH key?"
		print data['error']['message'],data['error']['code'],data['error']['type']
		exit(-1)
	else:
		return data

#Grab the likes for a particular Facebook user by Facebook User ID
def getLikes(uid,gid):
	#Should probably implement at least a simple cache here
	lurl="https://graph.facebook.com/"+str(uid)+"/likes?access_token="+FBTOKEN
	ldata=simplejson.load(urllib.urlopen(lurl))
	print ldata
	
	if len(ldata['data'])>0:	
		for i in ldata['data']:
			if 'name' in i:
				writer.writerow([str(uid),i['name'].encode('ascii','ignore')])
				#We could colour nodes based on category, etc, though would require richer output format.
				#In the past, I have used the networkx library to construct "native" graph based representations of interest networks.
				if 'category' in i: 
					print str(uid),i['name'],i['category']

#For each user in the group membership list, get their likes				
def parseGroupMembers(groupData,gid):
	for user in groupData['data']:
		uid=user['id']
		writer.writerow([str(uid),str(gid)])
		#x is just a fudge used in progress reporting
		x=0
		#Prevent duplicate fetches
		if uid not in uids:
			getLikes(user['id'],gid)
			uids.append(uid)
			#Really crude progress reporting
			print x
			x=x+1
	#need to handle paging?
	#parse next page URL and recall this function


groupdata=getGroupMembers(gid)
parseGroupMembers(groupdata,gid)

Note that I have no idea whether or not this is in breach of Facebook API terms and conditions, nor have I reflected on the ethical implications of running this sort of analysis, over and the above remarking that it’s the same general approach I apply to mapping social interests on Twitter.

As to where next with this? It brings into focus again the question of identifying common interests pertinent to this particular group, compared to background popular interest that might be expressed by any random set of people. But having got a new set of data to play with, it will perhaps make it easier to test the generalisability of any model or technique I do come up with for filtering out, or normalising against, background interest.

Other directions this could go? Using a single group to bootstrap a walk around the interest space? For example, in the above case, trying to identify groups associated with Sebastian Vettel, or F1, and then repeating the process? It might also make sense to look at the categories of the notable shared interests; (from a quick browse, these include, for example, things like Movie, Product/service, Public figure, Games/toys, Sports Company, Athlete, Interest, Sport; is there a full vocabulary available, I wonder? How might we use this information?)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

3 thoughts on “Emergent Social Interest Mapping – Red Bull Racing Facebook Group”

  1. Hi, Tony.

    First of all, I become here to learn a bit more about Google Refine. Thank you for writing such things so others can learn more about this revolutionary software.
    I was introduced to it few weeks ago and since then I just can´t stop wondering how powerful it is, and I had success grabbing and extracting information about the presence of health units here in Brazil.
    What you write here in this post is exactly what I´ve been thinking for the last three weeks. As a statistician, I have a lot of interest in the profiling theme. Facebook ant Tweeter, as the main sites in entertainment industry, for example are a treasure mine for those who would like to study how human behavior in consumption variates.
    Your example, pointing the comparison between interests associated with Red Bull and those in general is great. How would you test that, statiscally speaking?
    And there´s another great app from google: google fusion. Till today, I didn´t knew Gephi, but started with the Network Graph in fusion. It can give clusters in data real quick and I think is good for keyword analysis too.
    I really appreciate if you could give some links to learn about the facebook API with practical examples in data grabbing the way you did.
    Keep walking, my friend. Congratulations!

    Leonardo.

Comments are closed.