Reading a recent Economist article (The value of friendship) about the announcement last week that Facebook is to float as a public company, and being amazed as ever at how these valuations, err, work, I recalled a couple of observations from a @currybet post about the Guardian Facebook app (“The Guardian’s Facebook app” – Martin Belam at news:rewired). The first related to using Facebook apps to (only partially successfully) capture the attention of folk on Facebook and get them to refocus it on the Guardian website:
We knew that 77% of visits to the Guardian from facebook.com only lasted for one page. A good hypothesis for this was that leaving the confines of Facebook to visit another site was an interruption to a Facebook session, rather than a decision to go off and browse another site. We began to wonder what it would be like if you could visit the Guardian whilst still within Facebook, signed in, chatting and sharing with your friends. Within that environment could we show users a selection of other content that would appeal to them, and tempt them to stay with our content a little bit longer, even if they weren’t on our domain.
The second thing that came to mind related to the economic/business models around the Facebook app itself:
The Guardian Facebook app is a canvas app. That means the bulk of the page is served by us within an iFrame on the Facebook domain. All the revenue from advertising served in that area of the page is ours, and for launch we engaged a sponsor to take the full inventory across the app. Facebook earn the revenue from advertising placed around the edges of the page.
I’m not sure if Facebook runs CPM (cost per thousand) display based ads, where advertisers pay per impression, or follow the Google AdWords model, where advertisers pay per click (PPC), but it got me wondering… A large number of folk on Facebook (and Twitter) share links to third party websites external to Facebook. As Martin Belam points out, the user return rate back to Facebook for folk visiting third party sites from Facebook seems very high – folk seem to follow a link from Facebook, consume that item, return to Facebook. Facebook makes an increasing chunk of its revenue from ads it sells on Facebook.com (though with the amount of furniture and Facebook open graph code it’s getting folk to include on their own websites, it presumably wouldn’t be so hard for them to roll out their own ad network to place ads on third party sites?) so keeping eyeballs on Facebook is presumably in their commercial interest.
In Twitter land, where the VC folk are presumably starting to wonder when the money tap will start to flow, I notice “sponsored tweets” are starting to appear in search results:
Relevance still appears to be quite low, possibly because they haven’t yet got enough ads to cover a wide range of keywords or prompts:
(Personally, if the relevance score was low, I wouldn’t place the ad, or I’d serve an ad tuned to the user, rather than the content, per se…)
Again, with Twitter, a lot of sharing results in users being taken to external sites, from which they quickly return to the Twitter context. Keeping folk in the Twitter context for images and videos, through pop-up viewers or embedded content in the client, is also a strategy pursued in many Twitter clients.
So here’s the thought, though it’s probably a commercially suicidal one: at the moment, Facebook and Twitter and Google+ all automatically “linkify” URLs (though Google+ also takes the strategy of previewing the first few lines of a single linked-to page within a Google+ post). That is, given a URL in a post, they turn it into a link. But what if they turned that linkifier off for a domain, unless a fee was paid to turn it back on? Or what if the linkifier was turned off once the number of clickthrus on links to a particular domain, or page within a domain, exceeded a particular threshold, and could only be turned on again at a metered, CPM rate? (Memories here of different models for getting folk to pay for bandwidth, because what we have here is access to bandwidth out of the immediate Facebook, Twitter or Google+ context.)
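For what it’s worth, the metered linkifier is easy enough to sketch. Here’s a toy Python version (the domains, clickthru counts and threshold are all made up, of course) that only turns a URL into a link if the target domain is under its free clickthru quota, or has paid up:

```python
import re

# Hypothetical per-domain clickthru counts and a metering threshold;
# in the thought experiment, domains over the threshold (and not paying)
# would no longer have their URLs turned into links.
CLICKTHRU_COUNTS = {"guardian.co.uk": 150000, "example.org": 42}
FREE_THRESHOLD = 100000
PAYING_DOMAINS = set()  # domains that have paid to stay linkified

URL_RE = re.compile(r"https?://([^/\s]+)\S*")

def linkify(text):
    """Turn URLs into HTML links, unless the target domain has exceeded
    its free clickthru quota and hasn't paid the metered rate."""
    def repl(match):
        domain = match.group(1)
        over_quota = CLICKTHRU_COUNTS.get(domain, 0) > FREE_THRESHOLD
        if over_quota and domain not in PAYING_DOMAINS:
            return match.group(0)  # leave as plain, unclickable text
        return '<a href="%s">%s</a>' % (match.group(0), match.group(0))
    return URL_RE.sub(repl, text)
```

Irritating for users, as I say, but it makes the metering idea concrete.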
As a revenue model, the losses associated with irritating users would probably outweigh any revenue benefits, but as a thought experiment, it maybe suggests that we need to start paying more attention to how these large attention-consuming services are increasingly trying to cocoon us in their context (anyone remember AOL, or to a lesser extent Yahoo, or Microsoft?), rather than playing nicely with the rest of the web.
PS Hmmm… “app”. One default interpretation of this is “app on phone”, but “Facebook app” means an app that runs on the Facebook platform… So for any given app, calling it an “app” may just mean “software application that runs on a proprietary platform”, where the platform might actually be a combination of hardware and software platforms (e.g. Facebook API and Android phone)???
In Is Facebook Stifling the Free Flow of Information? I noted how Facebook no longer allows you to use an RSS feed to automatically syndicate content via your Facebook Notes page, instead recommending that you post the content directly into Facebook, or specifically post an update that links to your content.
There are workarounds, of course. Here’s one I’ve just tried – If This Then That (IFTTT):
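Another workaround would be to roll your own: poll the blog’s RSS feed and push each new item to Facebook via the Graph API’s feed endpoint. Here’s a hedged Python sketch of the posting half; the `me/feed` endpoint and `message`/`link` parameter names are my best recollection of the then-current API, and the token is the usual placeholder:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

GRAPH_FEED = "https://graph.facebook.com/me/feed"

def feed_post_params(title, link, access_token):
    """Build the parameters for a Graph API status update that links to
    a blog post; POSTing these to GRAPH_FEED (with a current token)
    should publish the update."""
    return urlencode({"message": title, "link": link,
                      "access_token": access_token})

# A poller (or an IFTTT-style service) would read the blog's RSS feed
# and make one such call per new item, e.g.:
params = feed_post_params("My latest post", "http://example.com/post",
                          "A_LONG_JUMBLE_OF_LETTERS")
```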
In a license controlled piece (more about that in another post… -ed.) regarding “Frictionless sharing” – exploring the changes to Facebook, Martin Belam hints that the Facebook “Open Graph” API supports actions that allow website publishers to add an action to their pages that will automatically post an update to a logged-in Facebook user’s stream announcing that they have visited that page. (I’m trying to find a simple explanation of this, with code snippets, but can’t seem to track one down. If you know of one, please let me know… The closest I can find is a walkthrough about getting started with the Facebook Open Graph API. See also non-technical reviews such as PCWorld’s Facebook’s Frictionless Sharing: A Privacy Guide.)
This brought to mind a couple of things:
1) the notion of webhooks; it seems to me that the user’s Facebook identity essentially provides a webhook/callback URL that allows the publisher of a Facebook app/owner of a web page that embeds a Facebook app to use page events to automatically trigger Facebook actions on that user’s Facebook account.
2) We get a new model of syndication, whereby readers of a page actually announce the fact that they have visited a page, and with it syndicate a link to that page. At least, until the (Facebook) algorithm kicks in that determines which of particular Facebook user’s friends see which of their updates…
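To make the webhook analogy concrete, here’s a rough Python sketch of the sort of call a page event handler might make to publish an Open Graph action; the `myapp:read` action and `article` object type are made-up names for illustration, and the actual publishing API may differ in detail:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

GRAPH_BASE = "https://graph.facebook.com"

def build_action_request(namespace, action, object_type, object_url, access_token):
    """Build the Graph API endpoint and parameters for publishing an
    Open Graph action to the logged-in user's stream. All names here
    (namespace, action, object type) are illustrative placeholders."""
    endpoint = "%s/me/%s:%s" % (GRAPH_BASE, namespace, action)
    params = urlencode({object_type: object_url,
                        "access_token": access_token})
    return endpoint, params

endpoint, params = build_action_request(
    "myapp", "read", "article",
    "http://example.com/some-page", "A_LONG_JUMBLE_OF_LETTERS")
# A page event handler POSTing params to endpoint is the "webhook-like"
# callback: the page visit triggers an update on the user's own account.
```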
PS watching the Facebook Open Graph tutorial video, I wondered whether anyone in the HE sector has looked at defining “Open Graph” elements for use in an educational context, and maybe built proof of concept apps that build up personal timelines based on course/VLE related actions (“completed this exercise”, “found this resource useful”, etc)?
Or maybe someone involved with OERs that lets folk share information about OER sites/resources they’ve viewed, used, downloaded etc?
I’m not suggesting it’s a good (or bad) idea, just wondering…
What do my Facebook friends have in common in terms of the things they have Liked, or in terms of their music or movie preferences? (And does this say anything about me?!) Here’s a recipe for visualising that data…
After discovering via Martin Hawksey that the recent (December, 2011) 2.5 release of Google Refine allows you to import JSON and XML feeds to bootstrap a new project, I wondered whether it would be able to pull in data from the Facebook API if I was logged in to Facebook (Google Refine does run in the browser after all…)
Looking through the Facebook API documentation whilst logged in to Facebook, it’s easy enough to find exemplar links to things like your friends list (https://graph.facebook.com/me/friends?access_token=A_LONG_JUMBLE_OF_LETTERS) or the list of likes someone has made (https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS); replacing me with the Facebook ID of one of your friends should pull down a list of their friends, or likes, etc.
(Note that validity of the access token is time limited, so you can’t grab a copy of the access token and hope to use the same one day after day.)
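If you’d rather script these calls than paste URLs around, something like the following Python fragment builds and fetches the same Graph API URLs (assuming you have a current access token; the token below is a placeholder):

```python
import json
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

GRAPH = "https://graph.facebook.com"

def graph_url(fb_id, connection, access_token):
    """Build a Graph API URL for a user's friends, likes, etc.
    fb_id can be 'me' or a friend's numeric Facebook ID."""
    return "%s/%s/%s?access_token=%s" % (GRAPH, fb_id, connection, access_token)

def fetch_connection(fb_id, connection, access_token):
    """Pull down and parse the JSON response (requires a *current*
    token, since access tokens expire)."""
    return json.load(urlopen(graph_url(fb_id, connection, access_token)))

# e.g. fetch_connection('me', 'friends', 'A_LONG_JUMBLE_OF_LETTERS')
```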
Grabbing the link to your friends on Facebook is simply a case of opening a new project, choosing to get the data from a Web Address, and then pasting in the friends list URL:
Click on next, and Google Refine will download the data, which you can then parse as a JSON file, and from which you can identify individual record types:
If you click the highlighted selection, you should see the data that will be used to create your project:
You can now click on Create Project to start working on the data – the first thing I do is tidy up the column names:
We can now work some magic – such as pulling in the Likes our friends have made. To do this, we need to create the URL for each friend’s Likes using their Facebook ID, and then pull the data down. We can use Google Refine to harvest this data for us by creating a new column containing the data pulled in from a URL built around the value of each cell in another column:
The Likes URL has the form https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS which we’ll tinker with as follows:
The throttle control tells Refine how often to make each call. I set this to 500ms (that is, half a second), so it takes a few minutes to pull in my couple of hundred or so friends (I don’t use Facebook a lot;-). I’m not sure what limit the Facebook API is happy with (if you hit it too fast (i.e. set the throttle time too low), you may find the Facebook API stops returning data to you for a cooling down period…)?
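The same harvesting step can be sketched outside Refine as a simple throttled loop in Python (the token and friend IDs are placeholders; the half-second pause mirrors the Refine throttle setting):

```python
import json
import time
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

def likes_url(friend_id, access_token):
    """The per-friend Likes URL that Refine builds from each cell value."""
    return "https://graph.facebook.com/%s/likes?access_token=%s" % (
        friend_id, access_token)

def harvest_likes(friend_ids, access_token, throttle=0.5):
    """Fetch each friend's Likes, pausing `throttle` seconds between
    calls (to stay on the right side of whatever rate limit the
    Facebook API actually enforces)."""
    results = {}
    for fid in friend_ids:
        results[fid] = json.load(urlopen(likes_url(fid, access_token)))
        time.sleep(throttle)
    return results
```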
Having imported the data, you should find a new column:
At this point, it is possible to generate a new column from each of the records/Likes in the imported data… in theory (or maybe not…). I found this caused Refine to hang though, so instead I exported the data using the default Templating… export format, which produces some sort of JSON output…
I then used this Python script to generate a two column data file where each row contained a (new) unique identifier for each friend and the name of one of their likes:
import simplejson, csv

writer = csv.writer(open('fbliketest.csv', 'wb+'), quoting=csv.QUOTE_ALL)

fn = 'my-fb-friends-likes.txt'
data = simplejson.load(open(fn, 'r'))

id = 0
for d in data['rows']:
    id = id + 1
    # 'interests' is the column name containing the Likes data
    interests = simplejson.loads(d['interests'])
    for i in interests['data']:
        print str(id), i['name'], i['category']
        writer.writerow([str(id), i['name'].encode('ascii', 'ignore')])
[I think this R script, in answer to a related @mhawksey Stack Overflow question, also does the trick: R: Building a list from matching values in a data.frame]
I could then import this data into Gephi and use it to generate a network diagram of what they commonly liked:
Rather than returning Likes, I could equally have pulled back lists of the movies, music or books they like, their own friends lists (permissions settings allowing), etc etc, and then generated friends’ interest maps on that basis.
PS dropping out of Google Refine and into a Python script is a bit clunky, I have to admit. What would be nice would be to be able to do something like a “create new rows with new column from column” pattern that would let you set up an iterator through the contents of each of the cells in the column you want to generate the new column from, and for each pass of the iterator: 1) duplicate the original data row to create a new row; 2) add a new column; 3) populate the cell with the contents of the current iteration state. Or something like that…
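That pattern is easy enough to express in plain Python, which maybe makes clearer what I’m asking for; here’s a sketch using a list of dicts to stand in for the Refine rows (the column names are made up):

```python
def explode_column(rows, source_col, new_col):
    """For each row, iterate over the list in `source_col` and emit one
    duplicated row per item, with that item placed in `new_col` - the
    'create new rows with new column from column' pattern."""
    out = []
    for row in rows:
        for item in row.get(source_col, []):
            new_row = dict(row)       # 1) duplicate the original data row
            new_row[new_col] = item   # 2) add a new column; 3) fill the cell
            out.append(new_row)
    return out

rows = [{"friend": "A", "likes": ["tea", "cake"]},
        {"friend": "B", "likes": ["cake"]}]
exploded = explode_column(rows, "likes", "like")
```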
PPS Related to the PS request, there is a sort of related feature in the 2.5 release of Google Refine that lets you merge data from across rows with a common key into a newly shaped data set: Key/value Columnize. Seeing this, it got me wondering what a fusion of Google Refine and RStudio might be like (or even just R support within Google Refine?)
PPPS this could be interesting – looks like you can test to see if a friendship exists given two Facebook user IDs.
PPPPS This paper in PNAS – Private traits and attributes are predictable from digital records of human behavior – by Kosinski et al. suggests it’s possible to profile people based on their Likes. It would be interesting to compare how robust that profiling is, compared to profiles based on the common Likes of a person’s followers, or the common likes of folk in the same Facebook groups as an individual?
Struggling to get to sleep last night, I caught this whilst listening to episode 124 of This Week in Google from a few weeks ago (45 mins or so in to the original; I’ve excerpted the relevant bit below):
The first thing that grabbed my attention was that Importing a blog or RSS feed to your personal Facebook account is no longer available. Facebook’s recommendation is to “Use Facebook Notes to customize your blog posts in a rich format that’s compatible for readers on Facebook, [or] [l]ink directly to your blog posts from your status”.
Pretty much the only interaction I have with Facebook is (or rather, was) to automatically syndicate my OUseful.info blog posts via an RSS feed through my Facebook Notes application. This didn’t generate many views, clickthrus or trackbacks, but it did generate some, and now, it seems, I’m no longer posting blog post links to my Facebook friends. So much for frictionless sharing, huh? I’ve been frictionlessly sharing content *I* wanted to share through Facebook for years, and now it seems I don’t. And more than that, I can’t, easily (at least, not in the same way).
Long time readers will know I’ve been a fan of RSS for years (hands up who remembers the We Ignore RSS at OUr Peril rant?!;-) for a few very simple reasons: firstly, it generally works; secondly, it’s widely adopted; thirdly, it’s a type of wiring that no-one really controls, except through various standardisation processes. So it’s pernicious moves like this one from Facebook that make me think that Facebook may have made a strategic error here, because it represents a parting of the ways from those of us who were happy to use Facebook as a terminal in our personal publishing networks via things like RSS, but aren’t willing to spend time “doing Facebook”.
Although I’m a fan of RSS/Atom feeds, I fully appreciate that the orange radar signal icon is meaningless to most people, and that most people don’t know what to do with it. But I also know that folk are happily subscribing to all sorts of feed based streams in a painless way via services like Facebook and Twitter. Indeed, the TWIG piece above raised the issue of dropped support for RSS imports in the context of a new Facebook button for websites that allows folk visiting the site to one-click subscribe to that site’s Facebook page from the website (err, I think?!).
So what I’m pondering is this: why doesn’t Facebook set itself up as an RSS reader, offering a Feedburner-like service to feed publishers and making it one-click easy for folk to subscribe to those feed proxies in the Facebook context? Which is to say: I’d be reluctant to post a “Subscribe to my Facebook page” button on the blog (mainly because I don’t post any content to Facebook), but I might be willing to put a “subscribe to this site in Facebook” button on the site. (So how might that work? First, I guess I’d have to set up a page for this site in Facebook; then I’d feed it from this site’s feed; then I’d put the “subscribe to this site on Facebook” link on this site. At which point, of course, I’d have lost control of the terminal subscription point for the feed to Facebook, at least for those subscribers. This differs slightly from my current setup, where the WordPress feed goes through Feedburner, then gets published via a URL I control; so the subscription point is under my control and I can control the wiring upstream of that.) Of course, Facebook may offer this route already, and I’m just not aware of it (not least because I don’t tend to keep up with Facebook’s machinations much at all…)
For a related take on other freedom eroding steps currently being taken by consumer tech companies towards their users, see Dave Winer’s The Un-Internet.
I rarely link social apps to other social apps, but sometimes I click through on the first few stages of the linking process to see what happens. Here’s an example I just tried using Klout, which wants me to link in to my account on Facebook. The screenshot is taken from Facebook… but what does it mean?
Does that horizontal arrow aligned with the first element mean permission is only being requested for my personal information? Or is that thin vertical line an “AND” that says permission is being requested to access my personal information AND post to my wall AND etc etc…
I have no idea….?
A comment from one of the Gephi developers to Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part IV, in which I described how to use the Modularity statistic to partition a network in terms of several different similar subnetwork groupings, suggested that a far better way of visualising the groups was to use the Partition panel… and how right they were…
Running the Modularity statistic over my Facebook network, as captured using Netvizz, and then refreshing the view in the Partition panel allows us to colour the network using different partitions – such as the Modularity classes that the Modularity statistic generates and assigns nodes to:
Here’s what happens when we apply the colouring:
Selecting the Group view collects all the nodes in a partition together as a group:
These grouped nodes can be individually ungrouped by right-clicking on a group node and ungrouping it, or they can be expanded which maintains the group identity whilst still letting us look at the local structure:
Here’s what the expanded view of one of the classes looks like, with text labels turned on:
We see that the members of the group are visible, allowing us to explore the make-up of the subnetwork. As you might expect, we can then colour or resize nodes within the expanded group in the normal way:
To create a workspace containing just the members of a particular partition, ungroup all the nodes via the Partition module and filter on the required partition using a Modularity Class filter:
The Partition module is incredibly powerful, as you can hopefully see; but it isn’t limited to dealing with just partitions created using Gephi statistics – it can also deal with partitions defined over the graph as loaded into Gephi (see the GUESS format for more details on how to structure the input file).
So for example, the most recent version of Netvizz will return additional data alongside just the identities of your friends, such as their gender (if revealed to you by their profile privacy settings) and the number of their wall posts. Loading this richer network specification into Gephi, and refreshing the Partition module settings, reveals the following:
Which in turn means we can colour the graph as follows:
The wall count parameter is made available through the Ranking panel:
So as we can see, if you have partition data available for network members, Gephi can provide a great way of visualising it :-)
In a couple of previous posts on exploring my Facebook network with Gephi, I’ve shown how to visualise the network, and how to start constructing various filtered views over it (Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I and Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters). In this post, I’ll explore a new feature, ego filters, as well as looking at some simple social network analysis tools that can help us better understand the structure of a social network.
To start with, I’m going to load my Facebook network data (grabbed via the Netvizz app, as before) into Gephi as an undirected graph. As mentioned above, the ego network filter is a new addition to Gephi, which will show that part of a graph that is connected to a particular person. So for example, I can apply the ego filter (from the Topology folder in the list of filters) to “George Siemens” to see which of my Facebook friends George knows.
If I save this as a workspace, I can then tunnel into it a little more, for example by applying a new ego filter to the subgraph of my friends who George Siemens knows. In this case, let’s add Grainne to the mix – and see which of my friends know both George Siemens and Grainne:
Note that I could have achieved a similar effect with the full graph by using the intersection filter (as introduced in the previous post in this series):
The depth of the ego filter also allows you to see which of my friends the named individual knows either directly, or through one of my other friends. Using an ego filtered network of depth two (friend of a friend) around George Siemens, I can run some network statistics over just that group of people. So for example, if I run the Degree statistic over the network, and then set the node size according to node degree within that network, this is what I get:
(I also turned node labels on and set their size proportional to node size.)
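If you want to play with ego filters outside Gephi, the same idea is just a depth-limited breadth-first search; here’s a plain Python sketch over a toy friendship list (the names are illustrative, not my actual network!):

```python
from collections import deque

def ego_network(edges, ego, depth=2):
    """Return the set of nodes within `depth` hops of `ego` (the ego
    filter), plus the edges among them (the filtered subgraph)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {ego: 0}  # node -> hop count from ego
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # don't expand beyond the filter depth
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    nodes = set(seen)
    sub_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, sub_edges

edges = [("me", "george"), ("me", "grainne"), ("me", "martin"),
         ("george", "grainne"), ("grainne", "martin"),
         ("martin", "stephen"), ("stephen", "faraway")]
nodes, sub = ego_network(edges, "george", depth=2)

# Degree within the filtered subnetwork - what sizing nodes by degree shows
degree = {n: sum(1 for a, b in sub if n in (a, b)) for n in nodes}
```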
Running Network Diameter stats generates the following sorts of report:
– betweenness centrality;
– closeness centrality;
– eccentricity;
These all sound pretty technical, so what do they refer to?
Betweenness centrality is a measure based on the number of shortest paths between any two nodes that pass through a particular node. Nodes around the edge of the network would typically have a low betweenness centrality. A high betweenness centrality might suggest that the individual is connecting various different parts of the network together.
Closeness centrality is a measure of how close a node is, on average, to all the other nodes in a network, whether or not the node lies on a shortest path between other nodes. In the convention Gephi uses, closeness centrality is the average distance from a node to all the other nodes, so a high closeness centrality means a large average distance to the other nodes in the network, and a small closeness centrality means a short average distance to all the other nodes. (I think sometimes the reciprocal of this measure is given as closeness centrality instead:-)
The eccentricity measure captures the distance between a node and the node that is furthest from it; so a high eccentricity means that the furthest away node in the network is a long way away, and a low eccentricity means that the furthest away node is actually quite close.
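Two of these measures are easy enough to compute for yourself from breadth-first-search distances, which may help fix the definitions; here’s a plain Python sketch over a toy network (using Gephi’s convention that closeness is the average distance, so small means central; betweenness needs rather more machinery, so I’ve left it out):

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def centralities(edges):
    """Eccentricity (distance to the furthest node) and average-distance
    closeness (low value = close to everyone) for each node."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    ecc, closeness = {}, {}
    for node in adj:
        d = bfs_distances(adj, node)
        others = [v for k, v in d.items() if k != node]
        ecc[node] = max(others)
        closeness[node] = sum(others) / float(len(others))
    return ecc, closeness

# A small hub-and-tail network: "hub" is close to everyone, "tail" is not
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "tail")]
ecc, closeness = centralities(edges)
```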
So let’s have a look at the structure of my Facebook network, as filtered according to George’s ego filter, depth 2:
Plotting size proportional to betweenness centrality, we see Martin Weller, Grainne and Stephen Downes are influential in keeping different parts of my network connected:
As far as outliers go, we can look at the closeness centrality and eccentricity (to protect the innocent, I shall suppress the names!)
Here, the colour field defines the closeness centrality and the size of the node the eccentricity. It’s quite easy to identify the people in this network who are not well connected and who are unlikely to be able to reach each other easily through those of my friends they know.
From nodes with similar sizes and different colours, we also see how it’s quite possible for two nodes to have a similar eccentricity (similar distances to their furthest away nodes) and very different closeness centralities (that is, a small or large average distance to every other node in the graph). For example, being connected to a very well connected node will tend to lower a node’s closeness centrality (i.e. reduce its average distance to the rest of the network).
So for example, if we look at the ego network within the above network, based around the very well connected Martin Weller, what do we see?
The colder, blue shaded circles (high closeness centrality) have disappeared. Being a Martin Weller friend (in my Facebook network at least) has the effect of lowering your closeness centrality, i.e. bringing you closer to all the other people in the network.
Okay, that’s definitely more than enough for now. Why not have a play looking at your Facebook network, and seeing if you can identify who the best connected folk are?
PS when plotting charts, I think Gephi uses data from the last statistics run it did, even if that was in another workspace, so it’s always worth running the statistics over the current graph if you intend to chart something based on those stats…