One of the blogs on my “must read” list is Bill Slawski’s SEO by the Sea, which regularly comments on a wide variety of search-related patents obtained by Google, both recent and older, and on what they might mean…
The US patent system is completely dysfunctional, of course, acting to prevent innovative competition in a way that I think probably wasn’t intended by its framers, but it does provide an insight into some of the crazy bar talk ideas that Silicon Valley types thought they might just go and try out on millions of people, or perhaps already are trying out.
As an example, here are a couple of Facebook patents that recently crossed my radar.
Images uploaded by users of a social networking system are analyzed to determine signatures of cameras used to capture the images. A camera signature comprises features extracted from images that characterize the camera used for capturing the image, for example, faulty pixel positions in the camera and metadata available in files storing the images. Associations between users and cameras are inferred based on actions relating users with the cameras, for example, users uploading images, users being tagged in images captured with a camera, and the like. Associations between users of the social networking system related via cameras are inferred. These associations are used beneficially for the social networking system, for example, for recommending potential connections to a user, recommending events and groups to users, identifying multiple user accounts created by the same user, detecting fraudulent accounts, and determining affinity between users.
Which is to say: traces of the flaws in a particular camera that are passed through to each photograph are distinctive enough to uniquely identify that camera. (I note that academic research picked up on by Bruce Schneier demonstrated this getting on for a decade ago: Digital Cameras Have Unique Fingerprints.) So when a photo is uploaded to Facebook, Facebook can associate it with a particular camera. And by association with who’s uploading the photos, a particular camera, as identified by the camera signature baked into a photograph, can be associated with a particular person. Another form of participatory surveillance, methinks.
Note that this is different to the various camera settings that get baked into photograph metadata (you know, that “administrative” data stuff that folk would have you believe doesn’t really reveal anything about the content of a communication…). I’m not sure to what extent that data helps narrow down the identity of a particular camera, particularly when combined with other bits of info in a data mosaic, but it doesn’t take that many bits of data to uniquely identify a device. Like your web browser’s settings, for example, which are revealed to the web servers of sites you visit through browser metadata, and which can uniquely identify your browser. (See, for example, this paper from the EFF – How Unique Is Your Web Browser? [PDF] – and the associated test site: test your browser’s uniqueness.) And if your camera’s also a phone, there’ll be a wealth of other bits of metadata that let you associate camera with phone, and so on.
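The patent abstract mentions “faulty pixel positions” as one ingredient of a camera signature. As a toy illustration of that idea only (real sensor fingerprinting typically analyses sensor noise patterns and is far more sophisticated), here’s a minimal Python sketch that treats pixels stuck bright in every photo from a camera as its signature, and compares signatures using a Jaccard score. The image format (greyscale 2D lists), the threshold of 250 and the sample photos are all assumptions for illustration.

```python
# Toy camera "signature": pixel positions that are stuck bright in every photo.
# Assumptions: greyscale images as same-sized 2D lists of 0-255 values;
# the threshold and sample data are made up for illustration.

def hot_pixels(image, threshold=250):
    """Return the set of (row, col) positions at or above the threshold."""
    return {
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= threshold
    }

def camera_signature(images, threshold=250):
    """Positions that are hot in every image: candidate faulty pixels."""
    sets = [hot_pixels(img, threshold) for img in images]
    return set.intersection(*sets) if sets else set()

def similarity(sig_a, sig_b):
    """Jaccard similarity between two signatures (1.0 means identical)."""
    if not sig_a and not sig_b:
        return 0.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)

# Two hypothetical photos from the same camera share a stuck pixel at (1, 2);
# the bright pixel at (2, 3) in the first photo is just scene content.
photo1 = [[10, 20, 30, 40], [50, 60, 255, 70], [80, 90, 100, 255]]
photo2 = [[200, 20, 30, 40], [50, 60, 254, 70], [80, 90, 100, 110]]
sig = camera_signature([photo1, photo2])
print(sig)  # {(1, 2)}

# A newly uploaded photo can then be scored against known signatures.
uploaded = [[0, 0, 0, 0], [0, 0, 255, 0], [0, 0, 0, 0]]
print(similarity(sig, hot_pixels(uploaded)))  # 1.0
```

Intersecting across several photos is what separates genuinely faulty pixels from bright bits of scene content, which is why more uploads make the signature more reliable.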
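To put a rough number on “it doesn’t take that many bits”: singling out one device among N candidates takes log2(N) bits, and observing an attribute value shared by a fraction p of devices gives you -log2(p) bits (its surprisal). Assuming the attributes are independent (real ones often aren’t), the bits simply add up. The attribute names and frequencies below are made up for illustration.

```python
import math

def bits_to_identify(n):
    """Bits of information needed to single out one device among n."""
    return math.log2(n)

def surprisal(p):
    """Bits gained by observing a value shared by a fraction p of devices."""
    return -math.log2(p)

# Hypothetical attribute frequencies (illustrative only, not measured data).
observed = {
    "camera model": 1 / 512,           # 9 bits
    "firmware version": 1 / 64,        # 6 bits
    "timezone": 1 / 32,                # 5 bits
    "language": 1 / 8,                 # 3 bits
    "lens/settings profile": 1 / 1024, # 10 bits
}

# Assuming independence, surprisals add up.
total_bits = sum(surprisal(p) for p in observed.values())
print(f"{total_bits:.1f} bits observed")  # 33.0 bits observed

# Roughly one device per person on Earth, as a round figure.
print(f"{bits_to_identify(8_000_000_000):.1f} bits needed")  # 32.9 bits needed
```

So a handful of individually unremarkable settings can, in principle, add up to enough bits to pick out a single device.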
Facebook’s face recognition algorithms can also work out who’s in an image, so more relationships and associations there. If kids aren’t being taught about graph theory in school from a very young age, they should be… (So for example, here’s a nice story about what you can do with edges: SELECTION AND RANKING OF COMMENTS FOR PRESENTATION TO SOCIAL NETWORKING SYSTEM USERS. Here’s a completely impenetrable one: SYSTEMS, METHODS, AND APPARATUSES FOR IMPLEMENTING AN INTERFACE TO VIEW AND EXPLORE SOCIALLY RELEVANT CONCEPTS OF AN ENTITY GRAPH.)
Here’s another one – hinting at Facebook’s role as a future publisher:
An online publisher provides content items such as advertisements to users. To enable publishers to provide content items to users who meet targeting criteria of the content items, an exchange server aggregates data about the users. The exchange server receives user data from two or more sources, including a social networking system and one or more other service providers. To protect the user’s privacy, the social networking system and the service providers may provide the user data to the exchange server without identifying the user. The exchange server tracks each unique user of the social networking system and the service providers using a common identifier, enabling the exchange server to aggregate the users’ data. The exchange server then applies the aggregated user data to select content items for the users, either directly or via a publisher.
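The core trick in that abstract is the join: two sources that each hold partial, name-free data about a user are merged under a common identifier, and the merged profile is then matched against ad targeting criteria. A minimal Python sketch, assuming a shared pseudonymous `uid` field and set-valued interests (both hypothetical; the patent doesn’t say how the identifier is realised):

```python
from collections import defaultdict

# Hypothetical records from two sources; neither carries the user's name,
# but both use the same pseudonymous identifier ("uid").
social_data = [
    {"uid": "u1", "interests": {"cycling"}},
    {"uid": "u2", "interests": {"jazz"}},
]
provider_data = [
    {"uid": "u1", "interests": {"travel"}},
    {"uid": "u3", "interests": {"jazz", "cooking"}},
]

def aggregate(*sources):
    """Merge interests from every source under the common identifier."""
    profiles = defaultdict(set)
    for source in sources:
        for record in source:
            profiles[record["uid"]] |= record["interests"]
    return profiles

def select_items(profile, items):
    """Return content items whose targeting criteria are all met by the profile."""
    return [item["name"] for item in items if item["targets"] <= profile]

items = [
    {"name": "bike holiday ad", "targets": {"cycling", "travel"}},
    {"name": "jazz festival ad", "targets": {"jazz"}},
]

profiles = aggregate(social_data, provider_data)
print(select_items(profiles["u1"], items))  # ['bike holiday ad']
```

Note that neither source on its own could have matched user `u1` to the bike holiday ad; it’s only the aggregation that makes the targeting possible.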
I don’t really see what’s clever about this – using an ad-serving engine to serve content – even though Business Insider try to talk it up (Facebook just filed a fascinating patent that could seriously hurt Google’s ad revenue). I pondered something related to this way back when, but never really followed it through: Contextual Content Server, Courtesy of Google? (2008), Contextual Content Delivery on Higher Ed Websites Using Ad Servers (2010), or Using AdServers Across Networked Organisations (2014). Note also this remark on the University of Bedfordshire using Google Banner Ads as On-Campus Signage (2011).
(By the by, I also note that Google has a complementary service where it makes content recommendations relating to content on your own site via AdSense widgets: Google Matched Content.)
PS Not totally unrelated, perhaps, is a recent essay by Bruce Schneier on the need to regulate the emerging automatic face recognition industry: Automatic Face Recognition and Surveillance.
Long-time readers will know I was – am – a huge fan of RSS and Atom, simple feed-based protocols for syndicating content and attachment links, even going so far as to write a manifesto of a sort at one point (We Ignore RSS at OUr Peril).
This blog, and the earlier archived version of it, are full of reports and recipes around various RSS experiments and doodles, although in more recent years I haven’t really been using RSS as a creative medium that much, if at all.
But today I noticed this on the official Facebook developer blog: Publishing Instant Articles Directly From Your Content Management System [Instant Article docs]. Or more specifically, this:
When publishers get started with Instant Articles, they provide an RSS feed of their articles to Facebook, a format that most Content Management Systems already support. Once this RSS feed is set up, Instant Articles automatically loads new stories as soon as they are published to the publisher’s website and apps. Updates and corrections are also automatically captured via the RSS feed so that breaking news remains up to date.
So… Facebook will use RSS to sync content into Facebook from publishers’ CMSs.
Depending on the agreement Facebook has with the publishers, it may require that those feeds be private, rather than public, feeds that sink the content directly into Facebook.
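How might a feed consumer pick up both new stories and later corrections from a plain RSS feed? A minimal Python sketch, not Facebook’s implementation: it remembers each item’s `<guid>` along with its title and description, treats unseen GUIDs as new, and treats changed content under a known GUID as an update. Keying on `<guid>` and comparing title plus description are assumptions about how a publisher’s feed behaves.

```python
import xml.etree.ElementTree as ET

def parse_items(rss_text):
    """Yield (guid, title, description) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        yield (item.findtext("guid"), item.findtext("title"),
               item.findtext("description"))

def sync(rss_text, seen):
    """Return (new_titles, updated_titles); `seen` maps guid -> (title, body)."""
    new, updated = [], []
    for guid, title, body in parse_items(rss_text):
        if guid not in seen:
            new.append(title)
        elif seen[guid] != (title, body):
            updated.append(title)  # e.g. a correction to a published story
        seen[guid] = (title, body)
    return new, updated

# Hypothetical feed snapshots, polled one after the other.
FEED = '<rss version="2.0"><channel><title>Example</title>{items}</channel></rss>'
ITEM = "<item><guid>{g}</guid><title>{t}</title><description>{d}</description></item>"

seen = {}
v1 = FEED.format(items=ITEM.format(g="a1", t="Story", d="First draft"))
print(sync(v1, seen))  # (['Story'], [])

v2 = FEED.format(items=ITEM.format(g="a1", t="Story", d="Corrected")
                 + ITEM.format(g="a2", t="Breaking", d="News"))
print(sync(v2, seen))  # (['Breaking'], ['Story'])
```

Keying on a stable GUID rather than on the title is what lets a correction be recognised as the same story rather than a duplicate.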
But I wonder, will it also start sinking content from other independent publishers into the Facebook platform via those open feeds, providing even less reason for Facebook users to go elsewhere as it drops bits of content from the open web into closed, personal Facebook News Feeds? Hmmm…
There seems to be another sort of a grab for attention going on too:
Each Instant Article is associated with the URL where the web version is hosted on the publisher’s website. This means that Instant Articles are open and compatible with all of the ways that people share links around the web today:
- When a friend or page you follow shares a link in your News Feed, we check to see if there is an Instant Article associated with that URL. If so, you will see it as an Instant Article. If not, it will open on the web browser.
- When you share an Instant Article on Facebook or using email, SMS, or Twitter, you are sharing the link to the publisher website so anyone can open the article no matter what platform they use.
Associating each Instant Article with a URL makes it easy for publishers to adopt Instant Articles without changing their publishing workflows and means that people can read and share articles without thinking about the platform or technology behind the scenes.
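The mechanism described there is essentially a lookup keyed on the article’s web URL, with the web browser as the fallback. A minimal Python sketch of that check, where the lookup table and the normalisation rules (lower-casing the host, dropping query strings, fragments and trailing slashes) are assumptions; some sites do use query strings to identify articles, so a real system would need to be more careful:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table: canonical article URL -> Instant Article ID.
instant_articles = {"https://example.com/news/story-1": "ia-001"}

def canonical(url):
    """Normalise a shared link: lower-case scheme and host, drop query/fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))

def open_shared_link(url):
    """Show the Instant Article if one exists for this URL, else the web page."""
    article_id = instant_articles.get(canonical(url))
    if article_id:
        return ("instant_article", article_id)
    return ("web_browser", url)

print(open_shared_link("https://Example.com/news/story-1/?utm_source=twitter"))
# ('instant_article', 'ia-001')
print(open_shared_link("https://example.org/elsewhere"))
# ('web_browser', 'https://example.org/elsewhere')
```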
Something like this maybe?
Which is to say, this?
Or maybe not. Maybe there is some enlightened self-interest in this, and perhaps Facebook will see a reason to start letting its content out via open syndication formats, like RSS.
Or maybe RSS will end up sinking the Facebook platform, by allowing Facebook users to go off the platform but still accept content from it?
Whatever the case, as Facebook becomes a set of social platform companies rather than a single platform company, I wonder: will it have an open standard, feed based syndication bus to help content flow within and around those companies? Even if that content is locked inside the confines of a Facebook-parent-company-as-web attention wall?
PS So the ‘related content’ feature on my WordPress blog associates this post with an earlier one: Is Facebook Stifling the Free Flow of Information?, which it seems was lamenting an earlier decision by Facebook to disable the import of content into Facebook using RSS…?! What goes around, comes around, it seems?!
Reading a recent Economist article (The value of friendship) about the announcement last week that Facebook is to float as a public company, and being amazed as ever about how these valuations, err, work, I recalled a couple of observations from a @currybet post about the Guardian Facebook app (“The Guardian’s Facebook app” – Martin Belam at news:rewired). The first related to using Facebook apps to (only partially successfully) capture attention of folk on Facebook and get them to refocus it on the Guardian website:
We knew that 77% of visits to the Guardian from facebook.com only lasted for one page. A good hypothesis for this was that leaving the confines of Facebook to visit another site was an interruption to a Facebook session, rather than a decision to go off and browse another site. We began to wonder what it would be like if you could visit the Guardian whilst still within Facebook, signed in, chatting and sharing with your friends. Within that environment could we show users a selection of other content that would appeal to them, and tempt them to stay with our content a little bit longer, even if they weren’t on our domain.
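For anyone wanting to check a similar figure against their own logs, the metric quoted is just the share of sessions arriving from a given referrer that viewed exactly one page. A minimal Python sketch over a hypothetical session log of (referrer domain, pages viewed) pairs; real analytics packages define sessions in their own ways:

```python
# Hypothetical session log: (referrer domain, number of pages viewed).
sessions = [
    ("facebook.com", 1), ("facebook.com", 1), ("facebook.com", 3),
    ("facebook.com", 1), ("google.com", 4), ("google.com", 1),
]

def single_page_share(sessions, referrer):
    """Fraction of sessions from `referrer` that viewed exactly one page."""
    counts = [pages for ref, pages in sessions if ref == referrer]
    if not counts:
        return 0.0
    return sum(1 for pages in counts if pages == 1) / len(counts)

print(f"{single_page_share(sessions, 'facebook.com'):.0%}")  # 75%
```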
The second thing that came to mind related to the economic/business models around the Facebook app itself:
The Guardian Facebook app is a canvas app. That means the bulk of the page is served by us within an iFrame on the Facebook domain. All the revenue from advertising served in that area of the page is ours, and for launch we engaged a sponsor to take the full inventory across the app. Facebook earn the revenue from advertising placed around the edges of the page.
I’m not sure if Facebook runs CPM (cost per thousand) display-based ads, where advertisers pay per thousand impressions, or follows the Google AdWords model, where advertisers pay per click (PPC), but it got me wondering… A large number of folk on Facebook (and Twitter) share links to third-party websites external to Facebook. As Martin Belam points out, the user return rate back to Facebook for folk visiting third-party sites from Facebook seems very high – folk seem to follow a link from Facebook, consume that item, and return to Facebook. Facebook makes an increasing chunk of its revenue from ads it sells on Facebook.com (though with the amount of furniture and Facebook Open Graph code it’s getting folk to include on their own websites, it presumably wouldn’t be so hard for them to roll out their own ad network to place ads on third-party sites?), so keeping eyeballs on Facebook is presumably in their commercial interest.
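A back-of-envelope comparison of the two pricing models mentioned above may help: under CPM, revenue scales with impressions served, whereas under PPC it scales with clicks. All the rates in this Python sketch are hypothetical.

```python
def cpm_revenue(impressions, cpm):
    """CPM: the advertiser pays `cpm` per thousand impressions."""
    return impressions / 1000 * cpm

def ppc_revenue(impressions, click_through_rate, cost_per_click):
    """PPC: the advertiser pays only when someone clicks."""
    return impressions * click_through_rate * cost_per_click

impressions = 1_000_000
# Hypothetical rates: $2.00 CPM versus a 0.1% click-through rate at $0.50 a click.
print(f"CPM: ${cpm_revenue(impressions, cpm=2.00):.2f}")  # CPM: $2000.00
print(f"PPC: ${ppc_revenue(impressions, 0.001, 0.50):.2f}")  # PPC: $500.00
```

Under the CPM model in particular, every extra page view kept on the platform is another batch of impressions to sell, which is one way of seeing why keeping eyeballs on Facebook matters.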
In Twitter land, where the VC folk are presumably starting to wonder when the money tap will start to flow, I notice “sponsored tweets” are starting to appear in search results:
Relevance still appears to be quite low, possibly because they haven’t yet got enough ads to cover a wide range of keywords or prompts:
(Personally, if the relevance score was low, I wouldn’t place the ad, or I’d serve an ad tuned to the user, rather than the content, per se…)
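That fallback policy is simple enough to sketch. In this minimal Python version, relevance is a keyword-overlap (Jaccard) score, and if the best content-matched ad falls below a threshold, the ad that best matches the user’s interests is served instead. The scoring method, the 0.3 threshold and the ad inventory are all assumptions; real ad servers use far richer signals.

```python
def relevance(keywords, terms):
    """Jaccard overlap between an ad's keywords and a set of terms."""
    union = keywords | terms
    return len(keywords & terms) / len(union) if union else 0.0

def choose_ad(query_terms, ads, user_interests, threshold=0.3):
    best = max(ads, key=lambda ad: relevance(ad["keywords"], query_terms))
    if relevance(best["keywords"], query_terms) >= threshold:
        return best["name"]
    # Content match too weak: fall back to the ad that best suits the user.
    best_for_user = max(ads, key=lambda ad: relevance(ad["keywords"], user_interests))
    return best_for_user["name"]

# Hypothetical inventory.
ads = [
    {"name": "running shoes", "keywords": {"running", "marathon", "shoes"}},
    {"name": "python course", "keywords": {"python", "programming", "course"}},
]
print(choose_ad({"marathon", "shoes"}, ads, user_interests={"python"}))    # running shoes
print(choose_ad({"election", "results"}, ads, user_interests={"python"}))  # python course
```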
Again, with Twitter, a lot of sharing results in users being taken to external sites, from which they quickly return to the Twitter context. Keeping folk in the Twitter context for images and videos, through pop-up viewers or content embedded in the client, is also a strategy pursued in many Twitter clients.
So here’s the thought, though it’s probably a commercially suicidal one: at the moment, Facebook and Twitter and Google+ all automatically “linkify” URLs (though Google+ also previews the first few lines of a single linked-to page within a Google+ post). That is, given a URL in a post, they turn it into a link. But what if they turned that linkifier off for a domain, unless a fee was paid to turn it back on? Or what if the linkifier was turned off once the number of clickthrus on links to a particular domain, or page within a domain, exceeded a particular threshold, and could only be turned on again at a metered, CPM rate? (Memories here of different models for getting folk to pay for bandwidth, because what we have here is access to bandwidth out of the immediate Facebook, Twitter or Google+ context.)
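As a thought-experiment sketch only, here’s what such a metered linkifier might look like in Python: URLs are turned into links unless their domain has exceeded a click-through threshold and hasn’t paid to be switched back on. The URL pattern, the threshold and the “paid” list are all hypothetical, and a real implementation would also need to HTML-escape the URLs it emits.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Crude URL matcher, good enough for illustration.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')

class MeteredLinkifier:
    def __init__(self, click_threshold, paid_domains=()):
        self.click_threshold = click_threshold
        self.paid_domains = set(paid_domains)
        self.clicks = Counter()  # clickthrus per domain

    def record_click(self, url):
        self.clicks[urlsplit(url).netloc.lower()] += 1

    def _allowed(self, domain):
        return domain in self.paid_domains or self.clicks[domain] < self.click_threshold

    def linkify(self, text):
        def replace(match):
            url = match.group(0)
            if self._allowed(urlsplit(url).netloc.lower()):
                return f'<a href="{url}">{url}</a>'
            return url  # left as plain, unclickable text
        return URL_PATTERN.sub(replace, text)

linkifier = MeteredLinkifier(click_threshold=2, paid_domains={"paid.example"})
post = "Read https://blog.example/post and https://paid.example/offer"
before = linkifier.linkify(post)  # both URLs linkified
for _ in range(2):
    linkifier.record_click("https://blog.example/post")
after = linkifier.linkify(post)   # only the paying domain stays clickable
print(after)
```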
As a revenue model, the losses associated with irritating users would probably outweigh any revenue benefits, but as a thought experiment, it maybe suggests that we need to start paying more attention to how these large attention-consuming services are increasingly trying to cocoon us in their context (anyone remember AOL, or to a lesser extent Yahoo, or Microsoft?), rather than playing nicely with the rest of the web.
PS Hmmm… “app”. One default interpretation of this is “app on phone”, but “Facebook app” means an app that runs on the Facebook platform… So for any given app, calling it an “app” implies that that particular variant is a “software application that runs on a proprietary platform”, which might actually be a combination of hardware and software platforms (e.g. Facebook API and Android phone)???
In Is Facebook Stifling the Free Flow of Information? I noted how Facebook no longer allows you to use an RSS feed to automatically syndicate content via your Facebook Notes page, instead recommending that you post the content directly into Facebook, or specifically post an update that links to your content.
There are workarounds, of course. Here’s one I’ve just tried – If this, then that (IFTTT):
In a license-controlled piece (more about that in another post… -ed.) regarding “Frictionless sharing” – exploring the changes to Facebook, Martin Belam hints that the Facebook “Open Graph” API supports actions that allow website publishers to add an action to their pages that will automatically post an update to a logged-in Facebook user’s stream announcing that they have visited that page. (I’m trying to find a simple explanation of this, with code snippets, but can’t seem to track one down. If you know of one, please let me know… The closest I can find is a walkthrough about getting started with the Facebook Open Graph API. See also non-technical reviews such as PCWorld’s Facebook’s Frictionless Sharing: A Privacy Guide.)
This brought to mind a couple of things:
1) the notion of webhooks; it seems to me that the user’s Facebook identity essentially provides a webhook/callback URL that allows the publisher of a Facebook app/owner of a web page that embeds a Facebook app to use page events to automatically trigger Facebook actions on that user’s Facebook account.
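The webhook pattern itself is easy to sketch: a page event triggers an HTTP POST to a callback URL associated with the reader. This minimal Python version just builds the request rather than sending it. The callback URL and payload fields are hypothetical, and this is not the Facebook Open Graph API.

```python
import json
import urllib.request

def build_event_callback(callback_url, user_id, action, page_url):
    """Build (but don't send) a POST announcing that `user_id` did `action` on `page_url`."""
    payload = json.dumps({"user": user_id, "action": action, "object": page_url})
    return urllib.request.Request(
        callback_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_event_callback(
    "https://callbacks.example/users/1234/events",  # hypothetical callback URL
    "1234", "read", "https://example.com/article",
)
print(req.get_method(), req.full_url)  # POST https://callbacks.example/users/1234/events
# Actually firing the hook would be: urllib.request.urlopen(req)
```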
2) We get a new model of syndication, whereby readers of a page actually announce the fact that they have visited it, and with it syndicate a link to that page. At least, until the (Facebook) algorithm kicks in that determines which of a particular Facebook user’s friends see which of their updates…
PS watching the Facebook Open Graph tutorial video, I wondered whether anyone in the HE sector has looked at defining “Open Graph” elements for use in an educational context, and maybe built proof of concept apps that build up personal timelines based on course/VLE related actions (“completed this exercise”, “found this resource useful”, etc)?
Or maybe someone involved with OERs has built something that lets folk share information about OER sites/resources they’ve viewed, used, downloaded etc?
I’m not suggesting it’s a good (or bad) idea, just wondering…
What do my Facebook friends have in common in terms of the things they have Liked, or in terms of their music or movie preferences? (And does this say anything about me?!) Here’s a recipe for visualising that data…
After discovering via Martin Hawksey that the recent (December, 2011) 2.5 release of Google Refine allows you to import JSON and XML feeds to bootstrap a new project, I wondered whether it would be able to pull in data from the Facebook API if I was logged in to Facebook (Google Refine does run in the browser after all…)
Looking through the Facebook API documentation whilst logged in to Facebook, it’s easy enough to find exemplar links to things like your friends list (https://graph.facebook.com/me/friends?access_token=A_LONG_JUMBLE_OF_LETTERS) or the list of likes someone has made (https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS); replacing me with the Facebook ID of one of your friends should pull down a list of their friends, or likes, etc.
(Note that the validity of the access token is time-limited, so you can’t grab a copy of the access token and hope to use the same one day after day.)
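If you’d rather script it than click through Refine, the URL pattern and the shape of the response are simple enough. A minimal Python sketch that builds URLs in the unversioned form shown above and pulls (name, id) pairs out of a `{"data": [...]}` response. Bear in mind this reflects the API as it was at the time of writing: the Graph API has since been versioned and access to friends’ data heavily restricted, so treat it as an illustration rather than working client code. The sample response is made up.

```python
import json
from urllib.parse import urlencode

GRAPH = "https://graph.facebook.com"

def graph_url(node, edge, access_token):
    """Build URLs of the form /me/friends or /<friend id>/likes."""
    return f"{GRAPH}/{node}/{edge}?{urlencode({'access_token': access_token})}"

def names_and_ids(response_text):
    """Pull (name, id) pairs out of a {"data": [...]} style response."""
    return [(item["name"], item["id"]) for item in json.loads(response_text)["data"]]

print(graph_url("me", "friends", "A_LONG_JUMBLE_OF_LETTERS"))
# https://graph.facebook.com/me/friends?access_token=A_LONG_JUMBLE_OF_LETTERS

sample = '{"data": [{"name": "Alice", "id": "101"}, {"name": "Bob", "id": "102"}]}'
print(names_and_ids(sample))  # [('Alice', '101'), ('Bob', '102')]
```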
Grabbing the link to your friends on Facebook is simply a case of opening a new project, choosing to get the data from a Web Address, and then pasting in the friends list URL:
Click on Next, and Google Refine will download the data, which you can then parse as a JSON file, and from which you can identify individual record types:
If you click the highlighted selection, you should see the data that will be used to create your project:
You can now click on Create Project to start working on the data – the first thing I do is tidy up the column names:
We can now work some magic – such as pulling in the Likes our friends have made. To do this, we need to create the URL for each friend’s Likes using their Facebook ID, and then pull the data down. We can use Google Refine to harvest this data for us by creating a new column containing the data pulled in from a URL built around the value of each cell in another column:
The Likes URL has the form https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS which we’ll tinker with as follows:
The throttle control tells Refine how long to wait between calls. I set this to 500ms (that is, half a second), so it takes a few minutes to pull in my couple of hundred or so friends (I don’t use Facebook a lot;-). I’m not sure what rate the Facebook API is happy with: if you hit it too fast (i.e. set the throttle time too low), you may find the Facebook API stops returning data to you for a cooling-down period…
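The same throttled harvest is easy to reproduce outside Refine. A minimal Python sketch that builds each friend’s Likes URL by swapping `me` for their ID and pauses between calls, as Refine’s throttle setting does. The fetch function is passed in so the sketch runs without network access; `fake_fetch` and the token are placeholders. With 200 friends and a 0.5 s delay, the pauses alone add up to just under 100 seconds, before any network time.

```python
import time

def fetch_all(ids, url_for, fetch, delay=0.5):
    """Fetch one URL per ID, sleeping `delay` seconds between calls."""
    results = {}
    for i, friend_id in enumerate(ids):
        if i:
            time.sleep(delay)  # be polite to the API between requests
        results[friend_id] = fetch(url_for(friend_id))
    return results

def likes_url(friend_id):
    # Same pattern as the /me/likes URL, with a friend's ID in place of "me".
    return f"https://graph.facebook.com/{friend_id}/likes?access_token=A_LONG_JUMBLE_OF_LETTERS"

def fake_fetch(url):
    # Stand-in for a real HTTP call such as urllib.request.urlopen(url).read()
    return {"url": url}

results = fetch_all(["101", "102"], likes_url, fake_fetch, delay=0)
print(results["102"]["url"])
```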
Having imported the data, you should find a new column:
At this point, it is possible to generate a new column from each of the records/Likes in the imported data… in theory (or maybe not…). I found this caused Refine to hang though, so instead I exported the data using the default Templating… export format, which produces some sort of JSON output…
I then used this Python script to generate a two column data file where each row contained a (new) unique identifier for each friend and the name of one of their likes:
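import csv
import json

# Google Refine's default "Templating..." export: {"rows": [ {...}, ... ]}
fn = 'my-fb-friends-likes.txt'
with open(fn, 'r', encoding='utf-8') as f:
    data = json.load(f)

with open('fbliketest.csv', 'w', newline='', encoding='utf-8') as out:
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    # Each exported row is one friend; give each a new numeric identifier
    for friend_id, d in enumerate(data['rows'], start=1):
        # 'interests' is the column name containing the Likes data (a JSON string)
        interests = json.loads(d['interests'])
        for like in interests['data']:
            print(friend_id, like['name'], like['category'])
            # As before, strip non-ASCII characters from the Like name
            writer.writerow([str(friend_id), like['name'].encode('ascii', 'ignore').decode('ascii')])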
[I think this R script, in answer to a related @mhawksey Stack Overflow question, also does the trick: R: Building a list from matching values in a data.frame]
I could then import this data into Gephi and use it to generate a network diagram of what they commonly liked:
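An alternative to importing the two-column (friend, Like) file directly is to project it onto the Likes themselves: connect two Likes whenever a friend likes both, weighted by how many friends do. A minimal Python sketch that writes an edge list with Source, Target and Weight columns, which Gephi’s spreadsheet import can pick up as an edge table. The sample rows are made up.

```python
import csv
import io
from collections import Counter
from itertools import combinations

def co_like_edges(rows):
    """rows: (friend_id, like_name) pairs, as in fbliketest.csv.
    Returns a Counter mapping (like_a, like_b) -> number of friends liking both."""
    likes_by_friend = {}
    for friend_id, like in rows:
        likes_by_friend.setdefault(friend_id, set()).add(like)
    edges = Counter()
    for likes in likes_by_friend.values():
        for pair in combinations(sorted(likes), 2):
            edges[pair] += 1
    return edges

def write_gephi_edges(edges, out):
    """Write an edge table with Source/Target/Weight headers."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])

# Hypothetical data: friends 1 and 2 both like Jazz and Gardening.
rows = [("1", "Jazz"), ("1", "Gardening"), ("2", "Jazz"), ("2", "Gardening"), ("3", "Jazz")]
buf = io.StringIO()
write_gephi_edges(co_like_edges(rows), buf)
print(buf.getvalue())
```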
Rather than returning Likes, I could equally have pulled back lists of the movies, music or books they like, their own friends lists (permissions settings allowing), etc etc, and then generated friends’ interest maps on that basis.
PS dropping out of Google Refine and into a Python script is a bit clunky, I have to admit. What would be nice would be to be able to do something like a “create new rows with new column from column” pattern that would let you set up an iterator through the contents of each of the cells in the column you want to generate the new column from, and for each pass of the iterator: 1) duplicate the original data row to create a new row; 2) add a new column; 3) populate the cell with the contents of the current iteration state. Or something like that…
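The three steps described there amount to an “explode” operation. A minimal Python sketch, assuming each row is a dict and the source cell already holds a list of items (the column names are hypothetical); for what it’s worth, pandas has since added a `DataFrame.explode` method that does much the same thing.

```python
def explode(rows, source_column, new_column):
    """For each row, emit one copy per item in row[source_column],
    with that item placed in a new column."""
    for row in rows:
        for item in row[source_column]:  # iterate over the cell's contents
            new_row = dict(row)          # 1) duplicate the original row
            new_row[new_column] = item   # 2) + 3) add and populate the new column
            yield new_row

rows = [
    {"friend": "1", "likes": ["Jazz", "Gardening"]},
    {"friend": "2", "likes": ["Jazz"]},
]
for r in explode(rows, "likes", "like"):
    print(r["friend"], r["like"])
```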
PPS Related to the PS request, there is a sort of related feature in the 2.5 release of Google Refine that lets you merge data from across rows with a common key into a newly shaped data set: Key/value Columnize. Seeing this, it got me wondering what a fusion of Google Refine and RStudio might be like (or even just R support within Google Refine?)
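To give a feel for that sort of reshaping, here’s a rough Python approximation of a key/value columnize: rows sharing a record key are folded into one record, with each distinct key becoming a column. This is not Refine’s exact algorithm (it assumes an explicit record key column, for one thing), and the column names are hypothetical.

```python
def key_value_columnize(rows, record_key, key_column, value_column):
    """Fold rows sharing `record_key` into one record, turning each distinct
    value of `key_column` into a column holding the matching `value_column`."""
    records = {}
    for row in rows:
        record = records.setdefault(row[record_key], {record_key: row[record_key]})
        record[row[key_column]] = row[value_column]
    return list(records.values())

rows = [
    {"id": "1", "field": "music", "value": "Jazz"},
    {"id": "1", "field": "book", "value": "Dune"},
    {"id": "2", "field": "music", "value": "Folk"},
]
print(key_value_columnize(rows, "id", "field", "value"))
# [{'id': '1', 'music': 'Jazz', 'book': 'Dune'}, {'id': '2', 'music': 'Folk'}]
```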
PPPS this could be interesting – looks like you can test to see if a friendship exists given two Facebook user IDs.
PPPPS This paper in PNAS – Private traits and attributes are predictable from digital records of human behavior – by Kosinski et al. suggests it’s possible to profile people based on their Likes. It would be interesting to see how robust that profiling is compared to profiles based on the common Likes of a person’s followers, or the common Likes of folk in the same Facebook groups as an individual.
Struggling to get to sleep last night, I caught this whilst listening to episode 124 of This Week in Google from a few weeks ago (45 mins or so into the original; I’ve excerpted the relevant bit below):
The first thing that grabbed my attention was that Importing a blog or RSS feed to your personal Facebook account is no longer available. Facebook’s recommendation is to “Use Facebook Notes to customize your blog posts in a rich format that’s compatible for readers on Facebook, [or] [l]ink directly to your blog posts from your status”.
Pretty much the only interaction I have with Facebook is (or rather, was) to automatically syndicate my OUseful.info blog posts via an RSS feed through my Facebook Notes application. This didn’t generate many views, clickthrus or trackbacks, but it did generate some, and now, it seems, I’m no longer posting blog post links to my Facebook friends. So much for frictionless sharing, huh? I’ve been frictionlessly sharing content *I* wanted to share through Facebook for years, and now it seems I don’t. And more than that, I can’t, easily (at least, not in the same way).
Long-time readers will know I’ve been a fan of RSS for years (hands up who remembers the We Ignore RSS at OUr Peril rant?!;-) for a few very simple reasons: firstly, it generally works; secondly, it’s widely adopted; thirdly, it’s a type of wiring that no-one really controls, except through various standardisation processes. So it’s pernicious moves like this one from Facebook that make me think Facebook may have made a strategic error here, because it represents a parting of the ways from those of us who were happy to use Facebook as a terminal in our personal publishing networks via things like RSS but aren’t willing to spend time “doing Facebook”.
Although I’m a fan of RSS/Atom feeds, I fully appreciate that the orange radar signal icon is meaningless to most people, and that most people don’t know what to do with it. But I also know that folk are happily subscribing to all sorts of feed-based streams in a painless way via services like Facebook and Twitter. Indeed, the TWIG piece above raised the issue of dropped support for RSS imports in the context of a new Facebook button for websites that allows folk visiting the site to one-click subscribe to that site’s Facebook page from the website (err, I think?!).
So what I’m pondering is this: why doesn’t Facebook set itself up as an RSS reader, offering a FeedBurner-like service to feed publishers and making it one-click easy for folk to subscribe to those feed proxies in the Facebook context? Which is to say: I’d be reluctant to post a “Subscribe to my Facebook page” button on the blog (mainly because I don’t post any content to Facebook), but I might be willing to put a ‘subscribe to this site in Facebook’ link on the site. (So how might that work? First, I guess I’d have to set up a page for this site in Facebook; then I’d feed it from this site’s feed; then I’d put the ‘subscribe to this site on Facebook’ link on this site. At which point, of course, I’d have lost control of the terminal subscription point for the feed to Facebook, at least for those subscribers. This differs slightly from my current setup, where the WordPress feed goes through FeedBurner, then gets published via a URL I control, so the subscription point is under my control and I can control the wiring upstream of it.) Of course, Facebook may offer this route already, and I’m just not aware of it (not least because I don’t tend to keep up with Facebook’s machinations much at all…)
For a related take on other freedom eroding steps currently being taken by consumer tech companies towards their users, see Dave Winer’s The Un-Internet.
I rarely link social apps to other social apps, but sometimes I click through the first few stages of the linking process to see what happens. Here’s an example I just tried using Klout, which wants me to link in to my account on Facebook. The screenshot is taken from Facebook… but what does it mean?
Does that horizontal arrow aligned with the first element mean permission is only being requested for my personal information? Or is that thin vertical line an “AND” that says permission is being requested to access my personal information AND post to my wall AND etc etc…
I have no idea….?