Centralising User Tracking on the Web – Let Google Track Everyone For You

In The Curse of Our Time – Tracking, Tracking Everywhere, I noted how the likes of Google set up cookie matching services that allow advertisers to reconcile their cookies with Google’s:

The Cookie Matching Service enables a buyer to associate two kinds of data:

– the cookie that identifies a user within the buyer domain, and
– the doubleclick.net cookie that identifies the user for Google. (We share a buyer-specific encrypted Google User ID for buyers to match on.)

The data structure that the buyer uses to maintain these associations is called a Match Table. While the buyer is responsible for building and maintaining match tables, Google can host them.

With an RTB [real-time bidding] application, the buyer can bid on impressions where the user has a specific Google User ID, and can use information associated with the Google User ID as criteria in a bid for an ad impression.
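As a rough sketch of what such a match table might look like on the buyer side (the class, method and ID names here are all hypothetical, and nothing in it is taken from Google's actual API), it is little more than a two-way lookup between the buyer's own cookie ID and the Google User ID:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class MatchTable:
    """Hypothetical buyer-side match table: buyer cookie ID <-> Google User ID."""

    google_id_by_buyer_cookie: Dict[str, str] = field(default_factory=dict)
    buyer_cookie_by_google_id: Dict[str, str] = field(default_factory=dict)

    def record_match(self, buyer_cookie_id: str, google_user_id: str) -> None:
        # Store both directions so a bid request keyed on the Google ID
        # can be resolved to the buyer's own cookie, and vice versa.
        self.google_id_by_buyer_cookie[buyer_cookie_id] = google_user_id
        self.buyer_cookie_by_google_id[google_user_id] = buyer_cookie_id

    def buyer_cookie_for(self, google_user_id: str) -> Optional[str]:
        return self.buyer_cookie_by_google_id.get(google_user_id)


def should_bid(table: MatchTable, interesting_cookies: Set[str], google_user_id: str) -> bool:
    """Bid only when the Google ID maps to a buyer cookie the buyer cares about."""
    buyer_cookie = table.buyer_cookie_for(google_user_id)
    return buyer_cookie is not None and buyer_cookie in interesting_cookies
```

The point of the sketch is simply that once the two identifiers are associated, anything the buyer already knows about "their" cookie can be brought to bear on a bid that arrives keyed on Google's identifier.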

But it seems that this isn’t enough for Google. It actually gets worse… A USA Today story suggests Google is exploring the idea of an AdID, an identifier that it would share with advertisers to uniquely identify eyeballs rather than them having to use a range of alternative third party user tracking services.

How AdID would actually work (if indeed it ever comes to pass) is not explained, although a post on the AdExchanger blog – What The Google AdID Means For Ad Tech – comes up with a possible mechanism: Google uniquely identifies users (presumably using cookies and authenticated user credentials (topped up with a little bit of browser fingerprinting, I wonder?)) then provides an advertiser with a hashed version of the ID. The hashed identifier means advertisers can’t share information with each other.
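One way such hashing could stop advertisers comparing notes is if each advertiser gets a different hash of the same underlying ID. The following is a minimal sketch under that assumption, using a keyed hash (HMAC) with a per-advertiser secret; the function name, the secrets and the choice of HMAC-SHA256 are all illustrative guesses, not anything Google has described:

```python
import hashlib
import hmac


def advertiser_scoped_id(google_user_id: str, advertiser_secret: bytes) -> str:
    """Derive an advertiser-specific identifier from a single underlying user ID.

    Assumption: each advertiser is assigned its own secret, so the same user
    yields a different (but stable) identifier for each advertiser.
    """
    digest = hmac.new(advertiser_secret, google_user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical values, for illustration only.
id_for_a = advertiser_scoped_id("user-12345", b"secret-for-advertiser-a")
id_for_b = advertiser_scoped_id("user-12345", b"secret-for-advertiser-b")
```

Here `id_for_a` and `id_for_b` refer to the same person but cannot be matched against each other without the secrets, which only the party issuing the IDs would hold.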

The Google AdID service seems like it would be offered as an alternative to tracking users using third party services that use their own third party cookies, with a user tracking system that offers more effective identity tracking techniques (such as a logged in Google user id). Which is to say, Google wants to replace third party cookie based tracking services with its own (logged in user + cookies + browser fingerprinting + etc etc) user tracking service? Or have I misinterpreted all this?

One AdID to rule them all, One AdID to find them,
One AdID to bring them all and in the darkness bind them
In the Land of Google where the Shadows lie.

PS by the by, I notice in a post on Author information in search results, that “If you want your authorship information to appear in search results for the content you create, you’ll need a Google+ Profile with a good, recognizable headshot as your profile photo. Then, verify authorship of your content by associating it with your profile using either of the methods below.” Ah ha… so you agree to give the Goog a good photo of yourself that it can use in its face matching algos, such as in the sort of thing that could be used to unlock your phone. Good. Not. Will faceID play a part in AdID, I wonder? With a gaze tracking feedback loop? Or maybe Google will be getting more into the tracked user, outside advertising market?

PPS This then also brings back to mind the face tracking approaches mentioned in The Curse of Our Time – Tracking, Tracking Everywhere

PPPS By the by, it seems the Culture, Media and Sport Committee have no problem with online targeted ads:

85. Inevitably public funding is under pressure, a point illustrated by cuts in the budget of Arts Council England.[161] Given the essential role of public funding in sustaining the wider creative economy, it is crucial that adequate resources are available. Of course, the private sector should be encouraged as much as possible to invest in the creative industries. One good example is provided by advertising, which not only provides a major source of funding but is a creative industry in its own right. Evidence from the Advertising Association points to advertising as “a major creative industry and a critical source of funding for other creative industries”. The Advertising Association’s evidence goes on to express deep concern about draft EU Data Protection Regulation “which could damage direct marketing, internet advertising, and the UK economy both off and online”.[162] Increasing use is being made of personal data to target online advertising better. While concerns around this have prompted reviews of data protection legislation, we do not think the targeting of appropriate advertising—essential to so many business models —represents the greatest threat to privacy. [original emphasis]

Inappropriate Linkification (aka redirection attacks?!) in Google Docs

Reading through another wonderful post on the FullFact blog last night (Full Fact sources index: where to find the information you need), I noticed that the resources linked to from that post were being redirected via a Google URL:

drafting in Google docs

A tweet confirmed that this wasn’t intentional, so what had happened? I gather the workflow used to generate the post was to write it in Google docs, and then copy and paste the rich/HTML text into a rich text editor in Drupal, although I couldn’t recreate this effect (and nor could FullFact). However, suitably suspicious, I started having a play, writing a simple test document in Google docs:

google doc link tracking

The Google doc automatically links the test URL I added to the document. (This is often referred to as “linkification” – if a piece of text is recognised as something that looks like a URL or web link, it gets rewritten as a clickable link. Typically, you might assume that the link you’ll now be clicking on is the link that was recognised. This may be a bad assumption to make…) If you hover over the URL as written in the document, you get a tooltip that suggests the link is to the same URL. However, if you hover over the URL listed in the tooltip (or click on it), you can see from the indicator in the bottom left hand corner of the browser what the actual URL you’re clicking on is. Like this:

google docs link direction

In this case, the link you’ll actually click on is a referral to the original link via a Google URL. This one, in fact:


What this means is that if I click on the link, Google tracks the fact that the link was clicked on. From the value of the usg variable (in this case, AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A) it presumably also knows the source document containing the link and whatever follows from that.
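To see what a wrapped link is carrying, you can pull the query string apart. The sketch below assumes a redirect of the general form `http://www.google.com/url?q=<target>&usg=<token>`; that layout, the `q`/`url` parameter names and the example.com target are my assumptions for illustration, and only the usg value is the one seen above:

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(link, target_params=("q", "url")):
    """Return (target_url, usg_token) from a redirect-style link.

    Assumption: the real destination is carried in a query parameter
    (here 'q' or 'url') and the tracking token in 'usg'.
    Either value is None if the parameter is absent.
    """
    query = parse_qs(urlsplit(link).query)
    target = next((query[name][0] for name in target_params if name in query), None)
    usg = query.get("usg", [None])[0]
    return target, usg


# Hypothetical wrapped link; the usg value is the one quoted in this post.
wrapped = (
    "http://www.google.com/url?q=http%3A%2F%2Fexample.com%2Ftest"
    "&usg=AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A"
)
target, usg = unwrap_redirect(wrapped)
```

With the example above, `target` comes back as the plain destination URL and `usg` as the opaque token, which is the part only Google knows how to interpret.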

Hmmm… If I publish the document, the Google rewrite appears to be removed:

google doc publish to web

There are also several export options associated with the document:

google doc export options

So what links get exported?

Here’s the Word export:

google doc export docx word

That seems okay – no tracking. How about odt?

google doc export as odt

That looks okay too. RTF and HTML export also seem to publish the “clean” link.

What about PDF?

google doc export as PDF

Hmm… so tracking is included here. So if you write a doc in Google docs that contains links that are autolinked, then you export that doc as PDF and share it with other folk, Google will know when folk click on that link from a copy of that PDF document and (presumably) which originally authored Google docs document it came from (and all that that entails…).

How about if we email a doc as a PDF attachment to someone from within Google docs:

google doc email pdf

So that seems okay (untracked).

What’s the story then? FullFact claimed they cut and paste rich HTML from Google docs into a rich text editor and the Google redirection attack was inserted into the link. I couldn’t recreate that, and nor could the FullFact folk, so either there are some Google “experiments” going on, or the workflow was misremembered.

In my own experiments, I got a Google redirection from clicking links within my original document, and from the exported PDF, but not from any other formats?

So what do we learn? I guess this at least: be aware that when Google linkifies links for you, it may be redirecting clicks on those links through Google tracking servers. And that these tracked links may be propagated to exported and/or otherwise shared versions of the document.
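If you want to check an HTML export (or pasted rich text) for this sort of thing, one simple heuristic is to flag any link whose visible text looks like a URL but whose href points at a different host. This is only a sketch of that heuristic, with made-up class and function names; it won't catch rewritten links whose visible text isn't itself a URL, and it does nothing for PDFs:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditor(HTMLParser):
    """Collect links whose visible URL text and actual href disagree on host."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> currently being read, if any
        self._text = []        # visible text collected inside that <a>
        self.mismatches = []   # (shown_url, actual_href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            shown = "".join(self._text).strip()
            if shown.startswith(("http://", "https://")):
                if urlsplit(shown).netloc != urlsplit(self._href).netloc:
                    self.mismatches.append((shown, self._href))
            self._href = None


def find_rewritten_links(html):
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.mismatches
```

Running `find_rewritten_links` over a saved HTML file's contents returns a list of (what the reader sees, where the click actually goes) pairs, which is exactly the discrepancy described above.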

PS see also Google/Feedburner Link Pollution or More Link Pollution – This Time from WordPress.com for more of the same, and Personal Declarations on Your Behalf – Why Visiting One Website Might Tell Another You Were There for a quick overview of what might happen when you actually land on a page…

Link rewriters are, of course, to be found in lots of other places too…

Twitter, for example, actually wraps all shared links in its t.co wrapper:

twitter rewrite

Delicious (which I’ve stopped using – I moved over to Pinboard) also uses its own proxy for clicked on stored bookmarks…

delicious link rewriter

If you have any other examples, particularly of link rewriting/annotation/pollution where you wouldn’t expect it, please let me know via the comments…

And So It Begins… The Disinteroperability of the Web (Or Just a Harmless Bug…?)

When does a keyboard shortcut *not* do the same thing as the menu command it shortcuts? When it’s a Google docs copy command in Google Chrome, maybe?

I know that I, and I suspect many of the readers of this blog, use keyboard shortcuts unconsciously, intuitively, on a regular basis: ctrl/cmd-f for within page search, -c for copy, -x for cut, and -v for paste. But I also suspect that keyboard shortcuts are alien to many, and that a more likely route to these everyday operations is through the file menu:

or (more unlikely?) via a right-click contextual pop-up menu:

As keyboard shortcut users, we assume that the keyboard shortcuts and the menu based operations do the same thing. But whether a bug or not, I noticed today in the course of using Google docs in Google Chrome that when I tried to copy a highlighted text selection using either the file menu Copy option, or the contextual menu copy option, I was presented with this:

(The -c route to copying still worked fine.)

With Chrome well on its way to becoming the world’s most popular browser, allowing Google to dominate not just our searchable view over the web, but also to intermediate our direct connection to the web through the desktop client we use to gain access to it, this makes me twitchy… Firstly, because it suggests that whilst the keyboard shortcut is still routing copied content via my clipboard, the menued option is routing it through the browser, or maybe even the cloud where an online connection is present? Secondly, because in prompting me to extend my browser, I realised I have no real idea of what sorts of updates Google is regularly pushing to me through Chrome’s silent updating behaviour (I’m on version 19.0.1084.46 at the moment, it seems…).

A lot of Google’s activities are driven by technical decisions based on good technical reasons for “improving” how applications work and interoperate with each other. But it seems to me that Google is also closing in on itself and potentially adopting technical solutions that either break interoperability, or include a Google subsystem or process at every step (introducing an alternative de facto operating system onto our desktop by a thousand tiny updates and extensions). So for example, whilst I haven’t installed the Chrome copy extension, I wonder if I had: would a menu based copy from a Google doc allow me to then paste the content into a Word doc running as a Microsoft Office desktop application, or paste it into my online WordPress editor? And if so, would Chrome be caching that copied content via the extension?

Maybe this is something and nothing. Maybe I’m just confused about how the cut-and-paste thing works at all. Or maybe Google is starting to overstep its mark and is opening up an attack on host operating system functions from its installed browser base. Which, as the upcoming most popular browser in the world, is not a bad beachhead to have…

PS At least Google Public DNS isn’t forced onto Chrome users as the default way of identifying the actual IP address of a website that is used to actually connect the browser to it from an entered domain name or clicked on link…

Name-based Robots.txt for Wifi Access Points?

Google just announced via a blog post – Greater choice for wireless access point owners – that owners of wifi access points who do not want Google to add the address and location of the access point to the Google Location Server need to rename the access point by adding _nomap to the end of the access point name or SSID (e.g. My Network_nomap). [UPDATE: note that this means it’s an opt-out model rather than a _mapme opt-in strategy (h/t @patparslow for that…)]
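The naming convention itself is trivial to express. A minimal sketch, assuming the check is an exact, case-sensitive match on the `_nomap` suffix (the announcement as summarised here doesn't say how strictly it is matched, and the function names are mine):

```python
NOMAP_SUFFIX = "_nomap"


def is_opted_out(ssid: str) -> bool:
    """True if the access point name carries the opt-out suffix."""
    return ssid.endswith(NOMAP_SUFFIX)


def opt_out_name(ssid: str) -> str:
    """Return the SSID an owner would need to use to opt out."""
    return ssid if is_opted_out(ssid) else ssid + NOMAP_SUFFIX
```

So `opt_out_name("My Network")` gives `"My Network_nomap"`, and every access point whose owner has done nothing at all remains, by default, fair game.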

This is a bit like the declarative approach web publishers take to identify pages they don’t want search robots indexing, by including the names/paths of “please don’t” content in a robots.txt file. The Google assumption seems to be that if anything is visible in pretty much any way, they can index it unless you explicitly tell them not to.
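For comparison, here's how the robots.txt version of "please don't" looks to a well-behaved crawler, using the parser that ships in Python's standard library. The rules, the bot name and the example.com URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt asking all robots to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_index(url: str, user_agent: str = "ExampleBot") -> bool:
    """Ask whether a polite robot would consider this URL fair game."""
    return parser.can_fetch(user_agent, url)
```

As with `_nomap`, the default answer is "yes": anything not explicitly listed is treated as indexable, and compliance relies entirely on the robot choosing to ask.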

All well and good, but what about the access points that Google has already added to the index, even if their publishers would rather they hadn’t? Will these be automagically removed next time a lookup is made?

Maybe the removal protocol will work like this: Android phone or browser with location service enabled* detects local access point name, tells Google Location Service, Google notes that the name is now ‘_nomap’, deletes it from the index, returns ‘not found’?
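Purely as a thought experiment, that guessed-at removal-on-lookup behaviour might look something like the following. Everything here is hypothetical: the class, the idea of keying the index on the access point's hardware address, and the delete-on-sight rule are my speculation rather than anything Google has documented:

```python
from typing import Dict, Optional, Tuple


class LocationIndex:
    """Toy stand-in for a location service's access point index."""

    def __init__(self) -> None:
        # Assumption: entries are keyed on the access point's hardware address.
        self._entries: Dict[str, Tuple[float, float]] = {}

    def add(self, address: str, lat: float, lon: float) -> None:
        self._entries[address] = (lat, lon)

    def lookup(self, address: str, observed_name: str) -> Optional[Tuple[float, float]]:
        """Return a location, or None ('not found').

        If the name reported by the client now ends in '_nomap', drop any
        existing entry for that access point before answering.
        """
        if observed_name.endswith("_nomap"):
            self._entries.pop(address, None)
            return None
        return self._entries.get(address)
```

Note that under this scheme an already-indexed access point only disappears once some passing device reports the new name, so the opt-out itself still depends on someone phoning home.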

*You do know your browser often knows where you are from local wifi points, don’t you, even if your laptop doesn’t have GPS or a 3G card? It tends to go by the name location aware browsing and involves your browser sending identifiers such as your IP address, the names of local wifi access points, and a browser ID to a Google service that has a big database of identifiers and geo-location data for where it thinks each identifier is located. (Hmmm..interesting… I hadn’t realised that Firefox uses the Google Location Service till just now..?)

I don’t think you even need to be logged on to a network for its name to be phoned back to the location service? As the Mozilla FAQ puts it: “By default, Firefox uses Google Location Services to determine your location by sending … information about the nearby wireless access points…” (note nearby wireless access points).

PS by the by, here’s the strategy used by Android phones for detecting location.

Obtaining locations in android http://developer.android.com/guide/topics/location/obtaining-user-location.html

Is there a similar diagram for how browsers approach location detection anywhere?