Reading through another wonderful post on the FullFact blog last night (Full Fact sources index: where to find the information you need), I noticed that the linked to resources from that post were being redirected via Google URL:
A tweet confirmed that this wasn’t intentional, so what had happened? I gather the workflow used to generate the post was to write it in Google docs, and then copy and paste the rich/HTML text into a rich text editor in Drupal, although I couldn’t recreate this effect (and nor could FullFact). However, suitably suspicious, I started having a play, writing a simple test document in Google docs:
The Google doc automatically links the test URL I added to the document. (This is often referred to as “linkification” – if a piece of text is recognised as something that looks like a URL or web link, it gets rewritten as a clickable link. Typically, you might assume that the link you’ll now be clicking on is the link that was recognised. This may be a bad assumption to make…) If you hover over the URL as written in the document, you get a tooltip that suggests the link is to the same URL. However, if you hover over the tooltip listed URL, (or click on it) you can see from the indicator in the bottom left hand corner of the browser what the actual URL you’re clicking on is. Like this:
In this case, the link you’ll actually click on is referral to the original link via a Google URL. This one, in fact:
What this means is that if I click on the link, Google tracks the fact that the link was clicked on. From the value of the usg variable (in this case, AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A) it presumably also knows the source document containing the link and whatever follows from that.
Hmmm… If I publish the document, the Google rewrite appears to be removed:
There are also several export options associated with the document:
So what links get exported?
Here’s the Word export:
That seems okay – no tracking. How about odt?
That looks okay too. RTF and and HTML export also seem to publish the “clean” link.
What about PDF?
Hmm… so tracking is included here. So if you write a doc in Google docs that contains links that are autolinked, then you export that doc as PDF and share it with other folk, Google will know when folk click on that link from a copy of that PDF document and (presumably) the originally authored Google docs document (and all that that entails…)
How about if we email a doc as a PDF attachment to someone from within Google docs:
So that seems okay (untracked).
What’s the story then? FullFact claimed they cut and paste rich HTML from Google docs into a rich text editor and the Google redirection attack was inserted into the link. I couldn’t recreate that, and nor could the FullFact folk, so either there are some Google “experiments” going on, or the workflow was misremembered.
In my own experiments, I got a Google redirection from clicking links within my original document, and from the exported PDF, but not from any other formats?
So what do we learn? I guess this at least: be aware that when Google linkifies links for you, it may be redirecting clicks on those links through Google tracking servers. And that these tracked links may be propagated to exported and/or otherwise shared versions of the document.
PS see also Google/Feedburner Link Pollution or More Link Pollution – This Time from WordPress.com for more of the same, and Personal Declarations on Your Behalf – Why Visiting One Website Might Tell Another You Were There for a quick overview of what might happen when you actually land on a page…
Link rewriters are, of course, to be find in lots of other places too…
Twitter, for example, actually wraps all shared links in it’s t.co wrapper:
Delicious (which I’ve stopped using – I moved over to Pinboard) also uses it’s own proxy for clicked on stored bookmarks…
If you have any other examples, particularly of link rewriting/annotation/pollution where you wouldn’t expect it, please let me know via the comments…