Inappropriate Linkification (aka redirection attacks?!) in Google Docs

Reading through another wonderful post on the FullFact blog last night (Full Fact sources index: where to find the information you need), I noticed that the resources linked to from that post were being redirected via a Google URL:

[Screenshot: drafting in Google docs]

A tweet confirmed that this wasn’t intentional, so what had happened? I gather the workflow used to generate the post was to write it in Google docs, and then copy and paste the rich/HTML text into a rich text editor in Drupal, although I couldn’t recreate this effect (and nor could FullFact). However, suitably suspicious, I started having a play, writing a simple test document in Google docs:

[Screenshot: Google doc link tracking]

The Google doc automatically links the test URL I added to the document. (This is often referred to as “linkification”: if a piece of text is recognised as something that looks like a URL or web link, it gets rewritten as a clickable link. Typically, you might assume that the link you’ll now be clicking on is the link that was recognised. This may be a bad assumption to make…) If you hover over the URL as written in the document, you get a tooltip that suggests the link points to the same URL. However, if you hover over the URL listed in the tooltip (or click on it), the status indicator in the bottom left-hand corner of the browser shows the actual URL you’re clicking on. Like this:

[Screenshot: Google docs link direction]

In this case, the link you’ll actually click on redirects to the original link via a Google URL. This one, in fact:

http://www.google.com/url?q=http%3A%2F%2Fblog.ouseful.info&sa=D&sntz=1&usg=AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A

What this means is that if I click on the link, Google tracks the fact that the link was clicked on. From the value of the usg variable (in this case, AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A), it presumably also knows which source document contained the link, and whatever follows from that.
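To make the difference concrete, here’s a minimal Python sketch. It is emphatically not how Google docs works internally (I have no idea what that looks like): the URL pattern, `linkify()`, `split_redirect()` and the `REDIRECT_BASE` prefix are my own illustrative assumptions, with the prefix modelled on the Google link above. The first part shows that a linkifier can keep the visible link text identical while quietly changing the href; the second uses Python’s standard `urllib.parse` module to pull the tracked URL apart, recovering the original target from the `q` parameter alongside the opaque `usg` value.

```python
import html
import re
from urllib.parse import parse_qs, quote, urlparse

# A deliberately naive "looks like a web link" pattern; real linkifiers
# use much more elaborate rules (trailing punctuation, bare domains, etc.).
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')

# Hypothetical redirect prefix, modelled on the Google URL quoted above.
REDIRECT_BASE = "http://www.google.com/url?q="


def linkify(text, redirect=False):
    """Wrap each URL-like run of text in an <a> element.

    The visible link text is always the URL as written; only the href
    differs when redirect=True.
    """
    def make_anchor(match):
        url = match.group(0)
        # Percent-encode the whole target so it can sit inside the q parameter
        href = REDIRECT_BASE + quote(url, safe="") if redirect else url
        return f'<a href="{html.escape(href)}">{html.escape(url)}</a>'

    return URL_PATTERN.sub(make_anchor, text)


def split_redirect(url):
    """Split a redirect URL into (host, path, {param: first value})."""
    parts = urlparse(url)
    # parse_qs percent-decodes each value (%3A%2F%2F becomes ://) and,
    # by default, drops parameters whose values are blank.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.netloc, parts.path, params


print(linkify("See http://blog.ouseful.info"))
print(linkify("See http://blog.ouseful.info", redirect=True))

tracked = ("http://www.google.com/url?q=http%3A%2F%2Fblog.ouseful.info"
           "&sa=D&sntz=1&usg=AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A")
host, path, params = split_redirect(tracked)
print(host, path, params["q"], params["usg"])
```

Both `linkify()` calls display exactly the same text to the reader; only the second one’s href goes via the redirect, and its `q` value (`http%3A%2F%2Fblog.ouseful.info`) matches the one in the Google link above. What `sa` and `sntz` mean is Google’s business; the sketch makes no claims about them.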

Hmmm… If I publish the document, the Google rewrite appears to be removed:

[Screenshot: Google doc publish to web]

There are also several export options associated with the document:

[Screenshot: Google doc export options]

So what links get exported?

Here’s the Word export:

[Screenshot: Google doc export as docx (Word)]

That seems okay – no tracking. How about odt?

[Screenshot: Google doc export as odt]

That looks okay too. RTF and HTML export also seem to publish the “clean” link.

What about PDF?

[Screenshot: Google doc export as PDF]

Hmm… so tracking is included here. So if you write a doc in Google docs that contains autolinked links, then export that doc as PDF and share it with other folk, Google will know when folk click on those links in a copy of that PDF, and can (presumably) tie those clicks back to the original Google docs document (and all that that entails…)
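If you want to check whether a PDF you’re about to share carries tracked links, a rough first pass is to look for URI actions in the raw file. In a PDF, a web link is usually a link annotation whose action dictionary includes an entry like `/S /URI /URI (http://...)`. The sketch below is a crude regex scan that rests on a big assumption: that the annotation objects are stored uncompressed and use literal (parenthesised) strings. Links tucked inside compressed object streams, or written as hex strings, will be missed, so treat an empty result as “nothing found”, not “nothing there”. `find_tracked_links()` is a hypothetical helper name of my own.

```python
import re

# Matches the literal string that follows a /URI key, e.g. /URI (http://...).
# Handles backslash escapes inside the string, but not nested unescaped parens.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:[^()\\]|\\.)*)\)")

# Google redirect links: hosts such as www.google.com or www.google.ca, path /url
GOOGLE_REDIRECT = re.compile(r"^https?://(www\.)?google\.[a-z.]+/url\?")


def find_tracked_links(pdf_bytes):
    """Return URI strings in pdf_bytes that go via a Google /url redirect."""
    uris = [m.group(1).decode("latin-1") for m in URI_PATTERN.finditer(pdf_bytes)]
    return [uri for uri in uris if GOOGLE_REDIRECT.match(uri)]


# Usage: pass in the raw bytes of a PDF file, e.g.
# with open("exported.pdf", "rb") as f:
#     print(find_tracked_links(f.read()))
```

A proper PDF library would decompress object streams and walk the annotations properly; the regex version just has the virtue of needing nothing beyond the standard library.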

How about if we email a doc as a PDF attachment to someone from within Google docs:

[Screenshot: Google doc email as PDF]

So that seems okay (untracked).

What’s the story then? FullFact claimed they had cut and pasted rich HTML from Google docs into a rich text editor, and that the Google redirection attack was inserted into the link at that point. I couldn’t recreate that, and nor could the FullFact folk, so either there are some Google “experiments” going on, or the workflow was misremembered.

In my own experiments, I got a Google redirection from clicking links within my original document and from the exported PDF, but not from any of the other formats.

So what do we learn? I guess this at least: be aware that when Google linkifies links for you, it may be redirecting clicks on those links through Google tracking servers, and that these tracked links may be propagated to exported and/or otherwise shared versions of the document.
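One practical response, if you’re about to paste HTML copied out of a Google doc into something like a Drupal rich text editor, is to unwrap any Google redirect links first. Here’s a minimal sketch using only the Python standard library. It assumes that the redirect target lives in the `q` parameter (as in the Google docs link above) or, failing that, the `url` parameter (as in the search-result link quoted in the comments below), and that hrefs are double-quoted; the function names are my own.

```python
import html
import re
from urllib.parse import parse_qs, urlparse

# Only double-quoted href attributes are handled in this sketch.
HREF_PATTERN = re.compile(r'href="([^"]*)"')
# Hosts such as www.google.com, www.google.ca, www.google.co.uk
GOOGLE_HOST = re.compile(r"(www\.)?google\.[a-z.]+")


def unwrap_google_redirect(url):
    """Return the target of a Google /url redirect, or url unchanged."""
    parts = urlparse(url)
    if GOOGLE_HOST.fullmatch(parts.netloc.lower()) and parts.path == "/url":
        params = parse_qs(parts.query)  # blank values (e.g. q=) are dropped
        target = params.get("q") or params.get("url")
        if target:
            return target[0]
    return url


def clean_links(html_text):
    """Rewrite every href in html_text to point at its unwrapped target."""
    def fix(match):
        # Inside an attribute, & is usually written &amp;, so decode first
        original = html.unescape(match.group(1))
        return 'href="' + html.escape(unwrap_google_redirect(original)) + '"'

    return HREF_PATTERN.sub(fix, html_text)
```

Running `clean_links()` over the pasted HTML leaves ordinary links alone (apart from normalising their HTML escaping) and swaps each Google-wrapped href for the link you actually typed.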

PS see also Google/Feedburner Link Pollution or More Link Pollution – This Time from WordPress.com for more of the same, and Personal Declarations on Your Behalf – Why Visiting One Website Might Tell Another You Were There for a quick overview of what might happen when you actually land on a page…

Link rewriters are, of course, to be found in lots of other places too…

Twitter, for example, actually wraps all shared links in its t.co wrapper:

[Screenshot: Twitter link rewrite]

Delicious (which I’ve stopped using; I moved over to Pinboard) also uses its own proxy for clicks on stored bookmarks…

[Screenshot: Delicious link rewriter]

If you have any other examples, particularly of link rewriting/annotation/pollution where you wouldn’t expect it, please let me know via the comments…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

4 thoughts on “Inappropriate Linkification (aka redirection attacks?!) in Google Docs”

  1. I actually encounter this quite a lot. The same form of link is produced in Google search results. Here’s an example:

    http://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDQQFjAA&url=http%3A%2F%2Fwww.amazon.ca%2FArctic-Dreams-Barry-Lopez%2Fdp%2F0375727485&ei=15pmUY30Ofa14APC0IDoAg&usg=AFQjCNGWd9pSM0ScdFDdXb8YxydKoepYwQ&sig2=RJEz-khe0xxEB0bxNcJ32Q&bvm=bv.45107431,d.dmg

    When I see something like this in a document, I generally assume that the person did a search, found a title, right-clicked on the link from the search page, and passed it directly into the document (without ever looking at the original article).

    1. @downes Last time I checked (which admittedly was some time ago), it seemed that Google web search gave Google redirection links if you were logged in (perhaps to feed Search History?) but not if you weren’t? (Note to self: check to see what happens now.)

      Linkification is different, I think? It’s fair enough(?!) for Google to write links in their own search results however they want, but when linkifying, I think the assumption is that the href points to the link being linked to… rather than to a redirect.

      I don’t even know if I can turn linkifiers off in Google docs, but the fact that Google inserts its own link into a PDF, with an identifier that presumably associates clicks with the original document (and hence the original owner), feels unreasonable to me?

      I also wonder whether clicks on the tracked link can be fed back into Google Analytics stats for the original document, eg giving information to the original document owner about clicks generated via the PDF?

  2. I noticed this in Google Drive (Docs) a little while ago, and then it seemed to show up all over. Similar to what Steven said, it’s somewhat of an indicator of what level of depth someone went to when sharing links. Also catches my eye when a friend shares a link and the (already long) URL has extra text attached to it, beginning with, “source=yaddayadda_&campaign=thisorthat…”

    I imagine most people share with sites’ built-in social sharing tools instead, which of course have tracking features too. But if you’re not making anything on the web, you might not notice the linking/tracking/pollution at all. Consume, consume, consume.

    1. @Billy The issue in this case was not that someone had copied and pasted a link from Google web search; it was more a case of raising the question of the extent to which platforms rewrite “pristine” URLs you paste into them. If it is the case that every time I add a simple link to a Google Doc and then share that doc as a PDF document, Google embeds *a different link* in the PDF from the one I used, I’d feel a bit dischuffed.
