Mental Health and the Web – Hounded by Ads, Bullied by Retargeters?

Some thinkses not thought through…

– can web ads feed paranoia, particularly retargeted ads that follow you round the web?
– can personalised ads be upsetting, or appear bullying? For example, suppose you’re critically ill, or in debt, or you’ve been researching a sensitive personal or family matter, and then the ads start to hound you and remind you about it…

Is one take on privacy that it means not being reminded about something by an external agency? (Is there a more memorable quote along those lines somewhere..?!)

Are personalised ads one of the more pernicious examples of a “filter bubble” effect?

I think I need a proper holiday, well away from the web!

Time for Chaff as Google Analytics Adds Demographic and Interest Based Segmentation?

Via @mhawksey RTing @R3beccaF (I missed Rebecca’s tweet first time round), I notice that “Google Analytics can now segment visitors by age, gender and interests”, as described here: Getting Excited about Google Analytics’ Upcoming Features. The supported dimensions – age, gender and interests – allow you to get some idea about the demographics of your site visitors and segment stats on the same (though I wonder about sampling errors, how the demographic data is associated with user cookies, etc.). Note also that demographics stats have previously been available in other Google products, such as YouTube and (via Karen Blakeman) Blogger, and demographic targeting of ads has been around for some time, of course…

Previously, to get demographic data into Google Analytics, I think you had to push it there yourself via custom variables (e.g. example); see also some of these sneaky tricks (I quite liked the idea of finessing the acquisition of user demographics data by capturing responses to ads placed via demographic targeting tools…! ;-)

In passing, I just wonder about this phrase from the Google Analytics terms of service (my emphasis): You will not (and will not allow any third party to) use the Service to track, collect or upload any data that personally identifies an individual (such as a name, email address or billing information), or other data which can be reasonably linked to such information by Google.

So does this mean Google is free to try to learn from and link to whatever it thinks it can from your custom variable data, for example?

In any case, this all seems in keeping with Google’s aim to do everyone’s tracking on their behalf.

Note to self: get up to speed on cohorts (90 days history only? This section in this post on unified segments suggests at least 6 months history?).

Note to self, 2: how could we go about obfuscating the data collected from us? I wonder about how we might go about creating digital/browser chaff? For example, running a background process that visits random websites and runs random searches under the guise of my Google account…?
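Something along these lines, maybe – a minimal sketch of a chaff generator (the noise terms, sites and timings are all placeholders, Google may well block scripted searches, and for the searches to pollute my Google profile the requests would need to carry my logged-in Google cookies, which this sketch doesn’t attempt):

```python
import random
import time
from urllib.parse import quote_plus
from urllib.request import urlopen

# Placeholder noise vocabulary - a real chaff generator would want a much
# larger, more plausible-looking set of terms and sites.
NOISE_TERMS = ["knitting patterns", "carburettor repair", "medieval history", "surfing lessons"]
NOISE_SITES = ["http://example.com", "http://example.org"]

def make_chaff():
    """Fire off one random search or page visit to muddy the tracking waters."""
    if random.random() < 0.5:
        url = "https://www.google.com/search?q=" + quote_plus(random.choice(NOISE_TERMS))
    else:
        url = random.choice(NOISE_SITES)
    try:
        urlopen(url, timeout=5).read()
    except OSError:
        pass  # chaff is best effort; ignore failures

while True:
    make_chaff()
    time.sleep(random.uniform(60, 600))  # random-ish cadence (placeholder timings)
```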

I should probably tag this under: targeting countermeasures.

Inappropriate Linkification (aka redirection attacks?!) in Google Docs

Reading through another wonderful post on the FullFact blog last night (Full Fact sources index: where to find the information you need), I noticed that the linked-to resources from that post were being redirected via a Google URL:

[Image: drafting in Google Docs]

A tweet confirmed that this wasn’t intentional, so what had happened? I gather the workflow used to generate the post was to write it in Google docs, and then copy and paste the rich/HTML text into a rich text editor in Drupal, although I couldn’t recreate this effect (and nor could FullFact). However, suitably suspicious, I started having a play, writing a simple test document in Google docs:

[Image: Google Doc link tracking]

The Google doc automatically links the test URL I added to the document. (This is often referred to as “linkification” – if a piece of text is recognised as something that looks like a URL or web link, it gets rewritten as a clickable link. Typically, you might assume that the link you’ll now be clicking on is the link that was recognised. This may be a bad assumption to make…) If you hover over the URL as written in the document, you get a tooltip that suggests the link is to the same URL. However, if you hover over the tooltip-listed URL (or click on it), you can see from the indicator in the bottom left-hand corner of the browser what the actual URL you’re clicking on is. Like this:

[Image: Google Docs link redirection]

In this case, the link you’ll actually click on is a referral to the original link via a Google URL. This one, in fact:

http://www.google.com/url?q=http%3A%2F%2Fblog.ouseful.info&sa=D&sntz=1&usg=AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A

What this means is that if I click on the link, Google tracks the fact that the link was clicked on. From the value of the usg variable (in this case, AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A) it presumably also knows the source document containing the link and whatever follows from that.
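As a small defensive measure, stripping the wrapper is easy enough – here’s a minimal Python sketch, assuming the redirect always carries the target in the q parameter (which is all I’ve seen so far):

```python
from urllib.parse import urlparse, parse_qs

def unwrap_google_redirect(url):
    """If url is a google.com/url redirect, return the target from its q parameter."""
    parsed = urlparse(url)
    if parsed.netloc.endswith("google.com") and parsed.path == "/url":
        q = parse_qs(parsed.query).get("q")
        if q:
            return q[0]  # parse_qs has already percent-decoded the value
    return url

print(unwrap_google_redirect(
    "http://www.google.com/url?q=http%3A%2F%2Fblog.ouseful.info&sa=D&sntz=1&usg=AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A"
))  # -> http://blog.ouseful.info
```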

Hmmm… If I publish the document, the Google rewrite appears to be removed:

[Image: Google Doc publish-to-web]

There are also several export options associated with the document:

[Image: Google Doc export options]

So what links get exported?

Here’s the Word export:

[Image: Google Doc export as Word (.docx)]

That seems okay – no tracking. How about odt?

[Image: Google Doc export as ODT]

That looks okay too. RTF and HTML export also seem to publish the “clean” link.

What about PDF?

[Image: Google Doc export as PDF]

Hmm… so tracking is included here. So if you write a doc in Google docs that contains autolinked links, then export that doc as PDF and share it with other folk, Google will know when folk click on that link from a copy of that PDF document, and can (presumably) tie the click back to the originally authored Google docs document (and all that that entails…)

How about if we email a doc as a PDF attachment to someone from within Google docs:

[Image: Google Doc emailed as PDF]

So that seems okay (untracked).

What’s the story then? FullFact claimed they cut and pasted rich HTML from Google docs into a rich text editor, and the Google redirection attack was inserted into the link. I couldn’t recreate that, and nor could the FullFact folk, so either there are some Google “experiments” going on, or the workflow was misremembered.

In my own experiments, I got a Google redirection from clicking links within my original document, and from the exported PDF, but not from any other formats?

So what do we learn? I guess this at least: be aware that when Google linkifies links for you, it may be redirecting clicks on those links through Google tracking servers. And that these tracked links may be propagated to exported and/or otherwise shared versions of the document.

PS see also Google/Feedburner Link Pollution or More Link Pollution – This Time from WordPress.com for more of the same, and Personal Declarations on Your Behalf – Why Visiting One Website Might Tell Another You Were There for a quick overview of what might happen when you actually land on a page…

Link rewriters are, of course, to be found in lots of other places too…

Twitter, for example, actually wraps all shared links in its t.co wrapper:

[Image: Twitter link rewrite]

Delicious (which I’ve stopped using – I moved over to Pinboard) also uses its own proxy for clicked-on stored bookmarks…

[Image: Delicious link rewriter]

If you have any other examples, particularly of link rewriting/annotation/pollution where you wouldn’t expect it, please let me know via the comments…

Name-based Robots.txt for Wifi Access Points?

Google just announced via a blog post – Greater choice for wireless access point owners – that owners of wifi access points who do not want Google to add the address and location of the access point to the Google Location Server need to rename the access point, adding _nomap to the end of the access point name or SSID (e.g. My Network_nomap). [UPDATE: note that this means it’s an opt-out model rather than a _mapme opt-in strategy (h/t @patparslow for that…)]

This is a bit like the declarative approach web publishers take to identify pages they don’t want search robots indexing, by including the names/paths of “please don’t” content in a robots.txt file. The Google assumption seems to be that if anything is visible in pretty much any way, they can index it unless you explicitly tell them not to.
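For comparison, the robots.txt convention can be checked programmatically – a quick sketch using Python’s standard library robot parser (the site and path are placeholders):

```python
from urllib import robotparser

# Fetch and parse a site's robots.txt, then ask whether a well-behaved
# crawler is allowed to index a given path.
rp = robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()

print(rp.can_fetch("*", "http://example.com/private/page.html"))  # False if disallowed
```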

All well and good, but what about the access points that Google has already added to the index, even if their publishers would rather they hadn’t? Will these be automagically removed the next time a lookup is made?

Maybe the removal protocol will work like this: Android phone or browser with location service enabled* detects local access point name, tells Google Location Service, Google notes that the name now ends in ‘_nomap’, deletes it from the index, returns ‘not found’?
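In code terms, I imagine the opt-out handling might look something like this – pure speculation on my part, of course; the index structure and function names are made up:

```python
# Hypothetical sketch of how a location server might honour the _nomap
# convention when a device reports a nearby access point.
# (Speculative - Google's actual implementation is unknown.)

location_index = {"aa:bb:cc:dd:ee:ff": ("My Network", 52.03, -0.71)}  # MAC -> (SSID, lat, lng)

def handle_reported_ap(mac, ssid):
    if ssid.endswith("_nomap"):
        # Owner has opted out: drop any stored record and don't re-add it.
        location_index.pop(mac, None)
        return None  # 'not found'
    return location_index.get(mac)
```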

*You do know your browser often knows where you are from local wifi points, don’t you, even if your laptop doesn’t have GPS or a 3G card? This tends to go by the name of location-aware browsing, and involves your browser sending identifiers such as your IP address, the names of local wifi access points, and a browser ID to a Google service that has a big database of identifiers and geo-location data for where it thinks each identifier is located. (Hmmm… interesting… I hadn’t realised that Firefox uses the Google Location Service till just now..?)

I don’t think you even need to be logged on to a network for its name to be phoned back to the location service? As the Mozilla FAQ puts it: “By default, Firefox uses Google Location Services to determine your location by sending … information about the nearby wireless access points…” (note nearby wireless access points).
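The shape of such a lookup is roughly as follows – a sketch against Google’s (current) Geolocation API, where the MAC addresses and API key are placeholders:

```python
import json
from urllib.request import Request, urlopen

# Sketch of the kind of lookup location-aware browsing performs: identifiers
# for nearby access points go up, a lat/lng estimate comes back.
body = json.dumps({
    "wifiAccessPoints": [
        {"macAddress": "00:11:22:33:44:55", "signalStrength": -65},
        {"macAddress": "66:77:88:99:aa:bb", "signalStrength": -71},
    ]
}).encode("utf-8")

req = Request(
    "https://www.googleapis.com/geolocation/v1/geolocate?key=API_KEY",  # placeholder key
    data=body,
    headers={"Content-Type": "application/json"},
)
print(urlopen(req).read())  # e.g. {"location": {"lat": ..., "lng": ...}, "accuracy": ...}
```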

PS by the by, here’s the strategy used by Android phones for detecting location.

Obtaining User Location (Android developer guide): http://developer.android.com/guide/topics/location/obtaining-user-location.html

Is there a similar diagram for how browsers approach location detection anywhere?

BBC Click Radio – Openness Special on “Privacy”: Jeff Jarvis vs. Andrew Keen

This week saw the latest episode in the OU/BBC World Service Click (radio) co-produced season on openness, with a focus this week on privacy… You can hear an extended version of the discussion between entrepreneurial journalism and openness advocate Jeff Jarvis and professional contrarian Andrew Keen: Privacy in a connected world

Unfortunately, the episode aired just too early to pick up on this week’s “Who needs privacy?!” news, and in particular the new iPhone’s “secret” location logging behaviour: iPhone keeps record of everywhere you go (find out how to see where your iPhone thinks you’ve been here: Got an iPhone or 3G iPad? Apple is recording your moves); but the discussion is a great one, so I encourage you to listen to it… (I’ll be asking questions later! ;-)

The programme also saw the launch of its new hashtag: #bbcClickRadio

Whilst the Click (formerly Digital Planet) twitter audience is still dwarfed by the Digital Planet Listeners’ Facebook group, I’m keen to see if we can try to grow it… one way might be to show who’s recently been tweeting about the programme, and encourage people to start following each other and chatting about the issues raised in the programme a little bit more – something Gareth Mitchell (@garethm) can now pick up on, at least on the first airing, as Click now goes out live… So to that end, I’m going to try to work up a special version of my Twitter friendviz application that shows connections between folk who’ve recently tweeted a particular term – in this case, the #bbcClickRadio hashtag. To see the map, visit http://bit.ly/bbcclickradiocommunity.

As a tease, here’s a rather more polished version of a map I grabbed recently…

[Image: Snapshot of #bbcClickRadioCommunity – http://bit.ly/bbcclickradiocommunity]

(Unfortunately, the live one is unlikely to ever look like this!)

PS I wonder if the investigation into the iPhone tracking was inspired by the recent story about German politician Malte Spitz who managed to obtain a copy of the data his phone provider had stored about his location… Zeit Online: Tell-all telephone (If you want to play with the data, it’s available from there…)

Predictive Ads…? Or Email Address Targeted Advertising…?!

As I was getting increasingly annoyed by large flashing display ads in my feedreader this morning, the thought suddenly occurred to me: could Google serve me ads on third party sites based on my unread Gmail emails?

That is, as I check my feeds before my email in a morning, could I be seeing ads that foreshadow the content of the email I’ve been ignoring for way too long? Or could I receive ads that flag the content of my Priority Inbox messages?

Rules regarding sensitivity and privacy would have to be carefully thought through, of course. Here’s how they currently stand regarding contextual ads delivered in Gmail (More on Gmail and privacy: Targeted ads in Gmail):

By offering Gmail users relevant ads and information related to the content of their messages, we aim to offer users a better webmail experience. For example, if you and your friends are planning a vacation, you may want to see news items or travel ads about the destination you’re considering.

To ensure a quality user experience for all Gmail users, we avoid showing ads reflecting sensitive or inappropriate content by only showing ads that have been classified as “Family-Safe.” We also avoid targeting ads to messages about catastrophic events or tragedies. [Google’s emphasis]

[See also: Ads in Gmail and your personal data]

Not quite as future predictive as gDay™ with MATE™ that lets you “search tomorrow’s web today” and “[discover] content on the internet before it is created”, but almost…!

It’s also a step on the road to Eric Schmidt’s dream of providing you with results even before you search for them. (For a more recent interview, see Google’s Eric Schmidt predicts the future of computing – and he plans to be involved.)

Here’s another, more practical(?!) thought – suppose Google served me the headers of Priority Inbox email messages that were also marked as urgent, delivered through AdWords ads, in a full-on attempt to attract my attention to “really important” messages?! “Flashmail” messages delivered through the AdWords network… (I can imagine at least one course manager who I suspect would try to contact me via ads when I don’t pick up my email! ;-)

Searching the internet of things may still be a little way off though….

PS thinking email address targeted ads (mailads?) through a bit more, here are a couple of ways of doing it that immediately come to mind. Suppose I want to target an ad at whoever@example.com:

1) AdWords could place that ad in my Gmail sidebar (I think they’d be unlikely to place ads within emails, even if clearly marked, because this approach has been hugely unpopular in the past – it also p****s me off in feeds); that said, Google has apparently started experimenting with (image based) display ads in Gmail;

2) AdWords could place the ad on a third party site if the Goog spots me via a cookie and sees I’m currently logged in to Google with, for example, the whoever@example.com email address.

As Facebook gets into the universal messaging game, email address based ad targeting would also work there?

PPS interesting – the best ads act as content, so maybe ads could be used to deliver linked content? Twitter promoted tweets – the AdWords for live news? Which reminds me, I need to work up my bid for using something like AdWords to deliver targeted educational content.

Rant About URL Shorteners…

It’s late, I’m tired, and I have a 5am start… but I’ve confused several people just now with a series of loosely connected idle ranty tweets, so here’s the situation:

– I’m building a simple app that looks at URLs tweeted recently on a twitter list;
– lots of the URLs are shortened;
– some of the shortened URLs are shortened with different services but point to the same target/destination/long URL;
– all I want to do – hah! ALL I want to do – is call a simple webservice example.com/service?short2long=shorturl that will return the long url given the short URL;
– I have two half-solutions at the moment; the first is using python to call the url (urllib.urlopen(shorturl)), then use .geturl() on the return to look up the URL of the page that was actually returned; then I use Beautiful Soup to try and grab the <title> element for the page so I can display the page title as well as the long (actual) URL; BUT – sometimes the urllib call appears to hang (and I can’t see how to set a timeout/force an except), and sometimes the page is so tatty Beautiful Soup borks on the scrape (a patched-up sketch of this approach follows below);
– my alternative solution is to call YQL with something like select head.title from html where url=”http://icio.us/5evqbm” and xpath=”//html” (h/t to @codepo8 for pointing out the xpath argument); if there’s a redirect, the YQL diagnostics info gives the redirect URL. But for some services, like the Yahoo owned delicious/icio.us shortener, the robots.txt file presumably tells the well-behaved YQL to f**k off, because 3rd party resolution is not allowed.
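For what it’s worth, here’s a patched-up sketch of that first half-solution (written against current Python 3 urllib; the title scrape uses a crude regex rather than Beautiful Soup, so it assumes halfway well-formed pages, and a cache means each short URL only gets resolved once):

```python
import re
from urllib.request import urlopen

_cache = {}  # shorturl -> (longurl, title): only resolve each short URL once

def short2long(shorturl, timeout=5):
    """Resolve a short URL to its target (long) URL and page title."""
    if shorturl in _cache:
        return _cache[shorturl]
    longurl, title = shorturl, None
    try:
        resp = urlopen(shorturl, timeout=timeout)  # follows redirects; can't hang forever
        longurl = resp.geturl()                    # the final URL after redirection
        html = resp.read(50000).decode("utf-8", errors="replace")
        m = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
        if m:
            title = m.group(1).strip()
    except OSError:
        pass  # tatty page or dead shortener: fall back to what we have
    _cache[shorturl] = (longurl, title)
    return longurl, title

print(short2long("http://bit.ly/bbcclickradiocommunity"))
```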

It seems to me that in exchange for us giving shorteners traffic, they should conform to a convention that allows users, given a shorturl, to:

1) lookup the long URL, necessarily, using some sort of sameas convention;
2) lookup the title of the target page, as an added value service/benefit;
3) (optionally) list the other alternate short URLs the service offers for the same target URL.
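Something like this, say – a purely hypothetical sketch of what the example.com/service?short2long= call mooted above might return (nothing like this convention actually exists; all the field names are made up):

```python
# Hypothetical response for
#   example.com/service?short2long=http://bit.ly/abc123
# The fields map onto the three lookups wished for above.
{
    "shortUrl": "http://bit.ly/abc123",
    "longUrl": "http://example.org/some/post",    # 1) sameas lookup
    "title": "Some Post Title",                   # 2) target page title
    "alternates": ["http://tinyurl.com/xyz789"],  # 3) other short URLs for the same target
}
```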

If I was a militant server admin, I’d be tempted to start turning traffic away from the crappy shorteners… but then, that’s because I’m angry and ranting at the mo… ;-)

Even if I could reliably call the short URL and get the long URL back, this isn’t ideal… suppose 53 people all mint their own short URLs for the same page. I have to call that page 53 times to find the same URL and page title? WTF?

… or suppose the page is actually an evil spam filled page on crappyevilmalware.example.com with page title “pr0n t1t b0ll0x”; maybe I see that and don’t want to go anywhere near the page anyway…

PS see also Joshua Schachter on url shorteners

PPS sort of loosely related, ish, err, maybe ;-) Local or Canonical URIs?. Chris (@cgutteridge) also made the point that “It’s vital that any twitter (or similar) archiver resolves the tiny URLs or the archive is, well, rubbish.”