Time for Chaff as Google Analytics Adds Demographic and Interest Based Segmentation?

Via @mhawksey RTing @R3beccaF (I missed Rebecca’s tweet first time round), I notice that “Google Analytics can now segment visitors by age, gender and interests”, as described here: Getting Excited about Google Analytics’ Upcoming Features. The supported dimensions (age, gender and interest) allow you to get some idea of the demographics of your site visitors and to segment stats on the same, though I wonder about sampling errors, how the demographic data is associated with user cookies, and so on. Note also that demographic stats have previously been available in other Google products, such as YouTube and (via Karen Blakeman) Blogger, and demographic targeting of ads has been around for some time, of course…

Previously, to get demographic data into Google Analytics, I think you had to push it there yourself via custom variables (eg this example; see also some of these sneaky tricks). I quite liked the idea of finessing the acquisition of user demographics data by capturing responses to ads placed via demographic targeting tools…! ;-)

In passing, I just wonder about this phrase from the Google Analytics terms of service (my emphasis): “You will not (and will not allow any third party to) use the Service to track, collect or upload any data that personally identifies an individual (such as a name, email address or billing information), or other data which can be reasonably linked to such information by Google.”

So does this mean Google is free to try to learn from and link to whatever it thinks it can from your custom variable data, for example?

In any case, this all seems in keeping with Google’s aim to do everyone’s tracking on their behalf.

Note to self: get up to speed on cohorts (90 days history only? This section in this post on unified segments suggests at least 6 months history?).

Note to self, 2: how could we go about obfuscating the data collected from us? I wonder about how we might go about creating digital/browser chaff? For example, running a background process that visits random websites and runs random searches under the guise of my Google account…?
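As a very rough sketch of the chaff idea, the following Python builds a randomised schedule of page visits and searches. Everything in it is an assumption made for illustration: the seed sites, the search terms, the search endpoint and the `chaff_plan` helper are all hypothetical, and the code only plans requests rather than making them. To pollute the profile attached to a Google account, the URLs would presumably have to be fetched from a logged-in browser session, which this sketch does not attempt.

```python
import random
from urllib.parse import urlencode

# Hypothetical seed lists; useful chaff would need far larger, more varied ones.
SEED_SITES = ["https://example.com/", "https://example.org/", "https://example.net/"]
SEED_TERMS = ["weather", "recipes", "train times", "football scores", "gardening"]

# Assumed search endpoint, used purely for illustration.
SEARCH_URL = "https://www.google.com/search"


def chaff_plan(n, rng=None):
    """Return n (delay_seconds, url) pairs mixing random site visits and searches."""
    rng = rng or random.Random()
    plan = []
    for _ in range(n):
        if rng.random() < 0.5:
            # Visit one of the seed sites.
            url = rng.choice(SEED_SITES)
        else:
            # Run a search on two randomly chosen terms.
            query = " ".join(rng.sample(SEED_TERMS, 2))
            url = SEARCH_URL + "?" + urlencode({"q": query})
        # Irregular gaps, so the requests look less like a timer firing.
        delay = round(rng.uniform(5, 120), 1)
        plan.append((delay, url))
    return plan


if __name__ == "__main__":
    for delay, url in chaff_plan(5, random.Random(42)):
        print(f"wait {delay}s, then fetch {url}")
```

Passing in a seeded `random.Random` makes a plan repeatable for testing; in use, you would leave it unseeded so the pattern is harder to predict.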

I should probably tag this under: targeting countermeasures.

Repository Googalytics – Visits from HEIs

Chatting with @clarileia from the OU Library today about what a) interesting, and b) useful things we might be able to learn from web analytics around the OU’s open repository – http://oro.open.ac.uk – I wondered whether it would be possible to generate reports based around traffic coming from other HEIs.

I had a vague memory of setting up filters on Google Analytics years ago to segment out library activity based on visitor IP address, using IP ranges from the different OU regional offices to generate reports on library website usage by region, though I’m not sure I ever blogged it (I was asked not to publish the list of OU IP address ranges…). Trying to refresh my memory, it seems you can set up custom filters in a Google Analytics site profile to limit data collection to visits from within a particular IP address range (eg exclude internal traffic and IP address range tool):

[Image: Google Analytics IP range tracking]
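The range-matching logic behind that sort of filter can be sketched outside Google Analytics too, for example against server logs. The following is a minimal Python sketch using the standard library `ipaddress` module. The region labels and address ranges are hypothetical: they use the reserved documentation blocks rather than any real OU ranges, and `office_for` is an illustrative helper rather than anything GA provides.

```python
import ipaddress

# Hypothetical ranges drawn from the reserved documentation blocks (RFC 5737),
# standing in for the real regional office ranges.
OFFICE_RANGES = {
    "Region A": ipaddress.ip_network("192.0.2.0/24"),
    "Region B": ipaddress.ip_network("198.51.100.0/25"),
    "Region C": ipaddress.ip_network("203.0.113.0/24"),
}


def office_for(ip):
    """Return the label of the first range containing ip, or None if none match."""
    address = ipaddress.ip_address(ip)
    for label, network in OFFICE_RANGES.items():
        if address in network:
            return label
    return None


if __name__ == "__main__":
    for ip in ["192.0.2.17", "198.51.100.200", "8.8.8.8"]:
        print(ip, "->", office_for(ip))
```

Note that a /25 only covers the lower half of its /24 block, so 198.51.100.200 falls outside "Region B" here. That kind of boundary case is worth checking when translating address ranges into filter patterns.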

IP limits don’t appear to be available for defining custom segments; instead, GA looks up the owner of the IP address and reports that as a Service Provider attribute, which can be used to define a custom segment:

[Image: GA university source, Service Provider segment]

(When accessing GA through the API, I think the corresponding field is ga:networkLocation(?), though I haven’t tested it…)

Here’s an example of what a segment filter on Service Provider terms containing university turns up:

[Image: GA Service Provider example]
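The same sort of "contains" filter could be applied to exported report rows. This is a minimal Python sketch, assuming the data has already been pulled out of GA as (service provider, visits) pairs. The keyword list, the `hei_rows` helper and the sample rows are all made up for illustration, and real Service Provider strings would need checking against whatever GA actually reports.

```python
import re

# Assumed keywords; the real Service Provider strings would need checking.
HEI_PATTERN = re.compile(r"university|college|institute", re.IGNORECASE)


def hei_rows(rows):
    """Keep (service_provider, visits) rows whose provider name matches the pattern."""
    return [(name, visits) for name, visits in rows if HEI_PATTERN.search(name)]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        ("university of example", 120),
        ("example broadband ltd", 300),
        ("example college of art", 45),
    ]
    for name, visits in hei_rows(sample):
        print(name, visits)
```

A keyword match like this will miss institutions whose network names don't mention any of the terms, and may pick up unrelated providers that do, which is part of the case for a curated list of names.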

So now I’m wondering: is there a full list of “Service Provider” names for UK HEIs, as picked up by Google Analytics, anywhere, that could be used as the basis of a shareable/templated Advanced Segment?

See also: OUseful.info posts on library analytics.