Time for Chaff as Google Analytics Adds Demographic and Interest Based Segmentation?

Via @mhawksey RTing @R3beccaF (I missed Rebecca’s tweet first time round), I notice that “Google Analytics can now segment visitors by age, gender and interests”, as described here: Getting Excited about Google Analytics’ Upcoming Features. The supported dimensions – age, gender and interest – allow you to get some idea about the demographics of your site visitors and segment stats on the same (though I wonder about sampling errors, how the demographic data is associated with user cookies, etc.). Note also that demographics stats have previously been available in other Google products, such as YouTube and (via Karen Blakeman) Blogger, and demographic targeting of ads has been around for some time, of course…

Previously, to get demographic data into Google Analytics, I think you had to push it there yourself via custom variables (eg example; see also some of these sneaky tricks). I quite liked the idea of finessing the acquisition of user demographics data by capturing responses to ads placed via demographic targeting tools…! ;-)

In passing, I just wonder about this phrase from the Google Analytics terms of service (my emphasis): You will not (and will not allow any third party to) use the Service to track, collect or upload any data that personally identifies an individual (such as a name, email address or billing information), or other data which can be reasonably linked to such information by Google.

So does this mean Google is free to try to learn from and link to whatever it thinks it can from your custom variable data, for example?

In any case, this all seems in keeping with Google’s aim to do everyone’s tracking on their behalf.

Note to self: get up to speed on cohorts (90 days history only? This section in this post on unified segments suggests at least 6 months history?).

Note to self, 2: how could we go about obfuscating the data collected from us? I wonder about how we might go about creating digital/browser chaff? For example, running a background process that visits random websites and runs random searches under the guise of my Google account…?
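
As a very rough sketch of the chaff idea, the following builds a handful of random search URLs that a background process could then visit at irregular intervals. The word list, the search URL and the function name are all placeholders of my own, and nothing here actually fetches a page or touches a Google account; it only shows how the random queries might be generated.

```python
import random
from urllib.parse import urlencode

# Placeholder word list and search endpoint (assumptions for illustration only).
WORDS = ["allotment", "bassoon", "cricket", "dinghy", "espresso", "fjord"]
SEARCH_URL = "https://www.google.com/search"


def chaff_queries(n, rng=None, words_per_query=2):
    """Return n search URLs, each built from randomly chosen words."""
    rng = rng or random.Random()
    urls = []
    for _ in range(n):
        # Pick distinct words so each query looks vaguely like a real search.
        terms = rng.sample(WORDS, words_per_query)
        urls.append(SEARCH_URL + "?" + urlencode({"q": " ".join(terms)}))
    return urls


# Example: generate three chaff URLs (a real process would fetch them later).
for url in chaff_queries(3):
    print(url)
```

Whether noise like this would actually dilute a profile, rather than simply being filtered out as noise, is an open question.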

I should probably tag this under: targeting countermeasures.

Repository Googalytics – Visits from HEIs

Chatting with @clarileia from the OU Library today about what a) interesting, and b) useful things we might be able to learn from web analytics around the OU’s open repository – http://oro.open.ac.uk – I wondered whether it would be possible to generate reports based around traffic coming from other HEIs.

I had a vague memory of setting up filters on Google Analytics years ago to segment out library activity based on visitor IP address, using IP ranges from the different OU regional offices to generate reports based on library website usage by region, though I’m not sure I ever blogged it (I was asked not to publish the list of OU IP address ranges…). Trying to refresh my memory, it seems you can set up custom filters in a Google Analytics site profile to limit data collection to visits from within a particular IP address range (eg exclude internal traffic and IP address range tool):

[Screenshot: Googalytics IP range tracking]
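
The same sort of range test is easy to try out locally, for example against a log export. This is only a sketch: the ranges below are reserved documentation addresses standing in for the real (unpublished) office ranges, and the region labels are made up.

```python
import ipaddress

# Hypothetical ranges (reserved documentation addresses), NOT real OU ranges.
REGIONAL_RANGES = {
    "Region A": ipaddress.ip_network("192.0.2.0/24"),
    "Region B": ipaddress.ip_network("198.51.100.0/25"),
}


def region_for(ip):
    """Return the label of the first range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for label, network in REGIONAL_RANGES.items():
        if addr in network:
            return label
    return None
```

So `region_for("192.0.2.17")` falls in the first range, whereas an address outside both ranges returns `None`.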

IP limits don’t appear to be available for defining custom segments; instead, GA looks up the owner of the IP address and reports that as a Service Provider attribute, which can be used to define a custom segment:

[Screenshot: GA - university source:service provider segment]

(When accessing GA through the API, I think the corresponding field is ga:networkLocation(?), though I haven’t tested it…)
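
For what it’s worth, here is an untested sketch of how the query parameters for such a request might be put together. It assumes the Core Reporting API conventions as I understand them (a `filters` parameter in which `=@` means “contains”, and `ga:visits` as the metric); the profile ID and the function name are placeholders, and no request is actually made.

```python
from urllib.parse import urlencode


def service_provider_query(profile_id, start, end, term="university"):
    """Build a query string for visits by service provider containing term."""
    params = {
        "ids": "ga:" + profile_id,         # placeholder profile (view) ID
        "start-date": start,
        "end-date": end,
        "metrics": "ga:visits",
        "dimensions": "ga:networkLocation",
        # "=@" is the "contains substring" filter operator (assumed here)
        "filters": "ga:networkLocation=@" + term,
        "sort": "-ga:visits",
    }
    return urlencode(params)
```

The resulting string would be appended to the API endpoint along with whatever authorisation the API requires.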

Here’s an example of what a segment filter on Service Provider terms containing university turns up:

[Screenshot: GA service provider example]

So now I’m wondering: is there a full list of “Service Provider” names for UK HEIs, as picked up by Google Analytics, anywhere, that could be used as the basis of a shareable/templated Advanced Segment?

See also: OUseful.info posts on library analytics.

eSTEeM Project: Library Website Tracking For VLE Referrals

Assuming my projects haven’t been cut out at the final acceptance stage because I haven’t yet submitted a revised project plan, here is an outline of one of them.

As OU courses are increasingly presented through the VLE, many of them opt to have one or more “Library Resources” pages that contain links to course related resources either hosted on the OU Library website or made available through a Library operated web service. Links to Library hosted or moderated resources may also appear inline in course content on the VLE. However, at the current time, it is difficult to get much idea about the extent to which any of these resources are ever accessed, or how students on a course make use of other Library resources.

With the state of the collection and reporting of activity data from the VLE still evolving, this project will explore the extent to which we can make use of data I do know exists, and to which I do have access, specifically Google Analytics data for the library.open.ac.uk domain.

The intention is to produce a three-way reporting framework using Google Analytics for visitors to the OU Library website and Library managed resources from the VLE. The reports will be targeted at: subject librarians who liaise with course teams; course teams; subscription managers.

Google Analytics (to which I have access) is already running on the library website, and the matter just(?!) arises now of:

1) identifying appropriate filters and segments to capture visits from different courses;

2) developing Google Analytics API wrapper calls to capture data by course or resource based segments and enable analysis, visualisation and reporting not supported within the Google Analytics environment;

3) providing a meaningful reporting format for the three audience types. (Note: we might also explore whether a view over the activity data may be appropriate for presenting back to students on a course.)
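
As an illustration of the first step, a filter for visits by course could start from the referrer URL. The sketch below is built on assumptions: the VLE hostname is a stand-in, and it supposes that a course code (a few capital letters followed by three digits) appears somewhere in the referring URL, which may not hold for real VLE pages.

```python
import re
from urllib.parse import urlparse

# Assumptions for illustration: a stand-in VLE hostname and course code pattern.
VLE_HOST = "vle.example.org"
COURSE_CODE = re.compile(r"\b([A-Z]{1,3}\d{3})\b")


def course_from_referrer(referrer):
    """Return a course code if the referrer is a VLE page carrying one, else None."""
    parts = urlparse(referrer)
    if parts.hostname != VLE_HOST:
        return None  # not a referral from the VLE
    match = COURSE_CODE.search(parts.path + "?" + parts.query)
    return match.group(1) if match else None
```

Counting the returned codes over a set of referred visits would give a first cut of library usage by course.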

The Project
The OU Library has been running Google Analytics for several years, but to my knowledge has not started to exploit the data being collected as part of a reporting strategy on the usage of library resources resulting from referrals from the VLE. (Whenever a user clicks on a link in the VLE that leads to the Library website, the Google Analytics on the Library website can capture that fact.)

At the moment, we do not tend to work on optimising our online courses as websites so that they deliver the sorts of behaviour we want to encourage. If we were a web company, we would regularly analyse user behaviour on our course websites and modify them as a result.

This project represents the first step in a web analytics approach to understanding how our students access Library resources from the VLE: reporting. The project will then provide the basis for a follow on project that can look at how we can take insight from those reports and make them actionable, for example in the redesign of the way links to library resources are presented or used in the VLE, or how visitors from the VLE are handled when they hit the Library website.

The project complements work that has just started in the Library on a JISC funded project to make journal recommendations to students based on previous user actions.

The first outcome will be a set of Google Analytics filters and advanced segments tuned to the VLE visitor traffic and resource usage on the Library website. The second will be a set of Google Analytics API wrappers that allow us to export this data and use it outside the Google Analytics environment.

The final deliverables are three report types in two possible flavours:

1) a report to subject librarians about the usage of library resources from visitors referred from the VLE for courses they look after

2) a report to librarians responsible for particular subscription databases showing how that resource is accessed by visitors referred from the VLE, broken down by course

3) a report to course teams showing how library resources linked to from the VLE for their course are used by visitors referred to those resources from the VLE.

The two flavours are:

a) Google Analytics reports

b) custom dashboard with data accessed via the Google Analytics API

Recommendations will also be made based on the extent to which Library website usage by anonymous students on particular OU courses may be tracked by other means, such as affinity strings in the SAMS cookie, and the benefits that may accrue from this more comprehensive form of tracking.

If course team members on any OU courses presenting over the next 9 months are interested in how students are using the library website following a referral from the VLE, please get in touch. If academics on courses outside the OU would like to discuss the use of Google Analytics in an educational context, I’d love to hear from you too:-)

eSTEeM is a joint initiative between the Open University’s Faculty of Science and Faculty of Maths, Computing and Technology to develop new approaches to teaching and learning both within existing and new programmes.

UK HE Libraries Using Google Analytics

How many UK Higher Education Library websites are running Google Analytics, and how many of them are actually using it to report anything other than sitewide pageviews and visitor numbers?

A couple of years ago, I ran a series of posts on Library Analytics where I started to explore some of the ways in which Google Analytics (as it was then) could be used to help us start to understand how a library website was being used by its different sorts of visitors.

Two years on, and I’ve started looking again at Googalytics in the Library, and will hopefully get round to publishing a few posts at least about what I’ve learned about using it as it currently stands for making sense of Library website usage, and about what we may be able to report back to course teams about the library website activity of users referred from course pages on the OU VLE.

One thing I thought I’d like to try to do is come up with custom reports, segments and goal recipes that might be transferable, or useful to other HE Library websites, as well as identify “best practice” approaches that are used by other HE libraries running Google Analytics… But which libraries are running Google Analytics?

Using a list of HE Library websites grabbed from a November 2009 dump of a scrape of the Sconul website (by @ostephens, I think?), I ran a quick Python script to sniff library websites for evidence of Google Analytics tracking codes (results).

Total number of websites checked: 181
Number with Google Analytics code detected: 110 (60.8%)
Number without Google Analytics code detected: 67 (37.0%)
Number of pages failed to load: 4 (2.2%)
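
The script itself isn’t reproduced here, but the kind of check it makes can be sketched as follows. This is an illustrative reconstruction rather than the original code: the patterns (the ga.js/urchin.js script URLs, the `_gaq` queue and a `UA-…` account ID) are my guess at reasonable tell-tales, and fetching the pages is left out.

```python
import re

# Tell-tale strings for Google Analytics tracking code (an assumed, partial list).
GA_PATTERNS = [
    re.compile(r"google-analytics\.com/(ga|urchin)\.js"),
    re.compile(r"\b_gaq\b"),
    re.compile(r"\bUA-\d+-\d+\b"),
]


def has_google_analytics(html):
    """Return True if the page source contains any GA tell-tale."""
    return any(pattern.search(html) for pattern in GA_PATTERNS)


def summarise(results):
    """results maps URL -> True (found), False (not found) or None (failed to load).

    Returns {label: (count, percentage)} with percentages to one decimal place.
    """
    total = len(results) or 1  # avoid dividing by zero on an empty run
    counts = {"detected": 0, "not detected": 0, "failed": 0}
    for found in results.values():
        if found is None:
            counts["failed"] += 1
        elif found:
            counts["detected"] += 1
        else:
            counts["not detected"] += 1
    return {label: (n, round(100.0 * n / total, 1)) for label, n in counts.items()}
```

A check like this only looks at the page source, so sites that load their tracking code some other way would be miscounted as not running Google Analytics.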

So, it seems like a fair few folk are running Google Analytics… but I wonder: what are they reporting? What segments and custom reports do they find most useful? What goals have they defined (and do they carry a meaningful “financial” conversion value? If so, defined how?) Are the reports in any sense “actionable” (that is, have they been used to prompt interventions to increase traffic, influence on-site behaviour, feed in to website design changes, feed in to subscription or book acquisition policies, improve links with course academics, update reading lists, contribute to VLE content or structure, schedule and staff online help services, influence opening hours etc. etc.)?

If you are working in an HE library, running Google Analytics, and can provide even fragmentary answers to any of the above questions, please reply in a comment below, or feel free to email me (in confidence, if required) at: a.j.hirst@open.ac.uk

PS I’m even going to start looking to the literature, too… So for example, this is next on my reading list: Turner, S. J. (2010). Website Statistics 2.0: Using Google Analytics to Measure Library Website Effectiveness. Technical Services Quarterly, 27(3), 261-278. doi:10.1080/07317131003765910

PPS I thought I’d follow the single citation to that paper too, but it seems I can’t unless I pay for it…

This is interesting, methinks. Not only is the content of the paper kept behind a paywall, but so is its incoming link context…

“Look at me, Look at me” – Rewriting Google Analytics Tracking Codes

A couple of quick post hoc thoughts to add to Google/Feedburner Link Pollution:

1) there’s an infoskills issue here based on an understanding of what proxied links are, what is superfluous in a URI (Google tracking attributes etc);

2) there’s fun to be had… so for example, @ajcann recently posted on how students at Leicester are getting into the bookmarked resource thing and independently “doing some excellent work on delicious, creating module resources”: Where’s the social?.

Here’s the original link as polluted by Feedburner (I clicked through to the page from Google Reader):

Normally, I would have stripped the tracking code from the link I made above to Alan’s post. Instead, I used this:

(The campaign element is the category I used for this post, the content is the shortcode for the post.)

Don’t ya just love it: tracking code spam :-)

So I’m thinking – maybe I need a WordPress plugin that will preemptively clean all external links of Google tracking codes and then add my own ‘custom’ tracking stuff on instead (under the assumption that the linked to site is running Google Analytics; if it isn’t, then the annotations are just an unsightly irrelevance, or noise in the URI…).
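
The link rewriting step at the heart of such a plugin is simple enough to sketch, here in Python rather than as a WordPress plugin. It strips any `utm_` campaign parameters from a URL and appends replacements; the function name and the `utm_medium` value of "blog" are my own choices for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def retag(url, source, campaign, content):
    """Strip existing utm_* parameters from url and add our own."""
    parts = urlsplit(url)
    # Keep every query parameter that is not a Google campaign tracking code.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if not key.startswith("utm_")]
    kept += [
        ("utm_source", source),
        ("utm_medium", "blog"),       # hypothetical choice of medium
        ("utm_campaign", campaign),   # e.g. the post category
        ("utm_content", content),     # e.g. the post shortcode
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `retag("http://example.org/post?utm_source=feedburner&p=42", "ouseful", "analytics", "abc")` keeps `p=42` but swaps the Feedburner source for the values passed in.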