One of the things I fondly remember about doing physics at school was being told, at the start of each school year, how what we had been told the previous year was not quite exactly true, and that this year we would actually learn how the world worked properly…
And so as this series of posts of about “Library Analytics” continues (that is, this series about the use of web analytics as applied to public Library websites), I will continue to show you examples of headline reports I have found initially compelling (or not), and then show why they are not quite right, and actually confusing at best, or misleading at worst…
Most Popular Journals
In the previous post in this series, we saw the most popular databases that were being viewed from the databases page. Is the same possible from the journals area? A quick look at the report for the find/journals/journals page suggests that such a report should be possible, but something is broken:
From the small amount of data there, the most popular journals/journal collections were as follows:
- JSTOR (271892);
- Academic Search Complete (403673);
- Blackwell Synergy (252307);
- ACM Digital Library (208448);
- IEEE Xplore (208545);
As with the databases, segmenting the traffic visiting these collections may provide insight as to which category of user (researcher, student) and maybe even which course is most active for a particular collection.
But what happened to the reporting anyway? Where has all the missing data gone?
I just had a quick look – the reporting from within the Journals area doesn’t currently appear to be showing anything…. err, oops?
it’s not the same structure as the working code on the databases pages (which you may recall from the previous post in this series uses the tracking function pageTracker._trackPageview).
Looking at which tracking script is being used on the journals page (google-analytics.com/ga.js), I think the pageTracker._trackPageview function should be being used. urchinTracker is a function from an old tracking function. Oops… I wonder whether anyone has been (not) looking at Journal use indicators lately (or indeed, ever…?!)
Where is Journal Traffic Coming From (source location)?
So what sites are referring traffic to the journals area?
Well it looks as if there’s a lot of direct traffic coming in (so it may be worth looking at the network location report to see if we can tunnel into that), but there’s also a good chunk of traffic coming from the VLE (learn.open.ac.uk). It’d be handy to know which courses were sending that traffic, so we’ll just bear that in mind as a question for a later post.
Where is Journal Traffic Coming From (network locations)?
To get a feel for how much of the traffic to the journals “homepage” is coming from on campus (presumably OU researchers?) we can segment the report for the journals homepage according to network location.
The open university network location corresponds to traffic coming in from an OU domain. This report potentially gives us the basis for an “actionable” report, and maybe even a target… That is, to increase the number of page views (if not the actual proportion of traffic from on campus – we may be wanting to grow absolute traffic numbers from the VLE too) from the OU domain, as a result of increasing the number of researchers looking up journals from the Library journals homepage whilst at work on campus.
At this point, it’s probably a good a time as any to start to think about how we might interpret data such ‘number of pages per visit’, ‘average time on site’ and bounce rate (see here for some definitions).
Just looking at the numbers going across the columns, we can see that there are different sorts of groupings of the numbers.
ip pools and open university have pages/visit around 12, an average time on site tending towards 4 minutes, about 16% new visits in the current period (down from 36% in the first period, so people keep coming back to the site, which is good, though it maybe means we’re not attracting so many new visitors), and a bounce rate a bit less than 60%, down from around 70% in the earlier period (so fewer people are entering at the journals page and then leaving the site immediately).
Compare this to addresses ip for home clients and greenwich university reports, where there are just over 1 page per visit, only a few seconds on site, hardly any new visits (which I don’t really understand?) and a very high bounce rate. These visitors are not getting any value from the site at all, and are maybe being misdirected to it? Whatever the case, their behaviour is very different to the open university visitors.
Now if I was minded to, I’d run this data through a multidimensional clustering algorithm to see if there were some well defined categories of user, but I’m not in a coding mood, so maybe we’ll just have a look to see what patterns are visually identifiable in the data.
So, taking the top 20 results from the most recent reporting period shown above, lets upload it to Many Eyes and have a play (you can find the data here).
First up, let’s see if we can spot trending in time on site and pages/visit (which is exactly what we’d expect, of course) (click through the image to see the interactive visualisation on Many Eyes):
Okay – so that looks about right; and the higher bounce rates seem to be correspond to low average time on site/low pages per visit, which is what we’d expect too. (Note that by hovering over a data point, we can see which network location the data corresponds to.)
We can also see how the scatterplot gives us a way of visualising 3 dimensions at the same time.
If we abuse the histogram visualisation, we have an easy way of looking at which network locations have a high bounce rate, or time on site (a visual equivalent of ‘sort on column’, I guess? ;-)
Finally, a treemap. Abusing this visualisation gives us a way of comparing two numerical dimensions at the same time.
Note that using network location here is not necessarily that interesting as a base category… I’m just getting my eye in as to what Many Eyes visualisations might be useful! For the really interesting insights, I reckon a grand or two per day, plus taxes and expenses, should cover it ;-) But to give a tease, here’s the raw data relating to the Source URLs for trafifc that made it to the Journals area:
Can you see any clusters in there?! ;-)
Okay – enough for now. Take homes are: a) the wrong tracking function is being used on the journals page; b) the VLE is providing a reasonable an amount of traffic to the journals area of the Library website, though I haven’t identified (yet!) exactly which courses are sourcing that traffic; c) Many Eyes style visualisations may provide a glanceable, visual view over some of the Google Analytics data.