One of the things I fondly remember about doing physics at school was being told, at the start of each school year, how what we had been told the previous year was not quite exactly true, and that this year we would actually learn how the world worked properly…
And so as this series of posts about “Library Analytics” continues (that is, this series about the use of web analytics as applied to public Library websites), I will continue to show you examples of headline reports I have found initially compelling (or not), and then show why they are not quite right, and are actually confusing at best, or misleading at worst…
Most Popular Journals
In the previous post in this series, we saw the most popular databases that were being viewed from the databases page. Is the same possible from the journals area? A quick look at the report for the find/journals/journals page suggests that such a report should be possible, but something is broken:
From the small amount of data there, the most popular journals/journal collections were as follows:
- JSTOR (271892);
- Academic Search Complete (403673);
- Blackwell Synergy (252307);
- ACM Digital Library (208448);
- IEEE Xplore (208545);
As with the databases, segmenting the traffic visiting these collections may provide insight as to which category of user (researcher, student) and maybe even which course is most active for a particular collection.
But what happened to the reporting anyway? Where has all the missing data gone?
I just had a quick look – the reporting from within the Journals area doesn’t currently appear to be showing anything…. err, oops?
The tracking code on the journals page doesn’t have the same structure as the working code on the databases pages (which, you may recall from the previous post in this series, uses the tracking function pageTracker._trackPageview).
Looking at which tracking script is being used on the journals page (google-analytics.com/ga.js), I think the pageTracker._trackPageview function should be being used. urchinTracker is a function from the older urchin.js tracking script, not from ga.js. Oops… I wonder whether anyone has been (not) looking at Journal use indicators lately (or indeed, ever…?!)
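If you wanted to check a page for this sort of mismatch automatically, something like the following might do as a first pass. It is only a sketch: the sample markup, link paths and helper names are invented for illustration (they are not copied from the Library site), and the regular expression only handles the simple single-quoted call style.

```python
import re

# Hypothetical markup, invented for illustration; not copied from the Library site.
SAMPLE_HTML = """
<script src="http://www.google-analytics.com/ga.js"></script>
<a href="http://example.org/a" onclick="urchinTracker('/find/journals/journals/example-a');">A</a>
<a href="http://example.org/b" onclick="pageTracker._trackPageview('/find/journals/journals/example-b');">B</a>
"""

# Matches urchinTracker('...') with a single-quoted argument only.
LEGACY_CALL = re.compile(r"urchinTracker\(\s*'([^']*)'\s*\)")


def find_legacy_calls(html):
    """Return the paths passed to urchinTracker() on a page that loads ga.js."""
    if "google-analytics.com/ga.js" not in html:
        return []  # the page does not load ga.js, so there is no mismatch to flag
    return LEGACY_CALL.findall(html)


def suggest_fix(html):
    """Rewrite legacy calls into the ga.js-style pageTracker call."""
    return LEGACY_CALL.sub(
        lambda m: "pageTracker._trackPageview('%s')" % m.group(1), html
    )


print(find_legacy_calls(SAMPLE_HTML))
```

The suggested rewrite only swaps the function name; it assumes the page already creates a pageTracker object, as the databases pages appear to.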
Where is Journal Traffic Coming From (source location)?
So what sites are referring traffic to the journals area?
Well it looks as if there’s a lot of direct traffic coming in (so it may be worth looking at the network location report to see if we can tunnel into that), but there’s also a good chunk of traffic coming from the VLE (learn.open.ac.uk). It’d be handy to know which courses were sending that traffic, so we’ll just bear that in mind as a question for a later post.
Where is Journal Traffic Coming From (network locations)?
To get a feel for how much of the traffic to the journals “homepage” is coming from on campus (presumably OU researchers?) we can segment the report for the journals homepage according to network location.
The open university network location corresponds to traffic coming in from an OU domain. This report potentially gives us the basis for an “actionable” report, and maybe even a target… That is, to increase the number of page views (if not the actual proportion of traffic from on campus – we may be wanting to grow absolute traffic numbers from the VLE too) from the OU domain, as a result of increasing the number of researchers looking up journals from the Library journals homepage whilst at work on campus.
At this point, it’s probably as good a time as any to start to think about how we might interpret data such as ‘number of pages per visit’, ‘average time on site’ and ‘bounce rate’ (see here for some definitions).
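To make those three measures a little more concrete, here is a minimal sketch of how they could be computed from a list of visits. The visit records are made up, and I am assuming the simple definitions: a bounce is a visit that views exactly one page, and the averages are taken over all visits. Google Analytics may differ in the details.

```python
# Each visit is (pages_viewed, seconds_on_site); the values are made up for illustration.
visits = [(1, 0), (1, 0), (12, 240), (6, 120)]


def summarise(visits):
    """Return pages/visit, average time on site (seconds) and bounce rate (0..1)."""
    n = len(visits)
    if n == 0:
        raise ValueError("need at least one visit")
    pages_per_visit = sum(pages for pages, _ in visits) / n
    avg_time_on_site = sum(seconds for _, seconds in visits) / n
    # Assumption: a bounce is a visit that viewed exactly one page.
    bounce_rate = sum(1 for pages, _ in visits if pages == 1) / n
    return pages_per_visit, avg_time_on_site, bounce_rate


pages, seconds, bounce = summarise(visits)
print(pages, seconds, bounce)
```

Note how a couple of single-page visits drag all three numbers in the "bad" direction at once, which is why these columns tend to move together in the reports.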
Just looking at the numbers going across the columns, we can see that there are different sorts of groupings of the numbers.
ip pools and open university have pages/visit around 12, an average time on site tending towards 4 minutes, about 16% new visits in the current period (down from 36% in the first period, so people keep coming back to the site, which is good, though it maybe means we’re not attracting so many new visitors), and a bounce rate a bit less than 60%, down from around 70% in the earlier period (so fewer people are entering at the journals page and then leaving the site immediately).
Compare this to addresses ip for home clients and greenwich university reports, where there are just over 1 page per visit, only a few seconds on site, hardly any new visits (which I don’t really understand?) and a very high bounce rate. These visitors are not getting any value from the site at all, and are maybe being misdirected to it? Whatever the case, their behaviour is very different to the open university visitors.
Now if I was minded to, I’d run this data through a multidimensional clustering algorithm to see if there were some well defined categories of user, but I’m not in a coding mood, so maybe we’ll just have a look to see what patterns are visually identifiable in the data.
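For what it’s worth, the sort of thing I have in mind might look like the sketch below: a bare-bones k-means over pages/visit, time on site and bounce rate. The rows are illustrative placeholders rather than the real report figures, the location labels are hypothetical, and the choice of k-means (with k=2 and evenly spaced starting centroids) is just an assumption to keep the example short and repeatable.

```python
import math

# Illustrative rows only: (label, pages/visit, seconds on site, bounce rate %).
rows = [
    ("location A", 12.0, 240.0, 58.0),
    ("location B", 11.0, 220.0, 60.0),
    ("location C", 1.1, 5.0, 95.0),
    ("location D", 1.2, 8.0, 92.0),
]


def normalise(points):
    """Rescale each column to the 0..1 range so no single metric dominates."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans)) for p in points]


def kmeans(points, k, iterations=20):
    """Plain k-means with evenly spaced starting centroids (deterministic)."""
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


features = normalise([r[1:] for r in rows])
labels = kmeans(features, k=2)
for (name, *_), label in zip(rows, labels):
    print(name, "-> cluster", label)
```

The normalisation step matters here: without it, time on site (measured in seconds) would swamp the other two columns.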
So, taking the top 20 results from the most recent reporting period shown above, let’s upload them to Many Eyes and have a play (you can find the data here).
First up, let’s see if time on site and pages/visit trend together (which is exactly what we’d expect, of course) (click through the image to see the interactive visualisation on Many Eyes):
Okay – so that looks about right; and the higher bounce rates seem to correspond to low average time on site/low pages per visit, which is what we’d expect too. (Note that by hovering over a data point, we can see which network location the data corresponds to.)
We can also see how the scatterplot gives us a way of visualising 3 dimensions at the same time.
If we abuse the histogram visualisation, we have an easy way of looking at which network locations have a high bounce rate, or time on site (a visual equivalent of ‘sort on column’, I guess? ;-)
Finally, a treemap. Abusing this visualisation gives us a way of comparing two numerical dimensions at the same time.
Note that using network location here is not necessarily that interesting as a base category… I’m just getting my eye in as to what Many Eyes visualisations might be useful! For the really interesting insights, I reckon a grand or two per day, plus taxes and expenses, should cover it ;-) But to give a tease, here’s the raw data relating to the Source URLs for traffic that made it to the Journals area:
Can you see any clusters in there?! ;-)
Okay – enough for now. Take homes are: a) the wrong tracking function is being used on the journals page; b) the VLE is providing a reasonable amount of traffic to the journals area of the Library website, though I haven’t identified (yet!) exactly which courses are sourcing that traffic; c) Many Eyes style visualisations may provide a glanceable, visual view over some of the Google Analytics data.
5 thoughts on “Library Analytics (Part 4)”
It might be interesting to compare database and journal usage as shown from the Google Analytics work with the figures from the data providers to see if the most popular links followed from the library pages correspond to the most used resources based on search/download stats?
“It might be interesting to compare database and journal usage as shown from the Google Analytics work with the figures from the data providers to see if the most popular links followed from the library pages correspond to the most used resources based on search/download stats?”
Yes – agreed; I want to see if we can start to cross-reference different reports, such as Library website via GA, course websites via GA, reports back to the Library from commercial services.
I’m also looking for other people who are using Google Analytics in a Library or course/VLE context, to see if we can come up with some sensible, grounded reports with measures that mean something and are actionable in some way… and also swap stories about how to get ‘the management’ to value appropriate analytics reports rather than discount them or be afraid of them…
Here you find some more of the perils of data. Firstly, there are users linking directly to resources from course websites, thereby using library resources without going via the library website. The bigger problem though is that there are lots of publishers measuring their own data in their own ways. Claire Grace has all the stats that are relevant in this area, but I think the picture is only partial.
“but I think the picture is only partial.”
I agree; so the question here is – is a partial picture better than none? Or do we need to look at the different partial pictures to get a better feel for what’s going on, and look for ways we can find better ways of tracking users?
The Library website stats don’t tell you much about how much activity is going on on the commercial databases, unless we try to reconcile data back by passing tracking codes into those databases (or looking at timestamps on logs; or giving users browser extensions to track their behaviour across multiple sites).
But the Library stats do tell you how the Library website is being used…
I have asked the Library in the past whether they would ideally educate people in the discovery and use of external sources so well that people only ever need to visit the Library website once; or do they want people to keep going back to the Library website to access resources from there? But I couldn’t really care less.. it’s not my site ;-)
That is, do we want the Library site to be a destination site, or to be used as a stepping stone once, after which it’s done its job?
If I know I want to search for science articles, but don’t know how, maybe the ideal is that my first thought is to go to the Library; and once I’ve found ScienceDirect or whoever, maybe the ideal is that I never have to go to the Library website again to do that sort of search – because I know where to go direct?
The ultimate success then is only having new users on the Library site, and very few returning visitors…! (In this case, maybe the Library needs to make gadgets/widgets/bookmarks/bookmarklets available that make it easier for people to go direct, maybe with a bit of tracking added in there…?)