Searching for Curriculum Development Course Insights

For almost as long as I can remember (?! e.g. Search Powered Predictions), I’ve had the gut feeling that one of the most useful indicators of the courses our students want to study is their search behaviour: both the searches that drive (potential) students to the OU courses and qualifications website from organic search listings, and their search behaviour whilst on the OU site, and whilst floundering around within the courses and quals minisite.

A quick skim through our current strategic priorities doc (OU Futures 2008 (internal only), though you can get a flavour from the public site: Open University Strategic Priorities 2007) suggests that there is increased interest in making use of data, for example as demonstrated by the intention to develop a more systematic approach for new curriculum developments, such that the student market, demography and employment sectors are the primary considerations.

So, to give myself something to think about over the next few days/weeks, here’s a marker post about what a “course search insights” tool might offer, inspired in part by the Google Youtube Insights interface.

So, using Youtube Insight as a starting point, let’s see how far we can get…

First off, the atom is not a Youtube video, it’s a course, or to be more exact, a course page on the courses and quals website… Like this page for T320 Ebusiness technologies: foundations and practice for example. The ideas are these: what might an “Insight” report look like for a course page such as this, how might it be used to improve the discoverability of the page (and improve appropriate registration conversion rates), and how might search behaviour inform curriculum development?

Firstly, it might be handy to segment the audience reports into six:

  • people hitting the page from an organic search listing;
  • people hitting the page from an internal (OU search engine) search listing;
  • people hitting the page from an ‘organic’ link on a third party site (e.g. a link to the course page from someone’s blog);
  • people hitting the page from an external campaign/adword etc on a search engine;
  • people hitting the page from any other campaign (banner ads etc);
  • the rest…
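As a sketch of how those segments might be derived from raw referrer data (the search engine hostnames, the `utm_medium` campaign tagging and the example URLs below are illustrative assumptions, not the OU’s actual setup):

```python
from urllib.parse import urlparse, parse_qs

# Illustrative assumptions: a handful of search engine hosts, and utm_medium
# campaign tagging on landing URLs; real lists would be more complete.
SEARCH_ENGINES = {"www.google.com", "www.google.co.uk", "search.yahoo.com", "www.bing.com"}

def classify_referrer(page_url, referrer):
    """Assign a course page hit to one of the audience segments above."""
    if not referrer:
        return "direct/other"
    qs = parse_qs(urlparse(page_url).query)
    if "utm_medium" in qs:  # campaign-tagged landing URL
        return "search campaign" if qs["utm_medium"] == ["cpc"] else "other campaign"
    host = urlparse(referrer).netloc
    if host.endswith("open.ac.uk"):
        return "internal search" if "search" in referrer else "internal link"
    if host in SEARCH_ENGINES:
        return "organic search"
    return "organic third-party link"
```

Google Analytics does a version of this bucketing for you in its traffic sources reports, but rolling your own makes the segment boundaries explicit.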

For the purposes of this post, I’ll just focus on the first two, search related, referrers… (and maybe the third – ‘organic’ external links). What would be good to know, and how might it be useful?

First off, a summary report of the most popular search terms would be handy:

– The terms used in referrers coming from external organic search results give us some insight into the way that the search engines see the page – and may provide clues relating to how to optimise the page so as to ensure we’re getting the traffic we expect from the search engines.

– The terms used within the open.ac.uk search domain presumably come from (potential) students who have gone through at least one micro-conversion, in that they have reached, and stayed in, the OU domain. Given that we can (sometimes) identify whether users are current students (e.g. they may be logged in to the OU domain as a student) or new to the OU, there’s a possibility of segmenting here between the search terms used to find a page by current students, and new prospects.
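A first cut at that search term summary can be pulled straight out of the referrer query strings; as a sketch (the query parameter names differ across engines, so the small set below is a simplification):

```python
from urllib.parse import urlparse, parse_qs
from collections import Counter

# The query-string parameter carrying the search phrase varies by engine
# ("q" for Google/Bing, "p" for Yahoo); this set is a simplification.
QUERY_PARAMS = {"q", "p", "query"}

def search_terms(referrer):
    """Pull the search phrase out of a search-engine referrer URL, if any."""
    qs = parse_qs(urlparse(referrer).query)
    for param in QUERY_PARAMS:
        if param in qs:
            return qs[param][0].lower()
    return None

def top_terms(referrers, n=10):
    """Summarise the most popular search phrases across a list of referrers."""
    counts = Counter(t for t in map(search_terms, referrers) if t)
    return counts.most_common(n)
```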

(Just by the by, I emailed a load of OU course team chairs a month or two ago about what search terms they would expect potential students to use on Google (or on the OU search engine) to find their course page on the courses and quals site. I received exactly zero responses…)

The organic/third party incoming link traffic can also provide useful insight as to how courses are regarded from the outside – an analysis of link text, and maybe keyword analysis of the page containing the link, can provide us with clues about how other people are describing our courses (something which also feeds into the way that the search engines will rank our course pages; inlink/backlink analysis can further extend this approach). I’m guessing there’s not a lot of backlinking out there yet (except maybe from professional societies?), but if and when we get an affiliate scheme going, this may be one to watch…?

So that’s one batch of stuff we can look at – search terms. What else?

As a distance learning organisation, the OU has a national reach (and strategically, international aspirations), so a course insight tool might also provide useful intelligence about the geographical location of users looking at a particular course. Above average numbers of people reading about a course from a particular geo-locale might provide evidence about the effectiveness of a local campaign, or even identify a local need for a particular course (such as the opening or closure of a large employer).
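One crude way of flagging such geo hotspots from per-region pageview counts (a sketch only – the region granularity and the threshold are arbitrary choices):

```python
def geo_hotspots(visits_by_region, threshold=2.0):
    """Flag regions whose share of course page views is well above the
    mean share - a crude signal of a local campaign effect or local need."""
    total = sum(visits_by_region.values())
    mean_share = 1 / len(visits_by_region)
    return [region for region, views in visits_by_region.items()
            if (views / total) > threshold * mean_share]
```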

The Youtube Insight reports show how, as the Google monster gets bigger, it knows more and more about us (I’m thinking of the Youtube Insight age demographic/gender report here). So providing insight about the gender split and age range of people viewing a course may be useful (we can find this information out for registered users – incoming users are rather harder to pin down…), and may provide further insight when these figures are compared to the demographics of people actually taking the course, particularly if the demographic of people who view a course on the course catalogue page differs markedly from the demographics of people who take the course…

(Notwithstanding the desire to be an “open” institution, I do sometimes wonder whether we should actually try to pitch different courses at particular demographics, but I’m probably not allowed to say things like that…;-)

As well as looking at search results that appear to provide satisfactory hits, it’s also worth looking at the internal searches that don’t get highly relevant results. These searches might indicate weak optimisation of pages – appropriate search terms don’t find appropriate course pages – or they might identify topics or courses that users are looking for that don’t exist in the current OU offerings. Once again, it’s probably worth segmenting these unfulfilled/unsatisfactory searches according to new prospects and current students (and maybe even going further, e.g. by trying to identify the intentions of current students by correlating their course history with their search behaviour, we may gain insight into emerging preferences relating to free choice courses within particular degree programmes).
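As a sketch of how that unfulfilled-search report might be pulled together, assuming we can get at the internal search engine’s logs in a (hypothetical) query/result-count/segment form:

```python
from collections import Counter

def unfulfilled_searches(search_log, min_results=1):
    """Rank the internal searches that returned too few results.

    search_log: iterable of (query, results_returned, user_segment) tuples -
    a hypothetical schema for the site search engine's logs."""
    misses = Counter()
    for query, n_results, segment in search_log:
        if n_results < min_results:
            misses[(segment, query.lower())] += 1
    return misses.most_common()
```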

To sum up… Search data is free, and may provide a degree of ‘at arms length’ insight about potential students before we know anything about them ‘officially’ by virtue of them registering with us, as well as insight relating to emerging interests that might help drive curriculum innovation. By looking at data analysis and insight tools that are already out there, we can start to imagine what course insight tools might look like: tools that mine the wealth of free search data we collect on a daily basis, and turn it into useful information that can help improve course discovery and conversion, and feed into curriculum development.

Library Analytics (Part 1)

Having had a wonderful time at ILI2007 last year (summary of my talk, according to Brian Kelly – “For most of the people, most of the time, Google’s good enough – get over it…”, though I like to think I was actually talking about the idea of search hubs), I’ve joined forces with Hassan Sheikh from the OU Library on a paper at this year’s ILI2008 on the topic of using Google Analytics to track user behaviour on the Library website…

First up, it’s probably worth pointing out the unique organisation of the OU, because this impacts on the way the Library website is used.

The OU is a distance learning organisation with tens of thousands of active, offsite students; a campus, which is home to teaching academics (course writers), researchers, “academic related” services (software developers, etc.), and administrators; several regional offices; and part-time Associate Lecturers (group tutors), who typically work from home, although they may also work full- or part-time for other educational institutions.

The Library is a “trad” Library, in that it is home to books and a physical journal collection (as well as an OU course materials archive and several other collections) that are typically used by on-campus academics and researchers. The Library has also been quite go-ahead in obtaining online access to journal, ebook, image and reference collections – online access means that these services can be delivered to our student body (whereas the physical collections are used in the main by OU academic and research staff…. I assume…!;-)).

Anyway, to ease myself back into thinking about “Library Analytics”, (I haven’t looked at the Library stats for several months now), here are some warm-up exercises/starting point observations I made, for whatever they’re worth… (i.e. statements of the bleedin’ obvious;-)

Firstly, can we segment users into onsite and offsite users? (I’m pretty sure Hassan was running separate reports for these different groups, but if so, I don’t have access to them…)

Even from just the headline report, it appears that a ‘just about significant’ amount of traffic is coming from the intranet.

Just to get my eye in, is this traffic coming from the OU campus at Walton Hall? If we look at the intranet as the traffic source, and segment according to the Network Location of the user (that is, the IP network they’re on), we can see that the traffic is predominantly local:

By the by, if I’m reading the following report correctly, we can also see that most of the intranet traffic is incoming from the intranet homepage…

And as you might expect, this traffic comes on weekdays…

So here’s a working assumption then (and one that we could probe later for real insight in any principled cases where it doesn’t hold true!): most referrals from the OU intranet occur Monday to Friday, from onsite users, via the intranet homepage.
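That assumption can be probed in a more principled way if we know the campus IP ranges; a minimal sketch (the address block below is a placeholder – the real Walton Hall ranges would come from the network team):

```python
import ipaddress

# Placeholder campus address block - substitute the real campus ranges.
CAMPUS_NETWORKS = [ipaddress.ip_network("137.108.0.0/16")]

def on_campus(ip):
    """True if the visitor's IP falls inside one of the campus networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CAMPUS_NETWORKS)
```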

Secondly, how well is the Library front page working? Whilst not as quick to read as a heat map, the Google Analytics site overlay can provide a quick way of summarising the most popular links on a page (notwithstanding its faults, such as appearing not to disambiguate certain links…)

A quick glimpse suggests the search links need dumping, and more real estate should be given over to the “Journals” and “Databases” links that are currently in the left hand sidebar, and which get 20% and 19% of the click-thrus respectively. Despite the large areas of the screen given over to the image-based navigation, they aren’t pulling much traffic. (That said, if we segment the users it might well be the case that the images in the middle of the page disproportionately attract clicks from certain sorts of user? I don’t think it’s possible to segment this out in the general report, however? For that, I guess we need to define some separate reports that are pre-segmented according to referrer?)

Just chasing the traffic a little more, I wonder if there are a few, popular databases or whether traffic is distributed over all of them equally? The Library databases page is pretty horrible – a long alphabetical list of databases – so can the analytics suggest ways of helping people find the pages they want?

So how are things distributed?

Well – it seems like some databases are more popular than others… but just how true is that observation…?

Let’s do a bit more drilling to see what people are clicking through to from the databases pages… I have to admit that here I start to get a bit confused, because the analytics are giving me two places where databases are being reached from, whereas I can only find one of the paths on the website…

Here’s the one I can find – traffic from:
http://library.open.ac.uk/find/databases/index.cfm:

And here’s what I can’t find on the website – traffic from:
http://library.open.ac.uk/databases/database/:

They both identify the same databases as most popular, though which databases those are I’ll leave for another day… because as you’ll see in a minute, this might be false popularity…

Why? Well let’s just see where the traffic for one of the most popular databases is coming from over the sample period I’ve been playing with:

Any idea why the traffic isn’t coming from the OU, but is coming from other HEIs???

Well, I happen to know that Bath, Brighton and Durham are used for OU residential schools, so I suspect that residential school students, after a reminder about the OU online Library services, are having a play, and maybe even participating in some information literacy activities that the OU Library trainers (as well as some of the courses) run at residential school…

Data – don’t ya just love it…? ;-) It sets so many traps for you to fall into!

OU Library Jobs Round-Up (August 2008)

As I feel a flurry of Library related posts coming on, it’s perhaps appropriate to drop the following post in as something I can repeatedly link to over the next week or two (I live in hope that the OUseful.info blog will actually work one day as an OU job ad channel!) – a handful of OU Library jobs:

  • Access to Video Assets Project Manager, The Library and Learning Resource Centre: This is a superb opportunity to join a proactive world class Library service and provide leadership and excellent project management skills for an innovative digitisation project based at The Open University, Milton Keynes. The Access to Video Assets (AVA) project has been funded by The Open University to deliver a searchable collection of broadcast archives material for use in learning, teaching and research.
    You will report to the Learning Resources Development Manager based in The Open University Library and will be responsible for leading a small team consisting of the AVA Project Technical Manager and a Project Administrator to deliver the project’s objectives.
  • Access to Video Assets Technical Project Manager, The Library and Learning Resource Centre: This is a superb opportunity to join a proactive world class Library service and provide excellent technical project management skills for an innovative digitisation project based at The Open University, Milton Keynes. The Access to Video Assets (AVA) project has been funded by The Open University to deliver a searchable collection of broadcast archives material for use in learning, teaching and research.
    You will report directly to the Access to Video Assets Project Manager based in The Open University Library and be responsible for delivering a technical business case, a pilot repository for broadcast archive material and provide key input into a service implementation plan for this significant digitisation project.
  • Digital Libraries Programme Manager: Do you have the vision, creativity and project management skills to lead a programme of digital library developments for 2013?
    We are looking for a dynamic and highly motivated individual with an up to date knowledge of digital library technologies and their potential, rather than hands-on technical expertise, to manage the development of a range of new and exciting services for our students and staff. You will have excellent team working and communication skills and enjoy working with change and challenge.
  • E- Content Advisor, The Library and Learning Resource Centre: The Library’s use of electronic collections is expanding to meet the needs of students and staff for their learning, teaching and research. You will play a key role in developing operational support for the Library’s expanding range of subscriptions and electronic resources as well as co-ordinating the activities associated with purchasing and providing access to electronic content. You will provide support to the strategic development of electronic resources in line with the Library’s aims and objectives.
    A graduate in Librarianship/Information Studies or with equivalent relevant work experience, you will have good working knowledge of issues concerning the acquisition and delivery of electronic and print resources. You will also be able to demonstrate an understanding of technical issues concerning the delivery of electronic content to distance users. A flexible attitude, excellent communication skills, confidence and initiative with the ability to originate solutions are essential.

(I wonder if the video archiving project is DIVA, mark 2?!)

As ever, none of the above posts have anything to do with me…

Special Interest Custom Search Engines

A recent post by Downes (PubMed Now Indexes Videos of Experiments and Protocols in Life Sciences) reminded me of a Google custom search engine I started to put together almost a year or so ago to provide a meta-search over science experiment protocols.

At the time, I managed to track three likely sites down, although despite my best intentions when I created the initial CSE, I haven’t managed even cursory maintenance of the site.

Anyway, for what it’s worth, here’s a link to my Science Experimental Protocols Video Search (a search for DNA will show you what sorts of results are typical). If you know of any other sites that publish scientific experimental protocols, please feel free to post a link in the comments to the post.

Another custom search engine I started looking at at the start of this year, inspired by a conversation with a solicitor friend over New Year, was a search of UK (English and Scottish) legislation. The intention here was to come up with a CSE that could provide a value adding vertical search site to a legal website. If I remember correctly (?!;-) the CSE only took an hour or so to pull together, so even though we never pursued embedding it on a live website, it wasn’t really that much time to take out…

If you want to check it out, you can find it here: LegalDemo.

One CSE I do maintain is “How Do I?”, a metasearch engine over instructional video websites. There are almost as many aggregating websites of this ilk as there are sites publishing original instructional content, but again, it didn’t take long to pull together, and it’s easy enough to maintain. You can find the search engine here: “How Do I?” instructional video metasearch engine, and a brief description of its origins here: “How Do I…” – Instructional Video Search.

Another 10 minute CSE I created, this time following a comment over a pint about the “official” OpenLearn search engine, was an OpenLearn Demo CSE (as described here: OpenLearn Custom Search).

And finally (and ignoring the other half-baked CSEs I occasionally dabble with), there’s the CSE I’ve been doodling with most recently: the OUseful search engine (I need to get that sorted on a better URL..). This CSE searches over the various blogs I’ve written in the past, and those I write on at the moment. If you want to search over posts from the original incarnation of OUseful.info, this is one place to do it…

Just looking back over the above CSEs, I wonder again about whose job it is (if anyone’s) to pull together and maintain vertical search engines in an academic environment, or show students how they can create their own custom search engines? (And one level down from that, whose role is it to lead the teaching of the “search query formulation” information skill?)

In the OU at least, the Library info skills unit have been instrumental in engaging with course teams to develop information literacy skills, as well as leading the roll out of Beyond Google… but I have to admit, I do wonder just how well equipped they are to help users create linked data queries, SPARQL queries, or SQL database queries containing a handful of joins? (I also wonder where we’re teaching people how to create pivot tables, and the benefits of them…?!)

Thinking about advanced queries, and the sighs that go up when we talk about how difficult it is to persuade searchers to use more than two or three keyword search terms, I’ve also been wondering what the next step in query complexity is likely to be after the advanced search query. And it strikes me that the linked data query is possibly that next step?

Having introduced the Parallax Freebase interface to several people over the last week, it struck me that actually getting the most out of that sort of interface (even were Freebase populated enough for more than a tiny minority of linked queries to actually work together) is not likely to be the easiest of jobs, particularly when you bear in mind that it’s only a minority of people who know how to even conceptualise advanced search queries, let alone know how to construct them at a syntactic level, or even via a web form.

The flip side to helping users create queries is of course helping make information amenable to discovery by search, as Lorcan Dempsey picks up on in SEO is part of our business. Here again we have maybe another emerging role for …. I don’t know…? The library? And if not the library, then whom?

(See also: The Library Flip, where I idly wondered whether the academic library of the future-now should act “so as to raise the profile of information it would traditionally have served, within the search engine listings and at locations where the users actually are. In an academic setting, this might even take the form of helping to enhance the reputation of the IP produced by the institution and make it discoverable by third parties using public web search engines, which in turn would make it easy for our students to discover OU Library sponsored resources using those very same tools.”)

PS Just a quick reminder that there are several OU Library job vacancies open at the moment. You can check them out here: OU Library Jobs Round-Up (August 2008).

Library Analytics (Part 2)

In Library Analytics (Part 1), I did a few “warm-up” exercises looking at the OU Library website from a Google Analytics perspective.

In this post, I’m going to do a little more scene setting, looking at how some of Google Analytics visual reports can provide a quick’n’dirty, glanceable display of the most heavily trafficked areas of the Library website, along with the most significant sources of traffic.

It may seem like the observations are coming from all over the place, but there is a method to the madness as I’ll hopefully get round to describing in a later post!

Whilst these reports are pretty crude, they do provide a good starting point for taking the first steps in a series of more refined questions which, in turn, will hopefully start to lead us towards some sort of insight about which areas of the website are serving which users and maybe even for what purpose… And as that rather clunky sentence suggests, this is likely to be quite a long journey, with the likelihood of more than a few wrong turns!

Most Popular Pages
Here’s a glimpse of the most heavily trafficked pages for the Library website – just to check there are no ‘artefacts’ arising from things like residential schools, I’ve compared the data for two consecutive two month periods (the idea being that if the bimonthly averages are similar, we can hope that this is a reasonably fair ‘steady state’ report of the state of the site).

Most significant pages (“by eye” – that is, using the pie chart display):

To view the proportions excluding the homepage, we can filter the report using a regular expression:

What this does is exclude the “/” page – that is, the library homepage. (IMHO, some understanding of regular expressions is a core information skill ;-)
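The same exclude-the-homepage filter, expressed outside Google Analytics – say, over a list of page paths exported from the report – is just a one-line regular expression:

```python
import re

# Page paths as they appear in the content report; "/" is the homepage.
pages = ["/", "/find/databases/", "/journals/", "/", "/find/eresources/"]

# Keep everything except the bare homepage path - in the GA report itself,
# the equivalent is an "exclude" filter with the pattern ^/$.
not_homepage = [p for p in pages if not re.fullmatch(r"/", p)]
```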

A bar graph allows us to compare the bimonthly figures – they seem to be reasonably correlated (we could of course do a rank correlation, or similar, to see if the top pages ordering is really the same…):
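For that rank correlation, Spearman’s rho is the obvious candidate; a minimal sketch over two lists of pageview counts (assuming no tied counts, which keeps the formula simple):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation between two equal-length lists of
    pageview counts (assumes no ties)."""
    def ranks(vals):
        # Rank 1 = largest value, matching a "top pages" ordering.
        order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A rho close to 1 would confirm that the two bimonthly periods rank the pages in much the same order.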

So to summarise – the top pages (homepage aside) are (from the URLs):

  • Journals
  • Databases
  • eResources

The eResources URL actually refers to the subject collection (“Online collections by subject”) page.

The top three pages are all linked to from the same navigation area on the OU Library website homepage – the left-hand navigation sidebar:

The eResources link (that is, the subject collections/online collections by subject page) is actually the Your subject link.

Going forward, a good next step would be to drill down a level and see which are the most popular databases, journals and resource collections, and maybe check that these are in line with Library expectations.

We might also want to explore the extent to which different user segments (students, researchers etc.) use the different areas of the site in similar or different ways. (Going deeper into the analysis (i.e. to a deeper level of user segmentation), we might even want to track the behaviour of students on different courses (or residential school maybe?) and report these findings back to the appropriate course team.)

Top Content areas
The previous report gave the top page views on the site – but what are the most heavily used “content areas”? The Library site is, in places, reasonably disciplined in its use of a hierarchical URL structure, so by using the content drilldown tool, we should be able to see which are the most heavily used areas of the website:

The “/find” page/path element is a bit of a kludge, really (as a note to self: explore the use of this page in some detail…)

If we drill down into the content being hit below http://library.open.ac.uk/find/*, we find that the eresources area (i.e. subject collections/Your subject) is actually a hotbed of activity:

(Note that in the above figure, the “/” refers to the “/find homepage” http://library.open.ac.uk/find/ rather than the Library website homepage http://library.open.ac.uk/.)

So what can we say? The front page is driving lots of traffic to database, journal and subject collection/”Your subject” areas, and lots of activity is going on in the subject area in particular.
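The drilldown view itself is easy to reproduce over exported page-path data – group pageview counts by the leading path segment(s):

```python
from collections import Counter

def drilldown(page_views, depth=1):
    """Aggregate pageview counts by the leading path segment(s),
    mimicking the GA content drilldown view.

    page_views: iterable of (path, views) pairs."""
    totals = Counter()
    for path, views in page_views:
        segments = [s for s in path.split("/") if s]
        key = "/" + "/".join(segments[:depth]) if segments else "/"
        totals[key] += views
    return totals.most_common()
```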

Questions we might want to bear in mind going forward – how well does activity in different subject areas compare?

Top traffic sources
Again using the pie chart display, we can look at the top traffic sources by eye:

Again, let’s just check (by eye) that the bi-monthly reports are “typical”:

(It’s interesting to see the College of Law cropping up in there… Do we run a course from a learning environment on that domain, I wonder?)

learn.open.ac.uk is the Moodle/VLE domain, so it certainly seems like traffic is coming in from there, which is a Good Thing:-). From the previous post, we can guess that most of the intranet traffic is coming from people onsite at the OU – i.e. they’re staff or researchers.

Just to check it’s the students that are coming in from the VLE, rather than OU staff, we can use the technique from the previous post in this series (where we found that most intranet sourced traffic is coming from the OU campus) to check the Network Location view of users referred from learn.open.ac.uk:

So, we can see that the learn.open.ac.uk traffic is in the main not coming from the OU campus (network location: open university), which is as we’d expect, because we have no significant numbers of onsite undergraduate students.

In a traditional university library, you’d maybe expect way more traffic to be coming from onsite computer facilities, and in that case you may be able to find a way of segmenting users according to how they are accessing the network – via personal wifi connected laptops, for example, or public access machines in the library itself.

(Just by the by, I don’t know whether the ISP data is valuable (particularly if you look at analytics from the http://www.open.ac.uk domain, which gets way more traffic than the library) in terms of being information we can sell to ISPs or use as the basis for exploring a partnership with a particular ISP?)

Okay, that’s enough for today, a bit of a ramble again, but we’re trying to get our eye in, right, and see what sorts of questions we might be able to ask, whilst checking along the way that the bleedin’ obvious actually is…;-)

And today’s insight? The inconsistency in naming around “Your Subject”, “Online Collections by Subject”, http://library.open.ac.uk/find/eresources etc makes reading the report tricky. This could be addressed by using a filter to rewrite the URLs etc as displayed in the report, but it also indicates possible confusion for users in the site design itself? There’s also a recurrence of the potential confusion around http://library.open.ac.uk/databases and http://library.open.ac.uk/find/databases, that I picked up on in the previous post?

A second insight? The content drilling view helps show where most of the onsite activity is taking place – in the collections by subject area.

Library Analytics (Part 3)

In this third post of an open-ended series looking at the OU Library website under Google Analytics, I’ll pick out some ‘headline’ reports that describe the most popular items in one of the most popular content areas identified in Library Analytics (Part 2): databases.

Most Popular Databases
I can imagine that a headline report that everyone will go “ooh” about (notwithstanding the fact that the report is more likely to be properly interesting when you start to segment out the possibly different databases being looked at by different user segments;-) is the list of “top databases” (produced by filtering the top content report page on URLs that contain the term “database”).

So how do we work out what those database URLs actually point to? Looking at the HTML of the http://library.open.ac.uk/find/databases page, here’s where the reference to the most popular database crops up:

<a title="This link opens up a new browser window" target="_blank" href="/find/databases/linking.cfm?id=337296" onClick="javascript:pageTracker._trackPageview('/databases/database/337296');">LexisNexis Butterworths</a>

The implied URL http://library.open.ac.uk/databases/database/337296 doesn’t actually go anywhere real… it’s an artefact created for the analytics tracking (though it does contain the all important internal OU Library database ID (337296 in this case)).

It is possible to create a set of ‘rewrite’ rules that will map these numerical database IDs onto the name of the database. Alternatively, I’m guessing that when the database collection page is written, the HTML could track against database name, rather than ID (e.g. using a construction along the lines of onClick="javascript:pageTracker._trackPageview('Database: LexisNexis Butterworths');").
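The first option – rewriting IDs to names when post-processing the report data – might be sketched like this (the lookup table is hypothetical, and would be scraped once from the databases page HTML):

```python
import re

# Hypothetical lookup table: internal database IDs seen in the tracked
# paths, mapped to human-readable names.
DATABASE_NAMES = {
    "337296": "LexisNexis Butterworths",
    "271892": "JSTOR",
}

def readable_path(tracked_path):
    """Rewrite '/databases/database/<id>' report rows to show the name;
    leave any other path untouched."""
    m = re.fullmatch(r"/databases/database/(\d+)", tracked_path)
    if m:
        return "Database: " + DATABASE_NAMES.get(m.group(1), "unknown id " + m.group(1))
    return tracked_path
```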

For now though, here’s a quick summary of the top 5 databases, worked out by code inspection!

  1. LexisNexis Butterworths (337296);
  2. JSTOR (271892);
  3. Westlaw (338947);
  4. PsycINFO (208607);
  5. Academic Search Complete (403673).

Just to show you what I mean by things being more interesting when you start to segment the most popular databases by identifiable traffic source, here’s a comparison of the referral source for users looking at Academic Search Complete (403673), PsycINFO (208607) and Westlaw (338947).

Firstly, Academic Search Complete:

In this case, there is a large amount of traffic coming from the intranet. Bearing in mind a comment on the first post, this traffic may be coming from personal bookmarks?

I may be in the error bar (i.e. an outlier), but I do almost all my research / library work at home – but I log into the OU and go onto the library via the “my links” bit set to the OU journals and OU databases www page. So that would show as an intranet user? but I work remotely.

I could be wrong of course – so that’s one question to file away for a later day…

Secondly, PsycINFO (208607), the Content Detail report for which is easily enough found by searching on the Content Detail report page:

Here’s the source of traffic that spends some time looking at PsycINFO:

Here, we find a different effect. Most of the identifiable traffic is coming from direct links or the VLE, and the intranet is nowhere to be seen.

Note however the large amount of direct/unidentifiable traffic – this could hide a multitude of sins (and mask a multitude of user origins), so we should just remain wary and open to the idea we may have been misled!

So how can we try to gain an insight into that direct referral traffic (the traffic that arises from people typing the URL directly into their browser, or clicking on a browser bookmark)?

Well, to check that the traffic isn’t coming from direct traffic/bookmarks from users on the OU network other than via the intranet, we can look at the Network Location segment:

No sign of open university there in any significant numbers – so it seems that PsycINFO is more of a student resource than an onsite researcher resource.

Thirdly, Westlaw (338947). Who’s using this database?

It seems here that the single largest referrer is actually the College of Law.

We can segment against network location just to check the direct traffic isn’t coming from users on campus via browser bookmarks:

But some of it is coming from the College of Law? Hmmm.. Could that be a VPN thing, I wonder, or do they have an actual physical location?

Summary
So what insight(s) have we picked up in this post? Firstly, a dodgy ranking of most popular databases (dodgy in that the databases appear to be used by different constituencies of user). Secondly, a crude technique for getting a feel for who the users of a particular database are, based on original source/referrer and network location segmentations.

I guess there’s also a recommendation – that the buyer or owner of each database checks out the analytics to see if the users appear to be who they expect…!

And finally, to wrap this part up, it’s worth being sceptical no matter what precautions you put in place when trying to interpret the results; for example: How Does Google Analytics Track Conversion Referrals?.

Library Analytics (Part 4)

One of the things I fondly remember about doing physics at school was being told, at the start of each school year, how what we had been told the previous year was not quite exactly true, and that this year we would actually learn how the world worked properly…

And so as this series of posts about “Library Analytics” continues (that is, this series about the use of web analytics as applied to public Library websites), I will continue to show you examples of headline reports I have found initially compelling (or not), and then show why they are not quite right, and actually confusing at best, or misleading at worst…

Most Popular Journals
In the previous post in this series, we saw the most popular databases that were being viewed from the databases page. Is the same possible from the journals area? A quick look at the report for the find/journals/journals page suggests that such a report should be possible, but something is broken:

From the small amount of data there, the most popular journals/journal collections were as follows:

  1. JSTOR (271892);
  2. Academic Search Complete (403673);
  3. Blackwell Synergy (252307);
  4. ACM Digital Library (208448);
  5. IEEE Xplore (208545).

As with the databases, segmenting the traffic visiting these collections may provide insight as to which category of user (researcher, student) and maybe even which course is most active for a particular collection.

But what happened to the reporting anyway? Where has all the missing data gone?

I just had a quick look – the reporting from within the Journals area doesn’t currently appear to be showing anything…. err, oops?

Looking at the javascript code on each journal link:
onClick="javascript:urchinTracker('/find/journals/journals/323808')"
it’s not the same structure as the working code on the databases pages (which you may recall from the previous post in this series uses the tracking function pageTracker._trackPageview).

Looking at which tracking script is being used on the journals page (google-analytics.com/ga.js), I think the pageTracker._trackPageview function should be being used. urchinTracker is a function from the older urchin.js tracking script. Oops… I wonder whether anyone has been (not) looking at Journal use indicators lately (or indeed, ever…?!)
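To make the fix concrete, here's one way it might look – a sketch only, assuming pageTracker has already been created elsewhere on the page with _gat._getTracker in the usual ga.js way. A shim like this would let the old inline handlers keep working while the markup is migrated:

```javascript
// The journals page still calls the legacy urchin.js function:
//   onClick="javascript:urchinTracker('/find/journals/journals/323808')"
// With ga.js loaded instead, the equivalent call is
// pageTracker._trackPageview. Defining urchinTracker as a shim that
// forwards to the new tracker means the existing onClick attributes
// don't all have to be rewritten at once.
function urchinTracker(path) {
  if (typeof pageTracker !== 'undefined') {
    pageTracker._trackPageview(path);
  }
  return path; // returned so the forwarded path is easy to check
}
```

The longer-term fix, of course, is to rewrite the onClick handlers to call pageTracker._trackPageview directly.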

Where is Journal Traffic Coming From (source location)?
So what sites are referring traffic to the journals area?

Well it looks as if there’s a lot of direct traffic coming in (so it may be worth looking at the network location report to see if we can tunnel into that), but there’s also a good chunk of traffic coming from the VLE (learn.open.ac.uk). It’d be handy to know which courses were sending that traffic, so we’ll just bear that in mind as a question for a later post.

Where is Journal Traffic Coming From (network locations)?
To get a feel for how much of the traffic to the journals “homepage” is coming from on campus (presumably OU researchers?) we can segment the report for the journals homepage according to network location.

The open university network location corresponds to traffic coming in from an OU domain. This report potentially gives us the basis for an “actionable” report, and maybe even a target: that is, to increase the number of page views from the OU domain (if not the actual proportion of traffic from on campus – we may be wanting to grow absolute traffic numbers from the VLE too), as a result of increasing the number of researchers looking up journals from the Library journals homepage whilst at work on campus.

At this point, it’s probably as good a time as any to start to think about how we might interpret data such as ‘number of pages per visit’, ‘average time on site’ and bounce rate (see here for some definitions).
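As a reminder of what those columns actually measure, here are toy definitions of the three metrics, computed over a made-up list of visit records (Google Analytics' own calculations differ in detail – for instance, time on site can't really be measured for single-page visits – so treat this as a sketch of the definitions, not of GA's implementation):

```javascript
// Toy metric definitions. Each visit record is of the form
// { pages: <number of pageviews>, seconds: <time on site> }.
function summarise(visits) {
  var totalPages = 0, totalSeconds = 0, bounces = 0;
  visits.forEach(function (v) {
    totalPages += v.pages;
    totalSeconds += v.seconds;
    if (v.pages === 1) bounces += 1; // a bounce is a single-page visit
  });
  return {
    pagesPerVisit: totalPages / visits.length,
    avgTimeOnSite: totalSeconds / visits.length,
    bounceRate: bounces / visits.length
  };
}
```

So a segment where almost every visit is a single page view will show pages/visit near 1, time on site near zero, and a bounce rate near 100% – exactly the pattern in some of the rows below.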

Just looking at the numbers going across the columns, we can see that there are different sorts of groupings of the numbers.

ip pools and open university have pages/visit around 12, an average time on site tending towards 4 minutes, about 16% new visits in the current period (down from 36% in the first period, so people keep coming back to the site, which is good, though it maybe means we’re not attracting so many new visitors), and a bounce rate a bit less than 60%, down from around 70% in the earlier period (so fewer people are entering at the journals page and then leaving the site immediately).

Compare this to the addresses ip for home clients and greenwich university reports, where we see just over 1 page per visit, only a few seconds on site, hardly any new visits (which I don’t really understand?) and a very high bounce rate. These visitors are not getting any value from the site at all, and are maybe being misdirected to it? Whatever the case, their behaviour is very different to that of the open university visitors.

Now if I was minded to, I’d run this data through a multidimensional clustering algorithm to see if there were some well defined categories of user, but I’m not in a coding mood, so maybe we’ll just have a look to see what patterns are visually identifiable in the data.

So, taking the top 20 results from the most recent reporting period shown above, let’s upload it to Many Eyes and have a play (you can find the data here).

First up, let’s see if we can spot a correlation between time on site and pages/visit (which is exactly what we’d expect, of course) (click through the image to see the interactive visualisation on Many Eyes):

Okay – so that looks about right; and the higher bounce rates seem to correspond to low average time on site/low pages per visit, which is what we’d expect too. (Note that by hovering over a data point, we can see which network location the data corresponds to.)

We can also see how the scatterplot gives us a way of visualising 3 dimensions at the same time.

If we abuse the histogram visualisation, we have an easy way of looking at which network locations have a high bounce rate, or time on site (a visual equivalent of ‘sort on column’, I guess? ;-)

Finally, a treemap. Abusing this visualisation gives us a way of comparing two numerical dimensions at the same time.

Note that using network location here is not necessarily that interesting as a base category… I’m just getting my eye in as to what Many Eyes visualisations might be useful! For the really interesting insights, I reckon a grand or two per day, plus taxes and expenses, should cover it ;-) But to give a tease, here’s the raw data relating to the Source URLs for traffic that made it to the Journals area:

Can you see any clusters in there?! ;-)

Summary
Okay – enough for now. Take homes are: a) the wrong tracking function is being used on the journals page; b) the VLE is providing a reasonable amount of traffic to the journals area of the Library website, though I haven’t identified (yet!) exactly which courses are sourcing that traffic; c) Many Eyes style visualisations may provide a glanceable, visual view over some of the Google Analytics data.

More Olympics Medal Table Visualisations

So the Olympics is over, and now’s the time to start exploring various views over the data tables in a leisurely way :-)

A quick scout around shows that the New York Times (of course) have an interactive view of the medals table, also showing a historical dimension:

Channel 4’s interactive table explores medal table ‘normalisation’ according to population, GDP and so on…

GDP and population data have also been taken into account in a couple of visualisations created on Many Eyes – like this one:

Not wanting to be left out of the fun, I spent a bit of time this evening scraping data from the Overall medal standing table and popping it into Many Eyes myself.
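The scrape itself is just a matter of pulling country names and medal counts out of the table rows. Here's a sketch of the sort of thing – note that the table markup shown is hypothetical (the real beijing2008.cn page would need its own cell pattern), though the sample figures are the actual 2008 tallies for China and the United States:

```javascript
// Pull { country, gold, silver, bronze } records out of simple table rows.
// The markup pattern here is a made-up simplification for illustration.
function parseMedalRows(html) {
  var rows = html.match(/<tr>[\s\S]*?<\/tr>/g) || [];
  return rows.map(function (row) {
    var cells = (row.match(/<td>(.*?)<\/td>/g) || []).map(function (c) {
      return c.replace(/<\/?td>/g, ''); // strip the cell tags
    });
    return { country: cells[0], gold: +cells[1], silver: +cells[2], bronze: +cells[3] };
  });
}

var sample = '<tr><td>China</td><td>51</td><td>21</td><td>28</td></tr>' +
             '<tr><td>United States</td><td>36</td><td>38</td><td>36</td></tr>';
```

From there it's a short step to dumping the records out as tab-separated text, which is the format Many Eyes expects on upload.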

(Note that there’s lots of mashable stuff – and some nice URLs – on the http://en.beijing2008.cn/ website… why, oh, why didn’t I think to have a play with it over the last couple of weeks? :-( )

Anyway, I’ve uploaded the results, by discipline, for the Olympics 2008 Medal Table (Top 10, by Tally) and had a quick play to see what sort of views might be useful in visualising the wealth of information the data contains.

First up, here are the disciplines that the top 10 countries (by medal tally) were excelling at:

Treemaps are one of my favourite visualisation tools. The Many Eyes treemap, whilst not allowing much control over colour palettes, does make it easy to reorder the hierarchy used for the treemap.

Here’s a view by discipline, then country, that allows you to see the relative number of medals awarded by discipline, and the countries that ‘medalled’ within them:

Rearranging the view, we can see how well each country fared in terms of total medal haul, as well as the number of medals in each medal class.

The search tool makes it easy to see medals awarded in a particular discipline by country and medal class – so for example, here’s where the swimming medals went:

A network diagram view lets us see (sort of) another view of the disciplines that each country took medals in.

The matrix chart is more familiar, and shows relative medal hauls for gold, silver and bronze, by country.

By changing the colour display to show the disciplines medals were awarded in, we can see which of the countries won swimming medals, for example.

Enough for now… the data’s on the Many Eyes site if you want to create your own visualisations with it… You should be able to reduce the data (e.g. by creating copies of the data set with particular columns omitted) to produce simpler visualisations (e.g. simpler treemaps).

You can also take a copy of the data to use in your own data sets (e.g. normalising it by GDP, population, etc.).

If you do create any derived visualisations, please post a link back as a comment to this post :-)

What’s On Open2…

Chatting with Stuart over a pint last week, he mentioned that the Open2 folks had started publishing a programme announcement feed on Twitter that lets you know when a TV programme the OU’s been involved with is about to be shown on one of the BBC channels: open2 programme announcements on Twitter.

By subscribing to the RSS feed from the Open2 twitter account, it’s easy enough to get yourself an alert for upcoming BBC/OU programmes.

The link goes through to the programme page on the open2 website, which is probably a Good Thing, but it strikes me that there’s no obvious way to watch the programme from the Open2 page?

That is, there’s no link to an iplayer or BBC programmes view, such as BBC Programmes > Coast:

If I’m reading the BBC Programmes Developers’ Guide correctly, not all the URL goodness has been switched on for these URLs yet? For example, here’s the guidance:

Programmes

/programmes/:groupPID/episodes/upcoming
/programmes/:groupPID/episodes/upcoming/debut
/programmes/:groupPID/episodes/player

To access these add .xml, .json or .yaml to the end of the url.

Whilst http://www.bbc.co.uk/programmes/b006mvlc works as I expect, http://www.bbc.co.uk/programmes/b006mvlc/episodes requires a branch into a year – http://www.bbc.co.uk/programmes/b006mvlc/episodes/2008, and I can’t get the upcoming or format extensions to work at all?

As well as the BBC Programmes page, we can also find iPlayer links from a search on the iPlayer site: Search for “coast” on iPlayer:

Going back to the twitter feed, I wonder whether there’s any point in having a second twitter account that alerts people as to when a programme is available on iPlayer? A second alert could give you a day’s notice that a programme is about to disappear from iPlayer?

Just by the by, here are a couple more BBC related things I spotted over the last few days: BBC Top Gear channel on Youtube; and BBC’s Tomorrow’s World to be revived (Telegraph).

Now if the “popular science magazine show” referred to is the one that was mentioned at the BBC/OU science programming brainstorming session I posted about a couple of weeks ago, I’m pretty sure the producer said it wasn’t going to be like Tomorrow’s World… Which I guess means it is – in that it is going to be like Tomorrow’s World in terms of positioning and format, but it isn’t going to be exactly like it in terms of content and delivery… (I have to admit that I got the impression it was going to be more like *** **** for Science… ;-)

Library Analytics (Part 5)

Another day, another Library Analytics post… Today, a quick glimpse at another popular content area on the OU Library website, the “Subject Resource Collections” that dangle off http://library.open.ac.uk/find/eresources/.

Most Popular Subject Resource Collections
The distribution of visits to subject resource collections is pretty flat, as the following report shows:

That said, the most popular categories are:

  1. the law/law collection;
  2. the Law_legislation page;
  3. the Psychology collection;
  4. the Education collection;
  5. the Science – General collection.

Thinking back to the previous post in this series, and the example of using Many Eyes to visualise multiple data dimensions at the same time, a similar technique might be useful here just to check that each resource is attracting similar usage stats in respect of time on site, average pages per visit, bounce rates, etc.?

Just by the by, if we look at the Entrance Source for traffic that ends up on the selector page for Psychology eresources, we can see that most of the traffic is coming in from the VLE.

The College of Law appears to be providing most of the Law/Law traffic though…

Going forwards, it would probably be useful for the collections whose traffic sourced from the VLE to try to identify which courses were providing that traffic. This information might then provide the basis for “KPIs” relating to the performance of particular Library resources on a particular course.

Onsite Search Behaviour
One of the optional reports on Google Analytics (that is, one that needs to be enabled) is tracking of onsite search behaviour using the website’s own search tool. Popular search terms identified by this report may well indicate failures in support for navigation-through-browsing – in the case of the OU Library site, it seems that information about “Athens” isn’t the easiest thing to find just by clicking…

The following report is particularly interesting from a trends point of view:

The step change at the end of March, with the higher incidence of internal search terms prior to then, suggests a change in user behaviour (given that all the other reports have been showing pretty steady traffic numbers over the whole period). I’m guessing – and this is checkable – that there was a Library website redesign at the end of March, although step changes (particularly in the case of users segmented by course, if such a thing were possible) might also be indicative of participation in scheduled Library-related activities within a course in presentation. I’ll try to post a bit more about that at a later date…

Another informative report describes the proportion of visits in which the user engages in onsite search. Users tend to navigate websites either by browsing (clicking on links) or by search. A high incidence of search may indicate weaknesses in navigation design via clickable links. So how does the Library website appear to do?

Well – it seems that users are clicking their way to pages rather than searching for them… (though this may in turn reflect issues with discovery and design of the search page…!)

The Help Page
Another source of information about how well the site is working for visitors is to look at usage around the Help page. I’m not going to go into this page in any depth, but here’s an inkling of what sorts of information we might be able to extract from it…

Who’s looking at how to cite a reference?

Seems like Google traffic is high up here? So maybe another role for the Library website is outreach, in the sense of informal education? And maybe the “How to cite a reference” page would be a good place to put a link to the free Safari info skills minicourse, and an ad for TU120 Beyond Google? ;-)