Contextual Content Server, Courtesy of Google?

Earlier this week, Google announced Ad serving for everyone, the opening up of their Ad Manager tool to all comers.

Ad Manager can help you sell, schedule, deliver, and measure both directly-sold and network-based inventory.

* Ad network management: Easily manage your third-party ad networks in Ad Manager to automatically maximize your network-driven revenue.

* Day and Time Targeting: Don’t want your orders to run on weekends? No problem. With day and time targeting, you can set any new line items you create to run only during specific hours or days, or as little as 15 minutes per week. Use day and time targeting in addition to geography, bandwidth, browser, user language, operating system, domain and custom targeting.

There’s an excellent video overview and basic tutorial here: Google Ad Manager Introduction. (If that doesn’t work for you, here’s a text-based tutorial from somewhere else…)

In part, the Ad Manager allows you to use your own ads with Google’s ad serving technology, which can deliver ads according to:
* Browser version
* Browser language
* Bandwidth
* Day and time
* Geography (Country, region or state, metro, and city)
* Operating system
* User domain

If you can provide custom tagging information (e.g. by adding information from a personal profile into the ad code on the page displayed to the user), then the Ad Manager can also be used to provide custom targeting according to the tags you have available.

So here’s what I’m thinking – can we use the Google Ad Manager service to deliver contextualised content to users? That is, create “ad” areas on a page, and deliver our own “content ads” to it through the Google Ad Manager.

So for example, we could have a contentAd sidebar widget on a Moodle VLE page; we could add a custom tag into the widget relating to a particular course; and we could serve course related “ad” content through the Ad Manager.

By running the content of a page through a content analyser (such as Open Calais, which now offers RESTful calls via HTTP POST), or looking on a site such as delicious to see what the page has been tagged with, we can generate ‘contextual tags’ to further customise the content delivery.
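Just to make that concrete, here’s a minimal sketch of what a tag-generating call to a content analyser might look like (Python, using the requests library). The Calais endpoint URL, the parameter names and the crude tag extraction are all assumptions for illustration, so check the Calais documentation before relying on any of them:

```python
# Minimal sketch (not production code): push page text at a content analysis
# service such as Open Calais and pull back candidate "contextual tags".
# The endpoint URL and field names below are assumptions for illustration.
import requests

CALAIS_ENDPOINT = "https://api.opencalais.com/enlighten/rest/"  # assumed endpoint
API_KEY = "YOUR-CALAIS-LICENCE-KEY"  # hypothetical placeholder

def contextual_tags(page_text):
    """POST the page text and return a crude list of candidate tag strings."""
    response = requests.post(
        CALAIS_ENDPOINT,
        data={
            "licenseID": API_KEY,   # assumed field name
            "content": page_text,   # the text to analyse
            "paramsXML": "",        # default processing directives
        },
        timeout=30,
    )
    response.raise_for_status()
    # Naive extraction: a real client would parse the returned RDF/JSON properly;
    # here we just skim for lines that look like they carry an entity name.
    return [line.strip() for line in response.text.splitlines() if "name" in line][:10]

# The resulting tags could then be written into the custom targeting code
# of the "contentAd" slot on the page.
```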

So what? So think of small chunks of content as “contentAds”, and use the Google Ad Manager to serve that content in a segmented, context specific way to your users… ;-)

Library Analytics (Part 6)

In this post, I’m going to have a quick look at some filtered reports I set up a few days ago to see if they are working as I expected.

What do I mean by this? Well, Google Analytics lets you create filters that can be used to create reports for a site that focus on a particular area of the website or user segment.

At their simplest, filters work in one of two main ways (or a combination of both). Firstly, you can filter the report so that it only covers activity on a subset of the website as a whole (such as all pages along the path http://library.open.ac.uk/find/databases). Secondly, you can filter the report so that it only covers traffic that is segmented according to user characteristics (such as users arriving from a particular referral source).

Here are a couple of examples: firstly, a filter that will just report on traffic that has been referred from the VLE:

Using this filter will allow us to create a report that tracks how the Library website is being used by OU students.

Another filter in a similar vein lets us track just the traffic arriving from the College of Law:

A second type of filter allows us to just provide a report on activity within the eresources area of the Library website:

Note that multiple filters can be applied to a single report profile, so I could for example create a report profile that just looked at activity in the Journals area of the website (by applying a subdirectory filter) that came from users on the OU campus (by also applying a user segment filter).
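To make the two filter types a bit more concrete, here’s a tiny illustrative sketch (Python, purely to show the matching logic – in Google Analytics itself the patterns are entered in the profile’s filter settings, not run as code). The sample hits and the exact regular expressions are made up for illustration:

```python
import re

# Illustrative patterns only - in Google Analytics these would be entered as
# profile filters, not executed in code.
SUBDIR_FILTER = re.compile(r"^/find/databases")            # keep only pages under /find/databases
VLE_REFERRER_FILTER = re.compile(r"learn\.open\.ac\.uk")   # keep only traffic referred from the VLE

# Hypothetical sample hits, just to show how the two filters combine.
hits = [
    {"page": "/find/databases/law", "referrer": "http://learn.open.ac.uk/course/view.php?id=1234"},
    {"page": "/services/help",      "referrer": "http://www.google.co.uk/search?q=athens"},
]

for hit in hits:
    # Applying both patterns mimics a profile with a subdirectory filter AND a referrer filter.
    if SUBDIR_FILTER.search(hit["page"]) and VLE_REFERRER_FILTER.search(hit["referrer"]):
        print("counted in the filtered report:", hit["page"])
```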

So how does this help?

If we assume there are several different user types on the Library website (students, onsite researchers, students on partner courses (such as with the College of Law), users arriving from blind Google searches, and so on), then we can use filters to create a set of reports, each one covering a different user segment. Adding all the separate reports together would give us the “total” website report that I was using in the first five posts in this series. Looking at each report separately allows us to understand the different needs and behaviours of the different user types.

Although it is possible to segment reports from the whole site report, as I have shown previously, segmenting the report ‘on the way in’ through the application of one or more filters allows you to use the whole raft of Google Analytics reports to look at a particular segment of the data as a whole.

So for example, here’s a view of the report filtered by referrer (college of law):

Where is the traffic from the College of Law landing?

Okay – it seems like all the traffic is coming in to one page on the Library website from the College of Law?! Now this may or may not be true (there may be a single link on the College of Law website to the OU Library), or it may reflect an error in the way I have crafted the filter rule. One to watch…

How about the report filtered by users referred from the VLE?

This report looks far more natural – users are entering the site at a variety of locations, presumably from different links in the VLE.

Which is all well and good – but it would be really handy if we knew which courses the students were coming from, and/or which VLE pages were actually sending the traffic.

The way to do this is to capture the whole referrer URL (not just the “http://learn.open.ac.uk” part) and report this as a user defined value, something we can do with another filter:

Segmenting the majority landing page data (the Library homepage) by this user defined value gives the following report:

The full referrer URLs are, in the main, really nasty Moodle URLs that obfuscate the course behind an arbitrary resource ID number.

Having a quick look at the pages, the top five referrers over the short sample period the report has been running (and a Bank Holiday weekend at that!) are:

  1. EK310-08: Library Resources (53758);
  2. E891-07J: Library Resources (36196);
  3. DD308-08: Library Resources (54466);
  4. DD303-08: Library Resources (49710);
  5. DXR222-08E: Library Resources (89798);

If we knew all the VLE pages in a particular course that linked to the Library website, we could produce a filtered report that just recorded activity on the Library website that came from that course on the VLE.

Library Analytics (Part 5)

Another day, another Library Analytics post… Today, a quick glimpse at another popular content area on the OU Library website, the “Subject Resource Collections” that dangle off http://library.open.ac.uk/find/eresources/.

Most Popular Subject Resource Collections
The distribution of visits to subject resource collections is pretty flat, as the following report shows:

That said, the most popular categories are:

  1. the Law/Law collection;
  2. the Law_legislation page;
  3. the Psychology collection;
  4. the Education collection;
  5. the Science – General collection.

Thinking back to the previous post in this series, and the example of using Many Eyes to visualise multiple data dimensions at the same time, a similar technique might be useful here, just to check whether each resource is attracting similar usage stats in terms of time on site, average pages per visit, bounce rate, and so on?

Just by the by, if we look at the Entrance Source for traffic that ends up on the selector page for Psychology eresources, we can see that most of the traffic is coming in from the VLE.

The College of Law appears to be providing most of the Law/Law traffic though…

Going forwards, it would probably be useful, for the collections whose traffic is sourced from the VLE, to try to identify which courses are providing that traffic. This information might then provide the basis for “KPIs” relating to the performance of particular Library resources on a particular course.

Onsite Search Behaviour
One of the optional reports on Google Analytics (that is, one that needs to be enabled) is tracking of onsite search behaviour using the website’s own search tool. Popular search terms identified by this report may well indicate failures in support for navigation-through-browsing – in the case of the OU Library site, it seems that information about “Athens” isn’t the easiest thing to find just by clicking…

The following report is particularly interesting from a trends point of view:

The step change at the end of March, with the higher incidence of internal search terms prior to then, suggests a change in user behaviour (given that all the other reports have been showing pretty steady traffic numbers over the whole period). I’m guessing – and this is checkable – that there was a Library website redesign at the end of March, although step changes (particularly in the case of users segmented by course, if such a thing were possible) might also be indicative of participation in scheduled Library related activities within a course in presentation. (I’ll try to post a bit more about that at a later date…)

Another informative report describes the proportion of visits in which the user engages in onsite search. Users tend to navigate websites either by browsing (clicking on links) or by search. A high incidence of search may indicate weaknesses in navigation design via clickable links. So how does the Library website appear to do?

Well – it seems that users are clicking their way to pages rather than searching for them… (though this may in turn reflect issues with discovery and design of the search page…!)

The Help Page
Another source of information about how well the site is working for visitors is to look at usage around the Help page. I’m not going to go into this page in any depth, but here’s an inkling of what sorts of information we might be able to extract from it…

Who’s looking at how to cite a reference?

Seems like Google traffic is high up here? So maybe another role for the Library website is outreach, in the sense of informal education? And maybe the “How to cite a reference” page would be a good place to place a link to the free Safari info skills minicourse, and an ad for TU120 Beyond Google? ;-)

Journal Impact Factor Visualisation

Whilst looking around for inspiration for things that could go into a mashup to jazz up the OU repository, I came across the rather wonderful eigenfactor.org which provides an alternative (the “eigenfactor”) to the Thomson Scientific Impact Factor measure of academic journal “weight”.

The site provides a range of graphical tools for exploring the relative impact of journals in a particular discipline, as well as a traditional search box for tracking down a particular journal.

Here’s how we can start to explore the journals in a particular area using an interactive graphical map:

The top journals in the field are listed on the right hand side, and the related fields are displayed within the central panel view.

A motion chart (you know: Hans Rosling; Gapminder…) shows how well particular journals have fared over time:

As well as providing eigenfactor (cf. impact) ratings for several hundred journals, the site also provides a “cost effectiveness” measure that attempts to reconcile a journal’s eigenfactor with its cost, giving buyers an idea of how much “bang per buck” their patrons are likely to get from a particular journal (e.g. in terms of how well a particular journal provides access to frequent, highly cited papers in a particular area, given its cost).

Reports are also available for each listed journal:

Finally, if you want to know how eigenfactors are calculated (it’s fun :-), the algorithm is described here: eigenfactor calculation.
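As I understand it, the eigenfactor is essentially a PageRank-style calculation over the journal-to-journal citation network. Here’s a heavily simplified sketch of the iterative idea; it ignores the five-year citation window, the self-citation exclusion and the article-count weighting that the full algorithm uses, so treat it as a toy rather than the real thing:

```python
# A heavily simplified sketch of the eigenfactor idea: iterate a PageRank-style
# calculation over a journal-to-journal citation matrix. The real algorithm adds
# a five-year citation window, excludes journal self-citations and weights the
# "teleport" step by article counts - see eigenfactor.org for the details.
import numpy as np

def eigenfactor_sketch(citations, alpha=0.85, iterations=100):
    """citations[i, j] = citations from journal j to journal i (toy matrix)."""
    n = citations.shape[0]
    # Column-normalise so each journal's outgoing citations sum to 1.
    col_sums = citations.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    H = citations / col_sums
    # Iterate: mostly follow citations, occasionally jump to a random journal.
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):
        scores = alpha * H.dot(scores) + (1 - alpha) / n
    return scores / scores.sum()

# Toy example: journal 0 is cited heavily by the others, so it gets the highest score.
toy = np.array([[0, 5, 4],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
print(eigenfactor_sketch(toy))
```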

Library Analytics (Part 4)

One of the things I fondly remember about doing physics at school was being told, at the start of each school year, how what we had been told the previous year was not quite exactly true, and that this year we would actually learn how the world worked properly…

And so as this series of posts about “Library Analytics” continues (that is, this series about the use of web analytics as applied to public Library websites), I will continue to show you examples of headline reports I initially found compelling (or not), and then show why they are not quite right – confusing at best, and misleading at worst…

Most Popular Journals
In the previous post in this series, we saw the most popular databases that were being viewed from the databases page. Is the same possible from the journals area? A quick look at the report for the find/journals/journals page suggests that such a report should be possible, but something is broken:

From the small amount of data there, the most popular journals/journal collections were as follows:

  1. JSTOR (271892);
  2. Academic Search Complete (403673);
  3. Blackwell Synergy (252307);
  4. ACM Digital Library (208448);
  5. IEEE Xplore (208545);

As with the databases, segmenting the traffic visiting these collections may provide insight as to which category of user (researcher, student) and maybe even which course is most active for a particular collection.

But what happened to the reporting anyway? Where has all the missing data gone?

I just had a quick look – the reporting from within the Journals area doesn’t currently appear to be showing anything…. err, oops?

Looking at the javascript code on each journal link:
onClick="javascript:urchinTracker('/find/journals/journals/323808')"
it’s not the same structure as the working code on the databases pages (which you may recall from the previous post in this series uses the tracking function pageTracker._trackPageview).

Looking at which tracking script is being used on the journals page (google-analytics.com/ga.js), I think the pageTracker._trackPageview function should be being used. urchinTracker is a function from the older urchin.js tracking script. Oops… I wonder whether anyone has been (not) looking at Journal use indicators lately (or indeed, ever…?!)

Where is Journal Traffic Coming From (source location)?
So what sites are referring traffic to the journals area?

Well it looks as if there’s a lot of direct traffic coming in (so it may be worth looking at the network location report to see if we can tunnel into that), but there’s also a good chunk of traffic coming from the VLE (learn.open.ac.uk). It’d be handy to know which courses were sending that traffic, so we’ll just bear that in mind as a question for a later post.

Where is Journal Traffic Coming From (network locations)?
To get a feel for how much of the traffic to the journals “homepage” is coming from on campus (presumably OU researchers?) we can segment the report for the journals homepage according to network location.

The open university network location corresponds to traffic coming in from an OU domain. This report potentially gives us the basis for an “actionable” report, and maybe even a target… That is, to increase the number of page views (if not the actual proportion of traffic from on campus – we may be wanting to grow absolute traffic numbers from the VLE too) from the OU domain, as a result of increasing the number of researchers looking up journals from the Library journals homepage whilst at work on campus.

At this point, it’s probably as good a time as any to start to think about how we might interpret data such as ‘number of pages per visit’, ‘average time on site’ and ‘bounce rate’ (see here for some definitions).

Just looking at the numbers going across the columns, we can see that there are different sorts of groupings of the numbers.

ip pools and open university have pages/visit around 12, an average time on site tending towards 4 minutes, about 16% new visits in the current period (down from 36% in the first period, so people keep coming back to the site, which is good, though it maybe means we’re not attracting so many new visitors), and a bounce rate a bit less than 60%, down from around 70% in the earlier period (so fewer people are entering at the journals page and then leaving the site immediately).

Compare this to the addresses ip for home clients and greenwich university reports, where there is just over one page per visit, only a few seconds on site, hardly any new visits (which I don’t really understand?) and a very high bounce rate. These visitors are not getting any value from the site at all, and are maybe being misdirected to it? Whatever the case, their behaviour is very different to that of the open university visitors.

Now if I was minded to, I’d run this data through a multidimensional clustering algorithm to see if there were some well defined categories of user, but I’m not in a coding mood, so maybe we’ll just have a look to see what patterns are visually identifiable in the data.
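(For what it’s worth, here’s the sort of thing I have in mind – a quick k-means over pages/visit, time on site and bounce rate. The numbers below are invented for illustration, and scikit-learn’s KMeans is just one convenient option, not the only way to do it:)

```python
# Quick sketch of clustering network locations on pages/visit, average time on
# site (seconds) and bounce rate. The numbers are invented for illustration -
# the real values would come from the Google Analytics report above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

locations = ["open university", "ip pools", "addresses ip for home clients", "greenwich university"]
features = np.array([
    [12.0, 240.0, 0.58],   # pages/visit, time on site (s), bounce rate
    [11.5, 230.0, 0.60],
    [ 1.2,  10.0, 0.92],
    [ 1.1,   8.0, 0.95],
])

# Scale the columns so time on site doesn't dominate the distance measure.
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for name, label in zip(locations, labels):
    print(label, name)   # expect the two "engaged" locations in one cluster, the rest in the other
```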

So, taking the top 20 results from the most recent reporting period shown above, let’s upload it to Many Eyes and have a play (you can find the data here).

First up, let’s see whether time on site and pages/visit trend together (which is exactly what we’d expect, of course) (click through the image to see the interactive visualisation on Many Eyes):

Okay – so that looks about right; and the higher bounce rates seem to correspond to low average time on site/low pages per visit, which is what we’d expect too. (Note that by hovering over a data point, we can see which network location the data corresponds to.)

We can also see how the scatterplot gives us a way of visualising 3 dimensions at the same time.

If we abuse the histogram visualisation, we have an easy way of looking at which network locations have a high bounce rate, or time on site (a visual equivalent of ‘sort on column’, I guess? ;-)

Finally, a treemap. Abusing this visualisation gives us a way of comparing two numerical dimensions at the same time.

Note that using network location here is not necessarily that interesting as a base category… I’m just getting my eye in as to what Many Eyes visualisations might be useful! For the really interesting insights, I reckon a grand or two per day, plus taxes and expenses, should cover it ;-) But to give a tease, here’s the raw data relating to the Source URLs for traffic that made it to the Journals area:

Can you see any clusters in there?! ;-)

Summary
Okay – enough for now. Take homes are: a) the wrong tracking function is being used on the journals page; b) the VLE is providing a reasonable amount of traffic to the journals area of the Library website, though I haven’t identified (yet!) exactly which courses are sourcing that traffic; c) Many Eyes style visualisations may provide a glanceable, visual view over some of the Google Analytics data.

What’s On Open2…

Chatting with Stuart over a pint last week, he mentioned that the Open2 folks had started publishing a programme announcement feed on Twitter that lets you know when a TV programme the OU’s been involved with is about to be shown on one of the BBC channels: open2 programme announcements on Twitter.

By subscribing to the RSS feed from the Open2 twitter account, it’s easy enough to get yourself an alert for upcoming BBC/OU programmes.
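(A minimal sketch of that sort of alerting, assuming the account exposes an RSS/Atom feed somewhere – the feed URL below is a placeholder, not the real one:)

```python
# Minimal sketch: poll an RSS/Atom feed of programme announcements and print
# anything new. The feed URL is a hypothetical placeholder - use whatever
# RSS/Atom URL the Open2 twitter account actually exposes.
import feedparser

FEED_URL = "http://example.com/open2_announcements.rss"  # hypothetical placeholder
seen = set()

def check_for_announcements():
    for entry in feedparser.parse(FEED_URL).entries:
        if entry.link not in seen:
            seen.add(entry.link)
            print(entry.title, entry.link)

check_for_announcements()
```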

The link goes through to the programme page on the open2 website, which is probably a Good Thing, but it strikes me that there’s no obvious way to watch the programme from the Open2 page?

That is, there’s no link to an iplayer or BBC programmes view, such as BBC Programmes > Coast:

If I’m reading the BBC Programmes Developers’ Guide correctly, not all the URL goodness has been switched on for these URLs yet? For example, here’s the guidance:

Programmes

/programmes/:groupPID/episodes/upcoming
/programmes/:groupPID/episodes/upcoming/debut
/programmes/:groupPID/episodes/player

To access these add .xml, .json or .yaml to the end of the url.

Whilst http://www.bbc.co.uk/programmes/b006mvlc works as I expect, http://www.bbc.co.uk/programmes/b006mvlc/episodes requires a branch into a year – http://www.bbc.co.uk/programmes/b006mvlc/episodes/2008, and I can’t get the upcoming or format extensions to work at all?
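For the record, here’s the sort of poking around I was doing (a sketch only – which format extensions are actually switched on for each path is exactly the question, so treat the URLs as experiments rather than documented behaviour):

```python
# Sketch of probing the BBC Programmes URL structure described above. Which
# extensions are enabled on which paths is the thing being tested, so the
# requests below are experiments, not documented endpoints.
import requests

BASE = "http://www.bbc.co.uk/programmes/b006mvlc"   # the Coast group PID
for path in [BASE + ".json",
             BASE + "/episodes/2008.json",
             BASE + "/episodes/upcoming.json"]:
    try:
        r = requests.get(path, timeout=10)
        print(r.status_code, path)
    except requests.RequestException as exc:
        print("failed:", path, exc)
```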

As well as the BBC Programmes page, we can also find iPlayer links from a search on the iPlayer site: Search for “coast” on iPlayer:

Going back to the twitter feed, I wonder whether there’s any point in having a second twitter account that alerts people as to when a programme is available on iplayer? A second alert could give you a day’s notice that a programme is about to disappear from iPlayer?

Just by the by, here are a couple more BBC related things I spotted over the last few days: BBC Top Gear channel on Youtube; and BBC’s Tomorrow’s World to be revived (Telegraph).

Now if the “popular science magazine show” referred to is the one that was mentioned at the BBC/OU science programming brainstorming session I posted about a couple of weeks ago, I’m pretty sure the producer said it wasn’t going to be like Tomorrow’s World… Which I guess means it is – in that it is going to be like Tomorrow’s World in terms of positioning and format, but it isn’t going to be exactly like it in terms of content and delivery… (I have to admit that I got the impression it was going to be more like *** **** for Science… ;-)

More Olympics Medal Table Visualisations

So the Olympics is over, and now’s the time to start exploring various views over the data tables in a leisurely way :-)

A quick scout around shows that the New York Times (of course) have an interactive view of the medals table, also showing a historical dimension:

Channel 4’s interactive table explores medal table ‘normalisation’ according to population, GDP and so on…

GDP and population data have also been taken into account in a couple of visualisations created on Many Eyes – like this one:

Not wanting to not be part of the fun, I spent a bit of time this evening scraping data from the Overall medal standing table and popping it into Many Eyes myself.
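(The scrape itself only needs a few lines; here’s a rough sketch of the approach – the table URL and the column layout are assumptions based on how the beijing2008.cn standings page looked at the time, so adjust the selectors to suit:)

```python
# Rough sketch of scraping an HTML medal standings table into rows that can be
# pasted into Many Eyes. The URL and column layout are assumptions about how
# the beijing2008.cn medal table was laid out at the time.
import csv
import requests
from bs4 import BeautifulSoup

URL = "http://en.beijing2008.cn/medals/standings/"   # assumed location of the table

soup = BeautifulSoup(requests.get(URL, timeout=30).text, "html.parser")
rows = []
for tr in soup.select("table tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
    if cells:
        rows.append(cells)   # e.g. rank, country, gold, silver, bronze, total

with open("medals.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```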

Note that there’s lots of mashable stuff – and some nice URLs – on the http://en.beijing2008.cn/ website… why, oh, why didn’t I think to have a play with it over the last couple of weeks? :-(

Anyway, I’ve uploaded the results, by discipline, for the Olympics 2008 Medal Table (Top 10, by Tally) and had a quick play to see what sorts of views might be useful in visualising the wealth of information the data contains.

First up, here are the disciplines that the top 10 countries (by medal tally) were excelling at:

Treemaps are one of my favourite visualisation tools. The Many Eyes treemap, whilst not allowing much control over colour palettes, does make it easy to reorder the hierarchy used for the treemap.

Here’s a view by discipline, then country, that allows you to see the relative number of medals awarded by discipline, and the countries that ‘medalled’ within them:

Rearranging the view, we can see how well each country fared in terms of total medal haul, as well as the number of medals in each medal class.

The search tool makes it easy to see medals awarded in a particular discipline by country and medal class – so for example, here’s where the swimming medals went:

A network diagram view lets us see (sort of) another view of the disciplines that each country took medals in.

The matrix chart is more familiar, and shows relative medal hauls for gold, silver and bronze, by country.

By changing the colour display to show the disciplines medals were awarded in, we can see which of the countries won swimming medals, for example.

Enough for now… the data’s on the Many Eyes site if you want to create your own visualisations with it… You should be able to reduce the data (e.g. by creating copies of the data set with particular columns omitted) to produce simpler visualisations (e.g. simpler treemaps).

You can also take a copy of the data to use in your own data sets (e.g. normalising it by GDP, population, and so on).
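(Per-capita normalisation, for example, is just a column divide; here’s a sketch with placeholder medal tallies and rough population figures – substitute the real columns from the data set and a population source of your choice:)

```python
# Sketch of normalising a medal tally by population. The tallies and population
# figures below are placeholders for illustration - swap in the real columns
# from the Many Eyes data set and a population source of your choice.
medals = {"China": 100, "USA": 110, "Great Britain": 47}                 # illustrative totals
population_millions = {"China": 1320, "USA": 304, "Great Britain": 61}   # rough 2008 figures

per_10m_people = {
    country: medals[country] / (population_millions[country] / 10.0)
    for country in medals
}

for country, score in sorted(per_10m_people.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {score:.2f} medals per 10 million people")
```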

If you do create any derived visualisations, please post a link back as a comment to this post :-)