Journal Impact Factor Visualisation

Whilst looking around for inspiration for things that could go into a mashup to jazz up the OU repository, I came across the rather wonderful eigenfactor.org which provides an alternative (the “eigenfactor”) to the Thomson Scientific Impact Factor measure of academic journal “weight”.

The site provides a range of graphical tools for exploring the relative impact of journals in a particular discipline, as well as a traditional search box for tracking down a particular journal.

Here’s how we can start to explore the journals in a particular area using an interactive graphical map:

The top journals in the field are listed on the right hand side, and the related fields are displayed within the central panel view.

A motion chart (you know: Hans Rosling; Gapminder…) shows how well particular journals have fared over time:

As well as providing eigenfactor (cf. impact) ratings for several hundred journals, the site also provides a “cost effectiveness” measure that attempts to reconcile a journal’s eigenfactor with its cost, giving buyers an idea of how much “bang per buck” their patrons are likely to get from a particular journal (e.g. in terms of how well a particular journal provides access to frequent, highly cited papers in a particular area, given its cost).

Reports are also available for each listed journal:

Finally, if you want to know how eigenfactors are calculated (it’s fun :-), the algorithm is described here: eigenfactor calculation.

Library Analytics (Part 6)

In this post, I’m going to have a quick look at some filtered reports I set up a few days ago to see if they are working as I expected.

What do I mean by this? Well, Google Analytics lets you create filters that can be used to create reports for a site that focus on a particular area of the website or user segment.

At their simplest, filters work in one of two main ways (or a combination of both). Firstly, you can filter the report so that it only covers activity on a subset of the website as a whole (such as all pages along the path http://library.open.ac.uk/find/databases). Secondly, you can filter the report so that it only covers traffic that is segmented according to user characteristics (such as users arriving from a particular referral source).

Here are a couple of examples: firstly, a filter that will just report on traffic that has been referred from the VLE:
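(Indicatively – the field names here are from memory of the old Google Analytics profile filter form, so treat them as approximate – the filter settings are along these lines:)

  Filter Type:    Custom filter > Include
  Filter Field:   Referral
  Filter Pattern: learn\.open\.ac\.uk
  Case Sensitive: No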

Using this filter will allow us to create a report that tracks how the Library website is being used by OU students.

Another filter in a similar vein lets us track just the traffic arriving from the College of Law:

A second type of filter allows us to just provide a report on activity within the eresources area of the Library website:
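(Again indicatively, this one can use the predefined “Include only traffic to the subdirectories” filter type, with a pattern something like /eresources/ – the exact path on the Library site would need checking:)

  Filter Type:       Predefined filter > Include only traffic to the subdirectories
  Subdirectory Path: /eresources/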

Note that multiple filters can be applied to a single report profile, so I could for example create a report profile that just looked at activity in the Journals area of the website (by applying a subdirectory filter) that came from users on the OU campus (by also applying a user segment filter).

So how does this help?

If we assume there are several different user types on the Library website (students, onsite researchers, students on partner courses (such as with the College of Law), users arriving from blind Google searches, and so on), then we can use filters to create a set of reports, each one covering a different user segment. Adding all the separate reports together would give us the “total” website report that I was using in the first five posts in this series. Looking at each report separately allows us to understand the different needs and behaviours of the different user types.

Although it is possible to segment reports from the whole site report, as I have shown previously, segmenting the report ‘on the way in’ through the application of one or more filters allows you to use the whole raft of Google Analytics reports to look at a particular segment of the data as a whole.

So for example, here’s a view of the report filtered by referrer (college of law):

Where is the traffic from the College of Law landing?

Okay – it seems like all the traffic is coming in to one page on the Library website from the College of Law?! Now this may or may not be true (there may be a single link on the College of Law website to the OU Library), or it may reflect an error in the way I have crafted the rule. One to watch…

How about the report filtered by users referred from the VLE?

This report looks far more natural – users are entering the site at a variety of locations, presumably from different links in the VLE.

Which is all well and good – but it would be really handy if we knew which courses the students were coming from, and/or which VLE pages were actually sending the traffic.

The way to do this is to capture the whole referrer URL (not just the “http://learn.open.ac.uk” part) and report this as a user defined value, something we can do with another filter:
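(Roughly speaking – again, field names from memory of the old filter form – this is an “Advanced” custom filter that copies the full referrer into the User Defined field:)

  Filter Type:              Custom filter > Advanced
  Field A -> Extract A:     Referral       (.*)
  Output To -> Constructor: User Defined   $A1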

Segmenting the majority landing page data (the Library homepage) by this user defined value gives the following report:

The full referrer URLs are, in the main, really nasty Moodle URLs that obfuscate the course behind an arbitrary resource ID number.

Having a quick look at the pages, the top five referrers over the short sample period the report has been running (and a Bank Holiday weekend at that!) are:

  1. EK310-08: Library Resources (53758);
  2. E891-07J: Library Resources (36196);
  3. DD308-08: Library Resources (54466);
  4. DD303-08: Library Resources (49710);
  5. DXR222-08E: Library Resources (89798);

If we knew all the VLE pages in a particular course that linked to the Library website, we could produce a filtered report that just recorded activity on the Library website that came from that course on the VLE.

Library Analytics (Part 7)

In the previous post in this series, I showed how it’s possible to identify traffic referred from particular course pages in the OU VLE, by creating a user defined variable that captured the complete (nasty) VLE referrer URL.

Now I’m not entirely sure about this, but I think that the Library provides URLs to the VLE via an RSS feed. That is, the Library controls the content that appears on the Library Resources page when a course makes such a page available.

In the Google Analytics FAQ answer How do I tag my links?, a method is described for adding additional tags to a referrer URL that Google Analytics can use to segment traffic referred from that URL. Five tags are available (as described in Understanding campaign variables: The five dimensions of campaign tracking):

Source: Every referral to a web site has an origin, or source. Examples of sources are the Google search engine, the AOL search engine, the name of a newsletter, or the name of a referring web site.
Medium: The medium helps to qualify the source; together, the source and medium provide specific information about the origin of a referral. For example, in the case of a Google search engine source, the medium might be “cost-per-click”, indicating a sponsored link for which the advertiser paid, or “organic”, indicating a link in the unpaid search engine results. In the case of a newsletter source, examples of medium include “email” and “print”.
Term: The term or keyword is the word or phrase that a user types into a search engine.
Content: The content dimension describes the version of an advertisement on which a visitor clicked. It is used in content-targeted advertising and Content (A/B) Testing to determine which version of an advertisement is most effective at attracting profitable leads.
Campaign: The campaign dimension differentiates product promotions such as “Spring Ski Sale” or slogan campaigns such as “Get Fit For Summer”.

(For an alternative description, see Google Analytics Campaign Tracking Pt. 1: Link Tagging.)

The recommendation is that campaign source, campaign medium, and campaign name should always be used (I’m not sure if Google Analytics requires this, though?)

So here’s what I’m proposing: how about we treat a “course as campaign”? What are sensible mappings/interpretations for the campaign variables?

  • source: the course?
  • medium: the sort of link that has generated the traffic, such as a link on the Library resources page?
  • campaign: the mechanism by which the link got into the VLE, such as a particular class of Library RSS feed or the addition of the link by a course team member?

By creating URLs that point back to the Library website for the display in the VLE tagged with “course campaign” variables, we can more easily track (i.e. segment) user activity on the Library website that results from students entering the Library site from that link referral.

Where course teams upload Library URLs themselves, we could maybe provide a “URL Generator Tool” (like the “official” Tool: URL Builder) that will accept a library URL and then automatically add the course code (source), a campaign flag saying the link was course team uploaded, and a medium flag saying whether the link is provided as part of assessment or as further information. The “content” variable might capture a section number in the course, or information about what activity in particular the resource related to?

For example, the tool would be able to create something like:
http://learn.open.ac.uk/mod/resourcepage/view.php?id=36196&utm_source=E891-07J&utm_medium=Library%2Bresource&utm_campaign=Library%2BRSS%2Bfeed
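Just to sketch how such a “URL Generator Tool” might build these links (this is purely illustrative – the function name and example parameter values aren’t part of any agreed scheme), a few lines of JavaScript would do it:

  // Illustrative sketch: append Google Analytics campaign tags to a Library URL.
  // Course code, medium and campaign values here are made-up examples.
  function tagLibraryUrl(url, course, medium, campaign, content) {
    var params = {
      utm_source: course,     // the course, e.g. "E891-07J"
      utm_medium: medium,     // the sort of link, e.g. "Library resource"
      utm_campaign: campaign, // how the link got into the VLE, e.g. "Library RSS feed"
      utm_content: content    // optional, e.g. a section or activity identifier
    };
    var pairs = [];
    for (var key in params) {
      if (params[key]) pairs.push(key + '=' + encodeURIComponent(params[key]));
    }
    return url + (url.indexOf('?') === -1 ? '?' : '&') + pairs.join('&');
  }

  // e.g. tagLibraryUrl('http://library.open.ac.uk/find/databases',
  //                    'E891-07J', 'Library resource', 'Library RSS feed');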

Annotating links in this way would allow Library teams to see what sorts of link (in terms of how they get into the VLE) are effective at generating traffic back to the Library, and could also enable the provision of reports back to course teams showing how effectively students on a particular course are engaging with Library resources from links on the VLE course pages.

Contextual Content Server, Courtesy of Google?

Earlier this week, Google announced Ad serving for everyone, the opening up of their Ad Manager tool to all comers.

Ad Manager can help you sell, schedule, deliver, and measure both directly-sold and network-based inventory.

# Ad network management: Easily manage your third-party ad networks in Ad Manager to automatically maximize your network driven revenue.

# Day and Time Targeting: Don’t want your orders to run on weekends? No problem. With day and time targeting, you can set any new line items you create to run only during specific hours or days, or as little as 15 minutes per week. Use day and time targeting in addition to geography, bandwidth, browser, user language, operating system, domain and custom targeting.

There’s an excellent video overview and basic tutorial here: Google Ad Manager Introduction. (If that doesn’t work for you, here’s a text based tutorial from somewhere else….)

In part, the Ad Manager allows you to use your own ads with Google’s ad serving technology, which can deliver ads according to:
* Browser version
* Browser language
* Bandwidth
* Day and time
* Geography (Country, region or state, metro, and city)
* Operating system
* User domain

If you can provide custom tagging information (e.g. by adding information from a personal profile into the ad code on the page displayed to the user) then the Ad Manager can also be used to provide custom targeting according to the tags you have available.

So here’s what I’m thinking – can we use the Google Ad Manager service to deliver contextualised content to users? That is, create “ad” areas on a page, and deliver our own “content ads” to it through the Google Ad Manager.

So for example, we could have a contentAd sidebar widget on a Moodle VLE page; we could add a custom tag into the widget relating to a particular course; and we could serve course related “ad” content through the Ad Manager.
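To give a flavour of the sort of tag code involved, here’s a minimal sketch using the current Google Publisher Tag library (gpt.js) rather than the 2008-era Ad Manager tags – the ad unit path, slot id and “course” targeting key are all made up for illustration:

  // Assumes gpt.js has been loaded on the page; all names below are hypothetical.
  window.googletag = window.googletag || { cmd: [] };
  googletag.cmd.push(function () {
    // Define a "content ad" slot for the sidebar widget
    googletag.defineSlot('/1234/library-content', [300, 250], 'content-ad')
      .addService(googletag.pubads());
    // Custom targeting: pass the course code from the VLE page into the request
    googletag.pubads().setTargeting('course', 'EK310');
    googletag.enableServices();
    googletag.display('content-ad');   // the sidebar div that receives the content ad
  });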

By running the content of a page through a content analyser (such as Open Calais, which now offers RESTful calls via HTTP POST), or looking on a site such as delicious to see what the page has been tagged with, we can generate ‘contextual tags’ to further customise the content delivery.

So what? So think of small chunks of content as “contentAds”, and use the Google Ad Manager to serve that content in a segmented, context specific way to your users… ;-)

HTML Tables and the Data Web

Some time ago now, I wrote a post about progressive enhancement of HTML web pages, including some examples of how HTML data tables could be enhanced to provide graphical views of the data contained in them.

I’m not sure if anyone is actively maintaining progressive enhancement browser extensions (I haven’t checked) but here are a couple more possible enhancements released as part of the Google visualisation API, as described in Table Formatters make Visualization tables even nicer:

A couple of other options allow you to colour a table cell according to value (an implementation of the ‘format cell on value’ function you find in many spreadsheets), and a formatter that will “format numeric columns by defining the number of decimal digits, how negative values are displayed and more”, such as adding a prefix or suffix to each number.
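For reference, here’s roughly what those formatters look like in use (assuming the visualisation library and the table package have been loaded via the Google loader, and with made-up column names and values):

  // Sketch: a DataTable with a numeric column, formatted before drawing
  var data = new google.visualization.DataTable();
  data.addColumn('string', 'Journal');
  data.addColumn('number', 'Cost');
  data.addRows([['Journal A', 1250.5], ['Journal B', -300]]);

  // Colour cells according to value ("format cell on value")
  var colour = new google.visualization.ColorFormat();
  colour.addRange(1000, null, 'white', 'orange');   // highlight values of 1000 and above
  colour.format(data, 1);

  // Number formatting: decimal digits, how negatives are displayed, prefix/suffix
  var number = new google.visualization.NumberFormat(
    { prefix: '£', fractionDigits: 2, negativeColor: 'red', negativeParens: true });
  number.format(data, 1);

  var table = new google.visualization.Table(document.getElementById('table_div'));
  table.draw(data, { allowHtml: true });   // allowHtml needed for the colour formatting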

I’m not sure if these features are included in the QGoogleVisualizationAPI Google visualisation API PHP wrapper yet, though?

Also in my feed reader recently was this post on Search Engines Extracting Table Data on the Web, which asks:

what if Google focused upon taking information from tables that contain meaningful data (as opposed to tables that might be used on a web page to control the formatting of part or all of a page)?

What if it took all those data filled tables, and created a separate database just for them, and tried to understand which of those tables might be related to each other? What if it then allowed for people to search through that data, or combine the data in those tables with other data that those people own, or that they found elsewhere on the Web?

and then links to a couple of recent papers on the topic.

It strikes me that Javascript/CSS libraries could really help out here – for example structures like Google’s Visualisation API Table component and Yahoo’s UI Library DataTable (which makes it trivial to create sortable tables in your web page, as this example demonstrates: YUI Sortable Data Tables).

Both of these provide a programmatic way (that is, a Javascript way) of representing tabular data and then displaying it in a table in a well defined way.

So I wonder, will the regular, formalised display of tabular data make it easier to scrape the data back out of the table? That is, could we define GRDDL like transformations that ‘undo’ the datatable-to-HTML-table conversions, and map back from HTML tables to, for example, JSON, XML or javascript datatable representations of the data?

Once we get the data out of the HTML table and into a more abstract datatable representation, might we then be able to use the Javascript data representation as a ‘database table’ and run queries on it? That is, if we have data described using one of these datatable representations, could we run SQL like queries on it in the page, for example by using TrimQuery, which provides a SQL-like query language that can be run against javascript objects?
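As a crude sketch of the idea in plain JavaScript (modern DOM calls, with a simple filter standing in for a proper SQL-ish layer like TrimQuery, and a made-up “Cost” column):

  // Scrape a well formed HTML table back into a simple datatable-like object...
  function tableToData(tableEl) {
    var headers = Array.prototype.map.call(tableEl.querySelectorAll('thead th'),
      function (th) { return th.textContent.trim(); });
    var rows = Array.prototype.map.call(tableEl.querySelectorAll('tbody tr'),
      function (tr) {
        var cells = tr.querySelectorAll('td'), row = {};
        headers.forEach(function (h, i) {
          row[h] = cells[i] ? cells[i].textContent.trim() : null;
        });
        return row;
      });
    return { columns: headers, rows: rows };
  }

  // ...and then treat it as a queryable "database table", e.g. WHERE Cost > 1000
  var data = tableToData(document.querySelector('table'));
  var expensive = data.rows.filter(function (r) { return parseFloat(r['Cost']) > 1000; });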

Alternatively, could we map the data contained in a “regular” Google or Yahoo UI table to a Google spreadsheets like format – in which case, we might be able to use the Google Visualisation API Query Language? (I’m not sure if the Query Language can be applied directly to Google datatable objects?)

It’s not too hard then to imagine a browser extension that can be used to overlay a SQL-like query engine on top of pages containing Yahoo or Google datatables, essentially turning the page into a queryable database? Maybe even Ubiquity could be used to support this?

OU News Tracking

A couple of days ago, Stuart pointed me to Quarkbase, a one stop shop for looking at various web stats, counts and rankings for a particular domain (here’s the open.ac.uk domain on quarkbase, for example; see also: the Silobreaker view of the OU), which reminded me that I hadn’t created a version of the media release related news stories tracker that won me a gift voucher at IWMW2008 ;-)

So here it is: OU Media release effectiveness tracker pipe.

And to make it a little more palatable, here’s a view of the same in a Dipity timeline (which will also have the benefit of aggregating these items over time): OU media release effectiveness tracker timeline.

I also had a mess around trying to see how I could improve the implementation (one way was to add the “sort by date” flag to the Google news AJAX call (News Search Specific Arguments)), but then, of course, I got sidetracked… because it seemed that the Google News source I was using to search for news stories didn’t cover the THES (Times Higher Education Supplement).

My first thought was to use a Yahoo pipe to call a normal Google search, limited by domain to http://www.timeshighereducation.co.uk/: OU THES Search (via Google).

But that was a bit hit and miss, and didn’t necessarily return the most recent results… so instead I created a pipe to search over the last month of the THES for stories that mention “open university” and then scrape the THES search results page: OU THES Scraper.

If you want to see how it works, clone the pipe and edit it…

One reusable component of the pipe is this fragment that will make sure the date is in the correct format for an RSS feed (if it isn’t in the right format, Dipity may well ignore it…):

Here’s the full expression (actually, a PHP strftime expression) for outputting the date in the required RFC 822 date-time format: %a, %d %b %Y %H:%M:%S %z
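(If you wanted to do the same thing outside Pipes, a quick and dirty JavaScript equivalent of that strftime pattern – just to show what the output string looks like – might be:)

  // Format a Date as an RFC 822 / RSS pubDate string: %a, %d %b %Y %H:%M:%S %z
  function toRfc822(d) {
    var days = ['Sun','Mon','Tue','Wed','Thu','Fri','Sat'];
    var months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'];
    function pad(n) { return (n < 10 ? '0' : '') + n; }
    var offset = -d.getTimezoneOffset();               // minutes east of UTC
    var sign = offset >= 0 ? '+' : '-';
    offset = Math.abs(offset);
    return days[d.getDay()] + ', ' + pad(d.getDate()) + ' ' + months[d.getMonth()] + ' ' +
      d.getFullYear() + ' ' + pad(d.getHours()) + ':' + pad(d.getMinutes()) + ':' +
      pad(d.getSeconds()) + ' ' + sign + pad(Math.floor(offset / 60)) + pad(offset % 60);
  }
  // e.g. toRfc822(new Date('21 August 2008')) -> something like "Thu, 21 Aug 2008 00:00:00 +0100"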

To view the OU in the THES tracker over time, I’ve fed it into another Dipity timeline: OU in the THES.

(I’ve also added the THES stories to the OUseful “OU in the news” tab at http://ouseful.open.ac.uk/.)

Going back to the media release effectiveness tracker, even if I was to add the THES as another news source, the coverage of that service would still be rather sparse. For a more comprehensive version, it would be better to plug in to something like the LexisNexis API and search their full range of indexed news from newspapers, trade magazines and so on… That said, I’m not sure if we have a license to use that API, and/or a key for it? But then again, that’s not really my job… ;-)

Managing Time in Yahoo Pipes

In the previous post – OU News Tracking – I briefly described how to get a Yahoo pipe to output a “pubDate” timestamp in the “correct” RSS standard format:

Here’s the full expression (actually, a PHP strftime expression) for outputting the date in the required RFC 822 date-time format: %a, %d %b %Y %H:%M:%S %z

The Date Builder block is being applied to a particular field (cdate) in every feed item, and assigning the results to the y:published tag. But what exactly is it outputting? A special datetime object, that’s what:

The Date Builder module is actually quite flexible in what it accepts – in the above pipe, cdate contains values like “21 August 2008”, but it can do much more than that…

For example, take the case of the THES search pipe, also described in the previous post. The pipe constructs a query that searches the Times Higher from the current date back to the start of the year. Here’s what the query looks like in the original search form:

And here’s what the URL it generates looks like:

http://www.timeshighereducation.co.uk/search_results.asp?
refresh=0&keyword=%22open+university%22&searchtype=kyphase&sections=0&categories=0
&dateissuefilter=datefilter&issue=0
&sday=1&smth=1&syr=2008&eday=4&emth=9&eyr=2008

sday is “start date”, emth is “end month”, and so on…

Posting the URL into the Pipe URL builder separates out the different arguments nicely:

You’ll notice I’ve hardcoded the sday and smth to January 1st, but the other date elements are wired in from a datetime object that has been set to the current date:

Terms like “now” also work…
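(Outside of Pipes, the same URL construction is only a few lines of JavaScript – parameter names as per the search form above, with the start date hardcoded to 1 January:)

  // Sketch: rebuild the THES search URL with the end date set to "now"
  var now = new Date();
  var params = {
    refresh: 0,
    keyword: '"open university"',
    searchtype: 'kyphase',
    sections: 0,
    categories: 0,
    dateissuefilter: 'datefilter',
    issue: 0,
    sday: 1, smth: 1, syr: now.getFullYear(),                              // start: 1 January this year
    eday: now.getDate(), emth: now.getMonth() + 1, eyr: now.getFullYear()  // end: today
  };
  var query = Object.keys(params).map(function (k) {
    return k + '=' + encodeURIComponent(params[k]);
  }).join('&');
  var searchUrl = 'http://www.timeshighereducation.co.uk/search_results.asp?' + query;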

Taken together, the two date/time related blocks allow you to manipulate time constructs quite easily within the Yahoo pipe :-)

The Obligatory Google Chrome Post – Sort Of…

Okay, so I’m a few days behind the rest of the web posting on this (though I tweeted it early;-), and I have to admit I still haven’t tried the Google Chrome browser out yet (it’s such a chore booting into Windows…), so here are some thoughts based on a reading of the comic book and a viewing of the launch announcement.

Why Chrome? And how might it play out? (I’m not suggesting things were planned this way…) Here’s where you get to see how dazed and confused I am, and how very wrong I can be about stuff ;-)

First up – Chrome is for mobile devices, right? It may not have been designed for that purpose, but the tab view looks pretty odd to me, going against O/S UI style guides for pretty much everything. Each tab in its own process makes sense for mobile devices, where multiple parallel applications may be running at any time, but only one is in view. Rumblings around the web suggest Chrome for Android is on its way in a future Android release…

Secondly, Google Chrome draws heavily on Google Gears. Google Gears provides the browser with its own database, so the browser can store lots of state locally. (Does Gears also provide a lite, local webserver?) Google Gears lets you use web apps offline, and store lots of state without making a call on the host computer’s o/s…

So I’m guessing that Chrome would work well as a virtual appliance…? That is, it’s something that can be popped into a Jumpbox appliance, for example, and run…. anywhere…like from a live CD or bootable USB key (a “live USB”)? That is, run it as a “live virtual appliance”. So you don’t need a host operating system, just a boot manager? Or if all your apps are in the cloud, you just need a machine that runs Chrome (maybe with Flash and Silverlight plugins too).

Chrome lets you create standalone “desktop web apps” in the form of “single application browsers” – a preloaded tab that “runs” Gmail or Google docs, for example, (or equally, I guess, Zoho web applications), just as if they were any other desktop application. The browser becomes a container for applications. If you can run the browser, you can run the app. If you can run the browser in a virtual appliance (or on a mobile device – UI issues aside), you can run the app…

Chrome makes use of open source components – the layout engine, Javascript engine, Gears and so on. Open source presumably makes anti-trust claims harder to put together if the browser starts to take market share; if other browser developers use the code, it legitimises it, as well as increasing the developer community.

On the usability side, the major thing that jumped out at me was that there’s a single search’n’address “omnibox” within each tab. Compare that to current browsers, where the address bar and search box are separate and above the line of selectable tabs.

It’s worth noting here that many people don’t really understand the address bar and the browser search box – they just get to Google any way they can and type stuff into the Google search box: keywords, URLs, brandnames, cut’n’pasted natural language text, anything and everything…

What the omnibox appears to do is to provide a blend of Google Suggest, browser history suggest/URL autocompletion, (and maybe ultimately Google personal browsing history?) and automagically acquired site specific opensearch helpers within a single user entry text box. (I love psychic/ESP searchboxes… I even experimented with using one on searchfeedr, I think?) I guess it also makes migration of the browser easier to a mobile device – each tab satisfies most of the common UI requirements of a single window browser?

A couple of other things that struck me while pondering the above:
– what’s with the URL for the comicbook? http://www.google.com/googlebooks/chrome/ What else can we expect to appear on http://www.google.com/googlebooks/?
– has Google taken an interest in any of the virtual appliance players – Parallels, VMware, Jumpbox etc etc?

OU Library iGoogle Gadgets

Just over a month ago, the OU web team released a “Fact of the Day” Google gadget that publishes an interesting fact from an OpenLearn course once a day, along with a link to the OpenLearn course that it came from.

(By the by, compare the official press release with Laura’s post…)

The OU Library just announced a couple of OU Library iGoogle gadgets too (though I think they have been around for some time…)…

…but whereas the Fact of the Day widget is pretty neat, err, erm, err…

Here’s the new books widget. The Library produces an RSS feed of new books for a whole host of different topic areas. So you can pick your topic and view the new book titles in a gadget on your Google personal page, right…?

Err – well, you can pick a topic area from the gadget…

…and when you click “Go” you’re taken to the Library web page listing the new books for that topic area in a new tab…

Hmmm…

[Lots of stuff deleted about broken code that gives more or less blank pages when you click through on “Art History” at least; HTTP POST rather than GET (I don’t want to have to header trace to debug their crappy code) etc etc]

I have to admit I’m a little confused as to who would want to work this way… All the gadget does is give you lots of bookmarks to other pages. It’s not regularly (not ever) bringing any content to me that I can consume within my Google personal page environment… (That said, it’s probably typical of the sort of widget I developed when I first started thinking about such things…and before lots of AJAX toolkits were around…)

This could be so, so much better… For a start, much simpler, and probably more relevant…

For example, given a feed URL, you can construct another URL that will add the feed to your iGoogle page.

Given a URL like this:
http://voyager.open.ac.uk/rss/compscience.xml
just do this:
http://fusion.google.com/add?feedurl=http://voyager.open.ac.uk/rss/compscience.xml
which takes you to a page like this:

where you can get a widget like this:

Personally, I’d do something about the feed title…

It’s not too hard to write a branded widget that will display the feed contents, or maybe a more elaborate one that will pull in book covers.

For example, here’s an old old old old example of an alternative display – a carousel (described here, in a post from two years ago: Displaying New Library Books):

Admittedly, you’re faced with the issue of how to make the URLs known to the user. But you could generate a URL from a form on the Library gadget page, and assign it to an “add to Google” image button easily enough.
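(Something like this, say – the element ids are made up, but the fusion.google.com/add pattern is the one shown above:)

  // Sketch: turn a feed URL chosen from a drop down into an "Add to Google" link
  function addToGoogleUrl(feedUrl) {
    return 'http://fusion.google.com/add?feedurl=' + encodeURIComponent(feedUrl);
  }

  // e.g. wire it up to an "Add to Google" image button next to a feed picker
  document.getElementById('addToGoogleBtn').onclick = function () {
    var feedUrl = document.getElementById('newBooksFeed').value;   // e.g. the compscience.xml feed
    window.location.href = addToGoogleUrl(feedUrl);
  };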

And the other widget – the Library catalogue search…?

Let’s just say that in the same way as the ‘new books’ widget is really just a list of links hidden in a drop down box, so the catalogue search tool is actually just a redirecting search box. Run a query and you’re sent to the Voyager catalogue search results page, rather than having the results pulled back to you in the gadget on the Google personal page.

(I know, a lot of search widgets are like that (I’ve done more than a few myself in years gone by), but things have moved on and I think I’d really expect the results to be pulled back into the widget nowadays…)

PS okay, I’m being harsh, it’s been a long crappy day, I maybe shouldn’t post this… maybe the widgets will get loads of installs, and loads of traffic going to the Library site… I wonder if they’re checking the web stats to see, maybe because they found out how to add Google Analytics tracking to a Google gadget? And I wonder what success/failure metrics they’re using?

PPS okay, okay – I apologise for the above post, Library folks. The widgets are a good effort – keep up the good work. I’ll be interested to see how you iterate the design of these widgets over the next few weeks, and what new wonders you have in store for us all… :-) Have a think about how users might actually use these widgets, and have a look at whether it may be appropriate to pull content back into the widget using an AJAX call, rather than sending the user away from their personal page to a Library web page. If you can find any users, ask them what they think, and how they’re using the widget. Use web stats/analytics to confirm (or deny) what they’re saying (users lie… ;-). And keep trying stuff out… my machine is littered with dead code and my Google personal page covered in broken and unusable widgets that I’ve built myself. Evolution requires failure…and continual reinvention ;-)

Rehashing Old Tools to Look at CCK08

I haven’t posted for a few days (nothing to write about, sigh….) so here’s a cheap’n’lazy post reusing a couple of old visual demos (edupunk chatter, More Hyperbolic Tree Visualisations – delicious URL History: Users by Tag) to look at what’s happening around the use of the CCK08 tag that’s being used to annotate – in a distributed way – the Connectivism and Connective Knowledge online course.

For example, here’s a view of people who have been using the cck08 tag on delicious:

People twittering mentions of cck08:

And here’s how people have been tagging the Connectivism and Connective Knowledge course homepage on delicious (along with the people who’ve been using those tags).

The next step is to move from hierarchical info displays (such as the above) to mining networks – groups of people who are talking about the same URLs on delicious and twitter, and maybe even blogging about CCK08 too…