Library Analytics (Part 7)

In the previous post in this series, I showed how it’s possible to identify traffic referred from particular course pages in the OU VLE, by creating a user defined variable that captured the complete (nasty) VLE referrer URL.

Now I’m not completely sure about this, but I think that the Library provides URLs to the VLE via an RSS feed. That is, the Library controls the content that appears on the Library Resources page when a course makes such a page available.

In the Google Analytics FAQ answer How do I tag my links?, a method is described for adding additional tags to a referrer URL that Google Analytics can use to segment traffic referred from that URL. Five tags are available (as described in Understanding campaign variables: The five dimensions of campaign tracking):

Source: Every referral to a web site has an origin, or source. Examples of sources are the Google search engine, the AOL search engine, the name of a newsletter, or the name of a referring web site.
Medium: The medium helps to qualify the source; together, the source and medium provide specific information about the origin of a referral. For example, in the case of a Google search engine source, the medium might be “cost-per-click”, indicating a sponsored link for which the advertiser paid, or “organic”, indicating a link in the unpaid search engine results. In the case of a newsletter source, examples of medium include “email” and “print”.
Term: The term or keyword is the word or phrase that a user types into a search engine.
Content: The content dimension describes the version of an advertisement on which a visitor clicked. It is used in content-targeted advertising and Content (A/B) Testing to determine which version of an advertisement is most effective at attracting profitable leads.
Campaign: The campaign dimension differentiates product promotions such as “Spring Ski Sale” or slogan campaigns such as “Get Fit For Summer”.

(For an alternative description, see Google Analytics Campaign Tracking Pt. 1: Link Tagging.)

The recommendation is that campaign source, campaign medium, and campaign name should always be used (I’m not sure if Google Analytics requires this, though?)

So here’s what I’m proposing: how about we treat a “course as campaign”? What are sensible mappings/interpretations for the campaign variables?

  • source: the course?
  • medium: the sort of link that has generated the traffic, such as a link on the Library resources page?
  • campaign: the mechanism by which the link got into the VLE, such as a particular class of Library RSS feed or the addition of the link by a course team member?

By creating URLs that point back to the Library website for display in the VLE, tagged with “course campaign” variables, we can more easily track (i.e. segment) user activity on the Library website that results from students entering the Library site from that link referral.

Where course teams upload Library URLs themselves, we could maybe provide a “URL Generator Tool” (like the “official” Tool: URL Builder) that would accept a library URL and then automatically add the course code (source), a campaign flag saying the link was course team uploaded, a medium flag saying, for example, that the link is provided as part of assessment, and so on. The “content” variable might capture a section number in the course, or information about which particular activity the resource relates to?

For example, the tool would be able to create something like:
http://learn.open.ac.uk/mod/resourcepage/view.php?id=36196&utm_source=E891-07J&utm_medium=Library%2Bresource&utm_campaign=Library%2BRSS%2Bfeed
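
Just to make the idea concrete, here’s a minimal PHP sketch of the sort of helper such a tool might wrap – the function name, parameter names and default values are all made up for the purposes of the example:

<?php
// Minimal sketch of a possible "URL Generator Tool" helper - the function
// name, parameters and example values are hypothetical, not an official tool.
function tagLibraryUrl($url, $course, $medium, $campaign, $content = '') {
  $params = array(
    'utm_source'   => $course,    // e.g. the course code
    'utm_medium'   => $medium,    // e.g. "Library resource"
    'utm_campaign' => $campaign   // e.g. "Library RSS feed"
  );
  if ($content != '') $params['utm_content'] = $content;  // e.g. a section or activity label
  $sep = (strpos($url, '?') === false) ? '?' : '&';
  return $url.$sep.http_build_query($params);
}

// Example: tag a Library URL for a particular course presentation
echo tagLibraryUrl('http://library.open.ac.uk/find/eresources/index.cfm',
                   'E891-07J', 'Library resource', 'Library RSS feed');
?>

Running that would churn out a library URL with the utm_ arguments appended, much like the example above.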

Annotating links in this way would allow Library teams to see what sorts of link (in terms of how they get into the VLE) are effective at generating traffic back to the Library, and could also enable the provision of reports back to course teams showing how effectively students on a particular course are engaging with Library resources from links on the VLE course pages.

The Tesco Data Business (Notes on “Scoring Points”)

One of the foundational principles of the Web 2.0 philosophy that Tim O’Reilly stresses relates to “self-improving” systems that get better as more and more people use them. I try to keep a watchful eye out for business books on this subject – books about companies who know that data is their business; books like the somehow unsatisfying Competing on Analytics, and a new one I’m looking forward to reading: Data Driven: Profiting from Your Most Important Business Asset (if you’d like to buy it for me… OUseful.info wishlist;-).

So as part of my summer holiday reading this year, I took away Scoring Points: How Tesco Continues to Win Customer Loyalty, a book that tells the tale of the Tesco Loyalty Card. (Disclaimer: the Open University has a relationship with Tesco, which means that you can use Tesco Clubcard points in full or part payment of certain OU courses. It also means, of course, that Tesco knows far, far more about certain classes of our students than we do…)

For those of you who don’t know of Tesco, it’s the UK’s dominant supermarket chain, taking a huge percentage of the UK’s daily retail spend, and is now one of those companies that’s so large it can’t help but be evil. (They track their millions of “users” as aggressively as Google tracks theirs.) Whenever you hand over your Tesco Clubcard alongside a purchase, you get “points for pounds” back. Every 3 months (I think?), a personalised mailing comes with vouchers that convert points accumulated over that period into “cash”. (The vouchers are in nice round sums – £1, £2.50 and so on. Unconverted points are carried over to the convertible balance in the next mailing.) The mailing also comes with money off vouchers for things you appear to have stopped purchasing, rewards on product categories you frequently buy from, or vouchers trying to entice you to buy things you might not be in the habit of buying regularly (but which Tesco suspects you might desire!;-)

Anyway, that’s as maybe – this is supposed to be a brief summary of corner-turned pages I marked whilst on holiday. The book reads a bit like a corporate briefing book, repetitive in parts, continually talking up the Tesco business, and so on, but it tells a good story and contains more than a few gems. So here for me were some of the highlights…

First of all, the “Clubcard customer contract”: more data means better segmentation, means more targeted/personalised services, means better profiling. In short, “the more you shop with us, the more benefit you will accrue” (p68).

This is at the heart of it all – just like Google wants to understand its users better so that it can serve them with more relevant ads (better segmentation * higher likelihood of clickthru = more cash from the Google money machine), and Amazon seduces you with personal recommendations of things it thinks you might like to buy based on your purchase and browsing history, and the purchase history of other users like you, so Tesco Clubcard works in much the same way: it feeds a recommendation engine that mines and segments data from millions of people like you, in order to keep you engaged.

Scale matters. In 1995, when Tesco Clubcard launched, dunnhumby, the company that has managed the Clubcard from when it was still an idea to the present day, had to make do with the data processing capabilities that were available then, which meant that it was impossible to track every purchase, in every basket, from every shopper. (In addition, not everything could be tracked by the POS tills of the time – only “the customer ID, the total basket size and time the customer visited, and the amount spent in each department” (p102)). In the early days, this meant data had to be sampled before analysis, with insight from a statistically significant analysis of 10% of the shopping records being applied to the remaining 90%. Today, they can track everything.

Working out what to track – first order “instantaneous” data (what did you buy on a particular trip, what time of day was the visit) or second order data (what did you buy this time that you didn’t buy last time, how long has it been between visits) – was a major concern, as were indicators that could be used as KPIs for the extent to which Clubcard influenced customer loyalty.

Now I’m not sure to what extent you could map website analytics onto “store analytics”, but some of the loyalty measures seem familiar to me. Take, for example, the RFV analysis (pp95-6); there’s a toy sketch of this sort of scoring after the list:

  • Recency – time between visits;
  • Frequency – “how often you shop”
  • Value – how profitable is the customer to the store (if you only buy low margin goods, you aren’t necessarily very profitable), and how valuable is the store to the customer (do you buy your whole food shop there, or only a part of it?).
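
Just to make that concrete, here’s a toy PHP sketch (mine, not anything from the book – the field names, the example transactions and the exact measures are invented for illustration) of how an RFV-style summary might be pulled out of a list of transactions for a single customer:

<?php
// Toy RFV sketch: derive recency/frequency/value figures for one customer
// from an array of transactions. The field names and example figures are
// invented for illustration only.
function rfv($transactions, $now) {
  $lastVisit = 0; $spend = 0; $visits = 0;
  foreach ($transactions as $t) {  // each $t = array('date' => timestamp, 'spend' => amount)
    $lastVisit = max($lastVisit, $t['date']);
    $spend += $t['spend'];
    $visits++;
  }
  return array(
    'recency'   => ($now - $lastVisit) / 86400,  // days since last visit
    'frequency' => $visits,                      // visits in the period sampled
    'value'     => $spend                        // total spend (margin data would be better)
  );
}

$now = time();
$transactions = array(
  array('date' => $now - 5*86400,  'spend' => 62.40),
  array('date' => $now - 12*86400, 'spend' => 48.75),
  array('date' => $now - 19*86400, 'spend' => 71.10)
);
print_r(rfv($transactions, $now));
?>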

Working out what data to analyse also had to fit in with the business goals – the analytics needed to be actionable (are you listening, Library folks?!;-). For example, as well as marketing to individuals, Clubcard data was to be used to optimise store inventory (p124). “The dream was to ensure that the entire product range on sale at each store accurately represented, in selection and proportion, what the customers who shopped there wanted to buy.” So another question that needed to be asked was how should data be presented “so that it answered a real business problem? If the data was ‘interesting’, that didn’t cut it. But adding more sales by doing something new – that did.” (p102). Here, the technique of putting data into “bins” meant that it could be aggregated and analysed more efficiently in bulk and without loss of insight.

Returning to the customer focus, Tesco complemented the RFV analysis with the idea of “Loyalty Cube” within which each customer could be placed (pp126-9).

  • Contribution: that is, contribution to the bottom line, the current profitability of the customer;
  • Commitment: future value – “how likely that customer is to remain a customer”, plus “headroom”, the “potential for the customer to be more valuable in the future”. If you buy all your groceries in Tesco, but not your health and beauty products, there’s headroom there;
  • Championing: brand ambassadors; you may be low contribution, low commitment, but if you refer high value friends and family to Tesco, Tesco will like you:-)

By placing individuals in separate areas of this chart, you can tune your marketing to them, either by marketing items that fall squarely within that area, or if you’re feeling particularly aggressive, by trying to move them through the different areas. As ever, it’s contextual relevancy that’s the key.

But what sort of data is required to locate a customer within the loyalty cube? “The conclusion was that the difference between customers existed in each shopper’s trolley: the choices, the brands, the preferences, the priorities and the trade-offs in managing a grocery budget.” (p129).

The shopping basket could tell a lot about two dimensions of the loyalty cube. Firstly, it could quantify contribution, simply by looking at the profit margins on the goods each customer chose. Secondly, by assessing the calories in a shopping basket, it could measure the headroom dimension. Just how much of a customer’s food needs does Tesco provide?

(Do you ever feel like you’re being watched…?;-)

“Products describe People” (p131): one way of categorising shoppers is to cluster them according to the things they buy, and identify relationships between the products that people buy (people who buy this, also tend to buy that). But the same product may have a different value to different people. (Thinking about this in terms of the OU Course Profiles app, I guess it’s like clustering people based on the similar courses they have chosen. And even there, different values apply. For example, I might dip into the OU web services course (T320) out of general interest, you might take it because it’s a key part of your professional development, and required for your next promotion).

Clustering based on every product line (or SKU – stock keeping unit) is too highly dimensional to be interesting, so enter “The Bucket” (p132): “any significant combination of products that appeared from the make up of a customer’s regular shopping baskets. Each Bucket was defined initially by a ‘marker’, a high volume product that had a particular attribute. It might typify indulgence, or thrift, or indicate the tendency to buy in bulk. … [B]y picking clusters of products that might be bought for a shared reason, or from a shared taste” the large number of Buckets required for the marker approach could be reduced to just 80 Buckets using the clustered products approach. “Every time a key item [an item in one of the clusters that identifies a Bucket] was scanned [at the till], it would link that Clubcard member with an appropriate Bucket. The combination of which shoppers bought from which Buckets, and how many items in those Buckets they bought, gave the first insight into their shopping preferences” (p133).
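
Purely by way of illustration (the Bucket names, key items and Clubcard number below are invented – this isn’t dunnhumby’s actual implementation, obviously!), the bookkeeping for that might look something like:

<?php
// Illustrative only: map scanned "key items" to Buckets and tally how many
// items from each Bucket a Clubcard member has bought. The Bucket names,
// key items and card number are made up.
$keyItemToBucket = array(
  'own-brand baked beans' => 'thrift',
  'premium ice cream'     => 'indulgence',
  '24-pack toilet roll'   => 'bulk buying'
);

$bucketCounts = array();  // per-customer tallies, keyed by Clubcard id

function recordScan($clubcardId, $item, $keyItemToBucket, &$bucketCounts) {
  if (!isset($keyItemToBucket[$item])) return;  // not a key item, so ignore it
  $bucket = $keyItemToBucket[$item];
  if (!isset($bucketCounts[$clubcardId][$bucket])) $bucketCounts[$clubcardId][$bucket] = 0;
  $bucketCounts[$clubcardId][$bucket]++;
}

recordScan('1234567890', 'own-brand baked beans', $keyItemToBucket, $bucketCounts);
recordScan('1234567890', 'premium ice cream',     $keyItemToBucket, $bucketCounts);
print_r($bucketCounts);
?>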

By applying cluster analysis to the Buckets (i.e. trying to see which Buckets go together) the next step was to identify user lifestyles (p134-5). 27 of them… Things like “Loyal Low Spenders”, “Can’t Stay Aways”, “Weekly Shoppers”, “Snacking and Lunch Box” and “High Spending Superstore Families”.

Identifying people from the products they buy and clustering on that basis is one way of working. But how about defining products in terms of attributes, and then profiling people based on those attributes?

Take each product, and attach to it a series of appropriate attributes, describing what that product implicitly represented to Tesco customers. Then by scoring those attributes for each customer based on their shopping behaviour, and building those scores into an aggregate measurement per individual, a series of clusters should appear that would create entirely new segments. (p139)

(As a sort of example of this, brand tags has a service that lets you see what sorts of things people associate with corporate brands. I imagine a similar sort of thing applies to Kellogg’s cornflakes and Wispa chocolate bars ;-)

In the end, 20 attributes were chosen for each product (p142). Clustering people based on the attributes of the products they buy produces segments defined by their Shopping Habits. For these segments to be at their most useful, each customer should slot neatly into a single segment, each segment needs to be large enough to be viable for it to be acted on, as well as being distinctive and meaningful. Single person segments are too small to be exploited cost effectively (pp148-9).

Here are a few more insights that I vaguely seem to remember from the book, that you may or may not think are creepy and/or want to drop into conversation down the pub:-)

  • calorie count – on the food side, calorie sellers are the competition. We all need so many calories a day to live. If you do a calorie count on the goods in someone’s shopping basket, and you have an idea of the size of the household, you can find out whether someone is shopping elsewhere (you’re not buying enough calories to keep everyone fed) and maybe guess when a competitor has stolen some of your business or when someone has left home. (If lots of shoppers from a store stop buying pizza, maybe a new pizza delivery service has started up. If a particular family’s basket takes a 15% drop in calories, maybe someone has left home.)
  • life stage analysis – if you know the age, you can have a crack at the life stage. Pensioners probably don’t want to buy kids’ breakfast cereal, or nappies. This is about as crude as useful segmentation gets – but it’s easy to do…
  • Beer and nappies go together – young bloke has a baby, has to go shopping for the first time in his life, gets the nappies, sees the beers, knows he won’t be going anywhere for the next few months, and gets the tinnies in… (I think that was from this book!;-)

Anyway, time to go and read the Tesco Clubcard Charter I think?;-)

PS here’s an interesting, related, personal tale from a couple of years ago: Tesco stocks up on inside knowledge of shoppers’ lives (Guardian Business blog, Sept. 2005) [thanks, Tim W.]

PPS Here are a few more news stories about the Tesco Clubcard: Tesco’s success puts Clubcard firm on the map (The Sunday Times, Dec. 2004), Eyes in the till (FT, Nov 2006), and How Tesco is changing Britain (Economist, Aug. 2005) and Getting an edge (Irish Times, Oct 2007), which both require a login, so f**k off…

PPPS see also More remarks on the Tesco data play, although having received a takedown notice at the time from Dunnhumby, the post is less informative than it was when originally posted…

So What Do You Think You’re Doing, Sonny?

A tweet from @benjamindyer alerted me to a trial being run in Portsmouth where “behavioural analytics” are being deployed on the city’s CCTV footage in order to “alert a CCTV operator to a potential crime in the making” (Portsmouth gets crime-predicting CCTV).

I have to say this reminded me a little, in equal measure, of Philip Kerr’s A Philosophical Investigation, and the film Minority Report, both of which explore, in different ways, the idea of “precrime”, or at least, the likelihood of a crime occurring, although I suspect the behavioural video analysis still has some way to go before it is reliable…!

When I chased the “crime predicting CCTV” story a little, it took me to Smart CCTV, the company behind the system being used in Portsmouth.

And seeing those screenshots, I wondered – wouldn’t this make for a brilliant bit of digital storytelling, in which the story is a machine interpretation of life going on, presented via a series of automatically generated, behavioural analysis subtitles, as we follow an unlikely suspect via the CCTV network?

See also: CCTV hacked by video artists, Red Road, Video Number Plate Recognition (VNPR) systems, etc. etc.

PS if you live in Portsmouth, you might as well give up on the idea of privacy. For example, add in a bit of Path Intelligence, “the only automated measurement technology that can continuously monitor the path that your shoppers or passengers take” which is (or at least, was) running in Portsmouth’s Gunwharf Quays shopping area (Shops track customers via mobile phone), and, err, erm… who knows?!

PPS it’s just so easy to feed paranoia, isn’t it? Gullible Twitter users hand over their usernames and passwords – did you get your Twitterank yet?! ;-)

Realising the Value of Library Data

For anyone listening out there in library land who hasn’t picked up on Dave Pattern’s blog post from earlier today – WHY NOT? Go and read it, NOW: Free book usage data from the University of Huddersfield:

I’m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we’ve just released a major portion of our book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13 year period.

http://library.hud.ac.uk/usagedata/

I would like to lay down a challenge to every other library in the world to consider doing the same.

So are you going to pick up the challenge…?

And if not, WHY NOT? (Dave posts some answers to the first two or three objections you’ll try to raise, such as the privacy question and the licensing question.)

He also sketches out some elements of a possible future:

I want you to imagine a world where a first year undergraduate psychology student can run a search on your OPAC and have the results ranked by the most popular titles as borrowed by their peers on similar courses around the globe.

I want you to imagine a book recommendation service that makes Amazon’s look amateurish.

I want you to imagine a collection development tool that can tap into the latest borrowing trends at a regional, national and international level.

DON’T YOU DARE NOT DO THIS…

See also a presentation Dave gave to announce this release – Can You Dig It? A Systems Perspective:

What else… Library website analytics – are you making use of them yet? I know the OU Library is collecting analytics on the OU Library website, although I don’t think they’re using them? Knowing that you had x thousand page views last week is NOT INTERESTING. Most of them were probably people flailing round the site failing to find what they wanted? (And before anyone from the Library says that’s not true, PROVE IT TO ME – or at least to yourself – with some appropriate analytics reports.) For example, I haven’t noticed any evidence of changes to the website or A/B testing going on as a result of using Googalytics on the site??? (Hmmm – that’s probably me in trouble again…!;-)

PS I’ve just realised I didn’t post a link to the Course Analytics presentation from Online Info last week, so here it is:

Nor did I mention the follow up podcast chat I had about the topic with Richard Wallis from Talis: Google Analytics to analyse student course activity – Tony Hirst Talks with Talis.

Or the “commendation” I got at the IWR Information Professional Award ceremony. I like to think this was for being the “unprofessional” of the year (in the sense of “unconference”, of course…;-). It was much appreciated, anyway :-)

Library Analytics (Part 8)

In Library Analytics (Part 7), I posted a couple of suggestions about how the Library might start crafting URLs for the Library resources pages for individual courses in the Moodle VLE that contain a campaign tracking code, so that we could track the behaviour of students coming into the Library site by course.

From a quick peek at a handful of courses in the VLE, that recommendation either doesn’t appear to have been taken up, or it’s just “too hard” to do, so that’s another couple of months’ data we don’t have easy access to in the Google Analytics environment. (Or maybe the Library have moved over to using the OU’s Site Analytics service for this sort of insight?)

Just to recall, we need to put some sort of additional measures in place because Moodle generates crappy URLs (e.g. URLs of the form http://learn.open.ac.uk/mod/resourcepage/view.php?id=119070) and crafting nice URLs or using mod-rewrite (or similar) definitely is way too hard for the VLE’n’network people to manage;-) The default set up of Google Analytics dumps everything after the “?”, unless the arguments are official campaign tracking parameters or are otherwise captured.

(From a quick scan of the Google Analytics Tracking API, I’m guessing that setting pageTracker._setCampSourceKey("id"); in the tracking code on each Library web page might also capture the id from referrer URLs? Can anyone confirm/deny that?)

Aside: from what I’ve been told, I don’t think we offer server side compression for content served from most http://www.open.ac.uk/* sites, either (though I haven’t checked)? Given that there are still a few students on low bandwidth connections (and relatively modern browsers), this is probably an avoidable breach of some sort of accessibility recommendation? For example, over the last 3 weeks or so, here’s the number of dial-up visits to the Library website:

A quick check of the browser stats shows that IE breaks down almost completely as IE6 and above, all of which cope with compressed files, I think?
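
For what it’s worth, for PHP generated pages one cheap way of experimenting with compression is something along the lines of the following sketch, assuming zlib is available on the server (a server level mod_deflate/mod_gzip setting would be the more general fix):

<?php
// Quick-and-dirty test of output compression for a PHP generated page:
// gzip the output buffer if the browser says it accepts it, otherwise
// fall back to an ordinary buffer. (Server-wide compression config would
// be the proper fix - this is just for experimenting.)
if (!ob_start("ob_gzhandler")) ob_start();
?>
<html>
  <body>
    <p>Page content as usual…</p>
  </body>
</html>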

[Clarification (?! heh heh) re: dial-in stats – “when you’re looking at the dial-up use of the Library website is that we have a dial-up PC in the Library to replicate off-campus access and to check load times of our resources. So it’s probably worth filtering out that IP address (***.***.***.***) to cut out library staff checking out any problems as this will inflate the perceived use of dial-up by our students. Even if we’ve only used it once a day then that’s a lot of hits on the website that aren’t really students using dial-up” – thanks, Clari :-)]

Anyway – back to the course tracking: as a stop gap, I created a few of my own reports that use a user defined argument corresponding to the full referrer URL:

We can then view reports according to this user defined segment to see which VLE pages are sending traffic to the Library website:

Clicking through on one of these links gives a report for that referrer URL, and then it’s easy to see which landing pages the users are arriving at (and by induction, which links on the VLE page they clicked on):

If we look at the corresponding VLE page:

Then we can say that the analytics suggest that the Open University Library – http://library.open.ac.uk/, the Online collections by subject – http://library.open.ac.uk/find/eresources/index.cfm and the Library Help & Support – http://library.open.ac.uk/about/index.cfm?id=6939 are the only links that have been clicked on.

[Ooops… “Safari & Info Skills for Researchers are our sites, but don’t sit within the library.open.ac.uk domain ([ http://www.open.ac.uk/safari ]www.open.ac.uk/safari and [ http://www.open.ac.uk/infoskills-researchers ]www.open.ac.uk/infoskills-researchers respectively) and the Guide to Online Information Searching in the Social Sciences is another Moodle site.” – thanks Clari:-) So it may well be that people are clicking on the other links… Note to self – if you ever see 0 views for a link, be suspicious and check everything!]

(Note that I have only reported on data from a short period within the lifetime of the course, rather than data taken from over the life of the course. Looking at the incidence of traffic over a whole course presentation would also give an idea of when during the course students are making use of the Library resource page within the course.)

Another way of exploring how VLE referrer traffic is impacting on the Library website is to look at the most popular Landing pages and then see which courses (from the user defined segment) are sourcing that traffic.

So for example, here are the VLE pages that are driving traffic to the elluminate registration page:

One VLE page seems responsible:

Hmmm… ;-)

How about the VLE pages driving traffic to the ejournals page?

And the top hit is….

… the article for question 3 on TMA01 of the November 2008 presentation of M882.

The second most popular referrer page is interesting because it contains two links to the Library journals page:


Unfortunately, there’s no way of disambiguating which link is driving the tracking – which is one good reason why a separate campaign related tracking code should be associated with each link.
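
By way of a sketch (the URL, course code and content labels below are just placeholders), the two links might be tagged along these lines, so that each shows up separately in the reports:

<?php
// Hypothetical example: two links on the same VLE page, pointing at the same
// Library page, disambiguated by different utm_content values. The URL,
// course code and labels are placeholders.
$base   = 'http://library.open.ac.uk/find/eresources/index.cfm';
$common = 'utm_source=M882&utm_medium=Library%2Bresource&utm_campaign=VLE%2Bcourse%2Bpage';
echo $base.'?'.$common.'&utm_content=TMA01-reading-link'."\n";
echo $base.'?'.$common.'&utm_content=further-reading-link'."\n";
?>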

(Do you also see the reference to Google books in there? Heh heh – surely they aren’t suggesting that students try to get what they need from the book via the Google books previewer?!;-)

Okay – enough for now. To sum up, we have the opportunity to provide two sorts of report – one for the Library to look at how VLE sourced traffic as a whole impacts on the Library website; and a different set of reports that can be provided to course teams and course link librarians to show how students on the course are using the VLE to access Library resources.

PS if you haven’t yet watched Dave Pattern’s presentation on mining lending data records, do so NOW: Can You Dig It? A Systems Perspective.

Trackforward – Following the Consequences with N’th Order Trackbacks

One of the nice things about blogging within the WordPress ecosystem is the way that trackbacks/pingbacks capture information about posts that link back to your posts, in much the same way that using the link: search limit on a web or blog search engine allows you to see what other webpages are linking back to a particular web page.

In the latter case, for example, searching for link:http://hedebate.jiscinvolve.org/on-line-higher-education-learning/ on Google blogsearch will turn up blog posts that link back to the original HE Debate blog post on On-Line Higher Education Learning.

(Actually, that’s not quite true. In an apparent tweak of the Google blogsearch algorithm last year, the Google blogsearch engine now seems to be indexing and returning results from complete web pages rather than indexing the content of RSS feeds i.e. blog posts – which means that as well as the useful links referred to in the body of a post, links are also indexed from blogrolls, twitter feeds and bookmark lists displayed in blog sidebars, blog comments etc etc. Which in turn is to say that Google blogsearch qua a web search of blog web pages is not much use as a blog search engine at all…)

By judicious linking back to your own blog posts, it’s possible to build up quite complex pathways between related posts that are navigable in two directions: from one post that links to another, previously published post, via an inline link; and “forwards” in time to a later post that has itself linked back to a post of interest and been picked up via a trackback/pingback.

(For examples of these emergent link structures, see Emergent Structure in the Digital Worlds Uncourse Blog Experiment, Uncovering a Little More Digital Worlds Structure and Trackback Graphs and Blog Categories.)

So the question arises – if I write a blog post that several other people link back to, and several further posts in turn link back to those posts that referred back to my post, but not my original post, how do I keep track of the conversation?

Keeping track of posts that cite my post is easy enough – if I have an effective pingback set-up, that will tell me who’s linking back to my posts; or I can simply run link: searches against the URLs of my posts every so often to see who the search engines think are linking back to me.

The answer lies in a recursive algorithm of the form:

function showInLinks($url){
  // getLinksTo() stands in for a "who links to this URL?" lookup
  // (e.g. a link: search against a blog search engine)
  $links = getLinksTo($url);
  foreach ($links as $link){
    print $link;
    showInLinks($link);
  }
}

This will then display URLs for the pages that link to an originally specified URL, the URLs of pages that link to those URLs, and so on…

So here for example is a quick test:

The items numbered “1.” are links that Google blogsearch thinks link back to the original URL. The items numbered “2.” are links that link to the links that link back to the original URL.

Here’s some minimal PHP code if you want to try it out:

<?php
// Google AJAX blog search API, restricted to a link: query, most recent first
$urlstub = "http://ajax.googleapis.com/ajax/services/search/blogs?scoring=d&v=1.0&rsz=large&q=link%3A";
$url = "http://halfanhour.blogspot.com/2008/11/future-of-online-learning-ten-years-on_16.html";
if (isset($_GET['url'])) $url = $_GET['url'];
$testurl = $urlstub.urlencode($url);
echo "Starting with: ".$url."<br/>";
echo "via: ".$testurl."<br/><br/>";
$depth = 0;

function handlelinks($url, $depth){
  $urlstub = "http://ajax.googleapis.com/ajax/services/search/blogs?v=1.0&rsz=large&q=link%3A";
  $depth++;
  $testurl = $urlstub.urlencode($url);
  // fetch the blog search results for pages that link: to $url
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $testurl);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $body = curl_exec($ch);
  curl_close($ch);
  // now, process the JSON string
  $json = json_decode($body);
  if ($depth < 3 && isset($json->responseData->results)) {
    foreach ($json->responseData->results as $result) {
      for ($i = 0; $i < $depth; $i++) echo "&nbsp;&nbsp;";
      echo $depth.". ".$result->title." ";
      echo "<a href='".$result->postUrl."'>".$result->postUrl."</a><br/>";
      // recurse: now look for pages that link back to this result
      handlelinks($result->postUrl, $depth);
    }
  }
}
handlelinks($url, $depth);
?>

By using this sort of algorithm to generate an RSS feed of links, it becomes possible to subscribe to a feed that will keep you updated of all the downstream posts (“blogversation” posts) that are contributing to a discussion that at some point referred to a URL you are interested in.
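
Just as a sketch of what that last step might look like (the channel details and example links below are placeholders), the collected links could be wrapped up as a minimal RSS 2.0 feed along these lines:

<?php
// Minimal sketch: wrap a list of collected "downstream" post URLs in an RSS 2.0
// feed. It assumes the links have already been gathered (e.g. by a crawler like
// the one above); the channel details and example links are placeholders.
function linksToRss($links, $channelTitle, $channelLink) {
  $out  = '<?xml version="1.0" encoding="UTF-8"?>'."\n";
  $out .= "<rss version=\"2.0\"><channel>\n";
  $out .= "<title>".htmlspecialchars($channelTitle)."</title>\n";
  $out .= "<link>".htmlspecialchars($channelLink)."</link>\n";
  $out .= "<description>N'th order trackbacks</description>\n";
  foreach ($links as $link) {
    $out .= "<item><title>".htmlspecialchars($link)."</title>";
    $out .= "<link>".htmlspecialchars($link)."</link></item>\n";
  }
  $out .= "</channel></rss>";
  return $out;
}

header("Content-Type: application/rss+xml");
echo linksToRss(
  array("http://example.com/a-post-that-linked-back", "http://example.com/another-one"),
  "Blogversation around my post",
  "http://example.com/original-post"
);
?>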

What Are JISC’s Funding Priorities?

I’ve just got back home from a rather wonderful week away at the JISC Developer Happiness Days (dev8D), getting a life (of a sort?!;-) so now it’s time to get back to the blog…

My head’s still full of things newly learned from the last few days, so while I digest it, here’s a quick taster of something I hope to dabble a little more with over the next week for the developer decathlon, along with the SplashURL.net idea (which reminds me of my to do list…oops…)

A glimpse of shiny things to do with JISC project data, scraped from Ross’s Simal site… [updated simal url] (see also: Prod).

Firstly, a Many Eyes tag cloud showing staffing on projects by theme:

Secondly, a Many Eyes pie chart showing the relative number of projects by theme:

As ever, the data may not be that reliable/complete, because I believe it’s a best effort scrape of the JISC website. Now if only they made their data available in a nice way???;-)

Following a session in the “Dragon’s Den” – where I was told by Rachel Bruce that these charts might be used for good as well as, err, heckling, I guess, and by Mark van Harmelen that I should probably pay lip service to who potential users might be – and Jim Downing’s suggestion that I could do something similar for research council projects, I also started having a play with data pulled from the JISC website.

So for example, here’s a treemap showing current EPSRC Chemistry programme area grants >2M UKP by subprogramme area:

And if you were wondering who got the cash in the Chemistry area, here’s a bubble chart showing projects held by named PIs, along with their relative value:

If you try out the interactive visualisation on Many Eyes, you can hover over each person bubble to see what projects they hold and how much they’re worth:

PS thanks to Dave Flanders and all at JISC for putting the dev8D event on and managing to keep everything running so smoothly over the week:-) Happiness 11/10…