First Dabblings With @daveyp’s MOSAIC Library Competition Data API

A couple of days ago, Dave Pattern published a simple API to the JISC MOSAIC project's HE library loans data, which has been opened up as part of a developer competition running over the summer (Simple API for JISC MOSAIC Project Developer Competition data [competition announcement]).

The API offers a variety of simple queries over the records data (we like simple queries:-) that allow the user to retrieve records according to the ISBN of the book that was borrowed, the books that were taken out by people on a particular course (identified by a UCAS course code), or a combination of the two.
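
For example (the ucas parameter is the one used later in this post; I'm assuming the ISBN queries take an isbn parameter in the same way, with values like the ones in the sample record below):

http://library.hud.ac.uk/mosaic/api.pl?ucas=B390
http://library.hud.ac.uk/mosaic/api.pl?isbn=1856752321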

The results file from a search is returned as an XML file containing zero or more results records; each record has the form:

<useRecord row="19">
 <from>
  <institution>University of Huddersfield</institution>
  <academicYear>2008</academicYear>
 </from>
 <resource>
  <media>book</media>
  <globalID type="ISBN">1856752321</globalID>
  <author>Mojay, Gabriel.</author>
  <title>Aromatherapy for healing the spirit : a guide to restoring emotional and mental balance through essential oils /</title>
  <localID>582083</localID>
  <catalogueURL>http://library.hud.ac.uk/catlink/bib/582083</catalogueURL>
  <publisher>Gaia</publisher>
  <published>2005</published>
 </resource>
 <context>
  <courseCode type="ucas">B390</courseCode>
  <courseName>FdSc Holistic Therapies</courseName>
  <progression>UG1</progression>
 </context>
</useRecord>

[How to style code fragments in a hosted WordPress.com blog, via WordPress Support: “Code” shortcode]

(For a more complete description of the record format, see Mosaic data collection – A guide [PDF])
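
Just to make the shape of the data concrete, here's a minimal sketch of how you might pull back and parse a results file in the browser (assuming the API is reachable from wherever the script runs – in practice, cross-domain restrictions may mean you need a server side proxy):

function listTitles(courseCode){
 var req=new XMLHttpRequest();
 req.open("GET","http://library.hud.ac.uk/mosaic/api.pl?ucas="+courseCode,true);
 req.onreadystatechange=function(){
  if (req.readyState==4 && req.status==200){
   var recs=req.responseXML.getElementsByTagName("useRecord");
   for (var i=0;i<recs.length;i++){
    var isbn=recs[i].getElementsByTagName("globalID")[0];
    var title=recs[i].getElementsByTagName("title")[0];
    //print "ISBN : title" for each loan record
    console.log((isbn?isbn.textContent:"?")+" : "+(title?title.textContent:"?"));
   }
  }
 };
 req.send(null);
}
listTitles("B390");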

As a warm-up exercise to familiarise myself with the data, I did a proof-of-concept hack (in part inspired by the Library Traveller script) that simply linkifies course codes appearing on a UCAS course search results page, such that each course code links to a results page listing the books that had been borrowed in previous years by students on courses with a similar course code.

Looking at the results page, we can see the course code appears next to the name of each course:

A simple bookmarklet can be used to linkify the qualification code so that it points to the results of a query on the MOSAIC data with the appropriate course code:

javascript:(function(){
 var regu=/\(([A-Z0-9]{4})\)/;
 var s=document.getElementsByTagName('span');
 for (var i=0;i<s.length;i++){
  if (s[i].getAttribute('class')=='bodyTextSmallGrey')
   if (regu.test(s[i].innerHTML)){
    var id=regu.exec(s[i].innerHTML);
    s[i].innerHTML=s[i].innerHTML.replace(regu,
     "(<a href=\"http://library.hud.ac.uk/mosaic/api.pl?ucas="
     +id[1]+"\">"+id[1]+"</a>)");}}})()

(It’s trivial to convert this to a Greasemonkey script: simply add the required Greasemonkey header and save the file with an appropriate filename – e.g. ucasCodeLinkify.user.js)

Clicking on the bookmarklet linkifies the qualification code so that it points to a search on http://library.hud.ac.uk/mosaic/api.pl?ucas= with the appropriate code.

To make the results a little friendlier, I created a simple Yahoo pipe that generates an RSS feed containing a list of books (along with their book covers) that had been borrowed more than a specified minimum number of times by people associated with that course code in previous years.

To start with, construct the URI containing a particular UCAS qualification code, call the web service, and pull in the XML results feed:

Map elements onto legitimate RSS feed elements:

and strip out duplicate books. Note that the pipe also counts how many duplicate (?!) items there are:

Now filter the items based on the duplication (replication?) count – we want to only see books that have been borrowed at least a minimum number of times (this rules out ‘occasional’ loans on unrelated courses by single individuals – we only want the popular or heavily borrowed books associated with a particular UCAS qualification code in the first instance):

Finally, create a link to each book on Google books, and grab the book cover. Note that I assume the global identifier is an ISBN10… (if it’s an ISBN13, see Looking Up Alternative Copies of a Book on Amazon, via ThingISBN for a defensive measure using the LibraryThing Check ISBN API. That post also points towards a way by which we might find other courses that are associated with different editions of a particular book… ;-)
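
For anyone who'd rather see the pipe's logic spelled out as code, here's a rough JavaScript equivalent of the dedupe/count/filter/link steps (a sketch only – the pipe itself does this with Yahoo Pipes modules; the recs argument is the list of useRecord elements from a parsed results file, as in the earlier sketch, and the Google Books link assumes the globalID really is an ISBN10, per the caveat above):

function popularBooks(recs,minCount){
 var counts={},titles={};
 for (var i=0;i<recs.length;i++){
  var isbnEl=recs[i].getElementsByTagName("globalID")[0];
  var titleEl=recs[i].getElementsByTagName("title")[0];
  if (!isbnEl) continue;
  var isbn=isbnEl.textContent;
  //count how many loan records mention each ISBN
  counts[isbn]=(counts[isbn]||0)+1;
  if (titleEl) titles[isbn]=titleEl.textContent;
 }
 var items=[];
 for (var isbn in counts){
  //only keep books borrowed at least minCount times
  if (counts[isbn]>=minCount){
   items.push({title:titles[isbn],count:counts[isbn],
    link:"http://books.google.com/books?vid=ISBN"+isbn});
  }
 }
 return items;
}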


http://pipes.yahoo.com/pipes/pipe.info?_id=HqJfbDR93hGE5DAewmH_9A

You can find the pipe here: MOSAIC data: books borrowed on particular courses.

If we now change the linkifying URL to the RSS output of the pipe, we can provide a link for each course on the UCAS course search results page that points to an RSS feed of reading material presumably popularly associated with the course in previous years (note, however, that not all codes have books associated with them).

To do this, simply change the following URI stub in the bookmarklet:
http://library.hud.ac.uk/mosaic/api.pl?ucas=
to
http://pipes.yahoo.com/pipes/pipe.run?_id=HqJfbDR93hGE5DAewmH_9A&_render=rss&min=3&cc=

The “popular related borrowings” reading list generated now allows a potential student to explore some of the books associated with a particular course at decision time:-)

One possible follow on step would be to look up other courses related to each original course by virtue of several people having borrowed the same book (or other editions of it) on other courses. Can you see how you might achieve that, or at least what sorts of steps you need to implement?

PS If anyone would like to work this recipe up properly as (part of) a competition entry, feel free, though I’d appreciate a cut of any prize money if you win;-)

Are You Ready To Play “Search Engine Consequences”?

In the world of search, what happens when your local library categorises the film “Babe” as a teen movie in the “classic DVDs” section of the Library website? What happens if you search for “babe teen movie” on Google using any setting other than Safe Search set to “Use strict filtering”? Welcome, in a roundabout way, to the world of search engine consequences.

To set the scene, you need to know a few things. Firstly, how the web search engines know about your web pages in the first place. Secondly, how they rank your pages. And thirdly, how the ads and “related items” that annotate many search listings and web pages are selected (because the whole point about search engines is that they are run by advertising companies, right?).

So, how do the search engines know where your pages are? Any offers? You at the back there – how does Google know about your library home page?

As far as I know, there are three main ways:

  • a page that is already indexed by the search engine links to your page. When a search engine indexes a page, it makes a note of all the pages that are linked to from that page, so these pages can be indexed in turn by the search engine crawler (or spider). (If you have a local search engine, it will crawl your website in a similar way, as this documentation about the Google Search Appliance crawler describes.)
  • the page URL is listed in a Sitemap, and your website manager has told the search engine where that Sitemap lives. The Sitemap lists all the pages on your site that you want indexing. This helps the search engine out – it doesn’t have to crawl your site looking for all the pages – and it helps you out: you can tell the search engine how often the page changes, and how often it needs to be re-indexed, for example. (A bare-bones Sitemap is sketched after this list.)
  • you tell the search engine at a page level that the page exists. For example, if your page includes any Google Adsense modules, or Google Analytics tracking codes, Google will know that page exists the first time it is viewed.
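
By way of illustration, here's what a bare-bones Sitemap looks like, following the sitemaps.org protocol (the URL is made up):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>http://library.example.ac.uk/</loc>
  <changefreq>weekly</changefreq>
 </url>
</urlset>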

And once a page has been indexed by a search engine, it becomes a possible search result within that search engine.

So when someone makes a query, how are the results selected?

During the actual process of indexing, the search engine does some voodoo magic to try and understand what the page is about. This might be as simple as counting the occurrence of every different word on the page, or it might be an actual attempt to understand what the page is about using all manner of heuristics and semantic engineering approaches. Pages are deemed a “hit” for a particular query if the search terms can be used to look up the page in the index. The hits are then rank ordered according to a score for each page, calculated according to whatever algorithm the search engine uses. Typically this is some function of both “relevance” and “quality”.

“Relevance” is identified in part by comparing how the page has been indexed compared to the query.

“Quality” often relates to how well regarded the page is in the greater scheme of things; for example, link analysis identifies how many other pages link to the page (where we assume a link is some sort of vote of confidence for the page) and clickthrough analysis monitors how many times people click through on a particular result for a particular query in a search engine results listing (as search engine companies increasingly run web analytics packages too, they can potentially factor this information back in to calculating how satisfied a user was with a particular page).
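
As a toy illustration of how “relevance” and “quality” might be combined (and it really is only a toy – the actual ranking algorithms are closely guarded, and hugely more sophisticated), imagine something like this:

function score(page,queryTerms){
 //"relevance": how often the query terms appear on the page
 var words=page.text.toLowerCase().split(/\W+/);
 var relevance=0;
 for (var i=0;i<words.length;i++){
  if (queryTerms.indexOf(words[i])!=-1) relevance++;
 }
 //"quality": a precomputed page score, e.g. from link analysis
 return relevance*page.quality;
}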

So what are the consequences of publishing content to the web, and letting a search engine index it?

At this point it’s worth considering not just web search engines, but also image and video search engines. Take Youtube, for example. Posting a video to Youtube means that Youtube will index it. And one of the nice things about Youtube is that it will index your video according to at least the title, description and tags you add to the movie, and use this as the basis for recommending other “related” videos to you. You might think of this in terms of Youtube indexing your video in a particular way, and then running a query for other videos using those index terms as the search terms.

Bearing that in mind, a major issue is that you can’t completely control where a page might turn up in a search engine results listing or what other items might be suggested as “related items”. If you need to be careful managing your “brand” online, or you have a duty of care to your users, being associated with inappropriate material in a search engine results listing can be a risky affair.

To try and ground this with a real world example, check out David Lee King’s post from a couple of weeks ago on YouTube Being Naughty Today. It tells a story of how “[a] month or so ago, some of my library’s teen patrons participated in a Making Mini Movie Masterpieces program held at my library. Cool program!”

One of our librarians just posted the videos some of the teens made to YouTube … and guess what? In the related videos section of the video page (and also on the related videos flash thing that plays at the end of an embedded YouTube video), some … let’s just say “questionable” videos appeared.

Here’s what I think happened: YouTube found “similar” videos based on keywords. And the keywords it found in our video include these words in the title and description: mini teen teens . Dump those into YouTube and you’ll unfortunately find some pretty “interesting” videos.

And here’s how I commented on the post:

If you assume everything you publish on the web will be subject to simple term extraction or semantic term extraction, then in a sense it becomes a search query crafted around those terms that will potentially associate the results of that “emergent query” with the content itself.

One of the functions of the Library used to be classifying works so they could be discovered. Maybe now there is a need for understanding how machines will classify web published content so we can try to guard against “consequential annotations”?

For a long time I’ve thought one role for the Library going forwards is SEO – and raising the profile of the host institution’s content in the dominant search engines. But maybe an understanding of SEO is also necessary in a *defensive* capacity?

This need for care is particularly relevant if you run Adsense on your website. Adsense works in a conceptually similar way to the Youtube related videos service, so you might think of it in the following terms: the page is indexed, key terms are identified and run as “queries” on the Adsense search engine, returning “relevant ads” as a result. “Relevance” in this case is calculated (in part) based on how much the advertiser is willing to pay for their advert to be returned against a particular search term or in the context of a page that appears to be about a particular topic.

Although controls are starting to appear that give the page publisher an element of control over what ads appear, there is still uncertainty in the equation, as there is whenever your content appears alongside content that is deemed “relevant” or related in some way – whether that’s in the context of a search engine results listing, an Adsense placement, or a related video.

So one thing to keep in mind – always – is how might your page be indexed, and what sorts of query, ad placement or related item might it be the perfect match for? What are the search engine consequences of writing your page in a particular way, or including particular key words in it?

David’s response to my comment identified a major issue here: “The problem is SO HUGE though, because … at least at my library’s site … everyone needs this training. Not just the web dudes! In my example above, a youth services librarian posted the video – she has great training in helping kids and teens find stuff, in running successful programs for that age group … but probably not in SEO stuff.”

I’m not sure I have the answer, but I think this is another reason Why Librarians Need to Know About SEO – not just so they can improve the likelihood of content they do approve of appearing in search engine listings or as recommended items, but also so they can defend against unanticipated SEO, where they unconsciously optimise a page so that it fares well on an unanticipated, or unwelcome, search.

What that means is, you need to know how SEO works so you don’t inadvertently do SEO on something you didn’t intend optimising for; or so you can institute “SEO countermeasures” to try to defuse any potentially unwelcome search engine consequences that might arise for a particular page.

Library Analytics (Part 8)

In Library Analytics (Part 7), I posted a couple of ideas about how it might be an idea if the Library started crafting URLs for the Library resource pages for individual courses in the Moodle VLE that contained a campaign tracking code, so that we could track the behaviour of students coming into the Library site by course.

From a quick peek at a handful of courses in the VLE, that recommendation either doesn’t appear to have been taken up, or it’s just “too hard” to do, so that’s another couple of months’ data we don’t have easy access to in the Google Analytics environment. (Or maybe the Library has moved over to using the OU’s Site Analytics service for this sort of insight?)

Just to recall, we need to put some sort of additional measures in place because Moodle generates crappy URLs (e.g. URLs of the form http://learn.open.ac.uk/mod/resourcepage/view.php?id=119070) and crafting nice URLs or using mod-rewrite (or similar) definitely is way too hard for the VLE’n’network people to manage;-) The default set up of Google Analytics dumps everything after the “?”, unless they are official campaign tracking arguments or are captured otherwise.
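
By way of illustration, the sort of link I had in mind would look something like this (the parameter values here are made up, but utm_source, utm_medium and utm_campaign are the standard Google Analytics campaign tracking arguments):

http://library.open.ac.uk/?utm_source=vle&utm_medium=moodle&utm_campaign=M882

Google Analytics would then report visits from that link against the M882 campaign, rather than just throwing the identifying information away.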

(From a quick scan of Google Analytics Tracking API, I’m guessing that setting pageTracker._setCampSourceKey(“id”); in the tracking code on each Library web page might also capture the id from referrer URLs? Can anyone confirm/deny that?)

Aside: from what I’ve been told, I don’t think we offer server side compression for content served from most http://www.open.ac.uk/* sites, either (though I haven’t checked). Given that there are still a few students on low bandwidth connections, and relatively modern browsers, this is probably an avoidable breach of some sort of accessibility recommendation? For example, over the last 3 weeks or so, here’s the number of dial-up visits to the Library website:

A quick check of the browser stats shows that IE usage breaks down almost completely as IE6 and above, all of which cope with compressed files, I think?

[Clarification (?! heh heh) re: dial-in stats – “when you’re looking at the dial-up use of the Library website is that we have a dial-up PC in the Library to replicate off-campus access and to check load times of our resources. So it’s probably worth filtering out that IP address (***.***.***.***) to cut out library staff checking out any problems as this will inflate the perceived use of dial-up by our students. Even if we’ve only used it once a day then that’s a lot of hits on the website that aren’t really students using dial-up” – thanks, Clari :-)]

Anyway – back to the course tracking: as a stop gap, I created a few of my own reports that use a user defined argument corresponding to the full referrer URL:
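
For the record, the legacy ga.js tracking code sets this sort of user defined value with the _setVar method; here's a guess at how the Library pages might capture the VLE referrer (I haven't seen the actual tracking code, so treat this as a sketch):

//in the Google Analytics tracking code on each Library page
if (document.referrer.indexOf('learn.open.ac.uk')!=-1) pageTracker._setVar(document.referrer);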

We can then view reports according to this user defined segment to see which VLE pages are sending traffic to the Library website:

Clicking through on one of these links gives a report for that referrer URL, and then it’s easy to see which landing pages the users are arriving at (and by induction, which links on the VLE page they clicked on):

If we look at the corresponding VLE page:

Then we can say that the analytics suggest that the Open University Library – http://library.open.ac.uk/, the Online collections by subject – http://library.open.ac.uk/find/eresources/index.cfm and the Library Help & Support – http://library.open.ac.uk/about/index.cfm?id=6939 are the only links that have been clicked on.

[Ooops… “Safari & Info Skills for Researchers are our sites, but don’t sit within the library.open.ac.uk domain (www.open.ac.uk/safari and www.open.ac.uk/infoskills-researchers respectively) and the Guide to Online Information Searching in the Social Sciences is another Moodle site.” – thanks Clari:-) So it may well be that people are clicking on the other links… Note to self – if you ever see 0 views for a link, be suspicious and check everything!]

(Note that I have only reported on data from a short period within the lifetime of the course, rather than data taken from over the life of the course. Looking at the incidence of traffic over a whole course presentation would also give an idea of when during the course students are making use of the Library resource page within the course.)

Another way of exploring how VLE referrer traffic is impacting on the Library website is to look at the most popular Landing pages and then see which courses (from the user defined segment) are sourcing that traffic.

So for example, here are the VLE pages that are driving traffic to the elluminate registration page:

One VLE page seems responsible:

Hmmm… ;-)

How about the VLE pages driving traffic to the ejournals page?

And the top hit is….

… the article for question 3 on TMA01 of the November 2008 presentation of M882.

The second most popular referrer page is interesting because it contains two links to the Library journals page:


Unfortunately, there’s no way of disambiguating which link is driving the tracking – which is one good reason why a separate campaign related tracking code should be associated with each link.

(Do you also see the reference to Google books in there? Heh heh – surely they aren’t suggesting that students try to get what they need from the book via the Google books previewer?!;-)

Okay – enough for now. To sum up, we have the opportunity to provide two sorts of report – one for the Library to look at how VLE sourced traffic as a whole impacts on the Library website; and a different set of reports that can be provided to course teams and course link librarians to show how students on the course are using the VLE to access Library resources.

PS if you haven’t yet watched Dave Pattern’s presentation on mining lending data records, do so NOW: Can You Dig It? A Systems Perspective.

Revisiting the Library Flip – Why Librarians Need to Know About SEO

What does information literacy mean in the age of web search engines? I’ve been arguing for some time (e.g. in The Library Flip) that one of the core skills going forward for those information professionals who “help people find stuff” is going to be SEO – search engine optimisation. Why? Because increasingly people are attuned to searching for “stuff” using a web search engine (you know who I’m talking about…;-); and if your “stuff” doesn’t appear near the top of the organic results listing (or in the paid for links) for a particular query, it might as well not exist…

Whereas once academics and students would have traipsed into the library to ask one of the High Priestesses to perform some magical incantation on a Dialog database through a privileged access terminal, for many people research now starts with a G. Which means that if you want your academics and students to find the content that you’d recommend, then you have to help get that content to the top of the search engine listings.

With the rate of content production growing to seventy three tera-peta-megabits a second, or whatever it is, does it make sense to expect library staffers to know what the good content is any more (in the sense of “here, read this – it’s just what you need”)? Does it even make sense to expect people to know where to find it (in the sense of “try this database, it should contain what you need”)? Or is the business now more one of showing people how to go about finding good stuff, wherever it is (in the sense of “here’s a search strategy for finding what you need”), and helping the search engines see that stuff as good stuff?

Just think about this for a moment. If your service is only usable by members of your institution and only usable within the locked down confines of your local intranet, how useful is it?

When your students leave your institution, how many reusable skills are they taking away? How many people doing informal learning or working within SMEs have access to highly priced, subscription content? How useful is the content in those archives anyway? How useful are “academic information skills” to non-academics and non-students? (I’m just asking the question…;-)

And some more: do academic courses set people up for life outside? Irrespective of whether they do or not, does the library serve students on those courses well within the context of their course? Does the library provide students with skills they will be able to use when they leave the campus and go back to the real world and live with Google? (“Back to”? Hah – I wonder how much traffic on HEI networks is launched by people clicking on links from pages that sit on the google.com domain?) Should libraries help students pass their courses, or give them skills that are useful after graduation? Are those skills the same skills? Or are they different skills (and if so, are they compatible with the course related skills?)?

Here’s where SEO comes in – help people find the good stuff by improving the likelihood that it will be surfaced on the front page of a relevant web search query. For example, “how to cite an article“. (If you click through, it will take you to a Google results page for that query. Are you happy with the results? If not, you need to do one of two things – either start to promote third party resources you do like from your website (essentially, this means you’re doing off-site SEO for those resources) OR start to do onsite and offsite SEO on resources you want people to find on your own site.)

(If you don’t know what I’m talking about, you’re well on the way to admitting that you don’t understand how web search engines work. Which is a good first step… because it means you’ve realised you need to learn about it…)

As to how to go about it, I’d suggest one way is to get a better understanding of how people actually use library or course websites. (Another is Realising the Value of Library Data and finding ways of mining behavioural data to build recommendation engines that people might find useful.)

So to start off – find out what search terms are the most popular in terms of driving traffic to your Library website (ideally relating to some sort of resource on your site, such as a citation guide, or a tutorial on information skills); run that query on Google and see where your page comes in the results listing. If it’s not at the top, try to improve its ranking. That’s all…

For example, take a look at the following traffic (as collected by Google Analytics) coming in to the OU Library site over a short period some time ago.

A quick scan suggests that we maybe have some interesting content on “law cases” and “references”. For the “references” link, there’s a good proportion of new visitors to the OU site, and it looks from the bounce rate that half of those visited more than one page on the OU site. (We really should do a little more digging at this point to see what those people actually did on site, but this is just for argument’s sake, okay?!;-)

Now do a quick Google on “references” and what do we see?

On the first page, most of the links are relating to job references, although there is one citation reference near the bottom:

Leeds University library makes it in at 11 (at the time of searching, on google.co.uk):

So here would be a challenge – try to improve the ranking of an OU page on this results listing (or try to boost the Leeds University ranking). As to which OU page we could improve, first look at what Google thinks the OU library knows about references:
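
(The sort of query I mean here is a site restricted search, something like: references site:library.open.ac.uk)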

Now check that Google favours the page we favour for a search on “references”, and if it does, try to boost its ranking on the organic SERP. If Google isn’t favouring the page we want as its top hit on the OU site for a search on “references”, do some SEO to correct that (maybe we want “Manage Your References” to come out as the top hit?).

Okay, enough for now – in the next post on this topic I’ll look at the related issue of Search Engine Consequences, which is something that we’re all going to have to become increasingly aware of…

PS Ah, what the heck – here’s how to find out what the people who arrived at the Library website from a Google search on “references” were doing onsite. Create an advanced segment:

Google analytics advanced segment

(PS I first saw these and learned how to use them at a trivial level maybe 5 minutes ago;-)

Now look to see where the traffic came in (i.e. the landing pages for that segment):

Okay? The power of segmentation – isn’t it lovely:-)

We can also go back to the “All Visitors” segment, and see what other keywords people were using who ended up on the “How to cite a reference” page, because we’d possibly want to optimise for those, too.

Enough – time for the weekend to start :-)

PS if you’re not sure what techniques to use to actually “do SEO”, check on Academic Search Premier (or whatever it’s called), because Google and Google Blogsearch won’t return the right sort of information, will they?;-)

Realising the Value of Library Data

For anyone listening out there in library land who hasn’t picked up on Dave Pattern’s blog post from earlier today – WHY NOT? Go and read it, NOW: Free book usage data from the University of Huddersfield:

I’m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we’ve just released a major portion of our book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13 year period.

http://library.hud.ac.uk/usagedata/

I would like to lay down a challenge to every other library in the world to consider doing the same.

So are you going to pick up the challenge…?

And if not, WHY NOT? (Dave posts some answers to the first two or three objections you’ll try to raise, such as the privacy question and the licensing question.)

He also sketches out some elements of a possible future:

I want you to imagine a world where a first year undergraduate psychology student can run a search on your OPAC and have the results ranked by the most popular titles as borrowed by their peers on similar courses around the globe.

I want you to imagine a book recommendation service that makes Amazon’s look amateurish.

I want you to imagine a collection development tool that can tap into the latest borrowing trends at a regional, national and international level.

DON’T YOU DARE NOT DO THIS…

See also a presentation Dave gave to announce this release – Can You Dig It? A Systems Perspective:

What else… Library website analytics – are you making use of them yet? I know the OU Library is collecting analytics on the OU Library website, although I don’t think they’re using them? Knowing that you had x thousand page views last week is NOT INTERESTING. Most of them were probably people flailing round the site failing to find what they wanted? (And before anyone from the Library says that’s not true, PROVE IT TO ME – or at least to yourself – with some appropriate analytics reports.) For example, I haven’t noticed any evidence of changes to the website or A/B testing going on as a result of using Googalytics on the site??? (Hmmm – that’s probably me in trouble again…!;-)

PS I’ve just realised I didn’t post a link to the Course Analytics presentation from Online Info last week, so here it is:

Nor did I mention the follow up podcast chat I had about the topic with Richard Wallis from Talis: Google Analytics to analyse student course activity – Tony Hirst Talks with Talis.

Or the “commendation” I got at the IWR Information Professional Award ceremony. I like to think this was for being the “unprofessional” of the year (in the sense of “unconference”, of course…;-). It was much appreciated, anyway :-)

Continuous Group Exercise Feedback via Twitter?

Yesterday I took part in a session with Martin Weller and Grainne Conole, pitching SocialLearn to the Library (Martin), exploring notions of a pedagogy fit for online social learning (Grainne), and idly wondering about how the Library might fit in all this, especially if it became ‘invisible’ (my bit: The Invisible Library):

As ever, the slides are pretty meaningless without me rambling over them… but to give a flavour, I first tried to set up three ideas of ‘invisibleness’:

– invisibility in everyday life (random coffee, compared to Starbucks: if the Library services were coffee, what coffee would they be, and what relationship would, err, drinkers have with them?);

– positive action, done invisibly (the elves and the shoemaker);

– and invisible theatre (actors ‘creating a scene’ as if it were real (i.e. the audience isn’t aware it’s a performance), engaging the audience, and leaving the audience to carry on participating (for real) in the scenario that was set up).

And then I rambled a bit a some webby ways that ‘library services’, or ‘information services’ might be delivered invisibly now and in the future…

After the presentations, the Library folks went into groups for an hour or so, then reported back to the whole group in a final plenary session. This sort of exercise is pretty common, I think, but it suddenly struck me that it could be far more interesting if the ‘reporter’ on each table actually twittered during the course of the group discussion. This would serve to act as a record for each group, might allow ‘semi-permeable’ edges to group discussions (although maybe you don’t want groups to be ‘sharing’ ideas), and would let the facilitator (my experience is that there’s usually a facilitator responsible whenever there’s a small group exercise happening!) eavesdrop on every table at the same time, and maybe use that as a prompt for wandering over to any particular group to get them back on track, or encourage them to pursue a particular issue in a little more detail?

ORO Results in Yahoo SearchMonkey

It’s been a long – and enjoyable – day today (err, yesterday; I forgot to post this last night!), so just a quick placeholder post, which I’ll maybe elaborate on with techie details at a later date, to show one way of making use of the metadata that appears in the ORO/eprints resource splash pages (as described in ORO Goes Naked With New ePrints Server): a Yahoo SearchMonkey ORO augmented search result – ORO Reference Details (OUseful).

The SearchMonkey extension, when “installed” in your Yahoo profile, will augment ORO results in organic Yahoo search listings with details about the publication the reference appears in, the full title (or at least, the first few characters of the title!), the keywords used to describe the reference and the first author, along with links to a BibTeX reference and the document download (I guess I could also add a link in there to a full HTML reference?)

The SearchMonkey script comes in two parts – a “service” that scrapes the page linked to from the results listing:

And a “presentation” part, that draws on the service to augment the results:

It’s late – I’m tired – so no more for now; if you’re interested, check out the Yahoo SearchMonkey documentation, or Build your own SearchMonkey app.

ORO Goes Naked With New ePrints Server

A few weeks ago, the OU Open Repository Online (“ORO”) had an upgrade to the new eprints server (breaking the screen scraping Visualising CoAuthors in Open Repository Online Papers demos I’d put together, sigh…).

I had a quick look at the time, and was pleased to see quite a bit of RSS support, as the FAQ describes:

Can I set up RSS feeds from ORO?
RSS feeds can be generated using search results.

To create a feed using a search on ORO:

Enter the search terms and click search. RSS icons will be displayed at the top of the search results. Right click the icon and click on Copy Shortcut. You can then paste the string into your RSS reader.

It is also possible to set up three types of RSS feed, by OU author, by department and by the latest 20 additions to ORO.

To create a feed by OU author start with the following URL:

http://oro.open.ac.uk/cgi/latest_tool?
mode=person&value=author&output=RSS

Please note the capital “RSS” at the end of the string

Substitute author for the author’s OUCU and paste the new string into your RSS reader.

To create a feed by department start with this URL:

http://oro.open.ac.uk/cgi/latest_tool?
mode=faculty&value=math-math&output=RSS

Please note the capital “RSS” at the end of the string

This displays all research that relates to Maths (represented by the code “math-math”). To extract the other department codes used by ORO, go to the following URL:
http://oro.open.ac.uk/view/faculty_dept/faculty_dept.html
locate your department and note the URL (this will appear in the bottom left corner of the screen when you hover over the link). The departmental code is situated between “http://oro.open.ac.uk/view/faculty_dept/” and “.html”, e.g. “cobe”, “arts-musi”, etc. Copy the department code into the relevant part of the string and paste the string into an RSS reader.

To create a feed of the latest 20 additions to ORO use this URL:
http://oro.open.ac.uk/cgi/latest_tool&output=RSS

This feed can also be generated by right clicking on the RSS icons in the top right corner of the screen and choosing copy shortcut

The previous version of eprints offered an OAI-PMH endpoint, which I haven’t found on the new setup, but there is lots of export and XML goodness for each resource lodged with the repository – at last, it’s gettin’ nekkid with its data, as a quick View Source of the HTML splash page for a resource shows:

Output formats include ASCII, BibTeX, EndNote, Refer, Reference Manager and HTML citations; a Dublin Core description of the resource; an EP3 XML format; METS and MODS (whatever they are?!); and an OpenURL ContextObject description.

The URLs to each export format are regularly defined, and keyed by the numerical resource identifier (which also keys the URL to the resource’s HTML splash page).

The splash page also embeds resource description metadata in the head (although the HTML display elements in the body of the page don’t appear to be marked up with microformats, formal or ad hoc).
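
From memory, the metadata in the head of a splash page looks something along these lines (illustrative values only, not copied from an actual record):

<meta name="eprints.title" content="An example paper title" />
<meta name="eprints.creators_name" content="Surname, Forename" />
<meta name="DC.title" content="An example paper title" />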

This meta data availability makes it easy to create a page scraping Yahoo Searchmonkey app, as I’ll show in a later post…

Joining the Flow – Invisible Library Tech Support

Idling some thoughts about what to talk about in a session the OU Library* is running with some folks from Cambridge University Library services as part of an Arcadia Trust funded project there (blog), I started wondering about how info professionals in an organisation might provide invisible support to their patrons by joining in the conversation…

*err – oops; I mentioned the OU Library without clearing the text first; was I supposed to submit this post for censor approval before publishing it? ;-)

One way to do this is to comment on blog posts, as our own Tim Wales does on OUseful.info pages from time to time (when I don’t reply, Tim, it’s because I can’t add any more… but I’ll be looking out for your comments with an eagle eye from now on… ;-) [I also get delicious links for:d to me by Keren – who’s also on Twitter – and emailed links and news stories from Juanita on the TU120 course team.]

Another way is to join the twitterati…

“Ah”, you might say, “I can see how that would work. We set up @OULibrary, then our users subscribe to us and then when they want help they can send us a message, and we can get back to them… Cool… :-)”

Err… no.

The way I’d see it working would be for @OULibrary, for example, to subscribe to the OU twitterati and then help out when they can; “legitimate, peripheral, participatory support” would be one way of thinking about it…

Now of course, it may be that @OULibrary doesn’t want to be part of the whole conversation (at least, not at first…), but just the question asking parts…

In which case, part of the recipe might go something like this: use the advanced search form to find out the pattern for a cool URI that lets you search for “question-like” things from a particular user:

(Other queries I’ve found work well are searches for: ?, how OR when OR ? , etc.)

http://search.twitter.com/search?q=%22how%22+from%3Apsychemedia

The query gives you something like the above, including a link to an RSS feed for the search:

http://search.twitter.com/search.atom?q=how+%3F+from%3Apsychemedia

So now what do we do? We set up a script that takes a list of the twitter usernames of OU folks – you know how to find that list, right? I took the easy way ;-)

Liam’s suggestion links to an XML stream of status messages from people who follow PlanetOU, so the set might be leaky and/or tainted, right, and include people who have nothing to do with the OU… but am I bovvered? ;-)

(You can see a list of the followers names here, if you log in:
http://twitter.com/planetou/followers)

Hmmm… a list of status messages from people who may have something to do with the OU… Okay, dump the search thing, how about this…

The XML feed of friends statuses appears to be open (at the moment) so just filter the status messages of friends of PlanetOU and hope that OU folks have declared themselves to PlanetOU? (Which I haven’t… ;-)

Subscribe to this and you’ll have a stream of questions from OU folks who you can choose to help out, if you want…

A couple of alternatives would be to take a list of OU folks’ twitter names, and either follow them and filter your own friends stream for query terms, or generate search feed URLs for all of them (my original thought, above) and roll those feeds into a single stream…
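
A minimal sketch of that last option – generating a question search feed URL for each name in a list (the usernames, apart from my own, are made up):

var ouFolk=['psychemedia','someuser','anotheruser'];
var feeds=[];
for (var i=0;i<ouFolk.length;i++){
 //search for question-like tweets from each user
 feeds.push('http://search.twitter.com/search.atom?q='+encodeURIComponent('? from:'+ouFolk[i]));
}
//the feeds can then be rolled into a single stream, e.g. with a Yahoo pipe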

In each case, you have set up a channel through which the Library is invisibly asking “can I help you?”

Now you might think that libraries in general don’t work that way, that they’re “go to” services who help “lean forward” users, rather than offering help to “lean back” users who didn’t think to ask the library in the first place (err…..?), but I couldn’t possibly comment…

PS More links in to OU communities…

which leads to:

PPS (March 2011) seems like the web has caught up: InboxQ

OU Library iGoogle Gadgets

Just over a month ago, the OU web team released a “Fact of the Day” Google gadget that publishes an interesting fact from an OpenLearn course once a day, along with a link to the OpenLearn course that it came from.

(By the by, compare the official press release with Laura’s post…)

The OU Library just announced a couple of OU Library iGoogle gadgets too (though I think they have been around for some time…)…

…but whereas the Fact of the Day widget is pretty neat, err, erm, err…

Here’s the new books widget. The Library produces an RSS feed of new books for a whole host of different topic areas. So you can pick your topic and view the new book titles in a gadget on your Google personal page, right…?

Err – well, you can pick a topic area from the gadget…

…and when you click “Go” you’re taken to the Library web page listing the new books for that topic area in a new tab…

Hmmm…

[Lots of stuff deleted about broken code that gives more or less blank pages when you click through on “Art History” at least; HTTP POST rather than GET (I don’t want to have to header trace to debug their crappy code) etc etc]

I have to admit I’m a little confused as to who would want to work this way… All the gadget does is give you lots of bookmarks to other pages. It’s not regularly (not ever) bringing any content to me that I can consume within my Google personal page environment… (That said, it’s probably typical of the sort of widget I developed when I first started thinking about such things…and before lots of AJAX toolkits were around…)

This could be so, so much better… For a start, much simpler, and probably more relevant…

For example, given a feed URL, you can construct another URL that will add the feed to your iGoogle page.

Given a URL like this:
http://voyager.open.ac.uk/rss/compscience.xml
just do this:
http://fusion.google.com/add?feedurl=http://voyager.open.ac.uk/rss/compscience.xml
which takes you to a page like this:

where you can get a widget like this:

Personally, I’d do something about the feed title…

It’s not too hard to write a branded widget that will display the feed contents, or maybe a more elaborate one that will pull in book covers.
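
Something along these lines, for example – a bare-bones gadget spec that pulls the feed into the gadget itself, using the legacy _IG_FetchFeedAsJSON call (written from memory, so treat the API details with suspicion):

<?xml version="1.0" encoding="UTF-8"?>
<Module>
 <ModulePrefs title="OU Library - New Books" />
 <Content type="html"><![CDATA[
  <div id="books">Loading…</div>
  <script type="text/javascript">
   //fetch the new books feed and render the first ten items in the gadget
   _IG_FetchFeedAsJSON('http://voyager.open.ac.uk/rss/compscience.xml',
    function(feed){
     var html='';
     for (var i=0;i<feed.Entry.length;i++){
      html+='<div><a href="'+feed.Entry[i].Link+'">'+feed.Entry[i].Title+'</a></div>';
     }
     document.getElementById('books').innerHTML=html;
    },10);
  </script>
 ]]></Content>
</Module>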

For example, here’s an old old old old example of an alternative display – a carousel (described here, in a post from two years ago: Displaying New Library Books):

Admittedly, you’re faced with the issue of how to make the URLs known to the user. But you could generate a URL from a form on the Library gadget page, and assign it to an “add to Google” image button easily enough.

And the other widget – the Library catalogue search…?

Let’s just say that in the same way as the ‘new books’ widget is really just a list of links hidden in a drop down box, so the catalogue search tool is actually just a redirecting search box. Run a query and you’re sent to the Voyager catalogue search results page, rather than having the results pulled back to you in the gadget on the Google personal page.

(I know, a lot of search widgets are like that (I’ve done more than a few myself in years gone by), but things have moved on and I think I’d really expect the results to be pulled back into the widget nowadays…)

PS okay, I’m being harsh, it’s been a long crappy day, I maybe shouldn’t post this… maybe the widgets will get loads of installs, and loads of traffic going to the Library site… I wonder if they’re checking the web stats to see, maybe because they found out how to add Google Analytics tracking to a Google gadget? And I wonder what success/failure metrics they’re using?

PPS okay, okay – I apologise for the above post, Library folks. The widgets are a good effort – keep up the good work. I’ll be interested to see how you iterate the design of these widgets over the next few weeks, and what new wonders you have in store for us all… :-) Have a think about how users might actually use these widgets, and have a look at whether it may be appropriate to pull content back into the widget using an AJAX call, rather than sending the user away from their personal page to a Library web page. If you can find any users, ask them what they think, and how they’re using the widget. Use web stats/analytics to confirm (or deny) what they’re saying (users lie… ;-). And keep trying stuff out… my machine is littered with dead code and my Google personal page covered in broken and unusable widgets that I’ve built myself. Evolution requires failure…and continual reinvention ;-)