Getting Library Catalogue Searches Out There…

As a long-time fan of custom search engine offerings, I keep wondering why Google doesn’t seem to have much active interest in this area. Google Custom Search updates are few and far between, and typically go unreported by the tech blogs. Perhaps more surprisingly, Custom Search Engines don’t appear to have much, if any, recognition in the Google Apps for Education suite, although I think they are available with a Google Apps for Education ID?

One of the things I’ve been mulling over for years is the role that automatically created course related search engines might have to play as part of a course’s VLE offering. The search engine would offer search results either over a set of web domains linked to from the actual course materials, or simply boost results from those domains in the context of a “normal” set of search results. I’ve recently started thinking that we could also make use of “promoted” results to highlight specific required or recommended readings when a particular topic is searched for (for example, Integrating Course Related Search and Bookmarking?).

During an informal “technical” meeting around three JISC funded resource discovery projects at Cambridge yesterday (Comet, Jerome, SALDA; disclaimer: I didn’t work on any of them, but I was in the area over the weekend…), there were a few brief mentions of how various university libraries were opening up their catalogues to the search engine crawlers. So for example, if you do a site: limited search on the following paths:

– sabre.sussex.ac.uk/vufindsmu/Record/
– jerome.library.lincoln.ac.uk/catalogue/
– webcat.hud.ac.uk/catlink/bib/
– search.lib.cam.ac.uk/

you can get (partial?) search results, with a greater or lesser degree of success, from the Sussex, Lincoln, Huddersfield and Cambridge catalogues respectively.

In a Google custom search engine context, we can tunnel in a little deeper in an attempt to return results limited to actual records:

– sabre.sussex.ac.uk/vufindsmu/Record/*/Description
– jerome.library.lincoln.ac.uk/catalogue/*
– webcat.hud.ac.uk/catlink/bib/*
– search.lib.cam.ac.uk/?itemid=*

I’ve added these to a new Catalogues tab on my UK HE library website CSE (about), so we can start to search over these catalogues using Google.
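If you just want to poke at those site: limited searches without setting up a CSE of your own, the queries are easy enough to generate programmatically. Here’s a minimal sketch (purely illustrative: it just builds standard Google web search URLs with a site: operator for each of the catalogue paths listed above, and the example search term is arbitrary):

# Minimal sketch: generate site:-limited Google web search URLs for the
# catalogue paths listed above, so the same search term can be tried against
# each catalogue in turn.

catalogue_paths = [
    "sabre.sussex.ac.uk/vufindsmu/Record/",
    "jerome.library.lincoln.ac.uk/catalogue/",
    "webcat.hud.ac.uk/catlink/bib/",
    "search.lib.cam.ac.uk/",
]

def site_limited_queries(term, paths=catalogue_paths):
    """Return one Google web search URL per catalogue path."""
    urls = []
    for path in paths:
        q = "site:" + path + " " + term
        urls.append("https://www.google.com/search?q=" + q.replace(" ", "+"))
    return urls

for url in site_limited_queries("crime and punishment"):
    print(url)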

I’m not sure how useful or interesting this is at the moment, except perhaps to the library systems developers, who can compare how informatively their library catalogue content is indexed and displayed in Google search results compared to other libraries… For example, I noticed that Google appears to be indexing the “related items” that Huddersfield publishes on a record page, meaning that if a search term appears in a related work, you might get back a record that at first glance appears to have little to do with your search term – in effect providing a “reverse related work” search (that is, search on related works and return items that have the search term as a related work).

Searching UK HE library catalogues via a Google CSE

But it’s a start… and with the addition of customised rankings, might provide a jumping off point for experimenting with novel ways of searching across UK HE catalogues using Google indexed content. (For example, a version of the CSE on the cam.ac.uk domain might boost the Cambridge results; within an institution, works related to a particular course through mention on a reading list might get a boost if a student on that course runs a search… and so on.)

PS A couple of other things that may be worth pondering… could Google Apps for Education account holders be signed up to Subscribed Links offering customised search results in the main Google domain relating to a particular course? (That is, define subscribed link profiles for each course, and automatically add those subscriptions to an Apps for Edu user’s account based on the courses they’re taking?) Or I wonder if it would be possible to associate subscribed links with public access browsers in some way?

And how about finding some way of working with Google to open up “professional” search profiles, where for example students are provided with “read only” versions of the personalised search results of an expert in a particular area who has tuned, through personalisation, a search profile that is highly specialised in a particular subject area, e.g. as mentioned in Google Personal Custom Search Engines? (see also Could Librarians Be Influential Friends? And Who Owns Your Search Persona?).

If anyone out there is working on ways of using Google customised and personalised search as a way of delivering “improved” search results in an educational context, I’d love to hear more about what you’re getting up to…

Surveying the Territory: Open Source, Open-Ed and Open Data Folk on Twitter

Over the last few weeks, I’ve been tinkering with various ways of using the Twitter API to discover Twitter lists relating to a particular topic area, whether discovered through a particular hashtag, search term, a list that already exists on a topic, or one or more people who may be associated with a particular topic area.

On my to do list is a map of the “open” community on Twitter – and the relationships between them – that will try to identify notable folk in different areas of openness (open government, open data, open licensing, open source software) and the communities around them, then aggregate all these open aficionados, plot the network connections between them, and remap the result (to see whether the distinct communities we started with fall out, as well as to discover who acts as the bridges between them, or alternatively discover whether new emergent groupings appear to crystallise out based on network connectivity).

As a step on the road to that, I had a quick peek around to see who was tweeting using the #oscon hashtag over the weekend. Through analysing people who were tweeting regularly around the topic, I identified several lists in the area: @realist/opensource, @twongkee/opensource, @lemasney/opensource, @suncao/open-linked-free, @jasebo/open-source

Pulling down the members of these lists, and then looking for connections between them, I came up with this map of the open source community on Twitter:

A peek at FOSS community on Twitter

Using a different technique not based on lists, I generated a map of the open data community based on the interconnections between people followed by @openlylocal:

How the people @countculture follows follow each other

and the open education community based on the people that follow @opencontent:

How followers of @Opencontent follow each other

(So that’s a different way of identifying the members of each community, right? One based on lists that mention users of a particular hashtag, one based on folk a particular individual follows, and one based on the folk that follow a particular individual.)
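(For what it’s worth, the core recipe in each case boils down to much the same thing: get a set of user names – from lists, or from someone’s friends or followers – then look up the friends of each of those users and keep just the edges that stay inside the set. Here’s a rough sketch of that step; the two lookup functions are placeholders for whatever Twitter API client, paging and rate-limit handling you happen to be using, and the GDF output mirrors the sort of minimal file Gephi expects:)

# Rough sketch of the "find the connections within a set of users" step.
# get_list_members() and get_friends() are placeholders for real Twitter API
# calls (lists/members and friends/ids, say) plus paging/rate-limit handling;
# they are NOT defined here.

def connections_within(seed_lists, get_list_members, get_friends):
    """Build an edge list of who-follows-whom within the union of list members."""
    members = set()
    for list_spec in seed_lists:
        members.update(get_list_members(list_spec))
    edges = []
    for user in members:
        for friend in get_friends(user):
            if friend in members:
                edges.append((user, friend))
    return members, edges

def write_gdf(filename, nodes, edges):
    """Write a minimal Gephi GDF file from the node and edge lists."""
    f = open(filename, "w")
    f.write("nodedef> name VARCHAR\n")
    for n in nodes:
        f.write(n + "\n")
    f.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
    for a, b in edges:
        f.write(a + "," + b + "\n")
    f.close()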

I’ve also toyed with looking at communities defined by members of lists that mention a particular individual, or people followed by a particular individual, as well as ones based on members of lists that contain folk listed on one or more trusted, curated lists in a particular topic area (got that?!;-).

Whilst the graphs based on mapping friends or followers of an individual give a good overview of that individual’s sphere of interest or influence, I think the community graphs derived from finding connections between people mentioned on “lists in the area” is a bit more robust in terms of mapping out communities in general, though I guess I’d need to do “proper” research to demonstrate that?

As mentioned at the start, the next thing on my list is a map across the aggregated “open” communities on Twitter. Of course, being digerati, many of these people will have decamped to GooglePlus. So maybe I shouldn’t bother, but instead wait for Google+ to mature a bit, an API to become available, blah, blah, blah…

A Couple of Notes on “List Intelligence”

Just so I don’t forget the development timeline such as it is, here are a few quick notes-to-self as much as anything about my “List Intelligence” tinkering to date:

  • List Intelligence uses (currently) Twitter lists to associate individuals with a particular topic area (the focus of the list; note that this may be ill-specified, e.g. “people I have met”, or topic focussed, e.g. “OU employees”, etc)
  • List Intelligence is presented with a set of “candidate members” and then:
    1. looks up the lists those candidate members are on to provide a set of “candidate lists”;
    2. identifies the membership of those candidate lists (“candidate list members”) (this set may be subject to ranking or filtering, for example based on the number of list subscribers, or the number of original candidate members who are members of the current list);
    3. for the superset of members across lists (i.e. the set of candidate list members), rank each individual according to the number of lists they are on (optionally weighted by the number of subscribers to each list they are on); these individuals are potentially “key” players in the subject area defined by the lists that the original candidate members are members of;
    4. identify which of the candidate lists contain the most candidate members, and rank accordingly (possibly also according to subscriber numbers); the top ranked lists are lists trivially associated with the set of original candidate members;
    5. provide output files that allow the graphing of individuals who are co-members of the same lists, and use the corresponding network as the basis for network analysis;
    6. optionally generate graphs based on friendship connections between candidate list members, and use the resulting graph as the basis for network analysis. (Any clusters/communities detected based on friendship may then be compared with the co-membership graphs to see the extent to which list memberships reflect or correlate to community structures);
  • the original set of candidate members may be defined in a variety of ways. For example:
    1. one or more named individuals;
    2. the friends of a named individual;
    3. the recent users of a particular hashtag;
    4. the recent users of a particular searched for term;
    5. the members of a “seed” list.
  • List Intelligence attempts to identify “list clusters” in the candidate lists set by detecting significant overlaps in membership between different candidate lists.
  • Candidate lists may be used to identify potential “focus of interest” areas associated with the original set of candidate members.

I’ll try to post some pseudo-code, flow charts and formal algorithms to describe the above… but it may take a week or two…
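In the meantime, here’s a very rough sketch of the core ranking steps as they currently stand; the Twitter lookups (which lists is a user on, who is on a given list) are left as placeholder functions, and there’s no subscriber weighting, filtering or rate-limit handling in this version:

# Very rough sketch of the core List Intelligence ranking steps.
# lists_for_user() and members_of_list() are placeholders for the Twitter API
# lookups (and their paging/rate limiting) - they are NOT defined here.
from collections import Counter

def list_intelligence(candidate_members, lists_for_user, members_of_list):
    """Rank candidate lists and candidate list members for a set of seed users."""
    # 1. candidate lists: lists the candidate members appear on, counting how
    #    many of the seed users each list contains
    list_hits = Counter()
    for user in candidate_members:
        for lst in set(lists_for_user(user)):
            list_hits[lst] += 1

    # 2./3. candidate list members, ranked by the number of candidate lists
    #    they appear on (potential "key players")
    member_hits = Counter()
    for lst in list_hits:
        for member in set(members_of_list(lst)):
            member_hits[member] += 1

    # 4. candidate lists ranked by the number of seed users they contain
    ranked_lists = list_hits.most_common()
    ranked_members = member_hits.most_common()
    return ranked_lists, ranked_members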

Follower Networks and “List Intelligence” List Contexts for @JiscCetis

I’ve been tinkering with some of my “List Intelligence” code again, and thought it worth capturing some examples of the sort of network exploration recipes I’m messing around with at the moment.

Let’s take @jiscCetis as an example; this account follows no-one, is followed by a few, hasn’t much of a tweet history, and is listed by a handful of others.

Here’s the follower network, based on how the followers of @jiscetis follow each other:

Friend connections between @Jisccetis followers

There are three (maybe four) clusters there, plus all the folk who don’t follow any of the other @jisccetis followers…: do these follower clusters make any sort of sense I wonder? (How would we label them…?)

The next thing I thought to do was look at the people who were on the same lists as @jisccetis, and get an overview of the territory that @jisccetis inhabits by virtue of shared list membership.

Here’s a quick view over the folk on lists that @jisccetis is a member of. The nodes are users named on the lists that @jisccetis is named on; the edges are undirected and join individuals who are on the same list.

Distribution of users named on lists that jisccetis is a member of

Plotting “co-membership” edges is hugely expensive in terms of upping the edge count that has to be rendered, but we can use a directed bipartite graph to render the same information (and arguably even more information); here, there are two sorts of nodes: lists, and the members of lists. Edges go from members to listnames (I should swap this direction really to make more sense of authority/hub metrics…?)

jisccetis co-list membership
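(As an aside, here’s a rough sketch of how that bipartite list–member graph might be put together using networkx, given a lookup of the lists @jisccetis appears on and their members – a placeholder dict here – along with the one-mode co-membership projection that the previous chart was based on:)

# Rough sketch: bipartite list-member graph and its co-membership projection.
# lists_and_members is assumed to be a dict mapping list name -> member names,
# however you've pulled that down from the Twitter API.
import networkx as nx
from networkx.algorithms import bipartite

def colist_graphs(lists_and_members):
    B = nx.DiGraph()
    for listname, members in lists_and_members.items():
        B.add_node(listname, kind="list")
        for m in members:
            B.add_node(m, kind="member")
            # edges run from members to the lists that name them
            B.add_edge(m, listname)

    # one-mode projection: members joined if they appear on at least one common list
    member_nodes = set(n for n, d in B.nodes(data=True) if d["kind"] == "member")
    comembers = bipartite.projected_graph(B.to_undirected(), member_nodes)
    return B, comembers

# toy example
B, comembers = colist_graphs({"somelist": ["a", "b"], "otherlist": ["b", "c"]})
nx.write_gexf(B, "jisccetis_bipartite.gexf")
nx.write_gexf(comembers, "jisccetis_comembers.gexf")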

Another thing I thought I’d explore is the structure of the co-list membership community. That is, for all the people on the lists that @jisccetis is a member of, how do those users follow each other?

How folk on same lists as @jisccetis follow each other

It may be interesting to explore in a formal way the extent to which the community groups that appear to arise from the friending relationships are reflected (or not) by the make up of the lists?

It would probably also be worth trying to label the follower group – are there “meaningful” (to @jisccetis? to the @jisccetis community?) clusters in there? How would you label the different colour groupings? (Let me know in the comments…;-)

Identifying the Twitterati Using List Analysis

Given absolutely no-one picked up on List Intelligence – Finding Reliable, Trustworthy and Comprehensive Topic/Sector Based Twitter Lists, here’s an example of what the technique might be good for…

Seeing the tag #edusum11 in my feed today, and not being minded to follow it, I used the list intelligence hack to see:

– which lists might be related to the topic area covered by the tag, based on looking at which Twitter lists folk recently using the tag appear on;
– which folk on twitter might be influential in the area, based on their presence on lists identified as maybe relevant to the topic associated with the tag…

Here’s what I found…

Some lists that maybe relate to the topic area (username/list, number of folk who used the hashtag appearing on the list, number of list subscribers), sorted by number of people using the tag present on the list:

/joedale/ukedtech 6 6
/TWMarkChambers/edict 6 32
/stevebob79/education-and-ict 5 28
/mhawksey/purposed 5 38
/fosteronomo/chalkstars-combined 5 12
/kamyousaf/uk-ict-education 5 77
/ssat_lia/lia 5 5
/tlists/edtech-995 4 42
/ICTDani/teched 4 33
/NickSpeller/buzzingeducators 4 2
/SchoolDuggery/uk-ed-admin-consultancy 4 65
/briankotts/educatorsuk 4 38
/JordanSkole/jutechtlets 4 10
/nyzzi_ann/teacher-type-people 4 9
/Alexandragibson/education 4 3
/danielrolo/teachers 4 20
/cstatucki/educators 4 13
/helenwhd/e-learning 4 29
/TechSmithEDU/courosalets 4 2
/JordanSkole/chalkstars-14 4 25
/deerwood/edtech 4 144

Some lists that maybe relate to the topic area (username/list, number of folk who used the hashtag appearing on the list, number of list subscribers), sorted by number of people subscribing to the list (a possible ranking factor for the list):
/deerwood/edtech 4 144
/kamyousaf/uk-ict-education 5 77
/SchoolDuggery/uk-ed-admin-consultancy 4 65
/tlists/edtech-995 4 42
/mhawksey/purposed 5 38
/briankotts/educatorsuk 4 38
/ICTDani/teched 4 33
/TWMarkChambers/edict 6 32
/helenwhd/e-learning 4 29
/stevebob79/education-and-ict 5 28
/JordanSkole/chalkstars-14 4 25
/danielrolo/teachers 4 20
/cstatucki/educators 4 13
/fosteronomo/chalkstars-combined 5 12
/JordanSkole/jutechtlets 4 10
/nyzzi_ann/teacher-type-people 4 9
/joedale/ukedtech 6 6
/ssat_lia/lia 5 5
/Alexandragibson/education 4 3
/NickSpeller/buzzingeducators 4 2
/TechSmithEDU/courosalets 4 2

Other ranking factors might include the follower count of the list maintainer, or factors derived from some sort of social network analysis of the maintainer.

Having got a set of lists, we can then look for people who appear on lots of those lists to see who might be influential in the area. Here’s the top 10 (user, number of lists they appear on, friend count, follower count, number of tweets, time of arrival on twitter):

['terryfreedman', 9, 4570, 4831, 6946, datetime.datetime(2007, 6, 21, 16, 41, 17)]
['theokk', 9, 1564, 1693, 12029, datetime.datetime(2007, 3, 16, 14, 36, 2)]
['dawnhallybone', 8, 1482, 1807, 18997, datetime.datetime(2008, 5, 19, 14, 40, 50)]
['josiefraser', 8, 1111, 7624, 17971, datetime.datetime(2007, 2, 2, 8, 58, 46)]
['tonyparkin', 8, 509, 1715, 13274, datetime.datetime(2007, 7, 18, 16, 22, 53)]
['dughall', 8, 2022, 2794, 16961, datetime.datetime(2009, 1, 7, 9, 5, 50)]
['jamesclay', 8, 453, 2552, 22243, datetime.datetime(2007, 3, 26, 8, 20)]
['timbuckteeth', 8, 1125, 7198, 26150, datetime.datetime(2007, 12, 22, 17, 17, 35)]
['tombarrett', 8, 10949, 13665, 19135, datetime.datetime(2007, 11, 3, 11, 45, 50)]
['daibarnes', 8, 1592, 2592, 7673, datetime.datetime(2008, 3, 13, 23, 20, 1)]

The algorithms I’m using have a handful of tuneable parameters, which means there’s all sorts of scope for running with this idea in a “research” context…

One possible issue that occurred to me was that identified lists might actually cover different topic areas – this is something I need to ponder…

eSTEeM Project: Library Website Tracking For VLE Referrals

Assuming my projects haven’t been cut out at the final acceptance stage because I haven’t yet submitted a revised project plan, here’s an outline of one of them…

Preamble
As OU courses are increasingly presented through the VLE, many of them opt to have one or more “Library Resources” pages that contain links to course related resources either hosted on the OU Library website or made available through a Library operated web service. Links to Library hosted or moderated resources may also appear inline in course content on the VLE. However, at the current time, it is difficult to get much idea about the extent to which any of these resources are ever accessed, or how students on a course make use of other Library resources.

With the state of the collection and reporting of activity data from the VLE still evolving, this project will explore the extent to which we can make use of data I do know exists, and to which I do have access, specifically Google Analytics data for the library.open.ac.uk domain.

The intention is to produce a three-way reporting framework using Google Analytics for visitors to the OU Library website and Library managed resources from the VLE. The reports will be targeted at: subject librarians who liaise with course teams; course teams; subscription managers.

Google Analytics (to which I have access) is already running on the library website, so the matter just(?!) arises now of:

1) Identifying appropriate filters and segments to capture visits from different courses;

2) Developing Google Analytics API wrapper calls to capture data by course or resource based segments and enable analysis, visualisation and reporting not supported within the Google Analytics environment (a rough sketch of the sort of API call involved is given below);

3) Providing a meaningful reporting format for the three audience types. (Note: we might also explore whether a view over the activity data may be appropriate for presenting back to students on a course.)
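By way of a sketch of what (2) might look like, here’s the sort of call I have in mind using the Google Analytics Core Reporting API Python client – with the caveats that the profile ID is a placeholder, learn.open.ac.uk is assumed to be the VLE referrer domain, and the OAuth/credentials plumbing is omitted entirely:

# Hedged sketch only: pull Library website landing pages for traffic referred
# from the VLE, via the Google Analytics Core Reporting API Python client.
# Assumptions: 'ga:12345678' is a placeholder profile ID; learn.open.ac.uk is
# assumed to be the VLE referrer domain; OAuth credential setup is not shown.
from googleapiclient.discovery import build

def vle_referral_report(credentials, profile_id="ga:12345678",
                        start="2011-05-01", end="2011-05-31"):
    service = build("analytics", "v3", credentials=credentials)
    return service.data().ga().get(
        ids=profile_id,
        start_date=start,
        end_date=end,
        metrics="ga:visits,ga:pageviews",
        dimensions="ga:landingPagePath,ga:source",
        filters="ga:source=@learn.open.ac.uk",  # referrals containing the (assumed) VLE domain
        sort="-ga:visits",
        max_results=50,
    ).execute()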

The Project
The OU Library has been running Google Analytics for several years, but to my knowledge has not started to exploit the data being collected as part of a reporting strategy on the usage of library resources resulting from referrals from the VLE. (Whenever a user clicks on a link in the VLE that leads to the Library website, the Google Analytics on the Library website can capture that fact.)

At the moment, we do not tend to work on optimising our online courses as websites so that they deliver the sorts of behaviour we want to encourage. If we were a web company, we would regularly analyse user behaviour on our course websites and modify them as a result.

This project represents the first step in a web analytics approach to understanding how our students access Library resources from the VLE: reporting. The project will then provide the basis for a follow on project that can look at how we can take insight from those reports and make them actionable, for example in the redesign of the way links to library resources are presented or used in the VLE, or how visitors from the VLE are handled when they hit the Library website.

The project complements work that has just started in the Library on a JISC funded project to make journal recommendations to students based on previous user actions.

The first outcome will be a set of Google Analytics filters and advanced segments tuned to the VLE visitor traffic and resource usage on the Library website. The second will be a set of Google Analytics API wrappers that allow us to export this data and use it outside the Google Analytics environment.

The final deliverables are three report types in two possible flavours:

1) a report to subject librarians about the usage of library resources from visitors referred from the VLE for courses they look after

2) a report to librarians responsible for particular subscription databases showing how that resource is accessed by visitors referred from the VLE, broken down by course

3) a report to course teams showing how library resources linked to from the VLE for their course are used by visitors referred to those resources from the VLE.

The two flavours are:

a) Google analytics reports

b) custom dashboard with data accessed via the Google Analytics API

Recommendations will also be made based on the extent to which Library website usage by anonymous students on particular OU courses may be tracked by other means, such as affinity strings in the SAMS cookie, and the benefits that may accrue from this more comprehensive form of tracking.

If course team members on any OU courses presenting over the next 9 months are interested in how students are using the library website following a referral from the VLE, please get in touch. If academics on courses outside the OU would like to discuss the use of Google Analytics in an educational context, I’d love to hear from you too:-)

eSTEeM is a joint initiative between the Open University’s Faculty of Science and Faculty of Maths, Computing and Technology to develop new approaches to teaching and learning both within existing and new programmes.

Current JISC Projects of Possible Interest to LAK11 Attendees

Mulling over an excellent couple of days in Banff at the first Learning Analytics and Knowledge conference (LAK11; @dougclow’s liveblog notes), where we heard about a whole host of data and analytics related activities from around the world, I thought it may be worth pulling together descriptions of several current JISC projects that are exploring related issues to add in to the mix…

There are currently at least three programmes that it seems to me are in the general area…

———————————–

Activity Data

Many systems in institutions store data about the actions of students, teachers and researchers. The purpose of this programme is to experiment with this data with the aim of improving the user experience or the administration of services.

AEIOU – Aberystwyth University – this project will gather usage statistics from the repositories of all Higher Education Institutions in Wales and use this data to present searchers who discover a paper from a Welsh repository with recommendations for other relevant papers that they may be interested in. All of this data will be gathered into a research gateway for Wales.

Agtivity – University of Manchester – this project will collect usage data from people using the Advanced Video Conferencing services supported by the Access Grid Support Centre. This data will be used to evaluate usage more accurately, in terms of the time the service is used, audience sizes and environmental impact, and will be used to drive an overall improvement in Advanced Video Conferencing meetings through more targeted support by the Access Grid Support Centre staff of potentially failing nodes and meetings.

Exposing VLE Data – University of Cambridge – a project that will bring together activity and attention data for Cambridge’s institutional virtual learning environment (based on the Sakai software) to create useful and informative management reporting including powerful visualisations. These reports will enable the exploration of improvements to both the VLE software and to the institutional support services around it, including how new information can inform university valuation of VLEs and strategy in this area. The project will also release anonymised datasets for use in research by others.

Library Impact Data – Huddersfield University – the aim of this project is to prove a statistically significant correlation between library usage and student attainment. The project will collect anonymised data from University of Bradford, De Montfort University, University of Exeter, University of Lincoln, Liverpool John Moores University, University of Salford, Teesside University as well as Huddersfield. By identifying subject areas or courses which exhibit low usage of library resources, service improvements can be targeted. Those subject areas or courses which exhibit high usage of library resources can be used as models of good practice.

RISE – Open University – As a distance-learning institution, students, researchers and academics at the Open University mainly access the rich collection of library resources electronically. Although the systems used track attention data, this data isn’t used to help users search. RISE aims to exploit the unique scale of the OU (with over 100,000 annual unique users of e-resources) by using attention data recorded by EZProxy to provide recommendations to users of the EBSCO Discovery search solution. RISE will then aim to release that data openly so it can be used by the community.

Salt – University of Manchester – SALT will experiment with 10 years of library circulation data from the John Rylands University Library to support humanities research by making underused “long tail” materials easier to find by library users. The project will also develop an API to enable others to reuse the circulation data and will explore the possibility of offering the API as a national shared service.

Shared OpenURL Data – EDINA – This is an invited proposal by JISC which takes forward the recommendations made in scoping activity related to collection and use of OpenURL data that might be available from institutional OpenURL resolvers and the national OpenURL router shared service, which was funded between December 2008 and April 2009 by JISC. The work will be done in two stages: an initial stage exploring the steps required to make the data available openly, followed by making the data available and implementation of prototype service(s) using the data.

STAR-Trak – Leeds Metropolitan University – This project will provide an application (STAR-Trak:NG) to highlight and manage interventions with students who are at risk of dropping out, identified primarily by mining student activity data held in corporate systems.

UCIAD – Open University – UCIAD will investigate the use of semantic technologies for integrating user activity data from different systems within a University. The objective is to scope and prototype an open, pluggable software framework based on such semantic models, aggregating logs and other traces from different systems as a way to produce a comprehensive and meaningful overview of the interactions between individual users and a university.

See also:

The JISC RAPTOR project is investigating ways to explore usage of e-resources.

PIRUS is a project investigating the extension of Counter statistics to cover article level usage of electronic journals.

The Journal Usage Statistics Portal is a project that is developing a usage statistics portal for libraries to manage statistics about electronic journal usage.

The Using OpenURL activity data project will take forward the recommendations of the Shared OpenURL Data Infrastructure Investigation to further explore the value and viability of releasing OpenURL activity data for use by third parties as a means of supporting development of innovative functionality that serves the UK HE community.

The major influences on the Activity Data programme have been the JISC Mosaic project (final report) and the Gaining Intelligence event (final report).

———————————–

Business Intelligence

The Business Intelligence Programme is funded by JISC’s Organisational Support committee in line with its aim to work with managers to enhance the strategic management of institutions and has funded projects to further explore the issues encountered within institutions when trying to progress BI. (See also JISC’s recently commissioned study into the information needs of senior managers and current attitudes towards and plans for BI.)

Enabling Benchmarking Excellence – Durham University – This project proposes to gather a set of metadata from Higher Education institutions that will allow the current structures within national data sets to be mapped to department structures within each institution. The eventual aim is to make comparative analysis far more flexible and useful to all stakeholders within the HE community. This is the first instance where such a comprehensive use of meta-data to tie together disparate functional organisations has been utilised within the sector, making the project truly innovative.

BIRD – Business Intelligence Reporting Dashboard – UCLAN – Using the JISC InfoNet BI Resource for guidance, this project will work with key stakeholders to re-define the processes that deliver the evidence base to the right users at the right time and will subsequently develop the BI system using Microsoft SharePoint to deliver the user interface (linked to appropriate data sets through the data warehouse). We will use this interface to simplify the process for requesting data/analysis and will provide personalisation facilities to enable individuals to create an interface that provides the data most appropriate to their needs.

Bolt-CAP – University of Bolton – Using the requirements of HEFCE TRAC as the base model, the JISC Business Intelligence Infokit and an Enterprise Architecture approach, this project will consider the means by which effective data capture, accumulation, release and reuse can both meet the needs of decision support within the organisation and that of external agencies.

Bringing Corporate Data to Life – University of East London – The aim of the project is to make use of the significant advances in software tools that utilise in-memory technologies for the rapid development of three business intelligence applications (Student Lifecycle, Corporate Performance and Benchmarking). Information in each application will be presented using a range of fully interactive dashboards, scorecards and charts with filtering, search and drill-down and drill-up capabilities. Managers will be engaged throughout the project in terms of how information is presented, the design of dashboards, scorecards and reports and the identification of additional sources of data.

Business Intelligence for Learning About Our Students – University of Sheffield – The goal of this project is to develop a methodology which will allow the analysis of the data in an aggregate way, by integrating information in different archives and enabling users to query the resulting archive knowledge base from a single point of access. Moreover we aim to integrate the internal information with publicly available data on socio-economic indicators as provided by data.gov.uk. Our aims are to study, on a large scale, how student backgrounds impact their future academic achievements and to help the University devise evidence informed policies, strategies and procedures targeted to their students.

Engage – Using Data about Research Clusters to Enhance Collaboration – University of Glasgow – The Engage project will integrate, visualise and automate the production of information about research clusters at the University of Glasgow, thereby improving access to this data in support of strategic decision making, publicity, enhancing collaboration and interdisciplinary research, and research data reporting.

IN-GRiD – University of Manchester, Manchester Business School – The project addresses the process of collection, management and analysis of building profile data, building usage data, energy consumption data, room booking data, IT data and the corresponding financial data in order to improve the financial and environmental decision making processes of the University of Manchester through the use of business intelligence. The main motivation for the project is to support decision making activities of the senior management of the University of Manchester in the area of sustainability and carbon emissions management.

Liverpool University Management Information System (LUMIS) – Liverpool University – The University has identified a need for improved Management Information to support performance measurement and inform decision making. MI is currently produced and delivered by a variety of methods including standalone systems and spreadsheets. … The objectives of LUMIS are to design and implement an MI solution, combining technology with data integrity, business process improvement and change management to create a range of benefits.

RETAIN: Retaining Students Through Intelligent Interventions – Open University – The focus will be on using BI to improve student retention at the Open University. RETAIN will make it possible to: include additional data sources with existing statistical methods; use predictive modelling to identify ‘at risk’ students.

Supporting institutional decision making with an intelligent student engagement tracking system – University of Bedfordshire – This project aims to demonstrate how the adoption of a student engagement tracking system (intelligent engagement) can support and enhance institutional decision making with evidence in three business intelligence (BI) data subject categories: student data and information, performance measurement and management, and strategic planning.

Visualisation of Research Strength (VoRS) – University of Huddersfield – Many HEIs now maintain repositories containing their researchers’ publications. They have the potential to provide much information about the research strength of an HEI, as publications are the main output of research. The project aims to merge internal information extracted from an institution’s publications repository with external information (academic subject definitions, quality of outlets and publications), for input to a visualisation tool. The tool will assist research managers in making decisions which need to be based on an understanding of research strengths across subject areas, such as where to aim internal investment. In the event that the tool becomes a part of a BI resource, it could lead to institution vs institution comparisons and visual benchmarking for research.

———————————–

Infrastructure for Resource Discovery

(IMHO, if resource recommendation can be improved by the application of “learning analytics”, we’ll be needing metadata used to describe those resources as well as the activity data generated around their use…)

In 2009 JISC and RLUK convened a group of Higher Education library, museum and archive experts to think about what national services were required for supporting online discovery and reuse of collection metadata. This group was called the resource discovery taskforce (RDTF) and … produced a vision and an implementation plan focused on making metadata about collections openly available therefore supporting the development of flexible and innovative services for end users. … This programme of projects has been funded to begin to address the challenges that need to be overcome at the institutional level to realise the RDTF vision. The projects are focused on making metadata about library, museum and archive collections openly available using standards and licensing that allows that data to be reused.

Comet – Cambridge University – The COMET project will release a large sub-set of bibliographic data from Cambridge University Library catalogues as open structured metadata, testing a number of technologies and methodologies including XML, RDF, SPARQL and JSON. It will investigate and document the availability of metadata for the library’s collections which can be released openly in machine-readable formats and the barriers which prevent other data from being exposed in this way. [Estimated amount of data to be made available: 2,200,000 metadata records]

Connecting repositories – Open University – The CORE project aims to make it easier to navigate between relevant scientific papers stored in Open Access repositories. The project will use Linked Data format to describe the relationships between papers stored across a selection of UK repositories, including the Open University Open Research Online (ORO). A resource discovery web-service and a demonstrator client will be provided to allow UK repositories to embed this new tool into their own repository. [Estimated amount of data to be made available: Content of 20 repositories, 50,000 papers, 1,000,000 rdf triples]

Contextual Wrappers – Cambridge University – The project is concerned with the effectiveness of resource discovery based on metadata relating to the Designated collections at the Fitzwilliam Museum in the University of Cambridge and made available through the Culture Grid, an aggregation service for museums, libraries and archives metadata. The project will investigate whether the Culture Grid interface and API can be enhanced to allow researchers to explore hierarchical relationships between collections and the browsing of object records within a collection. [Estimated amount of data to be made available: 164,000 object records (including 1,000 new/enhanced records), 74,800 of them with thumbnail images for improved resource discovery]

Discovering Babel – Oxford University – The digital literary and linguistic resources in the Oxford Text Archive and in the British National Corpus have been available to researchers throughout the world for several decades. This project will focus on technical enhancements to the resource discovery infrastructure that will allow wider dissemination of open metadata, will facilitate interaction with research infrastructures and the knowledge and expertise achieved will be shared with the community. [Estimated amount of data to be made available: 2,000 literary and linguistic resources in electronic form]

Jerome – University of Lincoln – Jerome began in the summer of 2010, as an informal ‘un-project’, with the aim of radically integrating data available to the University of Lincoln’s library services and offering a uniquely personalised service to staff and students through the use of new APIs, open data and machine learning. This project will develop a sustainable, institutional service for open bibliographic metadata, complemented with well documented APIs and an ‘intelligent’, personalised interface for library users. [Estimated amount of data to be made available: ~250,000 bibliographic record library catalogue, along with constantly expanding data about our available journals and their contents augmented by the Journal TOCs API, and c.3,000 additional records from our EPrints repository]

Open Metadata Pathfinder – King’s College London – The Open Metadata Pathfinder project will deliver a demonstrator of the effectiveness of opening up archival catalogues to widened automated linking and discovery through embedding RDFa metadata in Archives in the M25 area (AIM25) collection level catalogue descriptions. It will also implement as part of the AIM25 system the automated publishing of the system’s high quality authority metadata as open datasets. The project will include an assessment of the effectiveness of automated semantic data extraction through natural language processing tools (using GATE) and measure the effectiveness of the approach through statistical analysis and review by key stakeholders (users and archivists).

Salda – Sussex University – The project will extract the metadata records for the Mass Observation Archive from the University of Sussex Special Collection’s Archival Management System (CALM) and convert them into Linked Data that will be made publicly available. [Estimated amount of data to be made available: This project will concentrate on the largest archival collection held within the Library, the Mass Observation Archive, potentially creating up to 23,000 Linked Data records.]

OpenArt – York University – OpenART, a partnership between the University of York, the Tate and technical partners, Acuity Unlimited, will design and expose linked open data for an important research dataset entitled “The London Art World 1660-1735”. Drawing on metadata about artists, places and sales from a defined period of art history scholarship, the dataset offers a complete picture of the London art world during the late 17th and early 18th centuries. Furthermore, links drawn to the Tate collection and the incorporation of collection metadata will allow exploration of works in their contemporary locations. The process will be designed to be scalable to much richer and more varied datasets, both at York, Tate and beyond.

See also:
Linked Open Copac Archives Hub
Linking University Content for Education and Research Online
Openbib

I need to find a way of representing the topic areas and interconnections between these projects somehow!

See also this list of projects in the above programmes [JSON] which may be a useful starting point if you need a list of project IDs. I think the short name attribute can be used to identify the project description HTML page name at the end of an appropriate programme path?

Corporate Data Analyst and Online Comms Jobs at the OU

Though I’m sure these sorts of jobs have been advertised for years, it’s interesting tracking how they’re being represented at the moment, and the sorts of skills required.

Corporate Data and MI Analyst, Marketing (£29,853 – £35,646)

Main Purpose of the Post:
The post holder is a member of the Campaign Planning and Data team and will be required to play a pro-active role in that team, balancing the needs and recommendations of their own areas of responsibility with the wider needs and priorities of the team and the whole Marketing & Sales Unit.

This post has been constructed to assist the University to develop its marketing capacity so that challenging targets can be met. It will be essential for the post holder to work to harness the energies of academic and academic related staff in the University’s academic units, service units and regions to develop a more effective marketing strategy. This will require influencing and networking skills and an ability to adapt engagement style to an academic context.

The post holders work within a team producing Campaign Plans for both new and continuing students. The plans drive the allocation of over £10M of promotional activities (acquisition and retention campaigns).

Description of Duties of the Post:

Contribute to optimising the University’s customer targeting capability via regular reappraisal of segmentation policy with a view to increasing market share in high yield segments
Contribute to development and delivery of robust models, tools, skills and resources to enable segmentation, competitor and market analysis and data mining within the Campaign Planning Team and more widely within Marketing and Sales.

Planning 60%
Input into overall marketing plans and support planning process.
Segment the prospect data mart by developing key prospect indicators to provide Response, Reservation, Registration, Retention and other key metric predictions for each.
Support quantification of product performance predictions to provide Response, Reservation, Registration, Retention and other key metric predictions for each.
Maintain and contribute to development of a targeting model, which overlays product performance predictions/actual by segment over the agreed marketing plan to provide a targeting matrix.
Communicate targeting matrix to stakeholders and overlay tests and current campaign activity to provide an agreed campaign plan based on minimising Cost per Registration and maximising marketing mix and integration strategy.
Monitor performance daily and update segmentation, product and targeting models to maintain a data driven test and learn cycle. Identify significant deviations from forecast and potential actions.
Continually review the Customer Journey through input into creation of a Retention model based on a balanced scorecard approach. Work with key stakeholders to prioritise and implement developments.
Input into model validation and quality control.

Data 20%
Support development of a marketing data mart to primarily support marketing analysis and campaign execution.
Provide input into marketing data developments encouraging sharing of data and best practise.
Support development of in-house tools and processes to improve marketing analysis and campaign execution, primarily SAS and SIEBEL. Support other areas in evaluating tools and systems.
Where appropriate, maintain the relationship with OU data providers ensuring relevant data processing, development, quality and SLA’s are controlled.
Promote data use within marketing and other OU areas, maximising the use of data and providing a hub for data developments to be controlled

MI 20%
Input into development of key performance measures to be used across the OU.
Develop relationships with key OU stakeholders to ensure common goals are met.
Facilitate the use of marketing data across the OU and develop tools to support.
Support data focused research and tests with analytical input.
Input into development and maintenance of campaign performance measures.

Person Spec – Essential
Substantial experience in a campaign planning, analysis or similar role including, for example: campaign execution, data extraction, the development of data infrastructure.
Experience of Direct Marketing.
Experience of B2B and / or B2C marketing.
Experience of data propensity and segmentation modelling.
A balance of marketing analysis and technical skills, including data quality and protection.
Experience of test and learn data driven analysis, targeting processes and systems;
Proven ability to see trends in data and drill down to issues or key data.
Proven ability to develop relationships with key decision makers and stakeholders.
Proven ability to translate marketing requirements into planning / execution requirements.
Excellent presentation and facilitation skills.
Provide analytic support and direction to colleagues to ensure understanding.
Proven ability to meet challenging deadlines without compromising quality.

Still no adverts* for a “learning data analyst” though, tasked with analysing data to see:

– whether effective use is being made of linked-to resources, particularly subscription Library content and open educational resources;

– whether there’s anything in the student activity data and/or social connection data we can use to predict attainment and/or satisfaction levels or improve retention levels.

* That said, IET do collect lots of stats, and I think a variety of stats are now available relating to activity on the Moodle VLE. I’m not sure who does what with all that data though…?

PS I wonder if any of the analysts that companies like Pearson presumably employ look to model ways of maximising the profitability, to those companies, of student acquisition and retention, given education is their business? (See also: Apollo Group results – BPP and University of Phoenix, Publishing giant Pearson looks set to offer degrees).

PPS This job ad may also be of interest to some? Online Communications Officer, Open University Business School (£29,853 – £35,646)

Again, it’s interesting to mark what’s expected…

This brand new role in the School will drive the development of online communications. Focusing on increasing engagement and traffic through the website, you will ensure this work is appropriately integrated into the wider work of the University’s Online Services, Marketing and Communications teams. Reporting to the Director of Business Development and External Affairs, it will be your responsibility to develop the website including content, usability, optimisation, interactivity and driving increased visitor numbers and online registration. You will continually find new and inventive ways to engage with our stakeholders and promote the reputation of the Business School through the online channel.

Your responsibilities will also extend to the School’s virtual presence through social networks, iTunes U and YouTube and utilise these channels to our advantage. You will increase our presence as well as delivering virtual campaigns to improve the overall student numbers. In this role, it will also be your responsibility to develop relationships with other areas of the University engaged in this work and will play a key role in the management of these relationships.

Summary of Duties
The main duties of the Online Communications Officer are detailed below.

• Advance the social media strategy ensuring it is in line with the University’s media position, market response and the development of new technology.
• Manage the online activity of the Business School’s social media communities
• Liaise, as appropriate, with units within The Open University, such as Online Services to keep up to date with policy changes and AACS regarding technical developments.
• Liaise with the Business School’s Information Officer for the maintenance and feeding of the Research Database into the website
• Generate assets to host on the website e.g. an Elluminate Demo Video
• Keep abreast of trends and developments to ensure that the Business School’s online presence remains at the forefront
• Work alongside Online Services, to monitor the visitor traffic of the website and establish appropriate and effective KPIs for dissemination across the Business School, for example through the creation of a dashboard.
• Engage in personal development based on organisational needs and developments to foster a high level of professional skills and technical ability
• Ensure that corporate branding and media guidelines are adhered to
• Understand and appreciate internal procedures and standards and be proactive in recommending improvements
• Ability to apply best professional practice to deliver effective solutions that take into account technical, budgetary and other project considerations
• Edit the content of both the internet and intranet
• Collate, interpret and select key information for dissemination on the latest trends and research in social media both within the OU and externally
• Produce graphics where necessary or liaise with designers in the University or outside agencies to produce graphics.
• Create/collate digital assets including audio and video files
• Post moderate discussion forums
• Disseminate best practice through a variety of communications channels eg project website, OU Life news, brief updates etc.
• Develop and maintain awareness of different audience needs in relation to appropriate communications channels (eg email, screensaver, website, print).
• Act as a flexible member of the Business Development and External Affairs team.
• Carry out other tasks as specified from time to time by the Project Director

Related: Joanne Jacobs’ Are you social or anti-social?: “How to employ a Social Media Strategist, and how you should measure their performance. (Social media isn’t going away. But some Social Media Strategists should go away.)”

Matplotlib: Detrending Time Series Data

Reading the rather wonderful Data Analysis with Open Source Tools (if you haven’t already got a copy, get one… NOW…), I noticed a comment that autocorrelation “is intended for time series that do not exhibit a trend and have zero mean”. Doh! Doh! And three times: doh!

I’d already come to the same conclusion, pragmatically, in Identifying Periodic Google Trends, Part 1: Autocorrelation and Improving Autocorrelation Calculations on Google Trends Data, but now I’ll be remembering this as a condition of use;-)

One issue I had come across was how to plot a copy of the zero-mean and/or detrended data, as calculated using Matplotlib, directly. (I’d already worked out how to use the detrend_ filters in the autocorrelation function).

The problem was that simply trying to plot mlab.detrend_linear(y) as applied to a list of values y threw an error (“AttributeError: ‘list’ object has no attribute ‘mean'”). It seems that detrend expects y=[1,2,3] to have a method y.mean(); which it doesn’t, normally…

The trick appears to be that matplotlib prefers to work with something like a numpy array, rather than a simple list, since the array offers these additional methods. But how is the data structured? A simple Google wasn’t much help, but a couple of cribs suggested that casting the list to y=np.array(y) (where import numpy as np) might be a good idea.

So let’s try it:

import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import numpy as np

label='run'
d=[0.99,0.98,0.95,0.93,0.91,0.93,0.92,0.95,0.95,0.94,0.96,0.98,0.97,1.00,1.01,1.05,1.06,1.06,0.98,0.98,0.98,0.97,0.96,0.93,0.93,0.96,0.95,1.05,0.97,0.95,1.01,1.02,0.98,1.01,0.98,1.00,1.06,1.04,1.06,1.04,0.97,0.94,0.92,0.90,0.87,0.88,0.85,0.90,0.91,0.87,0.88,0.88,0.91,0.91,0.88,0.91,0.92,0.91,0.90,0.92,0.87,0.92,0.92,0.92,0.94,0.97,0.99,1.01,1.01,1.04,0.97,0.94,0.98,0.94,0.98,0.91,0.93,0.92,0.95,1.00,0.93,0.93,0.96,0.96,0.96,0.97,0.95,0.95,1.06,1.12,1.01,1.00,0.99,0.98,0.96,0.93,0.91,0.92,0.92,0.94,0.94,0.94,0.90,0.86,0.89,0.93,0.90,0.90,0.90,0.90,0.89,0.92,0.91,0.92,0.93,0.93,0.94,0.99,0.98,0.99,1.01,1.06,1.06,0.96,0.98,0.92,0.92,0.93,0.91,0.90,0.93,1.02,0.90,0.93,0.91,0.93,0.95,0.93,0.91,0.92,0.96,0.93,1.02,1.02,0.91,0.88,0.87,0.87,0.84,0.82,0.82,0.84,0.83,0.85,0.80,0.80,0.87,0.85,0.83,0.80,0.84,0.83,0.84,0.88,0.83,0.88,0.88,0.86,0.91,0.93,0.91,0.97,0.96,1.00,1.01,0.98,0.94,0.97,0.94,0.95,0.92,0.93,0.97,1.02,0.95,0.92,0.91,0.95,0.93,0.94,0.91,0.92,0.98,0.99,0.97,0.98,0.90,0.86,0.87,0.91,0.87,0.86,0.86,0.89,0.89,0.87,0.86,0.83,0.85,0.86,0.90,0.87,0.87,0.90,0.89,0.93,0.93,0.97,0.99,0.95,1.00,1.05,1.03,1.04,1.08,1.05,1.05,1.05,1.05,1.01,1.07,1.02,1.02,1.04,1.00,1.04,1.17,1.03,1.01,1.02,1.05,1.06,1.05,0.99,1.07,1.03,1.05,1.07,1.04,0.97,0.94,0.97,0.93,0.94,0.96,0.96,1.04,1.05,1.04,0.96,1.00,1.04,1.01,1.00,0.99,0.99,0.99,1.03,1.05,1.02,1.06,1.07,1.04,1.16,1.19,1.12,1.18,1.19,1.16,1.12,1.12,1.09,1.12,1.11,1.12,1.06,1.05,1.14,1.26,1.09,1.12,1.13,1.16,1.18,1.22,1.17,1.24,1.28,1.35,1.19,1.16,1.11,1.11,1.13,1.13,1.11,1.09,1.06,1.07,1.09,1.09,1.03,1.05,1.04,1.04,1.03,1.03,1.06,1.09,1.17,1.12,1.11,1.14,1.20,1.18,1.24,1.19,1.21,1.22,1.22,1.27,1.25,1.18,1.15,1.18,1.17,1.11,1.09,1.10,1.12,1.26,1.15,1.15,1.16,1.16,1.15,1.12,1.15,1.14,1.20,1.31,1.17,1.18,1.14,1.15,1.14,1.12,1.17,1.11,1.10,1.11,1.14,1.10,1.08,1.06]

fig = plt.figure()

# cast the list to a numpy array so that mlab.detrend_linear() can call .mean() on it
da=np.array(d)

# plot the original data, its detrended copy and the removed linear trend on the same axes
ax = fig.add_subplot(111)
ax.plot(da)

# detrended copy of the data
y = mlab.detrend_linear(da)
ax.plot(y)

# the linear trend that gets subtracted from the original data (original minus detrended)
ax.plot(da-y)

plt.show()

Here’s the result:

The top, ragged trace is the original data (in the d list); the lower trace is the same data, detrended; the straight line is the line that is subtracted from the original data to produce the detrended data.

The lower trace would be the one that gets used by the autocorrelation function using the detrend_linear setting. (To detrend based on simply setting the mean to zero, I think all we need to do is process da-da.mean()?)

UPDATE: One of the problems with detrending the time series data using the linear trend is that the increasing trend doesn’t appear to start until midway through the series. Another approach to cleaning the data is to remove the mean and trend by using the first difference of the signal: d(x)=f(x)-f(x-1). It’s calculated as follows:

#time series data in d
#first difference
fd=np.diff(d)

Here’s the linearly detrended data (green) compared to the first difference of the data (blue):

Note that the length of the first difference signal is one sample less than the original data, and shifted to the left one step. (There’s presumably a numpy way of padding the head or tail of the series, though I’m not sure what it is yet!)
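(One way of doing that padding, so the differenced series keeps the same length as the original for plotting purposes, appears to be np.insert – a quick sketch, repeating the first value, though a 0 or NaN would do just as well depending on what you want to plot:)

#pad the first difference back to the original length so the two series line up
fd_padded = np.insert(fd, 0, fd[0])
#len(fd_padded) now equals len(d)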

Here’s the autocorrelation of the first difference signal – if you refer back to the previous post, you’ll see it’s much clearer in this case:

It is possible to pass an arbitrary detrending function into acorr, but I think it needs to return an array that is the same length as the original array?

So what next? Looking at the original data, it is quite noisy, with some trends that are apparently obvious to the eye. The diff calculation is quite sensitive to this noise, so it possibly makes sense to smooth the data prior to calculating the first difference and the autocorrelation. But that’s for next time…
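(As a placeholder for that next post, one simple option would be a moving average via np.convolve before taking the difference – something along these lines:)

#simple moving average smoothing (7 sample window, say) before differencing;
#mode='valid' trims the ends rather than padding them
w = 7
d_smoothed = np.convolve(d, np.ones(w)/w, mode='valid')
fd_smoothed = np.diff(d_smoothed)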

Social Networks on Delicious

One of the many things that the delicious social networking site appears to have got wrong is how to gain traction from its social network. As well as the incidental social network that arises from two or more different users using the same tag or bookmarking the same resource (for example, Visualising Delicious Tag Communities Using Gephi), there is also an explicit social network constructed using an asymmetric model similar to that used by Twitter: specifically, you can follow me (become a “fan” of me) without my permission, and I can add you to my network (become a fan of you, again without your permission).

Realising that you are part of a social network on delicious is not really that obvious though, nor is the extent to which it is a network. So I thought I’d have a look at the structure of the social network that I can crystallise out around my delicious account, by:

1) grabbing the list of my “fans” on delicious;
2) grabbing the list of the fans of my fans on delicious and then plotting:
2a) connections between my fans and their fans who are also my fans;
2b) all the fans of my fans.

(Writing “fans” feels a lot more ego-bollox than writing “followers”; is that maybe one of the nails in the delicious social SNAFU coffin?!)

Here’s the way my “fans” on delicious follow each other (maybe? I’m not sure if the fans call always grabs all the fans, or whether it pages the results?):

(The network is plotted using Gephi, of course; nodes are coloured according to modularity clusters, and the layout is derived from a Force Atlas layout.)

Here’s the wider network – that is, showing fans of my fans:

In this case, nodes are sized according to betweenness centrality and coloured according to in-degree (that is, the number of my fans who have this person as a fan). [This works in so far as we’re trying to identify reputation networks. If we’re looking for reach in terms of using folk as a resource discovery network, it would probably make more sense to look at the members of my network, and the networks of those folk…]

If you want to try to generate your own, here’s the code:

import urllib
import time
import simplejson

# Note: openTimestampedFile() and gephiCoreGDFNodeHeader() are helper functions
# from my own utility scripts (not shown here): they open a timestamped output
# file and return a minimal Gephi GDF nodedef header line respectively.

def getDeliciousUserFans(user,fans):
  url='http://feeds.delicious.com/v2/json/networkfans/'+user
  #needs paging? or does this grab all the fans?
  data = simplejson.load(urllib.urlopen(url))
  for u in data:
    fans.append(u['user'])
    #time also available: u['dt']
  #print fans
  return fans

def getDeliciousFanNetwork(user):
  f=openTimestampedFile("fans-delicious","all-"+user+".gdf")
  f2=openTimestampedFile("fans-delicious","inner-"+user+".gdf")
  f.write(gephiCoreGDFNodeHeader(typ="min")+"\n")
  f.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
  f2.write(gephiCoreGDFNodeHeader(typ="min")+"\n")
  f2.write("edgedef> user1 VARCHAR,user2 VARCHAR\n")
  fans=[]
  fans=getDeliciousUserFans(user,fans)
  for fan in fans:
    time.sleep(1)
    fans2=[]
    print "Fetching data for fan "+fan
    fans2=getDeliciousUserFans(fan,fans2)
    for fan2 in fans2:
      f.write(fan+","+fan2+"\n")
      if fan2 in fans:
        f2.write(fan+","+fan2+"\n")
  f.close()
  f2.close()
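To run it for yourself, something like the following should do the trick (the username is obviously a placeholder), bearing in mind the note above about the two helper functions; the “all-” file can then be loaded into Gephi for the wider fans-of-fans map, and the “inner-” file for the map of connections amongst my own fans:

getDeliciousFanNetwork("yourDeliciousUsername")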

So what’s the next step…?!