Mulling over an excellent couple of days in Banff at the first Learning Analytics and Knowledge conference (LAK11; @dougclow’s liveblog notes), where we heard about a whole host of data and anlytics related activites from around the world, I thought it may be worth pulling together descriptions of several current JISC projects that are exploring related issues to add in to the mix…
There are currently at least three programmes that it seems to me are in the general area…
Many systems in institutions store data about the actions of students, teachers and researchers. The purpose of this programme is to experiment with this data with the aim of improving the user experience or the administration of services.
AEIOU – Aberystwyth University – this project will gather usage statistics from the repositories of all Higher Education Institutions in Wales and use this data to present searchers who discover paper from a Welsh repository with recommendations for other relevant papers that they may be interested in. All of this data will be gathered into a research gateway for Wales.
Agtivity – University of Manchester – this project will collect usage data from people using the Advanced Video Conferencing services supported by the Access Grid Support Centre. This data will be used evaluate usage more accurately, in terms of the time the service is used, audience sizes and environmental impact, and will be used to drive an overall improvement in Advanced Video Conferencing meetings through more targetted support by the Access Grid Support Centre staff of potentially failing nodes and meetings.
Exposing VLE Data – University of Cambridge – a project that will bring together activity and attention data for Cambridge’s institutional virtual learning environment (based on the Sakai software) to create useful and informative management reporting including powerful visualisations. These reports will enable the exploration of improvements to both the VLE software and to the institutional support services around it, including how new information can inform university valuation of VLEs and strategy in this area. The project will also release anonymised datasets for use in research by others.
Library Impact Data – Huddersfield University – the aim of this project is to prove a statistically significant correlation between library usage and student attainment. The project will collect anonymised data from University of Bradford, De Montfort University, University of Exeter, University of Lincoln, Liverpool John Moores University, University of Salford, Teesside University as well as Huddersfield. By identifying subject areas or courses which exhibit low usage of library resources, service improvements can be targeted. Those subject areas or courses which exhibit high usage of library resources can be used as models of good practice.
RISE – Open University – As a distance-learning institution, students, researchers and academics at the Open University mainly access the rich collection of library resources electronically. Although the systems used track attention data this data isn’t used to help users search. RISE aims to exploit the unique scale of the OU (with over 100,000 annual unique users of e-resources) by using attention data recorded by EZProxy to provide recommendations to users of the EBSCO Discovery search solution. RISE will then aim to release that data openly so it can be used by the community.
Salt – University of Manchester – SALT will experiment with 10 years of library circulation data from the John Rylands University Library to support humanities research by making underused “long tail” materials easier to find by library users. The project will also develop an api to enable others to reuse the circulation data and will explore the possibility of offering the api as a national shared service.
Shared OpenURL Data – EDINA – This is an invited proposal by JISC which takes forward the recommendations made in scopingactivity related to collection and use of OpenURL data that might be available from institutionalOpenURL resolvers and the national OpenURL router shared service which was funded between December 2008 – April 2009 by JISC. The work will be done in two stages: an initial stage exploring the steps required to make the data available openly, followed by making the data available and implementation of prototype service(s) using the data.
STAR-Trak – Leeds Metropolitan University – This project will provide an application (STAR-Trak:NG) to highlight and manage interventions with students who are at risk of dropping out, identified primarily by mining student activity data held in corporate systems.
UCIAD – Open University – UCIAD will investigate the use of semantic technologies for integrating user activity data from different systems within a University. The objective is to scope and prototype an open, pluggable software framework based on such semantic models, aggregating logs and other traces from different systems as a way to produce a comprehensive and meaningful overview of the interactions between individual users and a university.
The JISC RAPTOR project is investigating ways to explore usage of e-resources.
PIRUS is a project investigating the extension of Counter statistics to cover article level usage of electronic journals.
The Journal Usage Statistics Portal is a project that is developing a usage statistics portal for libraries to manage statistics about electronic journal usage.
The Using OpenURL activity data project will take forward the recommendations of the Shared OpenURL Data Infrastructure Investigation to further explore the value and viability of releasing OpenURL activity data for use by third parties as a means of supporting development of innovative functionality that serves the UK HE community.
The Business Intelligence Programme is funded by JISC’s Organisational Support committee in line with its aim to work with managers to enhance the strategic management of institutions and has funded projects to further explore the issues encountered within institutions when trying to progress BI. (See also JISC’s recently commissioned study into the information needs of senior managers and current attitudes towards and plans for BI.)
Enabling Benchmarking Excellence – Durham University – This project proposes to gather a set of metadata from Higher Education institutions that will allow the current structures within national data sets to be mapped to department structures within each institution. The eventual aim is to make comparative analysis far more flexible and useful to all stakeholders within the HE community. This is the first instance where such a comprehensive use of meta-data to tie together disparate functional organisations has been utilised within the sector, making the project truly innovative.
BIRD – Business Intelligence Reporting Dashboard – UCLAN – Using the JISC InfoNet BI Resource for guidance, this project will work with key stakeholders to re-define the processes that deliver the evidence base to the right users at the right time and will subsequently develop the BI system using Microsoft SharePoint to deliver the user interface (linked to appropriate data sets through the data warehouse). We will use this interface to simplify the process for requesting data/analysis and will provide personalisation facilities to enable individuals to create an interface that provides the data most appropriate to their needs.
Bolt-CAP – University of Bolton – Using the requirements of HEFCE TRAC as the base model, the JISC Business Intelligence Infokit and an Enterprise Architecture approach, this project will consider the means by which effective data capture, accumulation, release and reuse can both meet the needs of decision support within the organisation and that of external agencies.
Bringing Corporate Data to Life – University of East London – The aim of the project is to make use of the significant advances in software tools that utilise in-memory technologies for the rapid development of three business intelligence applications (Student Lifecycle, Corporate Performance and Benchmarking). Information in each application will be presented using a range of fully interactive dashboards, scorecards and charts with filtering, search and drill-down and drill-up capabilities. Managers will be engaged throughout the project in terms of how information is presented, the design of dashboards, scorecards and reports and the identification of additional sources of data.
Business Intelligence for Learning About Our Students – University of Sheffield – The goal of this project is develop a methodology which will allow the analysis of the data in an aggregate way, by integrating information in different archives and enabling users to query the resulting archive knowledge base from a single point of access. Moreover we aim to integrate the internal information with publically available data on socio-economic indicators as provided by data.gov.uk. Our aims are to study, on a large scale, how student backgrounds impact their future academic achievements and to help the University devise evidence informed policies, strategies and procedures targeted to their students.
Engage – Using Data about Research Clusters to Enhance Collaboration – University of Glasgow – The Engage project will integrate, visualise and automate the production of information about research clusters at the University of Glasgow, thereby improving access to this data in support of strategic decision making, publicity, enhancing collaboration and interdisciplinary research, and research data reporting.
IN-GRiD – University of Manchester, Manchester Business School – The project addresses the process of collection, management and analysis of building profile data, building usage data, energy consumption data, room booking data, IT data and the corresponding financial data in order to improve the financial and environmental decision making processes of the University of Manchester through the use of business intelligence. The main motivation for the project is to support decision making activities of the senior management of the University of Manchester in the area of sustainability and carbon emissions management.
Liverpool University Management Information System (LUMIS) – Liverpool University – The University has identified a need for improved Management Information to support performance measurement and inform decision making. MI is currently produced and delivered by a variety of methods including standalone systems and spreadsheets. … The objectives of LUMIS are to design and implement an MI solution, combining technology with data integrity, business process improvement and change management to create a range of benefits.
RETAIN: Retaining Students Through Intelligent Interventions – Open University – The focus will be on using BI to improve student retention at the Open University. RETAIN will make it possible to: include additional datasources with existing statistical methods; use predictive modelling to identify ‘at risk’students.
Supporting institutional decision making with an intelligent student engagement tracking system – University of Bedfordshire – This project aims to demonstrate how the adoption of a student engagement tracking system (intelligent engagement) can support and enhance institutional decision making with evidence in three business intelligence (BI) data subject categories: student data and information, performance measurement and management, and strategic planning.
Visualisation of Research Strength (VoRS) – University of Huddersfield – Many HEIs now maintain repositories containing their researchers‟ publications. They have the potential to provide much information about the research strength of an HEI, as publications are the main output of research. The project aims to merge internal information extracted from an institution‟s publications repository with external information (academic subject definitions, quality of outlets and publications), for input to a visualisation tool. The tool will assist research managers in making decisions which need to be based on an understanding of research strengths across subject areas, such as where to aim internal investment. In the event that the tool becomes a part of a BI resource, It could lead to institution vs institution comparisons and visual benchmarking for research.
(IMHO, if resource recommendation can be improved by the application of “learning analytics”, we’ll be needing metadata used to describe those resources as well as the activity data generated around their use…)
In 2009 JISC and RLUK convened a group of Higher Education library, museum and archive experts to think about what national services were required for supporting online discovery and reuse of collection metadata. This group was called the resource discovery taskforce (RDTF) and … produced a vision and an implementation plan focused on making metadata about collections openly available therefore supporting the development of flexible and innovative services for end users. … This programme of projects has been funded to begin to address the challenges that need to be overcome at the institutional level to realise the RDTF vision. The projects are focused on making metadata about library, museum and archive collections openly available using standards and licensing that allows that data to be reused.
Comet – Cambridge University – The COMET project will release a large sub-set of bibliographic data from Cambridge University Library catalogues as open structured metadata, testing a number of technologies and methodologies including XML, RDF, SPARQL and JSON. It will investigate and document the availability of metadata for the library’s collections which can be released openly in machine-readable formats and the barriers which prevent other data from being exposed in this way. [Estimated amount of data to be made available: 2,200,000 metadata records]
Connecting repositories – Open University – The CORE project aims to make it easier to navigate between relevant scientific papers stored in Open Access repositories. The project will use Linked Data format to describe the relationships between papers stored across a selection of UK repositories, including the Open University Open Research Online (ORO). A resource discovery web-service and a demonstrator client will be provided to allow UK repositories to embed this new tool into their own repository. [Estimated amount of data to be made available: Content of 20 repositories, 50,000 papers, 1,000,000 rdf triples]
Contextual Wrappers – Cambridge University – The project is concerned with the effectiveness of resource discovery based on metadata relating to the Designated collections at the Fitzwilliam Museum in the University of Cambridge and made available through the Culture Grid, an aggregation service for museums, libraries and archives metadata. The project will investigate whether Culture Grid interface and API can be enhanced to allow researchers to explore hierarchical relationships between collections and the browsing of object records within a collection [Estimated amount of data to be made available: 164,000 object records (including 1,000 new/enhanced records), 74,800 of them with thumbnail images for improved resource discovery]
Discovering Babel – Oxford University – The digital literary and linguistic resources in the Oxford Text Archive and in the British National Corpus have been available to researchers throughout the world for several decades. This project will focus on technical enhancements to the resource discovery infrastructure that will allow wider dissemination of open metadata, will facilitate interaction with research infrastructures and the knowledge and expertise achieved will be shared with the community. [Estimated amount of data to be made available: 2,000 literary and linguistic resources in electronic form]
Jerome – University of Lincoln – Jerome began in the summer of 2010, as an informal ‘un-project’, with the aim of radically integrating data available to the University of Lincoln’s library services and offering a uniquely personalised service to staff and students through the use of new APIs, open data and machine learning. This project will develop a sustainable, institutional service for open bibliographic metadata, complemented with well documented APIs and an ‘intelligent’, personalised interface for library users. [Estimated amount of data to be made available: ~250,000 bibliographic record library catalogue, along with constantly expanding data about our available journals and their contents augmented by the Journal TOCs API, and c.3,000 additional records from our EPrints repository]
Open Metadata Pathfinder – King’s College London – The Open Metadata Pathfinder project will deliver a demonstrator of the effectiveness of opening up archival catalogues to widened automated linking and discovery through embedding RDFa metadata in Archives in the M25 area (AIM25) collection level catalogue descriptions. It will also implement as part of the AIM25 system the automated publishing of the system’s high quality authority metadata as open datasets. The project will include an assessment of the effectiveness of automated semantic data extraction through natural language processing tools (using GATE) and measure the effectiveness of the approach through statistical analysis and review by key stakeholders (users and archivists).
Salda – Sussex University – The project will extract the metadata records for the Mass Observation Archive from the University of Sussex Special Collection’s Archival Management System (CALM) and convert them in to Linked Data that will be made publicly available. [Estimated amount of data to be made available: This project will concentrate on the largest archival collection held within the Library, the Mass Observation Archive, potentially creating up to 23,000 Linked Data records.]
OpenArt – York University – OpenART, a partnership between the University of York, the Tate and technical partners, Acuity Unlimited, will design and expose linked open data for an important research dataset entitled “The London Art World 1660-1735″. Drawing on metadata about artists, places and sales from a defined period of art history scholarship, the dataset offers a complete picture of the London art world during the late 17th and early 18th centuries. Furthermore, links drawn to the Tate collection and the incorporation of collection metadata will allow exploration of works in their contemporary locations. The process will be designed to be scalable to much richer and more varied datasets, both at York, Tate and beyond.
I need to find a way of representing the topic areas and interconnections between these projects somehow!
See also this list of projects in the above programmes [JSON] which may be a useful starting point if you need a list of project IDs. I think the short name attribute can be used to identify the project description HTML page name at the end of an appropriate programme path?
If you’re using a particular tag to aggregate content around a particular course or event, what do the other tags used to bookmark those resource tell you about that course or event?
In a series of recent posts, I’ve started exploring again some of the structure inherent in socially bookmarked and tagged resource collections (Visualising Delicious Tag Communities Using Gephi, Social Networks on Delicious, Dominant Tags in My Delicious Network). In this post, I’m going to look at the tags that co-occur with a particular tag that may be used to bookmark resources relating to an event or course, for example.
Here are a few examples, starting with cck11, using the most recent bookmarks tagged with ‘cck11′:
The nodes are sized according to degree; the edges represent that the two tags were both applied by an individual user person to the same resource (so if three (N) tags were applied to a resource (A, B, C), there are N!/(K!(N-K)!) pairwise (K=2) combinations (AB, AC, BC; that is, three combinations in this case.).
Here are the tags for lak11 – can you tell what this online course is about from them?
Finally, here are tags for the OU course T151; again, can you tell what the course is most likely to be about?
Here’s the Python code I used to generate the gdf network definition files used to generate the diagrams shown above in Gephi:
import simplejson, urllib def getDeliciousTagURL(tag,typ='json', num=100): #need to add a pager to get data when more than 1 page return "http://feeds.delicious.com/v2/json/tag/"+tag+"?count=100" def getDeliciousTaggedURLTagCombos(tag): durl=getDeliciousTagURL(tag) data = simplejson.load(urllib.urlopen(durl)) uniqTags= tagCombos= for i in data: tags=i['t'] for t in tags: if t not in uniqTags: uniqTags.append(t) if len(tags)>1: for i,j in combinations(tags,2): print i,j tagCombos.append((i,j)) f=openTimestampedFile('delicious-tagCombos',tag+'.gdf') header='nodedef> name VARCHAR,label VARCHAR, type VARCHAR' f.write(header+'\n') for t in uniqTags: f.write(t+','+t+',tag\n') f.write('edgedef> tag1 VARCHAR,tag2 VARCHAR\n') for i,j in tagCombos: f.write(i+','+j+'\n') f.close() def combinations(iterable, r): # combinations('ABCD', 2) --> AB AC AD BC BD CD # combinations(range(4), 3) --> 012 013 023 123 pool = tuple(iterable) n = len(pool) if r > n: return indices = range(r) yield tuple(pool[i] for i in indices) while True: for i in reversed(range(r)): if indices[i] != i + n - r: break else: return indices[i] += 1 for j in range(i+1, r): indices[j] = indices[j-1] + 1 yield tuple(pool[i] for i in indices)
Next up? I’m wondering whether a visualisation of the explicit fan/network (i.e. follower/friend) delicious network for users of a given tag might be interesting, to see how it compares to the ad hoc/informal networks that grow up around a tag?