Who Pays for Academic Publishing? Some Data Trails…

A couple of days ago, I came across a dataset on figshare (a data sharing site) detailing the article processing charges (APCs) paid by the University of Portsmouth to publishers in 2014. After I casually (lazily…;-) remarked on the existence of this dataset via Twitter, Owen Stephens/@ostephens referred me to a JISC project that is looking at APCs in more detail, with a prototype data explorer here: All APC demonstrator [Github repository].

The project looks as if it is part of Jisc Collections’ look at the Total Cost of Ownership in the context of academic publishing, summing things like journal subscription fees alongside “article processing charges” (which I’d hope include page charges?).

If you aren’t in academia, you may not realise that what used to be referred to as ‘vanity publishing’ (paying to get your first novel or poetry collection published) is part of the everyday practice of academic publishing. But it isn’t called that, obviously, because your work also has to be peer reviewed by other academics… So it’s different. It’s “quality publishing”.

Peer review is, in part, where academics take on the ownership of the quality aspects of academic publishing, so if the Total Cost of Ownership project is trying to be relevant to institutions and not just to JISC, I wonder if there should also be columns in the costing spreadsheet relating to the work time academics spend reviewing other people’s articles, editing journals, and so on. This is different to the presentational costs, obviously: you can’t just write a paper and submit it, you have to submit it as an appropriately formatted document in “camera ready” layout, which can also add a significant amount of time to preparing a paper for publication. So you do the copy editing and layout too. And so any total costing to an academic institution of the research publishing racket should probably include this time too. But that’s by the by.

The data that underpins the demonstrator application was sourced from a variety of universities and submitted in spreadsheet form. A useful description (again via @ostephens) of the data model can be found here: APC Aggregation: Data Model and Analytical Usage. Looking at it, it just seems to cover APCs.

APC data relating to the project can be found on figshare. I haven’t poked around in the demonstrator code or watched its HTTP traffic to see if there are API calls onto the aggregated data that provide another way in to it.

As well as page charges, there are charges associated with subscription fees to publishers. Publishers don’t like this information getting out on grounds of commercial sensitivity, and universities don’t like publishing it presumably on grounds of bringing themselves into disrepute (you spend how much?!), but there is some information out there. Data from a set of FOI requests about journal subscriptions (summarised here), for example. If you want to wade through some of the raw FOI responses yourself, have a look on WhatDoTheyKnow: FOI requests: “journal costs”.

Tim Gowers also wrote compellingly about his FOI escapades trying to track down journal subscription costs data: Elsevier journals – some facts.

Other possible sources include a search engine that allows you to rank journals by price per article or citation (data and information sources).

This is all very well, but is it in any way useful? I have no idea. One thing I imagined might be quite amusing to explore was the extent to which journal subscriptions paid their way (or were “cost effective”). For example, looking at institutional logs, how often are (articles from) particular journals being accessed or downloaded either for teaching or research purposes? (Crudely: teaching – access comes from a student account; research – access from a research account.) On the other hand, for the research outputs of the institution, how many things are being published into a particular journal, and how many citations appear in those outputs to other publications?

Suppose we take the line that use demonstrates value, and that use is captured as downloads, publications into, or references into. (That’s very crude, but then I’m approaching this as a possible recreational data exercise, not a piece of formal research. And yes – I know, journals are often bundled up in subscription packages together, and just like Sky blends dross with desirable channels in its subscription deals, I suspect academic publishers do too… But then, we could start to check these based on whether particular journals in a bundle are ever accessed, ever referenced, ever published into within a particular organisation, etc. Citation analysis can also help here – for example, if 5 journals all heavily cite each other, and one publisher publishes 3 of those, it could make sense for them to bundle two of the journals into one package and the third into another, so that anyone researching topics reported in heavily linked articles across those journals is essentially forced into subscribing to both packages. Without having a look at citation network analyses and subscription bundles, I can’t check that outlandish claim of course;-)
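As a starting point for that sort of recreational exercise, here is a minimal Python sketch of a crude “cost per use” calculation. Everything in it is an assumption made for illustration: the journal names, prices and counts are invented, and simply adding together downloads, publications into and references into as equally weighted “uses” is just one possible reading of the idea above.

```python
from dataclasses import dataclass


@dataclass
class JournalUse:
    title: str
    subscription_cost: float  # annual price paid, in pounds (made-up figures)
    downloads: int            # accesses seen in institutional logs
    published_into: int       # institutional papers published in the journal
    referenced: int           # citations to the journal from institutional outputs

    @property
    def uses(self) -> int:
        # Crude assumption: every kind of use counts equally.
        return self.downloads + self.published_into + self.referenced


def cost_per_use(journal: JournalUse) -> float:
    """Return subscription cost divided by total uses (infinity if never used)."""
    if journal.uses == 0:
        return float("inf")
    return journal.subscription_cost / journal.uses


def rank_by_cost_per_use(journals):
    """Sort journals from most to least 'cost effective'."""
    return sorted(journals, key=cost_per_use)


journals = [
    JournalUse("Journal A", 1200.0, downloads=300, published_into=2, referenced=98),
    JournalUse("Journal B", 5000.0, downloads=40, published_into=0, referenced=10),
    JournalUse("Journal C", 800.0, downloads=0, published_into=0, referenced=0),
]

for j in rank_by_cost_per_use(journals):
    print(f"{j.title}: {j.uses} uses, cost per use = {cost_per_use(j):.2f}")
```

A journal that is never accessed, cited or published into ends up at the bottom of the ranking with an infinite cost per use, which is the sort of bundle filler the check above is meant to surface.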

Erm… that’s it…

PS see also Evaluating big deal journal bundles (via @kpfssport)

PPS for a view from the publishers’ side on the very real costs associated with publishing, as well as a view on how academia and business treat employment costs and “real” costs in rather contrasting ways, see Time is Money: Why Scholarly Communication Can’t Be Free.

Licensing and Tracking Online Content – News and OERs

Trying to pull a quote from the FreePint/fumsi article “Frictionless sharing” – exploring the changes to Facebook yesterday, I was presented with this pop up dialogue:

I just tried to copy some text from fumsi.com, and here's what I saw...

Clicking “No” meant that rather than grabbing the text I was trying to copy into my clipboard, no copying action took place (when I tried to paste the content, the thing that was pasted was the text I had last successfully copied and pasted…)

Clicking “Yes” took me to another dialogue, shown here in two parts. Firstly, a preview of the text I was trying to copy, and the price charged for reusing it:

fumsi: buying the license...

The “Do I need a License…” line actually ended with a link to Fair Use Statement: Do I need a license to republish an excerpt from this article?.

And the second part – the payment bit:

fumsi - buying a license part 2

I didn’t go through with the purchase, so I can’t say what happens next, or what sort of embed code I get as a result. (I did, however, View Source, and just check that I could copy as much of the content from the original post as I wanted…;-)

The Fair Use statement links to the site that provides the technology behind the popup, iCopyright:

iCopyright provides a comprehensive suite of services to publishers to help them protect, promote and profit from their content. … With one simple implementation of the iCopyright tag, publishers may take advantage of all of these services. No other content licensing solution comes close to matching the iCopyright platform.

Designed to discourage individuals and organizations from using your content without permission, or exceeding the terms of their original license, iCopyright’s peer-policing and license authentication feature allows those who receive content to know whether a proper license has been obtained. People won’t pirate your content if anyone can verify whether it is an authorized copy.

The Feed & Tag Syndication service enables you to license feeds of your own copyrighted content to other publishers, bloggers and websites — instantly! Your iCopyright toolbar and licensing services are embedded in the feed of your content when displayed on subscribing web sites. You earn new revenue for each page view and share in all secondary uses according to terms you specify. Similarly, you can obtain a licensed feed of content from other publishers in the iCopyright network to enhance your own original content. And, of course, you earn a revenue share on all licensed reuses of that content on your site.
Tag-Only Syndication allows you to add iCopyright tags to content you license from third-party content producers who may or may not be in the iCopyright network. Similarly, it allows you to authorize and manage the tagging of your copyrighted content when it appears on third-party sites, such as aggregators and research databases.
In all cases, iCopyright tracks all licenses, collects the fees, and remits revenue shares among the partners and publishers each month. Publishers never lose control over their brand or their content

And so on…

Wouldn’t it be great? Fan-friggin-tastic. Arsem…

In other news, a press release that’s being rehashed* over various media blogs today announces Newsright, a platform for licensing and tracking the (re)use of online (textual?) news content.

(* Churnalism at work, the bane of many of the larger pop tech and pop media blogs; at least when people just share a link to news releases, they’re admitting all they’re doing is forwarding PR statements, rather than copying and pasting text into a blog post without comment, annotation, curation or contextual linking and then pretending they’re doing more than just syndicating PR fluff.)

According to the Washington Post take on the story (AP, NYTimes, McClatchy, others launch NewsRight online rights clearinghouse), Newsright appears to build a tracking system based around the NewsRegistry [FAQ] launched by AP a couple of years or so ago (here’s a report from the time (July 2009): AP takes action on copyright breaches with new tracking system; The NewsRegistry was based around the hNews microformat, I think? (Ref: No Need for Violence in Microformat War Between hNews, rNews)). Certainly, if you go to the NewsRegistry site today you get quickly led to Newsright.

So why’s this interesting in an online education sense? Tracking.

Having released open educational resources onto the web, folk are now getting worried about impact (i.e. one of those forms of return on investment you can use when there isn’t obviously a direct return on the bottom line). I don’t really have much clue about what impact is or is supposed to be, or how it is supposed to be measured (nor, I think, does anyone else), but tracking seems to be one of the gut reaction responses. Which is why the approach taken by Newsright may be of interest to folk wanting to track direct reuse of things like open educational resources.

(For a summary of approaches and technical solutions that have been explored to date, see the JISC/CETIS wiki: Tracking OERs: Technical Approaches to Usage Monitoring for UKOER, the RAPTOR e-resource log analysis toolkit, and JISC’s latest favourite toy in the area, the Learning Registry [press release; I haven’t really paid much attention to this, so don’t really know what it’s all about. Based on this early review (JISC Learning Registry Node Experiment), it’s a database that will aggregate metadata and usage and tracking data (the Learning Registry folk call that “paradata”, I think? I guess you get way more budget for coining a neologism than you do an acronym?!) that other people have figured out how to collect (“The Learning Registry itself is not a search engine, a repository, or a registry in the conventional sense. Instead the project aims to produce a core transport network infrastructure and will rely on the community to develop their own discovery tools and services, such as search engines, community portals, recommender systems, on top of this infrastructure. Dan commented: ‘We assume some smart people will do some interesting (and unanticipated) things with the timeline data stream.’ The Learning Registry infrastructure is built on couchDb, a noSQL style ‘document oriented database’ providing a RESTful JSON API.”)]).

And finally… This post started with a look at the ways of policing copyright of text in an online setting. We all know that the current copyright laws aren’t particularly suited to the digital context, so it’s perhaps refreshing that they’re under review. In the UK, the Intellectual Property Office are currently running a consultation around copyright (Consultation on proposals to change the UK’s copyright system). The consultation was launched on December 14th, 2011, and runs until March 21st, 2012, so you still have plenty of time to respond;-) A good starting point may well be Extending Copyright Exceptions for Educational Use [PDF].

Several Million Up for Grabs in JISC ‘Course Data’ Call. On the Other Hand…

I notice that there’s a couple of days left for institutions to get £10k from JISC in order to look at what it would take to start publishing course data via XCRI feeds, with another £40-80k each for up to 80 institutions to do something about it (JISC Grant Funding 8/11: JISC ‘Course Data: Making the most of Course Information’ Capital Programme – Call for Letters of Commitment; see also Immediate Impressions on JISC’s “Course Data: Making the most of Course Information” Funding Call, as well as the associated comments):

As funding for higher education is reduced and the cost to individuals rises, we see a move towards a consumer-led market for education and increased student expectations. One of the key themes about delivering a better student experience discussed in the recent Whitepaper mentions improving the information available to prospective students.

Nowadays, information about a college or university is more likely found via a laptop than in a prospectus. In this competitive climate publicising courses while embracing new technologies is ever more important for institutions.

JISC have made it easier for prospective students to decide which course to study by creating an internationally recognised data standard for course information, known as XCRI-CAP. This will make transferring and advertising information about courses between institutions and organisations, more efficient and effective.

The focus of this new programme is to enable institutions to publish electronic prospectus information in a standard format for all types of courses, especially online, distance, part time, post graduate and continuing professional development. This standard data could then be shared with many different aggregator agencies (such as UCAS, the National Learning Directory, 14-19 Prospectus websites, or new services yet to be developed) to collect and share with prospective students.

All well and good, but:

– there still won’t be a single, centralised directory of UK courses, the sort of thing that can be used to scaffold other services. I know it isn’t perfect, but UCAS has some sort of directory of UK undergrad HE courses that can be applied for via central clearing, but it’s not available as open data.

– the universities are being offered £10k each to explore how they can start to make more of their course data. There seems to be the expectation that some good will follow, and aggregation services will flower around this data (“This standard data could then be shared with many different aggregator agencies (such as … new services yet to be developed)”). I think they might too. (For example, we’re already starting to see sites like Which University? provide shiny front ends to HESA and NSS data.) But why should these aggregation sites have to wait for the universities to scope out, plan, commission, delay and then maybe or maybe not deliver open XCRI feeds? (Hmm, I wonder: does the JISC money place any requirements on universities making their XCRI-CAP feeds available under an open license that allows commercial reuse?)

When we cobbled together the Course Detective search engine, we exploited Google’s index of UK HE websites to provide a customised search over the course prospectus webpages on those sites. Being a Google Custom Search Engine, there’s only so much we can do with it, but whilst we wait for all the UK HEIs to get round to publishing course marketing feeds, it’s a start.

Of course, if we had our own index, we could offer a more refined search service, with all sorts of potential enhancements and enrichment. Which is where copyright kicks in…

…because course catalogue webpages are generally copyright the host institution, and not published under an open license that allows for commercial reuse.

(I’m not sure how the law stands against general indexing for web search purposes vs indexing only a limited domain (such as course catalogue pages on UK HEI websites) vs scraping pages from a limited domain (such as course catalogue pages on UK HEI websites) in order to create a structured search engine over UK HE course pages. But I suspect the latter two cases breach copyright in ways that are harder to argue your way out of than a “we index everything we can find, regardless” web search engine. (I’m not sure how domain limited Google CSEs figure either? Or folk who run searches with the site: limit?))

To kickstart the “so what could we do with a UK wide aggregation of course data?”, I wonder whether UK HEIs who are going to pick up the £10k from JISC’s table might also consider doing the following:

– licensing their course catalogue web pages with an open, commercial license (no one really understands what non-commercial means…and the aim would be to build sustainable services that help people find courses in a fair (open algorithmic) way that they might want to take…)

– publishing a sitemap/site feed that makes it clear where the course catalogue content lives (as a starter for 10, we have the Course Detective CSE definition file [XML]). That way, the sites could retain some element of control over which parts of the site good citizen scrapers could crawl. (I guess a robots.txt file might also be used to express this sort of policy?)
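To make the robots.txt idea a little more concrete, here is a minimal Python sketch of how a good citizen scraper might check such a policy before crawling. The robots.txt content, the example.ac.uk URLs and the "course-discovery-bot" user agent name are all hypothetical; the only real piece is the standard library's urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a university site: the course catalogue is
# open to crawlers, everything else is off limits.
ROBOTS_TXT = """\
User-agent: *
Allow: /courses/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_scrape(url: str, agent: str = "course-discovery-bot") -> bool:
    """Return True if the (hypothetical) policy lets this agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_scrape("http://www.example.ac.uk/courses/physics-bsc"))  # True
print(may_scrape("http://www.example.ac.uk/staff/intranet"))       # False
```

Of course, robots.txt only expresses where crawlers are welcome; it says nothing about what they are licensed to do with the content, which is why the licensing point above still matters.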

The license would allow third parties to start scraping and indexing course catalogue content, develop normalised forms of that data, and start working on discovery services around that data. A major aim of such sites would presumably be to support course discovery by potential students and their families, and ultimately drive traffic back to the university websites, or on to the UCAS website. Such sites, once established, would also provide a natural sink for XCRI-CAP feeds as and when they are published (although I suspect JISC would also like to be able to run a pilot project looking at developing an aggregator service around XCRI-CAP feeds as well;-) In addition, the sites might well identify additional – pragmatic – requirements on other sorts of data that might contribute to intermediary course discovery and course comparison sites.

It’s already looking as if the KIS – Key Information Set – data that will supposedly support course choice won’t be as open as it might otherwise be (e.g. Immediate Thoughts on the “Provision of information about higher education”); it would be a shame if the universities themselves also sought to limit the discoverability of their courses via cross-sector course discovery sites…

Drafting a Bid Proposal – Comments?

[Note that I might treat this post a bit like a wiki page… Note to self: sort out a personal wiki]

Call is JISC OER3 – here’s the starter for ten (comments appreciated, both positive and negative; letters of support/expressions of interest welcome; comments relating to possible content/themes, declarations of interest in taking the course, etc etc also welcome, though I will be soliciting these more specifically at some point)

Rapid Resource Discovery and Development via Open Production Pair Teaching (ReDOPT) seeks to draft a set of openly licensed resources for potential (re)use in courses in two different institutions through the real-time production and delivery of an open online short-course in the area of data handling and visualisation. This approach subverts the more traditional technique of developing materials for a course and then retrospectively making them open, by creating the materials in public and under an open license, so that they are immediately available for “study” as well as open web discovery, and then bringing them back into the closed setting for (re)use. The course will be promoted to the data journalism and open data communities as a free “MOOC” (Massive Open Online Course)/P2PU style course, with a view to establishing an immediate direct use by a practitioner community. The project will proceed as follows: over a 10-12 week period, the core project team will use a variant of the Pair Teaching approach to develop and publish an informal open, online course hosted on an .ac.uk domain via a set of narrative linked resources (each one about the length of a blog post and representing 10 minutes to 1 hour of learner activity) mapping out the project team’s own learning journey through the topic area. The course scope will be guided by a skeletal curriculum determined in advance from a review of current literature, informal interviews/questionnaires and perceived skills and knowledge gaps in the area. The created resources will contain openly licensed custom written/bespoke material, embedded third party content (audio, video, graphical, data), and selected links to relevant third party material. A public custom search engine in the topic area will also be curated during the course.
Additional resources created by course participants (some of whom may themselves be part of the project team) will be integrated into the core course and added to the custom search engine by the project team. Part-time, hourly paid staff will be funded to contribute additional resources into the evolving course. Because of the timescales involved, this proposal is limited to the production of the draft materials, and does not extend as far as the reuse/first formal use case. Success metrics will therefore be limited to volume and reach of resources produced, community engagement with the live production of the materials, and the extent to which project team members intend to directly reuse the materials produced as a result.

Immediate Impressions on JISC’s “Course Data: Making the most of Course Information” Funding Call

Notes on the JISC Grant Funding Call 8/11: “Course Data: Making the most of Course Information” Capital Programme – Call for Letters of Commitment

This post builds on quick commentaries around other reports in the area of Higher Education course data: Immediate Thoughts on the “Provision of information about higher education” and Getting Access to University Course Code Data (or not… (yet…)). It doesn’t necessarily represent my own opinions, let alone those of my employer.

1. The Joint Information Systems Committee (JISC) and the Higher Education Funding Council for England (HEFCE) invite English Universities and FE colleges (teaching over 400 HE FTEs) to become involved in a new programme of work which will help prepare the sector for increasing demands on course data.

3. Funding is available for projects starting from Monday 12 September 2011 for an initial period of approximately three months. Projects selected to go forward into Stage 2 will continue for an additional 12 to 15 months. All projects must be complete by 29 March 2013.

So how does this fit with the timeline for HEFCE Key Information Set (KIS) development if the called for work is relevant to that? (Note: HEFCE makes available much of the monies disbursed by JISC, and HEFCE is managing the KIS work directly.)

As soon as possible and not later than the end of September 2011 – Technical guidance published by HEFCE
January to March 2012 – Submission system open for KISs to be published in September 2012: Institutions submit their data to…
June to early July 2012 – 2012 NSS and DLHE data available to HEFCE
July to August 2012 – HEFCE merges data submitted by institutions with 2012 NSS and DLHE data. Institutions quality check and sign off their final…
September 2012 – KISs available for institutions to upload. All KISs to be accessible via institutional web-sites by the end of the month

[HEFCE: Provision of information about higher education]

So given the timings, the JISC second phase work looks as if it is supporting processes relating to, and publication of, different sorts of data to the KIS data, although phase 1 work may be relevant to KIS releases?

10. There are 3 main drivers for making it easier for people to find and compare courses:
– prospective fee paying students want to know more about the academic experience a course will provide and be able to compare this with other courses;
– better informed students are more likely to choose a course that they will complete, and be more motivated to achieve better results;
– increased scrutiny by quality assurance agencies and the Government’s requirement for transparency of publicly funded bodies.

11. JISC have made it easier for prospective students to decide which course to study by creating an internationally recognised data standard for course information, known as XCRI-CAP which is conformant with the new European standard for Advertising Learning Opportunities. This will make transferring and advertising information about courses between institutions and organisations more efficient and effective. Placing this data at a consistent COOL URI makes it easier to find.

So there are two end-user groups in mind for the course related information: prospective students, and the scrutineers. XCRI-CAP relates to the publication of information describing at a high level the subject content of a course, rather than the sorts of “metadata” around courses that the KIS will provide. If we were building a course comparison website, the XCRI-CAP data might provide course descriptions relating to a course, whereas the KIS data would provide student satisfaction ratings, teaching hours, assessment strategies, graduate employment rates and salaries. Pricing related information might be common to both sets?

KIS example

KIS example 2
Example of what the KIS display might look like.

Within the university website, developers will be required to identify which course a page relates to, and then call in the appropriate KIS widget from HEFCE or its agent, presumably by passing parameters relating to: institution identifier; course identifier.

In order to display both XCRI-CAP style data and KIS data on the same third party site web page, the third party will need to be able to identify the course identifier and the university identifier. It will also need a way of identifying which course codes are offered by each institution. In order to satisfy requests from potential applicants searching for a particular topic anywhere in the country*, the third party would ideally have access to an index (or at least a comprehensive list either of courses for each institution, or of institutions by course) that allows it to identify and return the set of (institution, course) pairs for which the course satisfies the search term. (Alternatively, for every request, the third party could query every university separately for related courses, aggregate these responses, and then annotate each result with a link to the corresponding KIS information, or its widget.) If the aggregator was to offer a service whereby potential applicants could rank each result according to one or more KIS data elements, it would need to associate the KIS data relating to each course with the corresponding (institution, course) pair, and then use this aggregated data set to present the result to the end user. Again, this could be achieved by making separate requests to the KIS information server, once for each (institution, course) pair; or it could draw on its own index of this data if the information was openly licensed.

* when thinking about course selection, I often have four scenarios in mind: a) I know what course I want to do and where I want to do it; b) I know where I want to go but don’t know what course to do; c) I know what course I want to do, but don’t know where to do it; d) I don’t know what course to do or where to do it…
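The sort of index described above, one that maps a search term onto a set of (institution, course) pairs, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the institution identifiers, course codes and titles are invented, and matching on whole words in a course title is a deliberately naive stand-in for whatever a real aggregator would do.

```python
from collections import defaultdict

# Made-up course records of the kind an aggregator might collect.
COURSES = [
    {"institution": "HEI-001", "course": "F300", "title": "Physics BSc"},
    {"institution": "HEI-002", "course": "F300", "title": "Physics with Astronomy"},
    {"institution": "HEI-002", "course": "G100", "title": "Mathematics BSc"},
]


def build_index(courses):
    """Map each lower-cased title word to the set of (institution, course) pairs."""
    index = defaultdict(set)
    for record in courses:
        pair = (record["institution"], record["course"])
        for word in record["title"].lower().split():
            index[word].add(pair)
    return index


def search(index, term):
    """Return the sorted (institution, course) pairs whose title contains the term."""
    return sorted(index.get(term.lower(), set()))


index = build_index(COURSES)
print(search(index, "physics"))
```

Each pair returned could then be used to look up, or link to, the corresponding KIS information.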

Just by the by, I wonder if the intention of the HEFCE technical working group is to come up with a structured machine readable standard for communicating the KIS information via the KIS widget? That is, will content represented via the KIS widget be marked up in semantic form, or will semantics at the data representation level have to be reverse engineered from the presentation of the information? Where the KIS renders graphical elements, will the charts be generated directly from data transported to the widget, or will the provision simply be flat image files? (Charts displayed in widgets can come in three flavours: 1) as a flat image file with an arbitrary URL (e.g. kisDataImage4.png) (note that data underlying the graph may be described in surrounding metadata, such as within img tag attributes); 2) as an image file generated from data contained within the URL (e.g. as in the mechanism used by the Google Charts API); 3) through the enhancement of data contained within the page (for example, in a Javascript data structure or an HTML table).)
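To illustrate why the third flavour is the friendliest to third parties, here is a minimal Python sketch that recovers the underlying data from a chart backed by an HTML table, using only the standard library's html.parser. The widget markup and the figures in it are invented; nothing here reflects how the actual KIS widget is, or will be, built.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Made-up widget markup: the figures are illustrative, not real KIS data.
WIDGET_HTML = """
<table>
  <tr><th>Measure</th><th>Value</th></tr>
  <tr><td>Student satisfaction</td><td>85%</td></tr>
  <tr><td>Teaching hours per week</td><td>12</td></tr>
</table>
"""

extractor = TableExtractor()
extractor.feed(WIDGET_HTML)
header, *records = extractor.rows
print(header)
print(records)
```

With a flat image file (the first flavour) there is nothing comparable to parse, and the numbers would have to be read off the picture.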

The KIS data only partially overlaps with the XCRI-CAP data, so I wonder: to what extent will it be possible to JOIN the two data sets (that is, how will we be able to link XCRI-CAP and KIS data? Via HEI+coursecode keys, presumably?)
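As a rough sketch of what such a JOIN might look like, the Python below links two toy data sets on an (institution, course code) key and then ranks the result on one KIS-style measure. The keys, field names and values are all invented for illustration; they are not taken from the XCRI-CAP or KIS specifications.

```python
# Made-up records keyed by (institution, course code); none of these values
# or field names come from the real XCRI-CAP or KIS specifications.
xcri_cap = {
    ("HEI-001", "F300"): {"title": "Physics BSc", "description": "..."},
    ("HEI-002", "F300"): {"title": "Physics with Astronomy", "description": "..."},
    ("HEI-002", "G100"): {"title": "Mathematics BSc", "description": "..."},
}
kis = {
    ("HEI-001", "F300"): {"satisfaction": 82, "teaching_hours": 14},
    ("HEI-002", "F300"): {"satisfaction": 90, "teaching_hours": 11},
}


def join_on_course_key(descriptions, metrics):
    """Inner join: keep only courses that appear in both data sets."""
    return {
        key: {**descriptions[key], **metrics[key]}
        for key in descriptions.keys() & metrics.keys()
    }


def rank_by(joined, field):
    """Order the joined records by one KIS-style field, highest first."""
    return sorted(joined.items(), key=lambda item: item[1][field], reverse=True)


joined = join_on_course_key(xcri_cap, kis)
for (institution, course), record in rank_by(joined, "satisfaction"):
    print(institution, course, record["title"], record["satisfaction"])
```

The inner join silently drops the course that has a description but no KIS record, which is exactly the partial overlap problem: a real service would need to decide whether to show such courses without their KIS data.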

12. The proposed programme will support the sector to prepare for the increasing demand for course information, and increase the availability of high-quality, accurate information about part-time, online and distance learning opportunities offered by UK HEIs by:

– funding institutions to make the process and technical innovations necessary to release a structured, machine-readable feed of their course-related information, and;
– creating a proof-of-concept aggregator and discovery service to bring together this course information and enable prospective students to search it.

So – what I think the JISC are suggesting is that they are looking to fund work on the “wider information set” of information around courses? That JISC are also looking to create a “proof-of-concept aggregator and discovery service to bring together this course information and enable prospective students to search it” sounds interesting. I wonder how this would sit in the context of:

  1. UCAS (which currently concentrates course listings as a basis for a single point of application for entry (how will entry work for the private universities? Cf. also the OU, which has only just started to make use of the UCAS entry route, and which also supports a significant direct entry route onto modules?))
  2. third party services such as ???Hotcourses
  3. custom search engines such as CourseDetective, which search over online course prospectuses (and which cost approx. 2 volunteered FTE days to put together at a hackday…;-)

It’s also worth bearing in mind that my TSO OpenUp competition entry also suggested the opening up of course code scaffolding data so that third parties could start to create aggregated and enriched datasets around courses, as well as building services on top of that data that would potentially be revenue generating and commercially sustainable…

Just on the topic of “wider information sets”, here’s what the HEFCE KIS consultation report had to say on the matter:

The wider information set
32. Higher education providers already publish a wide range of information about their institution and the courses they deliver. The information published has been considered by QAA in the context of institutional audit (for publicly funded higher education institutions and those privately funded providers that subscribe to QAA) or of Integrated Quality and Enhancement Review (for further education colleges (FECs) offering HE courses) and is subject to a ‘comment’ in that context. The consultation proposed that institutions should make this information more public-facing, noting that published information would, in due course, be subject to a judgement in QAA review processes.

33. It was proposed that this wider information set has two purposes: to provide information about higher education to a wide variety of audiences, including prospective and current students, students’ parents and advisers, employers, the media and the institution itself; and to form part of the evidence used in QAA audit and review.

34. The required information set was presented in the consultation document as a minimum requirement, with institutions continuing to publish as much other information as they wished. Institutions were asked to consider whether any of the information could be presented in more accessible ways.

Information about aspects of course/awards (not available in the KIS):

Information to be provided:
– prospectuses, programme guides, module descriptors or similar programme specifications
– results of internal student surveys
– links with employers, where employers have input into a course or programme (this could be quite a high-level statement)
– partnership agreements, links with awarding bodies/delivery partners

Level of information: Course/programme level

Availability: All apart from results of internal surveys to be publicly available. Results of internal surveys should be available internally.

[HEFCE: Provision of information about higher education]

If there is such pent-up demand for aggregated course discovery services, then they should also be able to run as commercial services? One thing that I would argue currently limits innovation in this area is access to a comprehensive qualification catalogue across the UK. UCAS do have this data, and they do sell it. But I want to play with it and see if I can build a service round it, rather than deciding to quit my job, raise finance, buy the data from UCAS and then see if I can make a go of building a commercial service around the data. UCAS would still benefit from traffic driven to the UCAS site for course registrations. (But then, if aggregators were also aggregating information about courses in the private sector that supported direct entry and did not require central applications and clearing, aggregators might also start recommending courses outside the scope of UCAS…? Hmmm… Because the private universities would probably provide a commercial incentive to drive traffic to them in the form of affiliate fees based on registrations resulting from referrals… Hmmm… This is all starting to put me in mind of things like FOTA, Formula One and the FIA…!)

Another route to a comprehensive course catalogue is through indexing catalogue feeds (akin to website sitemap feeds that detail all the pages on a website to make it easy for search engines to index them) published directly by the universities, such as XCRI-CAP feeds…
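By way of illustration, here is a minimal sketch of how a third party might pull course page URLs out of a sitemap style feed. The sample feed, the example.ac.uk URLs and the "/courses/" path convention are all made up for the purposes of the sketch; the only thing taken from a real standard is the sitemaps.org XML namespace. An XCRI-CAP feed would need its own element names, which I haven't tried to reproduce here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny sitemap such as a university might publish
# alongside its online prospectus (the URLs are invented).
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.ac.uk/courses/physics-bsc</loc></url>
  <url><loc>http://www.example.ac.uk/courses/history-ba</loc></url>
  <url><loc>http://www.example.ac.uk/about/contact</loc></url>
</urlset>"""

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def course_urls(sitemap_xml, path_marker="/courses/"):
    """Return the URLs in a sitemap whose path contains path_marker.

    path_marker is an assumption about how a site lays out its course
    pages; a real prospectus may use a different convention.
    """
    root = ET.fromstring(sitemap_xml)
    locs = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in locs if path_marker in url]


if __name__ == "__main__":
    for url in course_urls(SAMPLE_SITEMAP):
        print(url)
```

An aggregator would then fetch and index each of the listed pages, which is exactly the step a published feed makes cheap.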

13. The availability of useable course data feeds, and the demonstration of the proof-of-concept aggregator, is intended to provide a catalyst to the feeds being used within existing aggregators, catalogues or information, advice and guidance services, or to form the basis of new services.

I’m not sure an incentive is required.. just open access to the data, free in the first instance. (And if companies do start to make money from it, then license fees can kick in. I don’t think people would have a problem with that…)

15. Between September 2011 and March 2013, JISC intends to fund projects that help institutions review and adapt their internal processes to permit easier access to their course data to meet the needs of various stakeholders. As a minimum, and to provide a clear focus for this overarching activity, the programme will concentrate on the implementation of an XCRI-CAP standard system-generated feed. The programme will be staged to ensure maximum benefit is achieved.

If this data is already exposed via online course prospectuses, a developer with data scraper in hand could probably get a large chunk of this data anyway over the next three to six months. (The CourseDetective CSE definition file already provides a basis for anyone wanting to spider university course catalogues… Hmmm… maybe that’s a good reason for me to get to grips with Lucene…? Ideally, course prospectuses would also produce a sitemap (or XCRI) feed providing URLs for all the course pages currently published via the online prospectus to make it easy for third parties to index, or harvest, this data. The provision of semantic markup in a page, whether through RDFa, microformats, microdata or metadata would also simplify the scraping (i.e. machine parseability) of the course pages. At the very least, using template based, sensibly structured presentation markup that enforces markup conventions that suggest de facto semantics makes pages reliably scrapeable and provides one way of supporting the harvesting of data (if license conditions allow…)) Because, of course, a major reason why potentially commercial services don’t just scrape the data to build course comparison sites relates to the licensing/copyright restrictions that may exist, deliberately or by default, over the university prospectus data that is published online… (Not everyone’s a pirate;-)
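To make the point about template based markup concrete, here is a rough sketch of scraping a course page that uses consistent class names. The page fragment and the class names (course-title, course-code, course-duration) are invented for illustration; a real prospectus template would have its own conventions, and licence conditions would still need checking before harvesting anything.

```python
from html.parser import HTMLParser

# Hypothetical course page fragment; the class names are illustrative only.
SAMPLE_PAGE = """
<div class="course">
  <h1 class="course-title">BSc Physics</h1>
  <span class="course-code">F300</span>
  <span class="course-duration">3 years full time</span>
</div>
"""


class CourseScraper(HTMLParser):
    """Collect the text of elements whose class names follow a known template."""

    # Assumed mapping from template class names to output field names.
    FIELDS = {
        "course-title": "title",
        "course-code": "code",
        "course-duration": "duration",
    }

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self._current = self.FIELDS[cls]

    def handle_data(self, data):
        # Store the first non-empty text found inside a recognised element.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        # Leaving an element: stop capturing, even if it was empty.
        self._current = None


def scrape_course(html):
    parser = CourseScraper()
    parser.feed(html)
    parser.close()
    return parser.record


if __name__ == "__main__":
    print(scrape_course(SAMPLE_PAGE))
```

The scraper only works because every page built from the same template puts the same fact in the same place, which is the de facto semantics argument in a nutshell.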

16. In Stage 1, institutions will review the maturity of their management of course data using the XCRI Self Assessment Framework. This could cover the full course data life cycle, but must include a particular focus on prospectus and course advertising information. Based on the outcomes of this review, institutions will produce an implementation plan for how they will improve processes to, as a minimum, create a system-generated course advertising feed in a XCRI CAP 1.2 format with a COOL URI.

Ah, ha….

So I wonder, would JISC indemnify a third party looking to scrape, aggregate, and republish this data in a standard form via an open API and a permissive license, against actions taken against them by UCAS and the universities for breach of copyright?! I also wonder whether JISC will be providing guidance about what license conditions they expect XCRI-CAP data to be published under? Or is that out of scope?

19. The anticipated outcomes from this programme of work are:
– There will be increased usage of appropriate technology to streamline course data processes leading to:
— More standardised, and therefore comparable, course information in a consistent location making discovery easier.
— Improved quality and therefore more efficient and effective course data.
— Increased ease in finding and comparing courses, especially types of courses that are currently hard to find, such as ones delivered by distance learning.

– Institutions are able to make appropriate and informed decisions about their processes for managing course-related data, leading to a reduced administrative data burden, cost-effective working, and better business intelligence.

Ah… this is actually different to getting the data out there, then, in a way that third parties can use it? It’s more about tweaking systems and processes inside the institution to support the provisioning of data in ways that make it more accessible to third party aggregators? The course aggregator is then a red herring – it’s just there to provide a reference/candidate client/consumer against which the released data can be targeted.

25. There will be a support and synthesis project that will be working with projects from the start of the programme to help them shape their implementation plans in Stage 1 and other outputs in Stage 2 that are of most use to the sector. Projects are expected to engage with the support and synthesis project and to be proactive in sharing outputs throughout the project. This information will be synthesised and shared with the sector; where that information is sensitive, it will be shared in an aggregated, anonymised form.

A “support and synthesis project” within JISC presumably, (i.e. run by the usual suspects)? Rather than sponsoring and indemnifying the open data community on the one hand, or encouraging potential startups on the other, to start building user facing (potential student) services, along with the necessary business model that will make them sustainable, and maybe even profitable?

26. Funding is provided to enable institutions to carry out project work, but also to release key staff to prepare for, take part in and follow up on these programme-level activities. Projects should allow at least 5 person-days in Stage 1 and 10 person-days in Stage 2.

Such is the price of funding HE based developer activities. 5 days project work: £10k. 10 days project work: £40k-80k. So now you know…

Current JISC Projects of Possible Interest to LAK11 Attendees

Mulling over an excellent couple of days in Banff at the first Learning Analytics and Knowledge conference (LAK11; @dougclow’s liveblog notes), where we heard about a whole host of data and analytics related activities from around the world, I thought it may be worth pulling together descriptions of several current JISC projects that are exploring related issues to add in to the mix…

There are currently at least three programmes that it seems to me are in the general area…


Activity Data

Many systems in institutions store data about the actions of students, teachers and researchers. The purpose of this programme is to experiment with this data with the aim of improving the user experience or the administration of services.

AEIOU – Aberystwyth University – this project will gather usage statistics from the repositories of all Higher Education Institutions in Wales and use this data to present searchers who discover a paper from a Welsh repository with recommendations for other relevant papers that they may be interested in. All of this data will be gathered into a research gateway for Wales.

Agtivity – University of Manchester – this project will collect usage data from people using the Advanced Video Conferencing services supported by the Access Grid Support Centre. This data will be used to evaluate usage more accurately, in terms of the time the service is used, audience sizes and environmental impact, and will be used to drive an overall improvement in Advanced Video Conferencing meetings through more targeted support by the Access Grid Support Centre staff of potentially failing nodes and meetings.

Exposing VLE Data – University of Cambridge – a project that will bring together activity and attention data for Cambridge’s institutional virtual learning environment (based on the Sakai software) to create useful and informative management reporting including powerful visualisations. These reports will enable the exploration of improvements to both the VLE software and to the institutional support services around it, including how new information can inform university valuation of VLEs and strategy in this area. The project will also release anonymised datasets for use in research by others.

Library Impact Data – Huddersfield University – the aim of this project is to prove a statistically significant correlation between library usage and student attainment. The project will collect anonymised data from University of Bradford, De Montfort University, University of Exeter, University of Lincoln, Liverpool John Moores University, University of Salford, Teesside University as well as Huddersfield. By identifying subject areas or courses which exhibit low usage of library resources, service improvements can be targeted. Those subject areas or courses which exhibit high usage of library resources can be used as models of good practice.

RISE – Open University – As a distance-learning institution, students, researchers and academics at the Open University mainly access the rich collection of library resources electronically. Although the systems used track attention data this data isn’t used to help users search. RISE aims to exploit the unique scale of the OU (with over 100,000 annual unique users of e-resources) by using attention data recorded by EZProxy to provide recommendations to users of the EBSCO Discovery search solution. RISE will then aim to release that data openly so it can be used by the community.

Salt – University of Manchester – SALT will experiment with 10 years of library circulation data from the John Rylands University Library to support humanities research by making underused “long tail” materials easier to find by library users. The project will also develop an api to enable others to reuse the circulation data and will explore the possibility of offering the api as a national shared service.

Shared OpenURL Data – EDINA – This is an invited proposal by JISC which takes forward the recommendations made in scoping activity related to collection and use of OpenURL data that might be available from institutional OpenURL resolvers and the national OpenURL router shared service which was funded between December 2008 – April 2009 by JISC. The work will be done in two stages: an initial stage exploring the steps required to make the data available openly, followed by making the data available and implementation of prototype service(s) using the data.

STAR-Trak – Leeds Metropolitan University – This project will provide an application (STAR-Trak:NG) to highlight and manage interventions with students who are at risk of dropping out, identified primarily by mining student activity data held in corporate systems.

UCIAD – Open University – UCIAD will investigate the use of semantic technologies for integrating user activity data from different systems within a University. The objective is to scope and prototype an open, pluggable software framework based on such semantic models, aggregating logs and other traces from different systems as a way to produce a comprehensive and meaningful overview of the interactions between individual users and a university.

See also:

The JISC RAPTOR project is investigating ways to explore usage of e-resources.

PIRUS is a project investigating the extension of Counter statistics to cover article level usage of electronic journals.

The Journal Usage Statistics Portal is a project that is developing a usage statistics portal for libraries to manage statistics about electronic journal usage.

The Using OpenURL activity data project will take forward the recommendations of the Shared OpenURL Data Infrastructure Investigation to further explore the value and viability of releasing OpenURL activity data for use by third parties as a means of supporting development of innovative functionality that serves the UK HE community.

The major influences on the Activity Data programme have been the JISC Mosaic project (final report) and the Gaining Intelligence event (final report).


Business Intelligence

The Business Intelligence Programme is funded by JISC’s Organisational Support committee in line with its aim to work with managers to enhance the strategic management of institutions and has funded projects to further explore the issues encountered within institutions when trying to progress BI. (See also JISC’s recently commissioned study into the information needs of senior managers and current attitudes towards and plans for BI.)

Enabling Benchmarking Excellence – Durham University – This project proposes to gather a set of metadata from Higher Education institutions that will allow the current structures within national data sets to be mapped to department structures within each institution. The eventual aim is to make comparative analysis far more flexible and useful to all stakeholders within the HE community. This is the first instance where such a comprehensive use of meta-data to tie together disparate functional organisations has been utilised within the sector, making the project truly innovative.

BIRD – Business Intelligence Reporting Dashboard – UCLAN – Using the JISC InfoNet BI Resource for guidance, this project will work with key stakeholders to re-define the processes that deliver the evidence base to the right users at the right time and will subsequently develop the BI system using Microsoft SharePoint to deliver the user interface (linked to appropriate data sets through the data warehouse). We will use this interface to simplify the process for requesting data/analysis and will provide personalisation facilities to enable individuals to create an interface that provides the data most appropriate to their needs.

Bolt-CAP – University of Bolton – Using the requirements of HEFCE TRAC as the base model, the JISC Business Intelligence Infokit and an Enterprise Architecture approach, this project will consider the means by which effective data capture, accumulation, release and reuse can both meet the needs of decision support within the organisation and that of external agencies.

Bringing Corporate Data to Life – University of East London – The aim of the project is to make use of the significant advances in software tools that utilise in-memory technologies for the rapid development of three business intelligence applications (Student Lifecycle, Corporate Performance and Benchmarking). Information in each application will be presented using a range of fully interactive dashboards, scorecards and charts with filtering, search and drill-down and drill-up capabilities. Managers will be engaged throughout the project in terms of how information is presented, the design of dashboards, scorecards and reports and the identification of additional sources of data.

Business Intelligence for Learning About Our Students – University of Sheffield – The goal of this project is to develop a methodology which will allow the analysis of the data in an aggregate way, by integrating information in different archives and enabling users to query the resulting archive knowledge base from a single point of access. Moreover we aim to integrate the internal information with publicly available data on socio-economic indicators as provided by data.gov.uk. Our aims are to study, on a large scale, how student backgrounds impact their future academic achievements and to help the University devise evidence informed policies, strategies and procedures targeted to their students.

Engage – Using Data about Research Clusters to Enhance Collaboration – University of Glasgow – The Engage project will integrate, visualise and automate the production of information about research clusters at the University of Glasgow, thereby improving access to this data in support of strategic decision making, publicity, enhancing collaboration and interdisciplinary research, and research data reporting.

IN-GRiD – University of Manchester, Manchester Business School – The project addresses the process of collection, management and analysis of building profile data, building usage data, energy consumption data, room booking data, IT data and the corresponding financial data in order to improve the financial and environmental decision making processes of the University of Manchester through the use of business intelligence. The main motivation for the project is to support decision making activities of the senior management of the University of Manchester in the area of sustainability and carbon emissions management.

Liverpool University Management Information System (LUMIS) – Liverpool University – The University has identified a need for improved Management Information to support performance measurement and inform decision making. MI is currently produced and delivered by a variety of methods including standalone systems and spreadsheets. … The objectives of LUMIS are to design and implement an MI solution, combining technology with data integrity, business process improvement and change management to create a range of benefits.

RETAIN: Retaining Students Through Intelligent Interventions – Open University – The focus will be on using BI to improve student retention at the Open University. RETAIN will make it possible to: include additional data sources with existing statistical methods; use predictive modelling to identify ‘at risk’ students.

Supporting institutional decision making with an intelligent student engagement tracking system – University of Bedfordshire – This project aims to demonstrate how the adoption of a student engagement tracking system (intelligent engagement) can support and enhance institutional decision making with evidence in three business intelligence (BI) data subject categories: student data and information, performance measurement and management, and strategic planning.

Visualisation of Research Strength (VoRS) – University of Huddersfield – Many HEIs now maintain repositories containing their researchers’ publications. They have the potential to provide much information about the research strength of an HEI, as publications are the main output of research. The project aims to merge internal information extracted from an institution’s publications repository with external information (academic subject definitions, quality of outlets and publications), for input to a visualisation tool. The tool will assist research managers in making decisions which need to be based on an understanding of research strengths across subject areas, such as where to aim internal investment. In the event that the tool becomes a part of a BI resource, it could lead to institution vs institution comparisons and visual benchmarking for research.


Infrastructure for Resource Discovery

(IMHO, if resource recommendation can be improved by the application of “learning analytics”, we’ll be needing metadata used to describe those resources as well as the activity data generated around their use…)

In 2009 JISC and RLUK convened a group of Higher Education library, museum and archive experts to think about what national services were required for supporting online discovery and reuse of collection metadata. This group was called the resource discovery taskforce (RDTF) and … produced a vision and an implementation plan focused on making metadata about collections openly available therefore supporting the development of flexible and innovative services for end users. … This programme of projects has been funded to begin to address the challenges that need to be overcome at the institutional level to realise the RDTF vision. The projects are focused on making metadata about library, museum and archive collections openly available using standards and licensing that allows that data to be reused.

Comet – Cambridge University – The COMET project will release a large sub-set of bibliographic data from Cambridge University Library catalogues as open structured metadata, testing a number of technologies and methodologies including XML, RDF, SPARQL and JSON. It will investigate and document the availability of metadata for the library’s collections which can be released openly in machine-readable formats and the barriers which prevent other data from being exposed in this way. [Estimated amount of data to be made available: 2,200,000 metadata records]

Connecting repositories – Open University – The CORE project aims to make it easier to navigate between relevant scientific papers stored in Open Access repositories. The project will use Linked Data format to describe the relationships between papers stored across a selection of UK repositories, including the Open University Open Research Online (ORO). A resource discovery web-service and a demonstrator client will be provided to allow UK repositories to embed this new tool into their own repository. [Estimated amount of data to be made available: Content of 20 repositories, 50,000 papers, 1,000,000 rdf triples]

Contextual Wrappers – Cambridge University – The project is concerned with the effectiveness of resource discovery based on metadata relating to the Designated collections at the Fitzwilliam Museum in the University of Cambridge and made available through the Culture Grid, an aggregation service for museums, libraries and archives metadata. The project will investigate whether Culture Grid interface and API can be enhanced to allow researchers to explore hierarchical relationships between collections and the browsing of object records within a collection [Estimated amount of data to be made available: 164,000 object records (including 1,000 new/enhanced records), 74,800 of them with thumbnail images for improved resource discovery]

Discovering Babel – Oxford University – The digital literary and linguistic resources in the Oxford Text Archive and in the British National Corpus have been available to researchers throughout the world for several decades. This project will focus on technical enhancements to the resource discovery infrastructure that will allow wider dissemination of open metadata, will facilitate interaction with research infrastructures and the knowledge and expertise achieved will be shared with the community. [Estimated amount of data to be made available: 2,000 literary and linguistic resources in electronic form]

Jerome – University of Lincoln – Jerome began in the summer of 2010, as an informal ‘un-project’, with the aim of radically integrating data available to the University of Lincoln’s library services and offering a uniquely personalised service to staff and students through the use of new APIs, open data and machine learning. This project will develop a sustainable, institutional service for open bibliographic metadata, complemented with well documented APIs and an ‘intelligent’, personalised interface for library users. [Estimated amount of data to be made available: ~250,000 bibliographic record library catalogue, along with constantly expanding data about our available journals and their contents augmented by the Journal TOCs API, and c.3,000 additional records from our EPrints repository]

Open Metadata Pathfinder – King’s College London – The Open Metadata Pathfinder project will deliver a demonstrator of the effectiveness of opening up archival catalogues to widened automated linking and discovery through embedding RDFa metadata in Archives in the M25 area (AIM25) collection level catalogue descriptions. It will also implement as part of the AIM25 system the automated publishing of the system’s high quality authority metadata as open datasets. The project will include an assessment of the effectiveness of automated semantic data extraction through natural language processing tools (using GATE) and measure the effectiveness of the approach through statistical analysis and review by key stakeholders (users and archivists).

Salda – Sussex University – The project will extract the metadata records for the Mass Observation Archive from the University of Sussex Special Collection’s Archival Management System (CALM) and convert them in to Linked Data that will be made publicly available. [Estimated amount of data to be made available: This project will concentrate on the largest archival collection held within the Library, the Mass Observation Archive, potentially creating up to 23,000 Linked Data records.]

OpenArt – York University – OpenART, a partnership between the University of York, the Tate and technical partners, Acuity Unlimited, will design and expose linked open data for an important research dataset entitled “The London Art World 1660-1735”. Drawing on metadata about artists, places and sales from a defined period of art history scholarship, the dataset offers a complete picture of the London art world during the late 17th and early 18th centuries. Furthermore, links drawn to the Tate collection and the incorporation of collection metadata will allow exploration of works in their contemporary locations. The process will be designed to be scalable to much richer and more varied datasets, both at York, Tate and beyond.

See also:
Linked Open Copac Archives Hub
Linking University Content for Education and Research Online

I need to find a way of representing the topic areas and interconnections between these projects somehow!

See also this list of projects in the above programmes [JSON] which may be a useful starting point if you need a list of project IDs. I think the short name attribute can be used to identify the project description HTML page name at the end of an appropriate programme path?
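If the short name guess is right, turning that list into a set of project page URLs might look something like the sketch below. Everything specific here is an assumption on my part: the field names (id, short_name, programme), the sample records, the example.org base URL and the ".html" suffix are placeholders rather than anything taken from the actual JSON file, so check them against the real data before relying on it.

```python
import json

# Hypothetical sample of the project list; field names and values are invented.
SAMPLE_PROJECTS_JSON = """[
  {"id": "p1", "short_name": "rise", "programme": "activitydata"},
  {"id": "p2", "short_name": "retain", "programme": "businessintelligence"},
  {"id": "p3", "programme": "activitydata"}
]"""

# Placeholder base path, not the real JISC site structure.
BASE_URL = "http://www.example.org/programmes"


def project_page_urls(projects_json, base_url=BASE_URL):
    """Map project IDs to guessed description page URLs.

    Projects without an id, short name or programme are skipped,
    since no page name can be built for them.
    """
    urls = {}
    for project in json.loads(projects_json):
        project_id = project.get("id")
        short_name = project.get("short_name")
        programme = project.get("programme")
        if project_id and short_name and programme:
            urls[project_id] = f"{base_url}/{programme}/{short_name}.html"
    return urls


if __name__ == "__main__":
    for project_id, url in project_page_urls(SAMPLE_PROJECTS_JSON).items():
        print(project_id, url)
```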

Project Pitching: JISC Elevator Concept

A month or two ago I submitted an (unsuccessful:-( application to a joint Mozilla/Shuttleworth Foundation call for an open education fellowship (though I believe it will call again, and accept resubmissions, later this year). The application process was via the Mozilla Drumbeat Platform, which encourages people to submit and join projects related to the idea of the open web:

Mozilla drumbeat

As support grows around a project, it can get the attention of not only other people (maybe developers, maybe participants, maybe users) but also might attract the interest of Mozilla Foundation – and with it, funding…

At the JISC Innovation Forum today, a similar approach was pitched for attracting small amounts of JISC rapid innovation funding: the JISC Elevator (mock-up).

JISC Elevator - http://elevator.triplegeek.com/

Unlike the traditional JISC project application route, where proposals are submitted in response to project calls, the idea that seems to be being proposed here is to cheaply capture ideas for proposals that require only a small amount of funding (£5k-£30k) – simply provide a short elevator pitch and a pitching video:

JISC Elevator

Low cost, easy submission is essential for this sort of pitch: if it costs, in practical terms, £2k-£3k of internally costed funds to make the pitch, that can be 60% of the amount pitched for (£3k of costs against a £5k bid, say)…

One question that does come to mind is how this money might get spent? If you have an immediate development need, requiring 1-2 weeks of developer effort, for example, to prove some concept or other, and in return develop a working, if not necessarily production ready, service, where will that developer effort come from? Although I never really understand why, a few days’ work is the sort of work it can be really hard to schedule… So maybe we need slack capacity in the system, capacity that can work on short, itch scratching or blue skies doodles (not projects, doodles: always lightweight, small pieces lightly joined style applications or services), and pitch next step ideas around those doodles to elevator?

That said – I can think of several little projects that would benefit from a small amount of funding…;-)

The Elevator idea is very much up for discussion, and very much in the spirit of the idea, capturing feedback on Uservoice… There’s also a poll to capture more immediate feedback – find it here: Is JISC Elevator a good idea? poll.

What Makes a Good API? A Call to Arms…

One of the sessions I attended at last year’s CETIS get together was the UKOLN organised Technological Innovation in a World of Web APIs session (see also My CETIS 2008 Presentations and What Makes A Good API? Doing The Research Using Twitter).

This session formed part of a project being co-ordinated by UKOLN’s homeworking Marieke Guy, the JISC “Good APIs” project (project blog), which is well worth getting involved with because it might just help shape the future of JISC’s requirements when they go about funding projects…

(So if you like SOAP and think REST is for wimps, keep quiet and let projects that do go for APIs continue to get away with proposing overblown, unfriendly, overengineered ones…;-)

So how can you get involved? By taking this survey, for one thing:

The ‘Good APIs’ project aims to provide JISC and the sector with information and advice on best practice which should be adopted when developing and consuming APIs.

In order to collate information the project team have written a very brief research survey asking you about your use of APIs (both providing and consuming).

TAKE THE What makes a good API? SURVEY.

I don’t know if the project will have a presence at the JISC “Developer Happiness” Days (the schedule is still being put together) but it’d be good if Marieke or Brian were there on one of the days (at least) to pitch in some of the requirements of a good API that they’ve identified to date;-)

PS here’s another fun looking event – Newcastle Maker Faire.