Slides From UKSG…

Earlier this week, I spent three very enjoyable days in Edinburgh at UKSG (tag: uksg10), the UK Serials Group conference, which brings together librarians and vendors of journal subscriptions and discovery services.

Walking round the exhibition, a couple of things jumped out at me. Firstly, a lot of the search/discovery interfaces that various vendors were pushing are still not really doing much on the relevancy/results ranking front or with personal recommendations (for some of my previous thoughts on this, see OPAC Ground Truth…, and for some ideas about new ranking factors, see JISC MOSAIC Competition Entries – Imaginings Around the Use of Library Loans Data). Everyone was happy to show me their advanced search interface forms, though… (which gets a personal yawn from me: if I want to use an advanced search, I’ll usually drop a limit tag into the search box, or hack the URL. For advanced searching, I guess I prefer the command line to a form!;-)

The second thing that jumped out at me was the lack of technical knowledge on the part of the vendors and some of the buyers. “Is there an API for that?” is not, apparently, polite conversation in such circles…

As ever, the conversation is what makes an event, and over the next few days I’ll try to pop up a few notes about some of the conversations I found myself in. But for now, here are the slides I used in my (rather rushed) presentation…

As ever, I guess you had to be there…

Infoskills for the Future – If You Can’t Handle Information, Get Out of the Library

On Wednesday, transport willing, I’ll be giving a short presentation at an East of England Information Services Group event:

Whilst preparing the slides, I listened in to Martin Bean’s opening keynote from JISC2010, and was interested to hear what he had to say about libraries:

That is, folk are gonna need help with sensemaking around information and with identifying trusted [trustable?] content.

I had intended to put together a talk about the challenges faced by the OU library, as I see them, as it starts to offer a comprehensive digital library service for our students; but the VC’s talk got me thinking again about some of the issues I touched on in my Arcadia brown bag lunch talk about the skills training gap that I think is building up around digital tools:

Anyway, here’s a preview of my slides for Wednesday (subject, as ever, to change…;-)

(UPDATE, post presentation: a couple of folk commented on the slide aesthetic – it’s inspired in part by the Presentation Zen approach (blog), in part by Lawrence Lessig. As far as the Digital Economy Bill/Act goes, here’s a summary. And for Doctorow on book ownership, listen here.)

The content diverges somewhat from the title (oops!) but I feel the need to have another crack at exploring what exactly are the skills I think we’re failing to articulate…

As ever, it’s rich in images that don’t make a lot of sense without my commentary. I also toyed with the idea of embedding a few audio and video clips in the presentation, but as time is tight, I think I’ll probably have to omit those on the day:-(

One of the clips I had thought of using was Martin Bean’s quote embedded above. Another was from a recent TEDxNYED talk by Jeff Jarvis (via @ajcann) in which he talks about the move educators – like journalists – may have to make towards a curatorial role.

For libraries, too, there is a need to consider the new curatorial role of the library (e.g. as recently observed by Lorcan Dempsey: Lam-inating libraries…). But maybe more important is the help that librarians can give to academics and researchers who are building their own collections, and wanting to curate their own “exhibitions”?

(Just by the by, I’ve started putting together the images I use in my presentations in flickr galleries. In part, this means I have ready access to the sources of images I’ve used before if I want to use them again… I’m also toying with the idea of trying to annotate the images in the gallery with “presenter notes” or “presentation design notes” as a way of capturing some of the things I was thinking about/looking for as I was selecting the images. If I was doing an Art GCSE, I guess this would correspond to my notebook…)

One of the reasons I considered adding the audio clips to the presentation was because they were to hand and I heard things in them potentially relevant to, and reusable in, the presentation I was preparing. (The use of the clips would also slow the presentation down a little – something I’m looking for strategies to help me with. Fewer slides may help here, of course…!;-) To make (re)use of them, I wired the headphone out to the audio in on my laptop, and played through the relevant parts of the original videos whilst capturing the audio track using Audacity. A little bit of editing in that environment cropped the audio clip to just the bit I needed, and also allowed me to tidy it up a little (removing ums and ahs, for example). For hosting purposes, I’ve used Audioboo. I’m not sure this is really in the spirit of Audioboo, but again, it was a pragmatic choice;-) Now, I haven’t received any training in this (as any audiophile will probably be able to tell you!) but it got the job sort of done…
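For anyone who would rather script the cropping step than do it by hand in Audacity, here’s a minimal sketch using Python’s standard-library wave module. It only covers the crop (not the um-and-ah tidying), it assumes the captured audio has already been exported as an uncompressed WAV file, and the function name and file paths are hypothetical.

```python
import wave

def crop_wav(src_path, dst_path, start_s, end_s):
    """Copy the [start_s, end_s) section (in seconds) of a WAV file to a new file.

    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        start = int(start_s * rate)
        # Clamp the end point so we never read past the end of the recording.
        end = min(int(end_s * rate), src.getnframes())
        if not 0 <= start < end:
            raise ValueError("empty or invalid crop range")
        src.setpos(start)                      # jump to the first frame we want
        frames = src.readframes(end - start)   # raw bytes for the kept section
        with wave.open(dst_path, "wb") as dst:
            # Keep the same channel count, sample width and sample rate.
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return end - start
```

For example, `crop_wav("keynote_capture.wav", "clip.wav", 83.5, 121.0)` would keep just under 38 seconds of a (hypothetical) capture.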

So, is that a skill the library can – or should – help me with, if required? In the OU’s case, I think getting help with that sort of activity would fall under the Digilab remit.

At the very least, is/could/should it be the role of the library to help me develop effective strategies for discovering audio content (notwithstanding what the VC had to say about moving on from search and discovery)? Discovering audio content from OER repositories, maybe?

And what about help or advice on producing visualisations, such as a visualisation of volcanic ash data from official advisory notes? Would that count too? (That was a request typical of the ones I receive on a weekly basis from various parts of the OU…) I captured my hacked attempt at working through that problem in the post Steps Towards a Volcanic Ash Advisory Google Maps Mashup Using Met Office Data, which also includes several bad assumptions I made in the original version of the post, so maybe I should pick through that to identify some of the skills involved?
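One of the skills involved is getting coordinates into a format a mapping app will accept. As a rough sketch of that step (not the approach used in the post above), the Python below turns a list of latitude/longitude points into a KML polygon that Google Earth or Google Maps can load. It assumes the corner points of an ash area have already been extracted from an advisory; parsing the Met Office text is not shown, and the function name and sample points are hypothetical.

```python
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"

def ash_polygon_kml(name, points):
    """Return a KML document containing a single polygon Placemark.

    points: sequence of (lat, lon) pairs in decimal degrees.
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # KML LinearRings must be closed
    # KML expects lon,lat[,altitude] order, which is easy to get backwards.
    coords = " ".join(f"{lon},{lat},0" for lat, lon in ring)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<kml xmlns="{KML_NS}"><Document><Placemark>'
        f"<name>{escape(name)}</name>"
        "<Polygon><outerBoundaryIs><LinearRing>"
        f"<coordinates>{coords}</coordinates>"
        "</LinearRing></outerBoundaryIs></Polygon>"
        "</Placemark></Document></kml>"
    )
```

The lat/lon versus lon,lat ordering is exactly the sort of “bad assumption” that is easy to make, so it’s worth checking the first plotted shape against the source.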

Web Lego And Format Glue, aka Get Yer Mashup On

Another week, another presentation… I dunno about death by Powerpoint on the audience side, but even though I’ve started finding ways of reusing slides, it still takes me forever and a day (well, 4-6 hrs) to put a slide deck together… One day – one day – I’ll have to produce a presentation I can just give over and over again… ;-)

Anyway, here are slides for a presentation I’m due to give tomorrow (Thursday) at the University of Portsmouth. The plan is for a 1 hr “lecture”, and a 1 hr hands-on workshop session. The slides are for the talk – but also set the scene for the practical activity…

So what’s the practical? (For anyone reading this in advance of attending the session, I suggest you get yourself sorted with accounts for Google/Google Spreadsheets, Yahoo/Yahoo Pipes and IBM/Many Eyes and Many Eyes Wikified.) As time is tight, I suggest the best way in is to just try recreating some of the demos shown in the presentation above, and then go from there… A good alternative would be to start working through this intro to Yahoo Pipes:

For the more adventurously minded, looking through the pipework category on this blog might provide a little more inspiration…

If it’s Google Spreadsheets hacks you’re after, searching for Google spreadsheet import formula should turn up some example posts…

Many Eyes and Many Eyes Wikified demos can be found by searching for “Many Eyes”.

For treating Google Spreadsheets as a database, here’s the Guardian Datastore Explorer, and here’s half a how-to about using it.

NB there are actually two ways of using a Google Spreadsheet as a database: from a third-party page via a query API, and within a spreadsheet using a =QUERY() formula.
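To make the first of those routes a bit more concrete, here’s a minimal Python sketch that builds a query URL for a shared spreadsheet. The `/gviz/tq` endpoint layout and the `tqx=out:csv` option are assumptions based on the current Google Sheets URL scheme (the older spreadsheets.google.com URLs looked different), the sheet key is a placeholder, and the function name is hypothetical.

```python
from urllib.parse import urlencode

def gviz_query_url(sheet_key, query, out="csv"):
    """Build a Visualization API query URL for a shared Google Spreadsheet.

    sheet_key: the spreadsheet's document key (placeholder in the examples).
    query: a statement in the Visualization API query language,
           e.g. "select A, C where C > 1000 order by C desc".
    out: response format requested via the tqx parameter (assumed: csv/json/html).
    """
    base = f"https://docs.google.com/spreadsheets/d/{sheet_key}/gviz/tq"
    # urlencode takes care of escaping spaces, quotes, > and so on.
    return base + "?" + urlencode({"tqx": f"out:{out}", "tq": query})
```

The second route uses the same query language inside the sheet itself, along the lines of `=QUERY(A1:D100, "select A, C where C > 1000")`. For the URL route to work from a third-party page, the sheet needs to be shared so it can be read without logging in.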

jQuery you should be able to find, but here are a handful of JavaScript visualisation libraries that you might also like to try out at some point…;-) To find my list of Flash visualisation libraries, look at the URL and use your initiative…

I was hoping to put together a couple of rather better structured self-paced workshop examples, but I’m afraid I’ve run out of time for today…:-(

Imaginings Around Emerging Infoskills for Digital Librarians

Earlier this week, I had the privilege of leading a couple of sessions in the OU Library for Library staff on exploring the emerging digital skills terrain, and the extent to which it could, or should, fall to a new wave of Digital Librarians to support related service delivery and skills development.

The first session focused on supporting digital and networked researchers, the second on the sort of practical infoskills that I rely on on a day-to-day basis that I get the feeling aren’t really being provided or developed as much as they should be…

Both sessions were structured in a similar way – I rambled on for too long with some background/scene setting talk, and then provided a set of “challenges” or discussion topics for the four groups of five or so people to argue over for 30-40 minutes. A final plenary provided an opportunity for each table to report back on the outcome of their discussion.

The morning session opened with a too long presentation on the Social’n’Digital Researcher:

before opening up the following series of Social Researcher Challenges:

The afternoon session opened with a quick review of some of the things that I learned how to do over the last couple of months – The Digital Librarian:

before setting the following Digital Infoskills Challenges:

(OU Library folk – I will post some solutions to the Library wiki at some point, I promise ;-)

It was the first time I’d run this sort of event, and I learned a lot from doing so, so if I ever get the chance again, here are some of the things I’d do differently:

1) talk less at the start;
2) do an icebreaker activity right at the start to set the scene for the scene setting presentation, with a view to: a) trying to find out some common assumptions about what folk are doing there and what they expect to get out of the session; b) starting to (re)shape their expectations about what the event is about;
3) provide a short slide to be displayed throughout the activity to remind folk of what the challenges are designed to achieve;
4) be clearer about what the point of doing the challenges is/what they are designed to achieve from the very start…!;-)
5) have takeaway/follow up training material prepared in advance for anyone motivated enough to pick up and run with it immediately…

I’ll try to do another post summarising a bit more of what the outcomes of the sessions were at a later time – but for now, I just wanted to get the posts archived here, along with notes-to-self about how to do it differently next time… if there is a next time…!

PS Richard Nurse has done a write-up of the sessions here: ‘Digital Librarians’

PPS I think this whole area is something that needs exploring, so if any other libraries would like to brave this sort of session, feel free to get in touch…

PPPS a couple of people expressed interest in Yahoo Pipes – there’s a Get Started tutorial here: Pipes Book – Imaginings

My Slides from the Data Driven Journalism Round Table (ddj)

Yesterday, I was fortunate enough to attend a Data Driven Journalism round table (sort of!) event organised by the European Journalism Centre.

Here are the slides I used in my talk, such as they are… I really do need to annotate them with links, but in the meantime, if you want to track any of the examples down, the best way is probably just to search this blog ;-)

(Readers might also be interested in my slides from News:Rewired (although they make even less sense without notes!))

Although most of the slides will be familiar to longtime readers of this blog, there is one new thing in there: the first sketch of a network diagram showing how some of my favourite online apps can work together based on the online file formats they either publish or consume (the idea being that once you can get a file into the network somewhere, you can route it to other places/apps in the network…)

The graph showing how a handful of web apps connect together was generated using Graphviz, with the graph defined as follows:

digraph webApps {
// A -> B: app A publishes format B, or format A can be consumed by app B
GoogleSpreadsheet -> CSV;
GoogleSpreadsheet -> "<GoogleGadgets>";
GoogleSpreadsheet -> "[GoogleVizDataAPI]";
"[GoogleVizDataAPI]" -> JSON;
CSV -> GoogleSpreadsheet;
YahooPipes -> CSV;
YahooPipes -> JSON;
CSV -> YahooPipes;
JSON -> YahooPipes;
XML -> YahooPipes;
"[YQL]" -> JSON;
"[YQL]" -> XML;
CSV -> "[YQL]";
XML -> "[YQL]";
CSV -> "<ManyEyesWikified>";
YahooPipes -> KML;
KML -> "<GoogleEarth>";
KML -> "<GoogleMaps>";
"<GoogleMaps>" -> KML;
RDFTripleStore -> "[SPARQL]";
"[SPARQL]" -> RDF;
"[SPARQL]" -> XML;
"[SPARQL]" -> CSV;
"[SPARQL]" -> JSON;
JSON -> "<JQueryCharts_etc>";
}
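To make the “get a file into the network somewhere and route it elsewhere” idea a bit more concrete, here’s a minimal Python sketch that reads the edge list from the graph definition (with straight quotes) and does a breadth-first search to find every app or format reachable from a given starting point. The parser is deliberately naive: it assumes one `A -> B` edge per line, as in the listing, and is not a general DOT parser. The function names are hypothetical.

```python
from collections import deque

EDGES = """
GoogleSpreadsheet -> CSV;
GoogleSpreadsheet -> "<GoogleGadgets>";
GoogleSpreadsheet -> "[GoogleVizDataAPI]";
"[GoogleVizDataAPI]" -> JSON;
CSV -> GoogleSpreadsheet;
YahooPipes -> CSV;
YahooPipes -> JSON;
CSV -> YahooPipes;
JSON -> YahooPipes;
XML -> YahooPipes;
"[YQL]" -> JSON;
"[YQL]" -> XML;
CSV -> "[YQL]";
XML -> "[YQL]";
CSV -> "<ManyEyesWikified>";
YahooPipes -> KML;
KML -> "<GoogleEarth>";
KML -> "<GoogleMaps>";
"<GoogleMaps>" -> KML;
RDFTripleStore -> "[SPARQL]";
"[SPARQL]" -> RDF;
"[SPARQL]" -> XML;
"[SPARQL]" -> CSV;
"[SPARQL]" -> JSON;
JSON -> "<JQueryCharts_etc>";
"""

def parse_edges(text):
    """Turn 'A -> B;' lines into an adjacency dict {node: set(of targets)}."""
    graph = {}
    for line in text.strip().splitlines():
        line = line.strip().rstrip(";")
        if "->" not in line:
            continue
        src, dst = (part.strip().strip('"') for part in line.split("->", 1))
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure sink nodes appear too
    return graph

def reachable(graph, start):
    """Breadth-first search: every node reachable from start (excluding start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen
```

For example, `sorted(reachable(parse_edges(EDGES), "CSV"))` shows that a CSV file can find its way to Google Maps (via Yahoo Pipes and KML) but not into the SPARQL/RDF corner of the graph, which only feeds outwards.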

I had intended to build this up “live” in a text editor using the Graphviz Mac client to display the growing network, but in the end I just showed a static image.

At the time, I’d also forgotten that there is an experimental Graphviz chart generator made available via the Google Charts API, so here’s the graph generated via a URL using the Google Charts API:

Graphviz chart via Google Charts API

Here’s the chart playground view:

Google Charts API Live chart playground
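For reference, here’s a minimal Python sketch of how such a chart URL might be assembled. The endpoint and the `cht=gv:dot` / `chl` parameters are as I remember them from the (since deprecated) Google Chart Tools Graphviz chart, so treat them as assumptions. The main point is simply that the DOT source has to be URL-encoded into the `chl` parameter. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for the (deprecated) Google Chart Tools Graphviz chart.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"

def graphviz_chart_url(dot_source, engine="dot"):
    """Return a chart URL that asks for dot_source to be laid out by engine."""
    params = {"cht": f"gv:{engine}", "chl": dot_source}
    # urlencode escapes the braces, arrows, quotes and spaces in the DOT text.
    return CHART_ENDPOINT + "?" + urlencode(params)
```

Usage: `graphviz_chart_url('digraph { CSV -> YahooPipes; YahooPipes -> KML; }')`. Long graphs make for long URLs, which is one reason a desktop Graphviz install is handier for anything bigger than a sketch.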

PS If the topics covered by the #ddj event appealed to you, you might also be interested in the P2PU Open Journalism on the Open Web course, the “syllabus” of which is being arranged at the moment (and which includes at least one week on data journalism) and which will run over 6 weeks, err, sometime; and the Web of Data Data Journalism Meetup in Berlin on September 1st.

Scholarly Communication in the Networked Age

Last week, I was fortunate enough to receive an invitation to attend the Texts and Literacy in the Digital Age: Assessing the future of scholarly communication event at the Dutch National Library in Den Haag (a trip that ended up turning into a weekend break in Amsterdam when my flight was cancelled…)

The presentation can be found here and embedded below, if your feed reader supports it:

One thing I have tried to do is annotate each slide with a short piece of discursive text relating to the slide. I need to find a way of linearising slide shows prepared this way to see if I can find a way of generating blog posts from them, which is a task for next year…
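As a first stab at that linearising step, here’s a minimal Python sketch. It assumes the slide titles and their discursive notes have already been pulled out of the slide deck into a list of (title, notes) pairs; getting them out of the presentation file is not shown, and the function name is hypothetical. It simply emits each slide as a numbered heading followed by its notes as HTML paragraphs, ready to paste into a blog post.

```python
from html import escape

def slides_to_post(slides):
    """Linearise (title, notes) pairs into a simple HTML blog post body."""
    parts = []
    for number, (title, notes) in enumerate(slides, start=1):
        parts.append(f"<h3>{number}. {escape(title)}</h3>")
        # Treat blank lines in the notes as paragraph breaks.
        for para in notes.split("\n\n"):
            if para.strip():
                parts.append(f"<p>{escape(para.strip())}</p>")
    return "\n".join(parts)
```

Keeping the slide number in each heading makes it easy to line the post up against the embedded deck.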

The presentation draws heavily on Martin Belam’s news:rewired presentation from 2009 (The tyranny of chronology), as I try to tease out some of the structural issues that face the presentation of news media in an online networked age, and contrast (or complement) them with issues faced by scholarly publishing.

One of the things I hope to mull over more next year, and maybe communicate in a more principled way rather than via occasional blog posts and tweets, is how news media and academia can work together to put the news into some sort of deeper context, and maybe even into a learning (resource) context…

Google Apps as a Mashup Environment – Slides from #guug11

FWIW, here are the slides from my presentation on “Mashing Up Google Apps” at the excellent Google Apps UK User Group (#guug11), as hosted by Martin Hamilton at Loughborough University yesterday.

The “mashup environment” diagram was generated using a desktop version of Graphviz, but it can also be generated using the Google Chart Tools Graphviz chart, as in the example below:

google apps mashup environment

Here’s the “source code” for that image:

digraph googApps {

GoogleSpreadsheet [shape=Msquare]
GoogleCalendar [shape=Msquare]
GoogleMail [shape=Msquare]
GoogleDocs [shape=Msquare]
CSV [shape=diamond]
JSON [shape=diamond]
HTML [shape=diamond]
XML [shape=diamond]
GoogleAppsScript [shape=diamond]
"[GoogleVizDataAPI]" [shape=diamond]
"<GoogleForm>" [shape=doubleoctagon]
"<GoogleGadgets>" [shape=doubleoctagon]
"<GoogleVizDataCharts>" [shape=doubleoctagon]
"<GoogleMaps>" [shape=doubleoctagon]

CSV->URL
HTML->URL
XML->URL
event->GoogleAppsScript
GoogleAppsScript->"<GoogleMaps>"
GoogleAppsScript->GoogleMail
GoogleAppsScript->GoogleCalendar
GoogleAppsScript->GoogleSpreadsheet
GoogleSpreadsheet->GoogleAppsScript
GoogleAppsScript->GoogleDocs
GoogleSpreadsheet->JSON
email->GoogleMail
GoogleMail->email
GoogleDocs->GoogleAppsScript
GoogleCalendar->GoogleAppsScript
"<GoogleForm>"->event
event->GoogleSpreadsheet
time->event
"<GoogleForm>"->GoogleSpreadsheet
URL->GoogleSpreadsheet
GoogleSpreadsheet->"[GoogleVizDataAPI]"
"[GoogleVizDataAPI]"->"<GoogleVizDataCharts>"
GoogleSpreadsheet->"<GoogleGadgets>"
}

And finally, here’s a snapshot of the hashtag community around the event as of mid-morning yesterday:

#guug11 twitter echo chamber

Node colour is related to the total number of followers, and node size reflects betweenness centrality.
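For anyone wondering what betweenness centrality actually measures: roughly, how many of the shortest paths between other pairs of people in the network pass through a given person, so “bridging” accounts score highly. One standard way of computing it is Brandes’ algorithm, sketched below in plain Python for an unweighted, directed graph (e.g. an edge from A to B if A follows B; that edge direction is an assumption). This is an illustration of the measure, not a claim about what the tool used to draw the image above does internally, and the function name is hypothetical.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, directed graphs (unnormalised).

    graph: dict mapping each node to a set of the nodes it points at.
    Returns {node: share of shortest paths between other pairs through it}.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    centrality = dict.fromkeys(nodes, 0.0)

    for source in nodes:
        # Breadth-first search from source, counting shortest paths (sigma)
        # and remembering each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the furthest nodes, accumulating each node's share
        # ("dependency") of the shortest paths that run through it.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality
```

In a simple chain a → b → c, for example, b scores 1 (the only a-to-c path goes through it) while a and c score 0.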

TSO OpenUP Competition – Opening Up UCAS Data

Here’s the presentation I gave to the judging panel at the TSO OpenUp competition final yesterday. As ever, it doesn’t make sense with[out] (doh!) me talking, though I did add some notes in to the Powerpoint deck: Opening up UCAS Course Code Data

(I had hoped Slideshare would be able to use the notes as a transcript, but it doesn’t seem to do that, and I can’t see how to cut and paste the notes in by hand?:-(

A quick summary:

The “Big Idea” behind my entry to the TSO competition was a simple one – make UCAS course data (course code, title and institution) available as data. By opening up the data we make it possible for third parties to construct services and applications based around a complete data skeleton of all the courses offered for undergraduate entry through clearing in a particular year across UK higher education.
The data acts as scaffolding that can be used to develop consumer facing applications across HE (e.g. improved course choice applications) as well as support internal “vertical” activities within HEIs that may also be transferable across HEIs.
Primary value is generated from taking the course code scaffolding and annotating it with related data. Access to this dataset may be sold on in a B2B context via data platform services. Consumer facing applications with their own revenue streams may also be built on top of the data platform.
This idea makes data available that can potentially disrupt the current discovery model for course choice and selection in UK Higher Education (though, in its current form, not university application or enrollment).

Here are the notes I doodled to myself in preparation for the pitch. Now the idea has been picked up, it will need tightening up and may change significantly! ;-) Which is to say – in this form, it is just my original personal opinion on the idea, and all ‘facts’ need checking…

  1. I thought the competition was as much about opening up the data as anything… So the original idea was simply that it would be really handy to have machine readable access to course code and course name information for UK HE courses from UCAS – which is presumably the closest thing we have to a national catalogue of higher education courses.

    But when selected to pitch the idea, it became clear that an application or two were also required, or at least some good business reasons for opening up this data…

    So here we go…

  2. UCAS is the clearing house for applying to university in the UK. It maintains a comprehensive directory of HE courses available in the UK.

    Postgraduate students and Open University students do not go through UCAS. Other direct entry routes to higher education courses may also be available.

    According to UCAS, in 2010, there were 697,351 applicants with 487,329 acceptances, compared with 639,860 applications and 481,854 acceptances in 2009. [ Slightly different figures in end of cycle report 2009/10? ]

    For convenience, hold in mind the thought that course codes could be to course marketing what postcodes are to geo-related applications… They provide a natural identifier that other things can be associated with.

    Associated with each degree course is a course code. UCAS course codes are also associated with JACS codes – Joint Academic Coding System identifiers – that relate to particular topics of study. “The UCAS course codes have no meaning other than “this course is offered by this institution for this application cycle”.” [link]

    “UCAS course code is 4 character reference which can be any combination of letters and numbers.

    Each course is also assigned up to three JACS (Joint Academic Coding System) codes in order to classify the course for *J purposes. The JACS system was introduced for 2002 entry, and replaced UCAS Standard Classification of Academic Subjects (SCAS). Each JACS code consists of a single letter followed by 3 numbers. JACS is divided into subject areas, with a related initial letter for each. JACS codes are allocated to courses for the *J return.

    The JACS system is used by the Higher Education Statistics Agency (HESA), and is the result of a joint UCAS-HESA subject code harmonization project.

    JACS is also used by UK institutions to identify the subject matter of programmes and modules. These institutions include the Department for Innovation, Universities and Skills (DIUS), the Home Office and the Higher Education Funding Council for England (HEFCE).”

    Keywords: up to 10 keywords are allocated to each course from a restricted list of just over 4,500 valid keywords.
    “Main keyword: This is generally a broad subject category, usually expressed as a single word, for example ‘Business’.
    Suggested keyword (SUG): Where a search on a main keyword identifies more than 200 courses, the Course Search user is prompted to select from a set of secondary keywords or phrases. These are the more specific ‘Suggested keywords’ attached to the courses identified. For example, ‘Business Administration’ is one of a range of ‘Suggested keywords’ which could be attached to a Business course (there are more than 60 others to choose from). A course in Business Administration would typically have this as the ‘Suggested keyword’, with ‘Business’ as the main keyword.
    However, if a course only has a ‘Suggested keyword’ and not a related ‘Main keyword’, the course will not be displayed in any search under the ‘Main keyword’ alone.

    Single subject: Main keywords can be ticked as ‘Single subject’. This means that the course will be displayed by a keyword search on the subject, when the user chooses the ‘single subject’ option below. You may have a maximum of two keywords indicated as single subjects per course.”

    “Between January and March 2010, approximately 600,000 unique IP addresses access the UCAS course code search function. During the same time period, almost 5 million unique IP addresses accessed the UCAS subject search function.” [link]

    “New courses from 2012 will be given UCAS codes that should not be used for subject classification purposes. However, all courses will still be assigned up to three individual JACS3 codes based on the subject content of the course.

    An analysis of unique IP address activity on the UCAS Course Search has shown that very few searches are conducted using the course code, compared to the subject search function. UCAS Courses Data Team will be working to improve the subject search and course keywords over the coming year to enable potential applicants to accurately find suitable courses.” [link]

    Course code identifiers have an important role to play within university administrations, for example in marshalling resources around a course, although they are not used by students. (On the other hand, students may have a familiarity with module codes.) Course codes identify courses that are the subject of quality assessment by the QAA. To a certain extent, a complete catalogue of course codes allows third parties to organise offerings based around UK higher education degrees in a comprehensive way and link in to the UCAS application procedure.

  3. If released as open data, and particularly as Linked Open Data, the course data can be used to support:
    – the release of horizontal data across the UK HE sector by HEIs, such as course catalogue information;
    – vertical scaffolding within an institution for elaboration by module codes, which in turn may be associated with module descriptions, reading lists, educational resources, etc.
    – the development across HE of services supporting student choice – for example “compare the uni” type services
  4. At the moment the data is siloed inside UCAS behind a search engine with unfriendly session based URLs and a poor results UI. Whilst it is possible to scrape or crowd-source course code information, such ad hoc collection mechanisms run the danger of being far from complete, which means that bias may be introduced into the collection as a side effect of the collection method.
  5. Making the data available via an API or Linked data store makes it easier for third parties to build course related services of whatever flavour – course comparison sites, badging services, resource recommendation services. The availability of the data also makes it easier for developers within an institution to develop services around course codes that might be directly transferable to, or scaleable across, other institutions.
  6. What happens if the API becomes writeable? An appropriately designed data store, and corresponding ingest routes, might encourage HEIs to start releasing the course data themselves in a more structured way.

    XCRI is JISC’s preferred way of doing this, and I think there has been some lobbying of HEFCE from various JISC projects, but I’m not sure how successful it’s been?

  7. Ultimately, we might be able to aggregate data from locally maintained data stores. Course marketing becomes a feature of the Linked Data cloud.

    Also consider the context of the data reporting burden on HEIs, e.g. reporting to Professional, Statutory and Regulatory Bodies (PSRBs).

    Reconciliation with HESA Institution and campus identifiers, as well as the JISCMU API and Guardian Datablog Rosetta Stone spreadsheet

    By hosting course code data, and using it as scaffolding within a Linked Data cloud around HE courses, a valuable platform service can be made available to HEIs as well as commercial operations designed to support student choice when it comes to selecting an appropriate course and university.

  8. Several recent JISC projects have started to explore the release of course related activity data on the one hand, and Linked Data approaches to enterprise wide data management on the other. What is currently lacking is a national data-centric view over all HEI course offerings. UCAS has that data.

    Opening up the data facilitates rapid innovation projects within HEIs, and makes it possible for innovators within an HEI to make progress on projects that span across course offerings even if they don’t have easy access to that data from their own institution.

  9. Consumer services are also a possibility. As HEIs become more businesslike, treating students as customers, and paying customers at that, we might expect to see the appearance of university course comparison sites.

    CompareTheUni has had a holding page up for months – but will it ever launch? Uni&Books crowd sources module codes and associated reading links. Talis Aspire is a commercial reading list system that associates resources with module codes.

  10. Last year, I pulled together a few separate published datasets and threw them into Google Fusion Tables, then plotted the results. The idea was that you could chart research ratings against student satisfaction, or dropout rates against academic pay. [link ]

    Guardian datablog picked up the post, and I still get traffic from there on a daily basis… [link ]

  11. The JISC MOSAIC Library data challenge saw Huddersfield University open up book loans data associated with course codes – so you could map between courses and books, and vice versa (“People who studied this course borrowed this book”, “this book was referred to by students on this course”)

    One demonstrator I built used a bookmarklet to annotate UCAS course pages with a link to a resource page showing what books had been borrowed by students on that course at Huddersfield University. [Link ]

  12. Enter something like compare the uni, but data driven, and providing aggregated views over data from universities and courses.
  13. To set the scene, the site needs to be designed with a user in mind. I see a 16-17 year old, slouching on the sofa, TV on with the most partial of attention being paid to it, laptop or tablet to hand and the main focus of attention. Facebook chat and a browser are grabbing attention on screen, with occasional distractions from the TV and mobile phone.
  14. The key is course data – this provides a natural set of identifiers that span the full range of clearing based HE course offerings in the UK and allows third parties to build services on this basis.

    The course codes also provide hooks against which it may be possible to deploy mappings across skills frameworks, e.g. SFIA in IT world. The course codes will also have associated JACS subject code mappings and UCAS search terms, which in turn may provide weak links into other domains, such as the world of books using vocabularies such as the Library of Congress Subject headings and Dewey classification codes.

  15. Further down the line, if we can start to associate module codes with course codes, we can start to develop services to support current students, or informal learners, by hooking in educational resources at the module level.
  16. Marketing can go several ways. For the data platform, evangelism into the HE developer community may spark innovation from within HEIs, most likely under the auspices of JISC projects. Platform data services may also be marketed to third party developers and innovators/entrepreneurs.

    Services built on top of the data platform will need to be marketed to the target audience using appropriate channels. Specialist marketers such as Campus Group may be appropriate partners here.

  17. The idea pitched is disruptive in that one of the major competitors, at first, is UCAS. However, if UCAS retains its unique role in university application and clearing, then UCAS will still play an essential, and heavily trafficked, role in undergraduate student applications to university. Course discovery and selection will, however, move away from the UCAS site towards services that better meet the needs of potential applicants. One then might imagine UCAS becoming a B2B service that acts as intermediary between student choice websites and universities, or even entertain a scenario in which UCAS is disintermediated and some other clearing mechanism instituted between universities and potential-student facing course choice portals.
  18. According to UCAS, between January and March 2010 “almost 5 million unique IP addresses accessed the UCAS subject search function” [link]. In each of the last couple of years, annual application/acceptance numbers have been of the order of an intake of approx. 500,000 students per year, from approx. 600,000 applicants. If we reach 10% of applicants and generate £5 per applicant, that’s £300k pa. £10 from 20% of intake, that’s £1M pa. £50 each from 40% is £10M. I haven’t managed to find out what the acquisition cost of a successful applicant is, or the advertising budget allocated to an undergraduate recruitment marketing campaign, but there are 200 or so HE institutions (going by the number of allocated HESA institution codes).

    For a platform business, e.g. a business model based around selling queries on linked/aggregated/mapped datasets: if you imagine a query returning results with several attributes, each result is a row and each attribute is a column. If you allow free access to x thousand query cells returned a day, and then charge for cells above that limit, you:
    – encourage wider innovation around your platform, letting people run narrow queries or broad queries (you could also license use of the data for folk to use in their own datastores, augmented with their own triples);
    – generate revenue that scales on a metered basis according to usage;
    – can offer additional analytics that get your tracking script into third-party web pages, helping train your learning classifiers, which makes the platform more valuable.

    For a consumer facing application – e.g. a course choice site for potential applicants is the easiest to imagine:
    – Short term model would be advertising (e.g. course/uni ads), affiliate fees on book sales for first year books? Second-hand book market, e.g. via Facebook Marketplace?
    – Medium term – affiliate fee for prospectus application/fulfilment
    – Long term – affiliate fee for course registration

  19. At the end of the day, if the data describing all the HE courses available in the UK is available as data, folk will be able to start to build interesting things around it…
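The back-of-an-envelope revenue figures in note 18 are easy to check with a few lines of Python. The populations (600,000 applicants, an intake of 500,000) are the rounded figures used in the note; the percentages and per-head fees are the note’s illustrative guesses, not market data, and the names below are hypothetical.

```python
# Scenarios from note 18: (who, population, percent reached, fee in GBP per head)
SCENARIOS = [
    ("applicants", 600_000, 10, 5),
    ("intake", 500_000, 20, 10),
    ("intake", 500_000, 40, 50),
]

def annual_revenue(population, percent, fee_gbp):
    """Revenue per year if `percent` of `population` each pay `fee_gbp`."""
    # Integer arithmetic keeps the figures exact (no floating-point rounding).
    return population * percent // 100 * fee_gbp

for who, population, percent, fee in SCENARIOS:
    total = annual_revenue(population, percent, fee)
    print(f"£{fee} from {percent}% of {population:,} {who}: £{total:,} pa")
```

Running it prints £300,000, £1,000,000 and £10,000,000 pa respectively, which agrees with the £300k, £1M and £10M figures in the note.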