Category: Stirring

Enter the Market – Course Data

I’m not at Dev8Ed this week, though I probably should be, but here’s what I’d probably have tinkered with had I gone – a recipe for creating a class of XCRI (course marketing data) powered websites to support course choice, built around a variety of themes, that could be used to ruthlessly and shamelessly exploit any and every opportunity for segmenting audiences and fragmenting different parts of the market for highly targeted marketing campaigns. So, for example:

  • let’s start with something easy and obvious, maybe: search for courses from Russell Group (research intensive) universities on a conservatively branded site, with lots of links to research-inspired resources and pre-emptively posted reading lists (with Amazon affiliate codes attached); then bring in a little competition, and set a second site up as the Waitrose to that site’s Sainsbury’s: a course choice site based around the 1994 Group universities (hmmm: it seems like some of the 1994 Group members are deserting and heading off to join the Russell Group?); another site takes the Tesco role for the Million+ group, maybe, with a University Alliance site mopping up the Morrisons(?) traffic. (I have no idea if these uni group-supermarket mappings work? What would similarly tongue-in-cheek broadsheet/tabloid mappings be, I wonder?!) If the creative arts are more your thing, there could be a site for the UKIAD folk, perhaps?
  • there are other ways of segmenting the market, of course. University groupings organise universities from the inside, looking out, but how about groupings based on consumers looking in? With some brand names you know where the barrier is set; others could be good for bottom-of-the-market SEO; others again could help at clearing time (suitable courses could be identified by looking at grade mappings in course data feeds), as could slightly more upmarket variants. And so on.
  • National Student Survey data could also play a part in automatically partitioning universities into different verticals, maybe in support of FTSE-30-like regimes where only courses from universities in the top 30, according to some ranking scheme or other, are included. NSS data could also power rankings, of course. (Hmm… did I start to explore this for Course Detective? I don’t remember… Hmmm…)

The intention would be to find a way of aggregating course data from different universities onto a common platform, and then to explore ways of generating a range of sites, with different branding, and targeted at different markets, using different views over the same aggregated data set but similar mechanics to drive the sites.
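
By way of illustration, here’s a minimal sketch of the “different views over one aggregated data set” idea, picking up the mission-group branding and NSS-vertical examples from the list above. Everything in it is made up for the example – the group memberships, the course records and the field names – and a real version would populate the data from aggregated XCRI-CAP feeds and NSS returns.

#A toy sketch: one aggregated course dataset, several themed "views" over it.
#The group memberships, course records and field names are all illustrative.
RUSSELL_GROUP = ['University of Bristol', 'University of Leeds']  #etc
GROUP_1994 = ['University of Bath', 'University of Essex']  #etc

courses = [
    {'university': 'University of Bristol', 'title': 'BSc Physics', 'nss_satisfaction': 92},
    {'university': 'University of Bath', 'title': 'BA Economics', 'nss_satisfaction': 88},
]

def course_view(courses, universities=None, min_nss=0):
    #Return the slice of the aggregated data that one themed site would display
    view = []
    for course in courses:
        if universities is not None and course['university'] not in universities:
            continue
        if course['nss_satisfaction'] < min_nss:
            continue
        view.append(course)
    return view

#The conservatively branded, research-intensive site...
print course_view(courses, universities=RUSSELL_GROUP)
#...and an NSS "top 30" vertical over exactly the same data
print sorted(courses, key=lambda c: -c['nss_satisfaction'])[:30]

Each themed site would then just be a different skin over a different course_view() call against the same backend.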

PS For a little inspiration about building course comparison websites based around XCRI data, NSS data and KIS data, it may be worth looking at how the NHS does it (another UK institution that’s hurtling towards privatisation…): for example, check out the NHS Choices “hospitals near you” service, or alternatively its “compare GPs” service.

PPS If anyone did start to build out a rash of different course comparison sites on a commercial basis, you can bet that, as well as seeking affiliate fees for things like lead generation (prospectuses downloaded/mailed, open day visits booked (in exchange for some sort of ‘discount’ to the potential student if they actually turn up to the open day), registrations/course applications made, etc.), advertising would play a major role in generating site revenue. If a single operator was running a suite of course choice sites, it would make sense for them to look at how cross-site exploitation of user data could be used to track users across sites and tune offerings for them. I suspect we’d also see the use of paid placement on some sites (putting results at the top of a search results listing based on payment rather than a more quality driven ranking algorithm), recreating some of the confusion of the early days of web search engines.

I suspect there’d also be the opportunity for points-make-prizes competitions, and other giveaways…



[Disclaimer: the opinions posted herein are, of course, barely even my own, let alone those of my employer.]

Cognitive Waste and the Project Funding Bind

As I tweeted earlier today: “A problem with project funding is that you’re expected to know what you’re going to do in advance – rather than discover what can be done..”

This was prompted by reading a JISC ITT (Invitation to Tender) around coursedata: Making the most of course information – xcri-cap feed use demonstrators. Here’s an excerpt from the final call:

JISC is seeking to fund around 6-10 small, rapid innovation projects to create innovative, engaging examples that demonstrate the use of the #coursedata xcri-cap feeds (either directly, or via the JISC Aggregator API). These innovative examples will be shared openly through the JISC web site and events to promote the good practice that has been adopted.
13. The demonstrators could use additional data sources such as geolocation data to provide a mash-up, or may focus on using a single institutional feed to meet a specific need.
14. The demonstrators should include a clear and compelling use case and usage scenario.
15. The range of demonstrators commissioned will cover a number of different approaches and is likely to include examples of:
• an online prospectus, such as a specialist courses directory;
• a mobile app, such as a course finder for a specific geographical area;
• a VLE block or module, such as a moodle block that identifies additional learning opportunities offered by the host institution;
• an information dashboard, such as a course statistics dashboard for managers providing an analysis of the courses your institution offers mashed up with search trends from the institutional website;
• a lightweight service or interface, such as an online study group that finds peers based on course description;
• a widget for a common platform, such as a Google Gadget that identifies online courses, and pushes updates to the users iGoogle page.
16. All demonstrators should be working code and must be available under an open source licence or reusable with full documentation. Project deliverables can build on proprietary components but wherever possible the final deliverables should be open source. If possible, a community-based approach to working with open source code should be taken rather than just making the final deliverables available under an open source licence.
17. The demonstrators should be rapidly developed and be ready to use within 4 months. It is expected most projects would not require more than 30 – 40 chargeable person days.

In addition:

23. Funding will not be allocated to allow a simple continuation of an existing project or activity. The end deliverable must address a specific need that is accepted by the community for which it is intended and produce deliverables within the duration of the project funding.
24. There should be no expectation that future funding will be available to these projects. The grants allocated under this call are allocated on a finite basis. Ideally, the end deliverables should be sustainable in their own right as a result of providing a useful solution into a community of practice.

The call appears to be open to all comers (for example, sole traders) and represents a way of spending money on bootstrapping innovation around course data feeds using HEFCE funding, in a similar way to how Technology Strategy Board money is disbursed (more understandably?) to commercial enterprises, SMEs, and so on. (Although JISC isn’t a legal entity – yet – maybe we’ll start to see JISC trying to find ways in which it can start to act as a vehicle that generates returns from which it can benefit financially, eg as a venture funder, or as a generator of demonstrable financial growth?)

As with many JISC calls, the intention is that something “sustainable” will result:

22. Without formal service level agreements, dependency on third party systems can limit the shelf life of deliverables. For these types of projects, long term sustainability although always desirable, is not an expected outcome. However making the project deliverables available for at least one year after the end of the project is essential so opportunities are realised and lessons can be learned.

24. There should be no expectation that future funding will be available to these projects. The grants allocated under this call are allocated on a finite basis. Ideally, the end deliverables should be sustainable in their own right as a result of providing a useful solution into a community of practice.

All well and good. Having spent a shedload (technical term ;-) on getting institutions to open up their course data, the funders now need some uptake. (That there aren’t more apps around course data to date is partly my fault. The TSO Open Up Competition prize I won secured a certain amount of TSO resource to build something around course code scaffolding data as held by UCAS (my proposal was more to do with seeing this data opened up as enabling data, rather than actually pitching a specific application…). As it turned out, UCAS (a charity operated by the HEIs, I think) were (still are?) too precious over the data to release it as open data for unspecified uses, so the prize went nowhere… Instead, HEFCE spent millions through JISC to get universities to open up course data (albeit probably more comprehensive than the UCAS data)…and now there’s an unspecified amount for startups and businesses to build services around the XCRI data. (Note to self: are UCAS using XCRI as an import format or not? If not, is HEFCE/JISC also paying the HEIs to maintain/develop systems that publish XCRI data as well as systems that publish data in an alternative way to UCAS?)

I think TSO actually did some work aggregating datasets around a, erm, model of the UCAS course data; so if they want a return on that work, they could probably pitch an idea for something they’ve already prepped and try to get HEFCE to pay for it, 9 months on from when I was talking to them at their expense…

Which brings me in part back to my tweet earlier today (“A problem with project funding is that you’re expected to know what you’re going to do in advance – rather than discover what can be done..”), as well as the mantra I was taught way back when I was a research student, that the route to successful research bids was to bid to do work you had already done (in part because then you could propose to deliver what you knew you could already deliver, or could clearly see how to deliver…)

This is fine if you know what you’re pitching to do (essentially, doing something you know how to do), as opposed to setting out to discover what sorts of things might be possible if you set about playing with them. Funders don’t like play, of course, because it smacks of frivolity and undirectedness, even though it may be a deeply focussed and highly goal directed activity, albeit one where the goal emerges during the course of the activity rather than being specified in advance.

As it is, funders tend to fund projects. They tell bidders what they want, and bidders tell funders back how they’ll do it: either something they’ve already done (= guaranteed deliverable, paid for post hoc), or something they *think* they intend to do (couched in project management and risk assessment speak to mask the fact that they don’t really know what’ll happen when they try to execute the plan; but that doesn’t really matter, because at the end of the day they have a plan and a set of deliverables against which they can measure (lack of) progress). In the play world, you generally do or deliver something because that’s the point – you are deeply engaged in and highly focussed on whatever it is that you’re doing (you are typically intrinsically motivated, and maybe also extrinsically motivated by whatever constraints or goals you have adopted as defining the play context/play world). During play, you work hard to play well. And then there’s the project world. In the project world, you deliver or you don’t. So what.

Projects also have overheads associated with them: from preparing, issuing, marking, awarding, tracking and reporting on proposals and funded projects on the funders’ side, to preparing, submitting, and managing the project on the other (aside from actually doing the project work – or at least, writing up what has previously been done in an appropriate way;-).

And then there’s the waste.

Clay Shirky popularised the notion of cognitive surplus to characterise creative (and often collaborative creative) acts done in folks’ free time. Things like Wikipedia. I’d characterise this use of cognitive surplus capacity as a form of play – in part because it’s intrinsically motivated, but also because it is typically based around creative acts.

But what about cognitive waste, such as arises from time spent putting together project proposals that are unfunded and then thrown away (why aren’t these bids, along with the successful ones, made open as a matter of course, particularly when the application is for public money from an applicant funded by public money?). (Or the cognitive waste associated with maintaining a regular blog… erm… oops…)

I’ve seen bids containing literature reviews that rival anything in the (for-fee, paywall-protected, subscription-required, author/institution copyright-waivered) academic press, as well as proposals that could be taken up, maybe in partnership, by SMEs for useful purpose rather than by academic partners for conference papers. And I’ve seen time spent pursuing project processes, milestones and deliverables for the sole reason that they are in the plan, a plan that was defined before the space the project was pitched into had been properly explored through engaging with it, rather than because they continue to make sense (if indeed they ever did). (And yes, I know that the unenlightened project manager who sees more merit in trying to stick to the project plan and original deliverables, rather than pivoting if a far more productive, valuable or useful opportunity reveals itself, is a mythical beast…;-)

Maybe the waste is important. Evolution is by definition a wasteful process, and maybe the route to quality is through a similar sort of process. Maybe the time, thought and effort that goes into unsuccessful bids really is cognitive waste, bad ideas that don’t deserve to be shared (and more than that, shouldn’t be shared because they are dangerously wrong). But then, I’m not sure how that fits with over-subscribed project funding schemes, where even highly rated proposals (that would ordinarily receive funding) are rejected, whereas in an undersubscribed call (maybe because it is mis-positioned or even irrelevant), weak bids (that ordinarily wouldn’t be considered) get funding.

Or maybe cognitive waste arises from a broken system and broken processes, and really is something valuable that is being wasted in the sense of squandered?

Right – rant over, (no) (late)lunchtime over… back to the “work” thing, I guess…

PS via @raycorrigan: “Newton, Galileo, Maxwell, Faraday, Einstein, Bohr, to name but a few; evidence of paradigm shifting power of ‘cognitive waste'” – which is another sense of “waste” I hadn’t considered: waste (as in loss, or loss to an organisation) of good ideas through rejecting or not supporting the development of a particular proposal or idea..?

Mapping the Tesco Corporate Organisational Sprawl – An Initial Sketch

A quick sketch, prompted by Tesco Graph Hunting on OpenCorporates, of how some of Tesco’s various corporate holdings are related based on director appointments and terminations.

The recipe is as follows:

– grab a list of companies that may be associated with “Tesco” by querying the OpenCorporates reconciliation API for tesco
– grab the filings for each of those companies
– trawl through the filings looking for director appointments or terminations
– store a row for each directorial appointment or termination including the company name and the director.

You can find the scraper here: Tesco Sprawl Grapher

import scraperwiki, simplejson, urllib

#Keep the API key private - pass it in via the query string rather than hardcoding it
import os, cgi
qsenv = dict(cgi.parse_qsl(os.getenv("QUERY_STRING", "")))
APIKEY = qsenv.get('APIKEY', '')

#Note - the OpenCorporates API also offers a search: companies/search

#NB several lines were lost from the original listing; the URL patterns and
#record structures below are a plausible reconstruction, not the verbatim original.

def getOCcompanyData(ocid):
    ocurl = 'http://api.opencorporates.com' + ocid + '?api_token=' + APIKEY
    ocdata = simplejson.load(urllib.urlopen(ocurl))
    return ocdata

#Need to find a way of playing nice with the API, and not keep retrawling it

def getOCfilingData(ocid):
    ocurl = 'http://api.opencorporates.com' + ocid + '/filings?api_token=' + APIKEY
    tmpdata = simplejson.load(urllib.urlopen(ocurl))
    ocdata = tmpdata['filings']
    print 'filings', ocid
    #Filings are paged; keep going until we have fetched every page
    while tmpdata['page'] < tmpdata['total_pages']:
        page = str(tmpdata['page'] + 1)
        print '...another page', page, str(tmpdata["total_pages"]), str(tmpdata['page'])
        tmpdata = simplejson.load(urllib.urlopen(ocurl + '&page=' + page))
        ocdata = ocdata + tmpdata['filings']
    return ocdata

def recordDirectorChange(ocname, ocid, ffiling, director):
    #Store one row per directorial appointment or termination
    ddata = {}
    ddata['fid'] = ffiling['id']
    ddata['fdate'] = ffiling['date']
    ddata['fdirector'] = director
    ddata['ocname'] = ocname
    ddata['ocid'] = ocid
    print 'ddata', ddata['fid']
    scraperwiki.sqlite.save(unique_keys=['fid'], table_name='directors', data=ddata)

def logDirectors(ocname, ocid, filings):
    print 'director filings', filings
    for filing in filings:
        desc = filing["filing"]["description"]
        if filing["filing"]["filing_type"]=="Appointment of director" or filing["filing"]["filing_code"]=="AP01":
            director = desc.replace('DIRECTOR APPOINTED ', '')
            recordDirectorChange(ocname, ocid, filing['filing'], director)
        elif filing["filing"]["filing_type"]=="Termination of appointment of director" or filing["filing"]["filing_code"]=="TM01":
            director = desc.replace('APPOINTMENT TERMINATED, DIRECTOR ', '')
            director = director.replace('APPOINTMENT TERMINATED, ', '')
            recordDirectorChange(ocname, ocid, filing['filing'], director)

#Grab candidate "tesco" companies via the reconciliation API, then log director changes
entities = simplejson.load(urllib.urlopen('http://opencorporates.com/reconcile/gb?query=tesco'))
for entity in entities['result']:
    ocid = entity['id']
    ocname = entity['name']
    filings = getOCfilingData(ocid)
    logDirectors(ocname, ocid, filings)

The next step is to graph the result. I used a Scraperwiki view (Tesco sprawl demo graph) to generate a bipartite network connecting directors (either appointed or terminated) with companies and then published the result as a GEXF file that can be loaded directly into Gephi.

import scraperwiki

import networkx as nx

import networkx.readwrite.gexf as gf

from xml.etree.cElementTree import tostring

#NB some lines were lost from the original listing; the graph building code
#below is a plausible reconstruction.

#Attach the scraper's database and pull out the director change records
scraperwiki.sqlite.attach('tesco_sprawl_grapher')
q = '* FROM "directors"'
data = scraperwiki.sqlite.select(q)

DG = nx.DiGraph()
directors = []
companies = []

#Build a bipartite graph connecting directors (whether appointed or
#terminated) with the companies the filings relate to
for row in data:
    if row['fdirector'] not in directors:
        directors.append(row['fdirector'])
        DG.add_node(directors.index(row['fdirector']), label=row['fdirector'])
    if row['ocname'] not in companies:
        companies.append(row['ocname'])
        DG.add_node(row['ocid'], label=row['ocname'])
    DG.add_edge(directors.index(row['fdirector']), row['ocid'])

#Serve the graph as GEXF so it can be loaded straight into Gephi
scraperwiki.utils.httpresponseheader("Content-Type", "text/xml")

writer = gf.GEXFWriter(encoding='utf-8', prettyprint=True, version='1.1draft')
writer.add_graph(DG)

print tostring(writer.xml)

Saving the output of the view as a GEXF file means it can be loaded directly into Gephi. (It would be handy if Gephi could load files in from a URL, methinks?) A version of the graph, laid out using a force directed layout, with nodes coloured according to modularity grouping, suggests some clustering of the companies. Note that parts of the whole graph are disconnected.

In the fragment below, we see the Tesco Property Nominees companies are only loosely linked to each other, and from the previous graphic, we see that Tesco Underwriting doesn’t share any recent director moves with any other companies that I trawled. (That said, the scraper did hit the OpenCorporates API limiter, so there may well be missing edges/data…)

And what is it with accountants naming companies after colours?! (It reminds me of sysadmins naming servers after distilleries and Lord of the Rings characters!) Is there any sense in there, or is it arbitrary?

Tesco Graph Hunting on OpenCorporates

A quick lunchtime post on some thoughts around constructing corporate graphs from OpenCorporates data. To ground it, consider a search for “tesco” run against GB-registered companies via the OpenCorporates reconciliation API.

{"result":[{"id":"/companies/gb/00445790", "name":"TESCO PLC", "type":[{"id":"/organization/organization","name":"Organization"}], "score":78.0, "match":false, "uri":""}, {"id":"/companies/gb/05888959", "name":"TESCO AQUA (FINCO1) LIMITED", "type":[{"id":"/organization/organization", "name":"Organization"}], "score":71.0, "match":false, "uri":""}, { ...

Some or all of these companies may or may not be part of the same corporate group. (That is, there may be companies in that list with Tesco in the name that are not part of the group of companies associated with a major UK supermarket.)
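
For quick experiments, that JSON is easy enough to pull into Python. A minimal sketch follows; the /reconcile/gb endpoint pattern and the 70-point cutoff are my assumptions, while the id, name and score fields are as shown in the response above:

import simplejson, urllib

#Query the OpenCorporates reconciliation API for GB companies matching "tesco"
url = 'http://opencorporates.com/reconcile/gb?query=' + urllib.quote('tesco')
candidates = simplejson.load(urllib.urlopen(url))['result']

#Use the score field as a crude relevance filter (the threshold is arbitrary)
for company in candidates:
    if company['score'] >= 70:
        print company['id'], company['name'], company['score']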

If we treat the companies returned in that list as one class of nodes in a graph, we can start to construct a range of graphs that demonstrate linkage between companies based on a variety of factors. For example, a matching registered address (a post-office-box-mediated address in an offshore tax haven, say) suggests there may be at least a weak tie between companies.

(Alternatively, we might construct bipartite graphs containing company nodes and address nodes, for example, then collapse the graph about common addresses.)
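
To make that concrete, here’s a toy networkx sketch – the company names and addresses are invented – in which a bipartite company/address graph is collapsed (projected) onto a weighted company-company graph, with edge weights counting shared addresses:

import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
#One node class for companies, another for registered addresses (toy data)
B.add_nodes_from(['TESCO EXAMPLE CO 1', 'TESCO EXAMPLE CO 2', 'UNRELATED CO'], bipartite=0)
B.add_nodes_from(['PO Box 123, George Town, Cayman Islands', '1 High Street, Anytown'], bipartite=1)
B.add_edges_from([('TESCO EXAMPLE CO 1', 'PO Box 123, George Town, Cayman Islands'),
                  ('TESCO EXAMPLE CO 2', 'PO Box 123, George Town, Cayman Islands'),
                  ('UNRELATED CO', '1 High Street, Anytown')])

#Collapse about common addresses: companies sharing an address get an edge,
#weighted by the number of addresses they share
companies = [n for n, d in B.nodes(data=True) if d['bipartite'] == 0]
G = bipartite.weighted_projected_graph(B, companies)
for u, v, d in G.edges(data=True):
    print u, v, d['weight']
#Expect a single edge linking the two TESCO EXAMPLE companies, with weight 1

The same sort of edge weights could then feed the weighted layouts discussed below.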

Shared directors would be another source of linkage, although at the moment, I don’t think OpenCorporates publishes directors associated with UK companies (I suspect that data is still commercially licensed?). However, there is associated information available in the OpenCorporates database already…. For example, if we look at the various company filings, we can pick up records relating to director appointments and terminations?

By monitoring filings, we can then start to build up a record of directorial involvement with companies? From looking at the filings, it also suggests that it would make sense to record commencement and cessation dates for directorial appointments…

There may also be weak secondary evidence linking companies. For example, two companies that file trademarks using the same agent have a weak tie through that agent. (Of course, that agent may be acting for two completely independent companies.)

If we weight edges between nodes according to the perceived strength of a tie and then lay out the graph in a way that is sensitive to the number and weight of edge connections between company nodes, we may be able to start mapping out the corporate structure of these large, distributed corporations, either in network map terms, or maybe by mapping geolocated nodes based on registered addresses; and then we can start asking questions about why these distributed corporate entities are structured the way they are…

PS note to self – OpenCorporates API limit with key: 1000/hr, 10k/day

Autodiscoverable Feeds and UK HEIs (Again…)

It’s that time of year again when Brian’s banging on about IWMW, the Institutional[ised?] Web Managers’ Workshop, and hence that time of year again when he reminds me* about my UK HE Feed Autodiscovery app that trawls through various UK HEI home pages (the official university homepage URLs, rather than whatever you get by searching for a uni name in Google;-)

* that is, tells me the script is broken and, by implication, gently suggests that I should fix it…;-)

As ever, most universities don’t seem to be supporting autodiscoverable feeds (neither are many councils…), so here are a few thoughts about what feeds you might link to, and why…

news feeds: the canonical example. News feeds can be used to pipe news around various university websites, and also syndicate content to any local press or hyperlocal news sites. If every UK HEI published a news feed that was autodiscoverable as such, it would be trivial to set up a UK universities aggregated newswire.

research announcements: I was told that one reason for putting out press releases was simply to build up an institutional memory/archive of notable events. Many universities run research newsletters that remark on awarded grants. How about a “funded research” feed from each university detailing grant awards and other research funding? Again, at a national level, this could be aggregated to provide a research funding newswire, as well as contributing data to local archives of research funding success.

jobs: if every UK HEI published a jobs/vacancies RSS feed, it would be trivial to build an aggregator and let people roll their own HE jobs listings.

events: universities contribute a lot to local culture through public talks and exhibitions. Make it easy for the local press and hyperlocal news sites to syndicate this info, and add events to their own aggregated “what’s on” calendars. (And as well as RSS, give ’em an iCal feed for your events.)

recent submissions to local repository: provide a feed listing recent submissions to the local research output/paper repository (and/or maybe a feed of the most popular downloads); if local feeds are your thing, the library quite possibly makes things like recent acquisition feeds available…

YouTube uploads: you might as well add an autodiscoverable feed to your university’s recent uploads on YouTube. If nothing else, it contributes an informal ownership link to the web for folk who care about things like that.

your university Twitter feed: if you’ve got one. I noticed Glasgow Caledonian linked to their Twitter feed through an autodiscoverable link on their university homepage.

tenders: there’s a whole load of work going on in gov at the moment regarding transparency as it relates to procurement and tendering. So why not get open with your procurement and tendering data, and increase the chances of SMEs finding out what you’re tendering around. If the applications have to go through a particular process, no problem: link to the appropriate landing page in each feed item.

energy data: releasing this data may well become a requirement in the not so far off future, so why not get ahead of the game, e.g. as Lincoln are starting to do (Lincoln U energy data)? If everyone was publishing energy data feeds, I’m sure DevCSI hackday folk would quickly roll together something like the aggregating service built by college student @issyl0 out of a Rewired State hack that pulls together UK gov department energy data: GovSpark

XCRI-CAP course marketing data feeds: JISC is giving away shed loads of cash to support this, so pull your finger out and get the thing published.

location data: got a KML feed yet? If not, why not? e.g. Innovations in Campus Mapping

PS the backend of my RSS feed autodiscovery app (founded: 2008) is a Yahoo pipe. Just because, I thought I’d take half an hour out to try and build something related on Scraperwiki. The code is here: UK University Autodiscoverable RSS feeds. Please feel free to improve it, fork it, etc. University homepage URLs are identified by scraping a page on the Universities UK website, but I probably should use a feed from the JISC Monitoring Unit (e.g. getting UK University location/contact data).
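
For the record, the guts of feed autodiscovery amount to little more than checking each homepage’s <head> for <link rel=”alternate”> elements. Here’s a minimal sketch of the idea (not the actual Scraperwiki code; the homepage URLs are just examples, and I’m assuming the old BeautifulSoup 3 API):

import urllib2, urlparse
from BeautifulSoup import BeautifulSoup

FEEDTYPES = ['application/rss+xml', 'application/atom+xml']

def autodiscover(homepage):
    #Autodiscoverable feeds are declared via <link rel="alternate"> elements in the page head
    soup = BeautifulSoup(urllib2.urlopen(homepage).read())
    feeds = []
    for link in soup.findAll('link', rel='alternate'):
        if link.get('type') in FEEDTYPES and link.get('href'):
            #Resolve relative feed URLs against the homepage address
            feeds.append(urlparse.urljoin(homepage, link['href']))
    return feeds

for url in ['http://www.open.ac.uk/', 'http://www.gcu.ac.uk/']:
    print url, autodiscover(url)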

PPS this could be handy for some folk – the code that runs the talks@cam events site (thanks, Laura:-) – does it do feeds nicely now?! (Related: Keeping Up With Events, a quickly hacked app from my Arcadia project that (used to) aggregate Cambridge events feeds.)

Getting Access to University Course Code Data (or not… (yet…))

A couple of weeks or so ago, having picked up the TSO OpenUp competition prize for suggesting that it would be a Good Thing for UCAS/university course code data to be made available, I had a meeting with the TSO folk to chat over “what next?”. The meeting was an upbeat one, with a plan to get started as soon as possible with a scrape of the UCAS website… so what’s happened since…?

First up – a reading of the UCAS website Terms and Conditions suggests that scraping is a no-no…

6. Intellectual property rights
e. Copying, distributing or any use of the material contained on the website for any commercial purpose is prohibited.
f. You may not create a database by systematically downloading substantial parts of the website

(In the finest traditions of the web, you aren’t allowed to deep link into the site without permission either: 6.c Links to the website are not permitted, other than links to the homepage for your personal use, except with our prior written permission. Links to the website from within a frameset definition are not permitted except with our prior written permission.)

So, err, I guess my link to the terms and conditions breaks those terms and conditions? Oops…;-) Should I be sending them something like this do you think?

As per your terms and conditions, (paragraph 6 c) please may I publish a link to your terms and conditions web page [ ] in a blog post I am writing that, in part, refers to your terms and conditions?

As a fallback, I put a couple of trial balloon FOI requests in to a couple of universities asking for the course names and UCAS course codes for courses offered in 2010/11, along with the search keywords associated with each course (doh! I did it again, deep linking into the UCAS site…)

PS Please may I also link to the page describing course search keywords [ ] ?

The first request went to the University of Southampton, in part because I knew that they already publish chunks of the data (as data) as part of their #opensoton Open Data initiative. (This probably means I was abusing the FOI system, but a point maybe needed to be made…?!;-) The second request was put in to the University of Bristol.

The requests were of the form:

I would be grateful if you could send me in spreadsheet, machine readable electronic form or plain text a copy of the course codes, course titles and search keywords for each course as submitted to UCAS for the 2010-2011 (October 2010) student entry.

If possible, would you also provide HESA subject category codes associated with each course.

So how did I get on?

Bristol’s response was as follows:

On discussion with our Admissions and Student Information teams, it appears that the University does not actually hold this data – it is held on a UCAS database. UCAS are not currently subject to the Freedom of Information Act (they will be in due course) but it may be worth talking to them directly to see if they are willing to assist.

And Southampton’s FOI response?

Course codes and titles may be found here: [ ]. Keywords were not held by the University – you should inquire with UCAS ([ ]). HESA subject category codes may be found here: [ ].

So what did I learn?

  1. I don’t seem to have made it clear enough to Southampton that I wanted the 2-tuple (course code, HESA code) for each course. So how should I have asked for that data? (The response pointed me to the list of all HESA codes; what I wanted was, for each course code, the course code/HESA code pair.)
  2. Generalising from an example of one;-), there seems to be a disconnect between the FOI and open data branches of organisations. In my ideal world, the FOI person (an advocate for the person making the request) would also be on good terms with the Open Data team in the organisation, if not a data wrangler themselves. For data requests, the FOI person would make sure the data is released as open data as part of the process of fulfilling the request and then refer the person making the request to the open data site (see also: Open Data Processes – Taps, Query Paths/Audit Trails and Round Tripping). Southampton have part of this process already – the course data is in a PDF on their site and I was referred to it. (Note that the PDF is not just any PDF – have a look at it! – but it’s not the spreadsheet, machine readable electronic form or plain text I requested, even though @cgutteridge had posted a link to the SPARQL opendata query for the course code/UCAS code information I’d requested as a reply to my FOI request on the WhatDoTheyKnow site.)
  3. Universities don’t necessarily have any record of the search keywords they associate with the courses they post on UCAS. The UCAS website suggests that (doh!) “[r]ecent analysis of unique IP address use of the UCAS Course Search indicates that the subject search is by far the most popular of the 3 search options currently available”, such that “[w]hen an applicant uses our Course Search facility to search for available courses, they can choose a keyword by which to search, known as the ‘subject search’.” Which is to say, universities have no local record of the terms that are the primary way of discovering their courses on UCAS? Blimey… (I wonder how much universities spend on Google AdWords for advertising particular courses on their own course prospectus websites, and how they go about selecting those terms?)
  4. Asking for a machine readable “data as data” response has no teeth at the current time. I don’t know if the Protection of Freedoms bill clause that “extends Freedom of Information rights by requiring datasets to be available in a re-usable format” will change this? It seems like it might:

    (a) an applicant makes a request for information to a public authority in respect of information that is, or forms part of, a dataset held by the public authority, and
    (b) on making the request for information, the applicant expresses a preference for communication by means of the provision to the applicant of a copy of the information in electronic form, the public authority must, so far as reasonably practicable, provide the information to the applicant in an electronic form which is capable of re-use.

  5. So what next? UCAS is a charity that appears to be operated by, for, and on behalf of UK Higher Education (e.g. UCAS Directors’ Report and Accounts 2009). Whilst not FOIable yet, it looked set to become FOIable from October 2011 (Ministry of Justice: Greater transparency in Freedom of Information), though I haven’t been able to find the SI and commencement date that enact this…? If it does become FOIable, we may be able to get the data out that way (although memories of the battle between open data advocates and the Ordnance Survey come to mind…). Hopefully, though, we’ll be able to get the data open by more amicable means before then…:-)

    PS a couple of other things that I’ve been dipping into relating to this project. Firstly, the UCAS Business Plan 2009-2012 (doh!):

    PPS Please may I also link to your Corporate Business Plan 2009-2012 [ ]

    Secondly, the Cabinet Office’s “Better Choices: Better Deals” strategy document [PDF], which as well as its “MyData” right to personal data initiative, also encourages business to put their information (and data…) to work. Whether or not you agree that more information may help to make for better choices from potential students, or that comparison sites have a role to play in this, the UK government appears to believe it and looks set to support the development of businesses operating in this area. For example:

    Effective consumer choices are also important in the public sector – such as decisions about what and where to study.
    However, unlike in private markets, public services are generally:
    ● Free at the point of delivery, so prices do not give us clues about quality or popularity.
    ● Not motivated by profits, so there is little incentive to highlight differences and encourage switching.
    ● Supplied under a universal service obligation, such that they serve a particularly broad range of users, from the very informed to the highly vulnerable.
    In the same way that comparison and feedback sites have developed for private markets, some choice-tools have already emerged for public services. For example, parents and prospective students can use league tables to compare school and university performance, while patients can access websites comparing waiting times for treatments across different healthcare providers, and feedback from fellow consumers about the performance of a local GP practice. Their role is likely to become more important in future as public service markets are opened up and there is scope for further choice-tools to be developed [Better Choices: Better Deals, p. 32]

    If you’re looking to put a bid or business plan together based on using public data as a basis for comparison services, the Better Choices document has more than a few quotable sections;-)

    [Related: Course Detective metasearch/custom search across UK University prospectus websites]

Predictive Ads…? Or Email Address Targeted Advertising…?!

As I was getting increasingly annoyed by large flashing display ads in my feedreader this morning, the thought suddenly occurred to me: could Google serve me ads on third party sites based on my unread Gmail emails?

That is, as I check my feeds before my email in a morning, could I be seeing ads that foreshadow the content of the email I’ve been ignoring for way too long? Or could I receive ads that flag the content of my Priority Inbox messages?

Rules regarding sensitivity and privacy would have to be carefully thought through, of course. Here’s how they currently stand regarding contextual ads delivered in Gmail (More on Gmail and privacy: Targeted ads in Gmail):

By offering Gmail users relevant ads and information related to the content of their messages, we aim to offer users a better webmail experience. For example, if you and your friends are planning a vacation, you may want to see news items or travel ads about the destination you’re considering.

To ensure a quality user experience for all Gmail users, we avoid showing ads reflecting sensitive or inappropriate content by only showing ads that have been classified as “Family-Safe.” We also avoid targeting ads to messages about catastrophic events or tragedies. [Google’s emphasis]

[See also: Ads in Gmail and your personal data]

Not quite as future predictive as gDay™ with MATE™ that lets you “search tomorrow’s web today” and “[discover] content on the internet before it is created”, but almost…!

It’s also a step on the road to Eric Schmidt’s dream of providing you with results even before you search for them. (For a more recent interview, see Google’s Eric Schmidt predicts the future of computing – and he plans to be involved.)

Here’s another, more practical(?!) thought – suppose Google served me the headers of Priority Inbox email messages that were also marked as urgent through AdWords ads, in a full-on attempt to attract my attention to “really important” messages?! “Flashmail” messages delivered through the AdWords network… (I can imagine at least one course manager who I suspect would try to contact me via ads when I don’t pick up my email! ;-)

Searching the internet of things may still be a little way off though….

PS thinking email address targeted ads (mailads?) through a bit more, here are a couple of ways of doing it that immediately come to mind. Suppose I want to target an ad at a particular email address:

1) AdWords could place that ad in my Gmail sidebar. (I think they’d be unlikely to place ads within emails, even if clearly marked, because this approach has been hugely unpopular in the past (it also p****s me off in feeds); that said, Google has apparently started experimenting with (image based) display ads in Gmail.)

2) AdWords could place the ad on a third party site if the Goog spots me via a cookie and sees I’m currently logged in to Google with that email address, for example.

As Facebook gets into the universal messaging game, email address based ad targeting would also work there?

PPS interesting – the best ads act as content, so maybe ads could be used to deliver linked content? Twitter promoted tweets – the AdWords for live news?. Which reminds me, I need to work up my bid for using something like AdWords to deliver targeted educational content.