I note a pre-announced intention from the Justice Data Lab that they will publish “[t]ailored reports pertaining to the re-offending outcomes of services or interventions delivered by organisations who have requested information through the Justice Data Lab. Each report will be an Official Statistic.”
If you haven’t been keeping up, the Justice Data Lab is a currently free, one year pilot scheme (started April 2013) in which “a small team from Analytical Services within the Ministry of Justice (the Justice Data Lab team) will support organisations that provide offender services by allowing them easy access to aggregate re-offending data specific to the group of people they have worked with” [User Journey].
Here’s how the user journey doc describes the process…
…and the methodology:
which is also described in the pre-announcement doc as follows:
Participating organisations supply the Justice Data Lab with details of the offenders who they have worked with, and information about the services they have provided. As standard the Justice Data Lab will supply aggregate one year proven re-offending rates for that group, and that of a matched control group of similar offenders. The re-offending rates for the organisation’s group and the matched control group will be compared using statistical testing to assess the impact of the organisation’s work on reducing re-offending. The results will then be returned to the organisation in a clear and easy to understand format, with explanations of the key metrics, and any caveats and limitations necessary for interpretation of the results.
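The form of “statistical testing” isn’t specified in the extract above, but for two independent groups and a binary outcome it would presumably be something along the lines of a two-proportion test. A minimal sketch (the function and the z-test choice are my own assumptions, not the Justice Data Lab’s published methodology):

```python
import math

def two_proportion_z_test(reoffenders_1, n_1, reoffenders_2, n_2):
    """Compare one year proven re-offending rates between a treated group
    and a matched control group using a pooled two-proportion z-test."""
    p1, p2 = reoffenders_1 / n_1, reoffenders_2 / n_2
    p_pool = (reoffenders_1 + reoffenders_2) / (n_1 + n_2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return p1, p2, z, p_value

# e.g. 30/100 re-offend in the organisation's group vs 40/100 in the control
p1, p2, z, p_value = two_proportion_z_test(30, 100, 40, 100)
```

With these made-up numbers the treated group’s rate is ten percentage points lower, but the difference is not significant at the 5% level – which is exactly the sort of caveat and limitation the reports will presumably have to spell out.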
The pre-announcement suggests that participating organisations will not only receive a copy of the report, but so will the public… The rationale:
The Justice Data Lab pilot is free at the point of service, paid for through the Ministry of Justice budget. The Ministry of Justice therefore has a duty to act transparently and openly about the outcomes of this initiative. It is anticipated that by making this information available in the public domain, organisations that work with offenders will have a greater evidence base about what works to rehabilitate offenders, and ultimately cut crime.
(Nice to see the MoJ believes in transparency. Shame that doesn’t go as far as timely spending data transparency, but I guess we can’t have it all…)
I think it’s worth taking notice of this pre-announcement for a few reasons:
- are such data release mechanisms the result of lobbying pressure? Other government departments have datalabs, such as the HMRC datalab. HMRC recently ran a consultation on the release of VAT registration information as opendata, although concerns have been raised that this may just be a shortcut way of releasing company VAT registration data to credit rating agencies and their ilk. So it seems as if they are looking at what data they may be able to open up, and how, maybe in response to lobbying requests from corporate players who don’t want to have to (pay to) collect the data themselves…? Who might have lobbied the MoJ for the results of MoJ datalab requests to be opened up as public data, I wonder?
- are the results gameable, or might they be used as a tool to “attack” a group that is the basis of a research request? For example, can third parties request that the MoJ datalab runs an analysis on the effectiveness of a programme carried out by another party, such as, I dunno, G4S?
- the ESRC is in the process of a multi-stage funding round that will establish a range of research data centres. The first round, to establish a series of Administrative Data Research Centres has now closed (who won?!) and the second – for Business and Local Government Data Research Centres – is currently open. (Phase three will focus on “Third Sector data and social media data”…wtf?!) To what extent might any of the funded research data centres require that summaries of analyses run using datasets they control access to are released as public open data?
Just by the by, I note here the RCUK Common Principles on Data Policy:
Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.
Institutional and project specific data management policies and plans should be in accordance with relevant standards and community best practice. Data with acknowledged long-term value should be preserved and remain accessible and usable for future research.
To enable research data to be discoverable and effectively re-used by others, sufficient metadata should be recorded and made openly available to enable other researchers to understand the research and re-use potential of the data. Published results should always include information on how to access the supporting data.
RCUK recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these are considered at all stages in the research process.
To ensure that research teams get appropriate recognition for the effort involved in collecting and analysing data, those who undertake Research Council funded work may be entitled to a limited period of privileged use of the data they have collected to enable them to publish the results of their research. The length of this period varies by research discipline and, where appropriate, is discussed further in the published policies of individual Research Councils.
In order to recognise the intellectual contributions of researchers who generate, preserve and share key research datasets, all users of research data should acknowledge the sources of their data and abide by the terms and conditions under which they are accessed.
It is appropriate to use public funds to support the management and sharing of publicly-funded research data. To maximise the research benefit which can be gained from limited budgets, the mechanisms for these activities should be both efficient and cost-effective in the use of public funds.
In The Curse of Our Time – Tracking, Tracking Everywhere, I noted how the likes of Google set up cookie matching services that allow advertisers to reconcile their cookies with Google’s:
The Cookie Matching Service enables a buyer to associate two kinds of data:
- the cookie that identifies a user within the buyer domain, and
- the doubleclick.net cookie that identifies the user for Google. (We share a buyer-specific encrypted Google User ID for buyers to match on.)
The data structure that the buyer uses to maintain these associations is called a Match Table. While the buyer is responsible for building and maintaining match tables, Google can host them.
With an RTB [real-time bidding] application, the buyer can bid on impressions where the user has a specific Google User ID, and can use information associated with the Google User ID as criteria in a bid for an ad impression.
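As a toy illustration of the match table idea (the key names and HMAC construction here are my guesses at the sort of scheme a “buyer-specific encrypted Google User ID” implies – the actual mechanism isn’t documented in the excerpt):

```python
import hashlib
import hmac

GOOGLE_USER_ID = "google-user-123"  # hypothetical raw ID; never shared directly

def buyer_specific_id(google_user_id: str, buyer_key: bytes) -> str:
    """Derive a buyer-specific version of the Google User ID; different
    buyers get different values, so they can't pool IDs between themselves."""
    return hmac.new(buyer_key, google_user_id.encode(), hashlib.sha256).hexdigest()

# The buyer's match table: its own cookie ID -> the buyer-specific Google ID
match_table = {
    "buyer-cookie-abc": buyer_specific_id(GOOGLE_USER_ID, b"buyer-a-secret"),
}

# Two buyers see the same user under mutually unmatchable identifiers
id_for_buyer_a = buyer_specific_id(GOOGLE_USER_ID, b"buyer-a-secret")
id_for_buyer_b = buyer_specific_id(GOOGLE_USER_ID, b"buyer-b-secret")
```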
But it seems that this isn’t enough for Google. It actually gets worse… A USA Today story suggests Google is exploring the idea of an AdID, an identifier that it would share with advertisers to uniquely identify eyeballs rather than them having to use a range of alternative third party user tracking services.
How AdID would actually work (if indeed it ever comes to pass) is not explained, although a post on the AdExchanger blog – What The Google AdID Means For Ad Tech – comes up with a possible mechanism: Google uniquely identifies users (presumably using cookies and authenticated user credentials (topped up with a little bit of browser fingerprinting, I wonder?)) then provides an advertiser with a hashed version of the ID. The hashed identifier means advertisers can’t share information with each other.
The Google AdID service seems like it would be offered as an alternative to tracking users using third party services that use their own third party cookies, with a user tracking system that offers more effective identity tracking techniques (such as a logged in Google user id). Which is to say, Google wants to replace third party cookie based tracking services with its own (logged in user + cookies + browser fingerprinting + etc etc) user tracking service? Or have I misinterpreted all this?
One AdID to rule them all, One AdID to find them,
One AdID to bring them all and in the darkness bind them
In the Land of Google where the Shadows lie.
PS by the by, I notice in a post on Author information in search results, that “If you want your authorship information to appear in search results for the content you create, you’ll need a Google+ Profile with a good, recognizable headshot as your profile photo. Then, verify authorship of your content by associating it with your profile using either of the methods below.” Ah ha… so you agree to give the Goog a good photo of yourself that it can use in its face matching algos, such as in the sort of thing that could be used to unlock your phone. Good. Not. Will faceID play a part in AdID, I wonder? With a gaze tracking feedback loop? Or maybe Google will be getting more into the tracked-user market outside advertising?
PPS This then also brings back to mind the face tracking approaches mentioned in The Curse of Our Time – Tracking, Tracking Everywhere…
PPPS By the by, it seems the Culture, Media and Sport Committee have no problem with online targeted ads:
85. Inevitably public funding is under pressure, a point illustrated by cuts in the budget of Arts Council England. Given the essential role of public funding in sustaining the wider creative economy, it is crucial that adequate resources are available. Of course, the private sector should be encouraged as much as possible to invest in the creative industries. One good example is provided by advertising, which not only provides a major source of funding but is a creative industry in its own right. Evidence from the Advertising Association points to advertising as “a major creative industry and a critical source of funding for other creative industries”. The Advertising Association’s evidence goes on to express deep concern about draft EU Data Protection Regulation “which could damage direct marketing, internet advertising, and the UK economy both off and online”. Increasing use is being made of personal data to target online advertising better. While concerns around this have prompted reviews of data protection legislation, we do not think the targeting of appropriate advertising—essential to so many business models —represents the greatest threat to privacy. [original emphasis]
You probably can’t help but have noticed (in the EU at least) that website operators seem keen to gain your permission to pop “cookies” into your browser. Cookies are tiny computer files that a website can use to store information about you on your own computer. To prevent nasty people doing nasty things, the security policies operated by your browser try to ensure that only the website that wrote a cookie can read it back.
If enough people adopt a particular third party service, that service may be able to pick up quite a good idea about the range of sites you visit. Google’s various ad’n’analytics services in principle allow it to track you across a wide range of sites, because those services are so widely used, though the extent to which Google does or does not fuse data from the cookies associated with those various services may be moot…
One thing I hadn’t realised (or maybe, hadn’t really thought about) before was brought to my attention when something else that was new to me crossed my radar the other day: real time bidding on web adverts, the architecture for which is broadly described by Shuai Yuan, Jun Wang, Xiaoxue Zhao in their paper Real-time Bidding for Online Advertising: Measurement and Analysis as follows:
The model is crudely this: when you visit a web site, the publisher alerts the advertisers that someone has landed on the webpage. Through various cookie machinations, the publisher (and/or the advertiser) may be able to identify you, or certain things about you, from the various cookies on your machine. The advertisers decide what you’re worth and bid to place the advert. The publisher accepts a bid from one of the advertisers and pops the ad into the page you’re visiting. Sort of. (The publisher in this case is more likely to be an ad marketplace/broker, rather than the webpage publisher.)
So that was new to me – realtime bidding. The world’s gone mad. Anyway. As a result of that, I suddenly appreciated the creepy bit in the image above, in step 4: “advertisers choose to buy 3rd party data optionally”. That is, advertisers – in real time – may buy cookie mediated information about people who are in the process of loading a particular web page – in order to work out a bid price for placing an ad within that page to present to that person. Personal advertising, in real time. If data from other (non-web) sources can be added into the mix, perhaps because someone has been uniquely identified, then presumably all the better… for example.
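Crudely sketching that flow (all the bidder names, bid values and data fields below are invented; real exchanges typically run second-price auctions with rather more machinery):

```python
def run_auction(user_data, bidders):
    """One real-time bidding round: each bidder prices the impression from
    whatever (cookie-mediated, possibly bought-in) data it holds about the
    user; the highest bid wins the ad slot."""
    bids = {name: value_fn(user_data) for name, value_fn in bidders.items()}
    winner = max(bids, key=bids.get)
    return winner, bids[winner]

# Hypothetical per-user data assembled from first- and third-party cookies
user_data = {"site_history": ["news", "cars"], "matched_id": "u-42"}

bidders = {
    # brand_a tops up a base bid if bought-in data says the user likes cars
    "brand_a": lambda u: 0.50 + (0.75 if "cars" in u["site_history"] else 0.0),
    "brand_b": lambda u: 0.80,  # flat bid, no third-party data used
}

winner, price = run_auction(user_data, bidders)
```

The “buy 3rd party data” step is the `value_fn` lookup: the better the picture of the user a bidder can buy in, the more precisely it can price that one impression, in real time.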
To help create a better picture of the person who is actually opening up a web page, and to piece together all those fractional bits of information that separate web domains can place into your browser through cookies that they – and only they – can read and write, “cookie matching” services, such as the one run by Google’s DoubleClick Ad Exchange, provide a means by which various parties can pool together, or sell to each other, what they know about an individual from the cookies they have independently set in that individual’s browser. (For a description of one recipe for matching cookies, see SSP to DSP Cookie-Synching Explained.)
I guess I knew this happened anyway (it’s part of the basis for ad retargeting – aka ads that follow you round the web), but I hadn’t realised quite how sharey-sharey, selly-selly and real time it all was.
So we’re being tracked and info about us being sold in real-time as we traipse around the web. But we know that anyway, and we don’t seem to let it bother us.
How about real world tracking, though? Are we as happy being tracked as we walk around in physical space too? It seems so – and the technology appears to be getting so mundane… Through my feeds yesterday I was led to MFlow Journey, a product of a company not so subtly called Human Recognition Systems, that uses video surveillance to capture and follow “anonymous” faces to track human traffic flows through airports. Human tracking is nothing new of course – your mobile operator can track your phone as easy as peasy can be, and if you have wifi enabled on any of the devices you’re carrying around with you, anyone who cares to can track you too. With the click of a button, apparently (review of a typical Euclid analytics dashboard). (See also: New York Times, Attention, Shoppers: Store Is Tracking Your Cell). Apple seems to be doing its bit to make retail centre tracking easier too.
So that’s faces and phones… number plates can be trivially tracked too of course. Here’s a recent (January 2013) ACPO report on The police use of Automatic Number Plate Recognition (the vignettes at the start of the report are illustrative of just what the millions of rows of data in the database allow you to pick out about a particular individual; other operational examples are described in this IPCC Independent Investigation into the use of ANPR in Durham, Cleveland and North Yorkshire from 23 – 26 October 2009 (summarising press release); see also the ACPO 2009 Practice Advice on the Management and Use of Automatic Number Plate Recognition, the National ACPO ANPR Standards (NAAS) v.4.12 Nov 2011 and this Memorandum of Understanding to Support Access to ANPR Data, v.2 Feb 2011). A recent bollocking from the ICO (Police use of ‘Ring of Steel’ is disproportionate and must be reviewed) suggests that popping ANPR cameras on all roads in and out of a town is just not on, but I guess this is limited to police deployed cameras, and doesn’t necessarily address mosaic pictures that you can build up from piecing together ANPR hits wherever you can pick them up from…
…because as well as ANPR systems operated by the police, ANPR is widely used by private companies (though I’m not sure about the extent to which they do, or may be obliged to, share their logs or data collection facilities with the police?) For an idea of what sorts of ANPR “solutions” are available, here’s a list of approved car parking operators with some handy metadata that shows whether they use ANPR or not.
Camera surveillance is not just limited to ANPR systems of course, as any precinct bench loitering yoofs will be able to tell you. Just what is and isn’t deemed acceptable generally is described by the recent (August 2013) surveillance camera code of practice (press release).
Hey ho – it’s got me wondering now what other pieces of the panopticon are already in place?
A couple of weeks ago I spotted a BBC news article announcing that a University launches online course with TV show:
In what is being claimed as the biggest ever experiment in “edutainment”, a US television company is forming a partnership with a top-ranking Californian university to produce online courses linked to a hit TV show.
This blurring of the digital boundaries between academic study and the entertainment industry will see a course being launched next month based on the post-apocalypse drama series the Walking Dead.
Television shows might have spin-off video games or merchandising, but this drama about surviving disaster and social collapse is now going to have its own university course.
The OU has supplemented courses with material from TV broadcasts for several decades, and has also wrapped factual programming with OU courses. We’ve even commissioned drama pieces that have been woven into OU courses. But something about wrapping Hollywood hype also seemed familiar… and then I remembered Hollywood Science. But it’s not available on iPlayer, unfortunately, and I don’t think it went to DVD either…which makes this all something of a non-post!
Way back when, when I was full of hope for social feed architectures constructed out of RSS and Atom content syndication feeds, I used to advocate the use of Yahoo Pipes as a means by which folk could start to develop their own content wrangling solutions. At one point, I even started dabbling with the idea of doing a simple recipe book, something I might even be able to make a bit of pin money from. But a completer finisher I am not, so…
Anyway – as the summer break turns into email nightmare catch-up, and I dream of a life not this one, I started pondering the recipe book idea again. Flicking through the Pipes related pages I’ve posted, and some ideas I never got round to adding in, I noticed that many of the recipes that I’d sketched with over the years are now defunct because the open and accessible technologies they were built on have been closed off.
So for example, using Twitter search feeds (JSON, I think, though RSS used to be an alternative too?) for mapping tweets, or discovering colocation communities. Or a twitter to audio pipe; or a pipe for serendipitously discovering content related to a Twitter stream; and so on…
These pipes are not even pining now – they’re dead; Twitter gave up on RSS/Atom, opting for JSON instead; and while this wasn’t a problem – Pipes handle JSON just as well as XML based syndication feeds – the addition of authentication as a precursor to accessing Twitter data kills off the easy flow access to the data that Yahoo Pipes made such good use of.
Authentication also killed off a whole range of Amazon related mashups (remember mashups? I used to play with what used to be called mashups all the time;-): an Amazon Book Search Pipe, for example, or Looking Up Alternative Copies of a Book on Amazon, via ThingISBN; or even Amazon Reviews from Different Editions of the Same Book.
I also seem to remember making use of Amazon Listmania lists – for example, in support of the feed powered StringLE (String’n’Glue Learning Environment) riff on disaggregated MIT courseware using RSS feeds – although there again, I note that RIP Amazon Listmania.
Way back when, when the web was still opening up, services like Amazon – and then Twitter – helped me cut my teeth on wrangling with web tech and near friction free information flows. Those services grew up, closed themselves off (or at least, added more friction than I care to work around). And just as I gave up playing with Amazon – and ceased taking an interest in pondering the flow of book information from Amazon sources – when authentication hit, so too I’ve now given up on playing with Twitter data (and as a result, cut down on my Twitter usage too; I don’t really care for it as much as an information space any more).
Such is life, I guess. The web has moved on, and I have got stuck. So maybe I need to move on too? Offers…?
PS see also Google Lock-In Lock-Out
A few more bits and pieces around the possible distribution and application of open public data (that is, openly licensed data released by public bodies):
- Bills before Parliament – Education (Information Sharing) Bill 2013-14: although this is a private member’s bill, explanatory notes have been prepared by the Department for Education. The bill allows for “student information of a prescribed description” to be made available to a “prescribed person” or “a person falling within a prescribed category”. If the bill goes through, keeping tabs on these prescriptions will be key to seeing how this might play out.
- As mentioned in my Rambling Round-Up of Some Recent #OpenData Notices from August, the HMRC is consulting on opening up access to VAT records. And through the post this week, I received a letter from the NHS regarding the sharing of data within the NHS via Summary Care Records, although this appears to be more to do with data sharing within the NHS on a case-by-case basis, rather than sharing of bulk datasets for analysis/research and/or business development. So outbreaks of planned sharing are appearing all over the place. I’m not sure what the best way of tracking such initiatives is though?
I haven’t really been tracking private members’ bills either (except the Supermarket Pricing Information Bill 2012-13 that never went anywhere!), and I’m not really sure what they signal, but some of them do make me a bit twitchy. Like the currently proposed Collection of Nationality Data Bill that will “require the collection and publication of information relating to the nationality of those in receipt of benefits and of those to whom national insurance numbers are issued.” Or the Face Coverings (Prohibition) Bill 2013-14, whereby “a person wearing a garment or other object intended by the wearer as its primary purpose to obscure the face in a public place shall be guilty of an offence.” As discussions regarding privacy and anonymity on the web ebb and flow, it’s interesting to see how they’re tracked “IRL”. If a space is public, do you have any right to privacy or anonymity?
- ESRC Pre-call: Business and Local Government Data Research Centres – Big Data Network Phase 2:
The ESRC’s Big Data Network will support the development of a network of innovative investments which will strengthen the UK’s competitive advantage in Big Data. The core aim of this network is to facilitate access to different types of data and thereby stimulate innovative research and develop new methods to undertake that research. This network has been divided into three phases.
- In Phase 1 of the Big Data Network, the ESRC has invested in the development of the Administrative Data Research Network (ADRN), which will provide access to de-identified administrative data collected by government departments for research use
- Phase 2, which is the focus of this pre-announcement, will focus primarily on business data and local government data
- Phase 3, further details of which will be released in the Autumn, will focus primarily on third sector data and social media data
- Progress continues on the smart meter roll-out programme, with huge chunks of money being lined up for a few lucky companies (Government Selects Favourites For The Smart Meter Roll-Out). See also the Energy and Climate Change Select Committee inquiry – “Smart meter roll-out” – and their Smart meter roll out report. Whilst the drivers are presumably related to more efficient energy management, there are plenty of surveillance opportunities arising! Whilst not public data, as such, the availability (and sharing with data aggregators) of smart meter data does form part of the government’s #midata programme (around which the current strategy appears to be “the less said the better”…)
- Maybe of interest to hardcore openspending data geeks, Local Audit and Accountability Bill 2013-14 has made its way from the Lords into the Commons. Schedule 9 introduces regulations around data matching, described as “an exercise involving the comparison of sets of data to determine how far they match (including the identification of any patterns and trends)”, although “data matching exercise[s] may not be used to identify patterns and trends in an individual’s characteristics or behaviour which suggest nothing more than the individual’s potential to commit fraud in the future”. A code of practice is also required. The power “is exercisable for the purpose of assisting in the prevention and detection of fraud” although the schedule may be amended in order to assist: “a) in the prevention and detection of crime (other than fraud), (b) in the apprehension and prosecution of offenders, and (c) in the recovery of debt owing to public bodies”.
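In code terms, a “data matching exercise” in the Schedule 9 sense is little more than a join across datasets held by different bodies. A toy example (the field names and records are entirely made up):

```python
# Two hypothetical datasets held by different public bodies
payroll = [
    {"nino": "AB123456C", "name": "A. Person"},
    {"nino": "ZX987654A", "name": "B. Person"},
]
benefit_claims = [
    {"nino": "AB123456C", "claim": "housing benefit"},
    {"nino": "QQ111111Q", "claim": "housing benefit"},
]

# "How far they match": records appearing in both sets, keyed on a shared field
payroll_ninos = {record["nino"] for record in payroll}
matches = [claim for claim in benefit_claims if claim["nino"] in payroll_ninos]
```

The contentious part is obviously not the join itself but what counts as a permissible purpose for running it – hence the code of practice requirement.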
Schedule 11 covers the Disclosure of Information. Where an auditor obtains information from a public body “[a] local auditor, or a person acting on the auditor’s behalf, may also disclose information to which this Schedule applies except where the disclosure would, or would be likely to, prejudice the effective performance of a function imposed or conferred on the auditor by or under an enactment”. I’m not sure to what extent such information might be requestable from the local auditor though?
I have to admit, I’m losing track of all these data and information related laws. And I guess I should also admit that I don’t really understand what any of them actually mean, either…!;-)
Reading Game Analytics: Maximizing the Value of Player Data earlier this morning (which I suggest might be a handy read if you’re embarking on a learning analytics project…) I was struck by the mention of “player dossiers”. A Game Studies article from 2011 by Ben Medler – Player Dossiers: Analyzing Gameplay Data as a Reward – describes them as follows:
Recording player gameplay data has become a prevalent feature in many games and platform systems. Players are now able to track their achievements, analyze their past gameplay behaviour and share their data with their gaming friends. A common system that gives players these abilities is known as a player dossier, a data-driven reporting tool comprised of a player’s gameplay data. Player dossiers presents a player’s past gameplay by using statistical and visualization methods while offering ways for players to connect to one another using online social networking features.
Which is to say – you can grab your own performance and achievement data and then play with it, maybe in part to help you game the game.
The Game Analytics book also mentioned the availability of third party services built on top of game APIs that let third parties build analytics tools for users that are not otherwise supported by the game publishers.
What I started to wonder was – are there any services out there that allow you to aggregate dossier material from different games to provide a more rounded picture of your performance as a gamer, or maybe services that homologate dossiers from different games to give overall rankings?
In the learning analytics space, this might correspond to getting your data back from a MOOC provider, for example, and giving it to a third party to analyse. As a user of MOOC platform, I doubt that you’ll be allowed to see much of the raw data that’s being collected about you; I’m also wary that institutions that sign up to MOOC platforms will also get screwed by the platform providers when it comes to asking for copies of the data. (I suggest folk signing their institutions up to MOOC platforms talk to their library colleagues, and ask how easy it is for them to get data, (metadata, transaction data, usage data etc etc) out of the library system vendors, and what sort of contracts got them into the mess they may admit to being in.)
(By the by, again the Game Analytics book made a useful distinction – that of viewing folk as customers (i.e. people you can eventually get money from), or as players of the game (or maybe, in MOOC land, learners). Whilst you may think of yourself as a player (learner), what they really want to do is develop you as a customer. In this respect, I think one of the great benefits of the arrival of MOOCs is that it allows us to see just how we can “monetise” education and lets us talk freely and, erm, openly, in cold hard terms about the revenue potential of these things, and how they can be used as part of a money making/sales venture, without having to pretend to talk about educational benefits, which we’d probably feel obliged to do if we were talking about universities. Just like game publishers create product (games) to make money, MOOCspace is about businesses making money from education. (If it isn’t, why is venture capital interested?))
Anyway, all that’s all by the by, not just the by the by bit: this was just supposed to be a quick post, rather than a rant, about how we might do a little bit to open up part of the learning analytics data collection process to the community. (The technique generalises to other sectors…) The idea is built on appropriating a technology that many website publishers use to collect data, the third party service that is Google Analytics (eg from 2012, 88% of Universities UK members use Google Analytics on their public websites). I’m not sure how many universities use Google Analytics to track VLE activity though? Or how many MOOC operators use Google Analytics to track activity on course related pages? But if there are some, I think we can grab that data and pop it into a communal data pool; or grab that data into our own Google Account.
So how might we do that?
That’s all a rather roundabout way of saying we can quite easily write extensions that change the behaviour of a web page. (Hmm… can we do this for mobile devices?) So what I propose – though I don’t have time to try it and test it right now (the rant used up the spare time I had!) – is an extension that simply replaces the Google Analytics tracking code with another tracking code:
- either a “common” one, that pools data from multiple individuals into the same Google Analytics account;
- or a “personal” one, that lets you collect all the data that the course provider was using Google Analytics to collect about you.
(Ideally the rewrite would take place before the tracking script is loaded? Or we’d have to reload the script with the new code if the rewrite happens too late? I’m not sure how the injection/replacement of the original tracking code with the new one actual takes place when the extension loads?)
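Sketching the rewrite itself in Python rather than extension JavaScript (the snippet format is the classic ga.js `_setAccount` one; the replacement property ID is of course hypothetical):

```python
import re

MY_TRACKING_ID = "UA-00000000-1"  # hypothetical communal or personal property ID

def rewrite_ga_property(html: str, new_id: str = MY_TRACKING_ID) -> str:
    """Swap whatever Google Analytics web property ID a page declares
    (UA-xxxxxxxx-y) for our own, so hits get logged to our account instead."""
    return re.sub(r"UA-\d{4,10}-\d{1,4}", new_id, html)

page = "<script>_gaq.push(['_setAccount', 'UA-12345678-1']);</script>"
rewritten = rewrite_ga_property(page)
```

An extension content script would do the equivalent substitution on the live DOM, ideally before the tracking script is fetched and fires.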
Another “advantage” of this approach is that you hijack the Google Analytics data so it doesn’t get sent to the account of the person whose site you’re visiting. (Google Analytics docs suggest that using multiple tracking codes is “not supported”, though this doesn’t mean it can’t be done if you wanted to just overload the data collection – i.e. let the publisher collect the data to their account, and you just grab a copy of it too…)
(An alternative, cruder, approach might be to create an extension that purges Google Analytics code within a page, and then inject your own Google Analytics scripts/code. This would have the downside of not incorporating the instrumentation that the original page publisher added to the page. Hmm.. seems I looked at this way back when too… Collecting Third Party Website Statistics (like Yahoo’s) with Google Analytics.)
All good fun, eh? And for folk operating cMOOCs, maybe this represents a way of tracking user activity across multiple sites (though to allay ethical concerns, tracking/analytics code should probably only be injected onto whitelisted course related domains, or users presented with a “track my activity on this site” button…?)