Archive for the ‘Policy’ Category
Some time ago, in the post Using Aggregated Local Council Spending Data for Reverse Spending (Payments to) Lookups, I described a way of looking at local council spending data based on how much different councils spent with each other.
This technique generalises within and across sectors, so for example we could look at how hospitals spend money with each other, or how police authorities spend money with each other. In this way, we can get a picture of how public bodies buy – and sell – services off each other. The mappings don’t have to relate to spend, either – we could equally well use this sort of model to see how hospitals transfer patients to one another, or how mental health or social care services offer out-of-area cover to each other, or how councils and housing trusts manage transfers between each other.
The insight that lets us produce this sort of view is that we have entities of a particular sort (hospitals, for example, or local councils), entering into transactions with other entities of the same sort. If these sorts of entity all operate under the same transparency rules, a requirement to publish outgoing (spend) transactions, for example, then we can recreate incoming (receipt) transactions from each entity of the same sort. For example, if local councils are required to publish details of spend over £x, then we can also learn how much councils received from other local councils by means of transactions over £x.
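As a minimal sketch of that reverse lookup, assuming each published spend record boils down to a (payer, payee, amount) triple: the council names, amounts and the `receipts_by_payee` helper below are all made up for illustration and are not taken from any real dataset.

```python
from collections import defaultdict

# Hypothetical published spend records: (payer, payee, amount in GBP).
SPEND = [
    ("Council A", "Council B", 12000.0),
    ("Council A", "Acme Ltd", 8000.0),
    ("Council B", "Council A", 3000.0),
    ("Council C", "Council B", 7500.0),
    ("Council C", "Council B", 2500.0),
]

# The set of bodies that all publish under the same transparency rules.
COUNCILS = {"Council A", "Council B", "Council C"}


def receipts_by_payee(spend, publishers):
    """Invert spend records into receipts between bodies of the same sort.

    Returns {payee: {payer: total received}} for transactions where both
    parties are in `publishers`.
    """
    received = defaultdict(lambda: defaultdict(float))
    for payer, payee, amount in spend:
        if payer in publishers and payee in publishers:
            received[payee][payer] += amount
    return {payee: dict(payers) for payee, payers in received.items()}


RECEIPTS = receipts_by_payee(SPEND, COUNCILS)
for payee, payers in sorted(RECEIPTS.items()):
    print(payee, "received", payers)
```

Nothing here needs Council B to publish anything about its income: its receipts fall out of what the other councils declare as spend.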
As the UK Government at least seems hell bent on getting markets established in the delivery of public services, markets that can include private companies, then we are faced with a possible asymmetry in transparency information.
The public should be able to hold local councils to account about the services they provide. To do this, people need information about what decisions local councils are taking, and how local councils are spending public money.
And from the NHS:
As part of the government’s commitment to greater transparency, there is a requirement to publish online each NHS organisation’s expenditure over £25,000. In accordance with the requirement NHS Direct publish this on the basis of payments made in each calendar month.
For example, if hospital A buys significant services off hospital B, and must report that spend under transparency legislation, we can build up a picture not only relating to A’s spend, but also B’s sale of services, because A’s data relating to spend with B is openly available; which means B’s receipts from A are also available. (In this example, if items can be itemised as less than £25k per item, then this form of reporting under transparency guidelines is not required.)
If hospital A now buys services off company C, then we can look up spend from hospital A to get a picture of how much public money is flowing out to the private sector and into company C. That is, we can get an idea of company C’s receipts from openly published hospital spending data. (Of course, games could be played with itemisation – 10 treatments at £3k a treatment would result in a ‘must declare’ spend of £30k on the course of treatment, but an undeclarable £3k per treatment if billing is organised that way.)
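The itemisation game can be made concrete with a few lines of arithmetic. This is only a sketch: it assumes the threshold is applied to each invoiced item separately, and the `declarable` helper is a made-up name rather than part of any reporting rule.

```python
THRESHOLD = 25_000  # GBP; the "expenditure over £25,000" rule quoted above


def declarable(invoices, threshold=THRESHOLD):
    """Return the invoice amounts that exceed the publication threshold."""
    return [amount for amount in invoices if amount > threshold]


# The same course of treatment, billed two different ways.
course_as_one_invoice = [30_000]   # 10 treatments billed together
course_itemised = [3_000] * 10     # the same 10 treatments billed one by one

ONE = declarable(course_as_one_invoice)
SPLIT = declarable(course_itemised)

print("billed together:", ONE)
print("billed per treatment:", SPLIT)
```

The total spend is identical in both cases, but only the single invoice crosses the threshold and so shows up in the published data.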
But what if company C buys services off hospital B (maybe even subcontracting services it was contracted to deliver by hospital A)? If the spend data of company C is not subject to transparency requirements, and the receipt data from the hospital is not publicly available, we lose sight of how money is being spent within and across the public service.
Whilst private companies may balk at being required to publish details of their own spending data, we might still be able to recreate a picture of their spend with public services by requiring public bodies to also publish receipts data, along with the current requirement to publish spend data?
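To see what the asymmetry hides, and what publishing receipts would recover, here is a rough sketch using the A, B and C example above. The `visible_flows` function and its `receipts_published` flag are hypothetical names for illustration; they model the proposal, not any existing reporting requirement.

```python
# Payment flows from the example: (payer, payee).
FLOWS = [
    ("Hospital A", "Hospital B"),
    ("Hospital A", "Company C"),
    ("Company C", "Hospital B"),  # e.g. C subcontracting work back to B
]

# Bodies covered by transparency requirements.
PUBLIC_BODIES = {"Hospital A", "Hospital B"}


def visible_flows(flows, public_bodies, receipts_published=False):
    """Return the flows that can be seen from openly published data.

    A flow is visible if the payer publishes its spend, or (when
    receipts_published is True) if the payee publishes its receipts.
    """
    visible = []
    for payer, payee in flows:
        spend_side = payer in public_bodies
        receipt_side = receipts_published and payee in public_bodies
        if spend_side or receipt_side:
            visible.append((payer, payee))
    return visible


SPEND_ONLY = visible_flows(FLOWS, PUBLIC_BODIES)
WITH_RECEIPTS = visible_flows(FLOWS, PUBLIC_BODIES, receipts_published=True)

print("spend data only:", SPEND_ONLY)
print("spend and receipts:", WITH_RECEIPTS)
```

With spend data alone, the payment from company C to hospital B drops out of view; once public bodies also publish receipts, it reappears without the company having to publish anything itself.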
A jumbled collection of recent clips and snippets, that feel to me as if they’re pieces of the same jigsaw…
- An article in The Atlantic on Obscurity: A Better Way to Think About Your Data Than ‘Privacy’:
…”privacy” is an over-extended concept. It grabs our attention easily, but is hard to pin down. Sometimes, people talk about privacy when they are worried about confidentiality. Other times they evoke privacy to discuss issues associated with corporate access to personal information. Fortunately, obscurity has a narrower purview.
Obscurity is the idea that when information is hard to obtain or understand, it is, to some degree, safe. Safety, here, doesn’t mean inaccessible. Competent and determined data hunters armed with the right tools can always find a way to get it. Less committed folks, however, experience great effort as a deterrent.
This can be a useful distinction to make, I think, when considering the uses to which “personal data” is, or can be, put. Obscure things are hard to find. Just because a dataset is “anonymised” doesn’t mean that a determined data hunter (DDH) won’t be able to deanonymise elements of it.
For a linked take in defense of privacy (from which we can maybe identify useful attributes associated with the notion of privacy), see Privacy is not the enemy – rebooted… Paul Bernal.
- Overt camera surveillance (cameras in carparks, shops and town centres, for example, or ANPR (Automated Number Plate Recognition) cameras in petrol station forecourts and again, in car parks) is presumably deployed to dissuade people from performing particular acts by making it known to them that if they engage in those acts they will be held accountable for them. If we pick this apart a little, CCTV surveillance can operate in two modes: 1) identifying particular actions and then (maybe) taking steps to prevent their furtherance; 2) identifying people captured in the video. Whilst the aim of (2) may be to identify people involved in (1), (2) may also be used to identify and track people in general, irrespective of the actions they are performing. A currently open Home Office Surveillance camera code of practice consultation gives some background to what is deemed to be acceptable use of, and controls on, the use of overt camera surveillance, although it does not seem to explore any possible “evil consequences” of such technology. I’m not sure whether it covers the use of drone-based surveillance either?!
A wider review of surveillance systems can be found in an EU Seventh Framework Programme report – IRISS (Increasing Resilience in Surveillance Societies) Deliverable D1.1: Surveillance, fighting crime and violence.
- Another key ingredient in the management of privacy and obscurity is the notion of identity and identities. UKGov has been considering “identity” in two different ways recently:
- The BIS Foresight project on Future Identities/The Future of Identity reviews different notions of identity (where identity is “the sum of those characteristics which determine who a person is”) and the different identities we may express:
This Foresight Report provides an evidence base for decision makers in government to understand better how people’s identities in the UK might change over the next 10 years. The impact of new technologies and increasing uptake of online social media, the effects of globalisation, environmental disruption, the consequences of the economic downturn, and growing social and cultural diversity are all important drivers of change for identities. However, there is a gap in understanding how identities might change in the future, and how policy makers might respond to such change.
- When working with services online, we’re all familiar with the notion of having different login identities with different services. When working with government services, there may be a requirement to ensure that a given user login identity actually relates to a particular person. The DWP Identity Assurance Scheme seems to be working with commercial providers (Post Office, Cassidian, Digidentity, Experian, Ingeus, Mydex, Verizon, PayPal) to establish an “identity registration service [that] will enable benefit claimants to choose who will validate their identity by automatically checking their authenticity with the provider before processing online benefit claims”. Whatever that is supposed to mean. Does it mean when I create a DWP login I can use my PayPal credentials to prove to DWP who I am? Or does it mean I’ll be able to log in to DWP services using my PayPal credentials? I couldn’t find anything related in a quick skim of the DWP Digital Strategy on this? Are there any good references out there? (UPDATE – ah, this ComputerWeekly report suggests the identity providers will do verification and manage logins – not sure if those logins will be unique to accessing DWP/gov.uk services, though, or whether they would also access eg my PayPal account?)
See also the Open Identity Exchange, a scheme for building trusted relationships between online identity providers on a global scale…
- A recent report from the Administrative Data Taskforce – Improving Access for Research and Policy – provides a series of recommendations for establishing a research network for analysing and linking administrative datasets. Among other things, the report suggests the following model for “de-identifying” linked datasets:
Here’s a sample of some of the other sorts of things the ADT recommended:
- R1.1 The ADRCs will be responsible for commissioning and undertaking linkage of data from different government departments and making the linked data available for analysis, thereby creating new resources for a growing research agenda. Analyses of within sector data (e.g. linking medical records between primary and secondary care) and linking of data between departments for operational purposes may continue to be conducted by the relevant government departments and agencies.
- R1.3 Personal identifiers (names, addresses, precise date of birth, national insurance numbers, etc.) attached to administrative data records will not be available to, or held in, the ADRCs; hence, both ADRC staff and researchers accessing data through ADRCs will not have sight of such personal identifying information. Linkage will be achieved through the use of third parties who have the expertise to provide secure data linkage services for matching personal records from existing data systems.
- R1.6 Access to data held in the ADRCs by accredited researchers will be possible using three approaches. For all of these, no individual-level records will be released from the ADRCs. First, researchers can visit the ADRC secure data access facility, where their analyses of the relevant data sub-set will be overseen by the ADRC support team. Second, researchers can submit statistical syntax to the ADRC support team who will run the analysis on the dataset on behalf of the researcher (results would be thoroughly checked before return). Third, remote secure data access facilities may be established which allow virtual access to datasets held in the ADRCs. With the latter approach, no data would be transferred to these remote safe settings, which would use state-of-the-art technologies and apply rigorous international standards, equivalent to those used in the ADRCs themselves, to provide a secure environment for researchers to undertake their analyses.
- R1.11 … However, the Taskforce recognises that there could well be potential benefits that derive from private sector data and related research interests. The Governing Board will, at an early stage, investigate guidelines for access and linkage by private sector interests, …
- I haven’t had a chance to read this yet, but the World Economic Forum (WEF) have just published a report on Rethinking Personal Data.
In the UK, the #midata route to encouraging folk to hand over access to the personal transaction data a company holds about them to other data processing and aggregation services continues apace with a set of clauses added to the Enterprise & Regulatory Reform Bill – Midata.
In the US, the related notion of Smart Disclosure is being pursued – “an innovative new tool designed to help consumers make better informed decisions and benefit from new products and services powered by data. It refers to expanding access to data in machine-readable formats so that innovators can create interactive services and tools that allow consumers to make important choices in sectors such as health care, education, finance, energy, transportation, and telecommunications.” Because of course “Giving consumers access to their own data—with comprehensive privacy and security safeguards—can empower consumers to make better choices.” Which is to say – if you give access to your data to a third party, they can use that, in combination with other data, to recommend services to you.
So – that’s a quick round-up of recent reports I’m aware of. Have I missed any?
On June 28th, 2012, the open data policy white paper Unleashing the Potential was published by the Cabinet Office. In the section on “Opening Up Access to Research”, one particular paragraph runs as follows:
2.66 To further develop government policy on access to research, we are also establishing a Research Transparency Sector Board, chaired by the Minister for Universities and Science, which will consider ways in which transparency in the area of research can be a driver for innovation. Recognising that research data is different to other PSI [Public Sector Information, presumably? - ed.], the Board will consider how to implement transparency measures relating to research in a manner which protects the integrity of the research and associated intellectual property, while ensuring access to research for those SME entrepreneurs vital for driving growth. This will help to realise the full benefits for society as a whole. The Research Transparency Sector Board will consist of government departments, funding agencies and representatives from universities and other stakeholders, and among the first of its tasks will be to consider how to act on the recommendations of the Royal Society report.
The announcement of the board (referred to as the Research Sector Transparency Board – which makes more sense…) was welcomed by the Royal Society in a guest blog post on the data.gov.uk website dated 27th June 2012 (the day before the embargo lifted? I’m not sure when the blog post actually became public): An intelligently open enterprise.
The minutes of a Regular meeting of the ICO Higher Education sector panel on FOI and DP (24.09.2012) dated 16/10/12 notes the following:
Research data caused much concern. VA reminded delegates that she does need input from Research Councils and BIS in this area, as stated in the draft DD [HE definition document]. Definitions of “publicly funded” and “key outputs” may need clarification. It was noted that the Engineering and Physical Sciences Research Panel had to produce this type of data to an agreed timetable by 2015. It was also mentioned that the Open Data White Paper announced the formation of a new Research Sector Transparency Board and it was suggested that HEI research data could be linked to that format – it is not yet ready for use but might be worth noting in the new DD that this is a future aim.
Correspondence from House of Lords European Union Select Committee includes a letter from David Willetts MP dated 25 October 2012 that refers to his anticipated chairing of the Board:
On the question of Open Access (OA), I was pleased to note your expressed support for Open Data (OD) for which the UK is again identified as a good example. We have made excellent progress through the Finch Report on expanded access to research publications and the Government’s response to it. OD is at a relatively early stage. Some initiatives are already in train under Government’s Transparency Agenda, as detailed in the Cabinet Office White Paper, Open Data: Unleashing the Potential. This includes establishment of the Research Sector Transparency Board, which I shall be chairing. The Board will want to examine the complex issues around increasing the sharing of research data. The Research Councils’ published Open Access policy makes appropriate reference to research data, and the recent Royal Society report has informed the discussion, but work is needed on deciding further measures and implementing these appropriately, with the right terms and conditions and timing for disclosure.
We cannot be complacent and we will want to consider how best to monitor the take-up of Gold OA both here in the UK and overseas. The HEFCE-funded Joint Infrastructure Systems Committee (JISC), OAIG, and the Research Innovation Network (RIN) are already active in monitoring OA trends generally. HEFCE also envisages a possible role for JISC in monitoring the effectiveness – and effects – of Government OA policy. I expect that the Research Sector Transparency Board will also take an interest in OA policy implementation.
The 2012 BIS Annual Innovation Report from November 2012 referred to the announcement of the Board, making me wonder how many other Annual Reports celebrate the announcement of vapour:
10.3 Open data and transparency
We have continued to work to harness the potential and collaborative opportunities offered by wider use of open data.
In June 2012 the Government announced in its Open Data White Paper that we would set up a Research Sector Transparency Board. The Board will consider how transparency in research can be a driver for innovation and discovery while furthering the UK’s recognised excellence in science. It will advise Government transparency issues relating to the national research effort, and improved access for small and medium businesses to the research base. Amongst its first tasks will be to consider and address the recommendations of the Royal Society report, Science as an Open Enterprise, into the sharing and disclosing of research data.
We also established the Administrative Data Taskforce, in December 2011. It will publish proposals for new mechanisms and collaborative agreements to enable and promote the wider use of administrative data for research and policy purposes, before the end of the year.
(I’m not sure I’d picked up on the Administrative Data Taskforce before? It reported in December 2012: The UK Administrative Data Research Network: Improving Access for Research and Policy. This report looks like it could be worth reading – a quick skim reveals several sections on legal and ethical issues related to linking administrative data to other datasets.)
A Hansard reported Written Answer to the House of Lords from 12 Dec 2012 (Column WA241) from The Parliamentary Under-Secretary of State, Department for Business, Innovation and Skills (Lord Marland) on questions referring to open access to research data records:
Any further opening up of access to data, in the context of the wider open data agenda, would be the subject of future discussions with the research councils and other parties including the Data Strategy Board and representative university bodies. These policy issues would also be considered as appropriate by the Research Sector Transparency Board which is chaired by David Willetts. There are no proposals to change the research councils’ policy on access to data at this time.
The Russell Group response to the House of Lords Science and Technology Committee’s inquiry on open access publishing, dated 24 January 2013, makes the following reference to the board:
1.3 The Russell Group has been monitoring the development of open access (OA) policy for some time. We followed the ‘Finch Review’ and Royal Society work on science as an open enterprise with interest and the Russell Group is now represented on the Research Sector Transparency Board which will be covering OA, open data and other issues over the coming year. We have recently had a number of meetings with Research Councils UK (RCUK) to discuss implementation of OA policy.
This suggests that membership of the board has been decided upon, at least partially?
A HEFCE letter on Open access and submissions to the REF post-2014 dated 25/2/13 refers to the board in the following terms:
25. With the Research Councils and the Research Transparency Sector Board, we are giving consideration to the issues involved in increasing access to research data. We are committed to working in dialogue with the sector to develop fair and balanced mechanisms to achieve this aim.
Again, this suggests that the Board has been convened.
So I wonder:
- What is the actual name of the board – Research Transparency Sector Board or Research Sector Transparency Board ;-)? (Other sectors have Transparency Boards….)
- What is the membership of the board and has it convened yet?
- What are the terms of reference for the board?
- If it has convened, where are the minutes?
By the by, I note the emergence of the Research Councils UK – Gateway to Research, which provides a single point of access to “[k]ey data from the seven UK Research Councils in one location.”
This site appears to collate information about research grants, grantees, and publications by grant, across the Research Councils (I’m not sure if an #opendata dump is available though, which would mean I don’t need to scrape across all the sites using Scraperwiki any more?!;-)
PS it seems a tweet about the first meeting appeared whilst I was writing this post:
First meeting of the Research Sector Transparency Board today and all agree that open data are a public good – but that issue is complicated
— adam tickell (@adamtickell) February 26, 2013
No linkage that I can see yet, though?
A couple of weeks ago, I gave a presentation to the WebScience students at the University of Southampton on the topic of open data, using it as an opportunity to rehearse a view of open data based on the premise that it starts out closed. In much the same way that Darwin’s Theory of Evolution by Natural Selection is based on a major presupposition, specifically a theory of inheritance and the existence of processes that support reproduction with minor variation, so too does much of our thinking about open data derive from the presupposed fact that many of the freedoms we associate with the use of open data in legal terms arise from license conditions that the “owner” of the data awards to us.
Viewing data in this light, we might start by considering what constitutes “closed” data and how it comes to be so, before identifying the means by which freedoms are granted and the data is opened up. (Sometimes it can also be easier to consider what you can’t do than what you can, especially when answers to questions such as “so what can you actually do with open data?” attract the (rather meaningless) response: “anything”. We can then contrast what you can do in terms of freedom complementary to what you can’t…)
So how can data be “closed”?
One lens I particularly like for considering constraints that are placed on actions and actors, particularly in the digital world (although we can apply the model elsewhere) I first saw described by Lawrence Lessig in Code and Other Laws of Cyberspace: What Things Regulate: A Dot’s Life.
Here’s the dot and the forces that constrain its behaviour:
So we see, for example, the force of law, social norms, the market (that is, economic forces) and architecture, that is the “digital physical” way the world is implemented. (Architecture may of course be designed in order to enforce particular laws, but it is likely that other “natural laws” will arise as a result of any particular architecture or system implementation.)
Without too much thought, we might identify some constraints around data and its use under each of these separate lenses. For example:
- Law: copyright and database right grant the creator of a dataset certain protective rights over that data; data protection laws (and other “privacy laws”) limit access to, or disclosure of, data that contains personal information, as well as restricting the use of that data to purposes disclosed at the time it was collected. The UK Data Protection Act also underwrites the right of individuals to claim additional limits on data use, for example the rights “to object to processing that is likely to cause or is causing damage or distress; to prevent processing for direct marketing; to object to decisions being taken by automated means” (ICO Guide to the DPA, Principle 6 – The rights of individuals).
- Norms: social mores, behaviour and taboos limit the ways in which we might use data, even if that use is not constrained by legal, economic or technical concerns. For example, applications that invite people to “burgle my house” based on analysing social network data to discover when they are likely to be away from home and what sorts of valuable product might be on the premises are generally not welcomed. Norms of behaviour and everyday work practice also mean that much data is not published when there are no real reasons why it couldn’t be.
- Market: in the simplest case, charging for access to data places a constraint on who can gain access to the data even in advance of trying to make use of it. If we extend “market” to cover other financial constraints, there may be a cost associated with preparing data so that it can be openly released.
- Architecture: technical constraints can restrict what you can do with data. Digital rights management (DRM) uses encryption to render data streams unusable to all but the intended client, but more prosaically, document formats such as PDF, or the “release” of data charts as flat image files, make it difficult for the end user to manipulate as data any data resources contained in those documents.
Laws can also be used to grant freedoms where freedoms are otherwise restricted. For example:
- the Freedom of Information Act (FOI) provides a mechanism for requesting copies of datasets from public bodies; in addition, the Environmental Information Regulations “provide public access to environmental information held by public authorities”.
- the laws around copyright relax certain copyright constraints for the purposes of criticism and review, reporting, research, teaching (IPO – Permitted uses of copyright works);
- in the UK, the Data Protection Act provides for “a right of access to a copy of the information comprised in their personal data” (ICO Guide to the DPA, Principle 6).
- in the UK, the Data Protection Act regulates what can be done legitimately with “personal” data. However, other pieces of legislation relax confidentiality requirements when it comes to sharing data for research purposes. For example:
- the NHS Act s. 251 Control of patient information; for example, the Secretary of State for Health may “make regulations to set aside the common law duty of confidentiality for medical purposes where it is not possible to use anonymised information and where seeking individual consent is not practicable” (discussion). Note that there are changes afoot regarding s. 251…
- The Secretary of State for Education has specific powers to share pupil data from the National Pupil Database (NPD) “with named bodies and third parties who require access to the data to undertake research into the educational achievements of pupils”. The NPD “tracks a pupil’s progress through schools and colleges in the state sector, using pupil census and exam information. Individual pupil level attainment data is also included (where available) for pupils in non-maintained and independent schools” (access arrangements).
- the Enterprise and Regulatory Reform Bill currently making its way through Parliament legislates around the Supply of Customer Data (the “#midata” clauses) which is intended to open up access to customer transaction data from suppliers of energy, financial services and mobile phones “(a) to a customer, at the customer’s request; (b) to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.” Although proclaimed as a way of opening up individual rights to access this data, the effect will more likely see third parties enticing individuals to authorise the release to the third party of the individual first party’s personal transaction data held by a second party (for example, #Midata Is Intended to Benefit Whom, Exactly?). (So you’ll presumably legally be able to grant Facebook access to your mobile phone records… Or Facebook will find a way of getting you to release that data to them without you realising you granted them that permission;-)
Contracts (which I guess fall somewhere between norms and laws from the dot’s perspective; I need to read that section of Lessig’s book again!) can also be used by rights holders to grant freedoms over the data they hold the rights for. For example, the Creative Commons licensing framework provides a copyright holder with a set of tools for relaxing some of the rights afforded to them by copyright when they license the work accordingly.
Note that “I am not a lawyer”, so my understanding of all this is pretty hazy;-) I also wonder how the various pieces of legislation interact, and whether there are cracks and possible inconsistencies between them? If there are pieces of legislation around the regulation and use of data that I’m missing, please post links in the comments below, and I’ll try and do a more thorough round up in a follow on post.
[Had a link to last year's numbers release and didn't notice (that'll teach me to type whilst on the phone!). And that's why it's much safer sticking to F1 data... it doesn't really matter if I get that wrong ;-)]
UCAS released their latest figures for university applications for 2013 today: UK Application rates by country, sex, age and background (2013 Cycle, January deadline), along with data files for data charted in the report. Details of the actual number of applicants are also available: 2013 cycle applicant figures – January deadline.
Relates to 2012:
(Data reported for applications considered on time for 15 January deadline). Headlines report that “Total applicant numbers at this stage of the cycle are 7.4% lower than at the same point in 2011”, with 18 year old England domiciled applicants down 4.1% and the largest drop in terms of actual numbers year on year coming in the age 19 year group (down over 17,000). The percentages are based on differences between the actual number of applicants year on year.
[Other parts of this post are also thrown off course now, eg after spotting in UCAS reports 3.5% increase in applications to higher education that "Application rates, which take population changes into account, show that the proportion of English 18 year olds applying in 2013 has increased by one percentage point. The application rates of 18 year olds across the UK are at, or near, their highest recorded levels". Rates for last year were also released last year. I do wonder a couple of things though - why two separate releases, rates and actual numbers, (that can catch the casual user unaware... ahem... ;-) And why doesn't there appear to be much consideration of the possible effect of demographic changes (population by age) on actual numbers applying, "all other things being equal"...?]
I wondered whether demographics might account for some of the change, or even work against it, assuming that the actual percentage of individuals within a year group that applied to university was consistent. [It turns out that information about population application rates was also released.] A quick peek at the ONS stats reported the following counts for age by single year in the 2011 Census for England (2011 Census: QS103EW Age by single year, local authorities in England and Wales (Excel sheet 1045Kb) <- don't ask how I found that on the ONS website. I have no idea and could not recreate the steps… [Via @paulbradshaw, 2011 Census, Population and Household Estimates for England and Wales, Table P02 2011 Census: Usual resident population by single year of age and sex, England, which is differently identified to the data I found... I wonder if the numbers are different too?!]):
(A 10,000 change on 700,000 is about 1.4% of the 700,000.) If a fixed percentage of 18 year olds from England are applying to university each year (30%, say), then demographic factors could account for some of the change in actual numbers of applicants.
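As a rough sketch of that arithmetic: the cohort sizes below are the round numbers from the aside above and the 30% rate is the "say" figure, so everything here is an illustrative placeholder rather than actual ONS or UCAS data. The function name is made up for the example.

```python
def expected_applicants(cohort_size, application_rate):
    """Applicants expected if a fixed share of the cohort applies."""
    return cohort_size * application_rate


# Illustrative figures only, not real ONS or UCAS data.
previous_cohort = 700_000
current_cohort = 690_000  # a 10,000 drop in the number of 18 year olds
rate = 0.30               # assumed fixed application rate

previous = expected_applicants(previous_cohort, rate)
current = expected_applicants(current_cohort, rate)

change = current - previous
pct_change = 100 * change / previous

print(f"Change in applicants: {change:,.0f} ({pct_change:.1f}%)")
```

With a fixed rate, the percentage change in applicants is the same as the percentage change in the cohort, whatever the rate is, which is why a comparison of raw applicant numbers year on year can be misleading without the population figures alongside.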
One of the things that surprised me slightly about the mechanics of the UCAS data release was that they didn’t make downloadable files containing the data available (although it is easily scraped from the data tables on the announcement page; an additional, slightly more expansive breakdown of one dataset – applications by subset – is provided though). I was surprised because the precursor announcement, December’s End of Cycle report 2012 which announced the publication of a figure filled PDF based report, also included a set of datafiles containing raw data used to generate the figures in the report, so it seems as if this is now UCAS’ standard way of releasing data referred to in reports? (See also: Press Releases and Convenient Report Publication Formats for Data Journalists where I introduce the notion of “view source for data” and describe a few other examples of how public bodies have released data in support of formal reports.)
If you wanted to try to make sense of the university application numbers, and try to get a feel for what sort of effect policy changes might have had on university applications and uptake over the years, you’d probably want to look at some longitudinal data.
One of the criticisms raised by the Public Administration Committee about the ONS website, as part of an inquiry on Communicating and publishing statistics (see also the session the day before), related to the availability of longitudinal datasets. One dataset that I tried to look for related to the number of students in Higher Education over the last 40 years. My first guess at the natural home for this on the ONS site was Higher Education Enrolments, and Qualifications Obtained, at Higher Education Institutions in the UK, but this only appears to go back to 2006.
I recalled struggling to find historical workforce data on the ONS site before now, but recall finding deeper historical data on nomis, the ONS official labour market stats site – but that doesn’t really do education…
However, I did manage to turn up a research briefing from the House of Commons Library (this was also referred to by Michael Blastland in his submission to the Public Administration Committee about the ONS, I think?): Education: Historical statistics – Commons Library Standard Note. Unfortunately, the actual data referred to in that note is not available as a dataset.
(As an aside, whilst looking around the parliament site to see what else might be there, I came across pages for searching through research briefings as well as papers deposited to the House libraries (for example, in response to official questions). The deposited papers include a whole range of document types, including spreadsheets, so FWIW, I started building a scraper to try to index them: scraperwiki: Parliamentary deposited papers.)
So.. the data’s out, and also in a form where it can be played with. So, has anyone played with it?! I also wonder if there’s historical data on the UCAS website detailing application numbers going further back than 2006, ideally as a nicely packaged longitudinal dataset…?
A CTRL-Shift blog post entitled MIDATA Legislation Begins mentions, but doesn’t link to, “an amendment to the Enterprise and Regulator Reform Bill in the House of Lords”, presumably referring to paragraphs 58C*, 58D* and 58E* proposed by Viscount Younger of Leckie in the Seventh Marshalled List of Amendments:
Insert the following new Clause—
“Supply of customer data
(1) The Secretary of State may by regulations require a regulated person to provide customer data—
(a) to a customer, at the customer’s request;
(b) to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.
(2) “Regulated person” means—
(a) a person who, in the course of a business, supplies gas or electricity to any premises;
(b) a person who, in the course of a business, provides a mobile phone service;
(c) a person who, in the course of a business, provides financial services consisting of the provision of current account or credit card facilities;
(d) any other person who, in the course of a business, supplies or provides goods or services of a description specified in the regulations.
(3) “Customer data” means information which—
(a) is held in electronic form by or on behalf of the regulated person, and
(b) relates to transactions between the regulated person and the customer.
(4) Regulations under subsection (1) may make provision as to the form in which customer data is to be provided and when it is to be provided (and any such provision may differ depending on the form in which a request for the data is made).
(5) Regulations under subsection (1)—
(a) may authorise the making of charges by a regulated person for complying with requests for customer data, and
(b) if they do so, must provide that the amount of any such charge—
(i) is to be determined by the regulated person, but
(ii) may not exceed the cost to that person of complying with the request.
(6) Regulations under subsection (1)(b) may provide that the requirement applies only if the authorised person satisfies any conditions specified in the regulations.
(7) In deciding whether to specify a description of goods or services for the purposes of subsection (2)(d), the Secretary of State must (among other things) have regard to the following—
(a) the typical duration of the period during which transactions between suppliers or providers of the goods or services and their customers take place;
(b) the typical volume and frequency of the transactions;
(c) the typical significance for customers of the costs incurred by them through the transactions;
(d) the effect that specifying the goods or services might have on the ability of customers to make an informed choice about which supplier or provider of the goods or services, or which particular goods or services, to use;
(e) the effect that specifying the goods or services might have on competition between suppliers or providers of the goods or services.
(8) The power to make regulations under this section may be exercised—
(a) so as to make provision generally, only in relation to particular descriptions of regulated persons, customers or customer data or only in relation to England, Wales, Scotland or Northern Ireland;
(b) so as to make different provision for different descriptions of regulated persons, customers or customer data;
(c) so as to make different provision in relation to England, Wales, Scotland and Northern Ireland;
(d) so as to provide for exceptions or exemptions from any requirement imposed by the regulations, including doing so by reference to the costs to the regulated person of complying with the requirement (whether generally or in particular cases).
(9) For the purposes of this section, a person (“C”) is a customer of another person (“R”) if—
(a) C has at any time, including a time before the commencement of this section, purchased (whether for the use of C or another person) goods or services supplied or provided by R or received such goods or services free of charge, and
(b) the purchase or receipt occurred—
(i) otherwise than in the course of a business, or
(ii) in the course of a business of a description specified in the regulations.
(10) In this section, “mobile phone service” means an electronic communications service which is provided wholly or mainly so as to be available to members of the public for the purpose of communicating with others, or accessing data, by mobile phone.”
Insert the following new Clause—
“Supply of customer data: enforcement
(1) Regulations may make provision for the enforcement of regulations under section (Supply of customer data) (“customer data regulations”) by the Information Commissioner or any other person specified in the regulations (and, in this section, “enforcer” means a person on whom functions of enforcement are conferred by the regulations).
(2) The provision that may be made under subsection (1) includes provision—
(a) for applications for orders requiring compliance with the customer data regulations to be made by an enforcer to a court or tribunal;
(b) for notices requiring compliance with the customer data regulations to be issued by an enforcer and for the enforcement of such notices (including provision for their enforcement as if they were orders of a court or tribunal).
(3) The provision that may be made under subsection (1) also includes provision—
(a) as to the powers of an enforcer for the purposes of investigating whether there has been, or is likely to be, a breach of the customer data regulations or of orders or notices of a kind mentioned in subsection (2)(a) or (b) (which may include powers to require the provision of information and powers of entry, search, inspection and seizure);
(b) for the enforcement of requirements imposed by an enforcer in the exercise of such powers (which may include provision comparable to any provision that is, or could be, included in the regulations for the purposes of enforcing the customer data regulations).
(4) Regulations under subsection (1) may—
(a) require an enforcer (if not the Information Commissioner) to inform the Information Commissioner if the enforcer intends to exercise functions under the regulations in a particular case;
(b) provide for functions under the regulations to be exercisable by more than one enforcer (whether concurrently or jointly);
(c) where such functions are exercisable concurrently by more than one enforcer—
(i) designate one of the enforcers as the lead enforcer;
(ii) require the other enforcers to consult the lead enforcer before exercising the functions in a particular case;
(iii) authorise the lead enforcer to give directions as to which of the enforcers is to exercise the functions in a particular case.
(5) Regulations may make provision for applications for orders requiring compliance with the customer data regulations to be made to a court or tribunal by a customer who has made a request under those regulations or in respect of whom such a request has been made.
(6) Subsection (8)(a) to (c) of section (Supply of customer data) applies for the purposes of this section as it applies for the purposes of that section.
(7) The Secretary of State may make payments out of money provided by Parliament to an enforcer.
(8) In this section, “customer” and “regulated person” have the same meaning as in section (Supply of customer data).”
Insert the following new Clause—
“Supply of customer data: supplemental
(1) The power to make regulations under section (Supply of customer data) or (Supply of customer data: enforcement) includes—
(a) power to make incidental, supplementary, consequential, transitional or saving provision;
(b) power to provide for a person to exercise a discretion in a matter.
(2) Regulations under either of those sections must be made by statutory instrument.
(3) A statutory instrument containing regulations which consist of or include provision made by virtue of section (Supply of customer data)(2)(d) may not be made unless a draft of the instrument has been laid before, and approved by a resolution of, each House of Parliament.
(4) A statutory instrument containing any other regulations under section (Supply of customer data) or section (Supply of customer data: enforcement) is subject to annulment in pursuance of a resolution of either House of Parliament.”
Note that 58C/1/b states that data could be released “to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.” So if I say to my electricity company that they can share the data with you (“a person who is authorised by a customer to receive the data”), the company can share the data with you if I ask them to or if you ask them. Which is presumably a bit like how direct debits work (I sign something and give it to you and you then go to my bank and request access to my bank account). So the proposed legislation seems to allow for (or at least, not exclude?) the creation of data aggregators who might start to aggregate data from a variety of “regulated persons” at my authorisation.
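To make that reading of 58C(1) concrete, here is a minimal sketch of the release rule as I understand it. It is my own paraphrase, not a legal interpretation, and the `Customer` class, the `may_release` function and the `regs_allow_authorised_requests` flag are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    # Names of the people this customer has authorised to receive their data.
    authorised: set = field(default_factory=set)


def may_release(customer, recipient, requested_by,
                regs_allow_authorised_requests=False):
    """Could the data go to `recipient`, given who asked for it?"""
    # (1)(a): to the customer, at the customer's request.
    if recipient == customer.name:
        return requested_by == customer.name
    # (1)(b): only to a person the customer has authorised...
    if recipient not in customer.authorised:
        return False
    # ...at the customer's request...
    if requested_by == customer.name:
        return True
    # ...or, if the regulations so provide, at the authorised person's request.
    return regs_allow_authorised_requests and requested_by == recipient


me = Customer("me", authorised={"you"})
print(may_release(me, "you", "me"))
print(may_release(me, "you", "you"))
print(may_release(me, "you", "you", regs_allow_authorised_requests=True))
```

The flag is the interesting bit: whether an aggregator can pull data on its own initiative, once authorised, depends entirely on what the eventual regulations say.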
Note that I assume other regulations, such as the Data Protection Act, preclude those data aggregators from acting as data brokers, “companies that collect personal information about consumers from a variety of public and non-public sources and resell the information to other companies” (FTC [the US Federal Trade Commission] to Study Data Broker Industry’s Collection and Use of Consumer Data).
It’s also worth mentioning that the amendment doesn’t actually seem to set about enacting any actual midata legislation: “The Secretary of State may by regulations require…” which is presumably setting up the opportunity for the Secretary of State to bring it about through a Statutory Instrument or similar?
(In passing, the tabled amendments to the Bill also include amendments relating to proposed amendments to the Copyright, Designs and Patents Act 1988 (part 6 of the Bill, relating to licensing of orphan works, collection licensing, duration of copyright et al.) as well as the creation of a Director General of Intellectual Property Rights (28C).)
The day before, CTRL-Shift had also published a post on Building Relationships for a New Data Age:
The challenge (and opportunity) is to start building an information sharing relationship with customers where both sides use data sharing to save time, cut costs and be more efficient – and to add new value.
In a world that’s rapidly going digital, an information sharing relationship makes it normal for individuals to provide the organisations they deal with new, additional and updated data, and for organisations to also routinely provide customers with additional data or data-based services. Information sharing relationships and services are becoming a key influence on which organisations customers choose to do business with, and how valuable this business becomes.
The question is, how do we get from A to B? From today’s ‘one way’ norm where organisations collect data about customers and send messages to them, to a more equal and valuable information sharing partnership? There are three key pillars to an information sharing relationship:
- establish a trustworthy ‘default setting’ for the use of personal data
- give users/customers control
- earn VPI (volunteered personal information) via new information services.
Volunteered personal information, a phrase straight out of the Facebook playbook…
The post then discusses the importance of getting default settings right, in part to avoid a public backlash and a “loss of trust” when folk realise the terms and conditions allow the companies involved to do whatever it is they say the company can, before describing how companies can Earn VPI via information services:
Getting default settings right and giving users control only create the context needed for a healthy information sharing relationship. They don’t actually get the information flowing. To do that, organisations need to:
- elicit valuable additional information from customers
- release and provide customers with additional information and/or information based services that help them make better decisions and make it easier for them to get stuff done and achieve their goals – i.e. services that add new value.
In theory, eliciting VPI and offering added value information services are two separate things. But in reality they are likely to advance hand in hand: with individuals offering additional information (in an environment they can trust because of default settings and user control) as a way to get additional value from information-driven services.
Hmmm… elicit valuable additional information from customers; and then release and provide customers with … services that add new value (I can play the selective cut and paste game too…;-) #midata is presumably being sold to consumers on the basis of the latter, particularly those services that “help them make better decisions and make it easier for them to get stuff done and achieve their goals”.
And then we read:
In theory, eliciting VPI and offering added value information services are two separate things. But in reality they are likely to advance hand in hand: with individuals offering additional information (in an environment they can trust because of default settings and user control) as a way to get additional value from information-driven services.
In theory, eliciting VPI and offering added value information services are two separate things. In the land where the flowers grow and the flopsy bunnies frolic, blissfully unaware that they are what Farmer McGregor actually sells to the butcher, presumably at a greater price than he can sell the lettuces the flopsy bunnies eat to the local greengrocer. Or something like that.
But in reality sound the drums of doom…in reality they are likely to advance hand in hand. Erm…of course… No-one wants shed loads of transactional data for personal use…with individuals offering additional information as a way to get additional value from information-driven services.
Yep… #midata is a way of getting you to give shed loads of low quality transactional data to third parties (who may or may not aggregate it with other data you grant them access to) and then give them a shed load more data before it actually becomes useful. Because that’s how data works…but it’s not how the dream is sold…
Hmmm… I wonder, does the draft legislation say anything about the extent to which an authorised person is allowed to aggregate and mine data from regulated person(s) that relates to data collected from different customers either of the same, or different regulated persons? Because there lies another source of those “in reality” sources of potential value add…though we really should also try to imagine what sources they might be. (Is receiving targeted ads “value add” for me over random junk mail?)
On the other side of the fence, sort of, we see a Private Member’s Bill (Ten Minute Rule Bill?) from John Denham, Labour MP for Southampton, Itchen (not, apparently, the constituency in which the University of Southampton resides…) on Supermarket price transparency which seeks to require supermarkets “to release pricing data product by product and store by store [update: Supermarket Pricing Information Bill 2012-13]. This price information would not only enable the comparison of basic product prices, but also enable consumers to understand the differences in pricing between stores within the same retail chain, or variations in pricing of goods in different areas and regions.” In addition, it is claimed that the Private Member’s Bill “would also enable efficient scrutiny of special offers, multi-buys, ‘bogofs’ and other price promotions that have been the subject of recent criticism and regulatory action.”
PS See also So What, #midata? And #yourData, #ourData…
Following the official opening of the Open Data Institute (ODI) last week, a flurry of data related announcements this week:
- A big one for stats fans with the release of 2011 Census data by the ONS: 2011 Census, Key Statistics for Local Authorities in England and Wales. A few charts appear to have made it into the mix (along with the data to generate them), which I guess sets the baseline for whoever lands the currently advertised Head of Rich Content at the ONS job…
The data files associated with press releases are published as Excel spreadsheets. I guess this reflects, in part, the need to come up with a container that can cope with all the metadata. It’s a bit of a pain, though. One thing I keep meaning to explore further are ways of bundling data in R packages, along with scripts for analysing and visualising the data so bundled (eg US Census Spatial and Demographic Data in R: The UScensus2000 Suite of Packages or US consumer expenditure survey (ce) in R). I probably should also look again at Google’s Dataset Publication Language (DSPL) as well as other packaging formats. I need to check out the latest major release from the W3C Provenance Working Group too…
- Over at BIS, £8 million of investment in open public data is announced, the major chunk of which goes to the Data Strategy Board (#datastrategy) Breakthrough Fund to help public bodies get over short term technical barriers to releasing open public data. I keep wittering on about mapping out data flows that already exist and then finding ways to tap into them directly, so won’t repeat that here;-) A smaller pot, administered by the ODI, will be available to SMEs via the Open Data Immersion Programme. Also announced, the Ordnance Survey will be widening the availability of its range of mapping data.
- Not sure if I missed this when it was presumably announced? The Data Strategy Board’s chair Stephan Shakespeare (CEO of YouGov Plc) is leading an independent review of public sector information (here are the (draft) terms of reference). I’m not sure how this review fits into the tangle of reporting lines associated with the Data Strategy Board and the Public Data Group (the latter seems to have been very quiet?). I also wonder where the ODI fits into that whole structure?
- The funding around public open data coincided with a written Ministerial statement from the Cabinet Office that provided an Update on Departmental Open Data Commitments and adherence to Public Data Principles (original link on a gov.uk domain, h/t @owenboswarva). The update is spectacularly lacking in linking to any of the raw data that is summarised in the actual statement, so so much for any actual transparency there… The same minister, Francis Maude, has also been fulfilling his social media obligations with a piece in the Huffington Post on A Practical Vision for Open Government. (In other news, at the micro/pragmatic level of open public data, I’m still finding that week on week releases of NHS sitrep data show minor differences in formatting and occasional errors…)
Things have been moving on the Communications Data front too. Communications Data got a look in as part of the 2011/2012 Intelligence and Security Committee Annual Report with a review of what’s currently possible and “why change may be necessary”. Apparently:
118. The changes in the telecommunications industry, and the methods being used by people to communicate, have resulted in the erosion of the ability of the police and Agencies to access the information they require to conduct their investigations. Historically, prior to the introduction of mobile telephones, the police and Agencies could access (via CSPs, when appropriately authorised) the communications data they required, which was carried exclusively across the fixed-line telephone network. With the move to mobile and now internet-based telephony, this access has declined: the Home Office has estimated that, at present, the police and Agencies can access only 75% of the communications data that they would wish, and it is predicted that this will significantly decline over the next few years if no action is taken. Clearly, this is of concern to the police and intelligence and security Agencies as it could significantly impact their ability to investigate the most serious of criminal offences.
N. The transition to internet-based communication, and the emergence of social networking and instant messaging, have transformed the way people communicate. The current legislative framework – which already allows the police and intelligence and security Agencies to access this material under tightly defined circumstances – does not cover these new forms of communication. [original emphasis]
Elsewhere in Parliament, the Joint Select Committee Report on the Draft Communications Data Bill was published and took a critical tone (Home Secretary should not be given carte blanche to order retention of any type of data under draft communications data bill, says joint committee. “There needs to be some substantial re-writing of the Bill before it is brought before Parliament” adds Lord Blencathra, Chair of the Joint Committee.) Friend and colleague Ray Corrigan links to some of the press reviews of the report here: Joint Committee declare CDB unworkable.
In other news, Prime Minister David Cameron’s announcement of DNA tests to revolutionise fight against cancer and help 100,000 patients was reported via a technology angle – Everybody’s DNA could be on genetic map in ‘very near future’ [Daily Telegraph] – as well as by means of more reactionary headlines: Plans for NHS database of patients’ DNA angers privacy campaigners [Guardian], Privacy fears over DNA database for up to 100,000 patients [Daily Telegraph].
If DNA is your thing, don’t forget that the Home Office already operates a National DNA Database for law enforcement purposes.
And if national databases are your thing, there’s always the National Pupil Database which was in the news recently with the launch of a consultation on proposed amendments to individual pupil information prescribed persons regulations which seeks to “maximise the value of this rich dataset” by widening access to this data. (Again, Ray provides some context and commentary: Mr Gove touting access to National Pupil Database.)
PS A late inclusion: DECC announcement around smart meter rollout with some potential links to #midata strategy (eg “suppliers will not be able to use energy consumption data for marketing purposes unless they have explicit consent”). A whole raft of consultations were held around smart metering and Government responses are also published today, including Government Response on Data Access and Privacy Framework, the Smart Metering Privacy Impact Assessment and a report on public attitudes research around smart metering. I also spotted an earlier consultation that had passed me by around the Data and Communications Company (DCC) License Conditions; here’s the response, which opens with: “The communications and data transfer and management required to support smart metering is to be organised by a new central communications body – the Data and Communications Company (“the DCC”). The DCC will be a new licensed entity regulated by the Gas and Electricity Markets Authority (otherwise referred to as “the Authority”, or “Ofgem”). A single organisation will be granted a licence under each of the Electricity and Gas Acts (there will be two licences in a single document, referred to as the “DCC Licence”) to provide these services within the domestic sector throughout Great Britain”. Another one to put on the reading pile…
Putting a big brother watch hat on, the notion of “meter surveillance” brings to mind a BBC article about an upcoming (will hopefully thence be persistently available on iPlayer?) radio programme on “Electric Network Frequency (ENF) analysis”, The hum that helps to fight crime. According to Wikipedia, ENF is a forensic science technique for validating audio recordings by comparing frequency changes in background mains hum in the recording with long-term high-precision historical records of mains frequency changes from a database. In turn, this reminds me of appliance signature detection (identifying what appliance is switched on or off from its electrical load curve signature), for example Leveraging smart meter data to recognize home appliances. In the context of audio surveillance, how about supplementing surveillance video cameras with microphones? Public Buses Across Country [US] Quietly Adding Microphones to Record Passenger Conversations.
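For a feel of the comparison step in ENF analysis as described above, here is a toy sketch: slide the sequence of hum frequencies taken from a recording along a historical log and pick the position where they differ least. The frequency values are made up, the function name is my own, and real ENF work needs signal processing to extract the hum from the audio in the first place, plus far more robust matching than this.

```python
def best_offset(recording, reference):
    """Return (offset, error) where the recording best fits the reference.

    Error is the sum of squared differences over the overlapping window.
    """
    if not recording or len(recording) > len(reference):
        raise ValueError("recording must be non-empty and no longer than reference")
    best = None
    for offset in range(len(reference) - len(recording) + 1):
        window = reference[offset:offset + len(recording)]
        error = sum((r - w) ** 2 for r, w in zip(recording, window))
        if best is None or error < best[1]:
            best = (offset, error)
    return best


# Made-up mains frequency log (Hz), one reading per time step.
reference = [50.00, 50.01, 49.99, 49.98, 50.02, 50.03, 50.00, 49.97, 49.99, 50.01]
# Hypothetical hum frequencies recovered from an audio recording.
recording = [50.02, 50.03, 50.00, 49.97]

offset, error = best_offset(recording, reference)
print(offset, error)
```

A recording whose hum matches the log nowhere, or matches at a different time from the one claimed, is the sort of thing that would raise questions about its provenance.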
Via my feeds, a handful of consultations and codes relating to open data, particularly in a local government context:
- first up, from the Cabinet Office, an open consultation on FOI data-release guidelines (see also the press release and draft code of practice); this relates to ways of handling the changes to the Freedom of Information Act that will come into force in April 2013 that arise from the changes introduced through the Protection of Freedoms Act earlier this year;
- from DCLG, a consultation under the banner of “Improving Local Government Transparency” that takes a look at matters arising from making the Code of recommended practice for local authorities on Data Transparency mandatory [PDF] (here’s a copy of The Code of Recommended Practice for Local Authorities on Data Transparency, which “sets out key principles for local authorities in creating greater transparency through the publication of public data”, where “‘Public data’ therefore means the objective, factual data, on which policy decisions are based and on which public services are assessed, or which is collected or generated in the course of public service delivery.”);
- from the Advisory Panel on Public Sector Information (APPSI), we have A National Information Framework for Public Sector Information and Open Data (I saw this via a Guardian commentary piece).
Also of note this week, the ICO published its Anonymisation: managing data protection risk code of practice [PDF] (here’s the press release). ENISA, the European Network and Information Security Agency, have also just published the latest in a series of reports on privacy: Privacy considerations of online behavioural tracking. My colleague Ray Corrigan has done a quick review here.
Although it’s hard to know who has influence where, to the extent that the UK’s Open Government Partnership National Action Plan suggests a general roadmap for open government activity, this is maybe worth noting: Involve workshops: Developing the UK’s next open government National Action Plan
For a recent review of the open data policy context, see InnovateUK’s Open Data and it’s role in powering economic growth.
(I’ll update this post with a bit more commentary over the next few days. For now, I thought I’d share the links in case any readers out there fancy a bit of weekend reading…;-)
PS though not in the news this week, here are a couple of links to standing and appealed case law around database IPR:
- background – OKF review of data/database rights and Out-law review of database rights
- (ongoing) Court of Justice of European Union Appeal – Football Dataco and others: are football event listings protected? (context and commentary on review.)
- case law: The British Horseracing Board Ltd and Others v William Hill (horse-racing information) (commentary by out-law).
The Twittertubes were all abuzz yesterday with news about the UKGov’s announcement on #midata (even though the press release everyone was referring to came out earlier?). It’s still not clear to me what announcement was actually made yesterday, or where? [Ah... seems the actual statement relates to the Government's response to the midata consultation, along with an impact assessment.] I also struggled to find any write-ups of the hacks’n'ideas produced at the ODI’s (@ukodi) #midata hackathon over the weekend?
(For a round-up from over the summer of reports on personal data, see Personal Data Exploitation – Recent Reports, which also quotes a sceptical view about public uptake from a Government commissioned report.)
The personal data that UKGov is encouraging companies to make available in the first instance is credit card/banking transaction data, phone billing, and energy usage data. The first two sectors typically offer itemised breakdowns anyway – maybe the #midata initiative will “request” that the information is made available in a machine readable form if it isn’t already published as such? – with the energy usage data requiring a smart meter, presumably (and which many folk who are interested will have acquired – and hacked years ago – already?!) So what’s new? Does this add to our right to data, eg as supported by subject access requests under the Data Protection Act? What I’m sceptical about is the extent to which this initiative is just a roundabout way of allowing companies to share data amongst themselves (eg Data Bartering Is Everywhere) with checkbox customer permission, of course… (Market context: Computing.co.uk – How Tesco and co are testing the limits of customer data exploitation.)
By the by, on the topic of sharing individual level data, it seems that the Department for Education are currently consulting around the wider release of pupil data – Consultation on proposed amendments to individual pupil information prescribed persons regulations: A consultation on proposals to amend regulations to enable the Department for Education to share extracts of data held in the National Pupil Database for a wider range of purposes than currently possible. The aim is to maximise the value of this rich dataset.
The National Pupil Database is a longitudinal database, which holds information on children in schools in England. The majority of datasets go back 10 years, with the earliest data going back to 1996. There are a range of data sources in the National Pupil Database providing information about children’s education at different stages (pre-school, primary, secondary and further education).
It includes detailed information about pupils, their test and exam results, prior attainment and progression at different key stages for all state schools in England. Attainment data is also held for pupils and students in non-maintained special schools, sixth form and Further Education (FE) colleges and (where available) independent schools. The National Pupil Database includes information about the characteristics of pupils in the state sector and non-maintained special schools such as gender, ethnicity, first language, eligibility for free school meals, information about special educational needs (SEN), as well as detailed information about pupil absence and exclusions.
The data held in the National Pupil Database is collected from a range of sources including schools, local authorities and awarding organisations. This data is processed by the Department’s Data and Statistics Division and matched and stored in the National Pupil Database. The Department makes it clear to children and their parents what information is held about pupils and how it is processed, through a statement on its website. Schools also inform parents and pupils of how the data is used through privacy notices.
There’s a lot to be said for opening up this data to researchers, but I’m sure the privacy wonks will also have plenty of points to make… For example, from Privacy International, UK School Census proposals – How you can help. (Related – just in to my mailbox, EU report on “The right to be forgotten”. And more: ICO code on anonymisation, managing privacy risks and maintaining transparency.)
Sort of related, it’s maybe also worth remembering that the Department of Health, via the NHS, is also widening collection of, and opening up researcher access to, anonymised cradle-to-grave health records via Clinical Practice Research Datalink. (Launch press release; context: eg NHS patient records to revolutionise medical research in Britain.)
Taking these together, along with the idea that media channels deliver audiences to advertisers, I wonder: what is being transacted (collected, bought, staged, and sold) when government releases life event related “transactional” datasets (school records, health records) to researchers? How do the costs and benefits flow (eg in terms of improving the lot of the citizen, playing fair with taxation, etc…?)
PS I haven’t been keeping up with Linked Data in Gov initiatives lately, so this (via @ldodds, I think?) looks like it might be a handy round-up: UKGovLD (UK Government Linked Data Working Group) – opening the doors event.
PPS Via @mhawksey, something that should be read alongside the #midata announcement – Tesco vacancy – Product Manager, ‘My Data’ (commentary): “The successful candidate will define the strategy to develop and support the deployment of Group-wide capability to deliver market-leading products and games which give our Clubcard customers simple, useful, fun access to their own data to help them plan and achieve their goals.”
- You will build and develop the personalised access to customer’s data capability plan
- Accountable for working with functional and country stakeholders across the business to develop a strategy for personalised access to customer’s data and prioritising which products, tools and capabilities to build
- Work with Tesco IT and dunnhumby and other functional stakeholders to deliver these new capabilities to plan and to budget
- Manage the delivery of Clubcard Play (games) to engage customers and create new media opportunities for brands and marketing opportunities for Tesco
- Represent the functional teams and their interests to ensure there is a constant delivery of customer and business benefits from the personalised access to customer’s data workstream
- Manage a team of managers (who work with functional stakeholders and IT) to define and deliver new products, tools and capabilities
- Work with key functional stakeholders such as marketing to manage the organisation change and impact that the personalised access to customer’s data workstream will have
- Work with Corporate and Legal Affairs to manage any legal obligations around giving customers digital access to their own data
- Drive learning through rapid testing and piloting and be involved in running trials in market where needed
- Drive requirements back into the Data and Personalisation Engine streams within the Programme
- Manage the reporting and tracking of benefits to ensure that we are measuring the impact of our activities
- Contribute as part of the Personalisation customer data leadership team
- Look to the medium-term future and think about potential innovations in the area of personalised access to customer’s data to bring into the overall programme roadmap
- Stay close to the customer through market scanning, networking and by building relationships with key internal and external thought leaders
If you spot any ads from other companies that look as if they are #midata related, please post a link to them, the job title and if possible a clip/quick summary, in the comments;-)
PPPS On my “possibly related?” to read list: Network Accountability for the Domestic Intelligence Apparatus. From the abstract, “The network is anchored by “fusion centers,” novel sites of intergovernmental collaboration that generate and share intelligence and information. Several fusion centers have generated controversy for engaging in extraordinary measures that place citizens on watch lists, invade citizens’ privacy, and chill free expression. … A new concept of accountability – network accountability – is needed to address the shortcomings of fusion centers. Network accountability has technical, legal, and institutional dimensions. Technical standards can render data exchange between agencies in the network better subject to review. Legal redress mechanisms can speed the correction of inaccurate or inappropriate information.” With public datasets, we can of course create our own “fusion centres”.
PPPPS …and on the “to play with” list, analyze the consumer expenditure survey (ce) with r (“the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories.” And the data is available:-).
PPPPPS December 2012: FTC to Study Data Broker Industry’s Collection and Use of Consumer Data “The Federal Trade Commission issued orders requiring nine data brokerage companies to provide the agency with information about how they collect and use data about consumers. The agency will use the information to study privacy practices in the data broker industry.
“Data brokers are companies that collect personal information about consumers from a variety of public and non-public sources and resell the information to other companies. In many ways, these data flows benefit consumers and the economy; for example, having this information about consumers enables companies to prevent fraud. Data brokers also provide data to enable their customers to better market their products and services.”
Could be interesting… It also links to a March 2012 report on Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers.
In an earlier post on awarding body market share in UK school examinations, I described an OfQual dataset that listed the number of certificates awarded by certificate name and qualification level by the various awarding bodies. We can use that sort of data to see market share by certificates awarded, but the dataset does not give us any insight into the grades awarded by the different bodies, which might allow us to ask a range of other questions: for example, do any exams appear to be “easier” or “harder” than others, simply based on percentages awarded at each grade by different bodies or within different subject areas (that is, the distribution of grades; note that statistical assumptions may be used to tweak grade boundaries, so we need to be careful here about what questions we even think we might be able to ask…).
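By way of illustration, here is a minimal Python sketch of the sort of "percentages awarded at each grade" comparison described above. Everything in it is an assumption made for the example: the board names, the subject, the counts and the `grade_distribution` helper are all invented, and no real awarding body data is used.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration only:
# (awarding body, subject, grade, certificates awarded).
grade_counts = [
    ("Board A", "Maths", "A", 200),
    ("Board A", "Maths", "B", 300),
    ("Board A", "Maths", "C", 500),
    ("Board B", "Maths", "A", 150),
    ("Board B", "Maths", "B", 150),
    ("Board B", "Maths", "C", 200),
]

def grade_distribution(rows):
    """Return {(body, subject): {grade: percentage of that body's awards}}."""
    totals = defaultdict(int)
    for body, subject, _grade, n in rows:
        totals[(body, subject)] += n
    dist = defaultdict(dict)
    for body, subject, grade, n in rows:
        dist[(body, subject)][grade] = 100.0 * n / totals[(body, subject)]
    return dict(dist)

distribution = grade_distribution(grade_counts)
for key, grades in sorted(distribution.items()):
    print(key, {g: round(p, 1) for g, p in grades.items()})
```

A difference in these percentages between boards would not, on its own, show that one exam is "easier" than another, for the reasons about grade boundaries given above.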
Just a quick aside here: WhatDoTheyKnow suggests that the JCQ is not FOIable, although OfQual is, and routinely uses “publicly available information” from the JCQ in the formulation of its own reports; similar data is also published by the Awarding bodies, although again in the form of informally structured tabular data within PDF files. In response to an FOI request I made to OfQual, the following statement appears:
Because this information is already accesible (sic) to you from JCQ and Awarding Organisations it is exempt from disclosure under Section 21 of the Act because the information is already in the public domain.
So, to clarify:
- OfQual is an FOIable body that has published some data as information; I don’t know whether it holds the information as data
- The data I requested is available as information in the form of PDF documents from two classes of non-FOIable body: the JCQ, a charity; the Awarding Bodies, commercial companies.
Under FOI 2000 s. 11 (as amended by Protection of Freedoms Act 2012), if you request all or part of a dataset in an electronic form “the public authority must, so far as reasonably practicable, provide the information to the applicant in an electronic form which is capable of re-use.” I don’t know if there are any cases out there arguing the toss about how to interpret this (PDF doc, CSV or SQL dump good, etc? If you know of any, please add a comment…) but I’d argue that the data is not available in that form. So a question that naturally follows is: does this affect the reading of Section 21 of the Act “Information which is reasonably accessible to the applicant … is exempt information.”? (BTW, this looks handy – FOI Wiki, though I’m not sure it’s being maintained?) Similarly, if a public body publishes a dataset in the form of a PDF document and not as data, can I FOI a request for that information as data notwithstanding that the information is available in a different, less accessible form? Or will they throw s. 21 back at me? [Via a tweet, @paulbradshaw suggests that a request for machine readable/data version of info released as PDF will typically be satisfied.]
Now where was I..?!
Oh yes… the JCQ data as PDF… well it just so happens that the data is available as data from the Guardian Datastore: GCSE results 2012: exam breakdown by subject, gender and area [data] (I’m not sure if they scraped the A’level results too?). However, the breakdown does not go as far as distributions by award board, and the linkage between subject areas and the certificate titles used in the OfQual dataset is not obvious (there may be mappings in the data documentation/explanatory notes? I haven’t looked.)
Another aside: could the FOIable body point to a scraped dataset published by a third party as evidence that the information is available in a reusable form, even if the reusable format was not published directly by the original FOIable body? That is, if a council publishes data as PDF, and someone scrapes it using Scraperwiki, making it available “as data”, could the council point to the Scraperwiki database as evidence of “[i]nformation which is reasonably accessible to the applicant”? How would they know the data was valid? How about the concrete case here of the JCQ PDF data being scraped by the Guardian Datastore folk and republished as a Google Spreadsheet? And here’s another thought: if I were known to be a demon PDF hacker, would that affect the interpretation of “reasonably accessible”?
If we really wanted to look at distributions of grades by certificate and Awarding Body, we’d probably need to go to the horse’s mouth. So for example, EdExcel grade statistics, AQA results statistics, OCR results stats, CEA Statistics. But again, this data is only available in PDF form, and the companies that publish it aren’t FOIable. (If you’re running – or know of – scrapers grabbing this data, please let me know via the comments). Note that if this “source” data were available, we should be able to check it against the original OfQual data (at least, we should be able to check totals by award board and certificate).
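The totals check mentioned in that last note might look something like the following Python sketch. It assumes the awarding body figures have already been scraped into rows, which is exactly the step that is missing at the moment, and all the names and numbers (`scraped`, `published_totals`, `reconcile`, "Board A" and so on) are hypothetical.

```python
from collections import Counter

# Hypothetical rows, as if scraped from awarding body PDFs:
# (awarding body, certificate, grade, number awarded).
scraped = [
    ("Board A", "GCSE Maths", "A", 200),
    ("Board A", "GCSE Maths", "B", 300),
    ("Board B", "GCSE Maths", "A", 150),
    ("Board B", "GCSE Maths", "B", 340),
]

# Hypothetical totals in the style of the OfQual certificates dataset.
published_totals = {
    ("Board A", "GCSE Maths"): 500,
    ("Board B", "GCSE Maths"): 500,
}

def reconcile(rows, published):
    """Return {key: (scraped_total, published_total)} for every mismatch."""
    summed = Counter()
    for body, certificate, _grade, n in rows:
        summed[(body, certificate)] += n
    mismatches = {}
    # Check keys from both sides so that missing entries also show up.
    for key in set(summed) | set(published):
        if summed.get(key) != published.get(key):
            mismatches[key] = (summed.get(key), published.get(key))
    return mismatches

mismatches = reconcile(scraped, published_totals)
print(mismatches)
```

In practice the hard part is likely to be matching certificate titles across the two sources, which this sketch simply assumes away by using identical keys.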
Of course, I could possibly go straight to the OfQual annual market report [PDF] to see market segment breakdowns; but I think that was where I started (the pie charts immediately started to put me off!) – and it’s not really the datajunkie way, is it, seeing reports containing tables and charts and not being able to recreate them?;-)
So what DDJ lessons do we learn from all this? One thing may be that as data goes along a publishing chain, it tends to get summarised, which then limits the sorts of questions you can ask of it. By unpicking the chain, and getting access to ever finer grained data, we get ourselves into a position whereby we should in principle be able to regenerate the summary reports from the next level down; but we may also be faced with trying to reconcile the data or fit it into the categories that are referred to in the original reports. For transparency as reproducibility, what we need is for reports that publish summary data to also publish two other things: 1) the full set of data that was summarised; 2) the formulae used to generate the summaries from that full data set. Of course, it may be that there are multiple summary steps in the chain (report A generates summaries of dataset B, which itself summarises or represents a particular view over a dataset C). In the current example, OfQual publishes data about certificates awarded by each Awarding Body but no grades; JCQ has grade data across awards but no awarding body data (though in some cases we may be able to recreate that – eg where only a single awarding body offers certificates in a particular area); the awarding bodies publish the finest grained data of all – grade distributions by certificate (and rather obviously, this data is at the level of a particular awarding body).
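To make the "full data plus formulae" idea concrete, here is a minimal Python sketch in which one small function plays the part of the published formula, and two different summary views are regenerated from the same fine-grained rows. The data, the column names and the `summarise` helper are all assumptions made for the example, not anything OfQual, the JCQ or the awarding bodies actually publish.

```python
from collections import Counter

# Hypothetical finest-grained data (the awarding body level):
# (awarding body, certificate, grade, number awarded).
fine = [
    ("Board A", "GCSE Maths", "A", 200),
    ("Board A", "GCSE Maths", "B", 300),
    ("Board B", "GCSE Maths", "A", 150),
    ("Board B", "GCSE Maths", "B", 350),
]

COLUMNS = ("body", "certificate", "grade")

def summarise(rows, keep):
    """Sum the counts over every column that is not named in `keep`."""
    idx = [COLUMNS.index(c) for c in keep]
    out = Counter()
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[3]
    return dict(out)

# OfQual-style view: certificates by awarding body, grades summed away.
by_body = summarise(fine, keep=("body", "certificate"))

# JCQ-style view: grades by certificate, awarding body summed away.
by_grade = summarise(fine, keep=("certificate", "grade"))

print(by_body)
print(by_grade)
```

Each view throws away a different column, which is why neither summary can be rebuilt from the other, while both can be rebuilt from the rows underneath.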