A CTRL-Shift blog post entitled MIDATA Legislation Begins mentions, but doesn’t link to, “an amendment to the Enterprise and Regulator Reform Bill in the House of Lords”, presumably referring to paragraphs 58C*, 58D* and 58E* proposed by Viscount Younger of Leckie in the Seventh Marshalled List of Amendments:
Insert the following new Clause—
“Supply of customer data
(1) The Secretary of State may by regulations require a regulated person to provide customer data—
(a) to a customer, at the customer’s request;
(b) to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.
(2) “Regulated person” means—
(a) a person who, in the course of a business, supplies gas or electricity to any premises;
(b) a person who, in the course of a business, provides a mobile phone service;
(c) a person who, in the course of a business, provides financial services consisting of the provision of current account or credit card facilities;
(d) any other person who, in the course of a business, supplies or provides goods or services of a description specified in the regulations.
(3) “Customer data” means information which—
(a) is held in electronic form by or on behalf of the regulated person, and
(b) relates to transactions between the regulated person and the customer.
(4) Regulations under subsection (1) may make provision as to the form in which customer data is to be provided and when it is to be provided (and any such provision may differ depending on the form in which a request for the data is made).
(5) Regulations under subsection (1)—
(a) may authorise the making of charges by a regulated person for complying with requests for customer data, and
(b) if they do so, must provide that the amount of any such charge—
(i) is to be determined by the regulated person, but
(ii) may not exceed the cost to that person of complying with the request.
(6) Regulations under subsection (1)(b) may provide that the requirement applies only if the authorised person satisfies any conditions specified in the regulations.
(7) In deciding whether to specify a description of goods or services for the purposes of subsection (2)(d), the Secretary of State must (among other things) have regard to the following—
(a) the typical duration of the period during which transactions between suppliers or providers of the goods or services and their customers take place;
(b) the typical volume and frequency of the transactions;
(c) the typical significance for customers of the costs incurred by them through the transactions;
(d) the effect that specifying the goods or services might have on the ability of customers to make an informed choice about which supplier or provider of the goods or services, or which particular goods or services, to use;
(e) the effect that specifying the goods or services might have on competition between suppliers or providers of the goods or services.
(8) The power to make regulations under this section may be exercised—
(a) so as to make provision generally, only in relation to particular descriptions of regulated persons, customers or customer data or only in relation to England, Wales, Scotland or Northern Ireland;
(b) so as to make different provision for different descriptions of regulated persons, customers or customer data;
(c) so as to make different provision in relation to England, Wales, Scotland and Northern Ireland;
(d) so as to provide for exceptions or exemptions from any requirement imposed by the regulations, including doing so by reference to the costs to the regulated person of complying with the requirement (whether generally or in particular cases).
(9) For the purposes of this section, a person (“C”) is a customer of another person (“R”) if—
(a) C has at any time, including a time before the commencement of this section, purchased (whether for the use of C or another person) goods or services supplied or provided by R or received such goods or services free of charge, and
(b) the purchase or receipt occurred—
(i) otherwise than in the course of a business, or
(ii) in the course of a business of a description specified in the regulations.
(10) In this section, “mobile phone service” means an electronic communications service which is provided wholly or mainly so as to be available to members of the public for the purpose of communicating with others, or accessing data, by mobile phone.”
Insert the following new Clause—
“Supply of customer data: enforcement
(1) Regulations may make provision for the enforcement of regulations under section (Supply of customer data) (“customer data regulations”) by the Information Commissioner or any other person specified in the regulations (and, in this section, “enforcer” means a person on whom functions of enforcement are conferred by the regulations).
(2) The provision that may be made under subsection (1) includes provision—
(a) for applications for orders requiring compliance with the customer data regulations to be made by an enforcer to a court or tribunal;
(b) for notices requiring compliance with the customer data regulations to be issued by an enforcer and for the enforcement of such notices (including provision for their enforcement as if they were orders of a court or tribunal).
(3) The provision that may be made under subsection (1) also includes provision—
(a) as to the powers of an enforcer for the purposes of investigating whether there has been, or is likely to be, a breach of the customer data regulations or of orders or notices of a kind mentioned in subsection (2)(a) or (b) (which may include powers to require the provision of information and powers of entry, search, inspection and seizure);
(b) for the enforcement of requirements imposed by an enforcer in the exercise of such powers (which may include provision comparable to any provision that is, or could be, included in the regulations for the purposes of enforcing the customer data regulations).
(4) Regulations under subsection (1) may—
(a) require an enforcer (if not the Information Commissioner) to inform the Information Commissioner if the enforcer intends to exercise functions under the regulations in a particular case;
(b) provide for functions under the regulations to be exercisable by more than one enforcer (whether concurrently or jointly);
(c) where such functions are exercisable concurrently by more than one enforcer—
(i) designate one of the enforcers as the lead enforcer;
(ii) require the other enforcers to consult the lead enforcer before exercising the functions in a particular case;
(iii) authorise the lead enforcer to give directions as to which of the enforcers is to exercise the functions in a particular case.
(5) Regulations may make provision for applications for orders requiring compliance with the customer data regulations to be made to a court or tribunal by a customer who has made a request under those regulations or in respect of whom such a request has been made.
(6) Subsection (8)(a) to (c) of section (Supply of customer data) applies for the purposes of this section as it applies for the purposes of that section.
(7) The Secretary of State may make payments out of money provided by Parliament to an enforcer.
(8) In this section, “customer” and “regulated person” have the same meaning as in section (Supply of customer data).”
Insert the following new Clause—
“Supply of customer data: supplemental
(1) The power to make regulations under section (Supply of customer data) or (Supply of customer data: enforcement) includes—
(a) power to make incidental, supplementary, consequential, transitional or saving provision;
(b) power to provide for a person to exercise a discretion in a matter.
(2) Regulations under either of those sections must be made by statutory instrument.
(3) A statutory instrument containing regulations which consist of or include provision made by virtue of section (Supply of customer data)(2)(d) may not be made unless a draft of the instrument has been laid before, and approved by a resolution of, each House of Parliament.
(4) A statutory instrument containing any other regulations under section (Supply of customer data) or section (Supply of customer data: enforcement) is subject to annulment in pursuance of a resolution of either House of Parliament.”
Note that 58C/1/b states that data could be released “to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.” So if I say to my electricity company that they can share the data with you (“a person who is authorised by a customer to receive the data”), the company can share the data with you if I ask them to or, where the regulations allow for it, if you ask them. Which is presumably a bit like how direct debits work (I sign something and give it to you and you then go to my bank and request access to my bank account). So the proposed legislation seems to allow for (or at least, not exclude?) the creation of data aggregators who might start to aggregate data from a variety of “regulated persons” at my authorisation.
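To make my reading of subsection (1) a bit more concrete, here’s a minimal Python sketch of the release rule as I understand it. It is an illustration of my interpretation, not a statement of what the regulations will actually say: the class name, the `must_provide` method and the `allow_authorised_person_requests` flag (standing in for “if the regulations so provide”) are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RegulatedPerson:
    """A hypothetical supplier (energy company, bank, mobile operator...)."""
    name: str
    # Assumption: stands in for "if the regulations so provide" in (1)(b).
    allow_authorised_person_requests: bool = False
    # Pairs of (customer, authorised person) the supplier has been told about.
    authorisations: set = field(default_factory=set)

    def authorise(self, customer: str, authorised_person: str) -> None:
        """Record that the customer has authorised someone to receive their data."""
        self.authorisations.add((customer, authorised_person))

    def must_provide(self, customer: str, recipient: str, requested_by: str) -> bool:
        """Would the data have to be provided under my reading of subsection (1)?"""
        # (1)(a): to the customer, at the customer's request.
        if recipient == customer:
            return requested_by == customer
        # (1)(b): the recipient must have been authorised by the customer.
        if (customer, recipient) not in self.authorisations:
            return False
        # ...at the customer's request...
        if requested_by == customer:
            return True
        # ...or, if the regulations so provide, at the authorised person's request.
        return requested_by == recipient and self.allow_authorised_person_requests


strict = RegulatedPerson("Example Energy")
strict.authorise("me", "you")
permissive = RegulatedPerson("Example Bank", allow_authorised_person_requests=True)
permissive.authorise("me", "you")
```

With `strict`, you only get my data if I make the request; with `permissive`, you can go and ask for it yourself, which is the direct debit-like case.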
Note that I assume other regulations, such as the Data Protection Act, preclude those data aggregators from acting as data brokers, “companies that collect personal information about consumers from a variety of public and non-public sources and resell the information to other companies” (FTC [the US Federal Trade Commission] to Study Data Broker Industry’s Collection and Use of Consumer Data).
It’s also worth mentioning that the amendment doesn’t actually seem to set about enacting any actual midata legislation: “The Secretary of State may by regulations require…” which is presumably setting up the opportunity for the Secretary of State to bring it about through a Statutory Instrument or similar?
(In passing, the tabled amendments to the Bill also include proposed amendments to the Copyright, Designs and Patents Act 1988 (part 6 of the Bill, relating to licensing of orphan works, collective licensing, duration of copyright et al.) as well as the creation of a Director General of Intellectual Property Rights (28C).)
The day before, CTRL-Shift had also published a post on Building Relationships for a New Data Age:
The challenge (and opportunity) is to start building an information sharing relationship with customers where both sides use data sharing to save time, cut costs and be more efficient – and to add new value.
In a world that’s rapidly going digital, an information sharing relationship makes it normal for individuals to provide the organisations they deal with new, additional and updated data, and for organisations to also routinely provide customers with additional data or data-based services. Information sharing relationships and services are becoming a key influence on which organisations customers choose to do business with, and how valuable this business becomes.
The question is, how do we get from A to B? From today’s ‘one way’ norm where organisations collect data about customers and send messages to them, to a more equal and valuable information sharing partnership? There are three key pillars to an information sharing relationship:
- establish a trustworthy ‘default setting’ for the use of personal data
- give users/customers control
- earn VPI (volunteered personal information) via new information services.
Volunteered personal information, a phrase straight out of the Facebook playbook…
The post then discusses the importance of getting default settings right, in part to avoid a public backlash and a “loss of trust” when folk realise the terms and conditions allow the companies involved to do whatever it is they say the company can, before describing how companies can Earn VPI via information services:
Getting default settings right and giving users control only create the context needed for a healthy information sharing relationship. They don’t actually get the information flowing. To do that, organisations need to:
- elicit valuable additional information from customers
- release and provide customers with additional information and/or information based services that help them make better decisions and make it easier for them to get stuff done and achieve their goals – i.e. services that add new value.
In theory, eliciting VPI and offering added value information services are two separate things. But in reality they are likely to advance hand in hand: with individuals offering additional information (in an environment they can trust because of default settings and user control) as a way to get additional value from information-driven services.
Hmmm… elicit valuable additional information from customers; and then release and provide customers with … services that add new value (I can play the selective cut and paste game too…;-) #midata is presumably being sold to consumers on the basis of the latter, particularly those services that “help them make better decisions and make it easier for them to get stuff done and achieve their goals”.
And then we read:
In theory, eliciting VPI and offering added value information services are two separate things. But in reality they are likely to advance hand in hand: with individuals offering additional information (in an environment they can trust because of default settings and user control) as a way to get additional value from information-driven services.
“In theory, eliciting VPI and offering added value information services are two separate things.” In the land where the flowers grow and the flopsy bunnies frolic, blissfully unaware that they are what Farmer McGregor actually sells to the butcher, presumably at a greater price than he can sell the lettuces the flopsy bunnies eat to the local greengrocer. Or something like that.
“But in reality” (sound the drums of doom…) “in reality they are likely to advance hand in hand.” Erm…of course… No-one wants shed loads of transactional data for personal use… “with individuals offering additional information … as a way to get additional value from information-driven services.”
Yep… #midata is a way of getting you to give shed loads of low quality transactional data to third parties (who may or may not aggregate it with other data you grant them access to) and then give them a shed load more data before it actually becomes useful. Because that’s how data works…but it’s not how the dream is sold…
Hmmm… I wonder, does the draft legislation say anything about the extent to which an authorised person is allowed to aggregate and mine data relating to different customers, whether of the same or of different regulated persons? Because therein lies another of those “in reality” sources of potential value add…though we really should also try to imagine what those sources might be. (Is receiving targeted ads “value add” for me over random junk mail?)
On the other side of the fence, sort of, we see a Private Member’s Bill (Ten Minute Rule Bill?) from John Denham, Labour MP for Southampton, Itchen (not, apparently, the constituency in which the University of Southampton resides…) on Supermarket price transparency [update: Supermarket Pricing Information Bill 2012-13], which seeks to require supermarkets “to release pricing data product by product and store by store. This price information would not only enable the comparison of basic product prices, but also enable consumers to understand the differences in pricing between stores within the same retail chain, or variations in pricing of goods in different areas and regions.” In addition, it is claimed that the Private Member’s Bill “would also enable efficient scrutiny of special offers, multi-buys, ‘bogofs’ and other price promotions that have been the subject of recent criticism and regulatory action.”
PS See also So What, #midata? And #yourData, #ourData…
The Twittertubes were all abuzz yesterday with news about the UKGov’s announcement on #midata (even though the press release everyone was referring to came out earlier?). It’s still not clear to me what announcement was actually made yesterday, or where? [Ah... seems the actual statement relates to the Government's response to the midata consultation, along with an impact assessment.] I also struggled to find any write-ups of the hacks’n’ideas produced at the ODI’s (@ukodi) #midata hackathon over the weekend?
(For a round-up from over the summer of reports on personal data, see Personal Data Exploitation – Recent Reports, which also quotes a sceptical view about public uptake from a Government commissioned report.)
The personal data that UKGov is encouraging companies to make available in the first instance is credit card/banking transaction data, phone billing, and energy usage data. The first two sectors typically offer itemised breakdowns anyway – maybe the #midata initiative will “request” that the information is made available in a machine readable form if it isn’t already published as such? – with the energy usage data requiring a smart meter, presumably (and which many folk who are interested will have acquired – and hacked years ago – already?!) So what’s new? Does this add to our right to data, eg as supported by subject access requests under the Data Protection Act? What I’m sceptical about is the extent to which this initiative is just a roundabout way of allowing companies to share data amongst themselves (eg Data Bartering Is Everywhere) with checkbox customer permission, of course… (Market context: Computing.co.uk – How Tesco and co are testing the limits of customer data exploitation.)
By the by, on the topic of sharing individual level data, it seems that the Department for Education are currently consulting around the wider release of pupil data – Consultation on proposed amendments to individual pupil information prescribed persons regulations: A consultation on proposals to amend regulations to enable the Department for Education to share extracts of data held in the National Pupil Database for a wider range of purposes than currently possible. The aim is to maximise the value of this rich dataset.
The National Pupil Database is a longitudinal database, which holds information on children in schools in England. The majority of datasets go back 10 years, with the earliest data going back to 1996. There are a range of data sources in the National Pupil Database providing information about children’s education at different stages (pre-school, primary, secondary and further education).
It includes detailed information about pupils, their test and exam results, prior attainment and progression at different key stages for all state schools in England. Attainment data is also held for pupils and students in non-maintained special schools, sixth form and Further Education (FE) colleges and (where available) independent schools. The National Pupil Database includes information about the characteristics of pupils in the state sector and non-maintained special schools such as gender, ethnicity, first language, eligibility for free school meals, information about special educational needs (SEN), as well as detailed information about pupil absence and exclusions.
The data held in the National Pupil Database is collected from a range of sources including schools, local authorities and awarding organisations. This data is processed by the Department’s Data and Statistics Division and matched and stored in the National Pupil Database. The Department makes it clear to children and their parents what information is held about pupils and how it is processed, through a statement on its website. Schools also inform parents and pupils of how the data is used through privacy notices.
There’s a lot to be said for opening up this data to researchers, but I’m sure the privacy wonks will also have plenty of points to make… For example, from Privacy International, UK School Census proposals – How you can help. (Related – just in to my mailbox, EU report on “The right to be forgotten”. And more: ICO code on anonymisation, managing privacy risks and maintaining transparency.)
Sort of related, it’s maybe also worth remembering that the Department of Health, via the NHS, is also widening collection of, and opening up researcher access to, anonymised cradle-to-grave health records via Clinical Practice Research Datalink. (Launch press release; context: eg NHS patient records to revolutionise medical research in Britain.)
Taking these together, along with the idea that media channels deliver audiences to advertisers, I wonder: what is being transacted (collected, bought, staged, and sold) when government releases life event related “transactional” datasets (school records, health records) to researchers? How do the costs and benefits flow (eg in terms of improving the lot of the citizen, playing fair with taxation, etc…?)
PS I haven’t been keeping up with Linked Data in Gov initiatives lately, so this (via @ldodds, I think?) looks like it might be a handy round-up: UKGovLD (UK Government Linked Data Working Group) – opening the doors event.
PPS Via @mhawksey, something that should be read alongside the #midata announcement – Tesco vacancy – Product Manager, ‘My Data’ (commentary): “The successful candidate will define the strategy to develop and support the deployment of Group-wide capability to deliver market-leading products and games which give our Clubcard customers simple, useful, fun access to their own data to help them plan and achieve their goals.”
- You will build and develop the personalised access to customer’s data capability plan
- Accountable for working with functional and country stakeholders across the business to develop a strategy for personalised access to customer’s data and prioritising which products, tools and capabilities to build
- Work with Tesco IT and dunnhumby and other functional stakeholders to deliver these new capabilities to plan and to budget
- Manage the delivery of Clubcard Play (games) to engage customers and create new media opportunities for brands and marketing opportunities for Tesco
- Represent the functional teams and their interests to ensure there is a constant delivery of customer and business benefits from the personalised access to customer’s data workstream
- Manage a team of managers (who work with functional stakeholders and IT) to define and deliver new products, tools and capabilities
- Work with key functional stakeholders such as marketing to manage the organisation change and impact that the personalised access to customer’s data workstream will have
- Work with Corporate and Legal Affairs to manage any legal obligations around giving customers digital access to their own data
- Drive learning through rapid testing and piloting and be involved in running trials in market where needed
- Drive requirements back into the Data and Personalisation Engine streams within the Programme
- Manage the reporting and tracking of benefits to ensure that we are measuring the impact of our activities
- Contribute as part of the Personalisation customer data leadership team
- Look to the medium-term future and think about potential innovations in the area of personalised access to customer’s data to bring into the overall programme roadmap
- Stay close to the customer through market scanning, networking and by building relationships with key internal and external thought leaders
If you spot any ads from other companies that look as if they are #midata related, please post a link to them, the job title and if possible a clip/quick summary, in the comments;-)
PPPS On my “possibly related?” to read list: Network Accountability for the Domestic Intelligence Apparatus. From the abstract, “The network is anchored by “fusion centers,” novel sites of intergovernmental collaboration that generate and share intelligence and information. Several fusion centers have generated controversy for engaging in extraordinary measures that place citizens on watch lists, invade citizens’ privacy, and chill free expression. … A new concept of accountability – network accountability – is needed to address the shortcomings of fusion centers. Network accountability has technical, legal, and institutional dimensions. Technical standards can render data exchange between agencies in the network better subject to review. Legal redress mechanisms can speed the correction of inaccurate or inappropriate information.” With public datasets, we can of course create our own “fusion centres”.
PPPPS …and on the “to play with” list, analyze the consumer expenditure survey (ce) with r (“the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories.” And the data is available:-).
PPPPPS December 2012: FTC to Study Data Broker Industry’s Collection and Use of Consumer Data “The Federal Trade Commission issued orders requiring nine data brokerage companies to provide the agency with information about how they collect and use data about consumers. The agency will use the information to study privacy practices in the data broker industry.
“Data brokers are companies that collect personal information about consumers from a variety of public and non-public sources and resell the information to other companies. In many ways, these data flows benefit consumers and the economy; for example, having this information about consumers enables companies to prevent fraud. Data brokers also provide data to enable their customers to better market their products and services.”
Could be interesting… It also links to a March 2012 report on Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers.
A BIS Press Release (Next steps making midata a reality) seems to have resulted in folk tweeting today about the #midata consultation that was announced last month. If you haven’t been keeping up, #midata is the policy initiative around getting companies to make “[consumer data] that may be actionable and useful in making a decision or in the course of a specific activity” (whatever that means) available to users in a machine readable form. To try to help clarify matters, several vignettes are described in this July 2012 report – Example applications of the midata programme – which plays the role of a ‘draft for discussion’ at the September midata Strategy Board [link?]. Here’s a quick summary of some of them:
- form filling: a personal datastore will help you pre-populate forms and provide certified evidence of things like proof of citizenship, being qualified to drive, having passed certain exams and achieved certain qualifications, having passed a CRB check, and so on. (Note: I’ve previously tried to argue the case for the OU starting to develop a service (OU Qualification Verification Service) around delivering verified tokens relating to the award of OU degrees, and degrees awarded by the polytechnics, as was (courtesy of the OU’s CNAA Aftercare Service), but after an initial flurry of interest, it was passed on. midata could bring it back maybe?)
- home moving admin: change your details in a personal “mydata” data store, and let everyone pick up the changes from there. Just think what fun you could have with an attack on this;-)
- contracts and warranties dashboard: did my crApple computer die the week before or after the guarantee ran out?
- keeping track of the housekeeping: bank and financial statement data management and reporting tools. I thought there already was software for doing this? Do we use it though? I’d rather my bank improved the tools it provided me with?
- keeping up with the Joneses: how does my house’s energy consumption compare with that of my neighbours?
- which phone? Pick a tariff automatically based on your actual phone usage. From going through this recently, the problem is not with knowing how I use my phone (easy enough to find out), it’s with navigating the mobile phone sites trying to understand their offers. (And why can’t Vodafone send me an SMS to say I’m 10 minutes away from using up this month’s minutes, rather than letting me go over? The midata answer might be an agent that looks at my usage info and tells me when I’m getting close to my limit, which requires me having access to my contract details in a machine readable form, I guess?)
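As a rough idea of what the “which phone?” agent might look like if both usage and contract details were available in machine readable form, here’s a minimal Python sketch. Everything in it is an assumption made for illustration: the `Tariff` fields, the function names and the tariff figures are invented, and real tariffs (texts, data, bundles, contract lengths) are rather messier than minutes alone.

```python
from dataclasses import dataclass


@dataclass
class Tariff:
    """A deliberately simplified, made-up tariff: voice minutes only."""
    name: str
    monthly_fee: float         # pounds per month
    included_minutes: int      # minutes covered by the monthly fee
    overage_per_minute: float  # pounds per minute beyond the allowance


def monthly_cost(tariff: Tariff, minutes_used: int) -> float:
    """Cost of one month on this tariff for the given usage."""
    extra_minutes = max(0, minutes_used - tariff.included_minutes)
    return tariff.monthly_fee + extra_minutes * tariff.overage_per_minute


def cheapest_tariff(tariffs: list, monthly_minutes: list) -> Tariff:
    """Pick the tariff with the lowest average cost over my actual usage history."""
    def average_cost(tariff: Tariff) -> float:
        return sum(monthly_cost(tariff, m) for m in monthly_minutes) / len(monthly_minutes)
    return min(tariffs, key=average_cost)


def nearly_out_of_minutes(tariff: Tariff, minutes_used_so_far: int, warn_at: int = 10) -> bool:
    """True once I am within `warn_at` minutes of this month's allowance."""
    return tariff.included_minutes - minutes_used_so_far <= warn_at


# Invented example tariffs and usage histories.
small = Tariff("small", monthly_fee=10.0, included_minutes=100, overage_per_minute=0.25)
big = Tariff("big", monthly_fee=20.0, included_minutes=500, overage_per_minute=0.25)
light_user = [50, 60]
heavy_user = [90, 120, 300]
```

The point of the sketch is that neither function is hard to write; what’s missing is access to the two inputs (my usage and the operators’ offers) in a form a program can read.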
And here’s a BIS blog post summarising them: A midata future: 10 ways it could shape your choices.
(The #midata policy seems based on a belief that users want better access to data so they can do things with it. I’m not convinced – why should I have to export my bank data to another service (increasing the number of services I must trust) rather than my bank providing me with useful tools directly? I guess one way this might play out is that any data that does dribble out may get built around by developers who then sell the tools back to the data providers so they can offer them directly? In this context, I guess I should read the BIS commissioned Jigsaw Research report: Potential consumer demand for midata.)
Today has also seen a minor flurry of chat around the call for evidence on the Communications Data Bill, presumably because the closing date for responses is tomorrow (draft Communications Data Bill). (Related reading: latest Annual Report of the Interception of Communications Commissioner.) Again, if you haven’t been keeping up, the draft Communications Data Bill describes communications data in the following terms:
- Communications data is information about a communication; it can include the details of the time, duration, originator and recipient of a communication; but not the content of the communication itself
- Communications data falls into three categories: subscriber data; use data; and traffic data.
The categories are further defined in an annex:
- Subscriber Data – Subscriber data is information held or obtained by a provider in relation to persons to whom the service is provided by that provider. Those persons will include people who are subscribers to a communications service without necessarily using that service and persons who use a communications service without necessarily subscribing to it. Examples of subscriber information include:
– ‘Subscriber checks’ (also known as ‘reverse look ups’) such as “who is the subscriber of phone number 012 345 6789?”, “who is the account holder of e-mail account email@example.com?” or “who is entitled to post to web space http://www.xyz.anyisp.co.uk?”;
– Subscribers’ or account holders’ account information, including names and addresses for installation, and billing including payment method(s), details of payments;
– information about the connection, disconnection and reconnection of services which the subscriber or account holder is allocated or has subscribed to (or may have subscribed to) including conference calling, call messaging, call waiting and call barring telecommunications services;
– information about the provision to a subscriber or account holder of forwarding/redirection services;
– information about apparatus used by, or made available to, the subscriber or account holder, including the manufacturer, model, serial numbers and apparatus codes.
– information provided by a subscriber or account holder to a provider, such as demographic information or sign-up data (to the extent that information, such as a password, giving access to the content of any stored communications is not disclosed).
- Use data – Use data is information about the use made by any person of a postal or telecommunications service. Examples of use data may include:
– itemised telephone call records (numbers called);
– itemised records of connections to internet services;
– itemised timing and duration of service usage (calls and/or connections);
– information about amounts of data downloaded and/or uploaded;
– information about the use made of services which the user is allocated or has subscribed to (or may have subscribed to) including conference calling, call messaging, call waiting and call barring telecommunications services;
– information about the use of forwarding/redirection services;
– information about selection of preferential numbers or discount calls;
- Traffic Data – Traffic data is data that is comprised in or attached to a communication for the purpose of transmitting the communication. Examples of traffic data may include:
– information tracing the origin or destination of a communication that is in transmission;
– information identifying the location of equipment when a communication is or has been made or received (such as the location of a mobile phone);
– information identifying the sender and recipient (including copy recipients) of a communication from data comprised in or attached to the communication;
– routing information identifying equipment through which a communication is or has been transmitted (for example, dynamic IP address allocation, file transfer logs and e-mail headers – to the extent that content of a communication, such as the subject line of an e-mail, is not disclosed);
– anything, such as addresses or markings, written on the outside of a postal item (such as a letter, packet or parcel) that is in transmission;
– online tracking of communications (including postal items and parcels).
To put the communications data thing into context, here’s something you could try for yourself if you have a smartphone. Using something like the SMS to Text app (if you trust it!), grab your txt data from your phone and try charting it: SMS analysis (coming from an Android smartphone or an iPhone). And now ask yourself: what if I also mapped my location data, as collected by my phone? And will this sort of thing be available as midata, or will I have to collect it myself using a location tracking app if I want access to it? (There’s an asymmetry here: the company potentially collecting the data, or me collecting the data…)
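As a rough sketch of the sort of charting I have in mind: assuming the export is a CSV file with one row per message and a `date` column of ISO-style timestamps (the column names and format here are my assumptions, not something any particular app guarantees), counting messages by hour of day takes only a few lines of Python. The sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def messages_by_hour(csv_text, date_column="date"):
    """Count messages per hour of day from CSV text.

    Assumes (hypothetically) one row per message and an ISO 8601
    timestamp in `date_column`; real exports may well differ.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        sent = datetime.fromisoformat(row[date_column])
        counts[sent.hour] += 1
    return counts


# Made-up sample data, for illustration only
sample = """date,number,direction
2012-07-01T08:15:00,07700900001,in
2012-07-01T08:47:00,07700900002,out
2012-07-01T22:05:00,07700900001,in
"""

# Crude text "chart": one # per message in each hour
for hour, n in sorted(messages_by_hour(sample).items()):
    print(f"{hour:02d}:00 {'#' * n}")
```

Swap the text chart for a proper plotting library and the same counts give you a picture of when you tend to be awake and active.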
It’s also worth bearing in mind that even if access to your data is locked down, access to the data of people associated with you might reveal quite a lot of information about you, including your location, as Adam Sadilek et al. describe: Finding Your Friends and Following Them to Where You Are (see also Far Out: Predicting Long-Term Human Mobility). My own tinkerings with emergent social positioning (looking at who the followers of particular twitter users also follow en masse) also suggest we can generate indicators about potential interests of a user by looking at the interests of their followers… Even if you’re careful about who your friends are, your followers might still reveal something about you that you have tried not to disclose yourself (such as your birthday…). (That’s one of the problems with asymmetric trust models! Hmmm… could be interesting to start trying to model some of this… )
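To give a flavour of the “who do the followers also follow” idea, here is a minimal sketch over a toy, entirely made-up network. The `follows` mapping and the account names are hypothetical; in practice the data would have to come from whatever API access you have.

```python
from collections import Counter


def commonly_followed(target, follows, top_n=3):
    """Rank the accounts most often followed by `target`'s followers.

    `follows` maps each user to the set of accounts they follow.
    """
    followers = [u for u, friends in follows.items() if target in friends]
    counts = Counter()
    for follower in followers:
        # Ignore the target itself; we want what else the followers follow
        counts.update(follows[follower] - {target})
    return counts.most_common(top_n)


# Entirely made-up toy network
follows = {
    "ann": {"target", "cycling_news", "bbc_weather"},
    "bob": {"target", "cycling_news"},
    "cat": {"target", "cycling_news", "knitting_daily"},
    "dan": {"knitting_daily"},
}

print(commonly_followed("target", follows, top_n=1))
```

Nothing in the calculation uses anything the target account has said about itself, which is rather the point.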
Both of these consultations provide a context for reflecting on the extent to which companies use data for their own processing purposes (for a recent review, see What happens to my data? A novel approach to informing users of data processing practices), the extent to which they share this data in raw and processed form with other companies or law enforcement agencies, the extent to which they may use it to underwrite value-added/data-powered services to users directly or when combined with data from other sources, the extent to which they may be willing to share it in raw or processed form back with users, and the extent to which users may then be willing (or licensed) to share that data with other providers, and/or combine it with data from other providers.
One of the biggest risks from a “what might they learn about me” point of view – as well as some of the biggest potential benefits – comes from the reconciliation of data from multiple different sources. Mosaic theory is an idea taken from the intelligence community that captures the idea that when data from multiple sources is combined, the value of the whole view may be greater than the sum of the parts. When privacy concerns are idly raised as a reason against the release of data, it is often suspicion and fears around what a data mosaic picture might reveal that act as drivers of these concerns. (Similar fears are also used as a reason against the release of data, for example under Freedom of Information requests, in case a mosaic results in a picture that can be used against national interests: eg D.E. Pozen, The Mosaic Theory, National Security, and the Freedom of Information Act and MP Goodwin, A National Security Puzzle: Mosaic Theory and the First Amendment Right of Access in the Federal Courts).
Note that within a particular dataset, we might also appeal to mosaic theory thinking; for example, might we learn different things when we observe individual data records as singletons, as opposed to a set of data (and the structures and patterns it contains) as a single thing: GPS Tracking and a ‘Mosaic Theory’ of Government Searches. And as a consequence, might we want to treat individual data records, and complete datasets, differently?
PS via this ORG post – Consulympics: opportunities to have your say on tech policies – which details a whole raft of currently open ICT related consultations in the UK, I am reminded of this ICO Consultation on the draft Anonymisation code of practice along with a draft of the anonymisation code itself.
Yesterday, UK gov folk announced what I imagine someone, somewhere, has termed “a breakthrough in consumer empowerment”, a voluntary scheme for corporates to opt in to that means they may let us have access to some of the data they’ve collected about us.
According to BBC technology reporter Rory Cellan-Jones (Midata: Will the public share government’s enthusiasm?), here’s what we can expect:
[From] Callcredit which holds credit files on every adult in the UK.
It’s now promising that every consumer will be able to look at their file for free for life, in a radical change to its business model. …
Scottish Power’s midata plans involve making its customers’ annual energy consumption data more easily accessible to make the process of switching suppliers easier.
And finally, there was the Royal Bank of Scotland which is promising to give its customers “a complete walkthrough” of all their annual transactions. So, for instance, you will be able to find out how much you spent at Tesco last year*.
*Only you won’t be able to get that information from Tesco, because they haven’t signed up?
A lot of this information is likely to already be available to folk who are interested in the quantified self. For example, you can download your statements from your bank or credit card company, as data, or use services (in the US at least?) such as Mint to aggregate and report on your personal finance data; you can use devices such as Current Cost to track your energy usage, or apps on your phone to break down how you’ve used it, and so on…
But maybe if neatly packaged and re-presented data (as well as data as data) were more available, there would be wider interest in it? Maybe…
I also noticed that Google is a signatory to the #midata initiative. So what might we expect from them? Here are three things that I think might already hit #midata buttons, so maybe this will give us a clue as to the sort of thing we can expect to see when (if…) companies start rolling out #midata services next year:
PS By the by, if you search around things like “mi data” you tend to turn up jobs around the areas of market intelligence and management information systems… Just noticing…;-)
PPS I also noticed this in Rory’s article: “Meanwhile the government’s drive to free up public data has hit a few roadbumps. … The consumer affairs minister Edward Davey said there was a balance to be struck when it came to public data: “It’s got to be sustainable. If we gave away large datasets that cost a lot of money to collect, the data would degenerate over time.”” So: the plan is that companies make data they were keeping to themselves “open” to the people who generated it, presumably for free rather than for a fee, but we need to hold off on opening up data collected at public expense that could be used to drive innovation, efficiencies (or so govt were claiming a year or two ago) and wider awareness in the public sector because it’s unsustainable?
PPPS Something I’d like to see in my data returns from signatories is a list of folk and partner organisations who they’ve sold or otherwise exchanged my personal data with, along with a list of what data was included in that transaction…
Via @wilm, I notice that it’s time again for someone (this time at the Wall Street Journal) to have written about the scariness that is your Google personal web history (the sort of thing you probably have to opt out of if you sign up for a new Google account, if other recent opt-in by defaults are anything to go by…)
It may not sound like much, but if you do have a Google account, and your web history collection is not disabled, you may find your emotional response to seeing months or years of your web/search history archived in one place surprising… Your Google web history.
Not mentioned in the WSJ article were some of the games that the Chrome browser gets up to. @tim_hunt tipped me off to a nice (if technically detailed, in places) review by Ilya Grigorik of some of the design features of the Chrome browser, and some of the tools built in to it: High Performance Networking in Chrome. I’ve got various pre-fetching tools switched off in my version of Chrome (tools that allow Chrome to pre-emptively look up web addresses and even download pages pre-emptively*) so those tools didn’t work for me… but looking at chrome://predictors/ was interesting to see what keystrokes I type are good predictors of web pages I visit…
* By the by, I started to wonder whether webstats get messed up to any significant effect by Chrome pre-emptively prefetching pages that folk never actually look at…?
In further relation to the tracking of traffic we generate from our browsing habits, as we access more and more web/internet services through satellite TV boxes, smart TVs, and catchup TV boxes such as Roku or NowTV, have you ever wondered about how that activity is tracked? LG Smart TVs logging USB filenames and viewing info to LG servers describes not only how LG TVs appear to log the things you do view, but also the personal media you might view, and in principle can phone that information home (because the home for your data is a database run by whatever service you happen to be using – your data is midata is their data).
there is an option in the system settings called “Collection of watching info:” which is set ON by default. This setting requires the user to scroll down to see it and, unlike most other settings, contains no “balloon help” to describe what it does.
At this point, I decided to do some traffic analysis to see what was being sent. It turns out that viewing information appears to be being sent regardless of whether this option is set to On or Off.
you can clearly see that a unique device ID is transmitted, along with the Channel name … and a unique device ID.
This information appears to be sent back unencrypted and in the clear to LG every time you change channel, even if you have gone to the trouble of changing the setting above to switch collection of viewing information off.
It was at this point, I made an even more disturbing find within the packet data dumps. I noticed filenames were being posted to LG’s servers and that these filenames were ones stored on my external USB hard drive.
Hmmm… maybe it’s time I switched out my BT homehub for a proper hardware firewalled router with a good set of logging tools…?
PS FWIW, I can’t really get my head round how evil on the one hand, or damp squib on the other, the whole midata thing is turning out to be in the short term, and what sorts of involvement – and data – the partners have with the project. I did notice that a midata innovation lab report has just become available, though to you and me it’ll cost 1500 squidlly diddlies so I haven’t read it: The midata Innovation Opportunity. Note to self: has anyone got any good stories to say about TSB supporting innovation in micro-businesses…?
PPS And finally, something else from the Ilya Grigorik article:
The HTTP Archive project tracks how the web is built, and it can help us answer this question. Instead of crawling the web for the content, it periodically crawls the most popular sites to record and aggregate analytics on the number of used resources, content types, headers, and other metadata for each individual destination. The stats, as of January 2013, may surprise you. An average page, amongst the top 300,000 destinations on the web is:
- 1280 KB in size
- composed of 88 resources
- connects to 15+ distinct hosts
Is it any wonder that pages take so long to load on a mobile phone off the 3G network, and that you can soon eat up your monthly bandwidth allowance?
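For a back-of-the-envelope feel for the allowance side of that, here is a quick calculation using the 1280 KB average page size quoted above. The 500 MB monthly allowance is purely my assumption for the sake of the sum, and the calculation ignores caching and anything else your phone downloads.

```python
AVERAGE_PAGE_KB = 1280  # HTTP Archive figure quoted above (January 2013)


def pages_per_allowance(allowance_mb, page_kb=AVERAGE_PAGE_KB):
    """Whole average-sized pages that fit in a data allowance (1 MB = 1024 KB)."""
    return (allowance_mb * 1024) // page_kb


# Hypothetical 500 MB monthly allowance, spread over a 30 day month
monthly_pages = pages_per_allowance(500)
print(monthly_pages, "pages a month, or about", monthly_pages // 30, "a day")
```

On those assumptions the allowance covers 400 average pages a month, which is not many if you browse on the bus every day.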
Via a post on my colleague, and info law watchdog, Ray Corrigan’s blog – Alas medical confidentiality in the UK, we knew it well… – I note he has some concerns about the way in which the NHS data linkage service may be able to up its game as a result of the creation of the HSCIC – the Health and Social Care Information Centre – and its increasing access to data (including personal medical records?) held by GPs via the General Practice Extraction Service (GPES). (The HSCIC itself was established via legislation: Part 9 Chapter 2 of the Health and Social Care Act 2012. As I commented in The Business of Open Public Data Rolls On…, I think we need to keep a careful eye on (proposed) legislation that allows for “information of a prescribed description” to be made available to a “prescribed person” or “a person falling within a prescribed category”, where those prescriptions are left to the whim of the Minister responsible.) (Also via Ray, medConfidential has an interesting review of the HSCIC/GPES story so far.)
Something I hadn’t spotted before was the price list for data extraction and linkage services – just as interesting as the prices are the categories of service:
Here are the actual prices:
Complexity based on time to process-
3. A request is classed as ‘simple’ if specification, production and checking are expected to take less than 5 hours.
4. A request is classed as ‘medium’ if specification, production and checking are expected to take less than 7 hours but more than 5.
5. A request is classed as ‘complex’ if specification, production and checking are expected to take less than 12 hours but more than 7.
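Those bands are easy enough to write down as a rule. In the sketch below, the treatment of requests of exactly 5 or 7 hours, and of anything at 12 hours or more, is my assumption: the notes as quoted don’t say which band the boundaries fall into or what happens beyond “complex”.

```python
def classify_request(hours):
    """Classify an extraction request by expected processing time in hours.

    Bands follow the quoted price-list notes. Placing exactly 5 or 7 hours
    in the higher band is an assumption; the notes leave it open.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    if hours < 5:
        return "simple"
    if hours < 7:
        return "medium"
    if hours < 12:
        return "complex"
    # 12 hours or more is not covered by the quoted notes
    return "unclassified"


for estimate in (3, 6, 10, 12):
    print(estimate, "hours:", classify_request(estimate))
```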
Doing a little search around the notion of “data linkage”, I stumbled across what looks to be quite a major data linkage initiative going on in Scotland – the Scotland Wide data linkage framework. There seems to have been a significant consultation exercise in 2012 prior to the establishment of this framework earlier this year: Data Linkage Framework Consultation [closed] [see for example the Consultation paper on “Aims and Guiding Principles” or the Technical Consultation on the Design of the Data Sharing and Linking Service [closed]]. Perhaps mindful of the fact that there may have been and may yet be public concerns around the notion of data linkage, an engagement exercise and report on Public Acceptability of Cross-Sectoral Data Linkage was also carried out (August 2012). A further round of engagement looks set to occur during November 2013.
I’m not sure what the current state of the framework, or its implementation, is (maybe this FOI request on Members and minutes of meetings of Data Linkage Framework Operations Group would give some pointers?) but one component of it at least looks to be the Electronic Data Research and Innovation Service (eDRIS), a “one-stop shop for health informatics research”, apparently… Here’s the elevator pitch:
Some examples of collaborative work are also provided:
- Linking data from NHS24 and Scottish Ambulance Service with emergency admissions and deaths data to understand unscheduled care patient pathways.
- Working with NHS Lothian to provide linked health data to support EuroHOPE – European Healthcare Outcomes, Performance and Efficiency Project
- Epidemiology, disease burden and outcomes of diverticular disease in Scotland
- Infant feeding in Scotland: Exploring the factors that influence infant feeding choices (within Glasgow) and the potential health and economic benefits of breastfeeding on child health
This got me wondering about what sorts of data linkage project things like HSCIC or the MoJ data lab (as reviewed here) might get up to. Several examples seem to be provided by the ESRC Administrative Data Liaison Service (ADLS): Summary of administrative data linkage. (For more information about the ADLS, see the Administrative Data Taskforce report Improving Access for Research and Policy.)
The ADLS itself was created as part of a three phase funding programme by the ESRC, which is currently calling for second phase bids for Business and Local Government Data Research Centres. I wonder if offering data linkage services will play a role in their remit? If so, I wonder if they will offer services along the lines of the ADLS Trusted Third Party Service (TTPS), which “provides researchers and data holding organisations a mechanism to enable the combining and enhancing of data for research to which may not have otherwise been possible because of data privacy and security concerns”? Apparently,
The [ADLS TTPS] facility is housed within a secure room within the Centre for Census and Survey Research (CCSR) at the University of Manchester, and has been audited by the Office for National Statistics. The room is only used to carry out disclosure risk assessment work and other work that requires access to identifiable data.
Another example of a secure environment for data analysis is provided by the HMRC Datalab. One thing I notice about that facility is that they don’t appear to allow, or expect, researchers to use R (the software list identifies STATA 9/10/11, SAS 9.3, Microsoft Excel, Microsoft Word, SPSS Clementine 8.1/9.0/10.1/11.1/12)?
Why’s this important? Because little L, little D, linked data can provide a much richer picture than distinct data sets…
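To make the linkage idea a little more concrete, here is a minimal sketch of one common approach: a linking service replaces the shared identifier in each dataset with a keyed hash, and the datasets are then joined on that pseudonym, so whoever analyses the linked data never sees the identifier. This is my illustration of the general technique, not a description of how the ADLS service (or anyone else) actually does it; the key, field names and records are all made up.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    normalised = identifier.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


def deidentify(records, id_field, secret_key):
    """Return {pseudonym: record with the identifier field removed}."""
    out = {}
    for rec in records:
        rest = {k: v for k, v in rec.items() if k != id_field}
        out[pseudonymise(rec[id_field], secret_key)] = rest
    return out


def link(left, right):
    """Join two de-identified datasets on their shared pseudonyms."""
    return {p: {**left[p], **right[p]} for p in left.keys() & right.keys()}


# Hypothetical key, held only by the linking service
key = b"held-only-by-the-linking-service"

# Made-up records; the identifiers are not real
health = [
    {"nino": "QQ123456A", "admissions": 2},
    {"nino": "QQ654321B", "admissions": 0},
]
benefits = [{"nino": "qq123456a ", "claims": 1}]

linked = link(deidentify(health, "nino", key), deidentify(benefits, "nino", key))
print(list(linked.values()))
```

The linked record says more about the person than either source did alone, even though no name or number survives in it.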
PS see also mosaic theory…
PPS reminded by @wilm, here’s a “nice” example of data linkage from the New York Times… N.S.A. Gathers Data on Social Connections of U.S. Citizens.
PPPS and from the midata Innovation Lab, I notice this claim:
On the 4th of July 2013 we opened the midata Innovation Lab (mIL), on what we call “UK Consumer Independence Day”. So what is it? It’s the UK Government, leading UK companies and authoritative bodies collaborating on data services innovation and consumer protection for a data-driven future. We’ve put together the world’s fastest-built data innovation lab, creating the world’s most interesting and varied datasets, for the UK’s best brands and developers to work with.
The mIL is an accelerator for business to use a rich dataset to create new services for consumers. Designed in conjunction with innovative “Founding Partner” businesses, it also has oversight from authoritative bodies so we can create the world’s best consumer protection in the emerging personal data ecosystem.
The unique value of the lab is its ability to offer a unique dataset and consumer insight that it would be difficult for any one organization to collate. With expert input from authoritative consumer protection bodies, we can test and learn how to both empower and protect consumers in the emerging personal data ecosystem.
And this: “The personal data that we have asked for is focused on a few key areas: personal information including vehicle and property data, transactional banking and credit records, mobile, phone, broadband and TV billing information and utility bills.” It seems that data was collected from 50 individuals to start with.
A few more bits and pieces around the possible distribution and application of open public data (that is, openly licensed data released by public bodies):
- Bills before Parliament – Education (Information Sharing) Bill 2013-14: although this is a private member’s bill, explanatory notes have been prepared by the Department for Education. The bill allows for “student information of a prescribed description” to be made available to a “prescribed person” or “a person falling within a prescribed category”. If the bill goes through, keeping tabs on these prescriptions will be key to seeing how this might play out.
As mentioned in my Rambling Round-Up of Some Recent #OpenData Notices from August, the HMRC is consulting on opening up access to VAT records. And through the post this week, I received a letter from the NHS regarding the sharing of data within the NHS via Summary Care Records, although this appears to be more to do with data sharing within the NHS on a case-by-case basis, rather than sharing of bulk datasets for analysis/research and/or business development. So outbreaks of planned sharing are appearing all over the place. I’m not sure what the best way of tracking such initiatives is though?
I haven’t really been tracking private members’ bills either (except the Supermarket Pricing Information Bill 2012-13 that never went anywhere!), and I’m not really sure what they signal, but some of them do make me a bit twitchy. Like the currently proposed Collection of Nationality Data Bill that will “require the collection and publication of information relating to the nationality of those in receipt of benefits and of those to whom national insurance numbers are issued.” Or the Face Coverings (Prohibition) Bill 2013-14, whereby “a person wearing a garment or other object intended by the wearer as its primary purpose to obscure the face in a public place shall be guilty of an offence.” As discussions regarding privacy and anonymity on the web ebb and flow, it’s interesting to see how they’re tracked “IRL”. If a space is public, do you have any right to privacy or anonymity?
- ESRC Pre-call: Business and Local Government Data Research Centres – Big Data Network Phase 2:
The ESRCs Big Data Network will support the development of a network of innovative investments which will strengthen the UKs competitive advantage in Big Data. The core aim of this network is to facilitate access to different types of data and thereby stimulate innovative research and develop new methods to undertake that research. This network has been divided into three phases.
- Phase 1 of the Big Data Network the ESRC has invested in the development of the Administrative Data Research Network (ADRN) which will provide access to de-identified administrative data collected by government departments for research use
- Phase 2, which is the focus of this pre-announcement, will focus primarily on business data and local government data
- Phase 3, further details of which will be released in the Autumn, will focus primarily on third sector data and social media data
- Progress continues on the smart meter roll out program, with huge chunks of money being lined up for a few lucky companies (Government Selects Favourites For The Smart Meter Roll-Out). See also the Energy and Climate Change Select Committee inquiry – “Smart meter roll-out” and their Smart meter roll out report. Whilst the drivers are presumably supposedly related to more efficient energy management, there are plenty of surveillance opportunities arising! Whilst not public data, as such, the availability (and sharing with data aggregators) of smart meter data does form part of the government’s #midata programme (around which the current strategy appears to be “the less said the better”…)
- Maybe of interest to hardcore openspending data geeks, Local Audit and Accountability Bill 2013-14 has made its way from the Lords into the Commons. Schedule 9 introduces regulations around data matching, described as “an exercise involving the comparison of sets of data to determine how far they match (including the identification of any patterns and trends)”, although “data matching exercise[s] may not be used to identify patterns and trends in an individual’s characteristics or behaviour which suggest nothing more than the individual’s potential to commit fraud in the future”. A code of practice is also required. The power “is exercisable for the purpose of assisting in the prevention and detection of fraud” although the schedule may be amended in order to assist: “a) in the prevention and detection of crime (other than fraud), (b) in the apprehension and prosecution of offenders, and (c) in the recovery of debt owing to public bodies”.
Schedule 11 covers the Disclosure of Information. Where an auditor obtains information from a public body “[a] local auditor, or a person acting on the auditor’s behalf, may also disclose information to which this Schedule applies except where the disclosure would, or would be likely to, prejudice the effective performance of a function imposed or conferred on the auditor by or under an enactment”. I’m not sure to what extent such information might be requestable from the local auditor though?
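For what it’s worth, the bare bones of a data matching exercise in the Schedule 9 sense (“the comparison of sets of data to determine how far they match”) might look something like the following sketch. The two lists of reference numbers are hypothetical, and a real exercise would need fuzzier matching than exact key equality.

```python
def match_report(set_a, set_b):
    """Summarise how far two collections of record keys match."""
    a, b = set(set_a), set(set_b)
    both = a & b
    union = a | b
    return {
        "in_both": sorted(both),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Share of all distinct keys that appear in both datasets
        "overlap": len(both) / len(union) if union else 0.0,
    }


# Hypothetical reference numbers from two public body datasets
payroll = ["A100", "A101", "A102"]
housing_benefit = ["A101", "A102", "A200"]

report = match_report(payroll, housing_benefit)
print(report["in_both"], report["overlap"])
```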
I have to admit, I’m losing track of all these data and information related laws. And I guess I should also admit that I don’t really understand what any of them actually mean, either…!;-)
A jumbled collection of recent clips and snippets, that feel to me as if they’re pieces of the same jigsaw…
- An article in The Atlantic on Obscurity: A Better Way to Think About Your Data Than ‘Privacy’:
…”privacy” is an over-extended concept. It grabs our attention easily, but is hard to pin down. Sometimes, people talk about privacy when they are worried about confidentiality. Other times they evoke privacy to discuss issues associated with corporate access to personal information. Fortunately, obscurity has a narrower purview.
Obscurity is the idea that when information is hard to obtain or understand, it is, to some degree, safe. Safety, here, doesn’t mean inaccessible. Competent and determined data hunters armed with the right tools can always find a way to get it. Less committed folks, however, experience great effort as a deterrent.
This can be a useful distinction to make, I think, when considering the uses to which “personal data” is, or can be, put. Obscure things are hard to find. Just because a dataset is “anonymised” doesn’t mean that a determined data hunter (DDH) won’t be able to deanonymise elements of it.
For a linked take in defense of privacy (from which we can maybe identify useful attributes associated with the notion of privacy), see Privacy is not the enemy – rebooted… Paul Bernal.
- Overt camera surveillance (cameras in carparks, shops and town centres, for example, or ANPR (Automated Number Plate Recognition) cameras in petrol station forecourts and again, in car parks) is presumably deployed to dissuade people from performing particular acts by making it known to them that if they engage in those acts they will be held accountable for them. If we pick this apart a little, CCTV surveillance can operate in two modes: 1) identifying particular actions and then (maybe) taking steps to prevent their furtherance; 2) identifying people captured in the video. Whilst the aim of (2) may be to identify people involved in (1), (2) may also be used to identify and track people in general, irrespective of the actions they are performing. A currently open Home Office Surveillance camera code of practice consultation gives some background to what is deemed to be acceptable use of, and controls on, the use of overt camera surveillance, although it does not seem to explore any possible “evil consequences” of such technology. I’m not sure whether it covers the use of drone-based surveillance either?!
A wider review of surveillance systems can be found in an EU Seventh Framework Programme report – IRISS (Increasing Resilience in Surveillance Societies) Deliverable D1.1: Surveillance, fighting crime and violence.
- Another key ingredient in the management of privacy and obscurity is the notion of identity and identities. UKGov has been considering “identity” in two different ways recently:
- The BIS Foresight project on Future Identities/The Future of Identity reviews different notions of identity (where identity is “the sum of those characteristics which determine who a person is”) and the different identities we may express:
This Foresight Report provides an evidence base for decision makers in government to understand better how people’s identities in the UK might change over the next 10 years. The impact of new technologies and increasing uptake of online social media, the effects of globalisation, environmental disruption, the consequences of the economic downturn, and growing social and cultural diversity are all important drivers of change for identities. However, there is a gap in understanding how identities might change in the future, and how policy makers might respond to such change.
- When working with services online, we’re all familiar with the notion of having different login identities with different services. When working with government services, there may be a requirement to ensure that a given user login identity actually relates to a particular person. The DWP Identity Assurance Scheme seems to be working with commercial providers (Post Office, Cassidian, Digidentity, Experian, Ingeus, Mydex, Verizon, PayPal) to establish an “identity registration service [that] will enable benefit claimants to choose who will validate their identity by automatically checking their authenticity with the provider before processing online benefit claims”. Whatever that is supposed to mean. Does it mean when I create a DWP login I can use my PayPal credentials to prove to DWP who I am? Or does it mean I’ll be able to log in to DWP services using my PayPal credentials? I couldn’t find anything related in a quick skim of the DWP Digital Strategy on this? Are there any good references out there? (UPDATE – ah, this ComputerWeekly report suggests the identity providers will do verification and manage logins – not sure if those logins will be unique to accessing DWP/gov.uk services, though, or whether they would also access eg my PayPal account?)
See also the Open Identity Exchange, a scheme for building trusted relationships between online identity providers on a global scale…
- A recent report from the Administrative Data Taskforce – Improving Access for Research and Policy – provides a series of recommendations for establishing a research network for analysing and linking administrative datasets. Among other things, the report suggests the following model for “de-identifying” linked datasets:
Here’s a sample of some of the other sorts of things the ADT recommended:
- R1.1 The ADRCs will be responsible for commissioning and undertaking linkage of data from different government departments and making the linked data available for analysis, thereby creating new resources for a growing research agenda. Analyses of within sector data (e.g. linking medical records between primary and secondary care) and linking of data between departments for operational purposes may continue to be conducted by the relevant government departments and agencies.
- R1.3 Personal identifiers (names, addresses, precise date of birth, national insurance numbers, etc.) attached to administrative data records will not be available to, or held in, the ADRCs; hence, both ADRC staff and researchers accessing data through ADRCs will not have sight of such personal identifying information. Linkage will be achieved through the use of third parties who have the expertise to provide secure data linkage services for matching personal records from existing data systems.
- R1.6 Access to data held in the ADRCs by accredited researchers will be possible using three approaches. For all of these, no individual-level records will be released from the ADRCs. First, researchers can visit the ADRC secure data access facility, where their analyses of the relevant data sub-set will be overseen by the ADRC support team. Second, researchers can submit statistical syntax to the ADRC support team who will run the analysis on the dataset on behalf of the researcher (results would be thoroughly checked before return). Third, remote secure data access facilities may be established which allow virtual access to datasets held in the ADRCs. With the latter approach, no data would be transferred to these remote safe settings, which would use state-of-the-art technologies and apply rigorous international standards, equivalent to those used in the ADRCs themselves, to provide a secure environment for researchers to undertake their analyses.
- R1.11 … However, the Taskforce recognises that there could well be potential benefits that derive from private sector data and related research interests. The Governing Board will, at an early stage, investigate guidelines for access and linkage by private sector interests, …
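By way of illustration, here’s a minimal sketch of how the sort of third-party linkage described in R1.3 might work. To be clear, this is my own guess at one common approach (replacing a personal identifier with a keyed hash), not the model the ADT report actually describes. The secret key, the field names and the example records are all made up for the purposes of the example.

```python
import hashlib
import hmac

# Hypothetical secret, assumed to be held only by the third-party linkage service.
LINKAGE_KEY = b"held-only-by-the-linkage-service"


def link_key(national_insurance_number: str) -> str:
    """Derive a pseudonymous link key from a personal identifier."""
    normalised = national_insurance_number.replace(" ", "").upper()
    return hmac.new(LINKAGE_KEY, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(records, id_field, drop_fields):
    """Replace the identifier with a link key and strip other personal fields."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k != id_field and k not in drop_fields}
        row["link_key"] = link_key(record[id_field])
        cleaned.append(row)
    return cleaned


def link(left, right):
    """Join two de-identified datasets on their shared link key."""
    right_by_key = {row["link_key"]: row for row in right}
    return [
        {**row, **right_by_key[row["link_key"]]}
        for row in left
        if row["link_key"] in right_by_key
    ]


# Made-up records from two imaginary departments.
benefits = [{"nino": "QQ 12 34 56 C", "name": "A. Person", "benefit": "JSA"}]
education = [{"nino": "qq123456c", "name": "A Person", "qualification": "NVQ2"}]

# The linkage service de-identifies each dataset, then joins them;
# only the linked, de-identified rows would be passed on to a research centre.
linked = link(
    deidentify(benefits, "nino", {"name"}),
    deidentify(education, "nino", {"name"}),
)
```

The point of the sketch is that whoever receives `linked` gets joined-up records without ever seeing a name or national insurance number, which is the property R1.3 seems to be after. Whether a keyed hash is strong enough in practice is a separate question.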
- I haven’t had a chance to read this yet, but the World Economic Forum (WEF) have just published a report on Rethinking Personal Data.
In the UK, the #midata route to encouraging folk to hand over access to their personal transaction data associated with a company to other data processing and aggregation services continues apace with a set of clauses added to the Enterprise & Regulatory Reform Bill – Midata.
In the US, the related notion of Smart Disclosure is being pursued – “an innovative new tool designed to help consumers make better informed decisions and benefit from new products and services powered by data. It refers to expanding access to data in machine-readable formats so that innovators can create interactive services and tools that allow consumers to make important choices in sectors such as health care, education, finance, energy, transportation, and telecommunications.” Because of course “Giving consumers access to their own data—with comprehensive privacy and security safeguards—can empower consumers to make better choices.” Which is to say – if you give access to your data to a third party, they can use that, in combination with other data, to recommend services to you.
So – that’s a quick round-up of recent reports I’m aware of. Have I missed any?
A couple of weeks ago, I gave a presentation to the WebScience students at the University of Southampton on the topic of open data, using it as an opportunity to rehearse a view of open data based on the premise that it starts out closed. In much the same way that Darwin’s Theory of Evolution by Natural Selection is based on a major presupposition, specifically a theory of inheritance and the existence of processes that support reproduction with minor variation, so too does much of our thinking about open data derive from the presupposed fact that many of the freedoms we associate with the use of open data in legal terms arise from license conditions that the “owner” of the data awards to us.
Viewing data in this light, we might start by considering what constitutes “closed” data and how it comes to be so, before identifying the means by which freedoms are granted and the data is opened up. (Sometimes it can also be easier to consider what you can’t do than what you can, especially when answers to questions such as “so what can you actually do with open data?” attract the (rather meaningless) response: “anything”. We can then contrast what you can do in terms of freedom complementary to what you can’t…)
So how can data be “closed”?
One lens I particularly like for considering the constraints that are placed on actions and actors, particularly in the digital world (although we can apply the model elsewhere), is one I first saw described by Lawrence Lessig in Code and Other Laws of Cyberspace: What Things Regulate: A Dot’s Life.
Here’s the dot and the forces that constrain its behaviour:
So we see, for example, the force of law, social norms, the market (that is, economic forces) and architecture, that is the “digital physical” way the world is implemented. (Architecture may of course be designed in order to enforce particular laws, but it is likely that other “natural laws” will arise as a result of any particular architecture or system implementation.)
Without too much thought, we might identify some constraints around data and its use under each of these separate lenses. For example:
- Law: copyright and database right grant the creator of a dataset certain protective rights over that data; data protection laws (and other “privacy laws”) limit access to, or disclosure of, data that contains personal information, as well as restricting the use of that data to the purposes disclosed at the time it was collected. The UK Data Protection Act also underwrites the right of individuals to claim additional limits on data use, for example the rights “to object to processing that is likely to cause or is causing damage or distress; to prevent processing for direct marketing; to object to decisions being taken by automated means” (ICO Guide to the DPA, Principle 6 – The rights of individuals).
- Norms: social mores, behaviour and taboos limit the ways in which we might use data, even if that use is not constrained by legal, economic or technical concerns. For example, applications that invite people to “burgle my house” based on analysing social network data to discover when they are likely to be away from home and what sorts of valuable product might be on the premises are generally not welcomed. Norms of behaviour and everyday work practice also mean that much data is not published when there are no real reasons why it couldn’t be.
- Market: in the simplest case, charging for access to data places a constraint on who can gain access to the data even in advance of trying to make use of it. If we extend “market” to cover other financial constraints, there may be a cost associated with preparing data so that it can be openly released.
- Architecture: technical constraints can restrict what you can do with data. Digital rights management (DRM) uses encryption to render data streams unusable to all but the intended client, but more prosaically, document formats such as PDF, or the “release” of data charts as flat image files, make it difficult for the end user to manipulate as data any data resources contained in those documents.
Laws can also be used to grant freedoms where freedoms are otherwise restricted. For example:
- the Freedom of Information Act (FOI) provides a mechanism for requesting copies of datasets from public bodies; in addition, the Environmental Information Regulations “provide public access to environmental information held by public authorities”.
- the laws around copyright relax certain copyright constraints for the purposes of criticism and review, reporting, research, teaching (IPO – Permitted uses of copyright works);
- in the UK, the Data Protection Act provides for “a right of access to a copy of the information comprised in their personal data” (ICO Guide to the DPA, Principle 6).
- in the UK, the Data Protection Act regulates what can be done legitimately with “personal” data. However, other pieces of legislation relax confidentiality requirements when it comes to sharing data for research purposes. For example:
- the NHS Act s. 251 Control of patient information; for example, the Secretary of State for Health may “make regulations to set aside the common law duty of confidentiality for medical purposes where it is not possible to use anonymised information and where seeking individual consent is not practicable” (discussion). Note that there are changes afoot regarding s. 251…
- The Secretary of State for Education has specific powers to share pupil data from the National Pupil Database (NPD) “with named bodies and third parties who require access to the data to undertake research into the educational achievements of pupils”. The NPD “tracks a pupil’s progress through schools and colleges in the state sector, using pupil census and exam information. Individual pupil level attainment data is also included (where available) for pupils in non-maintained and independent schools” (access arrangements).
- the Enterprise and Regulatory Reform Bill currently making its way through Parliament legislates around the Supply of Customer Data (the “#midata” clauses) which is intended to open up access to customer transaction data from suppliers of energy, financial services and mobile phones “(a) to a customer, at the customer’s request; (b) to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.” Although proclaimed as a way of opening up individual rights to access this data, the effect will more likely see third parties enticing individuals to authorise the release to the third party of the individual first party’s personal transaction data held by a second party (for example, #Midata Is Intended to Benefit Whom, Exactly?). (So you’ll presumably legally be able to grant Facebook access to your mobile phone records… Or Facebook will find a way of getting you to release that data to them without you realising you granted them that permission;-)
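To make the “at whose request?” wording of that last clause a bit more concrete, here’s a rough sketch of the release rule as I read it. This is my own lay reading of the quoted text, not anything taken from the Bill or any regulations; the `Customer` class, the `may_release` function, the flag name and the company name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    # Third parties this customer has authorised to receive their data.
    authorised: set = field(default_factory=set)


def may_release(customer, recipient, requested_by, regs_allow_third_party_requests=False):
    """Return True if the quoted wording would appear to permit the release."""
    if recipient == customer.name:
        # (a) to a customer, at the customer's request
        return requested_by == customer.name
    if recipient in customer.authorised:
        # (b) to an authorised person, at the customer's request...
        if requested_by == customer.name:
            return True
        # ...or, if the regulations so provide, at the authorised person's request
        return regs_allow_third_party_requests and requested_by == recipient
    # Anyone the customer has not authorised gets nothing.
    return False


alice = Customer("Alice", authorised={"PriceCompareCo"})
```

The interesting case is the flag: with `regs_allow_third_party_requests=True`, an authorised third party can pull the data itself once the customer has (knowingly or otherwise) ticked the authorisation box.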
Contracts (which I guess fall somewhere between norms and laws from the dot’s perspective (I need to read that section of Lessig’s book again!)) can also be used by rights holders to grant freedoms over the data they hold the rights for. For example, the Creative Commons licensing framework provides a copyright holder with a set of tools for relaxing some of the rights afforded to them by copyright when they license the work accordingly.
Note that “I am not a lawyer”, so my understanding of all this is pretty hazy;-) I also wonder how the various pieces of legislation interact, and whether there are cracks and possible inconsistencies between them? If there are pieces of legislation around the regulation and use of data that I’m missing, please post links in the comments below, and I’ll try and do a more thorough round-up in a follow-on post.