A Nudge Here, A Nudge There, But With Meaning…

A handful of posts caught my attention yesterday around the whole data thang…

First up, a quote on the New Aesthetic blog: “the state-of-the-art method for shaping ideas is not to coerce overtly but to seduce covertly, from a foundation of knowledge”, referencing an article on Medium: Is the Internet good or bad? Yes. The quote includes mention of an Adweek article (this one? Marketers Should Take Note of When Women Feel Least Attractive; see also a response and the original press release) that “noted that women feel less attractive on Mondays, and that this might be the best time to advertise make-up to them.”

I took this as a cautionary tale about the way in which “big data” qua theoryless statistical models based on the uncontrolled, if large, samples that make up “found” datasets (to pick up on a phrase used by Tim Harford in Big data: are we making a big mistake? [h/t @schmerg et al]) can be used to malevolent effect. (Thanks to @devonwalshe for highlighting that it’s not the data we should blame (“the data itself has no agency, so a little pointless to blame … Just sensitive to tech fear. Shifts blame from people to things.”) but the motivations and actions of the people who make use of the data.)

Which is to say – there’s ethics involved. As an extreme example, consider the possible “weaponisation” of data, for instance in the context of PSYOP – “psychological operations” (are they still called that?). As the New Aesthetic quote, and the full Medium article itself, explain, the way in which data models allow messages to be shaped, targeted and tailored provides companies and politicians with a form of soft power that encourages us “to click, willingly, on a choice that has been engineered for us”. (This unpicks further – not only are we modelled so that the prompts are issued to us at an opportune time, but the choices we are provided with may also have been identified algorithmically.)

So that’s one thing…

Around about the same time, I also spotted a news announcement that Dunnhumby – an early bellwether of how to make the most of #midata consumer data – has bought “advertising technology” firm Sociomantic (press release): “dunnhumby will combine its extensive insights on the shopping preferences of 400 million consumers with Sociomantic’s intelligent digital-advertising technology and real-time data from more than 700 million online consumers to dramatically improve how advertising is planned, personalized and evaluated. For the first time, marketing content can be dynamically created specifically for an individual in real-time based on their interests and shopping preferences, and delivered across online media and mobile devices.” Good, oh…

A post on the Dunnhumby blog (It’s Time to Revolutionise Digital Advertising) provides further insight about what we might expect next:

We have decided to buy the company because the combination of Sociomantic’s technological capability and dunnhumby’s insight from 430m shoppers worldwide will create a new opportunity to make the online experience a lot better, because for the first time we will be able to make online content personalised for people, based on what they actually like, want and need. It is what we have been doing with loyalty programs and personalised offers for years – done with scale and speed in the digital world.

So what will we actually do to make that online experience better for customers? First, because we know our customers, what they see will be relevant and based on who they are, what they are interested in and what they shop for. It’s the same insight that powers Clubcard vouchers in the UK which are tailored to what customers shop for both online and in-store. Second, because we understand what customers actually buy online or in-store, we can tell advertisers how advertising needs to change and how they can help customers with information they value. Of course there is a clear benefit to advertisers, because they can spend their budgets only where they are talking to the right audience in the right way with the right content at the right time, measuring what works, what doesn’t and taking out a lot of guesswork. The real benefit though must be to customers whose online experience will get richer, simpler and more enjoyable. The free internet content we enjoy today is paid for by advertising, we just want to make it advertising offers and content you will enjoy too.

This needs probing further – are Dunnhumby proposing merging data about actual shopping habits in physical and online stores with user cookies so that ads can be served based on actual consumption? (See for example Centralising User Tracking on the Web. How far has this got, I wonder? Seems like it may be here on mobile devices? Google’s New ‘Advertising ID’ Is Now Live And Tracking Android Phones — This Is What It Looks Like. Here are the Android developer docs on Advertising ID. See also GigaOm on As advertisers phase out cookies, what’s the alternative?, eg in the context of “known identifiers” (like email addresses and usernames) and “stable identifiers” (persistent device or browser level identifiers).)
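To make that question concrete, here is a minimal sketch of how purchase data might be joined to an advertising identifier via a “known identifier” such as an email address. Everything in it is hypothetical: the records, the card-to-email mapping, the assumption that an ad platform has seen the same email next to a cookie or mobile advertising ID, and the “top category” segmentation rule. It is not a description of what Dunnhumby or Sociomantic actually do.

```python
from collections import Counter

# Hypothetical retailer data: loyalty card -> product categories bought.
purchases = {
    "card-001": ["nappies", "nappies", "coffee"],
    "card-002": ["dog food", "coffee", "coffee"],
}

# Hypothetical mapping from loyalty card to the email it was registered with.
card_email = {"card-001": "alice@example.com", "card-002": " Bob@Example.com"}

# Hypothetical ad-platform data: a known identifier (email) observed alongside
# a stable identifier (a browser cookie or mobile advertising ID).
email_to_ad_id = {"alice@example.com": "ad-7f3a", "bob@example.com": "ad-91c2"}


def normalise(email):
    """Trim whitespace and lower-case so both parties' copies of an email match."""
    return email.strip().lower()


def ad_segments(purchases, card_email, email_to_ad_id):
    """Return {ad_id: most frequently bought category} for shoppers found on both sides."""
    lookup = {normalise(e): ad_id for e, ad_id in email_to_ad_id.items()}
    segments = {}
    for card, items in purchases.items():
        email = card_email.get(card)
        if email is None or not items:
            continue  # no email on file, or nothing bought
        ad_id = lookup.get(normalise(email))
        if ad_id is None:
            continue  # shopper never seen by the ad platform
        top_category, _count = Counter(items).most_common(1)[0]
        segments[ad_id] = top_category
    return segments


print(ad_segments(purchases, card_email, email_to_ad_id))
```

The only step that links the two worlds is the normalised email: once it matches, in-store buying habits become attributes of an online advertising ID.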

That’s the second thing…

For some reason, it’s all starting to make me think of supersaturated solutions…

PS FWIW, the OU/BBC co-produced Bang Goes the Theory (BBC1) had a “Big Data” episode recently – depending on when you read this, you may still be able to watch it here: Bang Goes the Theory – Series 8 – Episode 3: Big Data

More Digital Traces…

Via @wilm, I notice that it’s time again for someone (this time at the Wall Street Journal) to have written about the scariness that is your Google personal web history (the sort of thing you probably have to opt out of if you sign up for a new Google account, if other recent opt-in-by-default settings are anything to go by…)

It may not sound like much, but if you do have a Google account, and your web history collection is not disabled, you may find your emotional response to seeing months or years of your web/search history archived in one place surprising… Your Google web history.

Not mentioned in the WSJ article were some of the games that the Chrome browser gets up to. @tim_hunt tipped me off to a nice (if technically detailed, in places) review by Ilya Grigorik of some of the design features of the Chrome browser, and some of the tools built into it: High Performance Networking in Chrome. I’ve got various pre-fetching tools switched off in my version of Chrome (tools that allow Chrome to pre-emptively look up web addresses and even download pages pre-emptively*) so those tools didn’t work for me… but looking at chrome://predictors/ was interesting: it shows which of the keystrokes I type are good predictors of the web pages I visit…

chrome predictors
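As far as I can tell, the chrome://predictors/ page lists, for text typed into the address bar, the URLs that were eventually visited, along with hit and miss counts and a confidence score. A toy model of that idea, assuming confidence is simply hits / (hits + misses); that is my reading of the page, not something confirmed from Chrome's source, and the class and method names are hypothetical.

```python
from collections import defaultdict


class TypedTextPredictor:
    """Toy 'typed text -> URL' predictor (hypothetical; not Chrome's implementation)."""

    def __init__(self):
        # (typed_text, url) -> [hits, misses]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, typed_text, suggested_url, visited_url):
        """Log one address-bar interaction: did the suggestion match where we went?"""
        tally = self.counts[(typed_text, suggested_url)]
        if suggested_url == visited_url:
            tally[0] += 1
        else:
            tally[1] += 1

    def confidence(self, typed_text, url):
        """Assumed scoring rule: hits / (hits + misses), or 0.0 if never seen."""
        hits, misses = self.counts.get((typed_text, url), (0, 0))
        total = hits + misses
        return hits / total if total else 0.0

    def predict(self, typed_text, threshold=0.5):
        """Return the most confident URL for this text, if it clears the threshold."""
        scored = [
            (self.confidence(text, url), url)
            for (text, url) in self.counts
            if text == typed_text
        ]
        best_score, best_url = max(scored, default=(0.0, None))
        return best_url if best_url is not None and best_score >= threshold else None


predictor = TypedTextPredictor()
for _ in range(3):
    predictor.record("bb", "https://www.bbc.co.uk/", "https://www.bbc.co.uk/")
predictor.record("bb", "https://www.bbc.co.uk/", "https://www.bbc.com/")
print(predictor.confidence("bb", "https://www.bbc.co.uk/"))  # 0.75
print(predictor.predict("bb"))
```

A browser holding a table like this could start fetching the predicted page as soon as the confidence clears some threshold, which is where the prefetching question in the footnote comes from.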

* By the by, I started to wonder whether webstats get messed up to any significant effect by Chrome pre-emptively prefetching pages that folk never actually look at…?
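One way to start answering that is to look for prefetch hints in server logs. Browsers have at various times marked speculative requests with headers such as Purpose: prefetch (Chrome), Sec-Purpose (newer Chromium), X-moz: prefetch (Firefox) or X-Purpose: preview (Safari); the exact names and values below are assumptions to check against your own logs, and the log format is hypothetical.

```python
# Assumed header names/values that some browsers have used to flag speculative
# (prefetch/preview) requests; check these against your own server logs.
PREFETCH_HINTS = {
    "purpose": {"prefetch"},
    "sec-purpose": {"prefetch", "prefetch;prerender"},
    "x-moz": {"prefetch"},
    "x-purpose": {"preview"},
}


def is_speculative(headers):
    """True if any request header carries a known prefetch/preview hint."""
    for name, value in headers.items():
        hints = PREFETCH_HINTS.get(name.lower())
        if hints and value.strip().lower() in hints:
            return True
    return False


def count_views(log):
    """Return (all requests, requests not flagged as speculative)."""
    flagged = sum(1 for entry in log if is_speculative(entry["headers"]))
    return len(log), len(log) - flagged


# Hypothetical log entries, one dict per page request.
log = [
    {"path": "/", "headers": {"User-Agent": "Chrome"}},
    {"path": "/about", "headers": {"Purpose": "prefetch"}},
    {"path": "/blog", "headers": {"User-Agent": "Firefox"}},
]
print(count_views(log))
```

Simply discarding flagged requests may undercount too: if a prefetched page is later opened from the browser's cache, the server may never see a second request for it.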

In further relation to the tracking of traffic we generate from our browsing habits, as we access more and more web/internet services through satellite TV boxes, smart TVs, and catchup TV boxes such as Roku or NowTV, have you ever wondered how that activity is tracked? LG Smart TVs logging USB filenames and viewing info to LG servers describes how LG TVs appear to log not only the channels you watch but also the personal media you view, and how they can in principle phone that information home (because the home for your data is a database run by whatever service you happen to be using – your data is midata is their data).

there is an option in the system settings called “Collection of watching info:” which is set ON by default. This setting requires the user to scroll down to see it and, unlike most other settings, contains no “balloon help” to describe what it does.

At this point, I decided to do some traffic analysis to see what was being sent. It turns out that viewing information appears to be being sent regardless of whether this option is set to On or Off.

you can clearly see that a unique device ID is transmitted, along with the Channel name … and a unique device ID.

This information appears to be sent back unencrypted and in the clear to LG every time you change channel, even if you have gone to the trouble of changing the setting above to switch collection of viewing information off.

It was at this point, I made an even more disturbing find within the packet data dumps. I noticed filenames were being posted to LG’s servers and that these filenames were ones stored on my external USB hard drive.

Hmmm… maybe it’s time I switched out my BT homehub for a proper hardware firewalled router with a good set of logging tools…?

PS FWIW, I can’t really get my head round how evil on the one hand, or damp squib on the other, the whole midata thing is turning out to be in the short term, and what sorts of involvement – and data – the partners have with the project. I did notice that a midata innovation lab report has just become available, though to you and me it’ll cost 1500 squidlly diddlies so I haven’t read it: The midata Innovation Opportunity. Note to self: has anyone got any good stories to say about TSB supporting innovation in micro-businesses…?

PPS And finally, something else from the Ilya Grigorik article:

The HTTP Archive project tracks how the web is built, and it can help us answer this question. Instead of crawling the web for the content, it periodically crawls the most popular sites to record and aggregate analytics on the number of used resources, content types, headers, and other metadata for each individual destination. The stats, as of January 2013, may surprise you. An average page, amongst the top 300,000 destinations on the web is:

- 1280 KB in size
- composed of 88 resources
- connects to 15+ distinct hosts

Let that sink in. Over 1 MB in size on average, composed of 88 resources such as images, JavaScript, and CSS, and delivered from 15 different own and third-party hosts. Further, each of these numbers has been steadily increasing over the past few years, and there are no signs of stopping. We are increasingly building larger and more ambitious web applications.

Is it any wonder that pages take so long to load on a mobile phone off the 3G network, and that you can soon eat up your monthly bandwidth allowance!
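A rough back-of-the-envelope check, using the 1280 KB average page from the quote and two hypothetical figures of my own: a 500 MB monthly allowance and a 3G connection delivering about 1 Mbit/s (1024 kbit/s). It ignores the latency of fetching 88 resources from 15+ hosts, so real load times will be worse.

```python
AVERAGE_PAGE_KB = 1280  # HTTP Archive average page size quoted in the text (January 2013)


def pages_per_allowance(allowance_mb, page_kb=AVERAGE_PAGE_KB):
    """Whole average-sized pages that fit in a data allowance (1 MB = 1024 KB assumed)."""
    return (allowance_mb * 1024) // page_kb


def transfer_seconds(page_kb=AVERAGE_PAGE_KB, link_kbps=1024):
    """Lower bound on download time: payload bits divided by link rate, ignoring latency."""
    return page_kb * 8 / link_kbps


print(pages_per_allowance(500))  # 400
print(transfer_seconds())  # 10.0
```

So, on those assumptions, a 500 MB allowance covers roughly 400 average pages a month (about 13 a day), each taking at least ten seconds to arrive.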

Tin Foil Hats or Baseball Caps? Why Your Face is a Cookie and Your Data is midata

Over the weekend, chatting with friends, I heard myself going off on what I imagine sounded like a paranoid-fantasy-fuelled privacy rant. But it stems from my own confusion about what it means for so much data to be out there about us, and whether the paranoid fantasy bit actually relates to:

- the extent to which folk would want to collect and process that data, and use it “against” me, as an individual;
- the extent to which data from disparate sources can be reconciled;
- the idea that all manner and variety of data about me is being collected anyway;
- the fact that all manner and variety of data about me could in principle be being collected.
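On the second point, reconciling data from disparate sources doesn't need a shared ID number. A minimal sketch of a classic linkage attack: an “anonymised” dataset with names removed is joined to a second dataset that does carry names, using postcode, date of birth and sex as a combined key. All records, field names and the choice of quasi-identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical "anonymised" dataset: no names, but quasi-identifiers remain.
anonymised = [
    {"postcode": "AB1 2CD", "dob": "1970-03-14", "sex": "M", "loan_declined": True},
    {"postcode": "AB1 2CD", "dob": "1982-11-02", "sex": "F", "loan_declined": False},
]

# Hypothetical public or purchased dataset that does carry names.
public = [
    {"name": "A. Person", "postcode": "AB1 2CD", "dob": "1970-03-14", "sex": "M"},
    {"name": "B. Person", "postcode": "AB1 2CD", "dob": "1982-11-02", "sex": "F"},
    {"name": "C. Person", "postcode": "AB1 2CD", "dob": "1982-11-02", "sex": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "dob", "sex")


def key(record):
    """Combine the quasi-identifiers into a single join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised, public):
    """Map each anonymised record's index to the names sharing its quasi-identifiers."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[key(person)].append(person["name"])
    return {i: names_by_key.get(key(rec), []) for i, rec in enumerate(anonymised)}


print(reidentify(anonymised, public))
```

A unique match re-identifies a record outright; an ambiguous one still narrows the field to a handful of people.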

So here are some more bits and pieces…

We all know that Tesco pioneered the use of loyalty cards for personalised customer marketing and store optimisation (eg The Tesco Data Business (Notes on “Scoring Points”)) and maybe that they track you round a store (or do they track your face?!), and now it seems that as well as supplementing their petrol stations with ANPR (Automatic Number Plate Recognition) systems (I assume their garages are equipped with them? Some of their car parks are…) they’ll be using face scanning Amscreen Point of Sale advertising screens to profile folk based on gender and age. (It’s possibly just easier to recognise someone by their face or phone and then look up their gender and age; and economic circumstances; and etc etc?!)

Adrian Short has some further comments here… When does face scanning tip over into the full-time surveillance society?

See the ad? Face recognition as a commodity service?

I don’t really know how concerning this is – folk I meet regularly recognise me, so what does it matter if machines universally and ubiquitously recognise me? Should I be concerned that my face is essentially a third-party cookie, at least for unique ID purposes, that can be identified by anyone whose servers hook into a particular video or image feed?

And presumably things like my payment cards, and car number plate, and postcode, and etc etc can effectively be treated as third-party cookies too, for unique or group identification? (What should we call such things? I, me, my cookies…? icookies?! Or to tie into the notion of #midata, micookies?)

And should I be fearful that such companies buy and sell data about me via ad exchanges and cookie matching services?
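Cookie matching works roughly like this: when a browser carrying ad system A's cookie is redirected to ad system B, B can read its own cookie and receive A's ID in the redirect URL, and so records that the two IDs belong to the same browser. A minimal sketch of the resulting match table and the profile merge it allows; the systems, IDs and profile attributes are all hypothetical.

```python
def build_match_table(sync_events):
    """Turn observed (system_a_id, system_b_id) sync pairs into a lookup table."""
    return {a_id: b_id for a_id, b_id in sync_events}


def merge_profiles(match_table, profiles_a, profiles_b):
    """Combine what each system knows about a browser, keyed on system A's ID."""
    merged = {}
    for a_id, attributes in profiles_a.items():
        b_id = match_table.get(a_id)
        if b_id in profiles_b:
            merged[a_id] = {**attributes, **profiles_b[b_id]}
    return merged


# Hypothetical data held separately by two ad systems.
profiles_a = {
    "a-123": {"age_band": "45-54", "loyalty_segment": "family shopper"},
    "a-456": {"age_band": "25-34"},
}
profiles_b = {"b-9": {"recent_search": "short-term loan"}}

# One redirect-based sync has been observed, linking a-123 to b-9.
table = build_match_table([("a-123", "b-9")])
print(merge_profiles(table, profiles_a, profiles_b))
```

Neither system needs to know who the person is; the match table alone is enough to pool what both have observed about the same browser.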

Surely companies using #midata can help me make better decisions, nudging me into taking courses of action that are good for me?

Food hygiene rating

So should we care? Should we care what data’s out there in the wild about me? Should we care that a shedload of #midata may actually be publicly available data, not least through cookie tracking, and micookie traces?

Should we care that services like Wonga.com may be making use of that data to make decisions about me, as described in Leaky data: How Wonga makes lending decisions (read it, it’s an interesting read…).

And should we care that the decisions made on the basis of such publicly available but who knows what data are probably so algorithmically complex that there is no transparency or rationale in how or why such decisions are actually made the way they are? (See for example Transparent Predictions, Tal Zarsky, University of Illinois Law Review, Vol. 2013, No. 4, 2013.)

Not paranoid, just confused, and not really able to think any of this through…

PS An example of where Facebook’s at wrt automated face recognition around the end of 2013: DeepFace: Closing the Gap to Human-Level Performance in Face Verification

The Loss of Obscurity – A Round-Up of Recent Reports Relating to Privacy and Personal Consumer Data

A jumbled collection of recent clips and snippets, that feel to me as if they’re pieces of the same jigsaw…

  • An article in The Atlantic on Obscurity: A Better Way to Think About Your Data Than ‘Privacy’:

    …”privacy” is an over-extended concept. It grabs our attention easily, but is hard to pin down. Sometimes, people talk about privacy when they are worried about confidentiality. Other times they evoke privacy to discuss issues associated with corporate access to personal information. Fortunately, obscurity has a narrower purview.

    Obscurity is the idea that when information is hard to obtain or understand, it is, to some degree, safe. Safety, here, doesn’t mean inaccessible. Competent and determined data hunters armed with the right tools can always find a way to get it. Less committed folks, however, experience great effort as a deterrent.

    This can be a useful distinction to make, I think, when considering the uses to which “personal data” is, or can be, put. Obscure things are hard to find. Just because a dataset is “anonymised” doesn’t mean that a determined data hunter (DDH) won’t be able to deanonymise elements of it.

    Related to obscurity is obfuscation – encoding things in such a way that, while accepting that the information contained in the dataset is open, you do your damnedest to make it deliberately difficult for people to extract certain meaningful elements from it. (For example, How can I obfuscate JavaScript?.) Looking at the way many open public datasets are published, you might think an obfuscation step had been built in to the publication process;-)

    For a linked take in defense of privacy (from which we can maybe identify useful attributes associated with the notion of privacy), see Privacy is not the enemy – rebooted… Paul Bernal.

  • Overt camera surveillance (cameras in carparks, shops and town centres, for example, or ANPR (Automated Number Plate Recognition) cameras in petrol station forecourts and again, in car parks) is presumably deployed to dissuade people from performing particular acts by making it known to them that if they engage in those acts they will be held accountable for them. If we pick this apart a little, CCTV surveillance can operate in two modes: 1) identifying particular actions and then (maybe) taking steps to prevent their furtherance; 2) identifying people captured in the video. Whilst the aim of (2) may be to identify people involved in (1), (2) may also be used to identify and track people in general, irrespective of the actions they are performing. A currently open Home Office Surveillance camera code of practice consultation gives some background on what is deemed acceptable use of overt camera surveillance, and on the controls placed on it, although it does not seem to explore any possible “evil consequences” of such technology. I’m not sure whether it covers the use of drone-based surveillance either?!

    A wider review of surveillance systems can be found in an EU Seventh Framework Programme report – IRISS (Increasing Resilience in Surveillance Societies) Deliverable D1.1: Surveillance, fighting crime and violence.

  • Another key ingredient in the management of privacy and obscurity is the notion of identity and identities. UKGov has been considering “identity” in two different ways recently:
    • The BIS Foresight project on Future Identities/The Future of Identity reviews different notions of identity (where identity is “the sum of those characteristics which determine who a person is”) and the different identities we may express:

      This Foresight Report provides an evidence base for decision makers in government to understand better how people’s identities in the UK might change over the next 10 years. The impact of new technologies and increasing uptake of online social media, the effects of globalisation, environmental disruption, the consequences of the economic downturn, and growing social and cultural diversity are all important drivers of change for identities. However, there is a gap in understanding how identities might change in the future, and how policy makers might respond to such change.

    • When working with services online, we’re all familiar with the notion of having different login identities with different services. When working with government services, there may be a requirement to ensure that a given user login identity actually relates to a particular person. The DWP Identity Assurance Scheme seems to be working with commercial providers (Post Office, Cassidian, Digidentity, Experian, Ingeus, Mydex, Verizon, PayPal) to establish an “identity registration service [that] will enable benefit claimants to choose who will validate their identity by automatically checking their authenticity with the provider before processing online benefit claims”. Whatever that is supposed to mean. Does it mean when I create a DWP login I can use my PayPal credentials to prove to DWP who I am? Or does it mean I’ll be able to log in to DWP services using my PayPal credentials? I couldn’t find anything related in a quick skim of the DWP Digital Strategy on this. Are there any good references out there? (UPDATE – ah, this ComputerWeekly report suggests the identity providers will do verification and manage logins – not sure if those logins will be unique to accessing DWP/gov.uk services, though, or whether they would also access eg my PayPal account?)

      See also the Open Identity Exchange, a scheme for building trusted relationships between online identity providers on a global scale…

  • A recent report from the Administrative Data Taskforce – Improving Access for Research and Policy – provides a series of recommendations for establishing a research network for analysing and linking administrative datasets. Among other things, the report suggests the following model for “de-identifying” linked datasets:

    ADT - de-identified record linkage

    Here’s a sample of some of the other sorts of things the ADT recommended:

    • R1.1 The ADRCs will be responsible for commissioning and undertaking linkage of data from different government departments and making the linked data available for analysis, thereby creating new resources for a growing research agenda. Analyses of within sector data (e.g. linking medical records between primary and secondary care) and linking of data between departments for operational purposes may continue to be conducted by the relevant government departments and agencies.
    • R1.3 Personal identifiers (names, addresses, precise date of birth, national insurance numbers, etc.) attached to administrative data records will not be available to, or held in, the ADRCs; hence, both ADRC staff and researchers accessing data through ADRCs will not have sight of such personal identifying information. Linkage will be achieved through the use of third parties who have the expertise to provide secure data linkage services for matching personal records from existing data systems.
    • R1.6 Access to data held in the ADRCs by accredited researchers will be possible using three approaches. For all of these, no individual-level records will be released from the ADRCs. First, researchers can visit the ADRC secure data access facility, where their analyses of the relevant data sub-set will be overseen by the ADRC support team. Second, researchers can submit statistical syntax to the ADRC support team who will run the analysis on the dataset on behalf of the researcher (results would be thoroughly checked before return). Third, remote secure data access facilities may be established which allow virtual access to datasets held in the ADRCs. With the latter approach, no data would be transferred to these remote safe settings, which would use state-of-the-art technologies and apply rigorous international standards, equivalent to those used in the ADRCs themselves, to provide a secure environment for researchers to undertake their analyses.
    • R1.11 … However, the Taskforce recognises that there could well be potential benefits that derive from private sector data and related research interests. The Governing Board will, at an early stage, investigate guidelines for access and linkage by private sector interests, …
  • I haven’t had a chance to read this yet, but the World Economic Forum (WEF) have just published a report on Rethinking Personal Data.

    In the UK, the #midata route to encouraging folk to hand over access to the personal transaction data that companies hold about them to other data processing and aggregation services continues apace, with a set of clauses added to the Enterprise & Regulatory Reform Bill – Midata.

    In the US, a related notion of Smart Disclosure is being pursued – “an innovative new tool designed to help consumers make better informed decisions and benefit from new products and services powered by data. It refers to expanding access to data in machine-readable formats so that innovators can create interactive services and tools that allow consumers to make important choices in sectors such as health care, education, finance, energy, transportation, and telecommunications.” Because of course “Giving consumers access to their own data—with comprehensive privacy and security safeguards—can empower consumers to make better choices.” Which is to say – if you give access to your data to a third party, they can use that, in combination with other data, to recommend services to you.
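The de-identified record linkage model in the Administrative Data Taskforce bullet (see R1.3, and the figure) can be sketched as three parties: each department sends identifiers plus a record ID to a trusted third party (TTP), and the payload plus the same record ID to the ADRC; the TTP matches on identifiers and hands the ADRC only record-ID pairs. This is one common pattern for separating identifiers from content, not necessarily the Taskforce's exact design; the departments, fields and function names are hypothetical.

```python
def split_for_linkage(records, id_fields):
    """Split each record into identifiers (for the TTP) and payload (for the ADRC)."""
    to_ttp, to_adrc = {}, {}
    for rec_id, record in records.items():
        to_ttp[rec_id] = tuple(str(record[f]).strip().lower() for f in id_fields)
        to_adrc[rec_id] = {k: v for k, v in record.items() if k not in id_fields}
    return to_ttp, to_adrc


def ttp_link(ids_a, ids_b):
    """Trusted third party: exact-match identifiers, return only record-ID pairs."""
    index_b = {ident: rec_id for rec_id, ident in ids_b.items()}
    return [(rec_a, index_b[ident]) for rec_a, ident in ids_a.items() if ident in index_b]


def adrc_join(links, payload_a, payload_b):
    """ADRC: join de-identified payloads via the link table; never sees identifiers."""
    return [{**payload_a[a], **payload_b[b]} for a, b in links]


ID_FIELDS = ("name", "dob")  # hypothetical choice of identifying fields

# Hypothetical records from two departments.
dept_a = {
    "A1": {"name": "Jo Bloggs", "dob": "1980-01-01", "benefit": "JSA"},
    "A2": {"name": "Sam Smith", "dob": "1975-06-30", "benefit": "ESA"},
}
dept_b = {"B7": {"name": "jo bloggs ", "dob": "1980-01-01", "qualification": "GCSE"}}

ids_a, payload_a = split_for_linkage(dept_a, ID_FIELDS)
ids_b, payload_b = split_for_linkage(dept_b, ID_FIELDS)
links = ttp_link(ids_a, ids_b)  # only the TTP ever sees ids_a and ids_b
print(adrc_join(links, payload_a, payload_b))
```

Real linkage services typically use fuzzier matching than this exact comparison, and a scheme might also swap departmental record IDs for project-specific ones before anything reaches the ADRC.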

So – that’s a quick round-up of recent reports I’m aware of. Have I missed any?

See also:
- Whither Transparency? This Week in Open Data
- OpenData Reports Round Up (Links…)
- So What, #midata? And #yourData, #ourData…

#Midata Is Intended to Benefit Whom, Exactly?

A CTRL-Shift blog post entitled MIDATA Legislation Begins mentions, but doesn’t link to, “an amendment to the Enterprise and Regulator Reform Bill in the House of Lords”, presumably referring to paragraphs 58C*, 58D* and 58E* proposed by Viscount Younger of Leckie in the Seventh Marshalled List of Amendments:


Insert the following new Clause—

“Supply of customer data

(1) The Secretary of State may by regulations require a regulated person to provide customer data—

(a) to a customer, at the customer’s request;

(b) to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.

(2) “Regulated person” means—

(a) a person who, in the course of a business, supplies gas or electricity to any premises;

(b) a person who, in the course of a business, provides a mobile phone service;

(c) a person who, in the course of a business, provides financial services consisting of the provision of current account or credit card facilities;

(d) any other person who, in the course of a business, supplies or provides goods or services of a description specified in the regulations.

(3) “Customer data” means information which—

(a) is held in electronic form by or on behalf of the regulated person, and

(b) relates to transactions between the regulated person and the customer.

(4) Regulations under subsection (1) may make provision as to the form in which customer data is to be provided and when it is to be provided (and any such provision may differ depending on the form in which a request for the data is made).

(5) Regulations under subsection (1)—

(a) may authorise the making of charges by a regulated person for complying with requests for customer data, and

(b) if they do so, must provide that the amount of any such charge—

(i) is to be determined by the regulated person, but

(ii) may not exceed the cost to that person of complying with the request.

(6) Regulations under subsection (1)(b) may provide that the requirement applies only if the authorised person satisfies any conditions specified in the regulations.

(7) In deciding whether to specify a description of goods or services for the purposes of subsection (2)(d), the Secretary of State must (among other things) have regard to the following—

(a) the typical duration of the period during which transactions between suppliers or providers of the goods or services and their customers take place;

(b) the typical volume and frequency of the transactions;

(c) the typical significance for customers of the costs incurred by them through the transactions;

(d) the effect that specifying the goods or services might have on the ability of customers to make an informed choice about which supplier or provider of the goods or services, or which particular goods or services, to use;

(e) the effect that specifying the goods or services might have on competition between suppliers or providers of the goods or services.

(8) The power to make regulations under this section may be exercised—

(a) so as to make provision generally, only in relation to particular descriptions of regulated persons, customers or customer data or only in relation to England, Wales, Scotland or Northern Ireland;

(b) so as to make different provision for different descriptions of regulated persons, customers or customer data;

(c) so as to make different provision in relation to England, Wales, Scotland and Northern Ireland;

(d) so as to provide for exceptions or exemptions from any requirement imposed by the regulations, including doing so by reference to the costs to the regulated person of complying with the requirement (whether generally or in particular cases).

(9) For the purposes of this section, a person (“C”) is a customer of another person (“R”) if—

(a) C has at any time, including a time before the commencement of this section, purchased (whether for the use of C or another person) goods or services supplied or provided by R or received such goods or services free of charge, and

(b) the purchase or receipt occurred—

(i) otherwise than in the course of a business, or

(ii) in the course of a business of a description specified in the regulations.

(10) In this section, “mobile phone service” means an electronic communications service which is provided wholly or mainly so as to be available to members of the public for the purpose of communicating with others, or accessing data, by mobile phone.”


Insert the following new Clause—

“Supply of customer data: enforcement

(1) Regulations may make provision for the enforcement of regulations under section (Supply of customer data) (“customer data regulations”) by the Information Commissioner or any other person specified in the regulations (and, in this section, “enforcer” means a person on whom functions of enforcement are conferred by the regulations).

(2) The provision that may be made under subsection (1) includes provision—

(a) for applications for orders requiring compliance with the customer data regulations to be made by an enforcer to a court or tribunal;

(b) for notices requiring compliance with the customer data regulations to be issued by an enforcer and for the enforcement of such notices (including provision for their enforcement as if they were orders of a court or tribunal).

(3) The provision that may be made under subsection (1) also includes provision—

(a) as to the powers of an enforcer for the purposes of investigating whether there has been, or is likely to be, a breach of the customer data regulations or of orders or notices of a kind mentioned in subsection (2)(a) or (b) (which may include powers to require the provision of information and powers of entry, search, inspection and seizure);

(b) for the enforcement of requirements imposed by an enforcer in the exercise of such powers (which may include provision comparable to any provision that is, or could be, included in the regulations for the purposes of enforcing the customer data regulations).

(4) Regulations under subsection (1) may—

(a) require an enforcer (if not the Information Commissioner) to inform the Information Commissioner if the enforcer intends to exercise functions under the regulations in a particular case;

(b) provide for functions under the regulations to be exercisable by more than one enforcer (whether concurrently or jointly);

(c) where such functions are exercisable concurrently by more than one enforcer—

(i) designate one of the enforcers as the lead enforcer;

(ii) require the other enforcers to consult the lead enforcer before exercising the functions in a particular case;

(iii) authorise the lead enforcer to give directions as to which of the enforcers is to exercise the functions in a particular case.

(5) Regulations may make provision for applications for orders requiring compliance with the customer data regulations to be made to a court or tribunal by a customer who has made a request under those regulations or in respect of whom such a request has been made.

(6) Subsection (8)(a) to (c) of section (Supply of customer data) applies for the purposes of this section as it applies for the purposes of that section.

(7) The Secretary of State may make payments out of money provided by Parliament to an enforcer.

(8) In this section, “customer” and “regulated person” have the same meaning as in section (Supply of customer data).”


Insert the following new Clause—

“Supply of customer data: supplemental

(1) The power to make regulations under section (Supply of customer data) or (Supply of customer data: enforcement) includes—

(a) power to make incidental, supplementary, consequential, transitional or saving provision;

(b) power to provide for a person to exercise a discretion in a matter.

(2) Regulations under either of those sections must be made by statutory instrument.

(3) A statutory instrument containing regulations which consist of or include provision made by virtue of section (Supply of customer data)(2)(d) may not be made unless a draft of the instrument has been laid before, and approved by a resolution of, each House of Parliament.

(4) A statutory instrument containing any other regulations under section (Supply of customer data) or section (Supply of customer data: enforcement) is subject to annulment in pursuance of a resolution of either House of Parliament.”

Note that 58C/1/b states that data could be released “to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.” So if I say to my electricity company that they can share the data with you (“a person who is authorised by a customer to receive the data”), the company can share the data with you if I ask them to or if you ask them. Which is presumably a bit like how direct debits work (I sign something and give it to you and you then go to my bank and request access to my bank account). So the proposed legislation seems to allow for (or at least, not exclude?) the creation of data aggregators who might start to aggregate data from a variety of “regulated persons” at my authorisation.
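To check I'm reading 58C/1/b right, here's the release rule as I understand it, sketched as a tiny Python predicate. The names (`DataRequest`, `may_release`, the `regs_allow_authorised_request` flag) are all made up for illustration. It's my reading of the draft clause, not anything the Bill itself spells out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequest:
    customer: str      # the customer the data relates to
    recipient: str     # who would receive the data
    requested_by: str  # who actually made the request


def may_release(req: DataRequest, authorised: set[tuple[str, str]],
                regs_allow_authorised_request: bool) -> bool:
    """Hypothetical reading of 58C/1/b: may a regulated person release the data?

    `authorised` holds (customer, person) pairs, meaning the customer has
    authorised that person to receive their data.
    """
    # The recipient must have been authorised by the customer.
    if (req.customer, req.recipient) not in authorised:
        return False
    # The customer can always trigger the release...
    if req.requested_by == req.customer:
        return True
    # ...and the authorised person can too, but only if the regulations say so.
    return regs_allow_authorised_request and req.requested_by == req.recipient
```

The last line is the "direct debit" case: once I've authorised an aggregator, it can go and ask for my data itself, provided the regulations switch that option on.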

Note that I assume other regulations, such as the Data Protection Act, preclude those data aggregators from acting as data brokers, “companies that collect personal information about consumers from a variety of public and non-public sources and resell the information to other companies” (FTC [the US Federal Trade Commission] to Study Data Broker Industry’s Collection and Use of Consumer Data).

It’s also worth mentioning that the amendment doesn’t actually seem to set about enacting any actual midata legislation: “The Secretary of State may by regulations require…” which is presumably setting up the opportunity for the Secretary of State to bring it about through a Statutory Instrument or similar?

(In passing, the tabled amendments to the Bill also include amendments to the Bill's proposed changes to the Copyright, Designs and Patents Act 1988 (part 6 of the Bill, relating to licensing of orphan works, collection licensing, duration of copyright et al.) as well as the creation of a Director General of Intellectual Property Rights (28C).)

The day before, CTRL-Shift had also published a post on Building Relationships for a New Data Age:

The challenge (and opportunity) is to start building an information sharing relationship with customers where both sides use data sharing to save time, cut costs and be more efficient – and to add new value.

In a world that’s rapidly going digital, an information sharing relationship makes it normal for individuals to provide the organisations they deal with new, additional and updated data, and for organisations to also routinely provide customers with additional data or data-based services. Information sharing relationships and services are becoming a key influence on which organisations customers choose to do business with, and how valuable this business becomes.

The question is, how do we get from A to B? From today’s ‘one way’ norm where organisations collect data about customers and send messages to them, to a more equal and valuable information sharing partnership? There are three key pillars to an information sharing relationship:

- establish a trustworthy ‘default setting’ for the use of personal data
- give users/customers control
- earn VPI (volunteered personal information) via new information services.

Volunteered personal information, a phrase straight out of the Facebook playbook…

The post then discusses the importance of getting default settings right, in part to avoid a public backlash and a “loss of trust” when folk realise the terms and conditions allow the companies involved to do whatever those terms say they can, before describing how companies can Earn VPI via information services:

Getting default settings right and giving users control only create the context needed for a healthy information sharing relationship. They don’t actually get the information flowing. To do that, organisations need to:

- elicit valuable additional information from customers
- release and provide customers with additional information and/or information based services that help them make better decisions and make it easier for them to get stuff done and achieve their goals – i.e. services that add new value.

In theory, eliciting VPI and offering added value information services are two separate things. But in reality they are likely to advance hand in hand: with individuals offering additional information (in an environment they can trust because of default settings and user control) as a way to get additional value from information-driven services.

Hmmm… elicit valuable additional information from customers; and then release and provide customers with … services that add new value (I can play the selective cut and paste game too…;-) #midata is presumably being sold to consumers on the basis of the latter, particularly those services that “help them make better decisions and make it easier for them to get stuff done and achieve their goals”.

And then we read:

In theory, eliciting VPI and offering added value information services are two separate things. But in reality they are likely to advance hand in hand: with individuals offering additional information (in an environment they can trust because of default settings and user control) as a way to get additional value from information-driven services.

In theory, eliciting VPI and offering added value information services are two separate things. In the land where the flowers grow and the flopsy bunnies frolic, blissfully unaware that they are what Farmer McGregor actually sells to the butcher, presumably at a greater price than he can sell the lettuces the flopsy bunnies eat to the local greengrocer. Or something like that.

But in reality sound the drums of doom…in reality they are likely to advance hand in hand. Erm…of course… No-one wants shed loads of transactional data for personal use…with individuals offering additional information as a way to get additional value from information-driven services.

Yep… #midata is a way of getting you to give shed loads of low quality transactional data to third parties (who may or may not aggregate it with other data you grant them access to) and then give them a shed load more data before it actually becomes useful. Because that’s how data works…but it’s not how the dream is sold…

Hmmm… I wonder, does the draft legislation say anything about the extent to which an authorised person is allowed to aggregate and mine data from regulated person(s) that relates to data collected from different customers either of the same, or different regulated persons? Because there lies another of those “in reality” sources of potential value add…though we really should also try to imagine what those sources might be. (Is receiving targeted ads “value add” for me over random junk mail?)

On the other side of the fence, sort of, we see a Private Member’s Bill (Ten Minute Rule Bill?) from John Denham, Labour MP for Southampton, Itchen (not, apparently, the constituency in which the University of Southampton resides…) on Supermarket price transparency which seeks to require supermarkets “to release pricing data product by product and store by store [update: Supermarket Pricing Information Bill 2012-13]. This price information would not only enable the comparison of basic product prices, but also enable consumers to understand the differences in pricing between stores within the same retail chain, or variations in pricing of goods in different areas and regions.” In addition, it is claimed that the Private Member’s Bill “would also enable efficient scrutiny of special offers, multi-buys, ‘bogofs’ and other price promotions that have been the subject of recent criticism and regulatory action.”
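If product-by-product, store-by-store prices were released, the within-chain comparison the Bill talks about becomes a simple group-by. Here's a minimal sketch assuming a hypothetical flat record of (chain, store, product, price in pence); the record layout and the sample prices are invented for illustration, not taken from any real release.

```python
from collections import defaultdict


def price_spread(records):
    """Map (chain, product) -> (lowest, highest, spread) price across stores.

    Each record is a hypothetical (chain, store, product, price_pence) tuple.
    """
    prices = defaultdict(list)
    for chain, _store, product, price in records:
        prices[(chain, product)].append(price)
    return {key: (min(p), max(p), max(p) - min(p)) for key, p in prices.items()}


sample = [  # invented example data, not real supermarket prices
    ("ChainA", "Store1", "milk 2l", 149),
    ("ChainA", "Store2", "milk 2l", 159),
    ("ChainA", "Store3", "milk 2l", 145),
    ("ChainB", "Store9", "milk 2l", 150),
]

for (chain, product), (low, high, spread) in sorted(price_spread(sample).items()):
    print(f"{chain} {product}: {low}-{high}p (spread {spread}p)")
```

A non-zero spread within a single chain is exactly the "differences in pricing between stores within the same retail chain" the Bill's sponsors want visible.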

PS See also So What, #midata? And #yourData, #ourData…

This Week in Open and Communications Data Land…

Following the official opening of the Open Data Institute (ODI) last week, a flurry of data related announcements this week:

Things have been moving on the Communications Data front too. Communications Data got a look in as part of the 2011/2012 Intelligence and Security Committee Annual Report with a review of what’s currently possible and “why change may be necessary”. Apparently:

118. The changes in the telecommunications industry, and the methods being used by people to communicate, have resulted in the erosion of the ability of the police and Agencies to access the information they require to conduct their investigations. Historically, prior to the introduction of mobile telephones, the police and Agencies could access (via CSPs, when appropriately authorised) the communications data they required, which was carried exclusively across the fixed-line telephone network. With the move to mobile and now internet-based telephony, this access has declined: the Home Office has estimated that, at present, the police and Agencies can access only 75% of the communications data that they would wish, and it is predicted that this will significantly decline over the next few years if no action is taken. Clearly, this is of concern to the police and intelligence and security Agencies as it could significantly impact their ability to investigate the most serious of criminal offences.

N. The transition to internet-based communication, and the emergence of social networking and instant messaging, have transformed the way people communicate. The current legislative framework – which already allows the police and intelligence and security Agencies to access this material under tightly defined circumstances – does not cover these new forms of communication. [original emphasis]

Elsewhere in Parliament, the Joint Select Committee Report on the Draft Communications Data Bill was published and took a critical tone (Home Secretary should not be given carte blanche to order retention of any type of data under draft communications data bill, says joint committee. “There needs to be some substantial re-writing of the Bill before it is brought before Parliament” adds Lord Blencathra, Chair of the Joint Committee.) Friend and colleague Ray Corrigan links to some of the press reviews of the report here: Joint Committee declare CDB unworkable.

In other news, Prime Minister David Cameron’s announcement of DNA tests to revolutionise fight against cancer and help 100,000 patients was reported via a technology angle – Everybody’s DNA could be on genetic map in ‘very near future’ [Daily Telegraph] – as well as by means of more reactionary headlines: Plans for NHS database of patients’ DNA angers privacy campaigners [Guardian], Privacy fears over DNA database for up to 100,000 patients [Daily Telegraph].

If DNA is your thing, don’t forget that the Home Office already operates a National DNA Database for law enforcement purposes.

And if national databases are your thing, there’s always the National Pupil Database which was in the news recently with the launch of a consultation on proposed amendments to individual pupil information prescribed persons regulations which seeks to “maximise the value of this rich dataset” by widening access to this data. (Again, Ray provides some context and commentary: Mr Gove touting access to National Pupil Database.)

PS A late inclusion: DECC announcement around smart meter rollout with some potential links to #midata strategy (eg “suppliers will not be able to use energy consumption data for marketing purposes unless they have explicit consent”). A whole raft of consultations were held around smart metering and Government responses are also published today, including Government Response on Data Access and Privacy Framework, the Smart Metering Privacy Impact Assessment and a report on public attitudes research around smart metering. I also spotted an earlier consultation that had passed me by around the Data and Communications Company (DCC) License Conditions; here’s the response, which opens with: “The communications and data transfer and management required to support smart metering is to be organised by a new central communications body – the Data and Communications Company (“the DCC”). The DCC will be a new licensed entity regulated by the Gas and Electricity Markets Authority (otherwise referred to as “the Authority”, or “Ofgem”). A single organisation will be granted a licence under each of the Electricity and Gas Acts (there will be two licences in a single document, referred to as the “DCC Licence”) to provide these services within the domestic sector throughout Great Britain”. Another one to put on the reading pile…

Putting a big brother watch hat on, the notion of “meter surveillance” brings to mind a BBC article about an upcoming (which will hopefully then be persistently available on iPlayer?) radio programme on “Electric Network Frequency (ENF) analysis”, The hum that helps to fight crime. According to Wikipedia, ENF is a forensic science technique for validating audio recordings by comparing frequency changes in background mains hum in the recording with long-term high-precision historical records of mains frequency changes from a database. In turn, this reminds me of appliance signature detection (identifying what appliance is switched on or off from its electrical load curve signature), for example Leveraging smart meter data to recognize home appliances. In the context of audio surveillance, how about supplementing surveillance video cameras with microphones? Public Buses Across Country [US] Quietly Adding Microphones to Record Passenger Conversations.
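The matching step in ENF analysis can be sketched as a sliding comparison: take the mains-frequency trace extracted from a recording (one estimate per second, say) and find the offset in a long reference log where the two agree best. This is a toy version under stated assumptions: the data is synthetic, the trace is assumed to have already been extracted from the audio, and it uses a brute-force mean-squared-error search. Real forensic tools extract the hum from the audio spectrum and use more robust matching.

```python
import random


def best_offset(trace, reference):
    """Return (offset, mse) of the best alignment of `trace` within `reference`.

    Both are lists of mains-frequency estimates in Hz at the same sample rate.
    """
    n = len(trace)
    best = (None, float("inf"))
    for offset in range(len(reference) - n + 1):
        window = reference[offset:offset + n]
        mse = sum((a - b) ** 2 for a, b in zip(trace, window)) / n
        if mse < best[1]:
            best = (offset, mse)
    return best


# Synthetic one-hour reference log: a nominal 50 Hz grid wandering slowly.
rng = random.Random(42)
reference = [50.0]
for _ in range(3599):
    reference.append(reference[-1] + rng.gauss(0, 0.002))

# Pretend a one-minute recording started 1234 s into the log, plus measurement noise.
trace = [f + rng.gauss(0, 0.0005) for f in reference[1234:1294]]
print(best_offset(trace, reference))
```

Because the grid frequency wanders unpredictably, even a minute of hum acts like a timestamp, which is what makes the technique useful for dating (or debunking) a recording.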

So What, #midata? And #yourData, #ourData…

The Twittertubes were all abuzz yesterday with news about the UKGov’s announcement on #midata (even though the press release everyone was referring to came out earlier?). It’s still not clear to me what announcement was actually made yesterday, or where? [Ah... seems the actual statement relates to the Government's response to the midata consultation, along with an impact assessment.] I also struggled to find any write-ups of the hacks’n’ideas produced at the ODI’s (@ukodi) #midata hackathon over the weekend?

(For a round-up from over the summer of reports on personal data, see Personal Data Exploitation – Recent Reports, which also quotes a sceptical view about public uptake from a Government commissioned report.)

The personal data that UKGov is encouraging companies to make available in the first instance is credit card/banking transaction data, phone billing, and energy usage data. The first two sectors typically offer itemised breakdowns anyway – maybe the #midata initiative will “request” that the information is made available in a machine readable form if it isn’t already published as such? – with the energy usage data requiring a smart meter, presumably (and which many folk who are interested will have acquired – and hacked years ago – already?!) So what’s new? Does this add to our right to data, eg as supported by subject access requests under the Data Protection Act? What I’m sceptical about is the extent to which this initiative is just a roundabout way of allowing companies to share data amongst themselves (eg Data Bartering Is Everywhere) with checkbox customer permission, of course… (Market context: Computing.co.uk – How Tesco and co are testing the limits of customer data exploitation.)
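What a machine-readable itemised bill makes possible, and a PDF statement doesn't, is this sort of trivial roll-up. The CSV layout below (date, description, category, amount) is a hypothetical format. I don't know of any agreed #midata schema, and the rows are invented. `Decimal` is used so the money sums don't pick up floating point noise.

```python
import csv
import io
from collections import Counter
from decimal import Decimal

# Hypothetical machine-readable statement; columns and rows are invented.
STATEMENT = """date,description,category,amount
2012-11-01,SUPERMARKET 123,groceries,-42.10
2012-11-02,ENERGY CO DD,utilities,-65.00
2012-11-05,SUPERMARKET 123,groceries,-18.35
2012-11-07,SALARY,income,1500.00
"""


def spend_by_category(text):
    """Total outgoings (negative amounts) per category, as positive Decimals."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        amount = Decimal(row["amount"])
        if amount < 0:
            totals[row["category"]] += -amount
    return dict(totals)


print(spend_by_category(STATEMENT))
```

Which is also the point of the scepticism: the same few lines work just as well for whoever else the data gets shared with.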

By the by, on the topic of sharing individual level data, it seems that the Department for Education are currently consulting around the wider release of pupil data – Consultation on proposed amendments to individual pupil information prescribed persons regulations: A consultation on proposals to amend regulations to enable the Department for Education to share extracts of data held in the National Pupil Database for a wider range of purposes than currently possible. The aim is to maximise the value of this rich dataset.

The National Pupil Database is a longitudinal database, which holds information on children in schools in England. The majority of datasets go back 10 years, with the earliest data going back to 1996. There are a range of data sources in the National Pupil Database providing information about children’s education at different stages (pre-school, primary, secondary and further education).
It includes detailed information about pupils, their test and exam results, prior attainment and progression at different key stages for all state schools in England. Attainment data is also held for pupils and students in non-maintained special schools, sixth form and Further Education (FE) colleges and (where available) independent schools. The National Pupil Database includes information about the characteristics of pupils in the state sector and non-maintained special schools such as gender, ethnicity, first language, eligibility for free school meals, information about special educational needs (SEN), as well as detailed information about pupil absence and exclusions.
The data held in the National Pupil Database is collected from a range of sources including schools, local authorities and awarding organisations. This data is processed by the Department’s Data and Statistics Division and matched and stored in the National Pupil Database. The Department makes it clear to children and their parents what information is held about pupils and how it is processed, through a statement on its website. Schools also inform parents and pupils of how the data is used through privacy notices.

There’s a lot to be said for opening up this data to researchers, but I’m sure the privacy wonks will also have plenty of points to make… For example, from Privacy International, UK School Census proposals – How you can help. (Related – just in to my mailbox, EU report on “The right to be forgotten”. And more: ICO code on anonymisation, managing privacy risks and maintaining transparency.)

Sort of related, it’s maybe also worth remembering that the Department of Health, via the NHS, is also widening collection of, and opening up researcher access to, anonymised cradle-to-grave health records via Clinical Practice Research Datalink. (Launch press release; context: eg NHS patient records to revolutionise medical research in Britain.)

Taking these together, along with the idea that media channels deliver audiences to advertisers, I wonder: what is being transacted (collected, bought, staged, and sold) when government releases life event related “transactional” datasets (school records, health records) to researchers? How do the costs and benefits flow (eg in terms of improving the lot of the citizen, playing fair with taxation, etc…?)

PS I haven’t been keeping up with Linked Data in Gov initiatives lately, so this (via @ldodds, I think?) looks like it might be a handy round-up: UKGovLD (UK Government Linked Data Working Group) – opening the doors event.

PPS Via @mhawksey, something that should be read alongside the #midata announcement – Tesco vacancy – Product Manager, ‘My Data’ (commentary): “The successful candidate will define the strategy to develop and support the deployment of Group-wide capability to deliver market-leading products and games which give our Clubcard customers simple, useful, fun access to their own data to help them plan and achieve their goals.”

Key responsibilities:

- You will build and develop the personalised access to customer’s data capability plan
- Accountable for working with functional and country stakeholders across the business to develop a strategy for personalised access to customer’s data and prioritising which products, tools and capabilities to build
- Work with Tesco IT and dunnhumby and other functional stakeholders to deliver these new capabilities to plan and to budget
- Manage the delivery of Clubcard Play (games) to engage customers and create new media opportunities for brands and marketing opportunities for Tesco
- Represent the functional teams and their interests to ensure there is a constant delivery of customer and business benefits from the personalised access to customer’s data workstream
- Manage a team of managers (who work with functional stakeholders and IT) to define and deliver new products, tools and capabilities
- Work with key functional stakeholders such as marketing to manage the organisation change and impact that the personalised access to customer’s data workstream will have
- Work with Corporate and Legal Affairs to manage any legal obligations around giving customers digital access to their own data
- Drive learning through rapid testing and piloting and be involved in running trials in market where needed
- Drive requirements back into the Data and Personalisation Engine streams within the Programme
- Manage the reporting and tracking of benefits to ensure that we are measuring the impact of our activities
- Contribute as part of the Personalisation customer data leadership team
- Look to the medium-term future and think about potential innovations in the area of personalised access to customer’s data to bring into the overall programme roadmap
- Stay close to the customer through market scanning, networking and by building relationships with key internal and external thought leaders

If you spot any ads from other companies that look as if they are #midata related, please post a link to them, the job title and if possible a clip/quick summary, in the comments;-)

PPPS On my “possibly related?” to read list: Network Accountability for the Domestic Intelligence Apparatus. From the abstract, “The network is anchored by “fusion centers,” novel sites of intergovernmental collaboration that generate and share intelligence and information. Several fusion centers have generated controversy for engaging in extraordinary measures that place citizens on watch lists, invade citizens’ privacy, and chill free expression. … A new concept of accountability – network accountability – is needed to address the shortcomings of fusion centers. Network accountability has technical, legal, and institutional dimensions. Technical standards can render data exchange between agencies in the network better subject to review. Legal redress mechanisms can speed the correction of inaccurate or inappropriate information.” With public datasets, we can of course create our own “fusion centres”.

PPPPS …and on the “to play with” list, analyze the consumer expenditure survey (ce) with r (“the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories.” And the data is available:-).

PPPPPS December 2012: FTC to Study Data Broker Industry’s Collection and Use of Consumer Data “The Federal Trade Commission issued orders requiring nine data brokerage companies to provide the agency with information about how they collect and use data about consumers. The agency will use the information to study privacy practices in the data broker industry.”

“Data brokers are companies that collect personal information about consumers from a variety of public and non-public sources and resell the information to other companies. In many ways, these data flows benefit consumers and the economy; for example, having this information about consumers enables companies to prevent fraud. Data brokers also provide data to enable their customers to better market their products and services.”

Could be interesting… It also links to a March 2012 report on Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers.

Personal Data Exploitation – Recent Reports

A Tesco advert I’ve noticed airing again recently shows how data collected around a Tesco Clubcard can be used to prepopulate an online shopping basket on Tesco Direct using just(?) the Clubcard number.

(It’s not quite that simple of course. From a quick look at the Tesco website, you first need to register an account with Tesco.com, then I’m guessing there’s some simple name/address validation around whatever Clubcard number you enter before the data is revealed to you.)

Over July and August, BIS picked up a bit of publicity around the #midata policy initiative that seeks to encourage businesses to make consumer data and data products available back to the consumers who generate it (eg From Communications Data to #midata – with a Mobile Phone Data Example). You can get an idea about how BIS are trying to woo businesses into getting on board from the midata company briefing pack (July 2012).

Whilst the talk was all upbeat, a Jigsaw Research report for BIS on Potential consumer demand for midata was more circumspect:

Whilst consumers have nagging concerns about the security and privacy of their data online, the majority of those who choose to transact online currently put these concerns to the back of their mind. This is because most people perceive the benefits of being online to outweigh the risks; it is also due to most not fully understanding the nature of the ‘threat’ as they have limited understanding of how their data is currently collected and used by third parties as well as about what value it holds or how to protect themselves against misuse. People therefore avoid dwelling on these nagging and undetermined concerns and rely on the absence for most of any serious incident in their previous experience. In many respects, they can be described as ‘sleep walking’ in the age of data.

When initially shown an expression of the midata concept, consumers were bewildered about why this is being proposed and what difference it would make. As consumers typically define personal data as personal identity information, they struggled initially to identify what benefits the release of such data (which they already own/know) would have for them.

There is unlikely to be very much initial consumer interest in the overarching principle of companies releasing personal data for use by consumers. If anything, this news is likely to be received with suspicion until the benefits of this can be observed in practice.

Adoption of #midata services would therefore be driven by companies developing data related products and services rather than meeting a need articulated by consumers.

There’s also a new (O2 sponsored?) report from Demos, The Data Dialogue (Sept 2012), on an O2 commissioned survey “looking into the public’s attitudes towards personal information”. Apparently, “[t]he Populus survey suggests that people share an increasing amount of information about themselves – and expect to share even more in the future. However, there is a crisis of confidence: the public is uncomfortable about the way personal information and behavioural data are collected by government and commercial companies. There is a danger that this loss of confidence will lead to people sharing less information and data, which would have detrimental results for individuals, companies and the economy.”

I haven’t had a chance – yet – to read the report (it’s only just come out…) but when I do I’ll probably also read it in the context of some other related reports on personal data (including rereading the Jigsaw consumer interest report around #midata). My gut feeling(?!;-) is that there isn’t really much concern (other than the sort of concern expressed when someone asks you whether you are concerned about something in a tone that suggests you should be…), rather there is a background level of disinterest and then mild confusion if forced to consider it at all…

Anyway, here are some of the other reports in the area:

Given the notion of trust is a big part of this, maybe I need to give my colleague Ray Corrigan’s recent presentation on Trust in the Digital Economy (conference page) a close reading too, along with this First Monday article by Bibi van den Berg and Simone van der Hof: What happens to my data? A novel approach to informing users of data processing practices

Hmmm… maybe I need to block a couple of days away somewhere to just get through them and try to plot out their various lobbying positions…?

PS a couple of other things caught my eye in the last day or two… via @jonhew and @martinstabe, an ICO ruling about whether database queries create new information for the purposes of FOI (answer: it depends how hard the query is to write..) and an Out-Law note on the legitimacy (or otherwise) of outsourcing the processing of sensitive personal data.

PPS Seems like the OU is a founder member of the new Centre for Research into Information, Surveillance and Privacy (CRISP). It launches at the OU in Milton Keynes on September 20th, with a panel session on “The Future of Information, Surveillance and Privacy Research” (press welcome, I believe…).

PPPS loosely related, from late last year, a news report on how Visa/Mastercard were planning to start selling on anonymised data to marketers… I’m not sure if/how this has progressed? Also loosely related: arXiv: A Theory of Pricing Private Data; OU study on Consumer Activity Data: Usages and Challenges.

Whither Transparency? This Week in Open Data

I’m starting to feel as if I need to do myself a weekly round-up, or newsletter, on open data, if only to keep track of what’s happening and how it’s being represented. Today, for example, the Commons Public Accounts Committee published a report on Implementing the Transparency Agenda.

From a data wrangling point of view, it was interesting that the committee picked up on the following point in its Conclusions and recommendations (thanks for the direct link, Hadley:-), whilst also missing the point…:

2. The presentation of much government data is poor. The Cabinet Office recognises problems with the functionality and usability of its data.gov.uk portal. Government efforts to help users access data, as in crime maps and the schools performance website, have yielded better rates of access. But simply dumping data without appropriate interpretation can be of limited use and frustrating. Four out of five people who visit the Government website leave it immediately without accessing links to data. So there is a clear benefit to the public when government data is analysed and interpreted by third parties – whether that be, for example, by think-tanks, journalists, or those developing online products and smartphone applications. Indeed, the success of the transparency agenda depends on such broader use of public data. The Cabinet Office should ensure that:
– the publication of data is accessible and easily understood by all; and
– where government wants to encourage user choice, there are clear criteria to determine whether government itself should repackage information to promote public use, or whether this should be done by third parties.

A great example of how inconsistently published data can cause all sorts of grief when you try to aggregate it came to my attention yesterday via @lauriej:

It leads to a game where you can help make sense of the not-quite-right column names used to describe open spending data… (I have to admit, I found the instructions a little hard to follow – a walked-through example with screenshots would have helped? It is, after all, largely a visual pattern matching exercise…)
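Some of that column-name clean-up can be automated before a human ever looks at it. Here's a minimal sketch, assuming a hypothetical canonical schema for spending data, that normalises each raw header and then uses the standard library's `difflib.get_close_matches` for fuzzy matching. Anything it can't match confidently comes back as `None` and is left for a person to sort out, which is roughly where a game like that earns its keep.

```python
import difflib
import re

# Hypothetical canonical column names for a spending dataset.
CANONICAL = ["supplier name", "amount", "payment date", "expense type", "department"]


def normalise(header):
    """Lower-case, turn punctuation/underscores into spaces, collapse whitespace."""
    return re.sub(r"[\W_]+", " ", header).strip().lower()


def map_columns(headers, cutoff=0.75):
    """Map each raw header to a canonical name, or None if no close match."""
    mapping = {}
    for h in headers:
        match = difflib.get_close_matches(normalise(h), CANONICAL, n=1, cutoff=cutoff)
        mapping[h] = match[0] if match else None
    return mapping


print(map_columns(["Suplier_Name", "AMOUNT (£)", "Date Paid", "Expense Type "]))
```

Note that "Date Paid" doesn't get matched to "payment date": same meaning, very different string. That's the sort of case that needs a human (or a hand-curated synonym list) rather than string similarity.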

From a spend mapping perspective, this is also relevant:

6. We are concerned that ‘commercial confidentiality’ may be used as an inappropriate reason for non-disclosure of data. If transparency is to be meaningful and comprehensive, private organisations providing public services under contract must make available all relevant public information. The Cabinet Office should set out policies and guidance for public bodies to build full information requirements into their contractual agreements, in a consistent way. Transparency on contract pricing which is often hidden behind commercial confidentiality clauses would help to drive down costs to the taxpayer.

And from a knowing “what the hell is going on?” perspective, there was also this:

7. Departments do not make it easy for users to understand the full range of information available to them. Public bodies have not generally provided full inventories of all of the information they hold, and which may be available for disclosure. The Cabinet Office should develop guidance for departments on information inventories, covering, for example, classes of information, formats, accuracy and availability; and it should mandate publication of the inventories, in an easily accessible way.

The publication of government department open data strategies may go some way towards improving this. I’ve also been of a mind that more accessible ways of releasing data burden reporting requirements could help clarify what “working data” is available, in what form, and how it is routinely generated and passed between bodies. Sorting out better pathways from FOI releases of data to the subsequent regular release of such data as open data is also something I keep wittering on about (eg FOI Signals on Useful Open Data? and The FOI Route to Real (Fake) Open Data via WhatDoTheyKnow).

I also noticed that the report reiterates this point:

This Committee has previously argued that it is vital that we and the public can access data from private companies who contract to provide public services. We must be able to follow the taxpayers’ pound wherever it is spent. The way contracts are presently written does not enable us to override rules about commercial confidentiality. Data on public contracts delivered by private contractors must be available for scrutiny by Parliament and the public. Examples we have previously highlighted include the lack of transparency of financial information relating to the Private Finance Initiative and welfare to work contractors.

…not least because the release of data held by companies is also being addressed on another front, midata, most notably via the recently announced BIS Midata 2012 review and consultation [consultation doc PDF]. For example, the consultation document suggests:

1.10 The Government is not seeking to require the release of data electronically at this stage, and instead is proposing to take a power to do so. The Secretary of State would then have to make an order to give effect to the power. An order making power, if utilised, would compel suppliers of services and goods to provide to their customers, upon request, historic transaction/ consumption data in a machine readable format. The requirement would only apply to businesses that already hold this information electronically about individual consumers.
1.11. Data would only have to be released electronically at the request of the consumer and would be restricted to an individual’s consumption and transaction data, since in our view this can be used to better understand consumers’ behaviour. It would not cover any proprietary analysis of the data, which has been done for its own purposes by the business receiving the request.

(More powers to the Minister then…?!) I wonder how this requirement would extend the rights available under the Data Protection Act. (And why couldn’t that Act be extended instead? For example, Data Protection Principle 6 includes “a right of access to a copy of the information comprised in their personal data”; couldn’t that be extended to include transaction data, suitably defined? Though I note paragraph 1.20: “There are a number of different enforcement bodies that might be involved in enforcing midata. Data protection is enforced by the Information Commissioner’s Office (ICO), whilst the Office of Fair Trading (OFT), Trading Standards and sector regulators currently enforce consumer protection law.” and Question 17: “Which body/bodies is/are best placed to perform the enforcement role for this right?”) There are so many bits of law relating to data that I don’t understand at all that I think I need to do myself an uncourse on them… (I also need to map out the various panels, committees and groups that have an open data interest… The latest, of course, is the Open Data User Group (ODUG), the minutes of whose first meeting were released some time ago now, although not in a directly web-friendly format…)

The consultation goes on:

1.18. For midata to work well the data needs be made available to the consumer in electronic format as quickly as possible following a request (maybe immediately) and as inexpensively as possible. This will minimise friction and ensure that consumers are able to access meaningful data at the point it is most useful to them. This requirement will only cover data that is already held electronically at the time of the request so we expect that the time needed to respond to a consumer’s request will be short – in many cases instant

Does the Data Protection Act require the release of data in an electronic format, and ideally a structured electronic format (i.e. as something resembling a dataset)? The recent Protection of Freedoms Act amended the FOI Act with language relating to the definition and release of datasets, so I wonder if this approach might extend elsewhere?

Coming at the transparency thing from another direction, I also note with interest (via the BBC) that MPs say all lobbyists should be on new register:

All lobbyists, including charities, think tanks and unions, should be subject to new lobbying regulation, a group of MPs have said. They criticised government plans to bring in a statutory register for third-party lobbyists, such as PR firms, only. They said the plan would “do nothing to improve transparency”. Instead, the MPs said, regulation should be brought in to cover all those who lobby professionally.

This is surely a blocking move? If we can’t have a complete register, we shouldn’t have any register, so best not to have one at all for a year or two… or three… or four… Haven’t they heard of bootstrapping and minimum viable releases?! Or maybe I got the wrong idea from the opening of the news report? I guess I need to read what the MPs actually said in the Political and Constitutional Reform – Second Report: Introducing a statutory register of lobbyists.

PS For a round-up of other recent reports on open data, see OpenData Reports Round Up (Links…).

PPS This is also new to me: the new UK Data Service, “starting on 1 October 2012, [to] integrate the Economic and Social Data Service (ESDS), the Census Programme, the Secure Data Service and other elements of the data service infrastructure currently provided by the ESRC, including the UK Data Archive.”

We Can Haz Our Personal Data Back from Corporates? #midata

Yesterday, UK gov folk announced what I imagine someone, somewhere, has termed “a breakthrough in consumer empowerment”: a voluntary scheme that corporates can opt in to, under which they may let us have access to some of the data they’ve collected about us.

According to BBC technology reporter Rory Cellan-Jones (Midata: Will the public share government’s enthusiasm?), here’s what we can expect:

[From] Callcredit which holds credit files on every adult in the UK.

It’s now promising that every consumer will be able to look at their file for free for life, in a radical change to its business model. …

Scottish Power’s midata plans involve making its customers’ annual energy consumption data more easily accessible to make the process of switching suppliers easier.

And finally, there was the Royal Bank of Scotland which is promising to give its customers “a complete walkthrough” of all their annual transactions. So, for instance, you will be able to find out how much you spent at Tesco last year*.

*Only you won’t be able to get that information from Tesco, because they haven’t signed up?

A lot of this information is likely to already be available to folk who are interested in the quantified self. For example, you can download your statements from your bank or credit card company, as data, or use services (in the US at least?) such as Mint to aggregate and report on your personal finance data; you can use devices such as Current Cost to track your energy usage, or apps on your phone to break down how you’ve used it, and so on…
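Once a statement has been downloaded as data, the “how much did I spend at Tesco last year?” question from the RBS example is little more than a filter and a sum. A minimal sketch, assuming a hypothetical CSV export with `Date`, `Description` and `Amount` columns, ISO dates and debits recorded as negative amounts; the sample rows are made up for illustration, and real bank exports vary in all of these respects.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Made-up sample rows in a hypothetical export format: real bank CSVs differ
# in column names, date formats and whether debits are negative.
SAMPLE_CSV = """Date,Description,Amount
2011-03-04,TESCO STORES 2345,-23.50
2011-07-19,SALARY,1500.00
2011-11-02,Tesco Metro,-8.25
2012-01-05,TESCO STORES 2345,-12.00
"""


def spend_with(merchant: str, rows, year: int) -> Decimal:
    """Total outgoings to a merchant in a given year.

    Matches the merchant as a case-insensitive substring of the description
    and counts only debits (assumed to be negative amounts).
    """
    total = Decimal("0")
    for row in rows:
        when = date.fromisoformat(row["Date"])
        amount = Decimal(row["Amount"])  # Decimal avoids float rounding on money
        if when.year == year and merchant.lower() in row["Description"].lower() and amount < 0:
            total += -amount
    return total


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    print(spend_with("tesco", rows, 2011))
```

Substring matching on the description is crude (it will miss merchants that appear under a different trading name), which is exactly the sort of tidying a neatly packaged #midata return might do for you.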

But maybe if neatly packaged and re-presented data (as well as data as data) were more available, there would be wider interest in it? Maybe…

I also noticed that Google is a signatory to the #midata initiative. So what might we expect from them? Here are three things that I think might already hit #midata buttons, so maybe they will give us a clue as to the sort of thing we can expect to see when (if…) companies start rolling out #midata services next year:

PS By the by, if you search for things like “mi data” you tend to turn up jobs in the areas of market intelligence and management information systems… Just noticing…;-)

PPS I also noticed this in Rory’s article: “Meanwhile the government’s drive to free up public data has hit a few roadbumps. … The consumer affairs minister Edward Davey said there was a balance to be struck when it came to public data: ‘It’s got to be sustainable. If we gave away large datasets that cost a lot of money to collect, the data would degenerate over time.’” So the plan is that companies make data they were keeping to themselves “open” to the people who generated it (presumably for free rather than for a fee), but we need to hold off on opening up data collected at public expense, data that could be used to drive innovation, efficiencies (or so government was claiming a year or two ago) and wider awareness in the public sector, because doing so is unsustainable?

PPPS Something I’d like to see in my data returns from signatories is a list of the people and partner organisations they’ve sold my personal data to or otherwise exchanged it with, along with a list of what data was included in each transaction…

