Archive for the ‘privacy’ Category
A BIS Press Release (Next steps making midata a reality) seems to have resulted in folk tweeting today about the #midata consultation that was announced last month. If you haven’t been keeping up, #midata is the policy initiative around getting companies to make “[consumer data] that may be actionable and useful in making a decision or in the course of a specific activity” (whatever that means) available to users in a machine readable form. To try to help clarify matters, several vignettes are described in this July 2012 report – Example applications of the midata programme – which plays the role of a ‘draft for discussion’ at the September midata Strategy Board [link?]. Here’s a quick summary of some of them:
- form filling: a personal datastore will help you pre-populate forms and provide certified evidence of things like: proof of your citizenship, that you are qualified to drive, that you have passed certain exams and achieved certain qualifications, that you have passed a CRB check, and so on. (Note: I’ve previously tried to argue the case for the OU starting to develop a service (OU Qualification Verification Service) around delivering verified tokens relating to the award of OU degrees, and degrees awarded by the polytechnics, as was (courtesy of the OU’s CNAA Aftercare Service), but after an initial flurry of interest, it was passed on. midata could bring it back maybe?)
- home moving admin: change your details in a personal “mydata” data store, and let everyone pick up the changes from there. Just think what fun you could have with an attack on this;-)
- contracts and warranties dashboard: did my crApple computer die the week before or after the guarantee ran out?
- keeping track of the housekeeping: bank and financial statement data management and reporting tools. I thought there was already software for doing this? Do we use it, though? I’d rather my bank improved the tools it provides me with.
- keeping up with the Joneses: how does my house’s energy consumption compare with that of my neighbours?
- which phone? Pick a tariff automatically based on your actual phone usage. From going through this recently, the problem is not with knowing how I use my phone (easy enough to find out), it’s with navigating the mobile phone sites trying to understand their offers. (And why can’t Vodafone send me an SMS to say I’m 10 minutes away from using up this month’s minutes, rather than letting me go over? The midata answer might be an agent that looks at my usage info and tells me when I’m getting close to my limit, which requires me to have access to my contract details in a machine readable form, I guess?)
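To make the "agent" idea a bit more concrete, here's a minimal sketch of the sort of check such a thing might run. It assumes (hypothetically) that the contract exposes a monthly minutes allowance in machine readable form, that usage comes as a list of call durations in seconds, and that each call is billed rounded up to the next whole minute; none of those formats or rules come from the midata documents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contract:
    # Hypothetical machine-readable contract detail: the monthly minutes allowance.
    monthly_minutes: int


def minutes_used(call_durations_secs: List[int]) -> int:
    """Total billed minutes, rounding each call up to a whole minute (an assumed billing rule)."""
    return sum(-(-d // 60) for d in call_durations_secs)


def check_allowance(contract: Contract, call_durations_secs: List[int],
                    warn_at: int = 10) -> Optional[str]:
    """Return a warning once remaining minutes drop to warn_at or fewer, else None."""
    remaining = contract.monthly_minutes - minutes_used(call_durations_secs)
    if remaining < 0:
        return f"{-remaining} minutes over allowance"
    if remaining <= warn_at:
        return f"{remaining} minutes left this month"
    return None
```

So with a 100 minute allowance and two calls of 50 and 40 minutes, `check_allowance(Contract(100), [3000, 2400])` returns `"10 minutes left this month"`, which is exactly the SMS I'd like to have been sent.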
And here’s a BIS blog post summarising them: A midata future: 10 ways it could shape your choices.
(The #midata policy seems based on a belief that users want better access to data so they can do things with it. I’m not convinced – why should I have to export my bank data to another service (increasing the number of services I must trust) rather than my bank providing me with useful tools directly? I guess one way this might play out is that any data that does dribble out may get built around by developers who then sell the tools back to the data providers so they can offer them directly? In this context, I guess I should read the BIS commissioned Jigsaw Research report: Potential consumer demand for midata.)
Today has also seen a minor flurry of chat around the call for evidence on the Communications Data Bill, presumably because the closing date for responses is tomorrow (draft Communications Data Bill). (Related reading: latest Annual Report of the Interception of Communications Commissioner.) Again, if you haven’t been keeping up, the draft Communications Data Bill describes communications data in the following terms:
- Communications data is information about a communication; it can include the details of the time, duration, originator and recipient of a communication; but not the content of the communication itself
- Communications data falls into three categories: subscriber data; use data; and traffic data.
The categories are further defined in an annex:
- Subscriber Data – Subscriber data is information held or obtained by a provider in relation to persons to whom the service is provided by that provider. Those persons will include people who are subscribers to a communications service without necessarily using that service and persons who use a communications service without necessarily subscribing to it. Examples of subscriber information include:
– ‘Subscriber checks’ (also known as ‘reverse look ups’) such as “who is the subscriber of phone number 012 345 6789?”, “who is the account holder of e-mail account firstname.lastname@example.org?” or “who is entitled to post to web space http://www.xyz.anyisp.co.uk?”;
– Subscribers’ or account holders’ account information, including names and addresses for installation, and billing including payment method(s), details of payments;
– information about the connection, disconnection and reconnection of services which the subscriber or account holder is allocated or has subscribed to (or may have subscribed to) including conference calling, call messaging, call waiting and call barring telecommunications services;
– information about the provision to a subscriber or account holder of forwarding/redirection services;
– information about apparatus used by, or made available to, the subscriber or account holder, including the manufacturer, model, serial numbers and apparatus codes.
– information provided by a subscriber or account holder to a provider, such as demographic information or sign-up data (to the extent that information, such as a password, giving access to the content of any stored communications is not disclosed).
- Use data – Use data is information about the use made by any person of a postal or telecommunications service. Examples of use data may include:
– itemised telephone call records (numbers called);
– itemised records of connections to internet services;
– itemised timing and duration of service usage (calls and/or connections);
– information about amounts of data downloaded and/or uploaded;
– information about the use made of services which the user is allocated or has subscribed to (or may have subscribed to) including conference calling, call messaging, call waiting and call barring telecommunications services;
– information about the use of forwarding/redirection services;
– information about selection of preferential numbers or discount calls;
- Traffic Data – Traffic data is data that is comprised in or attached to a communication for the purpose of transmitting the communication. Examples of traffic data may include:
– information tracing the origin or destination of a communication that is in transmission;
– information identifying the location of equipment when a communication is or has been made or received (such as the location of a mobile phone);
– information identifying the sender and recipient (including copy recipients) of a communication from data comprised in or attached to the communication;
– routing information identifying equipment through which a communication is or has been transmitted (for example, dynamic IP address allocation, file transfer logs and e-mail headers – to the extent that content of a communication, such as the subject line of an e-mail, is not disclosed);
– anything, such as addresses or markings, written on the outside of a postal item (such as a letter, packet or parcel) that is in transmission;
– online tracking of communications (including postal items and parcels).
To put the communications data thing into context, here’s something you could try for yourself if you have a smartphone. Using something like the SMS to Text app (if you trust it!), grab your txt data from your phone and try charting it: SMS analysis (coming from an Android smartphone or an iPhone). And now ask yourself: what if I also mapped my location data, as collected by my phone? And will this sort of thing be available as midata, or will I have to collect it myself using a location tracking app if I want access to it? (There’s an asymmetry here: the company potentially collecting the data, or me collecting the data…)
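If you want a feel for how much even the "use data" on its own gives away, a few lines of code are enough to start slicing an SMS export. This is a sketch only: I'm assuming a CSV export with hypothetical column names `timestamp` (ISO 8601), `contact` and `direction`, which may well not match what any particular export app actually produces.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def load_sms(csv_text):
    """Parse an SMS export; assumes hypothetical columns: timestamp, contact, direction."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def per_contact(rows):
    """How many messages were exchanged with each contact."""
    return Counter(r["contact"] for r in rows)


def per_hour(rows):
    """How many messages were sent or received in each hour of the day."""
    return Counter(datetime.fromisoformat(r["timestamp"]).hour for r in rows)


def text_chart(counts):
    """A crude text bar chart, one line per key, in sorted key order."""
    return "\n".join(f"{k:>5} {'#' * v}" for k, v in sorted(counts.items()))
```

Who you talk to most, and at what times of day, drops straight out of the message metadata, without ever looking at the message content.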
It’s also worth bearing in mind that even if access to your data is locked down, access to the data of people associated with you might reveal quite a lot of information about you, including your location, as Adam Sadilek et al. describe: Finding Your Friends and Following Them to Where You Are (see also Far Out: Predicting Long-Term Human Mobility). My own tinkerings with emergent social positioning (looking at who the followers of particular twitter users also follow en masse) also suggest we can generate indicators about potential interests of a user by looking at the interests of their followers… Even if you’re careful about who your friends are, your followers might still reveal something about you you have tried not to disclose yourself (such as your birthday…). (That’s one of the problems with asymmetric trust models! Hmmm… could be interesting to start trying to model some of this… )
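The emergent social positioning trick boils down to counting: for each follower of a target user, look at who they follow, and see which accounts turn up across a large share of those followers. Here's a minimal sketch that assumes the follower/friend lists have already been fetched into a dictionary (the data structure and the `min_share` threshold are my own illustrative choices, not a description of any Twitter API).

```python
from collections import Counter


def emergent_positioning(followers_friends, min_share=0.5, exclude=()):
    """
    followers_friends: dict mapping each follower of the target user to the set of
    accounts that follower follows (hypothetical, pre-fetched data).
    exclude: accounts to ignore, e.g. the target user, whom every follower follows anyway.
    Returns (account, share_of_followers) pairs with share >= min_share, most common first.
    """
    n = len(followers_friends)
    if n == 0:
        return []
    counts = Counter()
    for friends in followers_friends.values():
        counts.update(set(friends) - set(exclude))
    return [(acct, c / n) for acct, c in counts.most_common() if c / n >= min_share]
```

The point is that nothing in the output depends on anything the target user disclosed about themselves: it's all inferred from what their followers do.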
Both of these consultations provide a context for reflecting on the extent to which companies use data for their own processing purposes (for a recent review, see What happens to my data? A novel approach to informing users of data processing practices), the extent to which they share this data in raw and processed form with other companies or law enforcement agencies, the extent to which they may use it to underwrite value-added/data-powered services to users directly or when combined with data from other sources, the extent to which they may be willing to share it in raw or processed form back with users, and the extent to which users may then be willing (or licensed) to share that data with other providers, and/or combine it with data from other providers.
One of the biggest risks from a “what might they learn about me” point of view – as well as some of the biggest potential benefits – comes from the reconciliation of data from multiple different sources. Mosaic theory is an idea taken from the intelligence community that captures the idea that when data from multiple sources is combined, the value of the whole view may be greater than the sum of the parts. When privacy concerns are idly raised as a reason against the release of data, it is often suspicion and fears around what a data mosaic picture might reveal that act as drivers of these concerns. (Similar fears are also used as a reason against the release of data, for example under Freedom of Information requests, in case a mosaic results in a picture that can be used against national interests: eg D.E. Pozen, The Mosaic Theory, National Security, and the Freedom of Information Act and MP Goodwin, A National Security Puzzle: Mosaic Theory and the First Amendment Right of Access in the Federal Courts).
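A toy example shows how a mosaic can emerge. Suppose one dataset pairs names with postcode and date of birth, and a second, "anonymised" dataset pairs postcode and date of birth with some sensitive attribute. Neither on its own links a name to the attribute, but joining them on the shared quasi-identifiers can. The sketch below is illustrative only, with hypothetical field names and data; it treats a record as re-identified only when its quasi-identifier combination matches exactly one named record.

```python
from collections import defaultdict


def link_on_quasi_identifiers(named, anonymised, keys):
    """
    Join two lists of dict records on shared quasi-identifier fields (keys).
    Returns merged records for anonymised rows whose key values match exactly
    one named record, i.e. where the combination singles someone out.
    """
    index = defaultdict(list)
    for rec in named:
        index[tuple(rec[k] for k in keys)].append(rec)
    linked = []
    for rec in anonymised:
        matches = index.get(tuple(rec[k] for k in keys), [])
        if len(matches) == 1:  # a unique match re-identifies the "anonymous" record
            linked.append({**matches[0], **rec})
    return linked
```

Where two or more named people share the same postcode and birth date, the join stays ambiguous, which is one reason why coarsening quasi-identifiers (partial postcodes, birth year rather than full date) is a common anonymisation move.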
Note that within a particular dataset, we might also appeal to mosaic theory thinking; for example, might we learn different things when we observe individual data records as singletons, as opposed to a set of data (and the structures and patterns it contains) as a single thing: GPS Tracking and a ‘Mosaic Theory’ of Government Searches. And as a consequence, might we want to treat individual data records, and complete datasets, differently?
PS via this ORG post – Consulympics: opportunities to have your say on tech policies – which details a whole raft of currently open ICT related consultations in the UK, I am reminded of this ICO Consultation on the draft Anonymisation code of practice along with a draft of the anonymisation code itself.
I rarely link social apps to other social apps, but sometimes I click through the first few stages of the linking process to see what happens. Here’s an example I just tried using Klout, which wants me to link it to my account on Facebook. The screenshot is taken from Facebook… but what does it mean?
Does that horizontal arrow aligned with the first element mean permission is only being requested for my personal information? Or is that thin vertical line an “AND” that says permission is being requested to access my personal information AND post to my wall AND etc etc…
I have no idea….?
[A story a few days ago (March 2012) brought this post to mind... Here's the recent story - Walmart buys a Facebook-based calendar app to get a look at customers' dates: "The Social Calendar app and its file of 110 million birthdays and other events, acquired from Newput Corp., will give Walmart the ability to expand its efforts to dig deeper into the lives of customers—allowing customers to make purchases on Walmart.com directly from event reminders from the Web or their mobile device." It's time I started brushing up on my legal understanding, I think: in the UK, would data protection legislation prevent one company from buying another for its data, and then using that data for a different reason to the reason for which it was collected? And if so, how is different defined? Could the data be used to annotate/be annotated by other data to create a derived product? Hmm... And how will #midata fit in with all this? eg We Can Haz Our Personal Data Back from Corporates?]
A long time ago, I wrote:
A couple of weeks ago [err, that'll be years now;-)], I was telling a colleague about a podcast I’d heard earlier that day: Future Proofing Your Privacy. At the start of the talk, the speaker, Mark Hedland, tells of how he posted to an online group a post that said…
For those of you who haven’t followed the links, here’s a recap. Something that was posted over 10 years ago to a part of the web that wasn’t supposed to be being archived, was – and now Mark Hedland can show how foolish he was then in thinking that [what] he was saying then would disappear.
As we talked, my colleague ["Sam Smith"] mentioned how 5 or so years ago they had posted a request to a news group asking for a translation of a traditional, Canadian French folk song, a translation they have since lost, along with the name of the song. (Actually, it wasn’t a song, French or Canadian, but it was to do with translation; I have changed the specific details to protect my colleague’s privacy!)
Two minutes after leaving their office (or maybe it was three, certainly no longer than that) I mailed my colleague a link to a Google Groups search page containing their long lost post. The query used the equivalent of these search terms: translation song “sam smith”. The post being searched for was the third item in the list of search results.
And so, as Google continues to roll out its social circle search facilities and use the people you know (and the people they know) to inform what search results you see, [and as Google buys up other social search companies, such as Aardvark (e.g. Google Buys Human-driven Search Engine Aardvark: Will It Make It to the Main SERPS?)], it’s worth bearing in mind a few things:
1) Just because you haven’t given Google your Twitter details, Google may know you’re my friend because I have given Google my Twitter details and my friends and followers lists are public (an ‘asymmetric disclosure’? So for example, for a symmetric disclosure, Google might only use the belief that we’re friends if I follow you AND you have given Google your Twitter credentials AND you follow me. But if it uses you to inform my results simply because I follow you, that would be asymmetric?)
2) Just because you haven’t given Google any personal info, Google might buy a company you have disclosed personal information to and then assimilate it into their growing total information awareness… (You do know Google owns Youtube, don’t you, and so has a pretty good idea of everything you’ve watched on it?;-)
3) Your mum may be influencing your search results… And you might be influencing your kids’ results… ;-)
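The symmetric/asymmetric distinction in point 1 is really just a rule, so it can be written down as one. Here's a sketch of the two variants as I described them; the `follows` and `linked_accounts` structures are hypothetical stand-ins for public follower lists and "has handed over their credentials", not anything Google has documented.

```python
def may_use_connection(me, you, follows, linked_accounts, symmetric=True):
    """
    follows: set of (follower, followee) pairs, e.g. built from public follower lists.
    linked_accounts: set of users who have given the search provider their credentials.
    symmetric=True: use the connection only if I follow you AND you follow me
                    AND you have linked your own account.
    symmetric=False: the asymmetric case, where my following you is enough.
    """
    i_follow_you = (me, you) in follows
    if not symmetric:
        return i_follow_you
    return i_follow_you and (you, me) in follows and you in linked_accounts
```

Under the asymmetric rule, your involvement is decided entirely by what I did; under the symmetric one, you have to have opted in as well.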
PS a not evil thing to do would be to give users of an acquired service a guaranteed period of grace between the announcement that the company has been acquired and the time when Google first has access to personal data, with the guarantee that users can withdraw from the service within that period and have their records permanently deleted.
PPS what does Google know about you? Here are two things to try: if you have a Google account, see who’s in your social circle; and whether or not you have a Google account, see what Google’s social graph API can turn up about you… .
PPPS if you’re on Facebook, Twitter and LinkedIn, Mashed In provides a widget-based tool for letting other people on those networks see how closely linked they are to you… The asymmetries here might arise from all over the place, depending on what Mashed In is actually doing (I’ll try to do some digging…). For example, you might log on to my site and see that you are connected to someone on Facebook who is connected to someone on Twitter who I’m connected to on LinkedIn. Those intermediaries, who are perhaps trying to maintain privacy of a sort by keeping separate social circles on different networks, are suddenly exposed. Like weddings where guests from different parts of the happy couple’s life collide, your connections may be your undoing. (Hmmm, so I wonder: are all these social tools going to start being deployed on prospective MPs? Prospective Parliamentary Candidate X is only two steps away from both a member of a dodgy-looking group on Facebook and an ex porn star, for example… The MPs’ expenses scandal could look like nothing compared to the sorts of selective storytelling you might be able to turn up as a result of friend-of-a-friend connections. Think Twiangulate, but working over multiple services (as Mashed In might do?), court records, local news searches, gossip sites, company directorships, etc etc… Nightmare…)
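The cross-network exposure is easy to reproduce once the edges from different networks are thrown into one graph. Here's a minimal sketch, with made-up people and edges, that merges connections labelled by network and uses a breadth-first search to find the shortest chain between two people, recording which network each hop came from. I don't know how Mashed In actually computes its links; this is just one plausible way of doing it.

```python
from collections import deque


def cross_network_path(edges, start, goal):
    """
    edges: iterable of (person_a, person_b, network) tuples, treated as undirected.
    Returns the shortest chain as a list of (person, network_used_to_reach_them)
    pairs via breadth-first search, or None if the two people are not connected.
    """
    graph = {}
    for a, b, net in edges:
        graph.setdefault(a, []).append((b, net))
        graph.setdefault(b, []).append((a, net))
    previous = {start: None}  # person -> (predecessor, network) on the shortest path
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                prev = previous[node]
                path.append((node, prev[1] if prev else None))
                node = prev[0] if prev else None
            return path[::-1]
        for neighbour, net in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, net)
                queue.append(neighbour)
    return None
```

For the example above, edges `("visitor", "x", "Facebook")`, `("x", "y", "Twitter")` and `("y", "owner", "LinkedIn")` give the chain visitor → x (Facebook) → y (Twitter) → owner (LinkedIn), and x and y, who kept those circles apart, are the ones who get exposed.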
PPPPS Note to self – do a post on this… Reidentification Using Social Networks (i.e. deanonymisation); for sample History attack code, see SocialHistory.js: See Which Sites Your Users Visit