OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Paranoia’ Category

A Loss of Sovereignty?


Over the course of the weekend, rummaging through old boxes of books as part of a loft clearout, I came across more than a few OU textbooks and course books. Way back when, OU course materials were largely distributed in the form of print items and hard media – audio and video cassettes, CD- and DVD-ROMs and so on. Copies of the course materials could be found in college and university libraries that acted as OU study centres, via the second hand market, or in some cases purchased from the OU via OU Worldwide.

Via an OU press release out today, I notice that “[c]ourse books from The Open University (OU) have been donated to an educational sponsorship charity in Kenya, giving old course books a new use for the local communities.” Good stuff…

..but it highlights an issue about the accessibility of our materials as they increasingly move to digital form. More and more courses deliver more and more content to students via the VLE. Students retain access to online course materials and course environments for a period of time after a module finishes, but open access is not available.

True, many courses now release some content onto OpenLearn, the OU’s free open learning platform. And the OU also offers courses on the FutureLearn platform (an Open University owned company that made some share allotments earlier this year).

But access to the electronic form is not tangible – the materials are not persistent, the course materials not tradeable. They can’t really be owned.

I’m reminded of a noticing I had earlier this week about our Now TV box that lets us watch BBC iPlayer, 4oD, YouTube and so on via the telly. The UI is based around a “My subscriptions” model which shows the channels (or apps) you subscribe to. Only, there are some channels in there that I didn’t subscribe to, and that – unlike the channels I did subscribe to – I can’t delete from my subscriptions. Sky – I’m looking at you. (Now TV is a Sky/BSkyB product.)

In a similar vein, Apple and U2 recently teamed together to dump a version of U2’s latest album into folks’ iTunes accounts, “giving away music before it can flop, in an effort to stay huge” as Iggy Pop put it in his John Peel Lecture [on BBC iPlayer], and demonstrating once again that our “personal” areas on these commercial services are no such thing. We do not have sovereignty over them. Apple is no Sir Gawain. We do not own the things that are in our collections on these services and nor do we own the collection: I doubt you hold a database right in any collection you curate on YouTube or in iTunes, even if you do expend considerable time, effort and skill in putting that collection together; and I fully imagine that the value of those collections as databases is exploited by the recommendation engine mining tools the platform services operate.

And just as platform operators can add things to our collections, so too can they take them away. Take Amazon, for example, who complement their model of selling books with one of renting you limited access to ebooks via their Kindle platform. As history shows – Amazon wipes customer’s Kindle and deletes account with no explanation or The original Big Brother is watching you on Amazon Kindle – Amazon is often well within its rights, and it is well within its capacity, to remove books from your device whenever it likes.

In the same way that corporate IT can remotely manage “your” work devices using enterprise mobile device management (Blackberry: MDM and beyond, Google Apps: mobile management overview, Apple: iOS and the new IT, for example), so too can platform operators of devices – and services – reach into your devices – or service clients – and poke around inside them. Unless we’ve reclaimed it as our own, we’re all users of enterprise technology masked as consumer offerings and have ceded control over our services and devices to the providers of them.

The loss of sovereignty also extends to the way in which devices and services are packaged so that we can’t look inside them, need special tools to access them, can’t take ownership of them in order to appropriate them for other purposes. We are users in a pejorative sense; and we are used by service and platform providers as part of their business models.

Written by Tony Hirst

October 21, 2014 at 10:27 am

Posted in Anything you want, Paranoia


More Digital Traces…

Via @wilm, I notice that it’s time again for someone (this time at the Wall Street Journal) to have written about the scariness that is your Google personal web history (the sort of thing you probably have to opt out of if you sign up for a new Google account, if other recent opt-in by defaults are anything to go by…)

It may not sound like much, but if you do have a Google account, and your web history collection is not disabled, you may find your emotional response to seeing months or years of your web/search history archived in one place surprising… Your Google web history.

Not mentioned in the WSJ article were some of the games that the Chrome browser gets up to. @tim_hunt tipped me off to a nice (if technically detailed, in places) review by Ilya Grigorik of some of the design features of the Chrome browser, and some of the tools built in to it: High Performance Networking in Chrome. I’ve got various pre-fetching tools switched off in my version of Chrome (tools that allow Chrome to pre-emptively look up web addresses and even download pages pre-emptively*) so those tools didn’t work for me… but looking at chrome://predictors/ was interesting to see what keystrokes I type are good predictors of web pages I visit…

chrome predictors

* By the by, I started to wonder whether webstats get messed up to any significant effect by Chrome pre-emptively prefetching pages that folk never actually look at…?

In further relation to the tracking of traffic we generate from our browsing habits, as we access more and more web/internet services through satellite TV boxes, smart TVs, and catchup TV boxes such as Roku or NowTV, have you ever wondered about how that activity is tracked? LG Smart TVs logging USB filenames and viewing info to LG servers describes not only how LG TVs appear to log the things you do view, but also the personal media you might view, and in principle can phone that information home (because the home for your data is a database run by whatever service you happen to be using – your data is midata is their data).

there is an option in the system settings called “Collection of watching info:” which is set ON by default. This setting requires the user to scroll down to see it and, unlike most other settings, contains no “balloon help” to describe what it does.

At this point, I decided to do some traffic analysis to see what was being sent. It turns out that viewing information appears to be being sent regardless of whether this option is set to On or Off.

you can clearly see that a unique device ID is transmitted, along with the Channel name … and a unique device ID.

This information appears to be sent back unencrypted and in the clear to LG every time you change channel, even if you have gone to the trouble of changing the setting above to switch collection of viewing information off.

It was at this point, I made an even more disturbing find within the packet data dumps. I noticed filenames were being posted to LG’s servers and that these filenames were ones stored on my external USB hard drive.

Hmmm… maybe it’s time I switched out my BT homehub for a proper hardware firewalled router with a good set of logging tools…?

PS FWIW, I can’t really get my head round how evil on the one hand, or damp squib on the other, the whole midata thing is turning out to be in the short term, and what sorts of involvement – and data – the partners have with the project. I did notice that a midata innovation lab report has just become available, though to you and me it’ll cost 1500 squidlly diddlies so I haven’t read it: The midata Innovation Opportunity. Note to self: has anyone got any good stories to say about TSB supporting innovation in micro-businesses…?

PPS And finally, something else from the Ilya Grigorik article:

The HTTP Archive project tracks how the web is built, and it can help us answer this question. Instead of crawling the web for the content, it periodically crawls the most popular sites to record and aggregate analytics on the number of used resources, content types, headers, and other metadata for each individual destination. The stats, as of January 2013, may surprise you. An average page, amongst the top 300,000 destinations on the web is:

- 1280 KB in size
- composed of 88 resources
- connects to 15+ distinct hosts

Let that sink in. Over 1 MB in size on average, composed of 88 resources such as images, JavaScript, and CSS, and delivered from 15 different own and third-party hosts. Further, each of these numbers has been steadily increasing over the past few years, and there are no signs of stopping. We are increasingly building larger and more ambitious web applications.

Is it any wonder that pages take so long to load on a mobile phone off the 3G network, and that you can soon eat up your monthly bandwidth allowance?
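To put a rough number on that, here is a back-of-the-envelope sketch in Python. The 1280 KB figure is the average page weight quoted above; the 500 MB monthly allowance is a made-up example value, and the calculation ignores caching, compression and any non-browser traffic, so treat the result as an order-of-magnitude guide only.

```python
# Rough arithmetic only: the 500 MB allowance is a made-up example figure.
AVERAGE_PAGE_KB = 1280   # average page weight quoted above (January 2013)
ALLOWANCE_MB = 500       # hypothetical monthly mobile data allowance


def pages_per_allowance(allowance_mb: float, page_kb: float) -> int:
    """Return how many whole pages of a given size fit in a data allowance."""
    allowance_kb = allowance_mb * 1024
    return int(allowance_kb // page_kb)


pages = pages_per_allowance(ALLOWANCE_MB, AVERAGE_PAGE_KB)
print(f"{pages} average pages per month, about {pages / 30:.1f} per day")
```

On those assumptions the allowance covers a few hundred uncached page views a month, which is not many if you browse every day.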

Written by Tony Hirst

November 21, 2013 at 12:37 am

Posted in Infoskills, Paranoia, privacy


Mental Health and the Web – Hounded by Ads, Bullied by Retargeters?

Some thinkses not thought through…

- can web ads feed paranoia, particularly retargeted ads that follow you round the web?
- can personalised ads be upsetting, or appear bullying? For example, suppose you’re critically ill, or in debt, or you’ve been researching a sensitive personal or family matter, and then the ads start to hound you and remind you about it…

Is one take on privacy that it is not to be reminded about something by an external agency? (Is there a more memorable quote along those lines somewhere..?!)

Are personalised ads one of the more pernicious examples of a “filter bubble” effect?

I think I need a proper holiday, well away from the web!

Written by Tony Hirst

October 23, 2013 at 10:58 am

Posted in Evilness, Paranoia

Centralising User Tracking on the Web – Let Google Track Everyone For You

In The Curse of Our Time – Tracking, Tracking Everywhere, I noted how the likes of Google set up cookie matching services that allow advertisers to reconcile their cookies with Google’s:

The Cookie Matching Service enables a buyer to associate two kinds of data:

- the cookie that identifies a user within the buyer domain, and
- the doubleclick.net cookie that identifies the user for Google. (We share a buyer-specific encrypted Google User ID for buyers to match on.)

The data structure that the buyer uses to maintain these associations is called a Match Table. While the buyer is responsible for building and maintaining match tables, Google can host them.

With an RTB [real-time bidding] application, the buyer can bid on impressions where the user has a specific Google User ID, and can use information associated with the Google User ID as criteria in a bid for an ad impression.
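To make the idea of a match table a little more concrete, here is a minimal Python sketch of how a buyer might use one. Everything in it is hypothetical: the identifiers, the in-memory dictionaries, the interest labels and the bid values are all invented for illustration, and nothing here is taken from Google's actual service or data formats.

```python
from typing import Dict, Optional, Set

# All names and values here are hypothetical, for illustration only.
match_table: Dict[str, str] = {}          # Google User ID -> buyer's own cookie ID
buyer_profiles: Dict[str, Set[str]] = {}  # buyer cookie ID -> interests the buyer has recorded


def record_match(google_user_id: str, buyer_cookie_id: str) -> None:
    """Store the association between the two identifiers for the same browser."""
    match_table[google_user_id] = buyer_cookie_id


def bid_for_impression(google_user_id: str, wanted_interest: str) -> float:
    """Bid more for a user the buyer already recognises as interested."""
    buyer_cookie_id: Optional[str] = match_table.get(google_user_id)
    if buyer_cookie_id is None:
        return 0.0  # unknown user: no bid in this toy example
    interests = buyer_profiles.get(buyer_cookie_id, set())
    return 0.5 if wanted_interest in interests else 0.1


record_match("google-abc123", "buyer-789")
buyer_profiles["buyer-789"] = {"walking boots"}
print(bid_for_impression("google-abc123", "walking boots"))  # recognised, interested user
print(bid_for_impression("google-zzz999", "walking boots"))  # no match table entry
```

The point of the sketch is simply that once the two identifiers are joined, whatever the buyer already knows about "their" cookie can be used when bidding on an impression that only carries the Google identifier.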

But it seems that this isn’t enough for Google. It actually gets worse… A USA Today story suggests Google is exploring the idea of an AdID, an identifier that it would share with advertisers to uniquely identify eyeballs rather than them having to use a range of alternative third party user tracking services.

How AdID would actually work (if indeed it ever comes to pass) is not explained, although a post on the AdExchanger blog – What The Google AdID Means For Ad Tech – comes up with a possible mechanism: Google uniquely identifies users (presumably using cookies and authenticated user credentials (topped up with a little bit of browser fingerprinting, I wonder?)) then provides an advertiser with a hashed version of the ID. The hashed identifier means advertisers can’t share information with each other.
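The AdExchanger post does not say how the hashing would be done, so the following Python sketch is purely an assumption about one way it could work: a keyed hash (HMAC-SHA256 from the standard library) with a different secret key per advertiser. The function name, the user ID and the keys are all invented for illustration.

```python
import hashlib
import hmac


def advertiser_scoped_id(user_id: str, advertiser_key: bytes) -> str:
    """Derive an identifier that is stable for one advertiser but differs between advertisers.

    Hypothetical scheme: HMAC-SHA256 of the platform's user ID, keyed with a
    per-advertiser secret that only the platform holds.
    """
    return hmac.new(advertiser_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up example values.
user = "user-42"
id_for_a = advertiser_scoped_id(user, b"secret-for-advertiser-a")
id_for_b = advertiser_scoped_id(user, b"secret-for-advertiser-b")

print(id_for_a == advertiser_scoped_id(user, b"secret-for-advertiser-a"))  # same advertiser: stable
print(id_for_a == id_for_b)  # different advertisers: values do not match
```

Under a scheme like this, two advertisers comparing notes would see different identifiers for the same person and could not join their records on the ID alone, while the platform holding the keys could still recognise the user everywhere.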

The Google AdID service seems like it would be offered as an alternative to tracking users using third party services that use their own third party cookies, with a user tracking system that offers more effective identity tracking techniques (such as a logged in Google user id). Which is to say, Google wants to replace third party cookie based tracking services with its own (logged in user + cookies + browser fingerprinting + etc etc) user tracking service? Or have I misinterpreted all this?

One AdID to rule them all, One AdID to find them,
One AdID to bring them all and in the darkness bind them
In the Land of Google where the Shadows lie.

PS by the by, I notice in a post on Author information in search results, that “If you want your authorship information to appear in search results for the content you create, you’ll need a Google+ Profile with a good, recognizable headshot as your profile photo. Then, verify authorship of your content by associating it with your profile using either of the methods below.” Ah ha… so you agree to give the Goog a good photo of yourself that it can use in its face matching algos, such as in the sort of thing that could be used to unlock your phone. Good. Not. Will faceID play a part of AdID, I wonder? With a gaze tracking feedback loop? Or maybe Google will be getting more into the tracked user, outside advertising market?

PPS This then also brings back to mind the face tracking approaches mentioned in The Curse of Our Time – Tracking, Tracking Everywhere

PPPS By the by, it seems the Culture, Media and Sport Committee have no problem with online targeted ads:

85. Inevitably public funding is under pressure, a point illustrated by cuts in the budget of Arts Council England.[161] Given the essential role of public funding in sustaining the wider creative economy, it is crucial that adequate resources are available. Of course, the private sector should be encouraged as much as possible to invest in the creative industries. One good example is provided by advertising, which not only provides a major source of funding but is a creative industry in its own right. Evidence from the Advertising Association points to advertising as “a major creative industry and a critical source of funding for other creative industries”. The Advertising Association’s evidence goes on to express deep concern about draft EU Data Protection Regulation “which could damage direct marketing, internet advertising, and the UK economy both off and online”.[162] Increasing use is being made of personal data to target online advertising better. While concerns around this have prompted reviews of data protection legislation, we do not think the targeting of appropriate advertising—essential to so many business models —represents the greatest threat to privacy. [original emphasis]

Written by Tony Hirst

September 21, 2013 at 10:31 am

Posted in Paranoia


Some Sketchnotes on a Few of My Concerns About #opendata

With my growing unease about just what the agenda driving open government/public data is, I think I’m going to have to find some time away to walk the dog lots, and mull over what pieces might be part of the jigsaw, as well as having a go at trying to put some of them together…

Near the top of the list is a concern about information asymmetry and how open data may be used by private concerns to provide a one-off advantage for them when it comes to poaching services from the public sector. How so? My gut reaction thinking is this: if, as part of the procurement process, the private sector can use open public data to help it secure a contract in competition with a public sector provider, then when contracts come up for renewal the public sector may know less when it comes to bid than the private sector company was able to learn when it first tendered. The question here is: does open public data put private sector companies at an advantage when bidding for public service contracts against an incumbent public provider, compared to a public body bidding to recapture a service from an incumbent private provider, given that the private provider may not be required to open up information (for example, through FOI requests, transparency or public reporting obligations) in the same way that a public body is?

Another take on a similar theme is the extent to which there may be a loss of transparency when a service goes from a public to a private provider. If we think there is some benefit to be had from transparency in general terms, then private providers of public services should have the same openness requirements placed on them as the public body. If private companies can claim revealing information is against their commercial interest, can public bodies make the same claims on exactly the same terms under FOI exemption rules, for example (eg MoJ Freedom of information guidance: Exemptions guidance – Section 43: Commercial interests)?

Taking the NHS as a case example, here are a few things on my reading list:

  • Monitor report from March 2013 on A fair playing field for the benefit of NHS patients [actual report]. For example, the report identified the following distortions:

    1. Participation distortions. Some providers are directly or indirectly excluded from offering their services to NHS patients for reasons other than quality or efficiency. Restrictions on participation disadvantage providers seeking to expand into new services or new areas, regardless of whether the providers are public, charitable or private. Participation distortions disadvantage nonincumbent providers of every type.
    2. Cost distortions. Some types of provider face externally imposed costs that do not fall on other providers. On balance, cost distortions mostly disadvantage charitable and private health care providers compared to public providers.
    3. Flexibility distortions. Some providers’ ability to adapt their services to the changing needs of patients and commissioners is constrained by factors outside their control. These flexibility distortions mostly disadvantage public sector providers compared to other types.

    I’m not sure to what extent, if any, the report reviews distortions and asymmetries arising from open data issues.
    A search of the report for mentions of FOI turns up:

    Provider transparency
    Historically, public providers have faced higher levels of scrutiny than other providers, including requests for information under the Freedom of Information Act. This degree of scrutiny can improve accountability to patients and promote good practice. Freedom of Information requirements have been extended through the standard NHS contract to private and charitable providers. However, it is not clear that this is operating effectively as yet, and other aspects of transparency do not apply across all types of provider.
    29. The Government and commissioners should ensure that transparency, including Freedom of Information requirements, is implemented across all types of provider of NHS services on a consistent basis.

    As I said, it’s on the reading list…

  • A terrifying post on the Computer Weekly/Public Sector IT blog – NHS watchdog commandeers data in bid to stimulate privatization and an earlier one on the naive take on hospital mortality data: Data regime makes merciless start on NHS privatization. Are there any reports or strategy documents from the Care Quality Commission (CQC) I need to add to my reading list?
  • Something academic… such as this piece from the Proceedings of the 21st European Conference on Information Systems on The Generative Mechanisms of Open Government Data, much of which I suspect is summarised by these two figures taken (without permission) from the paper:

    generative mechanisms open gov data

    THE GENERATIVE MECHANISMS OF OPEN GOVERNMENT DATA

  • Opening up data (particularly data held by public bodies) around private companies is another area I can’t quite get my head round, particularly when it comes to comparing information about the machinations of private companies as compared to public bodies. To what extent should data that is held by public bodies about companies that are public and limited liability be openly available, for example? Maybe related to this is a currently open BIS consultation: Company ownership: transparency and trust discussion paper as well as the HMRC consultation on Sharing and publishing data for public benefit (press release) that I linked to from yesterday’s Rambling Round-Up of Some Recent #OpenData Notices. (OpenCorporates’ Chris Taggart posts some interesting thoughts on the sheen given to the proposed release of VAT registration data to credit agencies that the consultation is in part based around: Open tax data, or just VAT ‘open wash’.) A recent edition of File on 4 (h/t @onthewight) on charity based tax fraud – Faith, Hope and… Tax Avoidance – also got me wondering further about what information is openly available about charities’ activities (eg A Quick Peek at Some Charities Data…)?
  • One to dig for… via Lexology, a post on Freedom of information in the private sector? which claims that “The Confederation of British Industry (“CBI“) has revealed that it is developing ‘transparency guidelines’ that will apply to private companies that provide services to the NHS.” Have these appeared yet, even in draft or consultation form?

A few other things on my to-do list in this area: map out the lobbyists and board/panel members around open data; use disclosure logs to search for companies putting in FOI requests in different sectors; see who’s pitching ideas in to ODUG; map out who’s funding NGOs and activities in the opendata space.

Sigh… no time…;-)

PS Not sure if there is a full paper version of this…? Bates, J. 2013, Information policy in the crises of neoliberalism: the case of Open Government Data in the UK at International Association of Media and Communications Researchers Conference, Dublin, June 2013: “Whilst open data releases by the UK government have received substantial support within UK civil society, often being interpreted as a creative and innovative response to a range of social issues, and, for some, a radical challenge to key components of neoliberal capitalism, this paper argues that deeper analysis of the OGD initiative suggests that it is being shaped by the UK government and corporate interests in an attempt to leverage a distinctly neoliberal agenda. The adoption and development of the OGD agenda as core to the policy response adopted by the UK Government to conditions of political economic crisis, suggests that information policy is being implemented as a key, yet often opaque, element of the neoliberal policy toolbox.” See also an earlier paper, “This is what modern deregulation looks like”: Co-optation and contestation in the shaping of the UK’s Open Government Data Initiative (“whilst OGD [Open Government Data] might potentially support modes of transparent and democratic governance, the current ‘transparency agenda’ should be recognised as an initiative that also aims to enable the marketisation of public services, and this is something that is not readily apparent to the general observer.”) and a statement of Jo Bates’ current research project in the area: The politics of Open Government Data in the UK

PPS Supporting the idea of symmetry in reporting between public services and private companies delivering public services, Richard Murphy on Making public services accountable. And some excellent writings critiquing computational thinking and the teaching of code by Ben Williamson.

Written by Tony Hirst

August 14, 2013 at 9:56 am

Posted in Paranoia, Policy


So What Counts as “Communications Data”?

Picking up on a post by @nevali (Communications Data) that looks at the layered structure of internet based communications in general and a peek inside an SMTP session in particular, I idly wondered about the structure of a tweet and what, exactly, might count as the communications data part of it, as defined by the draft Communications Data Bill:

To what extent can we make a fair comparison with something like the “communications data” associated with this sort of transaction?

(89/365) One day this will be extinct

Or how about a postcard?

See also: From Communications Data to #midata – with a Mobile Phone Data Example

PS via @smithsam, and in a similar light, a consideration of the anatomy of a Facebook message

PPS Given part of the #midata focus on transaction data, I’ve also started wondering about the extent to which financial transactions count as communications, and how different payment mechanisms might change the nature of the transaction. For example, two people meeting face-to-face engaging in a cash transaction, versus a purchase made via an online form using a credit card.

PPPS inspired by the anatomy of a Facebook message, I just posted a tweet via the Twitter web interface to see what the traffic looked like. It was an HTTP post that included the following:

Request URL:https://api.twitter.com/1/statuses/update.json
Request Method:POST
Status Code:200 OK
Request Payload
include_entities=true&include_cards=1&status=%40ousefulapi+wondering+what+data+the+twitter+web+client+sends+when+i+post+a+tweet&post_authenticity_token=*****
Response Headers
HTTP/1.1 200 OK
status: 200 OK
version: HTTP/1.1
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-encoding: gzip
content-length: 880
content-type: application/json; charset=utf-8
date: Thu, 23 Aug 2012 09:32:08 GMT
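The request payload above is just a URL-encoded form string, so it is easy to unpack to see exactly what the client sent. A minimal Python sketch using the standard library follows; the payload is copied from the capture above, including the redacted authenticity token.

```python
from urllib.parse import parse_qs

# Payload copied from the captured request; the token is shown redacted, as above.
payload = (
    "include_entities=true&include_cards=1"
    "&status=%40ousefulapi+wondering+what+data+the+twitter+web+client+sends"
    "+when+i+post+a+tweet"
    "&post_authenticity_token=*****"
)

# parse_qs returns a list of values per key; each key appears once here.
fields = {key: values[0] for key, values in parse_qs(payload).items()}

for key, value in fields.items():
    print(f"{key}: {value}")
```

Decoded, the only user-supplied part is the status field, which holds the message text with %40 turned back into @ and the plus signs turned back into spaces.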

It also got a response, which looks a lot like the data around a particular status update. Presumably the response to an update message is a set of data back describing that accepted status update?

{"in_reply_to_status_id_str":null, "id_str":"238569301735526400", "contributors":null, "truncated":false, "created_at":"Thu Aug 23 09:32:08 +0000 2012", "in_reply_to_user_id":64672382, "in_reply_to_user_id_str":"64672382", "in_reply_to_screen_name":"ousefulAPI", "user":{"id":7129072, "url":"http:\/\/blog.ouseful.info", "profile_use_background_image":true, "verified":false, "profile_text_color":"000000", "contributors_enabled":false, "created_at":"Thu Jun 28 11:37:39 +0000 2007", "profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1195013164\/Picture_23_normal.png", "profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1195013164\/Picture_23_normal.png", "statuses_count":32203,"utc_offset":0, "profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/2508031\/rss_globe.png", "profile_sidebar_border_color":"87BC44", "default_profile":false, "show_all_inline_media":false, "name":"Tony Hirst", "friends_count":742, "location":"UK","id_str":"7129072", "profile_background_tile":true, "protected":false, "profile_sidebar_fill_color":"E0FF92", "geo_enabled":false, "listed_count":423, "follow_request_sent":false, "lang":"en", "description":"OU lecturer, mashup artist; Isle of WIght resident and #f1datajunkie", "profile_background_color":"9AE4E8", "screen_name":"psychemedia", "is_translator":false, "time_zone":"London", "notifications":false, "profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/2508031\/rss_globe.png", "default_profile_image":false, "profile_link_color":"0000FF", "favourites_count":377, "following":false,"followers_count":3905},"retweeted":false, "coordinates":null, "in_reply_to_status_id":null, "geo":null, "source":"web", "entities":{"user_mentions":[{"name":"OUseful", "screen_name":"ousefulAPI", "id_str":"64672382","indices":[0,11],"id":64672382}], "hashtags":[], "urls":[]},"id":238569301735526400,"place":null, "retweet_count":0, "favorited":false, "text":"@ousefulapi wondering what data the twitter web client sends when i post a tweet"}

What you’ll notice is that whilst the update as sent was just a message string, the response identifies the sender (along with biographical data, geo data (possibly), a link to a photo (possibly) and a real name). It also identifies the person to whom the tweet was sent (a Twitter convention is that tweets starting with @… are in some sense sent to @…*), and (via user_mentions) would explicitly identify any other individuals mentioned within the body of the tweet, that is, as part of the content of the message. If the tweet began @foo @bar …, whilst @foo would be identified as some sort of addressee, @bar wouldn’t, although it would be identified as a user_mention**. However, we might assume that such a tweet was addressed in some sense to both @foo and @bar, whereas “@foo Will chat to @bar later” only mentions @bar as content… And “@foo @bar said that too, I think”, whilst clunky, could be interpreted as mentioning @bar as content rather than as a suggested addressee (eg in the sense of “@foo I think @bar said that too”).

* the tweet will only appear in the timeline of the person it is sent to (and if they follow you?), although it is still public. Many clients also display as a timeline “user_mentions” tweets, so if your Twitter username appears anywhere in the body of a tweet, you should see the tweet, even if you don’t follow the person who sent it.

** If the tweet starts with another character, eg “.@foo” then @foo is no longer an addressee in the sense of in_reply_to. From a communications data point of view, what’s fair game as far as communications data goes?
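The addressee versus mention distinction described above can be read straight out of the response. The Python sketch below uses a cut-down copy of the JSON shown earlier, keeping only the fields it needs; the function name and the way the result is grouped are my own choices for illustration, and whether any of these fields should count as communications data rather than content is exactly the open question.

```python
import json

# Cut-down copy of the response shown above, keeping only the fields used here.
response = json.loads("""
{"created_at": "Thu Aug 23 09:32:08 +0000 2012",
 "in_reply_to_screen_name": "ousefulAPI",
 "user": {"screen_name": "psychemedia"},
 "entities": {"user_mentions": [{"screen_name": "ousefulAPI", "indices": [0, 11]}]},
 "text": "@ousefulapi wondering what data the twitter web client sends when i post a tweet"}
""")


def describe_addressing(tweet: dict) -> dict:
    """Separate the in_reply_to addressee from other accounts mentioned in the text."""
    addressee = tweet.get("in_reply_to_screen_name")
    mentions = [m["screen_name"] for m in tweet.get("entities", {}).get("user_mentions", [])]
    other_mentions = [
        name for name in mentions
        if addressee is None or name.lower() != addressee.lower()
    ]
    return {
        "sender": tweet["user"]["screen_name"],
        "sent_at": tweet["created_at"],
        "addressee": addressee,
        "other_mentions": other_mentions,
    }


print(describe_addressing(response))
```

For a tweet beginning @foo @bar, the same function would report @foo as the addressee and @bar only under other_mentions, which is the asymmetry the paragraph above is worrying about.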

Because the update is sent via https, I don’t think you could argue the update was posted as a plaintext postcard? In the postal mail system, how does the law distinguish between messages placed inside an intercepted closed envelope and messages written on an intercepted postcard?

(Hmm – what’s the traffic associated with a Twitter DM I wonder?)

Written by Tony Hirst

August 22, 2012 at 11:26 pm

From Communications Data to #midata – with a Mobile Phone Data Example

A BIS Press Release (Next steps making midata a reality) seems to have resulted in folk tweeting today about the #midata consultation that was announced last month. If you haven’t been keeping up, #midata is the policy initiative around getting companies to make “[consumer data] that may be actionable and useful in making a decision or in the course of a specific activity” (whatever that means) available to users in a machine readable form. To try to help clarify matters, several vignettes are described in this July 2012 report – Example applications of the midata programme – which plays the role of a ‘draft for discussion’ at the September midata Strategy Board [link?]. Here’s a quick summary of some of them:

  • form filling: a personal datastore will help you pre-populate forms and provide certified evidence of things like: proof of citizenship, qualified to drive, passed certain exams and achieved certain qualifications, passed a CRB check, and so on. (Note: I’ve previously tried to argue the case for the OU starting to develop a service (OU Qualification Verification Service) around delivering verified tokens relating to the award of OU degrees, and degrees awarded by the polytechnics, as was (courtesy of the OU’s CNAA Aftercare Service), but after an initial flurry of interest, it was passed on. midata could bring it back maybe?)
  • home moving admin: change your details in a personal “mydata” data store, and let everyone pick up the changes from there. Just think what fun you could have with an attack on this;-)
  • contracts and warranties dashboard: did my crApple computer die the week before or after the guarantee ran out?
  • keeping track of the housekeeping: bank and financial statement data management and reporting tools. I thought there already was software for doing this? do we use it though? I’d rather my bank improved the tools it provided me with?
  • keeping up with the Joneses: how does my house’s energy consumption compare with that of my neighbours?
  • which phone? Pick a tariff automatically based on your actual phone usage. From going through this recently, the problem is not with knowing how I use my phone (easy enough to find out), it’s with navigating the mobile phone sites trying to understand their offers. (And why can’t Vodafone send me an SMS to say I’m 10 minutes away from using up this month’s minutes, rather than letting me go over? The midata answer might be an agent that looks at my usage info and tells me when I’m getting close to my limit, which requires me having access to my contract details in a machine readable form, I guess?)

And here’s a BIS blog post summarising them: A midata future: 10 ways it could shape your choices.
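As a rough illustration of the “which phone?” agent idea mentioned above, here’s a minimal sketch of something that compares call usage against a contract allowance and warns when I’m getting close to the limit. Everything in it is an assumption for the sake of the example: the `Tariff` fields, the 10 minute warning margin, and the idea that calls are billed in whole minutes, rounded up per call. No real operator data format is implied.

```python
import math
from dataclasses import dataclass


@dataclass
class Tariff:
    """Hypothetical machine readable contract details."""
    monthly_minutes: int           # inclusive minutes per month
    warn_margin_minutes: int = 10  # warn when this close to the limit


def minutes_used(call_seconds):
    # Assumption: each call is billed in whole minutes, rounded up.
    return sum(math.ceil(s / 60) for s in call_seconds)


def check_allowance(tariff, call_seconds):
    """Return a warning string, or None if usage is comfortably within limits."""
    remaining = tariff.monthly_minutes - minutes_used(call_seconds)
    if remaining < 0:
        return f"Over allowance by {-remaining} minutes"
    if remaining <= tariff.warn_margin_minutes:
        return f"Warning: only {remaining} minutes left this month"
    return None


if __name__ == "__main__":
    # Call durations (in seconds) so far this month: made-up numbers.
    print(check_allowance(Tariff(monthly_minutes=300), [17100, 330]))
```

The point being that the logic is trivial; the hard part is getting at both the usage data and the contract terms in a form a program can read.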

(The #midata policy seems based on a belief that users want better access to data so they can do things with it. I’m not convinced – why should I have to export my bank data to another service (increasing the number of services I must trust) rather than my bank providing me with useful tools directly? I guess one way this might play out is that any data that does dribble out may get built around by developers who then sell the tools back to the data providers so they can offer them directly? In this context, I guess I should read the BIS commissioned Jigsaw Research report: Potential consumer demand for midata.)

Today has also seen a minor flurry of chat around the call for evidence on the Communications Data Bill, presumably because the closing date for responses is tomorrow (draft Communications Data Bill). (Related reading: latest Annual Report of the Interception of Communications Commissioner.) Again, if you haven’t been keeping up, the draft Communications Data Bill describes communications data in the following terms:

  • Communications data is information about a communication; it can include the details of the time, duration, originator and recipient of a communication; but not the content of the communication itself
  • Communications data falls into three categories: subscriber data; use data; and traffic data.

The categories are further defined in an annex:

  • Subscriber Data – Subscriber data is information held or obtained by a provider in relation to persons to whom the service is provided by that provider. Those persons will include people who are subscribers to a communications service without necessarily using that service and persons who use a communications service without necessarily subscribing to it. Examples of subscriber information include:
    – ‘Subscriber checks’ (also known as ‘reverse look ups’) such as “who is the subscriber of phone number 012 345 6789?”, “who is the account holder of e-mail account xyz@xyz.anyisp.co.uk?” or “who is entitled to post to web space http://www.xyz.anyisp.co.uk?”;
    – Subscribers’ or account holders’ account information, including names and addresses for installation, and billing including payment method(s), details of payments;
    – information about the connection, disconnection and reconnection of services which the subscriber or account holder is allocated or has subscribed to (or may have subscribed to) including conference calling, call messaging, call waiting and call barring telecommunications services;
    – information about the provision to a subscriber or account holder of forwarding/redirection services;
    – information about apparatus used by, or made available to, the subscriber or account holder, including the manufacturer, model, serial numbers and apparatus codes.
    – information provided by a subscriber or account holder to a provider, such as demographic information or sign-up data (to the extent that information, such as a password, giving access to the content of any stored communications is not disclosed).
  • Use data – Use data is information about the use made by any person of a postal or telecommunications service. Examples of use data may include:
    – itemised telephone call records (numbers called);
    – itemised records of connections to internet services;
    – itemised timing and duration of service usage (calls and/or connections);
    – information about amounts of data downloaded and/or uploaded;
    – information about the use made of services which the user is allocated or has subscribed to (or may have subscribed to) including conference calling, call messaging, call waiting and call barring telecommunications services;
    – information about the use of forwarding/redirection services;
    – information about selection of preferential numbers or discount calls;
  • Traffic Data – Traffic data is data that is comprised in or attached to a communication for the purpose of transmitting the communication. Examples of traffic data may include:
    – information tracing the origin or destination of a communication that is in transmission;
    – information identifying the location of equipment when a communication is or has been made or received (such as the location of a mobile phone);
    – information identifying the sender and recipient (including copy recipients) of a communication from data comprised in or attached to the communication;
    – routing information identifying equipment through which a communication is or has been transmitted (for example, dynamic IP address allocation, file transfer logs and e-mail headers – to the extent that content of a communication, such as the subject line of an e-mail, is not disclosed);
    – anything, such as addresses or markings, written on the outside of a postal item (such as a letter, packet or parcel) that is in transmission;
    – online tracking of communications (including postal items and parcels).
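To get a feel for the distinction between communications data and content, here’s a minimal sketch using Python’s standard `email` module to pull out the sender, recipients and date of a message while ignoring the subject line and body. The message itself is made up, and the choice of which headers count as “communications data” is my reading of the list above rather than anything the draft Bill specifies at this level of detail.

```python
from email import message_from_string
from email.utils import getaddresses

# A made-up message, purely for illustration.
RAW_MESSAGE = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Cc: carol@example.org\n"
    "Date: Wed, 22 Aug 2012 13:07:00 +0100\n"
    "Subject: Private plans\n"
    "\n"
    "Meet at noon.\n"
)


def communications_data(raw_message):
    """Extract who/when style metadata, leaving subject and body untouched."""
    msg = message_from_string(raw_message)
    # Collect addresses from both To and Cc (copy recipients).
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return {
        "sender": msg["From"],
        "recipients": [addr for _name, addr in recipients],
        "date": msg["Date"],
    }


if __name__ == "__main__":
    print(communications_data(RAW_MESSAGE))
```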

    To put the communications data thing into context, here’s something you could try for yourself if you have a smartphone. Using something like the SMS to Text app (if you trust it!), grab your txt data from your phone and try charting it: SMS analysis (coming from an Android smartphone or an iPhone). And now ask yourself: what if I also mapped my location data, as collected by my phone? And will this sort of thing be available as midata, or will I have to collect it myself using a location tracking app if I want access to it? (There’s an asymmetry here: the company potentially collecting the data, or me collecting the data…)
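By way of example, here’s a minimal sketch of the sort of charting I mean, counting messages per day from an exported file and drawing a crude text bar chart. The CSV layout (a `timestamp` column and a `direction` column) is an assumption; whatever export tool you use will have its own format, so the column names and date format would need changing to suit.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up sample in an assumed export format.
SAMPLE = """timestamp,direction
2012-08-20 09:15:00,sent
2012-08-20 09:17:30,received
2012-08-21 18:02:11,sent
"""


def messages_per_day(csv_text):
    """Count messages (sent or received) on each calendar day."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S").date()
        counts[day] += 1
    return counts


def text_chart(counts):
    """One line per day, one '#' per message."""
    return "\n".join(f"{day} {'#' * n}" for day, n in sorted(counts.items()))


if __name__ == "__main__":
    print(text_chart(messages_per_day(SAMPLE)))
```

Note that nothing here looks at what the messages actually say: the pattern comes purely from the when.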

    It’s also worth bearing in mind that even if access to your data is locked down, access to the data of people associated with you might reveal quite a lot of information about you, including your location, as Adam Sadilek et al. describe: Finding Your Friends and Following Them to Where You Are (see also Far Out: Predicting Long-Term Human Mobility). My own tinkerings with emergent social positioning (looking at who the followers of particular twitter users also follow en masse) also suggest we can generate indicators about potential interests of a user by looking at the interests of their followers… Even if you’re careful about who your friends are, your followers might still reveal something about you that you have tried not to disclose yourself (such as your birthday…). (That’s one of the problems with asymmetric trust models! Hmmm… could be interesting to start trying to model some of this… )
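The “who do the followers of X also follow en masse” idea can be sketched in a few lines. This is a toy version rather than the code behind my own tinkerings: the `follows` mapping (user to the set of accounts they follow) and all the account names are invented, and in practice the data would have to be harvested from the Twitter API.

```python
from collections import Counter


def commonly_followed(target, follows, top_n=3):
    """Accounts most often followed by the followers of `target`.

    `follows` maps each user to the set of accounts they follow.
    """
    followers = [user for user, friends in follows.items() if target in friends]
    counts = Counter()
    for user in followers:
        # Tally everything this follower follows, other than the target itself.
        counts.update(friend for friend in follows[user] if friend != target)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up follower data.
    follows = {
        "ann": {"target", "f1news", "cycling"},
        "bob": {"target", "f1news"},
        "cat": {"target", "f1news", "cycling"},
        "dan": {"cooking"},
    }
    print(commonly_followed("target", follows, top_n=2))
```

The target account never has to say anything about its interests: the commonly followed accounts act as a proxy for them.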

    Both of these consultations provide a context for reflecting on the extent to which companies use data for their own processing purposes (for a recent review, see What happens to my data? A novel approach to informing users of data processing practices), the extent to which they share this data in raw and processed form with other companies or law enforcement agencies, the extent to which they may use it to underwrite value-added/data-powered services to users directly or when combined with data from other sources, the extent to which they may be willing to share it in raw or processed form back with users, and the extent to which users may then be willing (or licensed) to share that data with other providers, and/or combine it with data from other providers.

    One of the biggest risks from a “what might they learn about me” point of view – as well as some of the biggest potential benefits – comes from the reconciliation of data from multiple different sources. Mosaic theory is an idea taken from the intelligence community that captures the idea that when data from multiple sources is combined, the value of the whole view may be greater than the sum of the parts. When privacy concerns are idly raised as a reason against the release of data, it is often suspicion and fears around what a data mosaic picture might reveal that act as drivers of these concerns. (Similar fears are also used as a reason against the release of data, for example under Freedom of Information requests, in case a mosaic results in a picture that can be used against national interests: eg D.E. Pozen, The Mosaic Theory, National Security, and the Freedom of Information Act and MP Goodwin, A National Security Puzzle: Mosaic Theory and the First Amendment Right of Access in the Federal Courts).
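A toy sketch of the reconciliation step: two datasets that each look fairly harmless on their own are joined on a pair of shared fields. The datasets, the field names and the choice of partial postcode plus birth year as the joining keys are all invented for illustration.

```python
def reconcile(records_a, records_b, keys):
    """Join two lists of dict records wherever all of `keys` match."""
    index = {}
    for rec in records_a:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    merged = []
    for rec in records_b:
        for match in index.get(tuple(rec[k] for k in keys), []):
            # Combine the fields from both sources into one richer record.
            merged.append({**match, **rec})
    return merged


if __name__ == "__main__":
    # Source 1: a named customer list (made up).
    loyalty = [{"name": "A. Example", "postcode": "MK7", "birth_year": 1970}]
    # Source 2: "anonymous" usage statistics (made up).
    usage = [
        {"postcode": "MK7", "birth_year": 1970, "late_night_calls": 42},
        {"postcode": "PO30", "birth_year": 1985, "late_night_calls": 3},
    ]
    print(reconcile(loyalty, usage, keys=("postcode", "birth_year")))
```

Neither source on its own links a name to a calling pattern; the joined record does.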

    Note that within a particular dataset, we might also appeal to mosaic theory thinking; for example, might we learn different things when we observe individual data records as singletons, as opposed to a set of data (and the structures and patterns it contains) as a single thing: GPS Tracking and a ‘Mosaic Theory’ of Government Searches. And as a consequence, might we want to treat individual data records, and complete datasets, differently?

    PS via this ORG post – Consulympics: opportunities to have your say on tech policies – which details a whole raft of currently open ICT related consultations in the UK, I am reminded of this ICO Consultation on the draft Anonymisation code of practice along with a draft of the anonymisation code itself.

    Written by Tony Hirst

    August 22, 2012 at 1:07 pm

    Posted in Data, Paranoia, Policy, privacy
