Using Google to Look Up Where You Live via the Physical Location of Your Wifi Router

During a course team meeting today, I idly mentioned that we should be able to run a simple browser-based activity involving the geolocation of a student’s computer, based on Google knowing the location of their wifi router. I was challenged about the possibility of this, so I did a quick bit of searching to see if there was an easy way of looking up the MAC addresses (BSSIDs) of wifi access points that were in range, but not connected to, which turned up:

The airport command with '-s' or '-I' options is useful: /System/Library/PrivateFrameworks/Apple80211.framework/Resources/airport


(On Windows, the equivalent is maybe something like netsh wlan show networks mode=bssid? You could then call it via Python.)

The second part of the jigsaw was to try to find a way of looking up a location from a wifi access point MAC address – it seems that the Google geolocation API does that out of the box:


An example of how to make a call is also provided, as long as you have an API key… So I got a key and gave it a go:



Looking at the structure of the example Google calls, you can enter several wifi MAC addresses, along with signal strength, and the API will presumably triangulate based on that information to give a more precise location.
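For reference, here's a sketch of what the JSON body of such a request might look like (the MAC addresses and signal strengths below are made up for illustration):

```python
# Illustrative JSON payload for a geolocation lookup: several visible
# access points, each with a (made up) BSSID and signal strength in dBm.
postjson = {
    "wifiAccessPoints": [
        {"macAddress": "00:25:9c:cf:1c:ac", "signalStrength": -43},
        {"macAddress": "00:25:9c:cf:1c:ad", "signalStrength": -55},
        {"macAddress": "28:cf:da:ba:be:10", "signalStrength": -70},
    ]
}
print(len(postjson["wifiAccessPoints"]))
```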

The geolocation API also finds locations from cell tower IDs.

So back to the idea of a simple student activity to sniff out the MAC addresses of wifi routers their computer can see from the workplace or home, and then look up the location using the Google geolocation API and pop it on a map.

Which is actually the sort of thing your browser will do when you turn on geolocation services:


But maybe when you run the commands yourself, it feels a little bit more creepy?

PS sort of very loosely related, eg in terms of trying to map spaces from signals in the surrounding aether, a technique for trying to map the insides of a room based on its audio signature in response to a click of the fingers:

PPS here’s a start to a Python script to grab the MAC addresses and do the geolocation calls

import subprocess
import requests

def getWifiMacAddresses():
    #autodetect platform and then report based on this?
    #Mac OS X:
    results = subprocess.check_output(["/System/Library/PrivateFrameworks/Apple80211.framework/Resources/airport", "-s"])
    #Windows (maybe): results = subprocess.check_output(["netsh", "wlan", "show", "networks", "mode=bssid"])
    #Linux (requires wireless-tools: apt-get -y install wireless-tools):
    #results = subprocess.check_output(["iwlist", "scanning"])
    results = results.decode("utf-8") #needed in Python 3
    macAddr = {}
    #Parse the airport -s output: columns are SSID, BSSID, RSSI, CHANNEL, ...
    #(Note: this simple split breaks if an SSID contains spaces.)
    for l in results.split("\n")[1:]:
        if l.strip() == '':
            continue
        ll = l.strip().split()
        macAddr[ll[0]] = (ll[1], ll[2])
    return macAddr

#For Mac:
hotspots = getWifiMacAddresses()
postjson = {'wifiAccessPoints': []}
for ssid in hotspots:
    addr, db = hotspots[ssid]
    postjson['wifiAccessPoints'].append({'macAddress': addr, 'signalStrength': int(db)})

#Google geolocation API endpoint - substitute your own API key
url = 'https://www.googleapis.com/geolocation/v1/geolocate?key=YOUR_API_KEY'
r = requests.post(url, json=postjson)
print(r.json())

A Loss of Sovereignty?

Over the course of the weekend, rummaging through old boxes of books as part of a loft clearout, I came across more than a few OU textbooks and course books. Way back when, OU course materials were largely distributed in the form of print items and hard media – audio and video cassettes, CD- and DVD-ROMs and so on. Copies of the course materials could be found in college and university libraries that acted as OU study centres, via the second hand market, or in some cases purchased from the OU via OU Worldwide.

Via an OU press release out today, I notice that “[c]ourse books from The Open University (OU) have been donated to an educational sponsorship charity in Kenya, giving old course books a new use for the local communities.” Good stuff…

..but it highlights an issue about the accessibility of our materials as they increasingly move to digital form. More and more courses deliver more and more content to students via the VLE. Students retain access to online course materials and course environments for a period of time after a module finishes, but open access is not available.

True, many courses now release some content onto OpenLearn, the OU’s free open learning platform. And the OU also offers courses on the FutureLearn platform (an Open University owned company that made some share allotments earlier this year).

But access to the electronic form is not tangible – the materials are not persistent, the course materials not tradeable. They can’t really be owned.

I’m reminded of a noticing I had earlier this week about our Now TV box that lets us watch BBC iPlayer, 4oD, YouTube and so on via the telly. The UI is based around a “My subscriptions” model which shows the channels (or apps) you subscribe to. Only, there are some channels in there that I didn’t subscribe to, and that – unlike the channels I did subscribe to – I can’t delete from my subscriptions. Sky – I’m looking at you. (Now TV is a Sky/BSkyB product.)

In a similar vein, Apple and U2 recently teamed up to dump a version of U2’s latest album into folks’ iTunes accounts, “giving away music before it can flop, in an effort to stay huge” as Iggy Pop put it in his John Peel Lecture [on BBC iPlayer], and demonstrating once again that our “personal” areas on these commercial services are no such thing. We do not have sovereignty over them. Apple is no Sir Gawain. We do not own the things that are in our collections on these services, and nor do we own the collection: I doubt you hold a database right in any collection you curate on YouTube or in iTunes, even if you do expend considerable time, effort and skill in putting that collection together; and I fully imagine that the value of those collections as databases is exploited by the recommendation engine mining tools the platform services operate.

And just as platform operators can add things to our collections, so too can they take them away. Take Amazon, for example, who complement their model of selling books with one of renting you limited access to ebooks via their Kindle platform. As history shows – Amazon wipes customer’s Kindle and deletes account with no explanation or The original Big Brother is watching you on Amazon Kindle – Amazon is often well within its rights, and it is well within its capacity, to remove books from your device whenever it likes.

In the same way that corporate IT can remotely manage “your” work devices using enterprise mobile device management (Blackberry: MDM and beyond, Google Apps: mobile management overview, Apple: iOS and the new IT, for example), so too can platform operators of devices – and services – reach into your devices – or service clients – and poke around inside them. Unless we’ve reclaimed it as our own, we’re all users of enterprise technology masked as consumer offerings, and we have ceded control over our services and devices to the providers of them.

The loss of sovereignty also extends to the way in which devices and services are packaged so that we can’t look inside them, need special tools to access them, can’t take ownership of them in order to appropriate them for other purposes. We are users in a pejorative sense; and we are used by service and platform providers as part of their business models.

More Digital Traces…

Via @wilm, I notice that it’s time again for someone (this time at the Wall Street Journal) to have written about the scariness that is your Google personal web history (the sort of thing you probably have to opt out of if you sign up for a new Google account, if other recent opt-in-by-default settings are anything to go by…)

It may not sound like much, but if you do have a Google account, and your web history collection is not disabled, you may find your emotional response to seeing months or years of your web/search history archived in one place surprising… Your Google web history.

Not mentioned in the WSJ article was some of the games that the Chrome browser gets up to. @tim_hunt tipped me off to a nice (if technically detailed, in places) review by Ilya Grigorik of some of the design features of the Chrome browser, and some of the tools built in to it: High Performance Networking in Chrome. I’ve got various pre-fetching tools switched off in my version of Chrome (tools that allow Chrome to pre-emptively look up web addresses and even download pages pre-emptively*) so those tools didn’t work for me… but looking at chrome://predictors/ was interesting to see what keystrokes I type are good predictors of web pages I visit…

chrome predictors

* By the by, I started to wonder whether webstats get messed up to any significant effect by Chrome pre-emptively prefetching pages that folk never actually look at…?

In further relation to the tracking of traffic we generate from our browsing habits, as we access more and more web/internet services through satellite TV boxes, smart TVs, and catchup TV boxes such as Roku or NowTV, have you ever wondered about how that activity is tracked? LG Smart TVs logging USB filenames and viewing info to LG servers describes not only how LG TVs appear to log the things you do view, but also the personal media you might view, and in principle can phone that information home (because the home for your data is a database run by whatever service you happen to be using – your data is midata is their data).

there is an option in the system settings called “Collection of watching info:” which is set ON by default. This setting requires the user to scroll down to see it and, unlike most other settings, contains no “balloon help” to describe what it does.

At this point, I decided to do some traffic analysis to see what was being sent. It turns out that viewing information appears to be being sent regardless of whether this option is set to On or Off.

you can clearly see that a unique device ID is transmitted, along with the Channel name … and a unique device ID.

This information appears to be sent back unencrypted and in the clear to LG every time you change channel, even if you have gone to the trouble of changing the setting above to switch collection of viewing information off.

It was at this point, I made an even more disturbing find within the packet data dumps. I noticed filenames were being posted to LG’s servers and that these filenames were ones stored on my external USB hard drive.

Hmmm… maybe it’s time I switched out my BT homehub for a proper hardware firewalled router with a good set of logging tools…?

PS FWIW, I can’t really get my head round how evil on the one hand, or damp squib on the other, the whole midata thing is turning out to be in the short term, and what sorts of involvement – and data – the partners have with the project. I did notice that a midata innovation lab report has just become available, though to you and me it’ll cost 1500 squidlly diddlies so I haven’t read it: The midata Innovation Opportunity. Note to self: has anyone got any good stories to say about TSB supporting innovation in micro-businesses…?

PPS And finally, something else from the Ilya Grigorik article:

The HTTP Archive project tracks how the web is built, and it can help us answer this question. Instead of crawling the web for the content, it periodically crawls the most popular sites to record and aggregate analytics on the number of used resources, content types, headers, and other metadata for each individual destination. The stats, as of January 2013, may surprise you. An average page, amongst the top 300,000 destinations on the web is:

– 1280 KB in size
– composed of 88 resources
– connects to 15+ distinct hosts

Let that sink in. Over 1 MB in size on average, composed of 88 resources such as images, JavaScript, and CSS, and delivered from 15 different own and third-party hosts. Further, each of these numbers has been steadily increasing over the past few years, and there are no signs of stopping. We are increasingly building larger and more ambitious web applications.

Is it any wonder that pages take so long to load on a mobile phone off the 3G network, and that you can soon eat up your monthly bandwidth allowance?
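As a rough back-of-the-envelope check, here's how quickly 1280 KB pages chew through a mobile data allowance (the allowance sizes are just illustrative assumptions):

```python
# How many average-sized (1280 KB) page loads fit into a monthly mobile
# data allowance? The allowance figures are illustrative assumptions.
page_kb = 1280
for allowance_mb in (250, 500, 1024):
    pages = (allowance_mb * 1024) // page_kb
    print(allowance_mb, "MB allowance ~", pages, "page loads")
```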

Mental Health and the Web – Hounded by Ads, Bullied by Retargeters?

Some thinkses not thought through…

– can web ads feed paranoia, particularly retargeted ads that follow you round the web?
– can personalised ads be upsetting, or appear bullying? For example, suppose you’re critically ill, or in debt, or you’ve been researching a sensitive personal or family matter, and then the ads start to hound you and remind you about it…

Is one take on privacy that it is not to be reminded about something by an external agency? (Is there a more memorable quote along those lines somewhere..?!)

Are personalised ads one of the more pernicious examples of a “filter bubble” effect?

I think I need a proper holiday, well away from the web!

Centralising User Tracking on the Web – Let Google Track Everyone For You

In The Curse of Our Time – Tracking, Tracking Everywhere, I noted how the likes of Google set up cookie matching services that allow advertisers to reconcile their cookies with Google’s:

The Cookie Matching Service enables a buyer to associate two kinds of data:

– the cookie that identifies a user within the buyer domain, and
– the cookie that identifies the user for Google. (We share a buyer-specific encrypted Google User ID for buyers to match on.)

The data structure that the buyer uses to maintain these associations is called a Match Table. While the buyer is responsible for building and maintaining match tables, Google can host them.

With an RTB [real-time bidding] application, the buyer can bid on impressions where the user has a specific Google User ID, and can use information associated with the Google User ID as criteria in a bid for an ad impression.
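As a toy illustration of the match table idea quoted above (all identifiers below are invented), the buyer's side might amount to little more than a lookup keyed on its own cookie:

```python
# Toy sketch of a buyer-side match table: the buyer's own cookie ID maps
# to the buyer-specific encrypted Google User ID returned at match time.
# All identifiers here are invented for illustration.
match_table = {
    "buyer-cookie-8f3a": "encrypted-google-uid-A",
    "buyer-cookie-91bc": "encrypted-google-uid-B",
}

def lookup_google_id(buyer_cookie):
    return match_table.get(buyer_cookie)

print(lookup_google_id("buyer-cookie-8f3a"))
```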

But it seems that this isn’t enough for Google. It actually gets worse… A USA Today story suggests Google is exploring the idea of an AdID, an identifier that it would share with advertisers to uniquely identify eyeballs rather than them having to use a range of alternative third party user tracking services.

How AdID would actually work (if indeed it ever comes to pass) is not explained, although a post on the AdExchanger blog – What The Google AdID Means For Ad Tech – comes up with a possible mechanism: Google uniquely identifies users (presumably using cookies and authenticated user credentials (topped up with a little bit of browser fingerprinting, I wonder?)) then provides an advertiser with a hashed version of the ID. The hashed identifier means advertisers can’t share information with each other.
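One way that hashing step might work (purely speculative; the keys and IDs below are invented) is a keyed hash per advertiser, so the same user yields unlinkable identifiers across advertisers:

```python
import hashlib
import hmac

# Speculative sketch of a per-advertiser hashed user ID: the same
# underlying user ID produces different, unlinkable identifiers for
# different advertisers. Keys and IDs are invented for illustration.
def hashed_ad_id(user_id, advertiser_key):
    return hmac.new(advertiser_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

uid = "google-user-12345"
id_for_a = hashed_ad_id(uid, b"advertiser-A-key")
id_for_b = hashed_ad_id(uid, b"advertiser-B-key")
print(id_for_a == id_for_b)  # False: advertisers can't match IDs with each other
```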

The Google AdID service seems like it would be offered as an alternative to tracking users via third party services that use their own third party cookies, with a user tracking system that offers more effective identity tracking techniques (such as a logged in Google user id). Which is to say, Google wants to replace third party cookie based tracking services with its own (logged in user + cookies + browser fingerprinting + etc etc) user tracking service? Or have I misinterpreted all this?

One AdID to rule them all, One AdID to find them,
One AdID to bring them all and in the darkness bind them
In the Land of Google where the Shadows lie.

PS by the by, I notice in a post on Author information in search results, that “If you want your authorship information to appear in search results for the content you create, you’ll need a Google+ Profile with a good, recognizable headshot as your profile photo. Then, verify authorship of your content by associating it with your profile using either of the methods below.” Ah ha… so you agree to give the Goog a good photo of yourself that it can use in its face matching algos, such as in the sort of thing that could be used to unlock your phone. Good. Not. Will faceID play a part in AdID, I wonder? With a gaze tracking feedback loop? Or maybe Google will be getting more into the tracked user, outside advertising market?

PPS This then also brings back to mind the face tracking approaches mentioned in The Curse of Our Time – Tracking, Tracking Everywhere.

PPPS By the by, it seems the Culture, Media and Sport Committee have no problem with online targeted ads:

85. Inevitably public funding is under pressure, a point illustrated by cuts in the budget of Arts Council England.[161] Given the essential role of public funding in sustaining the wider creative economy, it is crucial that adequate resources are available. Of course, the private sector should be encouraged as much as possible to invest in the creative industries. One good example is provided by advertising, which not only provides a major source of funding but is a creative industry in its own right. Evidence from the Advertising Association points to advertising as “a major creative industry and a critical source of funding for other creative industries”. The Advertising Association’s evidence goes on to express deep concern about draft EU Data Protection Regulation “which could damage direct marketing, internet advertising, and the UK economy both off and online”.[162] Increasing use is being made of personal data to target online advertising better. While concerns around this have prompted reviews of data protection legislation, we do not think the targeting of appropriate advertising—essential to so many business models —represents the greatest threat to privacy. [original emphasis]

Some Sketchnotes on a Few of My Concerns About #opendata

With my growing unease about just what the agenda driving open government/public data is, I think I’m going to have to find some time away to walk the dog lots, and mull over what pieces might be part of the jigsaw, as well as having a go at trying to put some of them together…

Near the top of the list is a concern about information asymmetry and how open data may be used by private concerns to provide a one-off advantage for them when it comes to poaching services from the public sector. How so? My gut reaction thinking is this: if, as part of the procurement process, the private sector can use open public data to help it secure a contract in competition with a public sector provider, then when contracts come to renewal the public sector may know less when it comes to bid than the private sector company was able to learn when it first tendered. The question here is: does open public data put private sector companies at an advantage when it comes to bidding for public service contracts against an incumbent public provider, compared to a public body bidding to recapture a service from an incumbent private provider, given that the private provider may not be required to open up information (for example, through FOI requests, transparency or public reporting obligations) in the same way that a public body is?

Another take on a similar theme is the extent to which there may be a loss of transparency when a service goes from a public to a private provider. If we think there is some benefit to be had from transparency in general terms, then private providers of public services should have the same openness requirements placed on them as the public body. If private companies can claim revealing information is against their commercial interest, can public bodies make the same claims on exactly the same terms under FOI exemption rules (eg MoJ Freedom of information guidance: Exemptions guidance – Section 43: Commercial interests)?

Taking the NHS as a case example, here are a few things on my reading list:

  • Monitor report from March 2013 on A fair playing field for the benefit of NHS patients [actual report]. For example, the report identified the following distortions:

    1. Participation distortions. Some providers are directly or indirectly excluded from offering their services to NHS patients for reasons other than quality or efficiency. Restrictions on participation disadvantage providers seeking to expand into new services or new areas, regardless of whether the providers are public, charitable or private. Participation distortions disadvantage nonincumbent providers of every type.
    2. Cost distortions. Some types of provider face externally imposed costs that do not fall on other providers. On balance, cost distortions mostly disadvantage charitable and private health care providers compared to public providers.
    3. Flexibility distortions. Some providers’ ability to adapt their services to the changing needs of patients and commissioners is constrained by factors outside their control. These flexibility distortions mostly disadvantage public sector providers compared to other types.

    I’m not sure to what extent, if any, the report reviews distortions and asymmetries arising from open data issues.
    A search of the report for mentions of FOI turns up:

    Provider transparency
    Historically, public providers have faced higher levels of scrutiny than other providers, including requests for information under the Freedom of Information Act. This degree of scrutiny can improve accountability to patients and promote good practice. Freedom of Information requirements have been extended through the standard NHS contract to private and charitable providers. However, it is not clear that this is operating effectively as yet, and other aspects of transparency do not apply across all types of provider.
    29. The Government and commissioners should ensure that transparency, including Freedom of Information requirements, is implemented across all types of provider of NHS services on a consistent basis.

    As I said, it’s on the reading list…

  • A terrifying post on the Computer Weekly/Public Sector IT blog – NHS watchdog commandeers data in bid to stimulate privatization and an earlier one on the naive take on hospital mortality data: Data regime makes merciless start on NHS privatization. Are there any reports or strategy documents from the Care Quality Commission (CQC) I need to add to my reading list?
  • Something academic… such as this piece from the Proceedings of the 21st European Conference on Information Systems on The Generative Mechanisms of Open Government Data, much of which I suspect is summarised by these two figures taken (without permission) from the paper:

    generative mechanisms open gov data


  • Opening up data (particularly data held by public bodies) around private companies is another area I can’t quite get my head round, particularly when it comes to comparing information about the machinations of private companies as compared to public bodies. To what extent should data about public and limited liability companies that is held by public bodies be made openly available, for example? Maybe related to this is a currently open BIS consultation: Company ownership: transparency and trust discussion paper, as well as the HMRC consultation on Sharing and publishing data for public benefit (press release) that I linked to from yesterday’s Rambling Round-Up of Some Recent #OpenData Notices. (OpenCorporates’ Chris Taggart posts some interesting thoughts on the sheen given to the proposed release of VAT registration data to credit agencies that the consultation is in part based around: Open tax data, or just VAT ‘open wash’.) A recent edition of File on 4 (h/t @onthewight) on charity based tax fraud – Faith, Hope and… Tax Avoidance – also got me wondering further about what information is openly available about charities’ activities (eg A Quick Peek at Some Charities Data…)?
  • One to dig for… via Lexology, a post on Freedom of information in the private sector? which claims that “The Confederation of British Industry (“CBI“) has revealed that it is developing ‘transparency guidelines’ that will apply to private companies that provide services to the NHS.” Have these appeared yet, even in draft or consultation form?

A few other things on my to-do list in this area: map out the lobbyists and board/panel members around open data; use disclosure logs to search for companies putting in FOI requests in different sectors; see who’s pitching ideas in to ODUG; map out who’s funding NGOs and activities in the opendata space.

Sigh… no time…;-)

PS Not sure if there is a full paper version of this…? Bates, J. 2013, Information policy in the crises of neoliberalism: the case of Open Government Data in the UK at International Association of Media and Communications Researchers Conference, Dublin, June 2013: “Whilst open data releases by the UK government have received substantial support within UK civil society, often being interpreted as a creative and innovative response to a range of social issues, and, for some, a radical challenge to key components of neoliberal capitalism, this paper argues that deeper analysis of the OGD initiative suggests that it is being shaped by the UK government and corporate interests in an attempt to leverage a distinctly neoliberal agenda. The adoption and development of the OGD agenda as core to the policy response adopted by the UK Government to conditions of political economic crisis, suggests that information policy is being implemented as a key, yet often opaque, element of the neoliberal policy toolbox.” See also an earlier paper, “This is what modern deregulation looks like”: Co-optation and contestation in the shaping of the UK’s Open Government Data Initiative (“whilst OGD [Open Government Data] might potentially support modes of transparent and democratic governance, the current ‘transparency agenda’ should be recognised as an initiative that also aims to enable the marketisation of public services, and this is something that is not readily apparent to the general observer.”) and a statement of Jo Bates’ current research project in the area: The politics of Open Government Data in the UK

PPS Supporting the idea of symmetry in reporting between public services and private companies delivering public services, Richard Murphy on Making public services accountable. And some excellent writings critiquing computational thinking and the teaching of code by Ben Williamson.

So What Counts as “Communications Data”?

Picking up on a post by @nevali (Communications Data) that looks at the layered structure of internet based communications in general and a peek inside an SMTP session in particular, I idly wondered about the structure of a tweet and what, exactly, might count as the communications data part of it, as defined by the draft Communications Data Bill:

To what extent can we make a fair comparison with something like the “communications data” associated with this sort of transaction?

(89/365) One day this will be extinct

Or how about a postcard?

See also: From Communications Data to #midata – with a Mobile Phone Data Example

PS via @smithsam, and in a similar light, a consideration of the anatomy of a Facebook message

PPS Given part of the #midata focus on transaction data, I’ve also started wondering about the extent to which financial transactions count as communications, and how different payment mechanisms might change the nature of the transaction. For example, two people meeting face-to-face engaging in a cash transaction, versus a purchase made via an online form using a credit card.

PPPS inspired by the anatomy of a Facebook message, I just posted a tweet via the Twitter web interface to see what the traffic looked like. It was an HTTP POST that included the following:

Request URL:
Request Method:POST
Status Code:200 OK
Request Payload
Response Headers
HTTP/1.1 200 OK
status: 200 OK
version: HTTP/1.1
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-encoding: gzip
content-length: 880
content-type: application/json; charset=utf-8
date: Thu, 23 Aug 2012 09:32:08 GMT

It also got a response, which looks a lot like the data around a particular status update. Presumably the response to an update message is a set of data describing the accepted status update?

{"in_reply_to_status_id_str":null, "id_str":"238569301735526400", "contributors":null, "truncated":false, "created_at":"Thu Aug 23 09:32:08 +0000 2012", "in_reply_to_user_id":64672382, "in_reply_to_user_id_str":"64672382", "in_reply_to_screen_name":"ousefulAPI", "user":{"id":7129072, "url":"http:\/\/", "profile_use_background_image":true, "verified":false, "profile_text_color":"000000", "contributors_enabled":false, "created_at":"Thu Jun 28 11:37:39 +0000 2007", "profile_image_url_https":"https:\/\/\/profile_images\/1195013164\/Picture_23_normal.png", "profile_image_url":"http:\/\/\/profile_images\/1195013164\/Picture_23_normal.png", "statuses_count":32203,"utc_offset":0, "profile_background_image_url_https":"https:\/\/\/profile_background_images\/2508031\/rss_globe.png", "profile_sidebar_border_color":"87BC44", "default_profile":false, "show_all_inline_media":false, "name":"Tony Hirst", "friends_count":742, "location":"UK","id_str":"7129072", "profile_background_tile":true, "protected":false, "profile_sidebar_fill_color":"E0FF92", "geo_enabled":false, "listed_count":423, "follow_request_sent":false, "lang":"en", "description":"OU lecturer, mashup artist; Isle of WIght resident and #f1datajunkie", "profile_background_color":"9AE4E8", "screen_name":"psychemedia", "is_translator":false, "time_zone":"London", "notifications":false, "profile_background_image_url":"http:\/\/\/profile_background_images\/2508031\/rss_globe.png", "default_profile_image":false, "profile_link_color":"0000FF", "favourites_count":377, "following":false,"followers_count":3905},"retweeted":false, "coordinates":null, "in_reply_to_status_id":null, "geo":null, "source":"web", "entities":{"user_mentions":[{"name":"OUseful", "screen_name":"ousefulAPI", "id_str":"64672382","indices":[0,11],"id":64672382}], "hashtags":[], "urls":[]},"id":238569301735526400,"place":null, "retweet_count":0, "favorited":false, "text":"@ousefulapi wondering what data the twitter web client sends when i post a tweet"}

What you’ll notice is that whilst the update as sent was just a message string, the response identifies the sender (along with biographical data, possibly geo data, possibly a link to a photo, and a real name). It also identifies the person to whom the tweet was sent (a Twitter convention is that tweets starting with @… are in some sense sent to @…*), and (via user_mentions) explicitly identifies any other individuals mentioned within the body of the tweet as part of the content of the message. If the tweet began @foo @bar …, whilst @foo would be identified as some sort of addressee, @bar wouldn’t, although it would be identified as a user_mention**. However, we might assume that the tweet was addressed in some sense to both @foo and @bar, whereas “@foo Will chat to @bar later” only mentions @bar as content… And “@foo @bar said that too, I think”, whilst clunky, could be interpreted as mentioning @bar as content rather than as a suggested addressee (eg in the sense of “@foo I think @bar said that too”).

* the tweet will only appear in the timeline of the person it is sent to (and if they follow you?), although it is still public. Many clients also display “user_mentions” tweets as a timeline, so if your Twitter username appears anywhere in the body of a tweet, you should see the tweet, even if you don’t follow the person who sent it.

** If the tweet starts with another character, eg “.@foo”, then @foo is no longer an addressee in the sense of in_reply_to. From a communications data point of view, what’s fair game as far as communications data goes?
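The addressee-versus-mention convention discussed above can be caricatured in a few lines (a deliberate simplification: it only handles the basic leading-@ case, not the “.@name” trick):

```python
# Crude sketch of the convention discussed above: only a tweet-initial
# @name counts as an addressee; every @name in the text is a mention.
# This deliberately ignores edge cases such as the ".@name" trick.
def classify(tweet):
    words = tweet.split()
    addressee = words[0][1:] if words and words[0].startswith("@") else None
    mentions = [w[1:].rstrip(".,!?") for w in words if w.startswith("@")]
    return addressee, mentions

print(classify("@foo Will chat to @bar later"))  # ('foo', ['foo', 'bar'])
```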

Because the update is sent via https, I don’t think you could argue the update was posted as a plaintext postcard? In the postal mail system, how does the law distinguish between messages placed inside an intercepted closed envelope and messages written on an intercepted postcard?

(Hmm – what’s the traffic associated with a Twitter DM, I wonder?)

From Communications Data to #midata – with a Mobile Phone Data Example

A BIS Press Release (Next steps making midata a reality) seems to have resulted in folk tweeting today about the #midata consultation that was announced last month. If you haven’t been keeping up, #midata is the policy initiative around getting companies to make “[consumer data] that may be actionable and useful in making a decision or in the course of a specific activity” (whatever that means) available to users in a machine readable form. To try to help clarify matters, several vignettes are described in this July 2012 report – Example applications of the midata programme – which plays the role of a ‘draft for discussion’ at the September midata Strategy Board [link?]. Here’s a quick summary of some of them:

  • form filling: a personal datastore will help you pre-populate forms and provide certified evidence of things like: proof of her citizenship, qualified to drive, passed certain exams and achieved certain qualifications, passed a CRB check, and so on. (Note: I’ve previously tried to argue the case for the OU starting to develop a service (OU Qualification Verification Service) around delivering verified tokens relating to the award of OU degrees, and degrees awarded by the polytechnics, as was (courtesy of the OU’s CNAA Aftercare Service), but after an initial flurry of interest, it was passed on. midata could bring it back, maybe?)
  • home moving admin: change your details in a personal “mydata” data store, and let everyone pick up the changes from there. Just think what fun you could have with an attack on this;-)
  • contracts and warranties dashboard: did my crApple computer die the week before or after the guarantee ran out?
  • keeping track of the housekeeping: bank and financial statement data management and reporting tools. I thought there already was software for doing this? do we use it though? I’d rather my bank improved the tools it provided me with?
  • keeping up with the Jones’s: how does my house’s energy consumption compare with that of my neighbours?
  • which phone? Pick a tariff automatically based on your actual phone usage. From going through this recently, the problem is not with knowing how I use my phone (easy enough to find out), it’s with navigating the mobile phone sites trying to understand their offers. (And why can’t Vodafone send me an SMS to say I’m 10 minutes away from using up this month’s minutes, rather than letting me go over? The midata answer might be an agent that looks at my usage info and tells me when I’m getting close to my limit, which requires me having access to my contract details in a machine readable form, I guess?)

And here’s a BIS blog post summarising them: A midata future: 10 ways it could shape your choices.

(The #midata policy seems based on a belief that users want better access to data so they can do things with it. I’m not convinced – why should I have to export my bank data to another service (increasing the number of services I must trust) rather than my bank providing me with useful tools directly? I guess one way this might play out is that any data that does dribble out may get built around by developers who then sell the tools back to the data providers so they can offer them directly? In this context, I guess I should read the BIS commissioned Jigsaw Research report: Potential consumer demand for midata.)

Today has also seen a minor flurry of chat around the call for evidence on the Communications Data Bill, presumably because the closing date for responses is tomorrow (draft Communications Data Bill). (Related reading: latest Annual Report of the Interception of Communications Commissioner.) Again, if you haven’t been keeping up, the draft Communications Data Bill describes communications data in the following terms:

  • Communications data is information about a communication; it can include the details of the time, duration, originator and recipient of a communication; but not the content of the communication itself
  • Communications data falls into three categories: subscriber data; use data; and traffic data.

The categories are further defined in an annex:

  • Subscriber Data – Subscriber data is information held or obtained by a provider in relation to persons to whom the service is provided by that provider. Those persons will include people who are subscribers to a communications service without necessarily using that service and persons who use a communications service without necessarily subscribing to it. Examples of subscriber information include:
    – ‘Subscriber checks’ (also known as ‘reverse look ups’) such as “who is the subscriber of phone number 012 345 6789?”, “who is the account holder of e-mail account” or “who is entitled to post to web space”;
    – Subscribers’ or account holders’ account information, including names and addresses for installation, and billing including payment method(s), details of payments;
    – information about the connection, disconnection and reconnection of services which the subscriber or account holder is allocated or has subscribed to (or may have subscribed to) including conference calling, call messaging, call waiting and call barring telecommunications services;
    – information about the provision to a subscriber or account holder of forwarding/redirection services;
    – information about apparatus used by, or made available to, the subscriber or account holder, including the manufacturer, model, serial numbers and apparatus codes.
    – information provided by a subscriber or account holder to a provider, such as demographic information or sign-up data (to the extent that information, such as a password, giving access to the content of any stored communications is not disclosed).
  • Use data – Use data is information about the use made by any person of a postal or telecommunications service. Examples of use data may include:
    – itemised telephone call records (numbers called);
    – itemised records of connections to internet services;
    – itemised timing and duration of service usage (calls and/or connections);
    – information about amounts of data downloaded and/or uploaded;
    – information about the use made of services which the user is allocated or has subscribed to (or may have subscribed to) including conference calling, call messaging, call waiting and call barring telecommunications services;
    – information about the use of forwarding/redirection services;
    – information about selection of preferential numbers or discount calls;
  • Traffic Data – Traffic data is data that is comprised in or attached to a communication for the purpose of transmitting the communication. Examples of traffic data may include:
    – information tracing the origin or destination of a communication that is in transmission;
    – information identifying the location of equipment when a communication is or has been made or received (such as the location of a mobile phone);
    – information identifying the sender and recipient (including copy recipients) of a communication from data comprised in or attached to the communication;
    – routing information identifying equipment through which a communication is or has been transmitted (for example, dynamic IP address allocation, file transfer logs and e-mail headers – to the extent that content of a communication, such as the subject line of an e-mail, is not disclosed);
    – anything, such as addresses or markings, written on the outside of a postal item (such as a letter, packet or parcel) that is in transmission;
    – online tracking of communications (including postal items and parcels).

To put the communications data thing into context, here’s something you could try for yourself if you have a smartphone. Using something like the SMS to Text app (if you trust it!), grab your txt data from your phone and try charting it: SMS analysis (coming from an Android smartphone or an iPhone). And now ask yourself: what if I also mapped my location data, as collected by my phone? And will this sort of thing be available as midata, or will I have to collect it myself using a location tracking app if I want access to it? (There’s an asymmetry here: the company potentially collecting the data, or me collecting the data…)
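To get a feel for the sort of activity described above, here’s a rough sketch of a first-pass SMS analysis, assuming the export app produces a CSV with date and number columns (the column names are hypothetical — whatever app you use will have its own):

```python
import csv
from collections import Counter
from datetime import datetime

def sms_counts(csv_path):
    """Tally texts by hour of day and by phone number, from a CSV export
    with (assumed) "date" and "number" columns."""
    by_hour = Counter()
    by_number = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            ts = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S")
            by_hour[ts.hour] += 1      # when do I text?
            by_number[row["number"]] += 1  # who do I text?
    return by_hour, by_number
```

Even this crude tally is, in effect, “use data” in the Bill’s terms — itemised records and timings — generated by you, about you, from your own phone.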

It’s also worth bearing in mind that even if access to your data is locked down, access to the data of people associated with you might reveal quite a lot of information about you, including your location, as Adam Sadilek et al. describe: Finding Your Friends and Following Them to Where You Are (see also Far Out: Predicting Long-Term Human Mobility). My own tinkerings with emergent social positioning (looking at who the followers of particular twitter users also follow en masse) also suggest we can generate indicators about potential interests of a user by looking at the interests of their followers… Even if you’re careful about who your friends are, your followers might still reveal something about you you have tried not to disclose yourself (such as your birthday…). (That’s one of the problems with asymmetric trust models! Hmmm… could be interesting to start trying to model some of this… )
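The core of the emergent social positioning trick is a simple count. A sketch, using toy data (in practice the follower lists would come from the Twitter API):

```python
from collections import Counter

def commonly_followed(target_followers, follows, top=5):
    """Given a target user's followers and a map from each follower to the
    set of accounts they follow, count which accounts the followers also
    follow en masse - a crude signal of the target's likely interests."""
    counts = Counter()
    for follower in target_followers:
        counts.update(follows.get(follower, set()))
    return counts.most_common(top)
```

Note that the target user discloses nothing here: the signal is built entirely from the behaviour of the people who chose to follow them, which is the asymmetric-trust problem in a nutshell.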

Both of these consultations provide a context for reflecting on the extent to which companies use data for their own processing purposes (for a recent review, see What happens to my data? A novel approach to informing users of data processing practices), the extent to which they share this data in raw and processed form with other companies or law enforcement agencies, the extent to which they may use it to underwrite value-added/data-powered services to users directly or when combined with data from other sources, the extent to which they may be willing to share it in raw or processed form back with users, and the extent to which users may then be willing (or licensed) to share that data with other providers, and/or combine it with data from other providers.

One of the biggest risks from a “what might they learn about me” point of view – as well as some of the biggest potential benefits – comes from the reconciliation of data from multiple different sources. Mosaic theory is an idea taken from the intelligence community that captures the notion that when data from multiple sources is combined, the value of the whole view may be greater than the sum of the parts. When privacy concerns are idly raised as a reason against the release of data, it is often suspicion and fears around what a data mosaic picture might reveal that act as drivers of those concerns. (Similar fears are also used as a reason against the release of data, for example under Freedom of Information requests, in case a mosaic results in a picture that can be used against national interests: eg D.E. Pozen, The Mosaic Theory, National Security, and the Freedom of Information Act and M.P. Goodwin, A National Security Puzzle: Mosaic Theory and the First Amendment Right of Access in the Federal Courts.)

Note that within a particular dataset, we might also appeal to mosaic theory thinking; for example, might we learn different things when we observe individual data records as singletons, as opposed to a set of data (and the structures and patterns it contains) as a single thing: GPS Tracking and a ‘Mosaic Theory’ of Government Searches. And as a consequence, might we want to treat individual data records, and complete datasets, differently?

PS via this ORG post – Consulympics: opportunities to have your say on tech policies – which details a whole raft of currently open ICT related consultations in the UK, I am reminded of this ICO Consultation on the draft Anonymisation code of practice, along with a draft of the anonymisation code itself.

News Corp in K12 Education Play

One of the sections of the OU’s new Innovating Pedagogy report (the first in what is intended to be an ongoing review series) refers to Publisher-led mini courses, a consideration of how news publishers may encroach on or enter the informal HE/lifelong learning or CPD markets through partnerships with HEIs or otherwise.

Whilst the OU’s business interests – and hence the focus of the report – are not in primary or secondary (K12) education (aside from teacher training considerations), today I notice that News Corp has announced an entrance into the K12 market: “News Corp unveils ‘Amplify’ to bring digital innovation to K12 education”:

    Today, News Corporation unveiled the brand and business of its Education Division. Amplify is dedicated to reimagining K-12 education by creating digital products and services that empower students, teachers and parents in new ways. Amplify will enhance the potential of students with new curricular experiences, support teachers with new instructional tools and engage parents through extended learning opportunities. Amplify will introduce these unique and pioneering offerings in collaboration with AT&T.

Amplify appears to be offering a tablet based play to compete with the K12 textbook market, offering rich interactive content with value-adding learning analytics. Learning analytics and formative assessment are provided by another Amplify (i.e. News Corp) company, Wireless Generation (News Corp acquired a 90% stake in Wireless Generation in November 2010: FT: News Corp ‘bet’ on education sector). By the by, it seems Wireless Generation has itself been on the acquisition trail recently: Wireless Generation Buys Assessment Company Intel-Assess.

There’s not a lot of substance on the Amplify site yet, so rather than rehash it here, I suggest you poke around the site yourself and see what jumps out (feel free to mention anything interesting you find in the comments;-) If that seems like too much hard work, try this report from GigaOm: How will News Corps’ new ed tech business ‘Amplify’ education?.

From a quick dig around, though, Amplify appears to be focussing on delivery rather than credentialed assessment. I wondered briefly if that might be because it could introduce a conflict of interest if the company provided both content and assessment services, but presumably not, as the OU’s Innovating Pedagogy report noted:

    [I]n the UK Pearson operate EdExcel for the assessment of GCSE, GCE (A-level) and BTEC/vocational qualifications. Pearson has recently bought vocational trainers Education Development International and assessment and testing providers Centiport. If education is ripe for disruption, it may be that the assessment of training and the offering of examination services at higher levels of education will provide a route by which publishers can develop credibility in the assessment and award of an ever wider range of qualification products based around their content offerings.

A couple of other things strike me about the announcement, and that I should really try to ponder further: the extent to which the economics of education are influenced by the content business (maybe Andy Lane will chip in with a comment about the business model of school education..?;-); and the rate at which performance tracking and learning analytics style approaches are going to focus attention on reporting dashboards rather than on human teacher–pupil relationships.

I also wonder about the role of plurality in all this. In the UK, the notion of media plurality “helps to support a democratic society by ensuring citizens are informed by a diverse range of views and by preventing too much influence over political processes by one media owner or outlet”, or at least, that’s what a June 2012 press release announcing an Ofcom report on measuring media plurality claims. But how about in education? How about in education where the publishers who control either – or both – the content and the means of certified assessment are also news publishers? How far should the notion of plurality extend then? Across all content businesses, including education?

And So It Begins… The Disinteroperability of the Web (Or Just a Harmless Bug…?)

When does a keyboard shortcut *not* do the same thing as the menu command it shortcuts? When it’s a Google docs copy command in Google Chrome, maybe?

I know that I, and I suspect many of the readers of this blog, use keyboard shortcuts unconsciously, intuitively, on a regular basis: ctrl/cmd-f for within page search, -c for copy, -x for cut, and -v for paste. But I also suspect that keyboard shortcuts are alien to many, and that a more likely route to these everyday operations is through the file menu:

or (more unlikely?) via a right-click contextual pop-up menu:

As keyboard shortcut users, we assume that the keyboard shortcuts and the menu based operations do the same thing. But whether a bug or not, I noticed today in the course of using Google docs in Google Chrome that when I tried to copy a highlighted text selection using either the file menu Copy option, or the contextual menu copy option, I was presented with this:

(The -c route to copying still worked fine.)

With Chrome well on its way to becoming the world’s most popular browser, allowing Google to dominate not just our searchable view over the web, but also to intermediate our direct connection to the web through the desktop client we use to gain access to it, this makes me twitchy… Firstly, because it suggests that whilst the keyboard shortcut is still routing copied content via my clipboard, the menu option is routing it through the browser, or maybe even the cloud where an online connection is present? Secondly, because in prompting me to extend my browser, I realised I have no real idea of what sorts of updates Google is regularly pushing to me through Chrome’s silent updating behaviour (I’m on version 19.0.1084.46 at the moment, it seems…).

A lot of Google’s activities are driven by technical decisions based on good technical reasons for “improving” how applications work and interoperate with each other. But it seems to me that Google is also closing in on itself, and potentially adopting technical solutions that either break interoperability or include a Google subsystem or process at every step (introducing an alternative de facto operating system onto our desktop by a thousand tiny updates and extensions). So for example, whilst I haven’t installed the Chrome copy extension, I wonder if I had: would a menu based copy from a Google doc allow me to then paste the content into a Word doc running as a Microsoft Office desktop application, or paste it into my online WordPress editor? And if so, would Chrome be caching that copied content via the extension?

Maybe this is something and nothing. Maybe I’m just confused about how the cut-and-paste thing works at all. Or maybe Google is starting to overstep its mark and is opening up an attack on host operating system functions from its installed browser base. Which, as the soon-to-be most popular browser in the world, is not a bad beachhead to have…

PS At least Google Public DNS isn’t forced onto Chrome users as the default way of resolving an entered domain name or clicked-on link to the IP address that actually connects the browser to the website…