Data Sharing is Good, Right? Or is HM Gov Evil?

I made a couple of soft resolutions to myself at the start of this year, one of which was to try to take more interest in policy matters, particularly in areas that impact upon the web and “information”. But I suspect that getting my head round the implications of proposed new legislation is going to be non-trivial.

For example, the MySpace generation believes that sharing personal information in public is the thing you do, right? But what about when government agencies can freely share your personal data between themselves?

Because Her Majesty’s Government also seems to think the naive MySpace generation way… A couple of days ago, the Coroners and Justice Bill was introduced to Parliament containing a proposed amendment to the Data Protection Act:

152 Information sharing
After section 50 of the Data Protection Act 1998 (c.29) insert—
“PART 5A INFORMATION SHARING
50A Power to enable information sharing
(1) Subject to the following provisions of this Part, a designated authority may by order (an “information-sharing order”) enable any person to share information which consists of or includes personal data.

(3) For the purposes of this Part a person shares information if the person—
(a) discloses the information by transmission, dissemination or otherwise making it available, or
(b) consults or uses the information for a purpose other than the purpose for which the information was obtained.”

I’m not sure what this might mean in practice (if you can think of any scenarios, please post them as a comment), so I’ll try to keep an ear out for what examples are given by Members when the Bill goes through its readings.

It seems, though, that there are “Explanatory notes” that explain the intention behind some of the proposals: Explanatory notes (Clause 152: Information Sharing):

691. Section 50A(1) creates an order-making power to enable a person to share information that consists of, or includes, personal data. The power is conferred on a designated authority. “Designated authority” is defined in new section 50A(2) as the Scottish Ministers, the Welsh Ministers, a Northern Ireland Department or an appropriate Minister. Section 50C determines when each of these designated authorities is entitled to make an order. An order under section 50A is known as an information-sharing order.

692. New section 50A(3) sets out the definition of data-sharing for the purposes of this section. Sharing in this section includes both the disclosure of data between two or more persons (such as when one company provides its client list to another company for commercial purposes), as well as where a single person uses some data for a purpose other than that which it was obtained for (for example where a Government Department obtains information for the purposes of exercising one particular statutory function such as the collection of tax but then later wishes to use the same information for another statutory function such as the provision of benefits and credits).

Here’s a bit more from the introduced Bill itself:

50B Information-sharing orders: supplementary provision
(1) An information-sharing order may—
(a) confer powers on the person in respect of whom it is made;
(b) remove or modify any prohibition or restriction imposed (whether by virtue of an enactment or otherwise) on the sharing of the information by that person or on further or onward disclosure of the information;
(c) confer powers on any person to enable further or onward disclosure of the information;
(d) prohibit or restrict further or onward disclosure of the information;
(e) impose conditions on the sharing of information;
(f) provide for a person to exercise a discretion in dealing with any matter;
(g) enable information to be shared by, or disclosed to, the designated authority;
(h) modify any enactment.

Now I’m not a lawyer, and I don’t speak Legislation, but what do paragraphs b and c mean exactly? In “real terms”? And how do they operate differently to g? Read them again… go on… read them…

(And “confer powers”? WTF? Like Heroes, or something?! Heh, heh… but seriously, folks, how far does that “powers” word unpack…?)

And some more:
(1) An information-sharing order may—
(b) remove or modify any prohibition or restriction imposed (whether by virtue of an enactment or otherwise) on the sharing of the information by that person or on further or onward disclosure of the information;
(c) confer powers on any person to enable further or onward disclosure of the information;

(g) enable information to be shared by, or disclosed to, the designated authority;

Paragraph f looks to me (but what do I know?) like a get-out/escape clause, if “exercise a discretion” means “I’ll do what the f**k I want” (which is how I’d naively translate it). Maybe a lawyer could correct my interpretation (“I’ll do what the f**k I want if I can bully or bluster you into believing I had a good reason at the time”, maybe?)

Here are the “explanatory notes”:

697. New section 50B of the 1998 Act makes supplemental provisions in relation to the information-sharing order powers in section 50A, and includes a non-exhaustive list of the kinds of provisions that may be included in an order under section 50A. New section 50B(1) provides that an order may remove or modify any legal barrier to information-sharing. This could be by repealing or amending other primary legislation, changing any other rule of law (for example, the application of the common law of confidentiality to defined circumstances), or creating a new power to share information where that power is currently absent. This section also provides for the conferral of powers on particular persons, the imposition of prohibitions and restrictions upon disclosure or sharing, and the provisions of a power to allow persons to exercise a discretion in dealing with such matters.

When popular websites change their terms and conditions, it sometimes hits the blogosphere. But we have to remember that the web isn’t for everyone, and that most people don’t use the web as aggressively as do most OUseful.info readers. But if you live in the UK, the above “terms and conditions” could well apply to you one day soon. So maybe we should be taking more of an interest? (Or maybe y’all do, and it’s just me who’s late to the party…?)

One way forward might be if we encourage the growth of initiatives like the following (reviewed here) by making use of the resources provided and adding our own commentary/contribution to the discussion, as well as airing these matters in a wider forum than the Westminster village?

That leads to a page of White Paper Resources for Media and Bloggers: resources that we bloggers can use to build a post around (remembering, of course, who actually produced the resources…), or at least that will act as a starting point for developing an understanding of what the government thinks it’s trying to achieve with yet more legislation. (See also: New Opportunities – Government 2.0 sites.)

(Remember also the reach out into the blogosphere by the debate on the future of Higher Education? We all got involved, right? Err….)

Anyway, anyway, if Tesco Clubcard’s T’s and C’s included the “data-sharing” provisions outlined above, would you be any more or less concerned?

Nah, probably not; who cares anyway…?!

PS if you do want to follow the Bill along, you can track its progress here:
Coroners and Justice Bill 2008-09: Progress of Bill including links to debates

I didn’t notice an RSS feed, though? If I get a chance, I’ll try to put one together over the weekend… (although, arguably, doing it now would be more useful – and more timely – than revising online course materials (for a deadline I’ve already missed) that go live to students for the first time in, err, March 2010, I believe?!;-)

Keeping Your Facebook Updates Private

So it seems as if Facebook is trying to encourage everyone to open up a little, and just share… Ah, bless… I suppose it is getting near to Christmas, after all…

So if you don’t want the world and Google to know everything you’re posting about on Facebook, and you are quite happy with privacy settings as they currently are, thank you very much, here’s what I (think) you need to do… Continue to the next step and change the settings from Everyone:

to Old Settings:

When you hover over the Old Settings radio button, a tooltip should pop up telling you what your current settings are. If anything looks odd, make a note of it so that you can change the setting later.

If you think you’d like to make things available to Everyone, bear in mind the following:

Information you choose to share with Everyone is available to everyone on the internet.

And when you install an application:

When you visit a Facebook-enhanced application, it will be able to access your publicly available information, which includes Name, Profile Photo, Gender, Current City, Networks, Friend List, and Pages. This information is considered visible to Everyone.

To save the settings, click to do exactly what it says on the button:

If, whilst changing the settings, you noticed that an Old Settings tooltip suggested that your current privacy settings were different to what you thought they were, you’ll need to go into the Privacy Settings panel, which you can find from the Settings menu on the toolbar at the top of each Facebook page:

Looking at the actual privacy settings page, there are several menu options that lead to yet more menu options and then screenfuls of different settings…

When I have a spare 2-3 hours, I’ll try to post a summary of them… (unless anyone already knows of a good tutorial on “managing your Facebook privacy settings”?) For now, though, I’m afraid you’re on your own trying to track down the setting you disagreed with so that you can change it to a setting you do want to have…

Rant About URL Shorteners…

It’s late, I’m tired, and I have a 5am start… but I’ve confused several people just now with a series of loosely connected idle ranty tweets, so here’s the situation:

– I’m building a simple app that looks at URLs tweeted recently on a twitter list;
– lots of the URLs are shortened;
– some of the shortened URLs are shortened with different services but point to the same target/destination/long URL;
– all I want to do – hah! ALL I want to do – is call a simple webservice example.com/service?short2long=shorturl that will return the long url given the short URL;
– I have two half solutions at the moment; the first is using Python to call the URL (urllib.urlopen(shorturl)), then use .geturl() on the returned object to look up the URL of the page that was actually returned; then I use Beautiful Soup to try and grab the <title> element for the page so I can display the page title as well as the long (actual) URL; BUT – sometimes the urllib call appears to hang (and I can’t see how to set a timeout/force an exception), and sometimes the page is so tatty Beautiful Soup borks on the scrape;
– my alternative solution is to call YQL with something like select head.title from html where url="http://icio.us/5evqbm" and xpath="//html" (h/t to @codepo8 for pointing out the xpath argument); if there’s a redirect, the YQL diagnostics info gives the redirect URL. But for some services, like the Yahoo-owned delicious/icio.us shortener, the robots.txt file presumably tells the well-behaved YQL to f**k off, because 3rd party resolution is not allowed.
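For what it’s worth, here’s a minimal sketch of the first half solution with both problems patched, assuming Python 3 rather than the Python 2 urllib.urlopen used above. In Python 3, urllib.request.urlopen() takes a timeout argument, so a hanging shortener raises an exception instead of stalling the app. I’ve also swapped Beautiful Soup for the standard library’s html.parser, which shrugs off unclosed tags. The resolve() and extract_title() names are my own, and reading only the first 64KB of the page is an assumption that the <title> sits near the top.

```python
import http.client
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element; html.parser does not
    choke on unclosed tags, so tatty pages still get parsed."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # tag names arrive lower-cased, so <TITLE> matches too
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped page title, or None if there isn't one."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = (parser.title or "").strip()
    return title or None


def resolve(short_url, timeout=10, max_bytes=65536):
    """Follow any redirects from short_url and return (long_url, title).

    The timeout stops a slow server hanging the call; only the first
    max_bytes of the page are read. Returns (None, None) on failure."""
    try:
        with urllib.request.urlopen(short_url, timeout=timeout) as resp:
            long_url = resp.geturl()  # final URL after redirects were followed
            charset = resp.headers.get_content_charset() or "utf-8"
            head = resp.read(max_bytes).decode(charset, errors="replace")
    except (OSError, ValueError, LookupError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # ValueError covers malformed URLs, LookupError an unknown charset
        return None, None
    return long_url, extract_title(head)
```

Calling resolve("http://icio.us/5evqbm") should then give back a (long URL, title) pair, or (None, None) if the shortener sulks or times out.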

It seems to me that in exchange for us giving shorteners traffic, they should conform to a convention that allows users, given a shorturl, to:

1) lookup the long URL, necessarily, using some sort of sameas convention;
2) lookup the title of the target page, as an added value service/benefit;
3) (optionally) list the other alternate short URLs the service offers for the same target URL.
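To make the wishlist a bit more concrete, here’s a sketch of what a response from the imaginary example.com/service?short2long=shorturl call might look like if a shortener followed such a convention. Everything here is hypothetical: no shortener I know of offers this, and the JSON field names, the in-memory store and the short.example URLs are all made up for illustration.

```python
import json

# Hypothetical store for an imaginary shortener that follows the convention.
SHORT_TO_LONG = {
    "http://short.example/a1": "http://www.example.com/article",
    "http://short.example/b2": "http://www.example.com/article",
    "http://short.example/c3": "http://www.example.com/other",
}
TITLES = {"http://www.example.com/article": "An example article"}


def short2long(short_url):
    """Answer a hypothetical ?short2long=... request as a JSON string."""
    long_url = SHORT_TO_LONG.get(short_url)
    if long_url is None:
        return json.dumps({"short_url": short_url, "error": "not found"})
    # other short URLs this service has minted for the same target
    alternates = sorted(
        s for s, target in SHORT_TO_LONG.items()
        if target == long_url and s != short_url
    )
    return json.dumps({
        "short_url": short_url,
        "long_url": long_url,             # 1) the long URL (required)
        "title": TITLES.get(long_url),    # 2) page title (None if unknown)
        "alternates": alternates,         # 3) alternate short URLs (optional)
    })
```

The point of the design is that one lightweight lookup answers all three questions, without anyone having to hit the target page at all.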

If I was a militant server admin, I’d be tempted to start turning traffic away from the crappy shorteners… but then, that’s because I’m angry and ranting at the mo…;-)

Even if I could reliably call the short URL and get the long URL back, this isn’t ideal… suppose 53 people all mint their own short URLs for the same page. I have to call that page 53 times to find the same URL and page title? WTF?
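Client-side, the best you can probably do is limit the damage by resolving each distinct short URL once, grouping the results by target, and only then fetching each target page once. Here’s a minimal sketch of that bookkeeping; resolve_short and fetch_title are stand-ins (hypothetical helpers) for whatever actually does the network calls, passed in as functions so the grouping logic can be tried without touching the web.

```python
from collections import defaultdict


def group_by_target(short_urls, resolve_short, fetch_title):
    """Map each long URL to the short URLs pointing at it, plus its title.

    resolve_short(short) -> long URL or None, and fetch_title(long) -> title,
    are injected callables. Each distinct short URL is resolved once and each
    distinct target page is fetched once, however many short URLs point at it.
    """
    resolved = {}
    for s in dict.fromkeys(short_urls):  # drop repeats, keep first-seen order
        resolved[s] = resolve_short(s)

    groups = defaultdict(list)
    for s, long_url in resolved.items():
        if long_url is not None:  # skip short URLs that failed to resolve
            groups[long_url].append(s)

    return {
        long_url: {"title": fetch_title(long_url), "short_urls": shorts}
        for long_url, shorts in groups.items()
    }
```

So 53 short URLs for the same page still cost 53 resolution calls, but only one page fetch; getting rid of the 53 calls needs something like the convention above.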

… or suppose the page is actually an evil spam filled page on crappyevilmalware.example.com with page title “pr0n t1t b0ll0x”; maybe I see that and don’t want to go anywhere near the page anyway…

PS see also Joshua Schachter on url shorteners

PPS sort of loosely related, ish, err, maybe ;-) Local or Canonical URIs?. Chris (@cgutteridge) also made the point that “It’s vital that any twitter (or similar) archiver resolves the tiny URLs or the archive is, well, rubbish.”

Predictive Ads…? Or Email Address Targeted Advertising…?!

As I was getting increasingly annoyed by large flashing display ads in my feedreader this morning, the thought suddenly occurred to me: could Google serve me ads on third party sites based on my unread Gmail emails?

That is, as I check my feeds before my email in a morning, could I be seeing ads that foreshadow the content of the email I’ve been ignoring for way too long? Or could I receive ads that flag the content of my Priority Inbox messages?

Rules regarding sensitivity and privacy would have to be carefully thought through, of course. Here’s how they currently stand regarding contextual ads delivered in Gmail (More on Gmail and privacy: Targeted ads in Gmail):

By offering Gmail users relevant ads and information related to the content of their messages, we aim to offer users a better webmail experience. For example, if you and your friends are planning a vacation, you may want to see news items or travel ads about the destination you’re considering.

To ensure a quality user experience for all Gmail users, we avoid showing ads reflecting sensitive or inappropriate content by only showing ads that have been classified as “Family-Safe.” We also avoid targeting ads to messages about catastrophic events or tragedies. [Google’s emphasis]

[See also: Ads in Gmail and your personal data]

Not quite as future predictive as gDay™ with MATE™ that lets you “search tomorrow’s web today” and “[discover] content on the internet before it is created”, but almost…!

It’s also a step on the road to Eric Schmidt’s dream of providing you with results even before you search for them. (For a more recent interview, see Google’s Eric Schmidt predicts the future of computing – and he plans to be involved.)

Here’s another, more practical(?!) thought – suppose Google served me headers of Priority Inbox email messages that were also marked as urgent through Adwords ads, in a full-on attempt to try to attract my attention to “really important” messages?! “Flashmail” messages delivered through the Adwords network… (I can imagine at least one course manager who I suspect would try to contact me via ads when I don’t pick up my email! ;-)

Searching the internet of things may still be a little way off though….

PS thinking email address targeted ads (mailads?) through a bit more, here are a couple of ways of doing it that immediately come to mind. Suppose I want to target an ad at whoever@example.com:

1) Adwords could place that ad in my Gmail sidebar (I think they’d be unlikely to place ads within emails, even if clearly marked, because this approach has been hugely unpopular in the past (it also p****s me off in feeds)); that said, Google has apparently started experimenting with (image based) display ads in Gmail;

2) Adwords could place the ad on a third party site if the Goog spots me via a cookie and sees I’m currently logged in to Google, for example, with the whoever@example.com email address.

As Facebook gets into the universal messaging game, email address based ad targeting would also work there?

PPS interesting – the best ads act as content, so maybe ads could be used to deliver linked content? Twitter promoted tweets – the AdWords for live news?. Which reminds me, I need to work up my bid for using something like AdWords to deliver targeted educational content.

BBC Click Radio – Openness Special on “Privacy”: Jeff Jarvis vs. Andrew Keen

This week saw the latest episode in the OU/BBC World Service Click (radio) co-produced season on openness, with a focus this week on privacy… You can hear an extended version of the discussion between entrepreneurial journalism and openness advocate, Jeff Jarvis, and professional contrarian, Andrew Keen: Privacy in a connected world

Unfortunately, the episode aired just too early to pick up on this week’s “Who needs privacy?!” news, and in particular the new iPhone’s “secret” location logging behaviour: iPhone keeps record of everywhere you go; (find out how to see where your iPhone thinks you’ve been here: Got an iPhone or 3G iPad? Apple is recording your moves); but the discussion is a great one, so I encourage you to listen to it…(I’ll be asking questions later!;-)

The programme also saw the launch of its new hashtag: #bbcClickRadio

Whilst the Click Twitter audience is still dwarfed by the Digital Planet Listeners’ Facebook group, I’m keen to see if we can try to grow it… one way might be to show who’s recently been tweeting about the programme, and encourage people to start following each other and chatting about the issues raised in the programme a little bit more – something Gareth Mitchell (@garethm) can now pick up on, at least on the first airing, as Click now goes out live…. So to that end, I’m going to try to work up a special version of my Twitter friendviz application that shows connections between folk who’ve recently tweeted a particular term, in this case the #bbcClickRadio hashtag. To see the map, visit http://bit.ly/bbcclickradiocommunity.
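The core of that sort of map is simple enough to sketch, although this isn’t the friendviz code itself. Assuming you’ve already collected the names of people who recently tweeted the hashtag, and who each of them follows (both as made-up inputs here; fetching them from Twitter is left out), the edges of the map are just the follow relationships that stay inside that group:

```python
def community_edges(tweeters, follows):
    """Return sorted (a, b) pairs where a follows b and both used the term.

    tweeters: screen names who recently tweeted the term (e.g. a hashtag).
    follows: dict mapping a screen name to the set of names it follows
    (assumed to have been fetched separately).
    """
    members = set(tweeters)
    return sorted(
        (a, b)
        for a in members
        for b in follows.get(a, ())
        if b in members and b != a  # keep only ties within the community
    )
```

People who tweeted the hashtag but have no edges at all are exactly the ones it might be worth nudging towards following each other.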

As a tease, here’s a rather more polished version of a map I grabbed recently…

Snapshot of #bbcClickRadioCommunity - http://bit.ly/bbcclickradiocommunity

(Unfortunately, the live one is unlikely to ever look like this!)

PS I wonder if the investigation into the iPhone tracking was inspired by the recent story about German politician Malte Spitz who managed to obtain a copy of the data his phone provider had stored about his location… Zeit Online: Tell-all telephone (If you want to play with the data, it’s available from there…)

Name-based Robots.txt for Wifi Access Points?

Google just announced via a blog post – Greater choice for wireless access point owners – that owners of wifi access points who do not want Google to add the address and location of the access point to the Google Location Server need to rename the access point by adding _nomap to the end of the access point name or SSID (e.g. My Network_nomap) [UPDATE: note that this means it’s an opt-out model rather than a _mapme opt-in strategy (h/t @patparslow for that…)]

This is a bit like the declarative approach web publishers take to identify pages they don’t want search robots indexing, by including the names/paths of “please don’t” content in a robots.txt file. The Google assumption seems to be that if anything is visible in pretty much any way, they can index it unless you explicitly tell them not to.
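For comparison, here’s roughly what the robots.txt side of that looks like from a well-behaved crawler’s point of view, using Python’s standard urllib.robotparser module. The robots.txt content, the ExampleBot user agent and the example.com URLs are made up; the point is that the crawler has to ask, and it’s up to the publisher to have said no in advance.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Would a well-behaved crawler with this user agent fetch url?"""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


# A made-up opt-out: everything under /private/ is off limits to all robots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/page.html"))  # False
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/public/page.html"))   # True
```

Anything not explicitly disallowed comes back True, which is the same default-to-yes model as the _nomap suffix.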

All well and good, but what about the access points that Google has already added to the index, even if their owners would rather they hadn’t been? Will these be automagically removed next time a lookup is made?

Maybe the removal protocol will work like this: Android phone or browser with location service enabled* detects local access point name, tells Google Location Service, Google notes that the name is now ‘_nomap’, deletes it from the index, returns ‘not found’?
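Here’s that guessed-at removal protocol as a toy sketch. To be clear, this is my speculation, not how Google’s service works. One assumption I’ve had to add: if the name changes, the index needs some other stable identifier to recognise the access point, so I’ve keyed it on the access point’s hardware address (BSSID). LocationIndex and its methods are hypothetical.

```python
NOMAP_SUFFIX = "_nomap"


def wants_opt_out(ssid):
    """The opt-out signal: the network name ends with _nomap."""
    return ssid.endswith(NOMAP_SUFFIX)


class LocationIndex:
    """Toy stand-in for a location service's access point database."""

    def __init__(self):
        # access point hardware address (BSSID) -> (ssid, (lat, lon))
        self._by_bssid = {}

    def report(self, bssid, ssid, location):
        """Handle a phone/browser report of a nearby access point.

        Returns the stored location, or None ('not found') if opted out."""
        if wants_opt_out(ssid):
            # Renamed with _nomap: forget any existing record, store nothing.
            self._by_bssid.pop(bssid, None)
            return None
        self._by_bssid[bssid] = (ssid, location)
        return location

    def lookup(self, bssid):
        entry = self._by_bssid.get(bssid)
        return entry[1] if entry else None
```

Note that in this model the record only disappears once some device reports the renamed access point; until then, the old entry sits there happily.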

*You do know your browser often knows where you are from local wifi points, don’t you, even if your laptop doesn’t have GPS or a 3G card? It tends to go by the name location aware browsing and involves your browser sending identifiers such as your IP address, the names of local wifi access points, and a browser ID to a Google service that has a big database of identifiers and geo-location data for where it thinks each identifier is located. (Hmmm..interesting… I hadn’t realised that Firefox uses the Google Location Service till just now..?)

I don’t think you even need to be logged on to a network for its name to be phoned back to the location service? As the Mozilla FAQ puts it: “By default, Firefox uses Google Location Services to determine your location by sending … information about the nearby wireless access points…” (note nearby wireless access points).

PS by the by, here’s the strategy used by Android phones for detecting location.

Obtaining locations in android http://developer.android.com/guide/topics/location/obtaining-user-location.html

Is there a similar diagram for how browsers approach location detection anywhere?

Inappropriate Linkification (aka redirection attacks?!) in Google Docs

Reading through another wonderful post on the FullFact blog last night (Full Fact sources index: where to find the information you need), I noticed that the linked to resources from that post were being redirected via a Google URL:

drafting in Google docs

A tweet confirmed that this wasn’t intentional, so what had happened? I gather the workflow used to generate the post was to write it in Google docs, and then copy and paste the rich/HTML text into a rich text editor in Drupal, although I couldn’t recreate this effect (and nor could FullFact). However, suitably suspicious, I started having a play, writing a simple test document in Google docs:

google doc link tracking

The Google doc automatically links the test URL I added to the document. (This is often referred to as “linkification” – if a piece of text is recognised as something that looks like a URL or web link, it gets rewritten as a clickable link. Typically, you might assume that the link you’ll now be clicking on is the link that was recognised. This may be a bad assumption to make…) If you hover over the URL as written in the document, you get a tooltip that suggests the link is to the same URL. However, if you hover over the tooltip listed URL, (or click on it) you can see from the indicator in the bottom left hand corner of the browser what the actual URL you’re clicking on is. Like this:

google docs link direction

In this case, the link you’ll actually click on is a redirect to the original link via a Google URL. This one, in fact:

http://www.google.com/url?q=http%3A%2F%2Fblog.ouseful.info&sa=D&sntz=1&usg=AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A

What this means is that if I click on the link, Google tracks the fact that the link was clicked on. From the value of the usg variable (in this case, AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A) it presumably also knows the source document containing the link and whatever follows from that.
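If you want to strip the tracking back out, the original target is sitting in the q parameter of the redirect URL, percent-encoded. Here’s a minimal sketch using Python’s urllib.parse; the unwrap_google_redirect() name is mine, and it only recognises the www.google.com/url?q=... pattern seen above, not any other redirect formats Google might use.

```python
from urllib.parse import parse_qs, urlparse

GOOGLE_HOSTS = {"www.google.com", "google.com"}


def unwrap_google_redirect(url):
    """Return the q= target of a google.com/url?q=... link, else url unchanged."""
    parts = urlparse(url)
    if parts.netloc in GOOGLE_HOSTS and parts.path == "/url":
        target = parse_qs(parts.query).get("q")
        if target:
            return target[0]  # parse_qs has already percent-decoded the value
    return url


tracked = ("http://www.google.com/url?q=http%3A%2F%2Fblog.ouseful.info"
           "&sa=D&sntz=1&usg=AFQjCNHgu25L-v9rkkMqZSX54E8kP_XR-A")
print(unwrap_google_redirect(tracked))  # http://blog.ouseful.info
```

Checking the host exactly, rather than just looking for "google" somewhere in the URL, avoids unwrapping lookalike domains.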

Hmmm… If I publish the document, the Google rewrite appears to be removed:

google doc publish to web

There are also several export options associated with the document:

google doc export options

So what links get exported?

Here’s the Word export:

google doc export docx word

That seems okay – no tracking. How about odt?

google doc export as odt

That looks okay too. RTF and HTML export also seem to publish the “clean” link.

What about PDF?

google doc export as PDF

Hmm… so tracking is included here. So if you write a doc in Google docs that contains links that are autolinked, then you export that doc as PDF and share it with other folk, Google will know when folk click on that link from a copy of that PDF document and (presumably) the originally authored Google docs document (and all that that entails…)

How about if we email a doc as a PDF attachment to someone from within Google docs:

google doc email pdf

So that seems okay (untracked).

What’s the story then? FullFact claimed they cut and pasted rich HTML from Google docs into a rich text editor and the Google redirection attack was inserted into the link. I couldn’t recreate that, and nor could the FullFact folk, so either there are some Google “experiments” going on, or the workflow was misremembered.

In my own experiments, I got a Google redirection from clicking links within my original document, and from the exported PDF, but not from any other formats?

So what do we learn? I guess this at least: be aware that when Google linkifies links for you, it may be redirecting clicks on those links through Google tracking servers. And that these tracked links may be propagated to exported and/or otherwise shared versions of the document.
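One practical check before publishing: scan the HTML you’re about to paste or post for links that go via a Google redirect. Here’s a minimal sketch using Python’s standard html.parser; the class and function names are my own. It only looks at HTML, so checking the exported PDF would need the link URLs pulling out of the PDF first, which isn’t covered here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

GOOGLE_HOSTS = {"www.google.com", "google.com"}


class TrackedLinkFinder(HTMLParser):
    """Collect hrefs of <a> elements that point at a google.com/url redirect."""

    def __init__(self):
        super().__init__()
        self.tracked = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # attribute values arrive unescaped
        parts = urlparse(href)
        if parts.netloc in GOOGLE_HOSTS and parts.path == "/url":
            self.tracked.append(href)


def find_tracked_links(html):
    """Return the list of redirect-wrapped links found in an HTML string."""
    finder = TrackedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.tracked
```

An empty list means no google.com/url links were found; it says nothing about other kinds of link rewriting.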

PS see also Google/Feedburner Link Pollution or More Link Pollution – This Time from WordPress.com for more of the same, and Personal Declarations on Your Behalf – Why Visiting One Website Might Tell Another You Were There for a quick overview of what might happen when you actually land on a page…

Link rewriters are, of course, to be found in lots of other places too…

Twitter, for example, actually wraps all shared links in its t.co wrapper:

twitter rewrite

Delicious (which I’ve stopped using – I moved over to Pinboard) also uses its own proxy for clicked on stored bookmarks…

delicious link rewriter

If you have any other examples, particularly of link rewriting/annotation/pollution where you wouldn’t expect it, please let me know via the comments…

Time for Chaff as Google Analytics Adds Demographic and Interest Based Segmentation?

Via @mhawksey RTing @R3beccaF (I missed Rebecca’s tweet first time round), I notice that “Google Analytics can now segment visitors by age, gender and interests”, as described here: Getting Excited about Google Analytics’ Upcoming Features. The supported dimensions – age, gender and interest – allow you to get some idea about the demographics of your site visitors and segment stats on the same (though I wonder about sampling errors, how the demographic data is associated with user cookies etc?) Note also that demographics stats have previously been available in other Google products, such as Youtube and (via Karen Blakeman), Blogger, and demographic targeting of ads has been around for some time, of course…

Previously, to get demographic data into Google Analytics, I think you had to push it there yourself via custom variables (e.g. example; see also some of these sneaky tricks; I quite liked the idea of finessing the acquisition of user demographics data by capturing responses to ads placed via demographic targeting tools…!;-)

In passing, I just wonder about this phrase from the Google Analytics terms of service (my emphasis): You will not (and will not allow any third party to) use the Service to track, collect or upload any data that personally identifies an individual (such as a name, email address or billing information), or other data which can be reasonably linked to such information by Google.

So does this mean Google is free to try to learn from and link to whatever it thinks it can from your custom variable data, for example?

In any case, this all seems in keeping with Google’s aim to do everyone’s tracking on their behalf.

Note to self: get up to speed on cohorts (90 days history only? This section in this post on unified segments suggests at least 6 months history?).

Note to self, 2: how could we go about obfuscating the data collected from us? I wonder how we might go about creating digital/browser chaff? For example, running a background process that visits random websites and runs random searches under the guise of my Google account…?
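As a first step towards that, here’s a sketch of just the planning half of a chaff generator: it draws random decoy queries and random gaps between them, but deliberately sends nothing anywhere. The topic list, the chaff_schedule() name and the 1 to 15 minute gaps are all arbitrary assumptions, and whether noise like this would actually muddy anyone’s profile of you is exactly the open question.

```python
import random

# Hypothetical decoy vocabulary; a serious attempt would need something much
# broader and harder to tell apart from real browsing.
TOPICS = ["gardening", "cricket scores", "bread recipes",
          "train times", "jazz", "knitting patterns"]


def chaff_schedule(n, rng=None, min_gap=60, max_gap=900):
    """Plan n decoy searches as (delay_seconds, query) pairs.

    Dry run only: nothing is fetched. Pass a seeded random.Random
    to get a reproducible plan."""
    rng = rng or random.Random()
    plan = []
    for _ in range(n):
        words = rng.sample(TOPICS, k=rng.randint(1, 2))  # 1 or 2 distinct topics
        plan.append((rng.uniform(min_gap, max_gap), " ".join(words)))
    return plan
```

Whatever eventually executed the plan (a browser automation script, say) would be the part doing the actual visiting and searching.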

I should probably tag this under: targeting countermeasures.

Mental Health and the Web – Hounded by Ads, Bullied by Retargeters?

Some thinkses not thought through…

– can web ads feed paranoia, particularly retargeted ads that follow you round the web?
– can personalised ads be upsetting, or appear bullying? For example, suppose you’re critically ill, or in debt, or you’ve been researching a sensitive personal or family matter, and then the ads start to hound you and remind you about it…

Is one take on privacy that it is not to be reminded about something by an external agency? (Is there a more memorable quote along those lines somewhere..?!)

Are personalised ads one of the more pernicious examples of a “filter bubble” effect?

I think I need a proper holiday, well away from the web!