Thoughts on Telling….

Earlier today, I spent an hour or so Telling outside a poll station on behalf of a friend who’s standing in a local election. Tellers are those party reps (though not strictly so, in my case…) who grab the poll card numbers of folk who’ve voted so that when evening strikes, any folk who promised to vote for you who haven’t turned out can be bundled into the back of a car and driven to the polling station. Or something like that.

The Telling process itself is paper based, scribbling down numbers of voters (or a line to show a vote from an unknown person) as they enter or leave the polling station. (I asked the other Tellers whether the form was to wait until the voter left the polling station, so as not to be seen as attempting to unfairly influence voters on the way into the station. One said he had been originally “trained” to do just that, but the retired Conservative lady tellers seemed to prefer grabbing the number on the way in…)

So I stood there for an hour, writing down numbers on a piece of paper… (I also annotated some of the marks with a simple clock face showing the approximate o’clock of when the vote was cast, just because… The Telling sheet instructions advised starting each new hour’s collection on a fresh sheet, to give an idea of hourly turnout.)

Just before I went out, I spent all of 2 mins creating a Google form that would capture the voter number information (plus optional info – I’ve no idea what that might be? Maybe definite knowledge of how the voter claimed to be voting or have voted?):

Simple form...

Grabbing the shortcode link for the form meant it was on my phone and available for recording numbers in electronic form… (It also struck me that it should be quite easy to let folk submit numbers to somewhere via a text message; so for anyone with unlimited free texts, there’d be no reason not to send in voter numbers to either a local or national collection point, maybe with sender number reconciled back to a particular ward or polling station?) I didn’t use this “app” in the end, (are there data collection/protection issues with regard to shoving this info into a Google Spreadsheet, for example?), but there was no technical reason why it couldn’t have been used to collect this data in real time.
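(For what it’s worth, the plumbing behind that sort of real time collection is trivial: submitting to a Google form is just an HTTP POST to the form’s response address. Here’s a minimal sketch in Python, with the form URL and field IDs as made-up placeholders you’d have to lift from the live form’s HTML:)

```python
import time
import requests  # third-party HTTP library

# Hypothetical form endpoint and field IDs - copy the real ones
# from the live Google form's HTML source.
FORM_URL = "https://docs.google.com/forms/d/e/FORM_ID/formResponse"
VOTER_FIELD = "entry.1234567"
NOTE_FIELD = "entry.7654321"

def log_voter(poll_number, note=""):
    """Submit one voter's poll card number (plus an optional note)
    to the Google form, which drops it into a spreadsheet."""
    requests.post(FORM_URL, data={
        VOTER_FIELD: poll_number,
        # the second (hypothetical) field could carry the o'clock annotation
        NOTE_FIELD: note or time.strftime("%H:%M"),
    })

# e.g. log_voter("PQ1234")
```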

So what else came to mind? Gaining access to the electoral roll allows (I think) the domicile of voters to be identified by poll number, at least to street level, which means that it would be possible to generate a crude heat map, in real time, of where votes had been cast from.
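(The aggregation step is equally trivial, by the way; here’s a toy sketch, assuming you already have a poll-number-to-street lookup, which is the bit the electoral roll would provide:)

```python
from collections import Counter

# Hypothetical lookup built from the electoral roll: poll number -> street
poll_number_to_street = {
    "PQ1234": "High Street",
    "PQ1235": "High Street",
    "PQ2001": "Mill Lane",
}

def street_tally(voted_numbers):
    """Count votes cast so far, grouped by street - the raw data
    you'd feed to a crude heat map."""
    return Counter(poll_number_to_street.get(n, "unknown")
                   for n in voted_numbers)

print(street_tally(["PQ1234", "PQ2001", "PQ1235"]))
# Counter({'High Street': 2, 'Mill Lane': 1})
```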

Another part of the jigsaw: when canvassing, I believe the theory is that candidates capture the details of voters who might vote their way, so that they can be knocked up (or whatever the phrase is!) if it turns out that by a particular time they don’t appear to have voted (as recorded by the Tellers). Again, this seems to be a paper based activity, although it would be trivial to capture this information, along with exact address or lat/long details, with an app, and feed that info back to a local or national data collection point.

When I mentioned to the other Tellers that it would be trivial to put tells onto a rough map, the response was: “but why?” My first thought was: well, why not, it’s not hard…?! Then followed the thought: why not “demap” markers? If you had built up a map of possible friendly votes during pre-poll canvassing through an app that takes two minutes to create, then every time one of your voters turned out you could remove their marker from the map (or change its shade). That way you’d have an at-a-glance view of where you still needed to go to encourage voters out.
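In code terms, the bookkeeping behind the demapping idea is about as simple as it gets; a toy sketch, assuming the canvass returns had been captured as a lookup keyed by poll number:

```python
# Hypothetical canvass returns: poll number -> details of a promised voter
promised = {
    "PQ1234": {"address": "1 High Street", "turned_out": False},
    "PQ2001": {"address": "3 Mill Lane", "turned_out": False},
}

def record_tell(poll_number):
    """When a teller reports a number, mark that promise as redeemed
    so its map marker can be removed (or its shade changed)."""
    if poll_number in promised:
        promised[poll_number]["turned_out"] = True

def still_to_knock_up():
    """Promised voters who haven't shown up yet - the ones to chase."""
    return [v["address"] for v in promised.values() if not v["turned_out"]]

record_tell("PQ1234")
print(still_to_knock_up())  # ['3 Mill Lane']
```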

So… would this be a Good Thing to do or not? Or would it be a bit creepy? Is capturing this sort of data in this sort of way legal? Does it infringe on areas governed by the Data Protection Act, or electoral regulations?

PS Hmmm… what would be really handy would be a DIFF spreadsheet formula that would generate a list of items that appear in one column that don’t appear in another column. If you know of one, or an easy way of achieving this in a spreadsheet, please post a comment below letting me know how ;-)
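(In the meantime, something COUNTIF-based would probably do it inside the sheet itself; outside the spreadsheet, it’s a couple of lines of script. A sketch in Python, for what it’s worth:)

```python
def column_diff(col_a, col_b):
    """Items that appear in col_a but not in col_b, keeping col_a's order."""
    seen = set(col_b)
    return [item for item in col_a if item not in seen]

col_a = ["PQ1234", "PQ2001", "PQ3456"]
col_b = ["PQ2001"]
print(column_diff(col_a, col_b))  # ['PQ1234', 'PQ3456']
```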

Personal Declarations on Your Behalf – Why Visiting One Website Might Tell Another You Were There

A couple of weeks ago, I was chatting to someone about their use of Google Analytics on OU related websites. I asked a question about whether or not privacy concerns had been taken into account, and was provided with a response along the lines of “we don’t know who the data relates to, so it’s not really a problem”. Which is not quite what I was asking…

Whenever you visit a website that uses Adsense or Google Analytics, Google knows that a particular person – as identified by a cookie that Google stores in your browser – has visited that website. If you’re logged in to a Google account (in another tab or window in the same browser, for example; or from a previous visit to GMail or Google docs), Google also potentially knows that it is INSERT YOUR NAME HERE who is represented by that cookie, and hence, Google knows which of the many, many Google Analytics and Adsense serving websites you have visited. (This may also be why Google just acquired the Labpixies widget platform – it wants to extend its reach…)

With the release of Facebook’s “Like” button, which has started appearing on many websites, Facebook is now developing the capacity to track which of its users are visiting “Likeable” websites. If you’ve logged in to Facebook, Facebook will have placed a cookie in your browser to identify you. Whenever you visit a website that has installed any Facebook for Websites utilities (see also Facebook widgets), the Facebook code inserted into the page tells Facebook that you have visited that site.
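To make the mechanics a little more concrete: every request for the analytics script or Like button carries the tracking cookie along with the address of the page that embedded it, so the third party can keep books that look something like the following toy sketch (my guess at the shape of the thing, not anyone’s actual code):

```python
from collections import defaultdict

# cookie id -> set of sites that browser has been seen on
visits_by_cookie = defaultdict(set)
# cookie id -> account name, filled in if that browser ever logs in
identity_by_cookie = {}

def record_beacon(cookie_id, referring_page):
    """Called for every page that embeds the widget/analytics script."""
    visits_by_cookie[cookie_id].add(referring_page)

def record_login(cookie_id, account_name):
    """If the same browser logs in, the whole visit history gets a name."""
    identity_by_cookie[cookie_id] = account_name

record_beacon("abc123", "http://www.open.ac.uk/somecourse")
record_login("abc123", "INSERT YOUR NAME HERE")
print(identity_by_cookie["abc123"], visits_by_cookie["abc123"])
```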

This information may then be passed back to the website you have visited in an anonymised, aggregated form as domain analytics, aggregated analytics that reveal demographic data about the makeup of the site visitors that Facebook knows about. (See Facebook Overhauls Page and App Insights, Adds Domain Analytics Features and an API for a good overview of insight and analytics services provided by Facebook that cover apps and pages, as well as domains.)

Youtube also offers a similar service to video publishers, in the form of reports about the demographics of people who have viewed a particular video, presumably based on samples of viewers who have entered this personal information as part of their Youtube profile.

Note that by claiming your website on Facebook, you can also get hold of reports relating to activity around your website on Facebook itself:

Facebook Insights for your domain

So what information can we learn from this?

  • demographics/personal profile data may be passed on (in aggregate form) as part of analytics reports;
  • user tracking across multiple websites is being achieved by the big web companies, but in an indirect way. If the OU includes Google Analytics or a Facebook Like button on an OU page, Google or Facebook respectively know that you have visited that page. The OU doesn’t necessarily know, but the third party sites whose tools annotate the OU’s pages do.

(Your ISP will know too, of course; as does your browser; so if you were paranoid, you might think that the browser supplied on your iPhone or Android phone is phoning home information about the websites you have visited. But that would just be paranoid, right…?! After all, EULA details probably rule that sort of thing out (anyone checked?), unless they rule them in… in an opt-in-able way, of course. (So for example, if you install the Google toolbar, you can let Google maintain a history of all the websites you visit; as it (opt-in-ally) does with all the websites you click through to from a Google search results page;-)

There’s a great discussion about whether Facebook needed to implement its widgets in an intrusive, always-on user-tracking way in this episode of the Technometria podcast – What’s Facebook Thinking? (the long and the short of it: it didn’t need to…): Audio.

As to why Facebook and Google want this demographic and interest/attention data? They make megabucks from selling targeted advertising.

And as to what we can do about it? I’d love to see a map of t’internet that shows the proportion of websites that Google and Facebook can track my behaviour on because those site owners have invited them in…

PS h/t to @stuartbrown for pointing out the work that Mathieu d’Aquin from the OU’s KMi is doing on tracking the amount of personal data we reveal to websites.

PPS as a user, taking defensive measures using things like browser privacy settings may also not be very much help. See for example Why private browsing isn’t…

PPPS Heh heh – browser DNA: combinations of browser extension version numbers might uniquely identify your browser! And you all know this, right? How to sniff browser history files.

PPPPS Here’s another way of leaking personal data: if you click on a link on a page, the site you are visiting is notified about the URL of the referring page (so if you click a link on the OUseful.info blog, the site you click through to knows you came from this blog). If personal information – such as your name – is encoded into the URL of the page you click from (for example, because you are clicking through an ad on your profile page whose URL includes your name), this information may be passed to the clicked-through-to site. Ref: WSJ: Facebook, MySpace Confront Privacy Loophole and Ars Technica Report: Facebook caught sharing secret data with advertisers. See also: On the Leakage of Personally Identifiable Information Via Online Social Networks [PDF]
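A quick sketch of what the receiving site (or the ad network serving it) can see, just from parsing the referring URL that arrives with the click (the URL here is made up, of course):

```python
from urllib.parse import urlparse, parse_qs

# A made-up referring URL of the kind described in the WSJ piece:
# the profile page URL happens to carry the user's name and id.
referer = "http://socialnetwork.example.com/profile?name=joe.bloggs&id=12345"

query = parse_qs(urlparse(referer).query)
print(query.get("name"), query.get("id"))
# ['joe.bloggs'] ['12345'] - personal data leaked simply by clicking a link
```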

P-whatever-S If you want to opt out of being tracked by Google Analytics, here’s an RWW report of an “official” tool for doing just that… (This can all get a bit meta- though… e.g. the best way of finding out what public content websites don’t want Google to index is just to look at the robots.txt file for the website;-) I also wonder: should privacy policies for sites that include things like Google Analytics also link to tools that would allow users of the site to opt out of being part of that tracking?

And finally? The moral of the story is: does your privacy policy describe who else might be tracking your users whenever they visit your site?

[UPDATE: here’s another example – NHS.uk allowing Google, Facebook, and others to track you]

Debunking Uniform Swing… Maybe Next Time…?

A few days ago, I doodled this:

3-way swingometer doodle

The idea was to use a three way swingometer to show the changing fortunes of the three parties with respect to each other in terms of votes cast today compared to 2005, or as input to a visualisation that would allow folk to play around with swings from one party to another (based on election 2005 results) and see the effect on likely allocation of seats this time round. But then I went and did something else…

Anyway, via John Naughton just now, I picked up on this post – Labour Danger: Uniform Swing Calculations May Understate Risk to Incumbents – which comments on the dangers of the uniform swing model, whose only attraction, as far as I can tell, is that it is “fairly easy to calculate”. Instead, “an alternative approach [is proposed] which, while also based on fairly simple assumptions, is potentially more robust. The approach works by assigning shares of one party’s 2005 vote to another. For instance, what happens if 10 percent of people who voted for Labour in 2005 defect to the Conservatives, 15 percent of Labour’s voters defect to LibDems, and 10 percent of the Conservatives’ voters defect to LibDems?”
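By way of a toy worked example of that transfer approach (made-up constituency numbers, with the defection percentages lifted from the quote):

```python
def apply_defections(votes_2005, transfers):
    """Reallocate each party's 2005 vote according to a table of
    transfer fractions: transfers[(from_party, to_party)] = share moved."""
    new_votes = dict(votes_2005)
    for (src, dst), share in transfers.items():
        moved = votes_2005[src] * share  # shares of the *2005* vote
        new_votes[src] -= moved
        new_votes[dst] += moved
    return new_votes

votes_2005 = {"Lab": 20000, "Con": 15000, "LD": 10000}  # made-up seat
transfers = {("Lab", "Con"): 0.10,  # 10% of Labour's 2005 vote to Conservatives
             ("Lab", "LD"): 0.15,   # 15% of Labour's vote to LibDems
             ("Con", "LD"): 0.10}   # 10% of the Conservatives' vote to LibDems

print(apply_defections(votes_2005, transfers))
# {'Lab': 15000.0, 'Con': 15500.0, 'LD': 14500.0}
```

Run something like that over every constituency’s 2005 result and you get a seat projection that doesn’t rely on uniform swing.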

Exactly…

I don’t have time today to hack together my three way swingometer thingy, though something similar already exists here:

Swingometer

…but maybe if we get another election after the summer, I’ll hack it together for then…?;-)

PS see also UK General Election 2010 – Interactive Maps and Swingometers.

Confluence in My Feed Reader – The Side Effects of Presenting

Don’tcha just love it when complementary posts happen along within a day or two of each other? Earlier this week, Martin posted on the topic of Academic output as collateral damage, suggesting that “you can view higher education as a long tail content production system. And if you are producing this stuff as a by-product of what you do anyway then a host of new possibilities open up. You can embrace unpredictability”.

And then today, other Martin comes along with a post – Presentation: Twitter for in-class voting and more for ESTICT SIG – linking to a recording of a presentation he gave yesterday, one that includes captions from the Twitter backchannel: tweets sent by the presentation itself, as well as by the (potentially extended/remote) audience.

Brilliant… I love it…I’m pretty much lost for words…

Just... awesome...

What we have here, then, is the opening salvo in a presentation capture and amplification strategy where the side effects of the presentation create a legacy in several different dimensions – an audio-visual record, for after the fact; a presentation that announces its own state to a potentially remote Twitter audience, and that in turn can drive backchannel activity; a recording of the backchannel, overlaid as captions on the video recording; and a search index that provides timecoded results from a search based on the backchannel and the tweets broadcast by the presentation itself. (If nothing else, capturing just the tweets from the presentation provides a way of deep searching, in time, into the presentation.)

Amazing… just amazing…

Programming, Not Coding: Infoskills for Journalists (and Librarians..?!;-)

A recent post on the journalism.co.uk site asks: How much computer science does a journalist really need?, commenting that whilst coding skills may undoubtedly be useful for journalists, knowing what can be achieved easily in a computational way may be more important, because there are techies around who can do the coding for you… (For another take on this, see Charles Arthur’s If I had one piece of advice to a journalist starting out now, it would be: learn to code, and this response to it: Learning to Think Like A Programmer.)

Picking up on a few thoughts that came to mind around a presentation I gave yesterday (Web Lego And Format Glue, aka Get Yer Mashup On), here’s a slightly different take on it, based on the idea that programming doesn’t necessarily mean writing arcane computer code.

Note that a lot of what follows I’d apply to librarians as well as journalists… (So for example, see Infoskills for the Future – If You Can’t Handle Information, Get Out of the Library for infoskills that I think librarians as information professionals should at least be aware of (and these probably apply to journalists too…); Data Handling in Action is also relevant – it describes some of the practical skills involved in taking a “dirty” data set and getting it into a form where it can be easily visualised…)

So here we go…. An idea I’ve started working on recently as an explanatory device is the notion of feed oriented programming. I appreciate that this probably already sounds scary geeky, but it’s a made up phrase and I’ll try to explain it. A feed is something like an RSS feed. (If you don’t know what an RSS feed is, this isn’t a remedial class, okay… go and find out… this old post should get you started: We Ignore RSS at OUr Peril.)

Typically, an RSS feed will contain a set of items, such as a set of blog posts, news stories, or bookmarks. Each item has the same structure in terms of how it is represented on a computer. Typically, the content of the feed will change over time – a blog feed represents the most recent posts on a blog, for example. That is, the publisher of the feed makes sure that the feed has current content in it – as a “programmer” you don’t really need to do anything to get the fresh content in the feed – you just need to look at the feed to see if there is new content in it – or let your feed reader show you that new content when it arrives. The feed is accessed via a web address/URL.
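(If you do want to peek inside a feed programmatically rather than via a reader, a few lines of Python with the feedparser library show the structure – every item carries the same fields – though the feed address below is just a placeholder:)

```python
import feedparser  # third-party library: pip install feedparser

# Placeholder feed address - any blog/news/bookmark RSS or Atom feed will do
feed = feedparser.parse("http://example.com/feed")

for entry in feed.entries[:5]:
    # every item carries the same standard fields
    print(entry.title, "->", entry.link)
```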

Some RSS feeds might not change over time. On WriteToReply, where we republish public documents, it’s possible to get hold of an RSS version of the document. The document RSS feed doesn’t change because the content of the document doesn’t change, although the content of the comment feeds might change as people comment on the document.

A nice thing about RSS is that lots of things publish it, and lots of things can import it. Importing an RSS feed into an application such as Google Reader simply means pasting the web address of the feed into a “Subscribe to feed” box in the application. Although it can do other things too, like supporting search, Google Reader is primarily a display application. It takes in RSS feeds and presents them to the user in an easy to read way. Google Maps and Google Earth are other display applications – they display geographical information in an appropriate way, a way that we can readily make sense of.

So what do we learn from this? Information can be represented in a standard way, such as RSS, and displayed in a visual way by an application that accepts RSS as an input. By subscribing to an RSS feed, which we identify by a fixed/permanent web address, we can get new content into our reader without doing anything. Subscribing is just a matter of copying a web address from the publisher’s web site and pasting it into our reader application. Cut and paste. No coding required. The feed publisher is responsible for putting new content into the feed, and our reader application is responsible for pulling that new content out and displaying it to us.

One of the tools I use a lot is Yahoo Pipes. Yahoo Pipes can take in RSS feeds and do stuff with them; it can take in a list of blog posts as an RSS feed and filter them so that you only get posts out that do – or don’t – mention cats, for example. And the output is in the form of an RSS feed…

What this means is that if we have a Yahoo pipe that does something we want in computational terms to an RSS feed, all we have to do is give it the web address of the feed we want to process, and then grab the RSS output web address from the Pipe. Cut and paste the original feed web address into the Pipe’s input. Cut and paste the web address of the RSS output from the pipe into our feed reader. No coding required.
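(And if you wanted the same cats filter in code rather than in Pipes, it’s a one-liner over the parsed feed; a sketch, again with a placeholder feed address:)

```python
import feedparser  # pip install feedparser

feed = feedparser.parse("http://example.com/blog/feed")  # placeholder feed URL

# Keep only the posts that mention cats - the Yahoo Pipes filter, in effect
cat_posts = [e for e in feed.entries
             if "cats" in (e.title + " " + e.get("summary", "")).lower()]

for post in cat_posts:
    print(post.title, post.link)
```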

Another couple of tools I use are Google Spreadsheets (a spreadsheet application) and Many Eyes Wikified (an interactive visualisation application). If you publish a spreadsheet on Google docs, you can get a web address/URL that points to a CSV (comma-separated values) version of the selected sheet. A CSV file is a simple text file where each spreadsheet row is represented as a row in the CSV structured text file, and the value of each cell along a row in the original spreadsheet is represented as the same value in the text file, separated from the previous value by a comma. But you don’t need to know that… All you do need to know is that you can think of it as a feed… With a web address… And in a particular format…

Going to the “Share” options in the spreadsheet, you can publish the sheet and generate a web address that points to a range of cells in the spreadsheet (eg: B1:D120) represented as a CSV file. If we now turn to Many Eyes Wikified, I can provide it with the web address of a CSV file and it will go and fetch the data for me. At the click of a button I can then generate an interactive visualisation of the data in the spreadsheet. Cut and paste the web address of the CSV version of the data in a spreadsheet that Google Docs will generate for me into Many Eyes Wikified, and I can then create an interactive visualisation using the spreadsheet at the click of a button. Cut and paste a URL/web address that is generated for me. No coding required.
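(Again, just to show there’s nothing magic going on: here’s what consuming that published-sheet CSV feed looks like from a few lines of Python. The sheet key and cell range in the URL are made-up placeholders for whatever address Google generates for you:)

```python
import csv
import io
import urllib.request

# Placeholder for the CSV export address Google generates when you publish
# a sheet - the key and cell range here are made up for illustration.
CSV_URL = ("https://spreadsheets.google.com/pub"
           "?key=YOUR_SHEET_KEY&output=csv&range=B1:D120")

with urllib.request.urlopen(CSV_URL) as response:
    rows = list(csv.reader(io.TextIOWrapper(response, encoding="utf-8")))

for row in rows[:5]:
    print(row)  # each spreadsheet row comes back as a list of cell values
```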

As to where the data in the spreadsheet came from? Well maybe it came from somewhere else on the web, via a URL? Like this, maybe?

So the model I’m working towards with feed oriented programming is the idea that you can get the web address of a feed which a publisher will publish current content or data to, and paste that address into an application that will render or display the content (e.g. Google Reader, Many Eyes Wikified), or process/transform that data on your behalf.

So for example, Google Spreadsheets can transform an HTML table into CSV for you (using the =ImportHtml formula, for example); it also lets you do all the normal spreadsheet things, so you could generate one sheet from another sheet using whatever spreadsheet formulae you like, and publish the CSV representation of that second sheet. Or in Yahoo Pipes, you can process an RSS feed by filtering its contents so that you only see posts that mention cats.

Yahoo Pipes offers other sorts of transformation as well. For example, in my original Wikipedia scraping demo, I took the feed from a Google spreadsheet and passed it to Yahoo Pipes, where I geocoded city names and let Pipes generate a map-friendly feed (known as a KML feed) for me. Copying the web address of the KML feed output from the pipe and pasting it into Google Maps means I can generate an embeddable Google map view of data originally pulled from Wikipedia.
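(And the KML itself is nothing scary – just an XML dialect that map applications know how to render. A minimal hand-rolled sketch of the sort of thing the pipe emits, with made-up place data:)

```python
# Build a minimal KML document from (name, lat, lon) tuples - the sort of
# "map friendly feed" the pipe generates. Coordinates here are illustrative.
places = [("Milton Keynes", 52.04, -0.76), ("Cambridge", 52.20, 0.12)]

placemarks = "\n".join(
    "  <Placemark><name>{}</name>"
    "<Point><coordinates>{},{},0</coordinates></Point></Placemark>".format(
        name, lon, lat)  # note: KML wants lon,lat order
    for name, lat, lon in places)

kml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
       '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
       + placemarks + "\n</Document>\n</kml>")

with open("cities.kml", "w") as f:
    f.write(kml)  # host this file somewhere and paste its address into Google Maps
```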

Once you start to think of the world in this way:

– where the web contains data and information that is represented in various standard ways and made available via a unique and persistent web address,

– where web applications can accept data and content that is represented in a standard way given the web address of that data,

– where web applications can transform data represented at one web address in one particular way and republish it in another standard format at another web address,

– or where web applications can take data represented in a particular way from one web address and provide you with the tools to then visualise or display that data,

then the world is your toolbox. Have URL, will travel. All you need to know is which applications can import what format data, and how they can republish that data for you, whether in a different format, such as Google spreadsheets taking an HTML table from Wikipedia and publishing it as a CSV file, or as a visualisation/human friendly display (Many Eyes Wikified, Google Reader). And if you need to do “proper” programmer type things, then you might be able to do it using a spreadsheet formula or a Yahoo Pipe (no coding required…;-)

See also: The Journalist as Programmer: A Case Study of The New York Times Interactive News Technology Department [PDF]

OU Facebook App Competition

What OU Facebook App would you like to see? Here’s a chance to get it made…

Two and a half years ago, as part of an informally convened skunkworks team, we released a couple of Open University related Facebook apps inspired, in part, by observing student behaviour in online course forums.

Current privacy fears aside (?!;-), the apps are getting another push as part of an announcement about a “Design an OU Facebook App” Competition (Share your Facebook app ideas for chance to win OU vouchers):

The rules are simple: tell us what new app you’d like to see us build on Facebook.

And in return? For the winner of our competition, which runs until June 8, 2010, there’s £100 course vouchers, and, even more exciting perhaps, the chance to see your app built, with your name on the developer credits.

OU Facebook app competition - http://bit.ly/9Dt9nc

Although there are only a handful of ideas posted so far, it’s interesting to see how some of them already tally with ideas we had for additional apps at the time Course Profiles and My OU Story were built. (Liam, maybe we should dig out the old email exchanges we had bouncing round new app ideas, and submit them to the current competition?! Heh heh;-)

The competition is being managed through an online suggestion-and-voting system that appeared on the OU Platform site earlier this year and which is being used to solicit ideas for new courses from anyone who registers on the site (Platform is open to all, not just OU students, staff, and alumni).

OU Platform - I would like to study

The Platform team seem to have really got into the idea of competitions, so presumably it works as a marketing exercise. Just out of interest, are there any other HEIs out there that run competitions in a similar way?