Recent OU Programmes on the BBC, via iPlayer

As @liamgh will tell you, Coast is getting quite a few airings at the moment on various BBC channels. And how does @liamgh know this? Because he’s following the open2 openuniversity twitter feed, which sends out alerts when an OU programme is about to be aired on a broadcast BBC channel.

(As well as the feed from the open2 twitter account, you can also find out what’s on from the OU/BBC schedule feed (http://open2.net/feeds/rss_schedule.xml), via the Open2.net schedule page; iCal feeds appear not to be available…)

So to make it easier for him to catch up on any episodes he missed, here’s a quick hack that mines the open2 twitter feed to create a “7 day catch up” site for broadcast OU TV programmes (the page also links through to several video playlists from the OU’s Youtube site).

The page actually displays links to programmes that are currently viewable on BBC iPlayer (either via a desktop web browser, or via a mobile browser – which means you can view this stuff on your iPhone ;-), and a short description of the programme, as pulled from the programme episode’s web page on the BBC website. You’ll note that the original twitter feed just mentions the programme title; the TinyURL’d link goes back to the series web page on the Open2 website.

Thinking about it, I could probably have done the hackery required to get the iPlayer URLs from within the page; but I didn’t… Given the clue that the page is put together using a jQuery script I stole from this post on Parsing Yahoo Pipes JSON Feeds with jQuery, you can maybe guess where the glue logic for this site lives?;-)
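For the curious, the glue in the page is essentially just a few lines of jQuery along the lines of the following sketch (the pipe ID is a placeholder, and the item field names are assumed to match the pipe’s output):

    // a minimal sketch: pull the JSON output of a Yahoo pipe into the page
    // YOUR_PIPE_ID is a placeholder; title/link/description are assumed fields
    var pipeURL = 'http://pipes.yahoo.com/pipes/pipe.run'
                + '?_id=YOUR_PIPE_ID&_render=json&_callback=?';

    $.getJSON(pipeURL, function (data) {
      $.each(data.value.items, function (i, item) {
        $('#programmes').append(
          '<li><a href="' + item.link + '">' + item.title + '</a><br />'
          + item.description + '</li>'
        );
      });
    });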

There are three pipes involved in the hackery – the JSON that is pulled into the page comes from this OU Recent programmes (via BBC iPlayer) pipe.

The first part grabs the feed, identifies the programme title, and then searches for that programme on the BBC iPlayer site.

The nested BBC Search Results scrape pipe searches the BBC programmes site and filters results that point to an actual iPlayer page (so we can watch the result on iPlayer).

Back in the main pipe, we take the list of recently tweeted OU programmes that are available on iPlayer, grab the programme ID (which is used as a key in all manner of BBC URLs :-), and then call another nested pipe that gets the programme description from the actual programme web page.
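The programme ID is handy because it slots straight into a family of BBC URL patterns. Outside of Pipes, the URL building amounts to little more than this sketch (the programme page and desktop iPlayer patterns are the standard BBC ones; the mobile pattern is an assumption on my part):

    // sketch: build programme/iPlayer links from a BBC programme ID (pid)
    // the mobile iPlayer URL pattern below is an assumption
    function iplayerLinks(pid) {
      return {
        programmePage: 'http://www.bbc.co.uk/programmes/' + pid,
        iplayer:       'http://www.bbc.co.uk/iplayer/episode/' + pid,
        mobileIplayer: 'http://www.bbc.co.uk/mobile/iplayer/episode/' + pid
      };
    }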

This second nested pipe just gets the programme description, creates a title and builds the iPlayer URL:

(The logic is all a bit hacked – and could be tidied up – but I was playing through my fingertips and didn’t feel like ‘rearchitecting’ the system once I knew what I wanted it to do… which is what it does do…;-)

As an afterthought, the items in the main pipe are annotated with a link to the mobile iPlayer version of each programme:

So there you have it: a “7 day catch up” site for broadcast OU TV programmes, with replay via iPlayer or mobile iPlayer.

[18/11/08 – the site that the app runs on is down at the moment, as a network security update is carried out; sorry about that – maybe I should use a cloud server?]

The Convenience of Embedded, Flash Played, PDFs

Yesterday, my broadband connection went down as BT replaced the telegraph pole that carries the phone wire to our house, which meant I managed to get a fair bit of reading done, both offline and via a tab sweep.

One of my open tabs contained a ReadWriteWeb post, Study: Influencers are Alive and Well on Social Media Sites, which reviewed a study from Rubicon Consulting that provides some sort of evidence for the majority of “user generated content” on the web being produced by a small percentage of the users. The post linked to a PDF of the white paper, which I assumed (no web connection) I’d have to remember to look up later.

And then – salvation:

The PDF had been embedded in a PDFMENOT Flash player (cf. Scribd etc.), which itself was embedded in the post. So I could read the paper at my leisure without having to connect back to the network.

The Tesco Data Business (Notes on “Scoring Points”)

One of the foundational principles of the Web 2.0 philosophy that Tim O’Reilly stresses relates to “self-improving” systems that get better as more and more people use them. I try to keep a watchful eye out for business books on this subject – books about companies who know that data is their business; books like the somehow unsatisfying Competing on Analytics, and a new one I’m looking forward to reading: Data Driven: Profiting from Your Most Important Business Asset (if you’d like to buy it for me… OUseful.info wishlist;-).

So as part of my summer holiday reading this year, I took away Scoring Points: How Tesco Continues to Win Customer Loyalty, a book that tells the tale of the Tesco loyalty card. (Disclaimer: the Open University has a relationship with Tesco, which means that you can use Tesco Clubcard points in full or part payment of certain OU courses. It also means, of course, that Tesco knows far, far more about certain classes of our students than we do…)

For those of you who don’t know of Tesco, it’s the UK’s dominant supermarket chain, taking a huge percentage of the UK’s daily retail spend, and is now one of those companies that’s so large it can’t help but be evil. (They track their millions of “users” as aggressively as Google tracks theirs.) Whenever you hand over your Tesco Clubcard alongside a purchase, you get “points for pounds” back. Every 3 months (I think?), a personalised mailing comes with vouchers that convert points accumulated over that period into “cash”. (The vouchers are in nice round sums – £1, £2.50 and so on. Unconverted points are carried over to the convertible balance in the next mailing.) The mailing also comes with money off vouchers for things you appear to have stopped purchasing, rewards on product categories you frequently buy from, or vouchers trying to entice you to buy things you might not be in the habit of buying regularly (but which Tesco suspects you might desire!;-)

Anyway, that’s as maybe – this is supposed to be a brief summary of corner-turned pages I marked whilst on holiday. The book reads a bit like a corporate briefing book, repetitive in parts, continually talking up the Tesco business, and so on, but it tells a good story and contains more than a few gems. So here for me were some of the highlights…

First of all, the “Clubcard customer contract”: more data means better segmentation, means more targeted/personalised services, means better profiling. In short, “the more you shop with us, the more benefit you will accrue” (p68).

This is at the heart of it all – just like Google wants to understand its users better so that it can serve them with more relevant ads (better segmentation * higher likelihood of clickthru = more cash from the Google money machine), and Amazon seduces you with personal recommendations of things it thinks you might like to buy based on your purchase and browsing history, and the purchase history of other users like you, so Tesco Clubcard works in much the same way: it feeds a recommendation engine that mines and segments data from millions of people like you, in order to keep you engaged.

Scale matters. In 1995, when Tesco Clubcard launched, dunnhumby, the company that has managed the Clubcard from when it was still an idea to the present day, had to make do with the data processing capabilities that were available then, which meant that it was impossible to track every purchase, in every basket, from every shopper. (In addition, not everything could be tracked by the POS tills of the time – only “the customer ID, the total basket size and time the customer visited, and the amount spent in each department” (p102)). In the early days, this meant data had to be sampled before analysis, with insight from a statistically significant analysis of 10% of the shopping records being applied to the remaining 90%. Today, they can track everything.

Working out what to track – first order “instantaneous” data (what did you buy on a particular trip, what time of day was the visit) or second order data (what did you buy this time that you didn’t buy last time, how long has it been between visits) – was a major concern, as were indicators that could be used as KPIs for the extent to which Clubcard influenced customer loyalty.

Now I’m not sure to what extent you could map website analytics onto “store analytics”, but some of the loyalty measures seem familiar to me. Take, for example, the RFV analysis (pp95-6; there’s a rough sketch of the idea in code just after this list):

  • Recency – time between visits;
  • Frequency – “how often you shop”
  • Value – how profitable is the customer to the store (if you only buy low margin goods, you aren’t necessarily very profitable), and how valuable is the store to the customer (do you buy your whole food shop there, or only a part of it?).
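As a minimal sketch of the sort of calculation I mean (the visit records and reference date are completely made up, and margin is standing in for “value”):

    // a sketch of an RFV calculation over one customer's visit records
    // (the records and the reference date are invented for illustration)
    function rfv(visits, today) {
      var lastVisit = 0, value = 0;
      for (var i = 0; i < visits.length; i++) {
        var t = visits[i].date.getTime();
        if (t > lastVisit) { lastVisit = t; }
        value += visits[i].margin;          // profitability, not just spend
      }
      return {
        recency: (today.getTime() - lastVisit) / (1000 * 60 * 60 * 24), // days
        frequency: visits.length,           // visits in the period
        value: value
      };
    }

    var score = rfv([
      { date: new Date(2008, 6, 3),  margin: 4.20 },
      { date: new Date(2008, 6, 10), margin: 3.80 },
      { date: new Date(2008, 6, 28), margin: 5.10 }
    ], new Date(2008, 7, 1));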

Working out what data to analyse also had to fit in with the business goals – the analytics needed to be actionable (are you listening, Library folks?!;-). For example, as well as marketing to individuals, Clubcard data was to be used to optimise store inventory (p124). “The dream was to ensure that the entire product range on sale at each store accurately represented, in selection and proportion, what the customers who shopped there wanted to buy.” So another question that needed to be asked was how should data be presented “so that it answered a real business problem? If the data was ‘interesting’, that didn’t cut it. But adding more sales by doing something new – that did.” (p102). Here, the technique of putting data into “bins” meant that it could be aggregated and analysed more efficiently in bulk and without loss of insight.

Returning to the customer focus, Tesco complemented the RFV analysis with the idea of a “Loyalty Cube” within which each customer could be placed (pp126-9).

  • Contribution: that is, contribution to the bottom line, the current profitability of the customer;
  • Commitment: future value – “how likely that customer is to remain a customer”, plus “headroom”, the “potential for the customer to be more valuable in the future”. If you buy all your groceries in Tesco, but not your health and beauty products, there’s headroom there;
  • Championing: brand ambassadors; you may be low contribution, low commitment, but if you refer high value friends and family to Tesco, Tesco will like you:-)

By placing individuals in separate areas of this chart, you can tune your marketing to them, either by marketing items that fall squarely within that area, or if you’re feeling particularly aggressive, by trying to move them through the different areas. As ever, it’s contextual relevancy that’s the key.

But what sort of data is required to locate a customer within the loyalty cube? “The conclusion was that the difference between customers existed in each shopper’s trolley: the choices, the brands, the preferences, the priorities and the trade-offs in managing a grocery budget.” (p129).

The shopping basket could tell a lot about two dimensions of the loyalty cube. Firstly, it could quantify contribution, simply by looking at the profit margins on the goods each customer chose. Second, by assessing the calories in a shopping basket, it could measure the headroom dimension. Just how much of a customer’s food needs does Tesco provide?

(Do you ever feel like you’re being watched…?;-)

“Products describe People” (p131): one way of categorising shoppers is to cluster them according to the things they buy, and identify relationships between the products that people buy (people who buy this, also tend to buy that). But the same product may have a different value to different people. (Thinking about this in terms of the OU Course Profiles app, I guess it’s like clustering people based on the similar courses they have chosen. And even there, different values apply. For example, I might dip into the OU web services course (T320) out of general interest, you might take it because it’s a key part of your professional development, and required for your next promotion).

Clustering based on every product line (or SKU – stock keeping unit) is too highly dimensional to be interesting, so enter “The Bucket” (p132): “any significant combination of products that appeared from the make up of a customer’s regular shopping baskets. Each Bucket was defined initially by a ‘marker’, a high volume product that had a particular attribute. It might typify indulgence, or thrift, or indicate the tendency to buy in bulk. … [B]y picking clusters of products that might be bought for a shared reason, or from a shared taste” the large number of Buckets required for the marker approach could be reduced to just 80 Buckets using the clustered products approach. “Every time a key item [an item in one of the clusters that identifies a Bucket] was scanned [at the till], it would link that Clubcard member with an appropriate Bucket. The combination of which shoppers bought from which Buckets, and how many items in those Buckets they bought, gave the first insight into their shopping preferences” (p133).
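In code terms, the key item/Bucket mechanism might look something like this crude sketch (the products and Bucket names are invented; the book only gives the general idea):

    // sketch: link scanned key items to Buckets and tally a shopper's profile
    // (the key item to Bucket lookup below is entirely made up)
    var keyItemBuckets = {
      'own label value beans': 'Thrift',
      'single estate coffee':  'Indulgence',
      '24 pack cola':          'Bulk buy'
    };

    function bucketProfile(scannedItems) {
      var profile = {};
      for (var i = 0; i < scannedItems.length; i++) {
        var bucket = keyItemBuckets[scannedItems[i]];
        if (bucket) {
          profile[bucket] = (profile[bucket] || 0) + 1;
        }
      }
      return profile; // e.g. { 'Thrift': 3, 'Bulk buy': 1 }
    }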

By applying cluster analysis to the Buckets (i.e. trying to see which Buckets go together) the next step was to identify user lifestyles (p134-5). 27 of them… Things like “Loyal Low Spenders”, “Can’t Stay Aways”, “Weekly Shoppers”, “Snacking and Lunch Box” and “High Spending Superstore Families”.

Identifying people from the products they buy and clustering on that basis is one way of working. But how about defining products in terms of attributes, and then profiling people based on those attributes?

Take each product, and attach to it a series of appropriate attributes, describing what that product implicitly represented to Tesco customers. Then by scoring those attributes for each customer based on their shopping behaviour, and building those scores into an aggregate measurement per individual, a series of clusters should appear that would create entirely new segments. (p139)

(As a sort of example of this, brand tags has a service that lets you see what sorts of things people associate with corporate brands. I imagine a similar sort of thing applies to Kellogg’s cornflakes and Wispa chocolate bars ;-)

In the end, 20 attributes were chosen for each product (p142). Clustering people based on the attributes of the products they buy produces segments defined by their Shopping Habits. For these segments to be at their most useful, each customer should slot neatly into a single segment, and each segment needs to be large enough to be viable to act on, as well as being distinctive and meaningful. Single person segments are too small to be exploited cost effectively (pp148-9).
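Again, as a crude sketch of the mechanics (the attribute names and values are invented; the book just says each product ended up with around 20 scored attributes):

    // sketch: roll product attribute scores up into a per-customer profile
    // (attribute names and values below are invented for illustration)
    var productAttributes = {
      'chilled ready meal': { convenience: 0.9, indulgence: 0.4, thrift: 0.1 },
      'value spaghetti':    { convenience: 0.2, indulgence: 0.0, thrift: 0.9 }
    };

    function customerAttributeProfile(basket) {
      var totals = {}, counted = 0;
      for (var i = 0; i < basket.length; i++) {
        var attrs = productAttributes[basket[i]];
        if (!attrs) { continue; }
        counted++;
        for (var name in attrs) {
          totals[name] = (totals[name] || 0) + attrs[name];
        }
      }
      for (var name2 in totals) { totals[name2] = totals[name2] / counted; }
      return totals; // per-customer scores like these are what get clustered into segments
    }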

Here are a few more insights that I vaguely seem to remember from the book, that you may or may not think are creepy and/or want to drop into conversation down the pub:-)

  • calorie count – on the food side, calorie sellers are the competition. We all need so many calories a day to live. If you do a calorie count on the goods in someone’s shopping basket, and you have an idea of the size of the household, you can find out whether someone is shopping elsewhere (you’re not buying enough calories to keep everyone fed) and maybe guess when a competitor has stolen some of your business or when someone has left home. (If lots of shoppers from a store stop buying pizza, maybe a new pizza delivery service has started up. If a particular family’s basket takes a 15% drop in calories, maybe someone has left home.)
  • life stage analysis – if you know the age, you can have a crack at the life stage. Pensioners probably don’t want to buy kids’ breakfast cereal, or nappies. This is about as crude as useful segmentation gets – but it’s easy to do…
  • Beer and nappies go together – young bloke has a baby, has to go shopping for the first time in his life, gets the nappies, sees the beers, knows he won’t be going anywhere for the next few months, and gets the tinnies in… (I think that was from this book!;-)

Anyway, time to go and read the Tesco Clubcard Charter I think?;-)

PS here’s an interesting, related, personal tale from a couple of years ago: Tesco stocks up on inside knowledge of shoppers’ lives (Guardian Business blog, Sept. 2005) [thanks, Tim W.]

PPS Here are a few more news stories about the Tesco Clubcard: Tesco’s success puts Clubcard firm on the map (The Sunday Times, Dec. 2004), Eyes in the till (FT, Nov 2006), and How Tesco is changing Britain (Economist, Aug. 2005) and Getting an edge (Irish Times, Oct 2007), which both require a login, so f**k off…

PPPS see also More remarks on the Tesco data play, although having received a takedown notice at the time from Dunnhumby, the post is less informative than it was when originally posted…

2.0, 1.0, and a Huge Difference in Style

A couple of weeks ago I received an internal email announcing a “book project”, described on the project blog as follows:

During the summer I [Darrell Ince, an OU academic in the Computing department] read a lot about Web 2.0 and became convinced that there might be some mileage in asking our students to help develop materials for teaching. I set up two projects: the first is the mass book writing project that this blog covers …

The book writing project involves OU students, and anyone else who wants to volunteer, writing a book about the Java-based computer-art system known as Processing.

A student who wants to contribute 2500 words to the project will carry out the following tasks:

* Email an offer to write to the OU.
* We will send them a voucher that will buy them a copy of a recently published book by Greenberg.
* They will then read the first 3 chapters of the book.
* We will give them access to a blog which contains a specification of 85 chunks of text about 2500 words in length.
* The student will then write it and also develop two sample computer programs
* The student will then send the final text and the two programs to the OU.

We will edit the text and produce a sample book from a self-publisher and then attempt to interest a mainstream publisher to take the book.

[Darrel Ince Mass Writing Blog: Introduction]

A second project blog – Book Fragments – contains a list of links to blogs of people who are participating in the project, and other project related information, such as a “sample chapter”, and a breakdown of the work assigned to each “chunk” of the book (see all but the first post in the September archive; some education required there in the use of blog post tags, I think?! ;-)

This is quite an ambitious – and exciting – project, but it feels to me far too much like the “trad OU” authoring model, not least in that the focus is on producing a print item (a book) about an exciting interactive medium (Processing). It also seems to be using the tools from a position of inexperience about what the tools can do, or what other tools are on offer. For example, I wonder what sorts of decisions were made regarding the recommended authoring environment (Blogspot blogs).

Now just jumping in and doing it with these tools is a Good Thing, but a little bit of knowledge could maybe help extract more value from the tools? And a couple of days with a developer and a designer could probably pull quite a powerful authoring and publishing environment together that would work really well for developing in-browser, no plugin or download required, visually appealing interactive Processing related materials.

So for what it’s worth, here are some of the things I’d have pondered at some length if I was allowed to run this sort of project (which I’m not…;-)

Authoring Environment:

  • as the target output is a book, I’d have considered authoring in Google docs. (Did I mention I got a hack in Google Apps Hacks, which was authored in Google docs? ;-) Google docs supports single or multiple author access, public, shared or private documents (with a variety of RW access privileges) and the ability to look back over historical changes. Even if authors were encouraged to write separate drafts of their chapters, this could have been done in separate Google docs documents, linked to from a blog post.
  • would authoring chunks in a wiki have been appropriate? We can get a Mediawiki set up on the open.ac.uk domain on request, either in public or behind the firewall. Come to that, we can also get WordPress blogs set up too, either individual ones or a team blog – would a team blog have been a better approach, with sensible use of categories to partition the content? Niall would probably say the project should have used a Moodle VLE blog or wiki, but I’d respond that that probably wouldn’t be a very good idea ;-)

Given that authors have been encouraged to use blogs, I’d have straight away pulled a blogroll together, maybe created a Planet aggregator site like Planet OU (here’s a temporary pipe aggregation solution), and probably indexed all the blogs in a custom search engine? And I’d have tried to interest the authors in using tags and categories.

Looking over the project blog to date, it seems there has been an issue with how to lay out code fragments in HTML (given that in vanilla HTML, white space is reduced to a single space when the HTML is rendered).

Now for anyone who lives in the web, the answer is to use a progressive enhancement library that will mark up the code in a language sensitive way. I quite like the Syntax Highlighter library, although on a quick trial with a default Blogspot template, it didn’t work:-) (That said, a couple of hours work from a competent web designer should result in a satisfactory, if simple, stylesheet that could use this library, and could then be made available to project participants).

A clunkier approach is to use something like Quick Highlighter, one of several such services that lets you paste in a block of programme code and get marked up HTML out. (The trick here is to paste the CSS for e.g. the Java mark-up into the blog stylesheet template, and then all you have to do is paste marked up programme code into a blog post.)

A second issue I have is that I imagine that writing – and testing – the Processing code requires a download, and Java, and probably a Java authoring environment; and that’s getting too far away from the point, which is learning how to do stuff in Processing (or maybe that isn’t the point? Maybe the point is to teach trad programming and programming tools using Processing to provide some sort of context?)

So my solution? Use John Resig’s processing.js library – a port of Processing to Javascript – and then use something like the browser based Obsessing editor – write Processing code in the browser, then run it using processing.js:

A tweak to the Blogspot template should mean that Processing code can be included in a post and executed using processing.js? Or if not, we could probably get it to work in an OU hosted WordPress environment?
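By way of a sketch of what I mean (this assumes processing.js is already loaded in the template, that the post contains a <canvas id="sketch"> element, and that the global Processing(canvas, code) call is the one Resig’s original release exposes, as I understand it):

    // sketch: run a snippet of Processing code in the browser via processing.js
    // (assumes processing.js is loaded and the page contains <canvas id="sketch">)
    var processingCode = [
      'void setup() { size(200, 200); }',
      'void draw() {',
      '  background(255);',
      '  ellipse(mouseX, mouseY, 20, 20);',
      '}'
    ].join('\n');

    Processing(document.getElementById('sketch'), processingCode);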

Finally, the “worthy academic” pre-planned structure of the intended book just doesn’t work for me. I’d phrase the project in a far more playful way, and try to accrete comments and questions around mini-projects working out how to get various things working in Processing, probably in a blogged uncourse like way.

Sort of related to this, I’ve been thinking of writing something not too dissimilar from I’m Leaving, along the lines of “I’m not into dumbing down, but I’m quitting the ivory tower”, because the arrogance of academia is increasingly doing my head in. (If you’re a serious academic, you’re not allowed to use “slang” like that. You have to say you are “seriously concerned by the blah blah blah blah blah blah blah”… It does my head in ;-)

PS not liking the proposed book structure is not to say I’m not into teaching proper computer science – I’d love to see us teaching compiler theory, or web services using real webservices, like some of the telecoms companies’ APIs;-) But there’s horses for courses, and this Processing stuff should be fun and accessible, right? (And that doesn’t mean it has to be substanceless…)

PPS how interesting would it have been to collaboratively write an interactive book, along the lines of this interactive presentation: Learning Advanced Javascript – double click in any of the code displaying slides, and you can edit – and run – the Javascript code in the browser/within the presentation (described here: Adv. JavaScript and Processing.js, which includes a link to a downloadable version of the interactive presentation).

Chasing Data – Are You Datablogging Yet?

It’s strange to think that the web search industry is only 15 years or so old, and in that time the race has been run on indexing and serving up results for web pages, images, videos, blogs, and so on. The current race is focused on chasing the mobile (local) searcher, making use of location awareness to serve up ads that are sensitive to spatial context, but maybe it’s data that is next?

(Maybe I need to write a “fear post” about how we’re walking into a world with browsers that know where we are, rather than “just” GPS enabled devices and mobile phone cell triangulation? ;-) [And, err, it seems Microsoft are getting in there too: Windows 7 knows where you are – “So just what is it that Microsoft is doing in Windows 7? Well, at a low level, Microsoft has a new application programming interface (API) for sensors and a second API for location. It uses any of a number of things to actually get the location, depending on what’s available. Obviously there’s GPS, but it also supports Wi-Fi and cellular triangulation. At a minimum.”]

So… data. Take for example this service on the Microsoft Research site: Data Depot. To me, this looks like a site that will store and visualise your telemetry data, or more informally collected data (you can tweet in data points, for example):

Want to ‘datablog’ your running miles or your commute times or your grocery spending? DataDepot provides a simple way to track any type of data over time. You can add data via the web or your phone, then annotate, view, analyze, and add related content to your data.

Services like Trendrr have also got the machinery in place to take daily “samples” and produce trend lines over time from automatically collected data. For example, here are some of the data sources they can already access:

  • Weather details – High and the low temperatures on weather.com for a specific zipcode.
  • Amazon Sales Rank – Sales rank on amazon.com
  • Monster Job Listings – Number of job results from Monster.com for the given query in a specific city.

Now call me paranoid, but I suddenly twigged why I thought the Google announcement about an extension to the Google Visualisation API that will enabl[e] developers to display data from any data source connected to the web (any database, Excel spreadsheet, etc.), not just from Google Spreadsheets could have some consequences.

At the moment, the API will let you pull datatable formatted data from your database into the Google namespace. But suppose the next step is for the API to make a call on your database using a query you have handcrafted; then add in some fear that Google has already sussed out how to Crawl through HTML forms by parsing a form and then automatically generating and posting queries using those forms to find more links from deep within a website, and you can see how giving the Google API a single query on your database would tell them some “useful info” (?!;-) about your database schema – info they could use to scrape and index a little more data out of your database…
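To make that a bit more concrete, here’s roughly what pulling a datatable from your own data source looks like on the client side (a sketch only: the data source URL is a placeholder, and note how the tq query string itself spells out column names – exactly the sort of “useful info” I mean):

    // sketch: query your own gviz data source and render the result
    // (assumes the Google AJAX loader, http://www.google.com/jsapi, is on the page;
    //  the data source URL and query below are placeholders)
    google.load('visualization', '1', { packages: ['table'] });
    google.setOnLoadCallback(function () {
      var query = new google.visualization.Query(
        'http://example.com/mydatasource?tq='
        + encodeURIComponent('select name, total order by total desc limit 10')
      );
      query.send(function (response) {
        if (response.isError()) { return; }
        new google.visualization.Table(document.getElementById('viz'))
          .draw(response.getDataTable());
      });
    });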

Now of course the Viz API service may never extend that far, and I’m sure Google’s T&C’s would guarantee “good Internet citizenry practices”, but the potential for evil will be there…

And finally, it’s probably also worth mentioning that even if we don’t give the Goog the keys to our databases, plenty of us are in the habit of feeding public data stores anyway. For example, there are several sites built specifically around visualising user submitted data (if you make it public…): Many Eyes and Swivel, for example. And then of course, there’s also Google Spreadsheets, DabbleDB, Zoho Sheet etc etc.

The race for data is on… what are the consequences?!;-)

PS see also Track’n’graph, iCharts and widgenie. Or how about Daytum and mycrocosm.

Also related: “Self-surveillance”.

Confused About the Consequences

In the previous couple of posts, I’ve rambled about web apps that will find a book from its cover, a song just by hearing it played, and your online contacts across a myriad of services from your username on a single one of them.

But today I saw something that brought home to me the consequences of aggregating millions of tiny individual actions, in this case photo uploads to the flickr social photo site.

From my reading of the post, the purple overlays in the images above – not the blue bounding boxes – are generated automatically by clustering geotagged and placename tagged images and extrapolating a well contoured shape around them.

That is, from the photos tagged “London” [that is, photos that are tagged with London in Yahoo’s WOE service], the algorithm creates the purple “London city” overlay in the above diagram.

For each and every photo upload, there is maybe a tiny personal consequence. For millions of photo uploads, there are consequences like this… (From millions of personal votes cast, there’s the possible consequence of change…) [Update: apparently, flickr received its 3 billionth upload at the start of November…]

And it struck me that even the relatively unsophisticated form of signals intelligence that is traffic analysis was capable of changing the face of war. So what are the consequences of traffic analysis at this scale?

What are the possible consequences? What are we walking into?

(Of course, following a brief moment of “I want to stop contributing to this; I’m gonna kill my computer and go and grow onions somewhere”, I then started wondering: “hmm, maybe if we also mine the info about what camera took each photo, and looked up the price of that camera, we might be able to generate socio-economic overlays over different neighbourhoods, and then… arrghh… stop, no, evil, evil…;-)

So to add to the mix, here’s a couple more things that the web made easy this week. Firstly, the Google Visualisation API was extended so that it could consume data in a simple format from your own data sources. That is, if you allow your own database to output data in a simple tabular structure, the Google visualisation API makes it trivial to generate charts and graphs from that data. Secondly, Google added RSS feed support to their Google Alerts service. This makes it easy to subscribe to an RSS feed that will alert you to new results on Google for a particular search. What really surprised me was how, after setting up a couple of alerts, they appeared in my Google Reader account without me doing anything (or maybe that should be – without me changing something to say “no”?).

Small components is one thing.

Small components loosely coupled is another – and one where many of us see value.

Small components automatically wired together is yet another thing – and one that is increasingly going to happen. A consequence I hadn’t anticipated of setting up a Google RSS alert was that the feed appeared automatically in my feed reader.

Yesterday, an unanticipated consequence of me adding my blog URL to my Google Profile page was that several other URLs I control were automatically suggested to me as things I might want to add to my profile.

Whenever I go into Facebook, the platform suggests a list of people I might know, whom I might want to “friend”.

Now this recommendation may be because we share a large number of friends, or it might be that I’ve appeared in the same photograph as some of these people… How would Facebook know? Maybe Microsoft, their search provider, told them: Why “People” Tags? describes how the beta version of Microsoft Live Photo Gallery automatically identifies faces in photos and then prompts you to tag them with people’s names… Google already does this, of course, in Picasa, with its “name tags“.

And finally…a chance clickthru from someone on the Copac developments blog, which lists OUseful.info in the blogroll, alerted me through my blog stats to this post on Spooky Personalisation (should we be afraid?) which discusses the extent to which “adaptive personalisation” may appear “spooky” to the user.

(A serendipitous link discovery for me? Surely… Spooky? Maybe!;-)

And that maybe is going to be an ever more apparent unanticipated consequence of the way in which it’s getting so much easier to glue apps together? Spookiness…

PS see also Does Google Know Too Much? (h/t Ray@B2FXXX)

The Future of Search Is Already Here

One of my favourite quotes (and one I probably misquote – which is a pre-requisite of the best quotes) is William Gibson’s “the future is already here, it’s just not evenly distributed yet”…

Several times tonight, I realised that the future is increasingly happening around me, and it’s appearing so quickly I’m having problems even imagining what might come next.

So here for your delectation are some of the things I saw earlier this evening:

  • SnapTell: a mobile and iPhone app that lets you photograph a book, CD or game cover and it’ll recognise it, tell you what it is and take you to the appropriate Amazon page so you can buy it… (originally via CogDogBlog);

  • Shazam, a music recognition application that will identify a piece of music that’s playing out loud, pop up some details, and then let you buy it on iTunes or view a version of the song being played on Youtube (the CogDog also mentioned this, but it was arrived at tonight independently);

    So just imagine the “workflow” here: you hear a song playing, fire up the Shazam app, it recognises the song, then you can watch someone play a version of the song (maybe even the same version) on Youtube.

  • A picture of a thousand words?: if you upload a scanned document onto the web as a PDF document, Google will now have a go at running an OCR service over the document, extracting the text, indexing it and making it searchable. Which means you can just scan and post, flag the content to the Googlebot via a sitemap, and then search into the OCR’d content; (I’m not sure if the OCR service is built on top of the Tesseract OCR code?)
  • barely three months ago, Youtube added the ability to augment videos with captions. With a little bit of glue, the Google translate service will take those captions and translate them into another language for you (Auto Translate Now Available For Videos With Captions):

    “To get a translation for your preferred language, move the mouse over the bottom-right arrow, and then over the small triangle next to the CC (or subtitle) icon, to see the captions menu. Click on the “Translate…” button and then you will be given a choice of many different languages.” [Youtube blog]

Another (mis)quote, this time from Arthur C. Clarke: “any sufficiently advanced technology is indistinguishable from magic”. And by magic, I guess one thing we mean is that there is no “obvious” causal relationship between the casting of a spell and the effect? And a second thing is that if we believe something to be possible, then it probably is possible.

I think I’m starting to believe in magic…

PS Google finally got round to making their alerts service feed a feed: Feed me! Google Alerts not just for email anymore, so now you can subscribe to an alerts RSS feed, rather than having to receive alerts via email. If you want to receive the updates via Twitter, just paste the feed URL into a service like Twitterfeed or f33d.in.

PPS I guess I should have listed this in the list above – news that Google has (at least in the US) found a way of opening up its book search data: Google pays small change to open every book in the world. Here’s the blog announcement: New chapter for Google Book Search: “With this agreement, in-copyright, out-of-print books will now be available for readers in the U.S. to search, preview and buy online — something that was simply unavailable to date. Most of these books are difficult, if not impossible, to find.”

Time to Get Scared, People?

Last week, I posted a couple of tweets (via http://twitter.com/psychemedia) that were essentially doodles around the edge of what services like Google can work out about you from your online activity.

As ever in these matters, I picked on AJCann in the tweets, partly because he evangelises social web tool use to his students;-)

So what did I look at?

  • the Google Social Graph API – a service that tries to mine your social connections from public ‘friendships’ on the web. Check out the demo services… (there’s a rough sketch of a raw lookup call at the end of this list)

    For example, here’s what the Google social API can find from Alan’s Friendfeed account using the “My Connections” demo:

    • people he links to on twitter and flickr;
    • people who link to him as a contact on twitter, delicious, friendfeed and flickr;
    • a link picked up from Science of the Invisible (which happens to be one of Alan’s blogs), also picks out his identi.ca identity; adding that URL to the Social Graph API form pulls out more contacts – via foaf records – from Alan’s identi.ca profile;

    The “Site Connections” demo pulls out all sorts of info about an individual by looking at URLs prominently associated with them, such as a personal blog:

    The possible connections reveal Alan’s possible identity on Technorati, Twitter, identi.ca, friendfeed, swurl, seesmic and mybloglog.

  • For anyone who doesn’t know what Alan looks like, you can always do a “face” search on Google images;
  • increasingly, there are “people” search engines out there that are built solely for searching for people. One example is Spock (other examples include pipl, zoominfo and wink [and 123people, which offers an interesting federated search results page]). The Spock “deep web” search turns up links that potentially point to Alan’s friendfeed and twitter pages, his revver videos, slideshare account and so on;
  • Alan seems to be pretty consistent in the username he uses on different sites. This makes it easy to guess his account on different sites, of course – or use a service like User Name Check to do a quick search;
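And if you want to poke at the raw data behind those demos, the Social Graph API lookup call is something like the following sketch (the username is a placeholder, and the parameter names are from my reading of the API docs, so treat them as assumptions):

    // sketch: raw Google Social Graph API lookup for a given URL
    // (SOME_USERNAME is a placeholder; edo/edi/fme parameters as I read the docs)
    var sgURL = 'http://socialgraph.apis.google.com/lookup'
              + '?q=' + encodeURIComponent('http://friendfeed.com/SOME_USERNAME')
              + '&edo=1&edi=1&fme=1&callback=?';

    $.getJSON(sgURL, function (data) {
      // data.nodes lists pages associated with the URL; each node carries
      // nodes_referenced (outbound links) and nodes_referenced_by (inbound links)
      // harvested from public XFN/FOAF markup
      for (var node in data.nodes) {
        console.log(node, data.nodes[node].nodes_referenced_by);
      }
    });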

Now I wasn’t going to post anything about this, but today I saw the following on Google Blogoscoped: Search Google Profiles, which describes a new Google search feature. (Didn’t know you had a Google Profile? If you have a Google account, you probably do – http://www.google.com/s2/profiles/me/? And if you want to really scare yourself with what your Google account can do to you, check http://www.google.com/history/… go on, I dare you…)

I had a quick look to see if I could find a link for the new profile search on my profile page, but didn’t spot one, although it’s easy enough to find the search form here: http://www.google.com/s2/profiles. (Maybe I don’t get a link because my profile isn’t public?)

Anyway, while looking over my profile, I thought I’d add my blog URL (http://ouseful.info) to it – and as soon as I clicked enter, got this:

A set of links that I might want to add to my profile – taken in part from the Social Graph API, maybe? Over the next 6 months I could see Google providing a de facto social network aggregation site, just from re-posting to you what they know about your social connections from mining the data they’ve crawled, and linking some of it together…

And given that the Goog can learn a lot about you by virtue of crawling public pages that are already out there, how much more comprehensive will your profile on Google be (and how certain will it be in the profile it can automatically generate around you?) if you actually feed it yourself? (Bear in mind things like health care records exist already…)

PS I just had a look at my own Web History page on Google, and it seems like they’ve recently added some new features, such as “popular searches related to my searches”, and also something on search trends that I don’t fully (or even partially) understand? Or maybe they were already there and I’ve not noticed before/forgotten (I rarely look at my search history…)

PPS does the web know when your birthday is??? Beware of “Happy Birthday me…”. See also My Web Birthday.

[Have you heard about Google’s ‘social circle’ technology yet? read more]

Amazon Reviews from Different Editions of the Same Book

A couple of days ago I posted a Yahoo pipe that showed how to Look Up Alternative Copies of a Book on Amazon, via ThingISBN. The main inspiration for that hack was that it could be useful to get “as new” prices for different editions of the same book if you’re not so bothered about which edition you get, but you are bothered by the price. (Or maybe you wanted an edition of a book with a different cover…)

It struck me last night that it might also be useful to aggregate the reviews from different editions of the same book, so here’s a hack that will do exactly that: produce a feed listing the reviews for the different editions of a particular book, and label each review with the book it came from via its cover:

The pipe starts exactly as before – get an ISBN, check that the ISBN is valid, then look up the ISBNs of the alternative editions of the book. The next step is to grab the Amazon comments for each book, before annotating each item (that is, each comment) with a link to the book cover that the review applies to; we also grab the ISBN (the ASIN) for each book and make a placeholder using it for the item link and image link:

Then we just create the appropriate URLs back to the Amazon site for that particular book edition:

The patterns are as follows:
– book description page: http://www.amazon.co.uk/exec/obidos/ASIN/ISBN
– book cover image: http://images.amazon.com/images/P/ISBN.01.TZZZZZZZ

Here’s how the nested pipe that grabs the comments works (Amazon book reviews lookup by ISBN pipe): first construct the URL to call the webservice that gets details for a book with a particular ISBN – the large report format includes the reviews:

Grab the results XML and point to the reviews (which are at Items.Item.CustomerReviews.Review):

Construct a valid RSS feed containing one comment per item:

And there you have it – a pipe that looks up the different editions of a particular book using ThingISBN, and then aggregates the Amazon reviews for all those editions.
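As an aside, outside of Pipes the thingISBN lookup at the heart of all this boils down to something like the following (a sketch only: the ISBN is just an example, and a cross-domain XML fetch like this would need a proxy or a server-side call in practice):

    // sketch: look up alternative edition ISBNs via LibraryThing's thingISBN API
    // (example ISBN; thingISBN returns a simple <idlist><isbn>...</isbn>... document)
    var isbn = '0141014598'; // placeholder ISBN for illustration
    $.get('http://www.librarything.com/api/thingISBN/' + isbn, function (xml) {
      var editions = [];
      $(xml).find('isbn').each(function () {
        editions.push($(this).text());
      });
      // each of these ISBNs can then be passed to the Amazon reviews lookup
      console.log(editions);
    });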

Time for a TinyNS?

In a comment to Printing Out Online Course Materials With Embedded Movie Links Alan Levine suggests: “I’d say you are covered for people lacking a QR reader device since you have the video URL in print; about all you could is run through some process that generates a shorter link” [the emphasis is mine].

I suspect that URL shortening services have become increasingly popular because of the rise of the blog killing (wtf?!) microblogging services, but they’ve also been used for quite some time in magazines and newspapers. And making use of them in (printed out) course materials might also be a handy thing to do. (Assessing the risks involved in using such services is the sort of thing Brian Kelly may well have posted about somewhere; but see also towards the end of this post.)

Now anyone who knows me knows that my mobile phone is a hundred years old and won’t go anywhere near the interweb (though I can send short emails through a free SMS2email gateway I found several years ago!). So I don’t know if the browsers in smart phones can do this already… but it seems to me a really useful feature for a mobile browser would be something like the Mozilla/Firefox smart keywords.

Smart keywords are essentially bookmarks that are invoked by typing a keyword in the browser address bar and hitting return – the browser will then take you to the desired URL. Think of it like a URL “keyboard shortcut”…

One really nice feature of smart keywords is that they can handle an argument… For example, here’s a smart keyword I have defined in my browser (Flock, which is built from the Firefox codebase).
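The bookmark behind it is nothing more than this (the %s is where whatever you type after the keyword gets substituted):

    Name:     TinyURL redirect
    Location: http://tinyurl.com/%s
    Keyword:  t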

Given a TinyURL (such as http://tinyurl.com/6nf2z) all I need to type into my browser address bar is t 6nf2z to go there.

Which would seem like a sensible thing to be able to do in a browser on a mobile device… (maybe you already can? But how many people know how to do it, if so?)

(NB To create a TinyURL for the page you’re currently viewing at the click of a button, it’s easiest to use something like the TinyURL bookmarklet.)

Now one of the problems with URL shortening services is that you become reliant on the short URL provider to decode the shortened URL and redirect you to the intended “full length” URL. The relationship between the actual URL and the shortened URL is arbitrary, which is where the problem lies – the shortened URL is not a “lossless compressed” version of the original URL, it’s effectively the assignment of a random code that can be used to look up the full URL in a database owned by the short URL service provider. Cf. the scheme used by services like delicious, which generate an “MD5 hash” of a URL which does decode (usually!) to the original URL (see Pivotal Moments… (pivotwitter?!) for links to Yahoo pipes that decode both TinyURLs and delicious URL encodings).

So this got me thinking – what would a “TinyNS” resolution service look like that sat one level above DNS resolution – the domain name resolution service that takes you from a human readable domain name (e.g. http://www.open.ac.uk) to an IP (internet protocol) address (something like 194.66.152.28).

Could (should) we set up trusted parties to mirror the mapping of shortened URL codes from the different URL shortening services (TinyURL, bit.ly, is.gd and so on) and provide distributed resolution of these short form URLs, just in case the original services go down?