Can SocialLearn Be Built As Such? Plus an OU Jobs RoundUp

A tweet from Scott Leslie on Saturday alerted me to the fact he had a major post brewing…

And here it is: Planning to Share versus Just Sharing.

Do yourself a favour and go and read it now… Then come back and finish reading this post… or not… but read that one…

Here’s the link again: Planning to Share versus Just Sharing.

‘Nuff said? Here’s one thing it made me think of: Planning to Build versus Just Building.

Speaking of which, I wonder if we have any more SocialLearn planning meetings this week? ;-)

On another tack, it looks like the OU’s recruiting to some interesting posts again:

  • Director of Research and Enterprise, Research School, Strategy Unit: “The Open University plans to increase the range and volume of research of international quality and to expand its knowledge transfer activity at national and regional levels. We need an experienced, proactive and forward looking Director of Research and Enterprise who can help us achieve these ambitions.” I’d personally argue blogs like OUseful.info are in the KT business – if you get the post, feel free to buy me a coffee and vehemently disagree;-)
  • Online Marketing Manager, Marketing and Sales: “In this role, you will contribute to the new media strategy, setting strategies to achieve the online objectives to achieve student targets. You will manage the implementation and evaluation of PPC, affiliate programmes and third party partnerships and manage the development of existing and future marketing websites.” = you will spend lots of money with Google. Just beware Simpson’s Paradox
  • Development Advisor – Collaborative Tools, Learning & Teaching Solutions (LTS): “Collaborative tools are a key part of the online learning experience of Open University students. You will play a key role in both promoting the effective use of collaborative tools in new OU courses and the introduction of new collaborative tools across existing courses.” IMHO, don’t even think about mentioning Second Life, unless it’s to advocate the use of flamethrowers ;-)
  • Programmer/ Web developer, The Library and Learning Resource Centre: “Would you like to contribute in a key role in the development of the Open University’s Library systems, services and products to support all its business processes for both customers and Library staff? You will be providing technical input to projects and service developments, in particular maintaining and developing new services for the Library website.” Far be it from me to say that any Library website redesign should be informed by at least a passing familiarity with what the Library website analytics have to say about how the site is used… And if you persuade them to dump Voyager, I’ll buy you a pint of whatever you want…
  • Broadcast Project Manager, Open Broadcasting Unit (OBU): “we need an additional Broadcast Project Manager to work with OU colleagues, the BBC and others to develop and manage detailed project plans for TV, radio and broadband commissions and associated support elements (e.g. print items). You’ll have your own group of projects and opportunities to contribute to process developments.” Tell ’em you watch OU programmes via the “OU Catchup Channel” on MythTV – the panel won’t have a clue what you’re talking about, so you could maybe follow up by suggesting a quick project that would produce a Wii front end for the OU CatchUp Channel;-) (Hint: steal the BBC iPlayer Wii interface and ask Guy to make ice from it ;-)
  • e-Learning Developer, Learning and Teaching Solutions: “We are looking for an experienced e-learning developer with a web/software background. Working as part of a project team and in close collaboration with academics and other media specialists, you will play a key role in developing effective OU distance learning materials for delivery online or via disc.”
  • Research Fellow – SocialLearn, Knowledge Media Institute (KMi): “your responsibility will be to use your understanding of learning and sensemaking online to improve the SocialLearn platform.” I have no idea what this post is about… maybe trying to think about ways we can mine the platform for data? I can offer you the 5k user records we have on Course Profiles to get started with, and suggestions about how to scale that app in terms of numbers and the data it can collect, but to date no one else seems to think this is in any way relevant to the data/insight that SocialLearn will collect, so maybe that’s just a red herring…;-)
  • Web Developer – cohere.open.ac.uk, Knowledge Media Institute (KMi): a Cohere hacking post. IMHO, Cohere isn’t yet what it may turn out to be useful as… (My attempts at grokking a simpler, more literal version of it are Linktracks? Trackmarks? Linkmarks? and Doublemarks!)
  • Publicity and Evaluation Officer, Personalised Integrated Learning Support (PILS), Centre for Excellence in Teaching and Learning: “in this new role we are looking for an experienced secretary to support one of our PILS managers and our Publicity and Evaluation Manager. You will be required to use your IT, written communication and numeric skills to support the production of publicity and evaluation materials, and to update our websites.” Personally, I’d look to appoint an evangelist to the Open CETL, but I suppose we still have to service the old-fashioned markets (that aren’t so amenable to social network leverage) somehow?;-)

As ever, I have nothing to do with any of the above…

Orange Broadband ISP Hijacks Error Pages

An article in the FT last week (referenced here: British ISP Orange Shuns Phorm) described how ISP Orange have decided not to go with the behavioural advertising service Phorm, which profiles users’ internet activity in order to serve them with relevant ads.

But one thing I have noticed them doing over the last week is hijacking certain “domain not found” pages:

Orange broadband intercepting (some) page not found pages...

…which means that Orange must be intercepting my DNS lookups, or at least watching my web traffic for certain error responses?

Now I wonder if anyone from Orange Customer services or the Orange Press Office would like to comment on whether this is reasonable or not, and/or how they are doing it?
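
For what it’s worth, the mechanism is most likely at the DNS level rather than anything to do with the pages themselves, and it’s easy enough to check: a genuinely nonexistent domain should come back as a lookup failure, not as an IP address pointing at an ISP “search” page. Here’s a minimal sketch using Node’s dns module – the hostname is just a made-up one:

    // Minimal sketch: check whether "domain not found" lookups are being
    // answered with a real IP address. The test hostname is made up - it
    // just needs to be one that shouldn't exist.
    var dns = require('dns');
    var bogus = 'this-domain-should-not-exist-12345.example.com';

    dns.resolve4(bogus, function (err, addresses) {
      if (err) {
        console.log('Lookup failed as expected (' + err.code + ') - no hijacking visible');
      } else {
        console.log('Got ' + addresses.join(', ') + ' - the resolver may be redirecting error pages');
      }
    });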

Just by the by, I found the Orange Customer Services web page interesting – not just the links to all the premium rate phone lines, more the font size;-) (click through for the full size image):

//www.orange.co.uk/contact/internet/default.htm?&article=contactussplitterwanadoo - check out the font size

I’ve also noticed what appear to be a few geo-targeted ads coming at me through my browser, so wonder if Orange is revealing my approximate location data to online ad targeting services (I’ll try to remember to grab a screenshot next time I see one). The reason I suspect it’s Orange is because I ran a test using a cookie blocking browser…

PS note to self: try to find out how ad services like NebuAd, Tacoda and of course Phorm make use of user data, and see just how far their reach goes…

PPS Hmmm… so just like there is a “junk mail opt out”, “unaddressed mail opt out” and “junk phone call opt out” in the UK, it seems like there is a (cookie based….?!) initiative for opting out of online ad targeting from the Network Advertising Initiative. Does anyone know anything about this? Is it legitimate, or a gateway to yet more unwanted ads? I’d maybe trust it more if it was linked to from mydm, which I trust because it was linked to from the Royal Mail…

Recent OU Programmes on the BBC, via iPlayer

As @liamgh will tell you, Coast is getting quite a few airings at the moment on various BBC channels. And how does @liamgh know this? Because he’s following the open2 openuniversity twitter feed, which sends out alerts when an OU programme is about to be aired on a broadcast BBC channel.

(As well as the feed from the open2 twitter account, you can also find out what’s on from the OU/BBC schedule feed (http://open2.net/feeds/rss_schedule.xml), via the Open2.net schedule page; iCal feeds appear not to be available…)

So to make it easier for him to catch up on any episodes he missed, here’s a quick hack that mines the open2 twitter feed to create a “7 day catch up” site for broadcast OU TV programmes (the page also links through to several video playlists from the OU’s Youtube site).

The page actually displays links to programmes that are currently viewable on BBC iPlayer (either via a desktop web browser, or via a mobile browser – which means you can view this stuff on your iPhone ;-), and a short description of the programme, as pulled from the programme episode‘s web page on the BBC website. You’ll note that the original twitter feed just mentions the programme title; the TinyURLd link goes back to the series web page on the Open2 website.

Thinking about it, I could probably have done the hackery required to get iPlayer URLs from within the page; but I didn’t… Given the clue that the page is put together using a jQuery script I stole from this post on Parsing Yahoo Pipes JSON Feeds with jQuery, you can maybe guess where the glue logic for this site lives?;-)

There are three pipes involved in the hackery – the JSON that is pulled into the page comes from this OU Recent programmes (via BBC iPlayer) pipe.
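
In case it’s useful to anyone wanting to roll their own version, the in-page glue logic really is just a few lines of jQuery: pull the pipe’s JSON output into the page and write the items out as links. The pipe ID and the element ID below are placeholders rather than the ones the actual page uses:

    // Minimal sketch: pull a Yahoo pipe's JSON output into the page with
    // jQuery and list the items as links. PIPE_ID and #programmes are
    // placeholders, not the IDs the real catch-up page uses.
    var pipeUrl = 'http://pipes.yahoo.com/pipes/pipe.run?_id=PIPE_ID'
                + '&_render=json&_callback=?';

    $.getJSON(pipeUrl, function (data) {
      $.each(data.value.items, function (i, item) {
        $('#programmes').append(
          '<li><a href="' + item.link + '">' + item.title + '</a><br/>'
          + (item.description || '') + '</li>');
      });
    });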

The first part grabs the feed, identifies the programme title, and then searches for that programme on the BBC iPlayer site.

The nested BBC Search Results scrape pipe searches the BBC programmes site and filters results that point to an actual iPlayer page (so we can watch the result on iPlayer).

Back in the main pipe, we take the list of recently tweeted OU programmes that are available on iPlayer, grab the programme ID (which is used as a key in all manner of BBC URLs :-), and then call another nested pipe that gets the programme description from the actual programme web page.

This second nested pipe just gets the programme description, creates a title and builds the iPlayer URL.
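
(The URL building itself is trivial once you have the programme/episode ID – something along these lines, although the URL patterns here are my guess at the ones iPlayer uses rather than anything lifted from the pipe:)

    // Sketch of building iPlayer links from a programme/episode ID.
    // The URL patterns are assumptions about the iPlayer site, and the
    // programme ID below is made up.
    function iplayerLinks(pid) {
      return {
        desktop: 'http://www.bbc.co.uk/iplayer/episode/' + pid,
        mobile:  'http://www.bbc.co.uk/mobile/iplayer/episode/' + pid
      };
    }

    var links = iplayerLinks('b00abcde');
    // links.desktop, links.mobile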

(The logic is all a bit hacked – and could be tidied up – but I was playing through my fingertips and didn’t feel like ‘rearchitecting’ the system once I knew what I wanted it to do… which is what it does do…;-)

As an afterthought, the items in the main pipe are annotated with a link to the mobile iPlayer version of each programme:

So there you have it: a “7 day catch up” site for broadcast OU TV programmes, with replay via iPlayer or mobile iPlayer.

[18/11/08 – the site that the app runs on is down at the moment, as a network security update is carried out; sorry about that – maybe I should use a cloud server?]

The Convenience of Embedded, Flash Played, PDFs

Yesterday, my broadband connection went down as BT replaced the telegraph pole that carries the phone wire to our house, which meant I managed to get a fair bit of reading done, both offline and via a tab sweep.

One of my open tabs contained a ReadWriteWeb post, Study: Influencers are Alive and Well on Social Media Sites, which reviewed a study from Rubicon Consulting that provides some sort of evidence for the majority of “user generated content” on the web being produced by a small percentage of the users. The post linked to a PDF of the white paper which I assumed (no web connection) I’d have to remember to look up later.

And then – salvation:

The PDF had been embedded in a PDFMENOT Flash player (cf. Scribd etc.), which itself was embedded in the post. So I could read the paper at my leisure without having to connect back to the network.

The Tesco Data Business (Notes on “Scoring Points”)

One of the foundational principles of the Web 2.0 philosophy that Tim O’Reilly stresses relates to “self-improving” systems that get better as more and more people use them. I try to keep a watchful eye out for business books on this subject – books about companies who know that data is their business; books like the somehow unsatisfying Competing on Analytics, and a new one I’m looking forward to reading: Data Driven: Profiting from Your Most Important Business Asset (if you’d like to buy it for me… OUseful.info wishlist;-).

So as part of my summer holiday reading this year, I took away Scoring Points: How Tesco Continues to Win Customer Loyalty, a book that tells the tale of the Tesco Loyalty Card. (Disclaimer: the Open University has a relationship with Tesco, which means that you can use Tesco Clubcard points in full or part payment of certain OU courses. It also means, of course, that Tesco knows far, far more about certain classes of our students than we do…)

For those of you who don’t know of Tesco, it’s the UK’s dominant supermarket chain, taking a huge percentage of the UK’s daily retail spend, and is now one of those companies that’s so large it can’t help but be evil. (They track their millions of “users” as aggressively as Google tracks theirs.) Whenever you hand over your Tesco Clubcard alongside a purchase, you get “points for pounds” back. Every 3 months (I think?), a personalised mailing comes with vouchers that convert points accumulated over that period into “cash”. (The vouchers are in nice round sums – £1, £2.50 and so on. Unconverted points are carried over to the convertible balance in the next mailing.) The mailing also comes with money off vouchers for things you appear to have stopped purchasing, rewards on product categories you frequently buy from, or vouchers trying to entice you to buy things you might not be in the habit of buying regularly (but which Tesco suspects you might desire!;-)

Anyway, that’s as maybe – this is supposed to be a brief summary of corner-turned pages I marked whilst on holiday. The book reads a bit like a corporate briefing book, repetitive in parts, continually talking up the Tesco business, and so on, but it tells a good story and contains more than a few gems. So here for me were some of the highlights…

First of all, the “Clubcard customer contract”: more data means better segmentation, means more targeted/personalised services, means better profiling. In short, “the more you shop with us, the more benefit you will accrue” (p68).

This is at the heart of it all – just like Google wants to understand its users better so that it can serve them with more relevant ads (better segmentation * higher likelihood of clickthru = more cash from the Google money machine), and Amazon seduces you with personal recommendations of things it thinks you might like to buy based on your purchase and browsing history, and the purchase history of other users like you, so Tesco Clubcard works in much the same way: it feeds a recommendation engine that mines and segments data from millions of people like you, in order to keep you engaged.

Scale matters. In 1995, when Tesco Clubcard launched, dunnhumby, the company that has managed the Clubcard from when it was still an idea to the present day, had to make do with the data processing capabilities that were available then, which meant that it was impossible to track every purchase, in every basket, from every shopper. (In addition, not everything could be tracked by the POS tills of the time – only “the customer ID, the total basket size and time the customer visited, and the amount spent in each department” (p102)). In the early days, this meant data had to be sampled before analysis, with insight from a statistically significant analysis of 10% of the shopping records being applied to the remaining 90%. Today, they can track everything.

Working out what to track – first order “instantaneous” data (what did you buy on a particular trip, what time of day was the visit) or second order data (what did you buy this time that you didn’t buy last time, how long has it been between visits) – was a major concern, as were indicators that could be used as KPIs for the extent to which Clubcard influenced customer loyalty.

Now I’m not sure to what extent you could map website analytics onto “store analytics”, but some of the loyalty measures seem familiar to me. Take, for example, the RFV analysis (pp95-6), sketched in code after the list:

  • Recency – time between visits;
  • Frequency – “how often you shop”;
  • Value – how profitable is the customer to the store (if you only buy low margin goods, you aren’t necessarily very profitable), and how valuable is the store to the customer (do you buy your whole food shop there, or only a part of it?).
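
Just to make that concrete, here’s a toy sketch of scoring a single customer along those three axes from a list of dated, valued visits – the field names, the margin figures and the assumed £80-a-week food budget are all invented for illustration:

    // Toy RFV sketch: score one customer from a list of visits.
    // Field names, the margin estimates and the "weekly food budget" figure
    // are illustrative guesses, not anything taken from the book.
    var visits = [
      { date: new Date('2008-08-01'), spend: 62.40, estimatedMargin: 0.22 },
      { date: new Date('2008-08-09'), spend: 55.10, estimatedMargin: 0.18 },
      { date: new Date('2008-08-15'), spend: 70.80, estimatedMargin: 0.25 }
    ];

    var now = new Date('2008-08-20');
    var msPerDay = 24 * 60 * 60 * 1000;

    // Recency: days since the last visit
    var lastVisit = visits[visits.length - 1].date;
    var recency = Math.round((now - lastVisit) / msPerDay);

    // Frequency: visits per week over the period covered
    var periodDays = (now - visits[0].date) / msPerDay;
    var frequency = visits.length / (periodDays / 7);

    // Value: rough profitability (spend weighted by margin), plus share of an
    // assumed £80-a-week food budget as a crude "share of wallet" figure
    var profit = 0, spend = 0;
    visits.forEach(function (v) { profit += v.spend * v.estimatedMargin; spend += v.spend; });
    var shareOfWallet = spend / (80 * (periodDays / 7));

    console.log({ recency: recency, frequency: frequency.toFixed(2),
                  profit: profit.toFixed(2), shareOfWallet: shareOfWallet.toFixed(2) });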

Working out what data to analyse also had to fit in with the business goals – the analytics needed to be actionable (are you listening, Library folks?!;-). For example, as well as marketing to individuals, Clubcard data was to be used to optimise store inventory (p124). “The dream was to ensure that the entire product range on sale at each store accurately represented, in selection and proportion, what the customers who shopped there wanted to buy.” So another question that needed to be asked was how should data be presented “so that it answered a real business problem? If the data was ‘interesting’, that didn’t cut it. But adding more sales by doing something new – that did.” (p102). Here, the technique of putting data into “bins” meant that it could be aggregated and analysed more efficiently in bulk and without loss of insight.

Returning to the customer focus, Tesco complemented the RFV analysis with the idea of a “Loyalty Cube” within which each customer could be placed (pp126-9).

  • Contribution: that is, contribution to the bottom line, the current profitability of the customer;
  • Commitment: future value – “how likely that customer is to remain a customer”, plus “headroom”, the “potential for the customer to be more valuable in the future”. If you buy all your groceries in Tesco, but not your health and beauty products, there’s headroom there;
  • Championing: brand ambassadors; you may be low contribution, low commitment, but if you refer high value friends and family to Tesco, Tesco will like you:-)

By placing individuals in separate areas of this chart, you can tune your marketing to them, either by marketing items that fall squarely within that area, or if you’re feeling particularly aggressive, by trying to move them through the different areas. As ever, it’s contextual relevancy that’s the key.

But what sort of data is required to locate a customer within the loyalty cube? “The conclusion was that the difference between customers existed in each shopper’s trolley: the choices, the brands, the preferences, the priorities and the trade-offs in managing a grocery budget.” (p129).

The shopping basket could tell a lot about two dimensions of the loyalty cube. Firstly, it could quantify contribution, simply by looking at the profit margins on the goods each customer chose. Secondly, by assessing the calories in a shopping basket, it could measure the headroom dimension. Just how much of a customer’s food needs does Tesco provide?

(Do you ever feel like you’re being watched…?;-)

“Products describe People” (p131): one way of categorising shoppers is to cluster them according to the things they buy, and identify relationships between the products that people buy (people who buy this, also tend to buy that). But the same product may have a different value to different people. (Thinking about this in terms of the OU Course Profiles app, I guess it’s like clustering people based on the similar courses they have chosen. And even there, different values apply. For example, I might dip into the OU web services course (T320) out of general interest, you might take it because it’s a key part of your professional development, and required for your next promotion).

Clustering based on every product line (or SKU – stock keeping unit) is too highly dimensional to be interesting, so enter “The Bucket” (p132): “any significant combination of products that appeared from the make up of a customer’s regular shopping baskets. Each Bucket was defined initially by a ‘marker’, a high volume product that had a particular attribute. It might typify indulgence, or thrift, or indicate the tendency to buy in bulk. … [B]y picking clusters of products that might be bought for a shared reason, or from a shared taste” the large number of Buckets required for the marker approach could be reduced to just 80 Buckets using the clustered products approach. “Every time a key item [an item in one of the clusters that identifies a Bucket] was scanned [at the till], it would link that Clubcard member with an appropriate Bucket. The combination of which shoppers bought from which Buckets, and how many items in those Buckets they bought, gave the first insight into their shopping preferences” (p133).
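
A toy version of the Bucket mechanism might look like the sketch below – the Bucket names and key items are invented, but the mechanism (scan a key item, bump the shopper’s count for the associated Bucket) is the one described:

    // Toy "Buckets" sketch: map key items to Buckets, then count how many
    // times a Clubcard holder buys from each Bucket. The Bucket names and
    // key items are invented for the example.
    var keyItemToBucket = {
      'own-brand baked beans':  'thrift',
      'single-estate coffee':   'indulgence',
      '24-pack toilet roll':    'bulk buying',
      'organic baby food':      'young family'
    };

    function bucketProfile(scannedItems) {
      var profile = {};
      scannedItems.forEach(function (item) {
        var bucket = keyItemToBucket[item];
        if (bucket) {                         // only key items count
          profile[bucket] = (profile[bucket] || 0) + 1;
        }
      });
      return profile;
    }

    // e.g. one shopper's baskets over a few weeks
    console.log(bucketProfile([
      'own-brand baked beans', '24-pack toilet roll', 'own-brand baked beans'
    ]));
    // -> { thrift: 2, 'bulk buying': 1 }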

By applying cluster analysis to the Buckets (i.e. trying to see which Buckets go together) the next step was to identify user lifestyles (p134-5). 27 of them… Things like “Loyal Low Spenders”, “Can’t Stay Aways”, “Weekly Shoppers”, “Snacking and Lunch Box” and “High Spending Superstore Families”.

Identifying people from the products they buy and clustering on that basis is one way of working. But how about defining products in terms of attributes, and then profiling people based on those attributes?

Take each product, and attach to it a series of appropriate attributes, describing what that product implicitly represented to Tesco customers. Then by scoring those attributes for each customer based on their shopping behaviour, and building those scores into an aggregate measurement per individual, a series of clusters should appear that would create entirely new segments. (p139)

(As a sort of example of this, brand tags has a service that lets you see what sorts of things people associate with corporate brands. I imagine a similar sort of thing applies to Kellogg’s cornflakes and Wispa chocolate bars ;-)

In the end, 20 attributes were chosen for each product (p142). Clustering people based on the attributes of the products they buy produces segments defined by their Shopping Habits. For these segments to be at their most useful, each customer should slot neatly into a single segment, and each segment needs to be large enough to be viable to act on, as well as being distinctive and meaningful. Single person segments are too small to be exploited cost effectively (pp148-9).
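
As a toy illustration of that attribute-scoring step (the products, attributes and weights below are all invented – the book only tells us there were eventually 20 attributes per product):

    // Toy attribute-scoring sketch: each product carries scores against a
    // handful of attributes (the real scheme used 20); a customer's profile
    // is the purchase-weighted average of the attributes of what they buy.
    // Products, attributes and scores here are all invented.
    var productAttributes = {
      'ready meal':   { convenience: 0.9, indulgence: 0.4, thrift: 0.1 },
      'value pasta':  { convenience: 0.3, indulgence: 0.0, thrift: 0.9 },
      'champagne':    { convenience: 0.2, indulgence: 1.0, thrift: 0.0 }
    };

    function customerProfile(purchases) {   // purchases: [{product, qty}]
      var totals = {}, n = 0;
      purchases.forEach(function (p) {
        var attrs = productAttributes[p.product] || {};
        for (var a in attrs) {
          totals[a] = (totals[a] || 0) + attrs[a] * p.qty;
        }
        n += p.qty;
      });
      for (var k in totals) { totals[k] = totals[k] / n; }
      return totals;    // cluster customers on these vectors to get segments
    }

    console.log(customerProfile([
      { product: 'ready meal', qty: 4 }, { product: 'champagne', qty: 1 }
    ]));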

Here are a few more insights that I vaguely seem to remember from the book, that you may or may not think are creepy and/or want to drop into conversation down the pub:-)

  • calorie count – on the food side, calorie sellers are the competition. We all need so many calories a day to live. If you do a calorie count on the goods in someone’s shopping basket, and you have an idea of the size of the household, you can find out whether someone is shopping elsewhere (you’re not buying enough calories to keep everyone fed) and maybe guess when a competitor has stolen some of your business or when someone has left home. (If lots of shoppers from a store stop buying pizza, maybe a new pizza delivery service has started up. If a particular family’s basket takes a 15% drop in calories, maybe someone has left home?) There’s a toy version of this check sketched after the list.
  • life stage analysis – if you know the age, you can have a crack at the life stage. Pensioners probably don’t want to buy kids’ breakfast cereal, or nappies. This is about as crude as useful segmentation gets – but it’s easy to do…
  • Beer and nappies go together – young bloke has a baby, has to go shopping for the first time in his life, gets the nappies, sees the beers, knows he won’t be going anywhere for the next few months, and gets the tinnies in… (I think that was from this book!;-)
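
The calorie check in the first bullet above is simple enough to sketch – the 2,000 kcal/day figure and the basket numbers are illustrative only:

    // Toy calorie check: compare the calories actually bought with a rough
    // estimate of what the household needs. The 2,000 kcal/day figure and
    // the numbers below are illustrative only.
    function calorieShare(basketCalories, householdSize, daysBetweenVisits) {
      var needed = householdSize * 2000 * daysBetweenVisits;
      return basketCalories / needed;   // ~1.0 => doing the full shop here
    }

    // family of four, weekly shop, ~40,000 kcal in the trolley
    console.log(calorieShare(40000, 4, 7));   // ~0.71 - shopping elsewhere too?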

Anyway, time to go and read the Tesco Clubcard Charter I think?;-)

PS here’s an interesting, related, personal tale from a couple of years ago: Tesco stocks up on inside knowledge of shoppers’ lives (Guardian Business blog, Sept. 2005) [thanks, Tim W.]

PPS Here are a few more news stories about the Tesco Clubcard: Tesco’s success puts Clubcard firm on the map (The Sunday Times, Dec. 2004), Eyes in the till (FT, Nov 2006), and How Tesco is changing Britain (Economist, Aug. 2005) and Getting an edge (Irish Times, Oct 2007), which both require a login, so f**k off…

PPPS see also More remarks on the Tesco data play, although having received a takedown notice at the time from Dunnhumby, the post is less informative than it was when originally posted…

2.0 vs 1.0, and a Huge Difference in Style

A couple of weeks ago I received an internal email announcing a “book project”, described on the project blog as follows:

During the summer I [Darrell Ince, an OU academic in the Computing department] read a lot about Web 2.0 and became convinced that there might be some mileage in asking our students to help develop materials for teaching. I set up two projects: the first is the mass book writing project that this blog covers …

The book writing project involves OU students, and anyone else who wants to volunteer, writing a book about the Java-based computer-art system known as Processing.

A student who wants to contribute 2500 words to the project will carry out the following tasks:

* Email an offer to write to the OU.
* We will send them a voucher that will buy them a copy of a recently published book by Greenberg.
* They will then read the first 3 chapters of the book.
* We will give them access to a blog which contains a specification of 85 chunks of text about 2500 words in length.
* The student will then write it and also develop two sample computer programs
* The student will then send the final text and the two programs to the OU.

We will edit the text and produce a sample book from a self-publisher and then attempt to interest a mainstream publisher to take the book.

[Darrel Ince Mass Writing Blog: Introduction]

A second project blog – Book Fragments – contains a list of links to blogs of people who are participating in the project, and other project related information, such as a “sample chapter”, and a breakdown of the work assigned to each “chunk” of the book (see all but the first post in the September archive; some education required there in the use of blog post tags, I think?! ;-)

This is quite an ambitious – and exciting – project, but it feels to me far too much like the “trad OU” authoring model, not least in that the focus is on producing a print item (a book) about an exciting interactive medium (Processing). It also seems to be using the tools from a position of inexperience about what the tools can do, or what other tools are on offer. For example, I wonder what sorts of decisions were made regarding the recommended authoring environment (Blogspot blogs).

Now just jumping in and doing it with these tools is a Good Thing, but a little bit of knowledge could maybe help extract more value from the tools? And a couple of days with a developer and a designer could probably pull quite a powerful authoring and publishing environment together that would work really well for developing in-browser, no plugin or download required, visually appealing interactive Processing related materials.

So for what it’s worth, here are some of the things I’d have pondered at some length if I was allowed to run this sort of project (which I’m not…;-)

Authoring Environment:

  • as the target output is a book, I’d have considered authoring in Google docs. (Did I mention I got a hack in Google Apps Hacks, which was authored in Google docs? ;-) Google docs supports single or multiple author access, public, shared or private documents (with a variety of read/write access privileges) and the ability to look back over historical changes. Even if authors were encouraged to write separate drafts of their chapters, this could have been done in separate Google docs documents, linked to from a blog post.
  • would authoring chunks in a wiki have been appropriate? We can get a Mediawiki set up on the open.ac.uk domain on request, either in public or behind the firewall. Come to that, we can get WordPress blogs set up too, either individual ones or a team blog – would a team blog have been a better approach, with sensible use of categories to partition the content? Niall would probably say the project should have used a Moodle VLE blog or wiki, but I’d respond that probably wouldn’t be a very good idea ;-)

Given that authors have been encouraged to use blogs, I’d have straight away pulled a blog roll together, maybe created a Planet aggregator site like Planet OU (here’s a temporary pipe aggregation solution), and probably indexed all the blogs in a custom search engine? And I’d have tried to interest the authors in using tags and categories.

Looking over the project blog to date, it seems there has been an issue with how to lay out code fragments in HTML (given that in vanilla HTML, white space is reduced to a single space when the HTML is rendered).

Now for anyone who lives on the web, the answer is to use a progressive enhancement library that will mark up the code in a language-sensitive way. I quite like the Syntax Highlighter library, although on a quick trial with a default Blogspot template, it didn’t work:-) (That said, a couple of hours’ work from a competent web designer should result in a satisfactory, if simple, stylesheet that could use this library, and could then be made available to project participants).
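
For reference, the sort of template hook-up involved looks roughly like this (SyntaxHighlighter 2.x style – the exact file names and class syntax vary between versions, so treat it as a sketch rather than a recipe):

    <!-- Rough sketch of the template hook-up (SyntaxHighlighter 2.x style;
         exact file names and the class syntax differ between versions) -->
    <link href="shCore.css" rel="stylesheet" type="text/css" />
    <link href="shThemeDefault.css" rel="stylesheet" type="text/css" />
    <script src="shCore.js" type="text/javascript"></script>
    <script src="shBrushJava.js" type="text/javascript"></script>
    <script type="text/javascript">SyntaxHighlighter.all();</script>

    <!-- ...and then in a post: -->
    <pre class="brush: java">
    int x = 10;    // white space and syntax colouring preserved
    </pre>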

A clunkier approach is to use something like Quick Highlighter, one of several such services that lets you paste in a block of program code and get marked up HTML out. (The trick here is to paste the CSS for e.g. the Java mark-up into the blog stylesheet template, and then all you have to do is paste marked up program code into a blog post.)

A second issue I have is that I imagine that writing – and testing – the Processing code requires a download, and Java, and probably a Java authoring environment; and that’s getting too far away from the point, which is learning how to do stuff in Processing (or maybe that isn’t the point? Maybe the point is to teach trad programming and programming tools using Processing to provide some sort of context?)

So my solution? Use John Resig’s processing.js library – a port of Processing to Javascript – and then use something like the browser based Obsessing editor – write Processing code in the browser, then run it using processing.js:

A tweak to the Blogspot template should mean that processing code can be included in a post and executed using processing.js? Or if not, we could probably get it to work in an OU hosted WordPress environment?
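
By way of illustration, the sort of thing I have in mind is no more than a canvas element, the processing.js script, and a scrap of Processing code handed to it as a string – the Processing(canvas, code) call below is how Resig’s original library was driven, but treat the details as a sketch:

    <!-- Sketch of running Processing code in the page with processing.js.
         The Processing(canvas, code) call is how Resig's original library
         was invoked; later versions changed the API, so check the docs. -->
    <canvas id="sketch" width="200" height="200"></canvas>
    <script src="processing.js" type="text/javascript"></script>
    <script type="text/javascript">
      var code =
        "void setup() { size(200, 200); }" +
        "void draw() {" +
        "  background(255);" +
        "  ellipse(mouseX, mouseY, 20, 20);" +   // follow the mouse
        "}";
      Processing(document.getElementById("sketch"), code);
    </script>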

Finally, the “worthy academic” pre-planned structure of the intended book just doesn’t work for me. I’d phrase the project in a far more playful way, and try to accrete comments and questions around mini-projects working out how to get various things working in Processing, probably in a blogged, uncourse-like way.

Sort of related to this, I’ve been thinking of writing something not too dissimilar from I’m Leaving, along the lines of “I’m not into dumbing down, but I’m quitting the ivory tower”, because the arrogance of academia is increasingly doing my head in. (If you’re a serious academic, you’re not allowed to use “slang” like that. You have to say you are “seriously concerned by the blah blah blah blah blah blah blah”… It does my head in ;-)

PS not liking the proposed book structure is not to say I’m not into teaching proper computer science – I’d love to see us teaching compiler theory, or web services using real webservices, like some of the telecoms companies’ APIs;-) But there’s horses for courses, and this Processing stuff should be fun and accessible, right? (And that doesn’t mean it has to be substanceless…)

PPS how interesting would it have been to collaboratively write an interactive book, along the lines of this interactive presentation: Learning Advanced Javascript – double click in any of the code displaying slides, and you can edit – and run – the Javascript code in the browser/within the presentation (described here: Adv. JavaScript and Processing.js, which includes a link to a downloadable version of the interactive presentation).

Chasing Data – Are You Datablogging Yet?

It’s strange to think that the web search industry is only 15 years or so old, and in that time the race has been run on indexing and serving up results for web pages, images, videos, blogs, and so on. The current race is focused on chasing the mobile (local) searcher, making use of location awareness to serve up ads that are sensitive to spatial context, but maybe it’s data that is next?

(Maybe I need to write a “fear post” about how we’re walking into a world with browsers that know where we are, rather than “just” GPS enabled devices and mobile phone cell triangulation? ;-) [And, err, it seems Microsoft are getting in there too: Windows 7 knows where you are – “So just what is it that Microsoft is doing in Windows 7? Well, at a low level, Microsoft has a new application programming interface (API) for sensors and a second API for location. It uses any of a number of things to actually get the location, depending on what’s available. Obviously there’s GPS, but it also supports Wi-Fi and cellular triangulation. At a minimum.”]

So… data. Take for example this service on the Microsoft Research site: Data Depot. To me, this looks like a site that will store and visualise your telemetry data, or more informally collected data (you can tweet in data points, for example):

Want to ‘datablog’ your running miles or your commute times or your grocery spending? DataDepot provides a simple way to track any type of data over time. You can add data via the web or your phone, then annotate, view, analyze, and add related content to your data.

Services like Trendrr have also got the machinery in place to take daily “samples” and produce trend lines over time from automatically collected data. For example, here are some of the data sources they can already access:

  • Weather details – High and the low temperatures on weather.com for a specific zipcode.
  • Amazon Sales Rank – Sales rank on amazon.com
  • Monster Job Listings – Number of job results from Monster.com for the given query in a specific city.

Now call me paranoid, but I suddenly twigged why I thought the Google announcement about an extension to the Google Visualisation API that will enabl[e] developers to display data from any data source connected to the web (any database, Excel spreadsheet, etc.), not just from Google Spreadsheets could have some consequences.

At the moment, the API will let you pull datatable formatted data from your database into the Google namespace. But suppose the next step is for the API to make a call on your database using a query you have handcrafted; then add in some fear that Google has already sussed out how to Crawl through HTML forms by parsing a form and then automatically generating and posting queries using those forms to find more links from deep within a website, and you can see how giving the Google API a single query on your database would tell them some “useful info” (?!;-) about your database schema – info they could use to scrape and index a little more data out of your database…
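
(For context, this is roughly what the client-side call looks like at the moment – the data source URL/spreadsheet key is a placeholder, and the page needs the Google AJAX APIs loader (www.google.com/jsapi) included for google.load to exist:)

    // Sketch of querying a Visualization API data source. The spreadsheet
    // key is a placeholder; any server speaking the datatable wire format
    // could sit at the other end of the URL.
    google.load('visualization', '1', { packages: ['table'] });
    google.setOnLoadCallback(function () {
      var query = new google.visualization.Query(
          'http://spreadsheets.google.com/tq?key=SPREADSHEET_KEY');
      query.setQuery('select A, B');   // the handcrafted query mentioned above
      query.send(function (response) {
        if (response.isError()) { return; }
        var data = response.getDataTable();
        new google.visualization.Table(document.getElementById('chart'))
            .draw(data, {});
      });
    });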

Now of course the Viz API service may never extend that far, and I’m sure Google’s T&C’s would guarantee “good Internet citizenry practices”, but the potential for evil will be there…

And finally, it’s probably also worth mentioning that even if we don’t give the Goog the keys to our databases, plenty of us are in the habit of feeding public data stores anyway. For example, there are several sites built specifically around visualising user-submitted data (if you make it public…): Many Eyes and Swivel, for example. And then of course, there’s also Google Spreadsheets, DabbleDB, Zoho sheet etc etc.

The race for data is on… what are the consequences?!;-)

PS see also Track’n’graph, iCharts and widgenie. Or how about Daytum and mycrocosm?

Also related: “Self-surveillance”.