Chasing Data – Are You Datablogging Yet?
It’s strange to think that the web search industry is only 15 years or so old, and in that time the race has been run on indexing and serving up results for web pages, images, videos, blogs, and so on. The current race is focused on chasing the mobile (local) searcher, making use of location awareness to serve up ads that are sensitive to spatial context, but maybe it’s data that is next?
(Maybe I need to write a “fear post” about how we’re walking into a world with browsers that know where we are, rather than “just” GPS-enabled devices and mobile phone cell triangulation? ;-) [And, err, it seems Microsoft are getting in there too: Windows 7 knows where you are – “So just what is it that Microsoft is doing in Windows 7? Well, at a low level, Microsoft has a new application programming interface (API) for sensors and a second API for location. It uses any of a number of things to actually get the location, depending on what’s available. Obviously there’s GPS, but it also supports Wi-Fi and cellular triangulation. At a minimum.”]
So… data. Take for example this service on the Microsoft Research site: Data Depot. To me, this looks like a site that will store and visualise your telemetry data, or more informally collected data (you can tweet in data points, for example):
Want to ‘datablog’ your running miles or your commute times or your grocery spending? DataDepot provides a simple way to track any type of data over time. You can add data via the web or your phone, then annotate, view, analyze, and add related content to your data.
Services like Trendrr have also got the machinery in place to take daily “samples” and produce trend lines over time from automatically collected data. For example, here are some of the data sources they can already access:
- Weather details – High and the low temperatures on weather.com for a specific zipcode.
- Amazon Sales Rank – Sales rank on amazon.com
- Monster Job Listings – Number of job results from Monster.com for the given query in a specific city.
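To make the "daily samples to trend line" idea a bit more concrete, here is a minimal sketch of how one might fit a straight-line trend to a handful of dated readings. It is not how Trendrr or DataDepot actually do it (I have no idea what they run under the hood); the `trend_line` function and the running log are made up for illustration, and plain least squares is just the simplest thing that could work.

```python
from datetime import date


def trend_line(samples):
    """Fit value = slope * day + intercept by ordinary least squares.

    samples: list of (date, value) pairs. Days are counted from the
    earliest date in the list, so the intercept is the fitted value
    on that first day. (Hypothetical helper, not any service's API.)
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")

    start = min(d for d, _ in samples)
    xs = [(d - start).days for d, _ in samples]
    ys = [float(v) for _, v in samples]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Spread of the sample days; zero means every sample is on the same day.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span at least two distinct days")

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented "datablog" of running miles, one sample per day.
running_miles = [
    (date(2009, 1, 5), 3.0),
    (date(2009, 1, 6), 3.5),
    (date(2009, 1, 7), 4.0),
    (date(2009, 1, 8), 4.5),
]

slope, intercept = trend_line(running_miles)
print(f"trend: {slope:+.2f} miles per day, starting from {intercept:.2f}")
```

Swap the running log for a daily sales rank or job listing count and the same few lines give you the slope of whatever it is you happen to be tracking.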
Now call me paranoid, but I suddenly twigged why I thought the Google announcement about an extension to the Google Visualisation API, one that will “enabl[e] developers to display data from any data source connected to the web (any database, Excel spreadsheet, etc.), not just from Google Spreadsheets”, could have some consequences.
At the moment, the API will let you pull datatable formatted data from your database into the Google namespace. But suppose the next step is for the API to make a call on your database using a query you have handcrafted. Then add in some fear that Google has already sussed out how to Crawl through HTML forms, by parsing a form and then automatically generating and posting queries using those forms to find more links from deep within a website. You can see how giving the Google API a single query on your database would tell them some “useful info” (?!;-) about your database schema – info they could use to scrape and index a little more data out of your database…
Now of course the Viz API service may never extend that far, and I’m sure Google’s T&Cs would guarantee “good Internet citizenry practices”, but the potential for evil will be there…
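For anyone wondering what “datatable formatted data” gives away, here is a rough sketch of the shape as I understand it from the Visualisation API docs: a list of typed column descriptions plus rows of cells. The `to_datatable` helper and the commute data are my own invention, and the real wire protocol wraps this in a response envelope that I have left out, so treat it as an illustration rather than a working data source.

```python
import json


def to_datatable(columns, rows):
    """Build a dict in the cols/rows shape used by Google Visualisation DataTables.

    columns: list of (id, label, type) tuples, e.g. ("miles", "Miles", "number").
    rows:    list of tuples, one value per column.
    (Hypothetical helper; only the cols/rows/c/v layout follows the API docs.)
    """
    return {
        "cols": [
            {"id": col_id, "label": label, "type": col_type}
            for col_id, label, col_type in columns
        ],
        "rows": [
            {"c": [{"v": value} for value in row]}
            for row in rows
        ],
    }


# Invented commute log; the column ids would typically mirror database fields.
columns = [
    ("day", "Day", "string"),
    ("commute_mins", "Commute (minutes)", "number"),
]
rows = [
    ("Mon", 42),
    ("Tue", 38),
]

table = to_datatable(columns, rows)
print(json.dumps(table, indent=2))
```

Notice that even this tiny table carries column ids, labels and types along with the values, which is exactly the sort of schema hint I was fretting about above.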
And finally, it’s probably also worth mentioning that even if we don’t give the Goog the keys to our databases, plenty of us are in the habit of feeding public data stores anyway. There are several sites built specifically around visualising user-submitted data (if you make it public…), such as Many Eyes and Swivel. And then of course, there’s also Google Spreadsheets, DabbleDB, Zoho Sheet, etc.
The race for data is on… what are the consequences?!;-)
Also related: “Self-surveillance”.