Playing With Google Search Data Trends

Early last week, Google announced Google Flu Trends, a service that uses the huge number of searches on Google to provide a near real-time indicator of flu outbreaks in the US. Official reports from medical centres and doctors can lag actual outbreaks by up to a couple of weeks, but by correlating search trend data with real medical data, the Google folks were able to show that their data led the official reports.
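To make the "search data leads the official reports" idea a bit more concrete, here is a minimal sketch of one way to look for a lead: slide one series against the other and see which offset gives the highest correlation. This is my own illustration, not Google's method; the function names (`pearson`, `best_lead`) and the weekly numbers are made up, and the "official" series is deliberately built as the search series delayed by two weeks.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_lead(search, official, max_lag):
    """Return (lag, r): the lag in weeks at which `search` best predicts `official`."""
    best = None
    for lag in range(max_lag + 1):
        # Pair search[t] with official[t + lag], dropping the unmatched ends.
        a = search[:len(search) - lag]
        b = official[lag:]
        r = pearson(a, b)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Synthetic, made-up weekly counts (NOT real flu or search data).
search = [1, 2, 5, 9, 14, 9, 5, 2, 1, 1, 1, 1]
# Assumption for the demo: official reports are the same signal, two weeks late.
official = [1, 1] + search[:-2]

lag, r = best_lead(search, official, max_lag=4)
print(lag, round(r, 3))
```

Because the toy data is constructed with a two-week delay, the best lag comes out as 2. Real series would be noisier, and a high correlation at some lag would be suggestive rather than proof of anything.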

John Naughton picked up on this service in his Networker Observer column this week. In this post on Google as a predictor, he also responded to an email follow-up I sent him, in which I idly wondered what search terms might be indicators of recession. “Jobseeker’s allowance” appears to be on the rise, unfortunately (as does “redundancy”).

For some time, I’ve been convinced that spotting clusters of related search terms, or meaningful correlations between clusters of search terms, is going to be the next big step towards, err, something(?!), and Google Flu Trends is one of the first public appearances of this outside the search, search marketing and ad sales area.

Which is why, on the playful side, I tried to pitch something like Trendspotting to the Games With a Purpose (GWAP) folks (so far unreplied to!), the idea being that players would have to try to identify search terms whose trends were correlated in some “folk reasonable” way. Search terms like “flowers” and “valentine”, for example, which appear to be correlated according to the Google Trends service:

Just out of interest, can you guess what causes the second peak? Here’s one way of finding out – take a look at those search terms on the Google Insights for Search service (like Google Trends on steroids!):

Then narrow down the date over which we’re looking at the trend:

By inspection, it looks like the peak hits around May, so narrow the trend display to that period:

If you now scroll down the Google Insights for Search page, you can see what terms were “breaking out” (i.e. being searched for in volumes way out of the norm) over that period:

So it looks like a Mother’s Day holiday? If you want to check, the Mother’s Day breakout (and ranking in the top searches list) is even more evident if you narrow down the date range even further.
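As an aside, the notion of a term "breaking out" (being searched for way out of the norm) can be sketched with a simple z-score against each term's own recent history. To be clear, this is just an assumption about how one might do it, not how Google Insights for Search actually computes its breakout list; the `breakouts` function, the threshold and all of the volumes below are invented for illustration.

```python
import statistics


def breakouts(history, current, threshold=3.0):
    """Return (term, z) pairs whose current volume is far above the term's own norm."""
    flagged = []
    for term, past in history.items():
        mean = statistics.mean(past)
        spread = statistics.pstdev(past)
        if spread == 0:
            continue  # flat baseline: a z-score is undefined, so skip the term
        z = (current[term] - mean) / spread
        if z >= threshold:
            flagged.append((term, round(z, 1)))
    # Most extreme breakout first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up weekly search volumes (NOT real Google data).
history = {
    "mothers day": [2, 3, 2, 3, 2, 3],
    "flowers": [40, 42, 38, 41, 39, 40],
    "weather": [70, 72, 69, 71, 70, 68],
}
current = {"mothers day": 30, "flowers": 55, "weather": 71}

result = breakouts(history, current)
print(result)
```

With these toy numbers, "mothers day" and "flowers" are flagged and "weather" is not. Measuring each term against its own baseline means a normally quiet term can break out even when its absolute volume is small.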

Just by the by, what else can we find out? That the “Mother’s Day” holiday at the start of May is not internationally recognised, maybe?

There are several other places that are starting to collect trend data – not just search trend data – from arbitrary sources, such as Microsoft Research’s DataDepot (which I briefly described in Chasing Data – Are You Datablogging Yet?) and Trendrr.

The Microsoft service allegedly allows you to tweet data in, and the Trendrr service has a RESTful API for getting data in.

Although I’ve not seen it working yet (?!), the DataDepot looks like it tries to find correlations between data sets:

Next stop convolution of data, maybe?

So whither the future? In an explanatory blog post on the flu trends service – How we help track flu trends – the Googlers let slip that “[t]his is just the first launch in what we hope will be several public service applications of Google Trends in the future.”

It’ll be interesting to see what exactly those are going to be.

PS I’m so glad I did electronics as an undergrad degree. Discrete maths and graph theory drove web 2.0 social networking theory algorithms, and signal processing – not RDF – will drive web 3.0…

8 comments

  1. Pingback: Memex 1.1 » Blog Archive » Google Search Data Trends
  2. Pingback: OUseful.Info, the blog…
  3. Pingback: Recession, What Recession? « OUseful.Info, the blog…
  4. karl dubost, w3c

    Hey,

    quite cool data. Did you skip the class on Physics of Solids? ;) Indeed Signal Processing is key to understand large volume of data. Signal Processing is a tool to extract probabilities and occurrences. You still need models to understand these data and their own structure. RDF and Signal Processing work hand in hand :)

    Before doing Web, I was in Astrophysics, which is specifically a domain where we deal with a very large volume of data and where signal processing is key to extract information and correlate it with… well defined structures :) Having only RDF would not work, having only signal processing will not work.

    Another domain which has to deal with massive number of data (and signal processing) is Genetics. They are working on RDF models for their data, because they need to *share*, combine these data. See http://www.w3.org/2001/sw/hcls/

  6. Pingback: Social Network Specialist
  7. Pingback: Social Computing Experts » Playing With Google Search Data Trends « OUseful.Info, the blog…