OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Improving Autocorrelation Calculations on Google Trends Data


In Identifying Periodic Google Trends, Part 1: Autocorrelation, I described how to calculate the autocorrelation statistic for Google Trends data using matplotlib. One of the hacks that I found was required in order to calculate an informative autocorrelogram was to subtract the mean signal value from the original signal before running the calculation.

A more pathological situation occurs in the following case, using the Google Trends data for “run”:

Visual inspection of the original trend data suggests there is annual periodicity (note to self: learn how to add vertical gridlines at required points using matplotlib;-):

However, the autocorrelogram does not detect the periodicity, for two reasons: firstly, as with the previous cases, the non-zero mean value of the original time series data means the periodic excursions are attenuated in the autocorrelation calculation compared to excursions from a zero mean; and secondly, the increasing trend of the data adds further confusion to the year-on-year comparisons used in the autocorrelation calculation.

Googling around "remove trend" and "matplotlib" turned up a detrend function that looked like it could help clean the data used for the autocorrelation calculation. In fact, the detrend argument is mentioned in the acorr autocorrelation function documentation, although no details of the values it can take are provided there. However, searching the rest of that documentation page for detrend does turn up valid values: mlab.detrend_mean, mlab.detrend_linear and mlab.detrend_none, where mlab is imported using import matplotlib.mlab as mlab.
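As a rough sketch of what the three detrend functions do, here they are applied to a made-up weekly series. The series is an assumption for illustration only (an annual cycle sitting on a rising, non-zero baseline), not the actual Google Trends data for "run":

```python
import numpy as np
import matplotlib.mlab as mlab

# Synthetic stand-in for a weekly trend series (NOT real Google Trends data):
# a 52 week cycle on top of a rising baseline with a non-zero offset.
weeks = np.arange(4 * 52)
signal = 50.0 + 0.1 * weeks + 10.0 * np.sin(2 * np.pi * weeks / 52.0)

unchanged = mlab.detrend_none(signal)    # returns the data as is
zero_mean = mlab.detrend_mean(signal)    # subtracts the mean value
no_trend = mlab.detrend_linear(signal)   # subtracts the least-squares straight line

# Fit a straight line to each result to see how much trend is left over.
slope_after_mean = np.polyfit(weeks, zero_mean, 1)[0]
slope_after_linear = np.polyfit(weeks, no_trend, 1)[0]

print("mean after detrend_mean:", zero_mean.mean())
print("slope after detrend_mean:", slope_after_mean)
print("slope after detrend_linear:", slope_after_linear)
```

On data like this, detrend_mean centres the series on zero but leaves the upward slope in place, whereas detrend_linear removes both the offset and the slope.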

If we set the detrend processor to mlab.detrend_mean we get the following:

And with detrend set to mlab.detrend_linear we get:

In each of these latter two cases, we see evidence of the 52 week correlation (i.e. annual periodicity).
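To illustrate the effect without the original data to hand, the following is a minimal sketch that runs acorr over a synthetic weekly series with each detrend setting and reports the lag with the strongest correlation around the one year mark. The synthetic series, the peak_lag helper and the 26 to 78 week search window are all assumptions made for this example; they are not taken from the gist:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

# Synthetic stand-in for the trend data (NOT real Google Trends data):
# a 52 week cycle on top of a rising baseline with a non-zero offset.
weeks = np.arange(4 * 52)
signal = 50.0 + 0.1 * weeks + 10.0 * np.sin(2 * np.pi * weeks / 52.0)

def peak_lag(detrend, lo=26, hi=78):
    """Hypothetical helper: lag in [lo, hi] with the largest autocorrelation."""
    fig, ax = plt.subplots()
    # acorr returns the lags, the correlation at each lag and two plot artists.
    lags, corr, _, _ = ax.acorr(signal, detrend=detrend, normed=True, maxlags=hi)
    plt.close(fig)
    window = (lags >= lo) & (lags <= hi)
    return int(lags[window][np.argmax(corr[window])])

for name in ("detrend_none", "detrend_mean", "detrend_linear"):
    print(name, peak_lag(getattr(mlab, name)))
```

With the linear detrend, the strongest correlation in that window should fall at or very close to a lag of 52 weeks; without any detrending, the large mean swamps the periodic component and no comparable peak stands out.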

FWIW, here’s the gist for the modified code.

Written by Tony Hirst

January 6, 2011 at 10:51 am

Posted in Analytics, Data



