In Identifying Periodic Google Trends, Part 1: Autocorrelation, I described how to calculate the autocorrelation statistic for Google Trends data using matplotlib. One of the hacks I found I needed in order to calculate an informative autocorrelogram was to subtract the mean signal value from the original signal before running the calculation.
A more pathological situation occurs in the following case, using the Google Trends data for “run”:
Visual inspection of the original trend data suggests there is annual periodicity (note to self: learn how to add vertical gridlines at required points using matplotlib;-):
However, the autocorrelogram does not detect the periodicity, for two reasons. Firstly, as with the previous cases, the non-zero mean value of the original time series data means the periodic excursions are attenuated in the autocorrelation calculation compared to excursions from a zero mean. Secondly, the increasing trend of the data adds further confusion to the year-on-year comparisons used in the autocorrelation calculation.
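To see both effects in isolation, here's a minimal sketch using a made-up weekly series (an offset, a rising trend and a 52 week cycle) rather than real Google Trends data. The `autocorr` and `remove_linear_trend` helpers are hypothetical names written for this illustration, not matplotlib functions, and the numbers in the series are arbitrary.

```python
import math

def autocorr(x, lag):
    """Autocorrelation at one lag, normalised by the zero-lag value.
    No mean is subtracted first, so any offset or trend stays in the sum."""
    n = len(x)
    num = sum(x[t] * x[t + lag] for t in range(n - lag))
    den = sum(v * v for v in x)
    return num / den

def remove_linear_trend(x):
    """Subtract the least-squares straight line (which includes the mean) from x."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    slope = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]

# Made-up data: five years of weekly values (assumed shape, not real trend data)
weeks = range(260)
raw = [50 + 0.1 * t + 10 * math.sin(2 * math.pi * t / 52) for t in weeks]
flat = remove_linear_trend(raw)

# Compare the half-year (26 week) and full-year (52 week) lags
for lag in (26, 52):
    print(lag, round(autocorr(raw, lag), 2), round(autocorr(flat, lag), 2))
```

On the raw series the two lags come out large and close together, because the offset and trend dominate every product in the sum. Once the straight line is removed, the 52 week lag is strongly positive and the 26 week lag strongly negative, which is the signature of an annual cycle.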
Googling around “remove trend” and “matplotlib” turned up a detrend function that looked like it could help clean the data used for the autocorrelation calculation. In fact, a detrend parameter is mentioned in the acorr autocorrelation function documentation, although no details of the values it can take are provided there. However, searching the rest of that documentation page for detrend does turn up valid values: mlab.detrend_mean, mlab.detrend_linear and mlab.detrend_none (where mlab comes from import matplotlib.mlab as mlab).
If we set the detrend processor to mlab.detrend_mean we get the following:
And with detrend set to mlab.detrend_linear we get:
In each of these latter two cases, we see evidence of the 52 week correlation (i.e. annual periodicity).
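For comparison, here's a rough sketch of how the three detrend settings can be passed to acorr side by side. It uses a synthetic stand-in series (an assumption for illustration, not the actual “run” trend data), and the `value_at` helper and the `results` dictionary are hypothetical names of my own rather than anything from the original code.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

# Synthetic weekly series: offset + rising trend + 52 week cycle (made-up values)
weeks = np.arange(260)
signal = 50 + 0.1 * weeks + 10 * np.sin(2 * np.pi * weeks / 52)

detrenders = {
    "detrend_none": mlab.detrend_none,
    "detrend_mean": mlab.detrend_mean,
    "detrend_linear": mlab.detrend_linear,
}

def value_at(lags, c, lag):
    """Pick the autocorrelation value for a single lag out of acorr's output arrays."""
    return float(c[lags == lag][0])

fig, axes = plt.subplots(len(detrenders), 1, sharex=True, figsize=(8, 9))
results = {}
for ax, (name, fn) in zip(axes, detrenders.items()):
    # acorr returns the lags, the correlation values and two plot artists
    lags, c, _, _ = ax.acorr(signal, detrend=fn, normed=True,
                             usevlines=True, maxlags=120)
    ax.set_title(name)
    results[name] = {lag: value_at(lags, c, lag) for lag in (26, 52)}

for name, vals in results.items():
    print(name, vals)
# Use fig.savefig(...) or plt.show() to view the three autocorrelograms
```

With this synthetic series, the 52 week lag should only stand out from the 26 week lag in the two detrended panels.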
FWIW, here’s the gist for the modified code.