A post on Richard “Joy of Tax” Murphy’s blog a few days ago caught my eye – Data shows January is often the quietest time of the year for A & E departments – with a time series chart showing weekly admission numbers to A&E from a time when the numbers were produced weekly (they’re now produced monthly).
For me, this just cries out for a seasonal subseries plot. These are typically plotted over months or quarters and show, for each month (or quarter), the year on year change of an indicator value. Rendering weekly subseries plots is a bit more cluttered – 52 weekly subcharts rather than 12 monthly ones – but still doable.
I haven’t generated subseries plots from pandas before, but the handy statsmodels Python library has a charting package that looks like it does the trick. The documentation is a bit sparse (I looked to the source…), but given a pandas dataframe and a suitable period based time series index, the chart falls out quite simply…
Here’s the chart and then the code… the data comes from NHS England, A&E Attendances and Emergency Admissions 2015-16 (2015.06.28 A&E Timeseries).
DO NOT TRUST THE FOLLOWING CHART
(Yes, yes I know; needs labels etc etc; but it’s a crappy graph, and if folk want to use it they need to generate a properly checked and labelled version themselves, right?!;-)
    import pandas as pd
    # !pip3 install statsmodels
    import statsmodels.api as sm
    import statsmodels.graphics.tsaplots as tsaplots
    import matplotlib.pyplot as plt

    !wget -P data/ https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2015/04/2015.06.28-AE-TimeseriesBaG87.xls

    dfw=pd.read_excel('data/2015.06.28-AE-TimeseriesBaG87.xls',skiprows=14,header=None,na_values='-').dropna(how='all').dropna(axis=1,how='all')

    #Faff around with column headers, empty rows etc
    #(.ix was removed in pandas 1.0; .loc is the label-based equivalent here)
    dfw.loc[0,2]='Reporting'
    dfw.loc[1,0]='Code'
    dfw= dfw.fillna(axis=1,method='ffill').T.set_index([0,1]).T.dropna(how='all').dropna(axis=1,how='all')

    dfw=dfw[dfw[('Reporting','Period')].str.startswith('W/E')]

    #pandas has super magic "period" datetypes... so we can cast a week ending date to a week period
    dfw['Reporting','_period']=pd.to_datetime(dfw['Reporting','Period'].str.replace('W/E ',''), format='%d/%m/%Y').dt.to_period('W')

    #Check the start/end date of the weekly period
    #dfw['Reporting','_period'].dt.asfreq('D','s')
    #dfw['Reporting','_period'].dt.asfreq('D','e')

    #Timeseries traditionally have the datey-timey thing as the index
    dfw=dfw.set_index([('Reporting', '_period')])
    dfw.index.names = ['_period']

    #Generate a matplotlib figure/axis pair to give us easier access to the chart chrome
    fig, ax = plt.subplots()

    #statsmodels has quarterly and monthly subseries plot helper functions
    #but underneath, they use a generic seasonal plot
    #If we groupby the week number, we can plot the seasonal subseries on a week number basis
    tsaplots.seasonal_plot(dfw['A&E attendances']['Total Attendances'].groupby(dfw.index.week),
                           list(range(1,53)),ax=ax)

    #Tweak the display
    fig.set_size_inches(18.5, 10.5)
    ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=90);
As to how you read the chart – each line shows the trend over years for a particular week’s figures. The week number is along the x-axis. This chart type is really handy for letting you see a couple of things: year on year trend within a particular week; repeatable periodic trends over the course of the year.
A glance at the chart suggests weeks 24-28 (months 6/7 – so June/July) are the busy times in A&E?
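To make that reading a bit more concrete, here's a minimal sketch of what the grouping does, using made-up numbers rather than the NHS data. The series `s`, the start date and the values are all assumptions for illustration; the only idea carried over from the chart code is the groupby on the week number of a weekly period index.

```python
import numpy as np
import pandas as pd

# Synthetic weekly counts (made-up numbers, NOT NHS data): 156 consecutive weeks
idx = pd.period_range('2012-01-02', periods=156, freq='W')
s = pd.Series(1000 + np.arange(156), index=idx)

# One subseries per week number; each holds that week's value for every year
grouped = s.groupby(s.index.week)

# The mean of each subseries is what the horizontal bar in the chart marks
week_means = grouped.mean()

# The year on year values for a single week number, e.g. week 10
week10 = grouped.get_group(10)
print(week10)
```

Each group is one of the little lines in the chart: the values for the same week number in successive years, with the group mean drawn as the horizontal bar.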
PS the subseries plot uses pandas timeseries periods; see eg Wrangling Time Periods (such as Financial Year Quarters) In Pandas.
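As a quick illustration of the period trick, here's a small sketch that casts a few week ending labels to weekly periods and checks where each period starts and ends. The label strings are hypothetical examples in the same 'W/E dd/mm/YYYY' shape as the spreadsheet, not values copied from it.

```python
import pandas as pd

# Hypothetical 'W/E dd/mm/YYYY' labels (illustrative only)
labels = pd.Series(['W/E 07/06/2015', 'W/E 14/06/2015', 'W/E 21/06/2015'])

# Strip the prefix, parse the date, then cast each date to a weekly period
dates = pd.to_datetime(labels.str.replace('W/E ', '', regex=False), format='%d/%m/%Y')
periods = dates.dt.to_period('W')

# A plain 'W' period ends on a Sunday, so these Sunday dates close their periods
starts = periods.dt.start_time
ends = periods.dt.end_time.dt.normalize()
print(pd.DataFrame({'period': periods, 'start': starts, 'end': ends}))
```

If the week ending dates in a dataset fall on a different weekday, an anchored frequency such as 'W-SAT' would be the thing to try instead.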
PPS Looking at the chart, it seems odd that the numbers always go up in a group. Looking at the code:
    def seasonal_plot(grouped_x, xticklabels, ylabel=None, ax=None):
        """
        Consider using one of month_plot or quarter_plot unless you need
        irregular plotting.

        Parameters
        ----------
        grouped_x : iterable of DataFrames
            Should be a GroupBy object (or similar pair of group_names and groups
            as DataFrames) with a DatetimeIndex or PeriodIndex
        """
        fig, ax = utils.create_mpl_ax(ax)
        start = 0
        ticks = []
        for season, df in grouped_x:
            df = df.copy() # or sort balks for series. may be better way
            df.sort()
            nobs = len(df)
            x_plot = np.arange(start, start + nobs)
            ticks.append(x_plot.mean())
            ax.plot(x_plot, df.values, 'k')
            ax.hlines(df.values.mean(), x_plot[0], x_plot[-1], colors='k')
            start += nobs

        ax.set_xticks(ticks)
        ax.set_xticklabels(xticklabels)
        ax.set_ylabel(ylabel)
        ax.margins(.1, .05)
        return fig
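There's a df.sort() in there, which (in the pandas of the time) sorts each group's values in place, so every subseries gets plotted in ascending order of value rather than in date order. I think it should be removed, assuming that the data presented is already sorted by date within each group?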
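Here's a tiny sketch of why that sort would make every group look like it only ever goes up. It assumes the old Series.sort() ordered by value, and uses sort_values() as the modern stand-in; the numbers are invented.

```python
import pandas as pd

# Made-up values for one week number across four years, in date order
by_year = pd.Series([120, 150, 110, 140], index=[2011, 2012, 2013, 2014])

# Sorting by value (the assumed effect of the old in-place Series.sort())
as_plotted = by_year.sort_values()

# Sorting by index keeps the chronological order the chart should show
chronological = by_year.sort_index()

print(list(as_plotted.values))     # always rising, whatever the real trend
print(list(chronological.values))  # the actual year on year ups and downs
```

So a week whose numbers dipped in 2013 would still be drawn as a steadily rising line, which is presumably why the chart should not be trusted as it stands.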