Weekly Subseries Charts – Plotting NHS A&E Admissions

A post on Richard “Joy of Tax” Murphy’s blog a few days ago caught my eye – Data shows January is often the quietest time of the year for A & E departments – with a time series chart showing weekly admission numbers to A&E from a time when the numbers were produced weekly (they’re now produced monthly).

In a couple of follow up posts, Sean Danaher did a bit more analysis to reinforce the claim, again generating time series charts over the whole reporting period.

For me, this just cries out for a seasonal subseries plot. These are typically plotted over months or quarters and show for each month (or quarter) the year on year change of a indicator value. Rendering weekly subseries plots is a but more cluttered – 52 weekly subcharts rather 12 monthly ones – but still doable.

I haven’t generated subseries plots from pandas before, but the handy statsmodels Python library has a charting package that looks like it does the trick. The documentation is a bit sparse (I looked to the source…), but given a pandas dataframe and a suitable period based time series index, the chart falls out quite simply…

Here’s the chart and then the code… the data comes from NHS England, A&E Attendances and Emergency Admissions 2015-16 (2015.06.28 A&E Timeseries).



(Yes, yes I know; needs labels etc etc; but it’s a crappy graph, and if folk want to use it they need to generate a properly checked and labelled version themselves, right?!;-)

import pandas as pd
# !pip3 install statsmodels
import statsmodels.api as sm
import statsmodels.graphics.tsaplots as tsaplots
import matplotlib.pyplot as plt

!wget -P data/ https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2015/04/2015.06.28-AE-TimeseriesBaG87.xls

#Faff around with column headers, empty rows etc
dfw= dfw.fillna(axis=1,method='ffill').T.set_index([0,1]).T.dropna(how='all').dropna(axis=1,how='all')


#pandas has super magic "period" datetypes... so we can cast a week ending date to a week period
dfw['Reporting','_period']=pd.to_datetime(dfw['Reporting','Period'].str.replace('W/E ',''), format='%d/%m/%Y').dt.to_period('W') 

#Check the start/end date of the weekly period

#Timeseries traditionally have the datey-timey thing as the index
dfw=dfw.set_index([('Reporting', '_period')])
dfw.index.names = ['_period']

#Generate a matplotlib figure/axis pair to give us easier access to the chart chrome
fig, ax = plt.subplots()

#statsmodels has quarterly and montthly subseries plots helper functions
#but underneath, they use a generic seasonal plot
#If we groupby the week number, we can plot the seasonal subseries on a week number basis
tsaplots.seasonal_plot(dfw['A&E attendances']['Total Attendances'].groupby(dfw.index.week),

#Tweak the display
fig.set_size_inches(18.5, 10.5)
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=90);

As to how you read the chart – each line shows the trend over years for a particular week’s figures. The week number is along the x-axis. This chart type is really handy for letting you see a couple of things: year on year trend within a particular week; repeatable periodic trends over the course of the year.

A glance at the chart suggests weeks 24-28 (months 6/7 – so June/July) are the busy times in A&E?

PS the subseries plot uses pandas timeseries periods; see eg Wrangling Time Periods (such as Financial Year Quarters) In Pandas.

PPS Looking at the chart, it seems odd that the numbers always go up in a group. Looking at the code:

def seasonal_plot(grouped_x, xticklabels, ylabel=None, ax=None):
    Consider using one of month_plot or quarter_plot unless you need
    irregular plotting.

    grouped_x : iterable of DataFrames
        Should be a GroupBy object (or similar pair of group_names and groups
        as DataFrames) with a DatetimeIndex or PeriodIndex
    fig, ax = utils.create_mpl_ax(ax)
    start = 0
    ticks = []
    for season, df in grouped_x:
        df = df.copy() # or sort balks for series. may be better way
        nobs = len(df)
        x_plot = np.arange(start, start + nobs)
        ax.plot(x_plot, df.values, 'k')
        ax.hlines(df.values.mean(), x_plot[0], x_plot[-1], colors='k')
        start += nobs

    ax.margins(.1, .05)
    return fig

there’s a df.sort() in there – which I think should be removed, assuming that the the data presented is pre-sorted in the group?