Looking at some NHS 111 and A&E data today, I found that the figures I was interested in were reported over different sorts of period: specifically, months and quarters.
The pandas Python library has quite a few tools for dealing with periods, so here are a couple of the tricks I put to use today.
Firstly, casting months to a month period. Monthly periods (in column df['Periodname']) were reported in the form “Dec-10”, “Jan-11”, etc., which is to say a three-letter month followed by a two-digit year. If we create a datetime on the first of the corresponding month, we can then cast that date to a month period for the corresponding month:
    import pandas as pd

    #Generate a pandas period for time series indexing purposes
    #Create a date from each month (add the first date of the month) then set to period
    #Note: if the datetime is an index, drop the .dt.
    df['_period']=pd.to_datetime('01-'+df['Periodname'],
                                 format='%d-%b-%y').dt.to_period('M')
If the datetimes form the dataframe index, we can drop the .dt. element and cast the timeseries index directly to the monthly period. The %b term matches the locale-specific abbreviated month name (Jan, Feb, and so on in an English locale).
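As a minimal sketch of the difference between the column case and the index case, here is a small example. The sample values and the 'value' column are made up for illustration, and it assumes an English locale so that %b matches "Dec", "Jan", etc.

```python
import pandas as pd

# Hypothetical sample data in the "Dec-10" style described above
df = pd.DataFrame({'Periodname': ['Dec-10', 'Jan-11', 'Feb-11'],
                   'value': [1, 2, 3]})

# Build a datetime on the first of each month
dates = pd.to_datetime('01-' + df['Periodname'], format='%d-%b-%y')

# Column case: a Series of datetimes needs the .dt accessor
df['_period'] = dates.dt.to_period('M')

# Index case: a DatetimeIndex has to_period() directly, so no .dt
ts = df.set_index(dates)['value']
ts.index = ts.index.to_period('M')

print(df['_period'].tolist())
print(ts.index)
```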
The second example had financial years and dates in two columns – Year and Quarter. The months specified in the quarter description mix three letter representations and the fully written name of the month. (As with many NHS datasets, the text used to describe dates can be all over the place, even within a single spreadsheet or datafile!)
If we grab the first three letters of the month, we can identify the month in its short form. If we create a date stamp for the first day of the quarter, we can then cast it to a quarterly reporting period whose year ends in a particular month, e.g. March (freq='Q-MAR'). (In the UK, corporate financial years run from 1 April to 31 March; it’s the personal tax year that runs from 6 April to 5 April.)
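Something along the following lines would do it. This is a sketch rather than the exact code from the notebook: the sample rows, and the assumption that Year looks like "2014-15" while Quarter names its months in a mix of short and long forms, are mine for the sake of illustration.

```python
import pandas as pd

# Hypothetical data: the text formats in both columns are assumptions
df = pd.DataFrame({
    'Year': ['2014-15', '2014-15', '2014-15', '2014-15'],
    'Quarter': ['Apr-Jun', 'July-September', 'Oct-Dec', 'January-March'],
})

# The first three letters of the quarter description give the short month name
start_month = df['Quarter'].str.strip().str[:3]

# Calendar year in which the quarter starts: the Jan-Mar quarter falls in
# the year after the one in which the financial year started
fy_start = df['Year'].str[:4].astype(int)
cal_year = fy_start + (start_month == 'Jan').astype(int)

# Date stamp for the first day of the quarter, then cast to a quarterly
# period for a reporting year that ends in March
qdate = pd.to_datetime('01-' + start_month + '-' + cal_year.astype(str),
                       format='%d-%b-%Y')
df['YearQuarter'] = qdate.dt.to_period('Q-MAR')

print(df)
```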
Note that the YearQuarter period is labelled with the calendar year in which the financial year ends, so the April to June 2014 quarter is reported as 2015Q1. We can check the start and end date of the quarters explicitly:
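For example, here is a quick check on a single period using the start_time and end_time attributes (on a column of periods, the same attributes are available via .dt). The particular quarter is just an illustrative choice.

```python
import pandas as pd

# First quarter of the financial year that ends in March 2015
p = pd.Period('2015Q1', freq='Q-MAR')

start = p.start_time  # first instant of the quarter (1 April 2014)
end = p.end_time      # last instant of the quarter (falls on 30 June 2014)

print(p, start.date(), end.date(), p.qyear)
```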
One issue when it comes to plotting the financial year quarterly periods is that the chart looks to me as if everything is a year out:
That is, I read the year as the calendar year, but it’s actually the calendar year in which a particular financial year ends.
So I’m wondering: should pandas really report and plot the calendar year in which the fiscal year starts? Is there a switch for this? Or do I need to submit a bug report (and if so, what’s the bug?)
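In the meantime, one possible workaround (not a pandas switch, just labels built by hand) is to write out the financial year in full from the qyear and quarter attributes of each period, and use those strings as tick labels. The label format here is my own choice.

```python
import pandas as pd

# Six quarters of a March-ending financial year, starting April 2014
periods = pd.period_range('2015Q1', '2016Q2', freq='Q-MAR')

# qyear is the year in which the financial year ends, so the start year
# is qyear - 1; label each quarter as e.g. "2014-15 Q1"
labels = ['{}-{:02d} Q{}'.format(p.qyear - 1, p.qyear % 100, p.quarter)
          for p in periods]

print(labels)
```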
PS for a notebook worked example, see here: NHS 111 example.