Category: Anything you want

Weekly Subseries Charts – Plotting NHS A&E Admissions

A post on Richard “Joy of Tax” Murphy’s blog a few days ago caught my eye – Data shows January is often the quietest time of the year for A & E departments – with a time series chart showing weekly admission numbers to A&E from a time when the numbers were produced weekly (they’re now produced monthly).

In a couple of follow up posts, Sean Danaher did a bit more analysis to reinforce the claim, again generating time series charts over the whole reporting period.

For me, this just cries out for a seasonal subseries plot. These are typically plotted over months or quarters and show for each month (or quarter) the year on year change of a indicator value. Rendering weekly subseries plots is a but more cluttered – 52 weekly subcharts rather 12 monthly ones – but still doable.

I haven’t generated subseries plots from pandas before, but the handy statsmodels Python library has a charting package that looks like it does the trick. The documentation is a bit sparse (I looked to the source…), but given a pandas dataframe and a suitable period based time series index, the chart falls out quite simply…

Here’s the chart and then the code… the data comes from NHS England, A&E Attendances and Emergency Admissions 2015-16 (2015.06.28 A&E Timeseries).

aeseasonalsubseries

(Yes, yes I know; needs labels etc etc; but it’s a crappy graph, and if folk want to use it they need to generate a properly checked and labelled version themselves, right?!;-)

import pandas as pd
# !pip3 install statsmodels
import statsmodels.api as sm
import statsmodels.graphics.tsaplots as tsaplots
import matplotlib.pyplot as plt

!wget -P data/ https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2015/04/2015.06.28-AE-TimeseriesBaG87.xls

dfw=pd.read_excel('data/2015.06.28-AE-TimeseriesBaG87.xls',skiprows=14,header=None,na_values='-').dropna(how='all').dropna(axis=1,how='all')
#Faff around with column headers, empty rows etc
dfw.ix[0,2]='Reporting'
dfw.ix[1,0]='Code'
dfw= dfw.fillna(axis=1,method='ffill').T.set_index([0,1]).T.dropna(how='all').dropna(axis=1,how='all')

dfw=dfw[dfw[('Reporting','Period')].str.startswith('W/E')]

#pandas has super magic "period" datetypes... so we can cast a week ending date to a week period
dfw['Reporting','_period']=pd.to_datetime(dfw['Reporting','Period'].str.replace('W/E ',''), format='%d/%m/%Y').dt.to_period('W') 

#Check the start/end date of the weekly period
#dfw['Reporting','_period'].dt.asfreq('D','s')
#dfw['Reporting','_period'].dt.asfreq('D','e')

#Timeseries traditionally have the datey-timey thing as the index
dfw=dfw.set_index([('Reporting', '_period')])
dfw.index.names = ['_period']

#Generate a matplotlib figure/axis pair to give us easier access to the chart chrome
fig, ax = plt.subplots()

#statsmodels has quarterly and montthly subseries plots helper functions
#but underneath, they use a generic seasonal plot
#If we groupby the week number, we can plot the seasonal subseries on a week number basis
tsaplots.seasonal_plot(dfw['A&E attendances']['Total Attendances'].groupby(dfw.index.week),
                       list(range(1,53)),ax=ax)

#Tweak the display
fig.set_size_inches(18.5, 10.5)
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=90);

As to how you read the chart – each line shows the trend over years for a particular week’s figures. The week number is along the x-axis. This chart type is really handy for letting you see a couple of things: year on year trend within a particular week; repeatable periodic trends over the course of the year.

A glance at the chart suggests weeks 24-28 (months 6/7 – so June/July) are the busy times in A&E?

PS the subseries plot uses pandas timeseries periods; see eg Wrangling Time Periods (such as Financial Year Quarters) In Pandas.

A Unit of Comparison for Local Council Budget Consultations, Based on Transparency Spending Data – ASCdays?

A few years ago, via the BBC Radio 4 & World Service programme More or Less (incidentally co-produced by the OU), I came across the notion of miciromorts (Contextualising the Chance of Something Happening – Micromorts), a one in a million chance of death that can be used as a unit of risk to compare various likelihoods of dying. Associated with this measure is David Spiegelhalter’s microlife, “30 minutes of your life expectancy”. The point behind the microlife measure is that it provides a way of comparing life threatening risks based on how much life is likely to be lost, on average, when exposed to such risks.

Life expectancy for a man aged 22 in the UK is currently about 79 years, which is an extra 57 years, or 20,800 days, or 500,000 hours, or 1 million half hours. So, a young man of 22 typically has 1,000,000 half-hours (57 years) ahead of him, the same as a 26 year-old woman. We define a microlife as the result of a chronic risk that reduces life, on average, by just one of the million half hours that they have left.

The idea of micromorts came to mind last night as I was reflecting on a public budget consultation held by the Isle of Wight Council yesterday (a day that also saw the Council’s Leader and Deputy Leader resign their positions). The Council needs to improve budgetary matters by £20 million over the next 3 years, starting with £7.5m in the next financial year. This can come through increasing funding, or cuts. By far the biggest chunk of expenditure by the council, as with all councils, is on adult social care (ASC) [community care statistics /via @jonpoole].

As with every year for the past however many years, I’ve had a vague resolution to do something with local council spending data, and never got very far. Early dabblings with the data that I’ve so far played with this year (and intend to continue…) reinforce the notion that ASC is expensive. Here’s a quick summary of the spending data items for October, 2016:

The spend for each of the directorates was as follows:

  • Adult Services:
    • total spend: £7,746,875.55 (48.33%% of total monthly spend)
    • capital: £395,900.06 (5.11% of directorate monthly spend)
    • revenue: £7,350,975.49 (94.89% of directorate monthly spend)
  • Chief Executive:
    • total spend: £501,021.32 (3.13%% of total monthly spend)
    • capital: £492,507.54 (98.30% of directorate monthly spend)
    • revenue: £8,513.78 (1.70% of directorate monthly spend)
  • Childrens Services:
    • total spend: £2,044,524.26 (12.76%% of total monthly spend)
    • capital: £243,675.08 (11.92% of directorate monthly spend)
    • revenue: £1,800,849.18 (88.08% of directorate monthly spend)
  • Place:
    • total spend: £4,924,117.40 (30.72%% of total monthly spend)
    • capital: £974,024.13 (19.78% of directorate monthly spend)
    • revenue: £3,950,093.27 (80.22% of directorate monthly spend)
  • Public Health:
    • total spend: £434,654.13 (2.71%% of total monthly spend)
    • revenue: £434,654.13 (100.00% of directorate monthly spend)
  • Regeneration:
    • total spend: £57.65 (0.00%% of total monthly spend)
    • revenue: £57.65 (100.00% of directorate monthly spend)
  • Resources:
    • total spend: £377,172.20 (2.35%% of total monthly spend)
    • capital: £20,367.87 (5.40% of directorate monthly spend)
    • revenue: £356,804.33 (94.60% of directorate monthly spend)

Cancelling out Adult Services revenue spend for a month would match the £7.5 million required to make up next year’s funds. That’s unlikely to happen, but it does perhaps hint at a possible unit of comparison when trying to make budget decisions, or at least, support budget consultations.

From my naive perspective, adult social care needs to support a certain number of people, a number that evolves (probably?) in line with demographics. One of the ways people exit care is by dying, though the service is set up to minimise harm and help prolong life. Folk may also be transiently cared for (that is, they enter the care system and then leave it). By looking at the amount spent on adult social care, we can come up with an average cost (mean, median?) per person per day of adult social care – ASCdays. We can reduce the total cost by reducing the amount of time folk spend in the system, either by shortening transient stays or postponing entry into the system.

So what I’ve started wondering is this: as one way of trying to make sense of transparency spending data, is there any use in casting it into equivalent units of ASCdays? If we use ASCday equivalent units, can we take a weak systems view and try to get a feel for whether a cut to a particular service (or improvement of another) can help us get a handle on the ASC expenditure – or whether it might cause problems down the line?

For example, suppose a week’s respite care costs the same as two weeks worth of ASCdays. If that week’s respite care keeps someone out of the adult care service for a month, we’re quids in. If cutting respite care saves 100 ASCdays of funding, but is likely to bring just one person into the care system 3 months early, we might start to doubt whether it will actually lead to any saving at all. (Longer tail saves complicate matters given councils need to balance a budget within a financial year. Spending money this year to save next year requires access to reserves – and confidence in your bet…)

For trying to make budget decisions, or helping engage citizens in budgetary consultations, costing things as per ASCday equivalents, and then trying to come up with some probabilities about the likelihood that a particular cut or expense will result in a certain number of people entering or leaving ASC sooner or later, may help you get a feel for the consequences for a particular action.

As to whether prior probabilities exist around whether cutting this service, or supporting that, are likely to impact on the adult care system, maybe data for that is out there, also?

EU Legal Affairs – Proposals on Civil Law Rules on Robotics

The EU Legal Affairs Committee voted last week to call for legislation that is likely to include a definition of “smart autonomous robots”, regulation of the same, and an ethical code of conduct for designers, producers and users.

Note that the full proposal to be put forward is not currently available – the draft is here: COMMITTEES-JURI-PR-2017-01-12-1095387EN), with amendments.

One thing I noticed in the original document was that it seemed limited in it’s definition of “smart autonomous robots” as physically instantiated things, and ignored more general “smart systems” – AI software systems, for example. An amendment in PE592.405 addressed this, broadening the scope to cover “AI” more generally: smart autonomous robots and their subcategories by taking into consideration the following characteristics of a smart robot and an autonomous system, that comprised a physical support or is connected to a software programme without being embedded in a physical support.

When the topic was originally announced, a big thing was made in the news about the calls for robots to be classed as “electronic persons”. A policy study – European Civil Law Rules In Robotics – that fed into the deliberations attempted to debunk this:

3.1. Incongruity of establishing robots as liable legal persons
The motion for a resolution proposes creating a new category of individual, specifically for robots: electronic persons. Paragraph 31(f) calls upon the European Commission to explore the legal consequences of “creating a specific legal status for robots, so that at least the most sophisticated autonomous robots could be established as having the status of electronic persons with specific rights and obligations, including that of making good any damage they may cause [to third parties], and applying electronic personality to cases where robots make smart autonomous decisions or otherwise interact with third parties”.

When considering civil law in robotics, we should disregard the idea of autonomous robots having a legal personality, for the idea is as unhelpful as it is inappropriate.

Traditionally, when assigning an entity legal personality, we seek to assimilate it to humankind. This is the case with animal rights, with advocates arguing that animals should be assigned a legal personality since some are conscious beings, capable of suffering, etc., and so of feelings which separate them from things. Yet the motion for a resolution does not tie the acceptance of the robot’s legal personality to any potential consciousness. Legal personality is therefore not linked to any regard for the robot’s inner being or feelings, avoiding the questionable assumption that the robot is a conscious being. Assigning robots such personality would, then, meet a simple operational objective arising from the need to make robots liable for their actions.

… the motion for a resolution would appear more inclined to fully erase the human presence. In viewing as an electronic person any “robots [which] make smart autonomous decisions or otherwise interact with third parties” (end of paragraph 31(f)), the motion seems to suggest that the robot itself would be liable and become a legal actor. This analysis finds support in paragraph S, which states that “the more autonomous robots are, the less they can be considered simple tools in the hands of other actors […] [and this] calls for new rules which focus on how a machine can be held — partly or entirely — responsible for its acts or omissions”. Once a robot is no longer controlled by another actor, it becomes the actor itself. Yet how can a mere machine, a carcass devoid of consciousness, feelings, thoughts or its own will, become an autonomous legal actor? How can we even conceive this reality as foreseeable within 10 to 15 years, i.e. within the time frame set in paragraph 25 of the motion for a resolution? From a scientific, legal and even ethical perspective, it is impossible today — and probably will remain so for a long time to come — for a robot to take part in legal life without a human being pulling its strings.

What is more, considering that the main purpose of assigning a robot legal personality would be to make it a liable actor in the event of damage, we should note that other systems would be far more effective at compensating victims; for example, an insurance scheme for autonomous robots, perhaps combined with a compensation fund (paragraphs 31(a) to (e)).

We also have to bear in mind that this status would unavoidably trigger unwanted legal consequences. Paragraph T of the motion states that creating a legal personality would mean that robots’ rights and duties had to be respected. How can we contemplate conferring rights and duties on a mere machine? How could a robot have duties, since this idea is closely linked with human morals? Which rights would we bestow upon a robot: the right to life (i.e. the right to non-destruction), the right to dignity, the right to equality with humankind, the right to retire, the right to receive remuneration (an option explicitly explored in paragraph 31(b) of the motion), etc.? …

In reality, advocates of the legal personality option have a fanciful vision of the robot, inspired by science-fiction novels and cinema. They view the robot — particularly if it is classified as smart and is humanoid — as a genuine thinking artificial creation, humanity’s alter ego. We believe it would be inappropriate and out-of-place not only to recognise the existence of an electronic person but to even create any such legal personality. Doing so risks not only assigning rights and obligations to what is just a tool, but also tearing down the boundaries between man and machine, blurring the lines between the living and the inert, the human and the inhuman. Moreover, creating a new type of person – an electronic person – sends a strong signal which could not only reignite the fear of artificial beings but also call into question Europe’s humanist foundations. Assigning person status to a nonliving, non-conscious entity would therefore be an error since, in the end, humankind would likely be demoted to the rank of a machine. Robots should serve humanity and should have no other role, except in the realms of science-fiction.

Yes but, no but… replace “robot” with “company” in the above. Companies have a legal basis, are defined as legal entities, albeit requiring (in the UK at least), at least one human officer to be responsible for them. To what extent could smart systems be similarly treated, or incorporated?

Or consider this – employees, who have human agency have responsibilities, and liabilities incurred by them may or may not become liabilities for their employer. To the extent that smart systems have “machine agency”, to what extent might they also be classed as employees, eg for insurance or liability purposes? (Related: Fragments – Should Algorithms, Deep Learning AI Models and/or Robots be Treated as Employees?.) To be an “employee”, do you also have to be a legal entity?

PS there’s also a clause in the amendments that expresses neo-Luddite tendencies with which I am not totally unsympathetic:

“whereas robotics and AI that can perform similar tasks to those performed by humans should be used mainly to support and boost the abilities of man, as opposed to trying to replace the human element completely”

I haven’t read all the amendments yet – there may be more nuggets in there…

Convention Based Used URLs Support Automation

I spent a chunk of last week at Curriculum Development Hackathon for a Data Carpentry workshop on Reproducible Research using Jupyter Notebooks (I’d like to thank the organisers for the travel support). One of the planned curriculum areas looked at data project organisation, another on automation. Poking around on an NHS data publication webpage for a particular statistical work area just now suggests an example of how the two inter-relate… and how creating inconsistent URLs or filenames makes automatically downloading similar files a bit of a faff when it could be so easy…

To begin with, the A&E Attendances and Emergency Admissions statistical work area has URL:

https://www.england.nhs.uk/statistics/statistical-work-areas/ae-waiting-times-and-activity/

The crumb trail in the on-page navigation has the form:

Home -> Statistics -> Statistical Work Areas -> A&E Attendances and Emergency Admissions

which we might note jars somewhat with the slug ae-waiting-times-and-activity, and perhaps reflects some sort of historical legacy in how the data was treated previously…

Monthly data is collected on separate financial year related pages linked from that page:

https://www.england.nhs.uk/statistics/statistical-work-areas/ae-waiting-times-and-activity/statistical-work-areasae-waiting-times-and-activityae-attendances-and-emergency-admissions-2016-17/

https://www.england.nhs.uk/statistics/statistical-work-areas/ae-waiting-times-and-activity/statistical-work-areasae-waiting-times-and-activityae-attendances-and-emergency-admissions-2015-16-monthly-3/

The breadcrumb for these pages has the form:

Home -> Statistics -> Statistical Work Areas -> A&E Attendances and Emergency Admissions -> A&E Attendances and Emergency Admissions 20MM-NN

A few of observations about those financial year related page URLs. Firstly, the path is rooted on the parent page (a Good Thing), but the slug looks mangled together from what looks like a more reasonable parent path (statistical-work-areasae-waiting-times-and-activity; this looks as if it’s been collapsed from statistical-work-areas/ae-waiting-times-and-activity).

The next part of the URL specifies the path to the A & E Attendances and Emergency Admissions page for a particular year, with an appropriate slug for the name – ae-attendances-and-emergency-admissions- but differently formed elements for the years: 2016-17 compared to 2015-16-monthly-3.

(Note that the 2015-16 monthly listing is incomplete and starts in June 2015.)

If we look at URLs for some of the monthly 2016-17 Excel data file downloads, we see inconsistency in the filenames:

https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2016/06/November-2016-AE-by-provider-W0Hp0.xls
https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2016/06/October-2016-AE-by-provider-Nxpai.xls
https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2016/06/September-2016-AE-by-provider-BtD4b.xls

(Note that CSV data seems only to be available for the latest (November 2016) data set. I don’t know if this means that the CSV data link only appears for the current month, or data in the CSV format only started to be published in November 2016.)

For the previous year we get:

https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2015/08/March-2016-AE-by-provider-9g0dQ-Revised-11082016.xls
https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2015/08/February-2016-AE-by-provider-1gWNy-Revised-11082016.xls

and so on.

Inspection of these URLs suggests:

  1. the data is being uploaded to and published from a WordPress site (wp-content/uploads);
  2. the path to the data directory for the annual collection is minted according to the month in which the first dataset of the year is uploaded (data takes a month or two to be uploaded, so presumably the April 2016 data was posted in June, 2016 (2016/06); the 2015 data started late – the first month (June 2015) presumably being uploaded in August of that year (2015/08);
  3. the month slug for the data file starts off fine, being of the form MONTH-YEAR-AE-by-provider-, but then breaks things by having some sort of code value that perhaps uniquely identifies the version of the file;
  4. the month slug may be further broken by the addition of a revision element (eg -Revised-11082016).

If the URLs all had a common pattern, it would be easy enough to automate their generation from a slug pattern and month/year combination, and then automatically download them. (I haven’t yet explored inside each spreadsheet to see what inconsistency errors/horrors make it non-trivial to try to combine the monthly data into a single historical data set…)

As it is, to automate the download of the files requires scraping the web pages for the links, or manually retrieving them. (At least the link text on the pages seems to be reasonably consistent!)

An Alternative Way of Motivating the Use of Functions?

At the end of the first of the Curriculum Development Hackathon on Reproducible Research using Jupyter Notebooks held at BIDS in Berkeley, yesterday, discussion turned on whether we should include a short how-to on the use of interactive IPython widgets to support exploratory data analysis. This would provide workshop participants with an example of how to rapidly prototype a simple exploratory data analysis application such as an interactive chart, enabling them to explore a range of parameter values associated with the data being plotted in a convenient way.

In summarising how the ipywidgets interact() function works, Fernando Perez made a comment that made wonder whether we could use the idea of creating simple interactive chart explorers as a way of motivating the use of functions.

More specifically, interact() takes a function name and the set of parameters passed into that function and creates a set of appropriate widgets for setting the parameters associated with the function. Changing the widget setting runs the function with the currently selected values of the parameters. If the function returns a chart object, then the function essentially defines an interactive chart explorer application.

So one reason for creating a function is that you may be able to automatically convert into an interactive application using interact().

Here’s a quick first sketch notebook that tries to set up a motivating example: An Alternative Way of Motivating Functions?

PS to embed an image of a rendered widget in the notebook, select the Save notebook with snapshots option from the Widgets menu:

simplewidgetdemo

See also: Simple Interactive View Controls for pandas DataFrames Using IPython Widgets in Jupyter Notebooks

Google’s Appetite for Training Data

A placeholder post – I’ll try to remember to add to this as and when I see examples of Google explicitly soliciting training data from users that can be used to train its AI models…

Locations – “Popular Times”

tesco_extra_ryde_isle_of_wight_-_google_search_and_tesco_extra_ryde_isle_of_wight_-_google_search

For example: Google Tracking How Busy Places are by Looking at Location Histories [SEO by the Sea] which also refers to a patent describing the following geo-intelligence technique: latency analysis.

A latency analysis system determines a latency period, such as a wait time, at a user destination. To determine the latency period, the latency analysis system receives location history from multiple user devices. With the location histories, the latency analysis system identifies points-of-interest that users have visited and determines the amount of time the user devices were at a point-of-interest. For example, the latency analysis system determines when a user device entered and exited a point-of-interest. Based on the elapsed time between entry and exit, the latency analysis system determines how long the user device was inside the point-of-interest.

Simple Interactive View Controls for pandas DataFrames Using IPython Widgets in Jupyter Notebooks

A quick post to note a couple of tricks for generating simple interactive controls that let you manipulate the display of a pandas dataframe in a Jupyter notebook using IPython widgets.

Suppose we have a dataframe df, and we want to limit the display of rows to just the rows for which the value in a particular column matches a particular categorical value. We can create a drop down list containing the distinct/unique values contained within the column, and use this to control the display of the dataframe rows. Adding an “All” option allows us to display all the rows:

iw_property_register

import ipywidgets as widgets
from ipywidgets import interactive

items = ['All']+sorted(df['Asset Type Description'].unique().tolist())

def view(x=''):
    if x=='All': return df
    return df[df['Asset Type Description']==x]

w = widgets.Select(options=items)
interactive(view, x=w)

We can also create multiple controller widgets and bind them into a single interactive display function. For example, we might have one control to filter rows according to the value contained within a particular column, and another widget limiting how many rows are displayed by creating two widgets and accessing them via an interactive ipywidgets construction call:

iw_property_register2

def view2(x='',y=3):
    if x=='All': return df.head(y)
    return df[df['Asset Type Description']==x].head(y)

a_slider = widgets.IntSlider(min=0, max=15, step=1, value=5)
b_select =  widgets.Select(options=items)
widgets.interactive(view2,y=a_slider,x=b_select)

The attributes of a widget can also be set dynamically, even to the effect of setting the attribute values of one widget as some ultimate function of the value returned from another widget. For example, suppose we want to limit the rows displayed to range from 0 to the total number of rows associated with a particular “subselected” dataframe. Using the .observe() method applied to a widget, we can call a function whenever that widget is interacted with that acts on its new value:

iw_property_register3

c_slider = widgets.IntSlider(min=0, max=len(df), step=1, value=5)
d_select =  widgets.Select(options=items)

def update_c_range(*args):
    if d_select.value=='All':
        c_slider.max = len(df)
    else:
        c_slider.max = len(df[df['Asset Type Description']==d_select.value])

d_select.observe(update_c_range, 'value')

def view3(x='',y=3):
    if x=='All': return df.head(y)
    return df[df['Asset Type Description']==x].head(y)

widgets.interactive(view3,y=c_slider,x=d_select)