A Recipe for Automatically Going From Data to Text to Reveal.js Slides

Over the last few years, I’ve experimented on and off with various recipes for creating text reports from tabular data sets (spreadsheet plugins with a similar aim in mind are also starting to appear). There are several issues associated with this, including:

  • identifying what data or insight you want to report from your dataset;
  • (automatically deriving the insights);
  • constructing appropriate sentences from the data;
  • organising the sentences into some sort of narrative structure;
  • making the sentences read well together.

Another approach to humanising the reporting of tabular data is to generate templated webpages that review and report on the contents of a dataset; this has certain similarities to dashboard style reporting, mixing tables and charts, although some simple templated text may also be generated to populate the page.

In a business context, reporting often happens via Powerpoint presentations. Slides within the presentation deck may include content pulled from a templated spreadsheet, which itself may automatically generate tables and charts for such reuse from a new dataset. In this case, the recipe may look something like:


#render via: http://blockdiag.com/en/blockdiag/demo.html

blockdiag {
  Y1 [label='Powerpoint slide'];
  Y2 [label='Powerpoint slide'];

  data -> Excel -> Chart -> X1 -> Y1;
  Excel -> Table -> X2 -> Y2;
}

In the previous couple of posts, the observant amongst you may have noticed I’ve been exploring a couple of components for a recipe that can be used to generate reveal.js browser-based presentations reporting on the 20% of items that account for 80% of the spend.

The dataset I’ve been tinkering with is a set of monthly transparency spending data from the Isle of Wight Council. Recent releases have the form:


So as hinted at previously, it’s possible to use the following sort of process to automatically generate reveal.js slideshows from a Jupyter notebook with appropriately configured slide cells (actually, normal cells with an appropriate metadata element set) used as an intermediate representation.


blockdiag {
  X2 [label="Jupyter notebook\n(slide mode)"];
  Y2 [shape = "dots"];

  data -> "pandas dataframe" -> X1 -> X2 -> X3;
  "pandas dataframe" -> Y1, Y2, Y3 -> X2 -> X3;
}

There’s an example slideshow based on October 2016 data here. Note that some slides have “subslides”, that is, slides underneath them, so watch the arrow indicators bottom left to keep track of when they’re available. Note also that the scrolling is a bit hit and miss – ideally, a new slide would always be scrolled to the top, and for fragments inserted into a slide one at a time the slide should scroll down to follow them.

The structure of the presentation is broadly as follows:


For example, here’s a summary slide of the spends by directorate – note that we can embed charts easily enough. (The charts are styled using seaborn, so a range of alternative themes are trivially available). The separate directorate items are brought in one at a time as fragments.


The next slide reviews the capital versus expenditure revenue spend for a particular directorate, broken down by expenses type (corresponding slides are generated for all other directorates). (I also did a breakdown for each directorate by service area.)

The items listed are ordered by value, and taken together account for at least 80% of the spend in the corresponding area. Any further items contributing more than 5%(?) of the corresponding spend are also listed.


Notice that subslides are available going down from this slide, rather than across the main slides in the deck. This 1.5D structure means we can put an element of flexible narrative design into the presentation, giving the reader an opportunity to explore the data, but in a constrained way.

In this case, I generated subslides for each major contributing expenses type to the capital and revenue pots, and then added a breakdown of the major suppliers for that spending area.


This just represents a first pass at generating a 1.5D slide deck from a tabular dataset. A Pareto (80/20) heuristic is used to try to prioritise the information displayed, in order to account for 80% of spend in different areas, or other significant contributions.

Applying this principle repeatedly allows us to identify major spending areas, and then major suppliers within those spending areas.
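By way of a sketch, the repeated application might look something like this – the transactions table and the simple “enough items to reach 80%” filter are both made up for illustration:

```python
import pandas as pd

# Made-up transactions: spending area, supplier, amount
df = pd.DataFrame({
    'area': ['A', 'A', 'A', 'B', 'B', 'C'],
    'supplier': ['s1', 's2', 's3', 's4', 's5', 's6'],
    'amount': [600, 150, 50, 150, 30, 20],
})

def top80(s):
    # Keep the largest items needed to reach 80% of the total
    s = s.sort_values(ascending=False)
    cum = 100 * s.cumsum() / s.sum()
    return s[cum.shift(fill_value=0) < 80]

# First pass: major spending areas...
areas = top80(df.groupby('area')['amount'].sum())
# ...second pass: major suppliers within each of those areas
suppliers = {a: top80(df[df['area'] == a].groupby('supplier')['amount'].sum())
             for a in areas.index}
```

The same filter applied at each level gives the nested “major areas, then major suppliers within them” structure.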

The next step is to look at other ways of segmenting and structuring the data in order to produce reports that might actually be useful…

If you have any ideas, please let me know via the comments, or get in touch directly…

PS FWIW, it should be easy enough to run any other dataset that looks broadly like the example at the top through the same code with only a couple of minor tweaks…

Accounting for the 80% – A Quick Pareto Principle Filter for pandas

Having decided (again) to try to do something with local government transparency spending data this year, I thought I’d start by generating some simple reports that just identify the significant spending areas within a particular directorate or service area.

It’s easy enough to render dozens of charts that show some bars bigger than others, and from this suggest the significant spending areas, but this still requires folk to spend time reading those charts and runs the risk that they don’t “read” from the chart what you wanted them to… (This is one way in which titles and captions can help…)


So how about putting the Pareto Principle, or 80/20 rule, to work, where 80% of some effect or other (such as spending) is accounted for by 20% of contributory effects (such as spending in a particular area, or to a particular set of suppliers)?

In other words, is one way in to the spending data to use it simply to see what accounts for 80%, or thereabouts, of monthly spend?

Here’s a quick function that tries to do something like that, that can be applied to a pandas Series:

import pandas as pd

def paretoXY(s, x, y, threshold=0):
    ''' Return a list until you account for X% of the whole and remainders are less than Y% individually.
        The threshold percentage allows you to hedge your bets and check items just past the threshold. '''
    #Generate percentages, sort, and find the accumulated total
    #Exclude any negative payments that can make the cumulative percentage go > 100% before we get to them
    df = pd.DataFrame( s[s>0].sort_values(ascending=False) )
    col = df.columns[0]
    df['pc'] = 100*df[col]/df[col].sum()
    df['cum_pc'] = df['pc'].cumsum()
    #Return ordered items that either together account for at least X% of the total,
    # or individually account for at least Y% of the total
    #The threshold provides a fudge factor on both components...
    return df[ (df['cum_pc']-df['pc'] <= x + x*threshold/100) | (df['pc'] >= y - y*threshold/100) ]
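For example, applying that kind of filter to a toy spending series (made-up numbers) keeps the items that together reach 80% of the total, plus any stragglers with more than a 5% individual share:

```python
import pandas as pd

# Made-up spend-by-supplier figures
s = pd.Series({'A': 500, 'B': 300, 'C': 100, 'D': 60, 'E': 40})

df = s.sort_values(ascending=False).to_frame('spend')
df['pc'] = 100 * df['spend'] / df['spend'].sum()
df['cum_pc'] = df['pc'].cumsum()

# Items that together reach 80% of the total, plus any others over 5% individually
pareto = df[(df['cum_pc'] - df['pc'] <= 80) | (df['pc'] >= 5)]
```

Here A, B and C are needed to reach 80% of the total, and D sneaks in on the 5% individual-share rule.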


The resulting report simply describes just the components that either make up 80% (or whatever) of the total in each area, or that represent a significant contribution (howsoever defined), in their own right, to the corresponding total.

In the above case, the report describes the significant expense types in capital or revenue streams for each directorate for a given month’s spending data.

The resulting dataframe can also be converted to a crude text report summarising percentage contributions to specific areas easily enough:
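A minimal sketch of that text generation step, using a hand-rolled sentence template and illustrative numbers rather than the real dataframe:

```python
import pandas as pd

# Illustrative percentage contributions, not real council data
pareto = pd.DataFrame({'spend': [500.0, 300.0], 'pc': [50.0, 30.0]},
                      index=['Adult Services', 'Place'])

# Template a crude sentence per row, then join into a report paragraph
lines = ['{} accounted for {:.1f}% of the spend (£{:,.2f}).'.format(name, row['pc'], row['spend'])
         for name, row in pareto.iterrows()]
report = ' '.join(lines)
```

Crude, but it already reads better than a bare table; smoothing the sentences so they flow together is the harder part.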


Automatically Generating Two Dimensional Reveal.js Documents Using Jupyter Notebooks

One of the things I finally got round to exploring whilst at the Reproducible Research Using Jupyter Notebooks curriculum development hackathon was the ability to generate slideshows from Jupyter notebooks.

The underlying slideshow presentation framework is reveal.js. This uses a 1.5(?) dimensional slide geometry, so slides can transition left to right, or you can transition down to subslides off a single slide.

This got me wondering… could I use a notebook/script to generate a reveal.js slideshow that could provide a convenient way of navigating automatically generated slideshows made up from automatically generated data2text slides?

The 1.5/two dimensional component would mean that slides could be structured by topic horizontally, with subtopic vertically downwards within a topic.

A quick test suggests that this is absolutely doable…

import nbformat as nb
import nbformat.v4.nbbase as nb4

#Create a new notebook to use as the slideshow container
test = nb4.new_notebook()

test.cells.append(nb4.new_markdown_cell('# Test slide1',metadata={"slideshow": {"slide_type": "slide"}}))
test.cells.append(nb4.new_markdown_cell('# Test slide2',metadata={"slideshow": {"slide_type": "slide"}}))
test.cells.append(nb4.new_markdown_cell('Slide2 extra content line 1\n\nSlide2 extra content line 2'))
test.cells.append(nb4.new_markdown_cell('# Test slide3',metadata={"slideshow": {"slide_type": "slide"}}))
test.cells.append(nb4.new_markdown_cell('Slide3 fragment 1',metadata={"slideshow": {"slide_type": "fragment"}}))
test.cells.append(nb4.new_markdown_cell('Slide3 fragment 2',metadata={"slideshow": {"slide_type": "fragment"}}))
test.cells.append(nb4.new_markdown_cell('# Slide4',metadata={"slideshow": {"slide_type": "slide"}}))
test.cells.append(nb4.new_markdown_cell('Slide4 extra content line 1\n\nSlide4 extra content line 2'))
test.cells.append(nb4.new_markdown_cell('# Slide4 subslide',metadata={"slideshow": {"slide_type": "subslide"}}))

#Save the notebook to a file we can point nbconvert at (filename is arbitrary)
nbf = 'test_slides.ipynb'
nb.write(test, nbf)

#Generate and render slideshow
!jupyter nbconvert $nbf --to slides --post serve

Let the fun begin…:-)

PS here’s a second pass:

def addSlideComponent(notebook, content, styp=''):
    if styp in ['slide','fragment','subslide']: styp={"slideshow": {"slide_type":styp}}
    else: styp={}
    notebook.cells.append(nb4.new_markdown_cell(content, metadata=styp))

addSlideComponent(test,'# Test2 slide1','slide')
addSlideComponent(test,'# Test slide2','slide')
addSlideComponent(test,'Slide2 extra content line 1\n\nSlide2 extra content line 2')
addSlideComponent(test,'# Test slide3','slide')
addSlideComponent(test,'Slide3 fragment 1','fragment')
addSlideComponent(test,'Slide3 fragment 2','fragment')
addSlideComponent(test,'# Slide4','slide')
addSlideComponent(test,'Slide4 extra content line 1\n\nSlide4 extra content line 2')
addSlideComponent(test,'# Slide4 subslide','subslide')


Weekly Subseries Charts – Plotting NHS A&E Admissions

A post on Richard “Joy of Tax” Murphy’s blog a few days ago caught my eye – Data shows January is often the quietest time of the year for A & E departments – with a time series chart showing weekly admission numbers to A&E from a time when the numbers were produced weekly (they’re now produced monthly).

In a couple of follow up posts, Sean Danaher did a bit more analysis to reinforce the claim, again generating time series charts over the whole reporting period.

For me, this just cries out for a seasonal subseries plot. These are typically plotted over months or quarters and show, for each month (or quarter), the year-on-year change of an indicator value. Rendering weekly subseries plots is a bit more cluttered – 52 weekly subcharts rather than 12 monthly ones – but still doable.

I haven’t generated subseries plots from pandas before, but the handy statsmodels Python library has a charting package that looks like it does the trick. The documentation is a bit sparse (I looked to the source…), but given a pandas dataframe and a suitable period based time series index, the chart falls out quite simply…

Here’s the chart and then the code… the data comes from NHS England, A&E Attendances and Emergency Admissions 2015-16 (2015.06.28 A&E Timeseries).


(Yes, yes I know; needs labels etc etc; but it’s a crappy graph, and if folk want to use it they need to generate a properly checked and labelled version themselves, right?!;-)

import pandas as pd
# !pip3 install statsmodels
import statsmodels.graphics.tsaplots as tsaplots
import matplotlib.pyplot as plt

!wget -P data/ https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2015/04/2015.06.28-AE-TimeseriesBaG87.xls

#Load the spreadsheet; the skiprows value is a guess at the preamble rows in the original file
dfw = pd.read_excel('data/2015.06.28-AE-TimeseriesBaG87.xls', header=None, skiprows=14)

#Faff around with column headers, empty rows etc
dfw = dfw.fillna(axis=1,method='ffill').T.set_index([0,1]).T.dropna(how='all').dropna(axis=1,how='all')

#pandas has super magic "period" datetypes... so we can cast a week ending date to a week period
dfw['Reporting','_period'] = pd.to_datetime(dfw['Reporting','Period'].str.replace('W/E ',''), format='%d/%m/%Y').dt.to_period('W')

#Check the start/end date of the weekly period
print(dfw['Reporting','_period'].iloc[0].start_time, dfw['Reporting','_period'].iloc[0].end_time)

#Timeseries traditionally have the datey-timey thing as the index
dfw = dfw.set_index([('Reporting', '_period')])
dfw.index.names = ['_period']

#Generate a matplotlib figure/axis pair to give us easier access to the chart chrome
fig, ax = plt.subplots()

#statsmodels has quarterly and monthly subseries plot helper functions
#but underneath, they use a generic seasonal plot
#If we groupby the week number, we can plot the seasonal subseries on a week number basis
tsaplots.seasonal_plot(dfw['A&E attendances']['Total Attendances'].groupby(dfw.index.week),
                       list(range(1,53)), ylabel='Total Attendances', ax=ax)

#Tweak the display
fig.set_size_inches(18.5, 10.5)
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=90);

As to how you read the chart – each line shows the trend over years for a particular week’s figures. The week number is along the x-axis. This chart type is really handy for letting you see a couple of things: year on year trend within a particular week; repeatable periodic trends over the course of the year.

A glance at the chart suggests weeks 24-28 (months 6/7 – so June/July) are the busy times in A&E?

PS the subseries plot uses pandas timeseries periods; see eg Wrangling Time Periods (such as Financial Year Quarters) In Pandas.
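For example, pandas’ weekly periods know their own start and end dates, which is handy for checking how a “week ending” date gets aligned (the default 'W' frequency runs Monday to Sunday):

```python
import pandas as pd

# A 'W/E' (week ending) date string, cast to a weekly period
p = pd.to_datetime('W/E 06/09/2015'.replace('W/E ', ''), format='%d/%m/%Y').to_period('W')
start, end = p.start_time.date(), p.end_time.date()
```

So a Sunday “week ending” date lands in the Monday-to-Sunday period that ends on it.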

A Unit of Comparison for Local Council Budget Consultations, Based on Transparency Spending Data – ASCdays?

A few years ago, via the BBC Radio 4 & World Service programme More or Less (incidentally co-produced by the OU), I came across the notion of micromorts (Contextualising the Chance of Something Happening – Micromorts), a one-in-a-million chance of death that can be used as a unit of risk to compare various likelihoods of dying. Associated with this measure is David Spiegelhalter’s microlife, “30 minutes of your life expectancy”. The point behind the microlife measure is that it provides a way of comparing life threatening risks based on how much life is likely to be lost, on average, when exposed to such risks.

Life expectancy for a man aged 22 in the UK is currently about 79 years, which is an extra 57 years, or 20,800 days, or 500,000 hours, or 1 million half hours. So, a young man of 22 typically has 1,000,000 half-hours (57 years) ahead of him, the same as a 26 year-old woman. We define a microlife as the result of a chronic risk that reduces life, on average, by just one of the million half hours that they have left.

The idea of micromorts came to mind last night as I was reflecting on a public budget consultation held by the Isle of Wight Council yesterday (a day that also saw the Council’s Leader and Deputy Leader resign their positions). The Council needs to find £20 million of budget savings over the next 3 years, starting with £7.5m in the next financial year. This can come through increased funding, or cuts. By far the biggest chunk of expenditure by the council, as with all councils, is on adult social care (ASC) [community care statistics /via @jonpoole].

As with every year for the past however many years, I’ve had a vague resolution to do something with local council spending data, and never got very far. Early dabblings with the data that I’ve so far played with this year (and intend to continue…) reinforce the notion that ASC is expensive. Here’s a quick summary of the spending data items for October, 2016:

The spend for each of the directorates was as follows:

  • Adult Services:
    • total spend: £7,746,875.55 (48.33% of total monthly spend)
    • capital: £395,900.06 (5.11% of directorate monthly spend)
    • revenue: £7,350,975.49 (94.89% of directorate monthly spend)
  • Chief Executive:
    • total spend: £501,021.32 (3.13% of total monthly spend)
    • capital: £492,507.54 (98.30% of directorate monthly spend)
    • revenue: £8,513.78 (1.70% of directorate monthly spend)
  • Childrens Services:
    • total spend: £2,044,524.26 (12.76% of total monthly spend)
    • capital: £243,675.08 (11.92% of directorate monthly spend)
    • revenue: £1,800,849.18 (88.08% of directorate monthly spend)
  • Place:
    • total spend: £4,924,117.40 (30.72% of total monthly spend)
    • capital: £974,024.13 (19.78% of directorate monthly spend)
    • revenue: £3,950,093.27 (80.22% of directorate monthly spend)
  • Public Health:
    • total spend: £434,654.13 (2.71% of total monthly spend)
    • revenue: £434,654.13 (100.00% of directorate monthly spend)
  • Regeneration:
    • total spend: £57.65 (0.00% of total monthly spend)
    • revenue: £57.65 (100.00% of directorate monthly spend)
  • Resources:
    • total spend: £377,172.20 (2.35% of total monthly spend)
    • capital: £20,367.87 (5.40% of directorate monthly spend)
    • revenue: £356,804.33 (94.60% of directorate monthly spend)

Cancelling out Adult Services revenue spend for a month would match the £7.5 million required to make up next year’s funds. That’s unlikely to happen, but it does perhaps hint at a possible unit of comparison when trying to make budget decisions, or at least, support budget consultations.

From my naive perspective, adult social care needs to support a certain number of people, a number that evolves (probably?) in line with demographics. One of the ways people exit care is by dying, though the service is set up to minimise harm and help prolong life. Folk may also be transiently cared for (that is, they enter the care system and then leave it). By looking at the amount spent on adult social care, we can come up with an average cost (mean, median?) per person per day of adult social care – ASCdays. We can reduce the total cost by reducing the amount of time folk spend in the system, either by shortening transient stays or postponing entry into the system.
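A back-of-the-envelope sketch, with a wholly made-up client count, of how an ASCday rate might be derived and then used as a conversion unit:

```python
# Made-up figures for illustration: monthly ASC revenue spend and an
# assumed number of people in the adult social care system
monthly_asc_revenue = 7350975.49   # October 2016 Adult Services revenue spend
clients = 3000                     # hypothetical client count
days_in_month = 31

# Average cost per person per day of adult social care: the ASCday
asc_day = monthly_asc_revenue / (clients * days_in_month)

# Express some other cost in ASCday equivalents
respite_week_cost = 15000.0        # hypothetical
equiv_ascdays = respite_week_cost / asc_day
```

The mean is the obvious first stab, though a median (or a distribution) would be less distorted by a few very expensive care packages.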

So what I’ve started wondering is this: as one way of trying to make sense of transparency spending data, is there any use in casting it into equivalent units of ASCdays? If we use ASCday equivalent units, can we take a weak systems view and try to get a feel for whether a cut to a particular service (or improvement of another) can help us get a handle on the ASC expenditure – or whether it might cause problems down the line?

For example, suppose a week’s respite care costs the same as two weeks’ worth of ASCdays. If that week’s respite care keeps someone out of the adult care service for a month, we’re quids in. If cutting respite care saves 100 ASCdays of funding, but is likely to bring just one person into the care system 3 months early, we might start to doubt whether it will actually lead to any saving at all. (Longer-term savings complicate matters, given councils need to balance a budget within a financial year. Spending money this year to save next year requires access to reserves – and confidence in your bet…)

For trying to make budget decisions, or helping engage citizens in budgetary consultations, costing things as per ASCday equivalents, and then trying to come up with some probabilities about the likelihood that a particular cut or expense will result in a certain number of people entering or leaving ASC sooner or later, may help you get a feel for the consequences for a particular action.

As to whether prior probabilities exist around whether cutting this service, or supporting that, are likely to impact on the adult care system, maybe data for that is out there, also?

EU Legal Affairs – Proposals on Civil Law Rules on Robotics

The EU Legal Affairs Committee voted last week to call for legislation that is likely to include a definition of “smart autonomous robots”, regulation of the same, and an ethical code of conduct for designers, producers and users.

Note that the full proposal to be put forward is not currently available – the draft is here (COMMITTEES-JURI-PR-2017-01-12-1095387EN), with amendments.

One thing I noticed in the original document was that it seemed limited in its definition of “smart autonomous robots” as physically instantiated things, and ignored more general “smart systems” – AI software systems, for example. An amendment in PE592.405 addressed this, broadening the scope to cover “AI” more generally: smart autonomous robots and their subcategories by taking into consideration the following characteristics of a smart robot and an autonomous system, that comprised a physical support or is connected to a software programme without being embedded in a physical support.

When the topic was originally announced, a big thing was made in the news about the calls for robots to be classed as “electronic persons”. A policy study – European Civil Law Rules In Robotics – that fed into the deliberations attempted to debunk this:

3.1. Incongruity of establishing robots as liable legal persons
The motion for a resolution proposes creating a new category of individual, specifically for robots: electronic persons. Paragraph 31(f) calls upon the European Commission to explore the legal consequences of “creating a specific legal status for robots, so that at least the most sophisticated autonomous robots could be established as having the status of electronic persons with specific rights and obligations, including that of making good any damage they may cause [to third parties], and applying electronic personality to cases where robots make smart autonomous decisions or otherwise interact with third parties”.

When considering civil law in robotics, we should disregard the idea of autonomous robots having a legal personality, for the idea is as unhelpful as it is inappropriate.

Traditionally, when assigning an entity legal personality, we seek to assimilate it to humankind. This is the case with animal rights, with advocates arguing that animals should be assigned a legal personality since some are conscious beings, capable of suffering, etc., and so of feelings which separate them from things. Yet the motion for a resolution does not tie the acceptance of the robot’s legal personality to any potential consciousness. Legal personality is therefore not linked to any regard for the robot’s inner being or feelings, avoiding the questionable assumption that the robot is a conscious being. Assigning robots such personality would, then, meet a simple operational objective arising from the need to make robots liable for their actions.

… the motion for a resolution would appear more inclined to fully erase the human presence. In viewing as an electronic person any “robots [which] make smart autonomous decisions or otherwise interact with third parties” (end of paragraph 31(f)), the motion seems to suggest that the robot itself would be liable and become a legal actor. This analysis finds support in paragraph S, which states that “the more autonomous robots are, the less they can be considered simple tools in the hands of other actors […] [and this] calls for new rules which focus on how a machine can be held — partly or entirely — responsible for its acts or omissions”. Once a robot is no longer controlled by another actor, it becomes the actor itself. Yet how can a mere machine, a carcass devoid of consciousness, feelings, thoughts or its own will, become an autonomous legal actor? How can we even conceive this reality as foreseeable within 10 to 15 years, i.e. within the time frame set in paragraph 25 of the motion for a resolution? From a scientific, legal and even ethical perspective, it is impossible today — and probably will remain so for a long time to come — for a robot to take part in legal life without a human being pulling its strings.

What is more, considering that the main purpose of assigning a robot legal personality would be to make it a liable actor in the event of damage, we should note that other systems would be far more effective at compensating victims; for example, an insurance scheme for autonomous robots, perhaps combined with a compensation fund (paragraphs 31(a) to (e)).

We also have to bear in mind that this status would unavoidably trigger unwanted legal consequences. Paragraph T of the motion states that creating a legal personality would mean that robots’ rights and duties had to be respected. How can we contemplate conferring rights and duties on a mere machine? How could a robot have duties, since this idea is closely linked with human morals? Which rights would we bestow upon a robot: the right to life (i.e. the right to non-destruction), the right to dignity, the right to equality with humankind, the right to retire, the right to receive remuneration (an option explicitly explored in paragraph 31(b) of the motion), etc.? …

In reality, advocates of the legal personality option have a fanciful vision of the robot, inspired by science-fiction novels and cinema. They view the robot — particularly if it is classified as smart and is humanoid — as a genuine thinking artificial creation, humanity’s alter ego. We believe it would be inappropriate and out-of-place not only to recognise the existence of an electronic person but to even create any such legal personality. Doing so risks not only assigning rights and obligations to what is just a tool, but also tearing down the boundaries between man and machine, blurring the lines between the living and the inert, the human and the inhuman. Moreover, creating a new type of person – an electronic person – sends a strong signal which could not only reignite the fear of artificial beings but also call into question Europe’s humanist foundations. Assigning person status to a nonliving, non-conscious entity would therefore be an error since, in the end, humankind would likely be demoted to the rank of a machine. Robots should serve humanity and should have no other role, except in the realms of science-fiction.

Yes but, no but… replace “robot” with “company” in the above. Companies have a legal basis, are defined as legal entities, albeit requiring (in the UK at least), at least one human officer to be responsible for them. To what extent could smart systems be similarly treated, or incorporated?

Or consider this – employees, who have human agency have responsibilities, and liabilities incurred by them may or may not become liabilities for their employer. To the extent that smart systems have “machine agency”, to what extent might they also be classed as employees, eg for insurance or liability purposes? (Related: Fragments – Should Algorithms, Deep Learning AI Models and/or Robots be Treated as Employees?.) To be an “employee”, do you also have to be a legal entity?

PS there’s also a clause in the amendments that expresses neo-Luddite tendencies with which I am not totally unsympathetic:

“whereas robotics and AI that can perform similar tasks to those performed by humans should be used mainly to support and boost the abilities of man, as opposed to trying to replace the human element completely”

I haven’t read all the amendments yet – there may be more nuggets in there…

Convention Based URLs Support Automation

I spent a chunk of last week at a Curriculum Development Hackathon for a Data Carpentry workshop on Reproducible Research using Jupyter Notebooks (I’d like to thank the organisers for the travel support). One of the planned curriculum areas looked at data project organisation, another at automation. Poking around on an NHS data publication webpage for a particular statistical work area just now suggests an example of how the two inter-relate… and how inconsistent URLs or filenames make automatically downloading similar files a bit of a faff when it could be so easy…

To begin with, the A&E Attendances and Emergency Admissions statistical work area has URL:


The crumb trail in the on-page navigation has the form:

Home -> Statistics -> Statistical Work Areas -> A&E Attendances and Emergency Admissions

which we might note jars somewhat with the slug ae-waiting-times-and-activity, and perhaps reflects some sort of historical legacy in how the data was treated previously…

Monthly data is collected on separate financial year related pages linked from that page:



The breadcrumb for these pages has the form:

Home -> Statistics -> Statistical Work Areas -> A&E Attendances and Emergency Admissions -> A&E Attendances and Emergency Admissions 20MM-NN

A few observations about those financial year related page URLs. Firstly, the path is rooted on the parent page (a Good Thing), but the slug appears mangled together from a more reasonable parent path: statistical-work-areasae-waiting-times-and-activity looks as if it has been collapsed from statistical-work-areas/ae-waiting-times-and-activity.

The next part of the URL specifies the path to the A & E Attendances and Emergency Admissions page for a particular year, with an appropriate slug for the name – ae-attendances-and-emergency-admissions- but differently formed elements for the years: 2016-17 compared to 2015-16-monthly-3.

(Note that the 2015-16 monthly listing is incomplete and starts in June 2015.)

If we look at URLs for some of the monthly 2016-17 Excel data file downloads, we see inconsistency in the filenames:


(Note that CSV data seems only to be available for the latest (November 2016) data set. I don’t know if this means that the CSV data link only appears for the current month, or data in the CSV format only started to be published in November 2016.)

For the previous year we get:


and so on.

Inspection of these URLs suggests:

  1. the data is being uploaded to and published from a WordPress site (wp-content/uploads);
  2. the path to the data directory for the annual collection is minted according to the month in which the first dataset of the year is uploaded (data takes a month or two to be uploaded, so presumably the April 2016 data was posted in June 2016 (2016/06); the 2015 data started late, the first month (June 2015) presumably being uploaded in August of that year (2015/08));
  3. the month slug for the data file starts off fine, being of the form MONTH-YEAR-AE-by-provider-, but then breaks things by having some sort of code value that perhaps uniquely identifies the version of the file;
  4. the month slug may be further broken by the addition of a revision element (eg -Revised-11082016).

If the URLs all had a common pattern, it would be easy enough to automate their generation from a slug pattern and month/year combination, and then automatically download them. (I haven’t yet explored inside each spreadsheet to see what inconsistency errors/horrors make it non-trivial to try to combine the monthly data into a single historical data set…)
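For the sake of argument, if the filenames did follow a regular pattern (the pattern below is an invented, regularised one – the real filenames don’t behave), generating the URLs would be trivial:

```python
import calendar

# Invented, regularised pattern - the real filenames on the NHS site don't follow it
BASE = 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/{uy}/{um:02d}/'
SLUG = '{mname}-{dy}-AE-by-provider.xls'

def monthly_url(data_year, data_month, upload_year, upload_month):
    # data_year/month: the reporting period; upload_year/month: when the file was posted
    fname = SLUG.format(mname=calendar.month_name[data_month], dy=data_year)
    return BASE.format(uy=upload_year, um=upload_month) + fname

url = monthly_url(2016, 4, 2016, 6)
```

With a pattern like that, a year’s worth of downloads is just a loop over month numbers.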

As it is, to automate the download of the files requires scraping the web pages for the links, or manually retrieving them. (At least the link text on the pages seems to be reasonably consistent!)
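A minimal stdlib-only sketch of the scraping approach, run here over a made-up HTML fragment in lieu of the real page:

```python
from html.parser import HTMLParser

# Collect hrefs of .xls download links; the HTML below is a made-up stand-in for the real page
class XlsLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value and value.endswith('.xls'):
                    self.links.append(value)

html = '''<ul>
<li><a href="/uploads/2016/06/April-2016-AE-by-provider-h7G2.xls">April</a></li>
<li><a href="/uploads/2016/06/May-2016-AE-by-provider-Kq9.xls">May</a></li>
<li><a href="/about">About</a></li>
</ul>'''

parser = XlsLinkParser()
parser.feed(html)
```

The harvested links can then be fetched one by one, whatever inconsistent names they happen to have.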