Chatting With ONS Data Via a Simple Slack Bot

A recent post on the ONS Digital blog – Dueling with datasets – describes some of the design decisions taken when putting together the new Office for National Statistics website (such as having a single page for a particular measure that would provide the current figures at the top as well as historical figures further down the page) and some of the challenges still facing the team (such as the language and titling used to describe the statistics).

The emphasis is still very much on publishing the data via a website, however, which promotes two particular sorts of interaction style: browse and search. Via Laura Dewis (Deputy Director, Digital Publishing at Office for National Statistics, and ex- of the OpenLearn parish), I got a peek at some of the popular search terms used on the pre-updated website, which suggest (to me) a mix of vernacular keyword search terms as well as official terms (for example, rpi, baby names, cpi, gdp, retail price index, population, Labour Market Statistics unemployment, inflation, labour force survey).

Over the last couple of years, regular readers will have noticed that I’ve been dabbling with some simple data2text conversions, as well as dipping my toes into some simple custom slackbots (that is, custom slack robots…) capable of responding to simple parameterised queries with texts automatically generated from online data sources (for example, querying the Nomis JSA figures as part of a Slackbot Data Wire, Initial Sketch or my First Steps in a Conversational Slackbot interface to CQC Inspection Data ).

I’m still fumbling around how best to try to put these bots together. On the one hand is trying to work out what sorts of thing we might want to ask of the data, as well as how we might actually ask for it in natural language terms. On the other, is generating queries over the data, and figuring out how to provide the response (creating a canned text around the results from a data query).

But what if there was already a ready source of text interpreting particular datasets that could be used as the response part of a conversational data agent? Then all we’d have to focus on would be parsing queries and matching them to the texts?

A couple of weeks ago, when the new ONS website came out of beta, the human facing web pages were complemented with a data view in the form of JSON feeds that mirrored the HTML text (I don’t know if the HTML is actually generated from the JSON feeds?), as described in More Observations on the ONS JSON Feeds – Returning Bulletin Text as Data. So here we have a ready source of data interpreting text that we may be able to use to provide a backend to a conversational UI to the ONS content. (Whether or not the text is human generated or machine generated is irrelevant – though it does also provide a useful model for developing and testing my own data to text routines!)

So let’s see… it being to wet to go and dig the vegetable patch yesterday, I thought I’d have a quick play trying to put together some simple response rules, in part building on some of the ONS JSON parsing code I started putting together following the ONS website refresh.

Here’s a snapshot of where I’m at…

Firstly, asking for a summary of some popular recent figures:

dtest___OUseful_Slack_1

The latest figures are assumed for some common keyword driven queries. We can also ask for a chart:

dtest___OUseful_Slack_2

The ONS publish different sorts of product that can be filtered against:

rate_-_Search_-_Office_for_National_Statistics

So for example, we can run a search to find what bulletins are available on a particular topic:

dtest___OUseful_Slack_3

(For some reason, the markdown isn’t being interpreted as such?)

We can then go on to ask about a particular bulletin, and get the highlights from it:

dtest___OUseful_Slack_4

(I did wonder about numbering the items in the list, retaining the state of the previous response in the bot, and then allowing an interaction along the lines of “tell me more about item 3”?)

We can also ask about other publication types, but I haven’t checked the JSON yet to see whether it makes sense to handle the response from those slightly differently:

dtest___OUseful_Slack_5

At the moment, it’s all a bit Wizard of Oz, but it’s amazing how fluid you can be in writing queries that are matched by some very simple regular expressions:

dtest___OUseful_Slack_woz

So not bad for an hour or two’s play… Next steps would require getting a better idea about what sorts of conversation folk might want to have with the data, and what they actually expect to see in return. For example, it would be possible to mix in links to datafiles, or perhaps even upload datafiles to the slack channel?

PS Hmm, thinks.. what would a slack interface to a Jupyter server be like…?

More Observations on the ONS JSON Feeds – Returning Bulletin Text as Data

Whilst starting to sketch out some python functions for grabbing the JSON data feeds from the new ONS website, I also started wondering how I might be able to make use of them in a simple slackbot that could provide a crude conversational interface to some of the ONS stats.

(To this end, it would also be handy to see some ONS search logs to see what sort of things folk search – and how they phrase their searches…)

One of the ways of using the data is as the basis for some simple data2text scripts, that can report the outcomes of some simple canned analyses of the data (comparing the latest figures with those from the previous month, or a year ago, for example). But the ONS also produce commentary on various statistics for via their statistical bulletins – and it seems that these, too, are available in JSON form simply by adding /data to the end of the IRL path as before:

UK_Labour_Market_-_Office_for_National_Statistics

One thing to note is that whist the HTML view of bulletins can include a name element to focus the page on a particular element:

http://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/bulletins/uklabourmarket/february2016/#comparison-between-unemployment-and-the-claimant-count

the name attribute switch doesn’t work to filter the JSON output to that element (though it would be easy enough to script a JSON handler to return that focus) so there’s no point adding it to the JSON feed URL:

http://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/bulletins/uklabourmarket/february2016/data

One other thing to note about the JSON feed is that it contains cross-linked elements for items such as charts and tables. If you look closely at the above screenshot, you’ll see it contains a reference to an ons-table.

...
sections: [
...
{
title: "Summary of latest labour market statistics",
markdown: "Table A shows the latest estimates, for October to December 2015, for employment, unemployment and economic inactivity. It shows how these estimates compare with the previous quarter (July to September 2015) and the previous year (October to December 2014). Comparing October to December 2015 with July to September 2015 provides the most robust short-term comparison. Making comparisons with earlier data at Section (ii) has more information. <ons-table path="cea716cc" /> Figure A shows a more detailed breakdown of the labour market for October to December 2015. <ons-image path="718d6bbc" />"
},
...
]
...

This resource is then described in detail elsewhere in the data feed linked by the same ID value:

www_ons_gov_uk_employmentandlabourmarket_peopleinwork_employmentandemployeetypes_bulletins_uklabourmarket_february2016_data_comparison-between-unemployment-and-the-claimant-count

...
tables: [
{
title: "Table A: Summary of UK labour market statistics for October to December 2015, seasonally adjusted",
filename: "cea716cc",
uri: "/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/bulletins/uklabourmarket/february2016/cea716cc"
}
],
...

Images are identified via the ons-image tag, charts via the ons-chart tag, and so on.

So now I’m thinking – maybe this is the place to start thinking about a simple conversational UI? Something that can handle simple references into different parts of a bulletin, and return the ONS text as the response?