A recent post on the ONS Digital blog – Dueling with datasets – describes some of the design decisions taken when putting together the new Office for National Statistics website (such as having a single page for a particular measure that would provide the current figures at the top as well as historical figures further down the page) and some of the challenges still facing the team (such as the language and titling used to describe the statistics).
The emphasis is still very much on publishing the data via a website, however, which promotes two particular sorts of interaction style: browse and search. Via Laura Dewis (Deputy Director, Digital Publishing at Office for National Statistics, and ex- of the OpenLearn parish), I got a peek at some of the popular search terms used on the pre-updated website, which suggest (to me) a mix of vernacular keyword search terms as well as official terms (for example, rpi, baby names, cpi, gdp, retail price index, population, Labour Market Statistics unemployment, inflation, labour force survey).
Over the last couple of years, regular readers will have noticed that I’ve been dabbling with some simple data2text conversions, as well as dipping my toes into some simple custom slackbots (that is, custom Slack robots…) capable of responding to simple parameterised queries with texts automatically generated from online data sources (for example, querying the Nomis JSA figures as part of a Slackbot Data Wire, Initial Sketch or my First Steps in a Conversational Slackbot interface to CQC Inspection Data).
I’m still fumbling around how best to put these bots together. On the one hand, there’s working out what sorts of things we might want to ask of the data, as well as how we might actually ask for it in natural language terms. On the other, there’s generating queries over the data and figuring out how to provide the response (creating a canned text around the results from a data query).
But what if there was already a ready source of text interpreting particular datasets that could be used as the response part of a conversational data agent? Then all we’d have to focus on would be parsing queries and matching them to the texts.
A couple of weeks ago, when the new ONS website came out of beta, the human-facing web pages were complemented with a data view in the form of JSON feeds that mirrored the HTML text (I don’t know if the HTML is actually generated from the JSON feeds?), as described in More Observations on the ONS JSON Feeds – Returning Bulletin Text as Data. So here we have a ready source of data-interpreting text that we may be able to use as a backend to a conversational UI to the ONS content. (Whether the text is human generated or machine generated is irrelevant – though it does also provide a useful model for developing and testing my own data-to-text routines!)
So let’s see… it being too wet to go and dig the vegetable patch yesterday, I thought I’d have a quick play at putting together some simple response rules, in part building on some of the ONS JSON parsing code I started putting together following the ONS website refresh.
Here’s a snapshot of where I’m at…
Firstly, asking for a summary of some popular recent figures:
The latest figures are assumed for some common keyword driven queries. We can also ask for a chart:
The ONS publish different sorts of product that can be filtered against:
So for example, we can run a search to find what bulletins are available on a particular topic:
(For some reason, the markdown isn’t being interpreted as such?)
We can then go on to ask about a particular bulletin, and get the highlights from it:
(I did wonder about numbering the items in the list, retaining the state of the previous response in the bot, and then allowing an interaction along the lines of “tell me more about item 3”?)
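As a rough illustration of that idea (a sketch only, not the bot’s actual code), the state could be as simple as remembering the last list that was sent back. The `Conversation` class, its method names and the placeholder titles below are all hypothetical, as is the choice of Python:

```python
import re


class Conversation:
    """Hypothetical per-channel state: remembers the last numbered list sent."""

    def __init__(self):
        self.last_items = []

    def list_items(self, titles):
        # Store the items so that a later message can refer back to them by number.
        self.last_items = list(titles)
        return "\n".join(
            f"{i}. {title}" for i, title in enumerate(self.last_items, start=1)
        )

    def follow_up(self, text):
        # Match phrases such as "tell me more about item 3".
        m = re.search(r"\bitem\s+(\d+)\b", text, re.IGNORECASE)
        if not m:
            return None
        idx = int(m.group(1))
        if 1 <= idx <= len(self.last_items):
            return self.last_items[idx - 1]
        return None  # number out of range for the last list


# Placeholder titles, not real ONS bulletin names.
chat = Conversation()
reply = chat.list_items(["First bulletin", "Second bulletin", "Third bulletin"])
selected = chat.follow_up("tell me more about item 3")
```

In a real bot, the value returned by `follow_up` would presumably be a bulletin URL or identifier to look up, rather than just a title.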
We can also ask about other publication types, but I haven’t checked the JSON yet to see whether it makes sense to handle the response from those slightly differently:
At the moment, it’s all a bit Wizard of Oz, but it’s amazing how fluid you can be in writing queries that are matched by some very simple regular expressions:
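To give a flavour of how far a handful of patterns can go, here is a minimal sketch in Python. The rule table, the intent names (`chart`, `search`, `summary`) and the `match_query` function are all made up for illustration and are not the expressions the bot actually uses; the keywords are borrowed from the popular search terms mentioned earlier.

```python
import re

# Hypothetical rules, tried in order: the first pattern that matches wins.
RULES = [
    # e.g. "show me a chart of CPI"
    (re.compile(r"\b(?:chart|plot|graph)\b.*?\b(?P<measure>cpi|rpi|gdp)\b", re.I),
     "chart"),
    # e.g. "what bulletins are there about house prices?"
    (re.compile(r"\b(?:what|which)\b.*\b(?P<product>bulletins?|articles?|datasets?)\b"
                r".*\b(?:on|about)\s+(?P<topic>.+?)\??$", re.I),
     "search"),
    # Fallback: any bare mention of a well-known measure asks for a summary.
    (re.compile(r"\b(?P<measure>cpi|rpi|gdp|inflation|unemployment|population)\b", re.I),
     "summary"),
]


def match_query(text):
    """Return (intent, parameters) for the first matching rule, else (None, {})."""
    for pattern, intent in RULES:
        m = pattern.search(text.strip())
        if m:
            params = {k: v.lower() for k, v in m.groupdict().items() if v}
            return intent, params
    return None, {}
```

Because the rules are tried in order, the more specific patterns need to come before the catch-all keyword rule; otherwise "show me a chart of CPI" would be treated as a plain summary request.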
So not bad for an hour or two’s play… Next steps would require getting a better idea about what sorts of conversation folk might want to have with the data, and what they actually expect to see in return. For example, it would be possible to mix in links to data files, or perhaps even upload data files to the Slack channel?
PS Hmm, thinks.. what would a slack interface to a Jupyter server be like…?