Over the last couple of years, I’ve been dabbling with producing simple textual reports from datasets that can be returned in response to simple natural language style queries using chat interfaces such as Slack (for example, Sketching a Slack Slash Parliamentary Auto-Responder Using AWS Lambda Functions). The Amazon Echo, which launches in the UK at the end of September, provides another context for publishing natural language style responses, in this case in the form of spoken responses to spoken requests.
In the same way that apps brought a large amount of feature level functionality to mobile phones, the Amazon Echo provides an opportunity for publishers to develop “skills” that can respond to particular voice commands issued within hearing of the Echo. Amazon is hopeful that one class of commands – Smart Home Skills – will be used to bootstrap a smart home ecosystem that allows you to interact with smart-home devices through voice commands, such as commands to turn your lights on and off, or questions about the status of your home (“is the garage door still open?”, for example). Another class of services relates to more general information based services, or even games, which can be developed using a second API environment, the Alexa Skills Kit. For a full range of available skills, see the Alexa Skills Store.
The Alexa Skills Kit has a similar sort of usability to other AWS services (i.e. it’s a bit rubbish…), but I thought I’d give it a go, repurposing some old functions around the UK Parliament API, such as finding out which committees a particular MP sits on, or who the members of a particular committee are, as well as adding some new ones.
For example, I thought it might be amusing to try to implement a skill that could respond to questions like the following:
- what written statements were published last week?
- were there any written statements published last Tuesday?
using some of the “natural language” date-related Python functions I dabbled with yesterday.
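(The built-in AMAZON.DATE slot type handles many of those phrases itself, but by way of illustration, a minimal standard-library helper for resolving a phrase like “last Tuesday” to an actual date might look something like the following sketch; the function name here is just for the example.)

    from datetime import date, timedelta

    WEEKDAYS = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']

    def last_weekday(weekday_name, today=None):
        # Return the date of the most recent previous occurrence of the named weekday
        today = today or date.today()
        target = WEEKDAYS.index(weekday_name.lower())
        # If today is the named day, step back a full week rather than returning today
        delta = (today.weekday() - target) % 7 or 7
        return today - timedelta(days=delta)

    # e.g. last_weekday('tuesday') returns the date of the Tuesday before today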
One of the nice things about the Alexa Skills API is that it also supports conversational contexts. For example, an answer to one of the above questions (generated by my code) might take the form “There were 27 written statements published then”, but session state associated with that response can also be passed back as metadata to the Alexa service, and then returned from Alexa as session metadata attached to a follow-up question. The answer to the follow-up question can then draw on context generated earlier in the conversation. So, for example, exchanges such as the following now become possible:
- Q: were there any written statements published last Tuesday?
- A: There were 27 written statements published then. Do you want to know them all?
- Q: No, just the ones from DCLG.
- A: Okay, there were three written statements issued by the Department for Communities and Local Government last week. One on …. by….; etc etc
So how can we build an Alexa Skill? I opted for implementing one using Python, with the answer engine running on my Reclaim Hosting webserver rather than as an AWS Lambda function, which is what I think Amazon would prefer. (The AWS Lambda functions are essentially free, but using them means you have to go through the pain of working with yet another AWS service.) For an example of getting a Python application up and running on your own web host using cPanel, see here.
To make life simpler, I installed the Flask-ASK library (docs), which extends the Flask web application framework so that it plays nicely with the Alexa Skills API. (There’s a standalone tutorial that runs without the need for any web hosting described here: Flask-Ask: A New Python Framework for Rapid Alexa Skills Kit Development.)
The Flask-Ask library allows you to create two sorts of response types in your application that can respond to “intents” defined as part of the Alexa skill itself:
- a statement, which is a response from Alexa that essentially closes a session;
- and a question, which keeps the session open and allows you to pop session state into the response so you can get it back as part of the next intent command issued from Alexa in that conversation.
The following bit of code shows how to decorate a function that will handle a particular Alexa Skill intent. The session variable can be used to pass session state back to Alexa that can be returned as part of the next intent. The question() wrapper packages up the response (txt) appropriately and keeps the conversational session alive.
@ask.intent("WrittenStatementIntent") def writtenStatement(period,myperiod): txt,tmp=statementGrabber(period=period,myperiod=myperiod) session.attributes['period'] = period session.attributes['myperiod'] = myperiod session.attributes['typ'] = 'WrittenStatementIntent' if tmp!='': txt='{} Do you want to hear them all?'.format(txt) else: txt='I don't know of any.' return question(txt)
We might then handle a response identified as being in the affirmative (“yes, tell me them all”) using something like the following, which picks up the session state from the response, generates a paragraph describing all the written statements, and returns it, suitably packaged, as a session-ending statement().
@ask.intent("AllOfThemIntent") def sayThemAll(): period= session.attributes['period'] myperiod= session.attributes['myperiod'] typ=session.attributes['typ'] txt,tmp=statementGrabber(period=period,myperiod=myperiod) return statement(tmp)
So how do we define things on the Alexa side? (An early draft of my config can be found here.) To start with, we need to create a new skill and give it a name. A unique ID is created for the application; this is passed in all service requests, and can be used as a key in our application logic to decide whether or not to accept and respond to a request from the Alexa Skill server. (For convenience, I defined an open service that accepts all requests. I’m not sure if Flask-Ask has a setting that allows the application to be tied to one or more Alexa Skill IDs?)
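In the meantime, one crude way of policing this by hand would be to peek at the raw request JSON inside the application and check the application ID it carries against a whitelist; a sketch (the placeholder ID is obviously made up) might be:

    from flask import request as flask_request

    # The skill ID(s) shown in the Alexa developer console; placeholder value here
    ALLOWED_APP_IDS = ['amzn1.ask.skill.YOUR-SKILL-ID']

    def request_is_allowed():
        # Alexa service requests carry the skill's application ID in the session metadata
        payload = flask_request.get_json(silent=True) or {}
        app_id = payload.get('session', {}).get('application', {}).get('applicationId')
        return app_id in ALLOWED_APP_IDS

A check like that could then be called at the top of each intent handler, returning a polite refusal statement() if it fails.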
The second thing we need to do is actually define the interactions that the skill will engage in. This is composed of three parts:
- an Intent Schema, defined as a JSON object, that specifies a list of intents that the skill can handle. Each intent must be given a unique label (for example, “AllOfThemIntent”), and may be associated with one or more slots. Each slot has a name and a type. The name corresponds to the name of a variable that may be captured and passed (under that name) to the application handler; the type is either a predefined Amazon data type (for example, AMAZON.DATE, which captures date-like things, including some simple natural language date terms such as yesterday) or a custom data type;
- one or more user-defined custom data types, defined as a list of keywords that Alexa will try to match exactly (I think? I don’t think fuzzy match, partial match or regular expression matching is supported? If it is, please let me know how via the comments…)
- some sample utterances, keyed by intent and giving an example of a phrase that the skill should be able to handle; slots may be included in the example utterances, using the appropriate name as provided in the corresponding intent definition.
In the above case, I start to define a conversation where a WrittenStatementIntent is intended to identify written statements published on a particular day or over a particular period, and then a follow up AllOfThemIntent can be used to list the details of all of them or a LimitByDeptIntent can be used to limit the reporting to just statements from a specific department.
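As a rough sketch of what that looks like (the custom slot type names here are just placeholders; the actual definitions are in the draft config linked above), the intent schema might be along the lines of:

    {
      "intents": [
        {
          "intent": "WrittenStatementIntent",
          "slots": [
            {"name": "period", "type": "AMAZON.DATE"},
            {"name": "myperiod", "type": "MYPERIOD_TYPE"}
          ]
        },
        {"intent": "AllOfThemIntent"},
        {
          "intent": "LimitByDeptIntent",
          "slots": [{"name": "dept", "type": "DEPT_NAME"}]
        }
      ]
    }

with sample utterances keyed by intent, something like:

    WrittenStatementIntent were there any written statements published {period}
    WrittenStatementIntent what written statements were published {myperiod}
    AllOfThemIntent yes tell me them all
    LimitByDeptIntent no just the ones from {dept}

and the custom types (MYPERIOD_TYPE, DEPT_NAME) defined simply as lists of the allowed values.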
When you update the interaction model, it needs rebuilding, which may take some time (wait for the spinny thing over the Interaction Model menu item to stop before you try to test anything).
The next part of the definition is used to specify where the application logic can be found. As mentioned, this may be defined as an AWS Lambda function, or you can host it yourself on an https server. In the latter case, for a Flask app, you need to provide a URL where the root of the application is served from.
If you are using your own host, you need to provide some information about the trust certificate. I published my application logic as an app on Reclaim Hosting, which appears to offer https out of the box (though I haven’t tried it for a live/published Alexa skill yet).
With the config all done on the Alexa side, you now just need to make sure some application logic is in place to handle it.
For reference, along with the stub of application logic shown above (which just needs a dummy statementGrabber() function that optionally accepts a couple of arguments and that returns a couple of text strings for testing purposes) I also topped my application with the following set-up components (note that as part of the WSGI handling that cPanel uses to run the app, I am creating an application variable that points to it).
import logging
from random import randint

from flask import Flask, render_template
from flask_ask import Ask, statement, question, session

app = Flask(__name__)
application = app

ask = Ask(app, "/")
At the end of the application code, we can fire it up…
if __name__ == '__main__':
    app.run(debug=True)
Get the app running on the server, and now we can test it from the Alexa Skills environment. Unlike deployed skills accessed via the Echo, we don’t need to “summon” the app for testing purposes – we can just enter the utterance directly. The JSON code passed to the server is displayed as the Service Request, and the Service Response from the application server is also displayed.
The test panel can also handle conversations established by using Flask-Ask question() wrappers, as shown below:
In this case, we filter down on the written statements for last Thursday to just report on the ones issued by the Department for Culture, Media and Sport.
It’s worth noting that Alexa seems to have a limit on the number of characters allowed when generating a voice output (8000 characters). For large responses, this suggests that adding some sort of sensible paging handler to the application logic could make sense; for example, something that chunks up the response, tells it to you piece by piece, and prompts you between each chunk to check you want to hear the next part.
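I haven’t implemented that, but a crude chunker could be as simple as the following sketch, with the remaining chunks stashed in session state between conversational turns:

    def chunk_text(txt, limit=7000):
        # Naively split a long response into chunks below the character limit,
        # breaking (roughly) on sentence boundaries; a single over-long sentence
        # will still blow the limit, so this really is only a sketch
        chunks, current = [], ''
        for sentence in txt.split('. '):
            if current and len(current) + len(sentence) + 2 > limit:
                chunks.append(current)
                current = ''
            current += sentence + '. '
        if current:
            chunks.append(current)
        return chunks

An intent handler could then speak the first chunk via a question() (“…do you want the next part?”), stash the rest in session.attributes, and pop the next chunk off in response to each affirmative follow-up.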
With testing done, and a working app up and running, all that remains is to go through the legal fluff required to submit the app for publishing (which I haven’t done; a note says you can’t edit the app whilst it’s undergoing approval, but I’m not sure if you can then go back to editing it once it is published?).
A couple of things I learned along the way: firstly, when defining slots, it can be useful to have a controlled vocabulary to hand. For Parliament, things like the Members’ API Reference Data Service can be handy, eg for generating a list of MP names or committee names (in another post I’ll give some more examples of the queries I can run). Secondly, when thinking about conversation design, you need to think about the various bits of state that can be associated with a conversation. For example, when making a query about an MP, it makes sense to retain the name of (or an identifier for) the MP as part of the session state so that you can refer to it later. If a conversation went “who is the MP for the Isle of Wight?”, “what committees are they on?”, “who else is on those committees?”, it would make sense to capture the list of committees as state somehow when responding to the second question.
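So, for example, the handler for the second question in that exchange might stash the committee list along these lines (the intent name and the committeesForMember() helper are made up for the sketch):

    @ask.intent("MemberCommitteesIntent")
    def memberCommittees():
        # member_id stashed in the session by the handler for the first question
        member_id = session.attributes['member_id']
        committees = committeesForMember(member_id)  # hypothetical Parliament API lookup
        # Keep the committee list around so "who else is on those committees?" can use it
        session.attributes['committees'] = committees
        return question('They sit on {}. Anything else?'.format(', '.join(committees)))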
One approach I took to managing state within the application was to cache calls to URLs requested in forming the response to one question. If I preserved enough session state to allow me to pull that cached data, I could reanalyse it without having to re-request it from the original URL when putting together a response to a follow up question.
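A minimal version of that sort of cache, keyed on the requested URL, might be as simple as:

    import requests

    _url_cache = {}

    def cached_get(url):
        # Fetch a URL, reusing a previously downloaded response if we already have one
        if url not in _url_cache:
            _url_cache[url] = requests.get(url).json()
        return _url_cache[url]

with the URL (or enough state to regenerate it) kept in session.attributes so that the follow-up handler knows which cached item to reanalyse. (For anything longer running you’d want to think about expiry, or use something like the requests-cache package instead.)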
Something it would be nice to have is a list of synonyms for terms in the slots definition, and maybe even a crude lookup that could be used as part of an OpenRefine style reconciliation service to try to partially match slot terms. (I’m not sure how well the model building does this anyway, eg if you put near misses in the slot definitions; or whether it just does exact matching?)
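In the meantime, something like difflib from the Python standard library could at least tidy up near misses on the application side; for example:

    from difflib import get_close_matches

    # A (partial) controlled vocabulary of department names
    DEPARTMENTS = ['Department for Communities and Local Government',
                   'Department for Culture, Media and Sport',
                   'Department of Health']

    def reconcile_dept(heard):
        # Map whatever Alexa heard onto the nearest known department name, if any
        matches = get_close_matches(heard, DEPARTMENTS, n=1, cutoff=0.6)
        return matches[0] if matches else None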
Another takeaway is that it probably makes sense to try to design the code for generating text from data or APIs so that it can be used in a variety of contexts – Slack, Alexa/Echo, email, press release generation, etc – without much, if any, retooling. Ideally, it would make sense to define a set of text generation functions or API calls that could in turn be called via use-case application wrappers (eg one for Slack, one for Alexa, etc; a crude sketch of this is included below). Issues arise here when it comes to conversation management. Alexa manages conversations via session state, for example. But maybe api.ai can help here, by acting as application independent conversational middleware? That’ll be the next app I need to play with…
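Sketchily, that might mean a plain “report generator” layer with thin channel-specific wrappers over the top (the function and intent names here are just illustrative):

    def written_statements_report(period, myperiod):
        # Core, channel independent report generator: returns plain text
        txt, _ = statementGrabber(period=period, myperiod=myperiod)
        return txt

    # Thin wrappers for each publication context
    def slack_response(period, myperiod):
        # Slack slash command responses are JSON with a 'text' field
        return {'text': written_statements_report(period, myperiod)}

    @ask.intent("WrittenStatementReportIntent")
    def alexa_response(period, myperiod):
        return statement(written_statements_report(period, myperiod))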
PS If you would like to see further posts here exploring Amazon Echo/Alexa skills, why not help me explore the context and gift me an Echo from my Patronage Wishlist?
PPS Example queries: ask parlibot Andrew Turner committees; ask parlibot research papers on animals; any written answers.
PPPS code example.