One of the things I keep failing to spend time looking at and playing with is the generation of text-based reports from tabular datasets (“data2txt”, “data to text”, “textualisation”, “natural language generation (NLG)” etc).
One of my earlier fumblings was to look at generating “press releases” around monthly Jobseeker’s Allowance figures. One of the reasons that stalled was the amount of time it took me to find my way around the nomis site and API: piecing together the URLs that would let me pull down the data in a meaningful way, and working out what the data actually referred to.
So over the weekend, I started to put together a wrapper for the nomis API that would let me have a conversation with it. I wanted to find out what sorts of datasets it knows about and how I can run queries against those datasets (that is, which dimensions are available to query on for each dataset), as well as pull back actual datasets from it.
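As a rough idea of what “having a conversation” with the API involves at the URL level, here is a minimal sketch of the URL-building part of such a wrapper. The base URL, the `def.sdmx.json` definition endpoints, the `{dataset}.data.csv` data endpoint and the comma-separated multi-value convention are all assumptions based on nomis-style URLs. Check them against the nomis API documentation. The helper names `datasets_url`, `dimension_url` and `data_url` are hypothetical and are not the wrapper's actual API.

```python
from urllib.parse import urlencode

# Assumed base URL for version 1 of the nomis API; check against the nomis docs.
NOMIS_BASE = "https://www.nomisweb.co.uk/api/v01/dataset"


def datasets_url():
    """URL listing the dataset definitions (assumed def.sdmx.json endpoint)."""
    return f"{NOMIS_BASE}/def.sdmx.json"


def dimension_url(dataset_id, dimension):
    """URL for the codelist of one dimension of a dataset (assumed URL pattern)."""
    return f"{NOMIS_BASE}/{dataset_id}/{dimension.lower()}.def.sdmx.json"


def data_url(dataset_id, fmt="csv", **dimensions):
    """URL for a data query; each keyword argument becomes a dimension filter.

    List or tuple values are joined with commas (assumed multi-value syntax).
    """
    params = {}
    for name, value in dimensions.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[name] = value
    # Keep commas literal rather than percent-encoding them as %2C.
    query = urlencode(params, safe=",")
    url = f"{NOMIS_BASE}/{dataset_id}.data.{fmt}"
    return f"{url}?{query}" if query else url
```

For example, `data_url("NM_1_1", sex=[5, 6])` builds `.../NM_1_1.data.csv?sex=5,6`. The dataset ID and the dimension values here are only placeholders to show the URL shape.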
To make the data easier to work with once I’ve pulled it down, I load it into a pandas dataframe.
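One simple way of getting a CSV response into pandas is shown below. It is a sketch, not the wrapper's actual code. The column names in `KEEP` (`DATE_NAME`, `GEOGRAPHY_NAME`, `OBS_VALUE`) are assumptions about nomis-style CSV output. The sample rows are made-up placeholders, not real figures. In practice, `pd.read_csv` can also be given a data URL directly.

```python
import io

import pandas as pd

# Columns assumed to be present in nomis-style CSV output; adjust as needed.
KEEP = ["DATE_NAME", "GEOGRAPHY_NAME", "OBS_VALUE"]


def csv_to_dataframe(csv_text, keep=KEEP):
    """Parse CSV text into a DataFrame, keeping only the listed columns that exist."""
    df = pd.read_csv(io.StringIO(csv_text))
    cols = [c for c in keep if c in df.columns]
    # Fall back to the full frame if none of the expected columns are present.
    return df[cols] if cols else df


# Toy CSV text (made-up rows, not real nomis data) to show the shape of the result.
sample = (
    "DATE_NAME,GEOGRAPHY_NAME,OBS_VALUE,RECORD_COUNT\n"
    "2014-01,Area A,100,1\n"
    "2014-02,Area A,90,1\n"
)
df = csv_to_dataframe(sample)
print(df)
```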
(With Open Knowledge, I’m running a series of Code Clubs on “Wrangling data with Python” for the Cabinet Office at the moment, based around pandas and IPython notebooks. If I can get the wrapper working reliably enough, it could be interesting to see what they make of it…)
This is a flavour of the sorts of thing I’ve been reaching for with it:
To make life easier, you can pass in dimension parameter values using either the dimension parameter codes or their actual values; because the nomis API requires the codes, legitimate values are automatically converted. (Note to self – add further checks to discard illegitimate values, where detected…)
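The code-or-value conversion described above could look something like the following minimal sketch. It also includes the “discard illegitimate values” check from the note to self. The function name `to_codes` and the `SEX_CODES` codelist are hypothetical and purely illustrative. Real code-to-label mappings would come from each dataset's dimension definitions.

```python
# Hypothetical codelist for one dimension (code -> label); illustrative only.
SEX_CODES = {"5": "Male", "6": "Female", "7": "Total"}


def to_codes(values, codelist):
    """Convert codes or labels to codes, returning (codes, rejected).

    Values that are already codes pass through unchanged; labels are matched
    case-insensitively; anything unrecognised is collected in `rejected`
    rather than being sent to the API.
    """
    if isinstance(values, (str, int)):
        values = [values]
    by_label = {label.lower(): code for code, label in codelist.items()}
    codes, rejected = [], []
    for v in values:
        key = str(v)
        if key in codelist:
            codes.append(key)
        elif key.lower() in by_label:
            codes.append(by_label[key.lower()])
        else:
            rejected.append(v)
    return codes, rejected


codes, rejected = to_codes(["Male", "6", "Unknown"], SEX_CODES)
print(codes, rejected)
```

Returning the rejected values separately, rather than silently dropping them, means the caller can warn about them before building the query URL.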
If you try it, please let me know any comments, feedback or issues via the comments to this post (for now…!).
PS next up – revisit the ONS API following this first, aborted attempt.