To Do – Textualisation With Tracery and Database Reporting 2.0

More of just a fragment of things to do than a full post on anything I’ve done, because the blog thing is increasingly hard to find the time or motivation for now; work sucks time and energy and makes everything just whatever…

Anyway – years ago I wrote a post on data textualisation. Nothing’s changed: the hype around robot journalism has waned, and the offerings that there are are still pretty much just your plain old report/document generation.

Not a lot more than homebrew stuff like this, still.

One thing I have noticed is more official Excel spreadsheets starting to include dynamic reporting in them. For example, the NHS Digital Hospital Accident and Emergency Activity, 2016-17 data has an output sheet that will dynamically create a report from the other data sheets based on a user selection.

The report uses formulas to generate output text, and tables and charts are also generated as part of the report.
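The general trick is easy enough to mimic in code. As a minimal sketch (the hospital name, figures and wording below are made up rather than taken from the NHS Digital workbook), here’s a sentence assembled from a couple of values, with the verb chosen by a conditional, much as a spreadsheet formula might wrap IF() inside a string concatenation:

```python
def describe_change(name: str, current: int, previous: int) -> str:
    """Return a one-sentence summary, picking the wording from the data
    (roughly what IF() plus string concatenation does in a spreadsheet formula)."""
    if current == previous:
        return f"A&E attendances at {name} were unchanged at {current:,}."
    trend = "rose" if current > previous else "fell"
    diff = abs(current - previous)
    # Only quote a percentage change when there's a non-zero baseline
    pct = f" ({100 * diff / previous:.1f}%)" if previous else ""
    return (f"A&E attendances at {name} {trend} by {diff:,}{pct}, "
            f"from {previous:,} to {current:,}.")


print(describe_change("Example Hospital", 1150, 1000))
```

Running that prints `A&E attendances at Example Hospital rose by 150 (15.0%), from 1,000 to 1,150.`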

A couple of times, Leigh Dodds has linked out to Tracery in the data2txt / text generation context, but I’ve yet to play with it (tutorial). There’s a Python port at aparrish/pytracery which is probably the one I’ll use.

There’s a notebook that looks like it could provide a handy crib to using pytracery with CSV data. I’m not sure how well it copes with turning numbers into words, but it might be interesting to try to weave in support from something like if it doesn’t.
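If the number handling does turn out to be lacking, a crude fallback is easy enough to sketch without any extra libraries. The hypothetical `ordinal_words()` below only copes with 1 to 99 (plenty for things like league-table positions); anything more general is a job for a proper number-to-words library.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Ordinals that don't just take a "th" suffix, keyed by cardinal word
IRREGULAR = {"one": "first", "two": "second", "three": "third", "five": "fifth",
             "eight": "eighth", "nine": "ninth", "twelve": "twelfth"}


def cardinal_words(n: int) -> str:
    """1 -> "one", 42 -> "forty-two" (1..99 only in this sketch)."""
    if not 1 <= n <= 99:
        raise ValueError("only 1-99 supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def ordinal_words(n: int) -> str:
    """1 -> "first", 21 -> "twenty-first", 20 -> "twentieth"."""
    words = cardinal_words(n)
    # Only the last word changes: "twenty-one" -> "twenty-first"
    head, sep, last = words.rpartition("-")
    if last in IRREGULAR:
        last = IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"   # twenty -> twentieth
    else:
        last += "th"                # four -> fourth, eleven -> eleventh
    return head + sep + last


print([ordinal_words(n) for n in (1, 2, 12, 21)])
```

Rather than wiring something like this into Tracery as a modifier, one option is to precompute a column of ordinal words in the dataframe and pass it in as just another rule.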

I had a play, when I should have been in the garden… :-( Fork here with modifiers from and a demo of using it to generate sentences from rows in a pandas dataframe.

One of the things I’ve started trying to do is package up simple tools to grab structured data from web-published CSVs and Excel docs into simple SQLite3 databases that can be published using datasette (ouseful-datasupply).
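The basic pattern is small enough to sketch with just the Python standard library. This is not the ouseful-datasupply code, just an illustration under some simplifying assumptions: the CSV has a single header row, the sample codes and figures are made up, and everything is loaded as text (pandas’ `read_csv()` plus `DataFrame.to_sql()` would be the obvious route if you want column types inferred).

```python
import csv
import io
import sqlite3

# Stand-in for a web-published CSV (made-up codes and figures); in practice the
# text would be downloaded, e.g. with urllib.request.urlopen(url).read().decode()
CSV_TEXT = """code,name,beds_open
RXX,Example General,420
RYY,Sample Royal,310
"""


def csv_to_sqlite(csv_text: str, conn: sqlite3.Connection, table: str) -> int:
    """Load CSV text into a new table with one untyped column per header field.

    Values are stored as the strings read from the CSV; header names are
    trusted here (quoted, not sanitised). Returns the number of rows loaded.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Use a filename such as "hospitals.db" instead of ":memory:" to get a file
# that can then be served with `datasette hospitals.db`
conn = sqlite3.connect(":memory:")
print(csv_to_sqlite(CSV_TEXT, conn, "hospitals"))
print(conn.execute("SELECT name FROM hospitals WHERE code = ?", ("RYY",)).fetchone())
```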

Also on the to-do list in this regard is to look at some simple demos for creating datasette templates (perhaps even cribbed from report-generating formulas in spreadsheets as a shortcut) that render individual rows (or joined rows) as text reports, such as paragraphs of text reporting on winter sitrep stats for a specified hospital. From one template, we’d auto-generate reports for every hospital: database reporting, literally.

Or maybe database reporting, 2.0

(Hmm… datasette reporting…? The practice of generating templated news wire reports using datasette?)

Why 2.0? Because writing database reports has been going on for ever, but getting folk to think about it as a basic journalistic reporting skill is the new bit. It’s a practical example of the sort of everyday task that might become more widespread as a result of “everyone should learn to programme” initiatives.
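The “one template, a report for every hospital” idea can be sketched in plain Python. This is not how datasette does it (datasette’s custom templates are Jinja templates); it’s just the stdlib `string.Template` run over every row of an in-memory SQLite table, with made-up sitrep-style column names and figures:

```python
import sqlite3
from string import Template

# One template, applied to every row of the table (made-up column names)
REPORT = Template(
    "On $date, $name had $beds_occupied of its $beds_open beds occupied "
    "($occupancy% occupancy)."
)


def reports_for_all(conn):
    """Render the template once per hospital row, in name order."""
    conn.row_factory = sqlite3.Row  # lets each row be read by column name
    reports = []
    for row in conn.execute("SELECT * FROM sitrep ORDER BY name"):
        values = dict(row)
        # Derived value computed in Python rather than in the template
        values["occupancy"] = f"{100 * values['beds_occupied'] / values['beds_open']:.1f}"
        reports.append(REPORT.substitute(values))
    return reports


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sitrep (name TEXT, date TEXT, beds_open INTEGER, beds_occupied INTEGER)"
)
conn.executemany(
    "INSERT INTO sitrep VALUES (?, ?, ?, ?)",
    [("Example General", "2018-01-08", 400, 380),
     ("Sample Royal", "2018-01-08", 300, 291)],
)
for report in reports_for_all(conn):
    print(report)
```

Keeping the derived values out of the template, and the wording out of the code, is the spreadsheet-formula trick again: change the template and every hospital’s report changes with it.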

PS code example using my enhanced version of pytracery:

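import pandas as pd
import tracery
import inflect

# Number modifiers, defined here with inflect so the example also runs against
# stock pytracery (the modifier names are illustrative; the fork's may differ)
p = inflect.engine()
number_modifiers = {
    'ordinal': lambda text, *params: p.ordinal(text),                  # "1" -> "1st"
    'number_to_words': lambda text, *params: p.number_to_words(text),  # "1st" -> "first"
}

df = pd.DataFrame({'name': ['Jo', 'Sam'], 'pos': [1, 2]})

	name	pos
0	Jo	1
1	Sam	2

rules = {'origin': "#name# was placed #pos.ordinal.number_to_words#."}

def row_mapper(row, rules):
    # Copy the base rules, then add one rule per column holding this row's value
    # (iterating a Series directly gives values, so use .items() for name/value pairs)
    row_rules = dict(rules)
    for k, v in row.items():
        row_rules[k] = str(v)

    grammar = tracery.Grammar(row_rules)
    grammar.add_modifiers(number_modifiers)
    return grammar.flatten("#origin#")

df['report'] = df.apply(lambda row: row_mapper(row, rules), axis=1)

	name	pos	report
0	Jo	1	Jo was placed first.
1	Sam	2	Sam was placed second.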

PPS Another example of local data report generation in the wild: Local area SEND report.
