I didn’t get to tinker with any of the WRC rally review code this weekend (Dawners gig, so the following crew got back together…), but I’ve been pondering where to try to take the automated text generation next.
At the moment, commentary is generated from simple rules using the durable-rules Python package. The rules are triggered in part from state recorded in a set of truth tables:
These aren’t very efficient, but they are a handy lookup and they make it relatively easy to write rules…
At the moment, I generate sentences that get added to a list, then all the generated sentences are displayed, which is a really naive solution.
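A minimal sketch of that sort of setup, in plain Python rather than the durable-rules API, might look something like the following. The table contents, column names and sentence templates are all made up for illustration and are not taken from the actual rally review code.

```python
# Hypothetical truth table: one row per driver, one boolean column per fact.
truth_table = {
    "Driver A": {"won_stage": True, "leads_rally": True, "lost_time": False},
    "Driver B": {"won_stage": False, "leads_rally": False, "lost_time": True},
}

# Each rule pairs a condition on a row with a sentence template (assumed wording).
rules = [
    (lambda row: row["won_stage"], "{driver} won the stage."),
    (lambda row: row["leads_rally"], "{driver} leads the rally."),
    (lambda row: row["lost_time"], "{driver} lost time."),
]


def generate_sentences(table, rules):
    """Naively collect every sentence whose rule condition holds."""
    sentences = []
    for driver, row in table.items():
        for condition, template in rules:
            if condition(row):
                sentences.append(template.format(driver=driver))
    return sentences


sentences = generate_sentences(truth_table, rules)
# Display everything that was generated, in generation order.
print(" ".join(sentences))
```

Every sentence is true by construction, but nothing decides which ones are worth saying or how they should hang together.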
What I wonder is: could I generate simple true phrases and then pass them to a thing that would write a paragraph for me derived from those true statements? What would happen if I generate lots of true statements and then run a summariser over them all to generate a shorter summary statement? Will truth be preserved in the statements? Can I have an “importance” or “interestingness” value associated with each statement that identifies which sentences should be more likely to appear in the summary?
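One very simple way to try out the “importance” idea, sketched here under the assumption that each statement carries a hand-assigned score, is a purely extractive selection: keep the top few statements whole and never rewrite them, so truth is trivially preserved. The `Statement` class, the scores and the `select_top` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    text: str
    importance: float  # assumed score: higher means more interesting


def select_top(statements, k):
    """Keep the k most important statements, in their original order."""
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(enumerate(statements), key=lambda pair: -pair[1].importance)[:k]
    keep = sorted(index for index, _ in ranked)
    return [statements[index] for index in keep]


statements = [
    Statement("Driver A won the stage.", 0.9),
    Statement("Driver B was fourth on the stage.", 0.2),
    Statement("Driver A took the overall lead.", 0.7),
]

summary = " ".join(s.text for s in select_top(statements, 2))
print(summary)
```

This sidesteps the truth preservation question rather than answering it: an abstractive summariser that rewrites the statements would still need checking.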
I also wonder about some “AI” approaches, eg generating texts from data tables (for example, wenhuchen/LogicNLG). I suspect that one possible issue with using texts generated using pretrained text models is the risk of non-truthy statements being generated from a table. This makes me wonder whether a two part architecture makes sense, where generated sentences are also parsed and then checked back against a truth table (essentially, an automated fact checking step).
So, that makes three possible approaches to explore:
- generate true statements and then find a way to make (true) paragraphs from them;
- generate loads of true statements and then summarise them (and retain truth);
- use a neural model to generate who knows what from the data table, and then try to parse the generated sentences and run them through a fact checker, only then letting the true sentences through.
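The fact checking step in the third approach could be sketched roughly as below, assuming the generated sentences follow a small number of known patterns that can be parsed with regular expressions. The patterns, the table and the candidate sentences are all invented for illustration; real model output would need a far more forgiving parser.

```python
import re

# Hypothetical truth table to check candidate sentences against.
truth_table = {
    "Driver A": {"won_stage": True, "leads_rally": True},
    "Driver B": {"won_stage": False, "leads_rally": False},
}

# Each pattern maps a sentence shape onto a truth table column (assumed wording).
PATTERNS = [
    (re.compile(r"^(?P<driver>.+?) won the stage\.$"), "won_stage"),
    (re.compile(r"^(?P<driver>.+?) leads the rally\.$"), "leads_rally"),
]


def check(sentence, table):
    """Return True only if the sentence parses and the table confirms it."""
    for pattern, column in PATTERNS:
        match = pattern.match(sentence)
        if match:
            row = table.get(match.group("driver"))
            return bool(row and row.get(column))
    return False  # sentences that cannot be parsed are rejected, not trusted


# Stand-ins for sentences a neural model might produce.
candidates = [
    "Driver A won the stage.",
    "Driver B won the stage.",
    "Driver B had a great day.",
]

verified = [s for s in candidates if check(s, truth_table)]
print(verified)
```

Rejecting anything that cannot be parsed is a deliberately conservative choice: it throws away some true sentences, but nothing unverified gets through.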
Alternatively, I continue with my crappy rules, and try to learn how to compound them properly, so one rule can write state, instead of or as well as generating text, that other rules can pull on. (I should probably figure out how to do this anyway…)
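The general idea of rules writing state that other rules then pull on is forward chaining. A toy version in plain Python, which does not use or imitate the durable-rules API, might look like this; the fact names, rule functions and `run_rules` loop are all hypothetical.

```python
def rule_win_text(facts):
    """Text rule: pulls on the derived 'won_stage' state."""
    if facts.get("won_stage") and facts.get("overall_position") == 1:
        return {}, "{driver} won the stage and leads the rally.".format(**facts)
    return {}, None


def rule_stage_win(facts):
    """State rule: derives 'won_stage' without generating any text."""
    if facts.get("stage_position") == 1:
        return {"won_stage": True}, None
    return {}, None


def run_rules(facts, rules):
    """Apply rules repeatedly until no rule adds state or text."""
    facts = dict(facts)
    sentences = []
    fired = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.__name__ in fired:
                continue  # each rule fires at most once
            new_facts, sentence = rule(facts)
            if new_facts or sentence:
                fired.add(rule.__name__)
                facts.update(new_facts)
                if sentence:
                    sentences.append(sentence)
                changed = True
    return facts, sentences


initial = {"driver": "Driver A", "stage_position": 1, "overall_position": 1}
# The text rule is listed first to show that rule order does not matter.
final_facts, sentences = run_rules(initial, [rule_win_text, rule_stage_win])
print(sentences)
```

On the first pass only the state rule fires; the text rule fires on the second pass once `won_stage` has been written.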