Back in the day, I used to tinker with various ways of generating text reports from datasets. Via my feeds yesterday, I noticed that the folks over at the ONS have been exploring automated report generation using recently released Census 2021 data. For example, the “How your area has changed in 10 years: Census 2021” reports provide separately generated reports for each local authority area (ah, that takes me back…;-).
I haven’t played with that sort of thing over the last few years (the occasional dabble in the context of WRC rally reporting aside), but I wondered how easy it would be to hook into ChatGPT to generate some simple templated reports.
Interpret the following as a tab separated CSV file:

Rank	County	Population	Region	Largest_settlement
1	Greater London	8,901,000	London	London
2	West Midlands	2,910,000	West Midlands	Birmingham
3	Greater Manchester	2,824,000	North West	Manchester
4	West Yorkshire	2,314,000	Yorkshire and the Humber	Leeds
5	Hampshire	1,852,000	South East	Southampton

When you read in the population values, remove the commas. For example, change 8,901,000 to 8901000
ChatGPT also provided some example code:
What would the CSV look like after removing the commas in the population column? (Me to ChatGPT)
What is the population of the West Midlands according to that dataset? (Me to ChatGPT)
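For comparison, the same two steps (reading the tab separated data with the commas stripped, then looking up one county) can be sketched in a few lines of Python. This is a minimal sketch, not the code ChatGPT produced; the names `RAW`, `load_rows` and `by_county` are made up for illustration.

```python
import csv
import io

# The five rows pasted into the prompt, as tab-separated text.
RAW = (
    "Rank\tCounty\tPopulation\tRegion\tLargest_settlement\n"
    "1\tGreater London\t8,901,000\tLondon\tLondon\n"
    "2\tWest Midlands\t2,910,000\tWest Midlands\tBirmingham\n"
    "3\tGreater Manchester\t2,824,000\tNorth West\tManchester\n"
    "4\tWest Yorkshire\t2,314,000\tYorkshire and the Humber\tLeeds\n"
    "5\tHampshire\t1,852,000\tSouth East\tSouthampton\n"
)


def load_rows(text):
    """Parse tab-separated text, converting Rank and Population to ints."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        row["Rank"] = int(row["Rank"])
        # Strip the thousands separators: "8,901,000" -> 8901000
        row["Population"] = int(row["Population"].replace(",", ""))
        rows.append(row)
    return rows


rows = load_rows(RAW)

# Index the rows by county name so a single value can be looked up.
by_county = {row["County"]: row for row in rows}
print(by_county["West Midlands"]["Population"])  # prints 2910000
```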
Write a natural language generation template for that data that would produce a sentence that describes the population of a county and the largest settlement in it. (Me to ChatGPT)
Apply the template to the row with Rank 2 (Me to ChatGPT)
Now for rank 4 (Me to ChatGPT)
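A hand-rolled equivalent of that sort of template might look something like the following. The wording of `TEMPLATE` is an assumption made for illustration, not the template ChatGPT came up with, and `ROWS`, `describe` and `row_for_rank` are hypothetical names.

```python
# Illustrative sentence template; the wording is an assumption.
TEMPLATE = (
    "{County} has a population of {Population:,} "
    "and its largest settlement is {Largest_settlement}."
)

# The rows from the pasted table, with populations already converted to ints.
ROWS = [
    {"Rank": 1, "County": "Greater London", "Population": 8901000,
     "Region": "London", "Largest_settlement": "London"},
    {"Rank": 2, "County": "West Midlands", "Population": 2910000,
     "Region": "West Midlands", "Largest_settlement": "Birmingham"},
    {"Rank": 3, "County": "Greater Manchester", "Population": 2824000,
     "Region": "North West", "Largest_settlement": "Manchester"},
    {"Rank": 4, "County": "West Yorkshire", "Population": 2314000,
     "Region": "Yorkshire and the Humber", "Largest_settlement": "Leeds"},
    {"Rank": 5, "County": "Hampshire", "Population": 1852000,
     "Region": "South East", "Largest_settlement": "Southampton"},
]


def describe(row):
    """Fill the template from one row; unused keys are simply ignored."""
    return TEMPLATE.format(**row)


def row_for_rank(rows, rank):
    """Return the first row with the given Rank value."""
    return next(row for row in rows if row["Rank"] == rank)


print(describe(row_for_rank(ROWS, 2)))
print(describe(row_for_rank(ROWS, 4)))
```

The `{Population:,}` format spec puts the thousands separators back in for display, so the numbers can stay as integers in the data.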
On another attempt, it generated a slightly different template and did not suggest any code. I then asked it to apply the template to all the rows and generate a summary, which was just a concatenation of the generated sentences. But it seems capable of doing a bit of reasoning too…
Start the summary paragraph with the phrase “The middle three counties in terms of population are” (Me to ChatGPT)
Start the summary paragraph with the phrase “The middle three counties in terms of population are”. Then make a comment about the counties with the largest and smallest populations and identify their largest towns and don’t say any more. (Me to ChatGPT)
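The reasoning being asked for there amounts to a sort and a slice, which gives a baseline for checking whatever ChatGPT comes back with. A minimal sketch, assuming exactly five rows (so the "middle three" are everything except the first and last) and using made-up names (`DATA`, `summarise`) and made-up sentence wording:

```python
# (County, Population, Largest_settlement) taken from the pasted table.
DATA = [
    ("Greater London", 8901000, "London"),
    ("West Midlands", 2910000, "Birmingham"),
    ("Greater Manchester", 2824000, "Manchester"),
    ("West Yorkshire", 2314000, "Leeds"),
    ("Hampshire", 1852000, "Southampton"),
]


def summarise(data):
    """Build a two-sentence summary; assumes exactly five rows."""
    # Order by population, largest first, then split off the extremes.
    ordered = sorted(data, key=lambda r: r[1], reverse=True)
    largest, *middle, smallest = ordered
    names = [county for county, _, _ in middle]
    middle_text = ", ".join(names[:-1]) + " and " + names[-1]
    return (
        f"The middle three counties in terms of population are {middle_text}. "
        f"{largest[0]} has the largest population ({largest[1]:,}) and its "
        f"largest town is {largest[2]}, while {smallest[0]} has the smallest "
        f"({smallest[1]:,}) and its largest town is {smallest[2]}."
    )


summary = summarise(DATA)
print(summary)
```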
So… can we trust it to reason about small datasets and generate reports about them? How far does it scale in terms of the amount of data we can provide it?
Hmmm, I wonder… can it do joins across small datasets…?
PS After a great start on another data set (see More Conversations With ChatGPT About Pasted in Data), ChatGPT then fell apart completely. Return to drawer labelled “random text generator”.