Templated Text Summaries From Data Using ChatGPT

Back in the day, I used to tinker with various ways of generating text reports from datasets. Via my feeds yesterday, I noticed that the folks over at the ONS have been exploring automated report generation using recently released Census 2021 data. For example, the “How your area has changed in 10 years: Census 2021” reports provide separately generated reports for each local authority area (ah, that takes me back…;-).

I haven’t played with that sort of thing over the last few years (the occasional dabble in the context of WRC rally reporting aside), but I wondered how easy it would be to hook into ChatGPT to generate some simple templated reports.

Interpret the following as a tab-separated CSV file:

Rank	County	Population	Region	Largest_settlement
1	Greater London	8,901,000	London	London
2	West Midlands	2,910,000	West Midlands	Birmingham
3	Greater Manchester	2,824,000	North West	Manchester
4	West Yorkshire	2,314,000	Yorkshire and the Humber	Leeds
5	Hampshire	1,852,000	South East	Southampton

When you read in the population values, remove the commas. For example, change 8,901,000 to 8901000
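For comparison, the parsing step described in that prompt can also be done deterministically. What follows is a minimal Python sketch using only the standard library's csv module. It is not the code ChatGPT suggested, and the names TSV_DATA and read_counties are hypothetical, chosen for illustration.

```python
import csv
import io

# Hypothetical name: the table from the prompt, as tab-separated text.
TSV_DATA = (
    "Rank\tCounty\tPopulation\tRegion\tLargest_settlement\n"
    "1\tGreater London\t8,901,000\tLondon\tLondon\n"
    "2\tWest Midlands\t2,910,000\tWest Midlands\tBirmingham\n"
    "3\tGreater Manchester\t2,824,000\tNorth West\tManchester\n"
    "4\tWest Yorkshire\t2,314,000\tYorkshire and the Humber\tLeeds\n"
    "5\tHampshire\t1,852,000\tSouth East\tSouthampton\n"
)


def read_counties(text):
    """Hypothetical helper: parse tab-separated text into a list of dicts,
    turning comma-grouped populations such as '8,901,000' into ints."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        row["Rank"] = int(row["Rank"])
        row["Population"] = int(row["Population"].replace(",", ""))
        rows.append(row)
    return rows


counties = read_counties(TSV_DATA)
print(counties[0]["County"], counties[0]["Population"])  # Greater London 8901000
```

Because the fields are split on tabs, the commas inside the population values never act as delimiters. They are only stripped once each value has been isolated.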

ChatGPT also provided some example code:

What would the CSV look like after removing the commas in the population column?

Me to ChatGPT

What is the population of the West Midlands according to that dataset?

Me to ChatGPT
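Both of those questions have a conventional, checkable answer. Here is a minimal Python sketch using the standard csv module. It writes the cleaned table back out and looks up a single county. The variable names are hypothetical.

```python
import csv
import io

# The table from the prompt, as tab-separated text.
TSV_DATA = (
    "Rank\tCounty\tPopulation\tRegion\tLargest_settlement\n"
    "1\tGreater London\t8,901,000\tLondon\tLondon\n"
    "2\tWest Midlands\t2,910,000\tWest Midlands\tBirmingham\n"
    "3\tGreater Manchester\t2,824,000\tNorth West\tManchester\n"
    "4\tWest Yorkshire\t2,314,000\tYorkshire and the Humber\tLeeds\n"
    "5\tHampshire\t1,852,000\tSouth East\tSouthampton\n"
)

reader = csv.DictReader(io.StringIO(TSV_DATA), delimiter="\t")
rows = list(reader)
for row in rows:
    row["Population"] = row["Population"].replace(",", "")  # 8,901,000 -> 8901000

# Write the cleaned table back out, still tab separated.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=reader.fieldnames,
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
cleaned = buffer.getvalue()
print(cleaned)

# Index populations by county name to answer lookup questions.
population = {row["County"]: int(row["Population"]) for row in rows}
print(population["West Midlands"])  # 2910000
```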

Write a natural language generation template for that data that would produce a sentence that describes the population of a county and the largest settlement in it.

Me to ChatGPT

Apply the template to the row with Rank 2

Me to ChatGPT

Now for rank 4

Me to ChatGPT
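A template of this sort is easy to mimic with Python's str.format placeholders. The wording below is a hypothetical template rather than the one ChatGPT generated, and render is an illustrative helper name. The rows are hard-coded with the commas already removed.

```python
# The table from the prompt, with the commas already removed from Population.
ROWS = [
    dict(Rank=1, County="Greater London", Population=8901000,
         Region="London", Largest_settlement="London"),
    dict(Rank=2, County="West Midlands", Population=2910000,
         Region="West Midlands", Largest_settlement="Birmingham"),
    dict(Rank=3, County="Greater Manchester", Population=2824000,
         Region="North West", Largest_settlement="Manchester"),
    dict(Rank=4, County="West Yorkshire", Population=2314000,
         Region="Yorkshire and the Humber", Largest_settlement="Leeds"),
    dict(Rank=5, County="Hampshire", Population=1852000,
         Region="South East", Largest_settlement="Southampton"),
]

# Hypothetical template: each {placeholder} names a column in the table;
# the ",:" format spec puts thousands separators back in for readability.
TEMPLATE = ("{County} has a population of {Population:,}; "
            "its largest settlement is {Largest_settlement}.")


def render(rank):
    """Hypothetical helper: fill the template from the row with the given rank."""
    row = next(r for r in ROWS if r["Rank"] == rank)
    return TEMPLATE.format(**row)  # unused columns (Rank, Region) are ignored


print(render(2))
print(render(4))
```

Unlike the model, this always produces the same sentence for the same row, which makes it a handy baseline against which to compare ChatGPT's output.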

On another attempt, it generated a slightly different template and did not suggest any code. I then asked it to apply the template to all the rows and generate a summary, which was just a concatenation of the generated sentences. But it seems capable of doing a bit of reasoning too…
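In code, a summary built by concatenation amounts to applying one template to every row and joining the results. Here is a minimal Python sketch. The template wording and the helper name summarise are hypothetical.

```python
# (county, population, largest settlement), in rank order, commas removed.
COUNTIES = [
    ("Greater London", 8901000, "London"),
    ("West Midlands", 2910000, "Birmingham"),
    ("Greater Manchester", 2824000, "Manchester"),
    ("West Yorkshire", 2314000, "Leeds"),
    ("Hampshire", 1852000, "Southampton"),
]

# Hypothetical sentence template.
TEMPLATE = ("{county} has a population of {population:,}; "
            "its largest settlement is {settlement}.")


def summarise(rows):
    """Hypothetical helper: apply the template to every row and join the
    sentences, in row order, into a single paragraph."""
    return " ".join(
        TEMPLATE.format(county=c, population=p, settlement=s)
        for c, p, s in rows
    )


summary = summarise(COUNTIES)
print(summary)
```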

Start the summary paragraph with the phrase “The middle three counties in terms of population are”

Me to ChatGPT
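Picking out the middle three is a matter of sorting by population and slicing. Here is a minimal Python sketch, useful for checking the model's answer. The helper middle_three is hypothetical, and it assumes an odd number of rows (at least three), since with an even count there is no single middle three.

```python
# (county, population, largest settlement), commas removed.
COUNTIES = [
    ("Greater London", 8901000, "London"),
    ("West Midlands", 2910000, "Birmingham"),
    ("Greater Manchester", 2824000, "Manchester"),
    ("West Yorkshire", 2314000, "Leeds"),
    ("Hampshire", 1852000, "Southampton"),
]


def middle_three(rows):
    """Hypothetical helper: names of the three counties in the middle when
    sorted by population, largest first. Assumes an odd number of rows >= 3."""
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    mid = len(ranked) // 2
    return [name for name, _, _ in ranked[mid - 1 : mid + 2]]


names = middle_three(COUNTIES)
sentence = ("The middle three counties in terms of population are "
            + ", ".join(names[:-1]) + " and " + names[-1] + ".")
print(sentence)
```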

And again…

Start the summary paragraph with the phrase “The middle three counties in terms of population are”. Then make a comment about the counties with the largest and smallest populations and identify their largest towns and don't say any more.

Me to ChatGPT
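The second part of that prompt, the comment on the largest and smallest counties, can likewise be computed directly with max and min. Here is a minimal Python sketch. The wording and the helper name extremes_comment are hypothetical.

```python
# (county, population, largest settlement), commas removed.
COUNTIES = [
    ("Greater London", 8901000, "London"),
    ("West Midlands", 2910000, "Birmingham"),
    ("Greater Manchester", 2824000, "Manchester"),
    ("West Yorkshire", 2314000, "Leeds"),
    ("Hampshire", 1852000, "Southampton"),
]


def extremes_comment(rows):
    """Hypothetical helper: describe the most and least populous counties
    and name the largest settlement in each."""
    largest = max(rows, key=lambda r: r[1])
    smallest = min(rows, key=lambda r: r[1])
    return (
        f"{largest[0]} has the largest population ({largest[1]:,}), "
        f"with {largest[2]} as its largest settlement, while "
        f"{smallest[0]} has the smallest ({smallest[1]:,}), "
        f"with {smallest[2]} as its largest settlement."
    )


comment = extremes_comment(COUNTIES)
print(comment)
```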

So… can we trust it to reason about small datasets and generate reports about them? How far does it scale in terms of the amount of data we can provide it?

Hmmm, I wonder… can it do joins across small datasets…?

PS After a great start on another data set (see “More Conversations With ChatGPT About Pasted in Data”), ChatGPT then fell apart completely. Return to drawer labelled “random text generator”.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...