Can We Get ChatGPT to Act Like a Relational Database And Respond to SQL Queries on Provided Datasets and pandas dataframes?

Following on from the previous couple of posts (Templated Text Summaries From Data Using ChatGPT and More Conversations With ChatGPT About Pasted in Data), can we persuade ChatGPT to model a dataset more formally, in the hope that it then reasons about the data more reliably?

Treat the following as a tab separated dataset. Using just the first, third and fourth columns, treat the data as if it were a relational SQL database table called "racerresults" with columns "Race", "Driver" and "Team", and the "Race" column as a primary key column. Display a SQL statement that could create the corresponding table and populate it with the data.

Bahrain	20 Mar 2022	Charles Leclerc	FERRARI	57	1:37:33.584
Saudi Arabia	27 Mar 2022	Max Verstappen	RED BULL RACING RBPT	50	1:24:19.293
Australia	10 Apr 2022	Charles Leclerc	FERRARI	58	1:27:46.548
Emilia Romagna	24 Apr 2022	Max Verstappen	RED BULL RACING RBPT	63	1:32:07.986
Miami	08 May 2022	Max Verstappen	RED BULL RACING RBPT	57	1:34:24.258
Spain	22 May 2022	Max Verstappen	RED BULL RACING RBPT	66	1:37:20.475
Monaco	29 May 2022	Sergio Perez	RED BULL RACING RBPT	64	1:56:30.265
Azerbaijan	12 Jun 2022	Max Verstappen	RED BULL RACING RBPT	51	1:34:05.941
Canada	19 Jun 2022	Max Verstappen	RED BULL RACING RBPT	70	1:36:21.757
Great Britain	03 Jul 2022	Carlos Sainz	FERRARI	52	2:17:50.311
Austria	10 Jul 2022	Charles Leclerc	FERRARI	71	1:24:24.312
France	24 Jul 2022	Max Verstappen	RED BULL RACING RBPT	53	1:30:02.112
Hungary	31 Jul 2022	Max Verstappen	RED BULL RACING RBPT	70	1:39:35.912
Belgium	28 Aug 2022	Max Verstappen	RED BULL RACING RBPT	44	1:25:52.894
Netherlands	04 Sep 2022	Max Verstappen	RED BULL RACING RBPT	72	1:36:42.773
Italy	11 Sep 2022	Max Verstappen	RED BULL RACING RBPT	53	1:20:27.511
Singapore	02 Oct 2022	Sergio Perez	RED BULL RACING RBPT	59	2:02:20.238
Japan	09 Oct 2022	Max Verstappen	RED BULL RACING RBPT	28	3:01:44.004
United States	23 Oct 2022	Max Verstappen	RED BULL RACING RBPT	56	1:42:11.687
Mexico	30 Oct 2022	Max Verstappen	RED BULL RACING RBPT	71	1:38:36.729
Brazil	13 Nov 2022	George Russell	MERCEDES	71	1:38:34.044
Abu Dhabi	20 Nov 2022	Max Verstappen	RED BULL RACING RBPT	58	1:27:45.914

Which gives:
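For comparison with whatever ChatGPT returns, here is a minimal hand-written sketch of the kind of statements the prompt asks for, checked against SQLite using Python's `sqlite3` module. It is not ChatGPT's output. Only the first three rows of the dataset are included to keep it short, the `TEXT` column types are an assumption, and the table name follows the spelling used in the prompt.

```python
import sqlite3

# First three rows of the tab separated dataset above (sample only).
RAW = (
    "Bahrain\t20 Mar 2022\tCharles Leclerc\tFERRARI\t57\t1:37:33.584\n"
    "Saudi Arabia\t27 Mar 2022\tMax Verstappen\tRED BULL RACING RBPT\t50\t1:24:19.293\n"
    "Australia\t10 Apr 2022\tCharles Leclerc\tFERRARI\t58\t1:27:46.548\n"
)

# Keep just the first, third and fourth columns: Race, Driver, Team.
rows = []
for line in RAW.splitlines():
    fields = line.split("\t")
    rows.append((fields[0], fields[2], fields[3]))

# Column types are assumed; the prompt only names the columns and the key.
CREATE_SQL = """
CREATE TABLE racerresults (
    Race   TEXT PRIMARY KEY,
    Driver TEXT,
    Team   TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(CREATE_SQL)
# Parameterised inserts avoid hand-quoting each value.
conn.executemany(
    "INSERT INTO racerresults (Race, Driver, Team) VALUES (?, ?, ?);", rows
)
conn.commit()

stored = conn.execute("SELECT Race, Driver, Team FROM racerresults;").fetchall()
for row in stored:
    print(row)
```

Running the statements against a real database engine like this is a quick way to check whether generated SQL is at least syntactically valid.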

A simple GROUP query to count the wins by team seems a bit broken…
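To have something to check a ChatGPT answer against, the grouped count can be run for real. The following is a hand-written sketch (not the query ChatGPT produced) that loads the Race, Driver and Team columns from the dataset above into an in-memory SQLite table and counts wins by team; the `Wins` alias is just an illustrative name.

```python
import sqlite3

# (Race, Driver, Team) taken from columns 1, 3 and 4 of the dataset above.
RB = "RED BULL RACING RBPT"
RESULTS = [
    ("Bahrain", "Charles Leclerc", "FERRARI"),
    ("Saudi Arabia", "Max Verstappen", RB),
    ("Australia", "Charles Leclerc", "FERRARI"),
    ("Emilia Romagna", "Max Verstappen", RB),
    ("Miami", "Max Verstappen", RB),
    ("Spain", "Max Verstappen", RB),
    ("Monaco", "Sergio Perez", RB),
    ("Azerbaijan", "Max Verstappen", RB),
    ("Canada", "Max Verstappen", RB),
    ("Great Britain", "Carlos Sainz", "FERRARI"),
    ("Austria", "Charles Leclerc", "FERRARI"),
    ("France", "Max Verstappen", RB),
    ("Hungary", "Max Verstappen", RB),
    ("Belgium", "Max Verstappen", RB),
    ("Netherlands", "Max Verstappen", RB),
    ("Italy", "Max Verstappen", RB),
    ("Singapore", "Sergio Perez", RB),
    ("Japan", "Max Verstappen", RB),
    ("United States", "Max Verstappen", RB),
    ("Mexico", "Max Verstappen", RB),
    ("Brazil", "George Russell", "MERCEDES"),
    ("Abu Dhabi", "Max Verstappen", RB),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE racerresults (Race TEXT PRIMARY KEY, Driver TEXT, Team TEXT);"
)
conn.executemany("INSERT INTO racerresults VALUES (?, ?, ?);", RESULTS)

# One row per team, with the number of races that team won.
QUERY = """
SELECT Team, COUNT(*) AS Wins
FROM racerresults
GROUP BY Team
ORDER BY Wins DESC;
"""
wins_by_team = conn.execute(QUERY).fetchall()
for team, wins in wins_by_team:
    print(team, wins)
```

For the 22 rows above, this gives 17 wins for RED BULL RACING RBPT, 4 for FERRARI and 1 for MERCEDES.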

But we can ask a simple query that seems to work ok…

We can also treat the table as a pandas dataframe…
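For reference, here is a minimal sketch of what the real pandas equivalent might look like, assuming pandas is installed. It uses only the first five rows of the dataset to keep things short, and the column names are the ones given in the original prompt.

```python
import io

import pandas as pd

# First five rows of the tab separated dataset above (sample only).
RAW = (
    "Bahrain\t20 Mar 2022\tCharles Leclerc\tFERRARI\t57\t1:37:33.584\n"
    "Saudi Arabia\t27 Mar 2022\tMax Verstappen\tRED BULL RACING RBPT\t50\t1:24:19.293\n"
    "Australia\t10 Apr 2022\tCharles Leclerc\tFERRARI\t58\t1:27:46.548\n"
    "Emilia Romagna\t24 Apr 2022\tMax Verstappen\tRED BULL RACING RBPT\t63\t1:32:07.986\n"
    "Miami\t08 May 2022\tMax Verstappen\tRED BULL RACING RBPT\t57\t1:34:24.258\n"
)

# Read every column, then keep the first, third and fourth.
df = pd.read_csv(io.StringIO(RAW), sep="\t", header=None)
df = df[[0, 2, 3]]
df.columns = ["Race", "Driver", "Team"]

# Count wins per driver, most wins first.
wins = df["Driver"].value_counts()
print(wins)
```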

On another attempt I got a results dataframe but no code:

I then pretended the dataframe was a relational database table and created a view from it:
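Outside ChatGPT, the same idea can be sketched by writing a dataframe into SQLite and defining a view over it. This is an illustration rather than what ChatGPT generated: the view name `ferrari_wins` and its filter are hypothetical, and only the first five rows of the dataset are used.

```python
import sqlite3

import pandas as pd

# First five rows of the dataset above, Race / Driver / Team only.
df = pd.DataFrame(
    [
        ("Bahrain", "Charles Leclerc", "FERRARI"),
        ("Saudi Arabia", "Max Verstappen", "RED BULL RACING RBPT"),
        ("Australia", "Charles Leclerc", "FERRARI"),
        ("Emilia Romagna", "Max Verstappen", "RED BULL RACING RBPT"),
        ("Miami", "Max Verstappen", "RED BULL RACING RBPT"),
    ],
    columns=["Race", "Driver", "Team"],
)

conn = sqlite3.connect(":memory:")
# Write the dataframe out as a real table so that SQL can be run against it.
df.to_sql("racerresults", conn, index=False)

# Hypothetical view: just the races won by Ferrari.
conn.execute(
    "CREATE VIEW ferrari_wins AS "
    "SELECT Race, Driver FROM racerresults WHERE Team = 'FERRARI';"
)

view_df = pd.read_sql_query("SELECT * FROM ferrari_wins ORDER BY Race;", conn)
print(view_df)
```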

What happens if we run the query and generate an answer via a generated template?
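One way to fold a text template into the query itself is to build the sentence with SQL string concatenation. The sketch below is hand-written against SQLite and is not the query ChatGPT produced; the sentence wording and the `Summary` alias are assumptions, and only the first five rows of the dataset are loaded.

```python
import sqlite3

# First five rows of the dataset above, Race / Driver / Team only.
ROWS = [
    ("Bahrain", "Charles Leclerc", "FERRARI"),
    ("Saudi Arabia", "Max Verstappen", "RED BULL RACING RBPT"),
    ("Australia", "Charles Leclerc", "FERRARI"),
    ("Emilia Romagna", "Max Verstappen", "RED BULL RACING RBPT"),
    ("Miami", "Max Verstappen", "RED BULL RACING RBPT"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE racerresults (Race TEXT PRIMARY KEY, Driver TEXT, Team TEXT);"
)
conn.executemany("INSERT INTO racerresults VALUES (?, ?, ?);", ROWS)

# The || operator concatenates strings in SQLite, so each result row
# comes back as a ready-made sentence rather than as separate columns.
QUERY = """
SELECT Driver || ' won ' || COUNT(*) || ' races for ' || Team || '.' AS Summary
FROM racerresults
GROUP BY Driver, Team
ORDER BY COUNT(*) DESC, Driver;
"""
sentences = [row[0] for row in conn.execute(QUERY)]
for sentence in sentences:
    print(sentence)
```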

Templating the response as part of the SQL query was quite neat, I thought… But it did make me wonder…

That’s a bit roundabout, but it sort of works…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
