An interesting post on the Paperspace blog – Solving Question-Answering on Tabular Data: A Comparison – briefly reviews several packages to support the generation of SQL queries from natural language questions that might be asked over a particular data table.
I’m interested in this sort of thing for a couple of reasons. Firstly, how might we use this sort of thing as a generative tool for creating queries automatically in various contexts? One rationale for doing this is in authoring support: if I am an educator writing some instructional material in a particular context, or a journalist writing a data-backed story, a simple query-from-natural-language-question support tool might give me a first draft of a query I can use to interrogate a database or data table. (Note that I see this sort of tool as one that people work with, not as a tool to replace people. It’ll draft a query for you, but you might then need to edit it to pull it into shape.)
More experimentally, I can imagine playing with this sort of thing in one of my rally data sketches. For example, lots of rally (motorsport) stage reviews are written in quite a formulaic way. One of the things I’ve been doodling with is a set of simple data2text rules for generating stage summaries. But as one route to creating a summary, I could imagine using rules to generate a set of natural language questions, and then letting the text2sql tool generate SQL queries from those natural language questions in order to answer the questions. (Imagine a “machine interviewer” that can have “machine interviews with a dataset”…) This also suggests a possible fact checking tool, looking for statements or questions in a report, automatically generating queries from them, and then trying to automatically answer those queries.
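As a minimal sketch of the “machine interview” idea, the following pairs rule-generated questions with SQL queries and runs them against a toy table. Everything here is an assumption made for illustration: the `results` table, its columns, the driver names and times are all made up, and the SQL is hand-written as a stand-in for whatever a text2sql tool might actually return (no such tool is called).

```python
import sqlite3

# Hypothetical stage results; the schema and values are illustrative only.
ROWS = [
    ("SS1", "Driver A", 612.4),
    ("SS1", "Driver B", 609.8),
    ("SS1", "Driver C", 615.1),
    ("SS2", "Driver A", 480.0),
    ("SS2", "Driver B", 483.5),
    ("SS2", "Driver C", 479.2),
]

# Each rule pairs a question template with the SQL a text2sql tool might be
# expected to produce for it. The SQL is hand-written here, NOT generated.
RULES = [
    ("Who won stage {stage}?",
     "SELECT driver FROM results WHERE stage = :stage "
     "ORDER BY time_s LIMIT 1"),
    ("What was the fastest time on stage {stage}?",
     "SELECT MIN(time_s) FROM results WHERE stage = :stage"),
]


def build_db():
    """Create an in-memory database holding the toy results table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (stage TEXT, driver TEXT, time_s REAL)")
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", ROWS)
    return conn


def interview(conn, stages):
    """Ask every templated question of every stage; return (question, answer) pairs."""
    transcript = []
    for stage in stages:
        for template, sql in RULES:
            question = template.format(stage=stage)
            answer = conn.execute(sql, {"stage": stage}).fetchone()[0]
            transcript.append((question, answer))
    return transcript


if __name__ == "__main__":
    for question, answer in interview(build_db(), ["SS1", "SS2"]):
        print(f"Q: {question}\nA: {answer}")
```

In a real setting the second element of each rule would be replaced by a call to the query generator, with the generated SQL checked by a person before anyone trusts the answers.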
I can also see this generative approach being useful in terms of supported open learning. An important part of the original OU distance education model of supported open learning (SOL) included the idea that print materials should play the role of a tutor at your side. One particular activity type, the SAQ (self-assessment question), included a question and a worked answer that you could refer to once you had attempted the question yourself. (Some questions also included one or more hint conditions that you could optionally refer to, to help you get started on a question.) But the answer didn’t just provide a worked answer. It would often also include a discussion of “likely to be provided, but wrong” answers. This was useful to students who had got the question wrong in that way, but also described the hinterland of the question to students who had successfully completed the question (imagine a corollary of a classroom where a teacher asks a question and several incorrect answers are given, and an explanation is provided as to why they’re wrong, before the correct answer is furnished).
So for example, in Helping Learners Look at Their Code I suggested how we might make use of a tool that generates a flow chart from a simple Python code function (perhaps created by a student themselves) to help students check their own (understanding of) their own work. Another example might be worked solutions to maths derivations, eg as described in Show Your Working, Check Your Working, Check the Units and demonstrated by the
In a similar way, I could imagine students coming up with their own questions to ask of a dataset (this is one of the skills we try to develop, and one of the activity models we use, in our TM351 Data Management and Analysis module) and then checking their work using an automated or generative SOL tool such as a natural-language-to-SQL generator.
Secondly, I can see scope for this sort of generative approach being used for “cheating” in an educational context in the sense of providing a tool that students can use to do some work for them or that they can use as “unauthorised support”. The old calculator maths exams come to mind here. I well remember two sorts of maths exam: the calculator paper, and the not calculator paper. In the not calculator paper, a lazy question (to my mind) would be one that could be answered using the support of a calculator, but that the examiner wanted answering manually. (Rather than a better framed question, where the calculator’s role wouldn’t really help or be relevant in demonstrating what the assessor wanted demonstrating.)
I think one way of understanding “cheating” is to see a lot of formal assessment as a game, with certain arbitrary but enforced rules. Infringing the rules is cheating. For example, in golf, you can swing wildly and miss the ball and not count it as a shot, but that’s cheating: because the rules say it was a shot, even if the ball went nowhere. To my mind, some of the best forms of assessment are more about play than a game, providing invitations for a learner to demonstrate what they have learned, defined in such a way that there are no rules, or where the rules are easily stated, and where there may be a clear goal statement. The assessment then takes the form of rewarding how someone attempted to achieve that goal, as well as how satisfactorily the goal was achieved. In this approach, there is no real sense of cheating because it is the act of performance, as much as anything, that is being assessed.
In passing, I note the UK Department for Education’s recent announcement that Essay mills [are] to be banned under plans to reform post-16 education. For those interested in how the law might work, David Kernohan tried to make sense of the draft legislation here. To help those not keeping up at the back, David starts by trying to pin down just how these essay mills are defined in the legislation:
“A service of completing all or part of an assignment on behalf of a student where the assignment completed in that way could not reasonably be considered to have been completed personally by the student”
Skills and Post-16 Education Bill [HL], Amendments; via How will new laws on essay mills work?
This then immediately raises the question of what “personally” means, and you rapidly start losing the will to live rather than try to caveat things in a way that makes any useful sense at all. The draft clause suggests that “personally” includes “permitted assistance” (presumably as defined by the assessor), which takes us back to how the rules of the assessment game are defined.
Which takes me back to non-calculator papers, where certain forms of assistance that were allowed in the calculator paper (i.e. calculators) were not permitted in the non-calculator exam.
And makes me think again of generative tools as routes not just to automated or technologically enabled supported open learning, but also as routes to cribbing, or “unpermitted assistance”. If I set a question in an electronics assessment to analyse a circuit, would I be cheating if I used lcapy and copied its analysis into my assessment script (for example, see this Electronics Worked Example)?
With tools such as GitHub Copilot on the verge of offering automated code generation from natural language text, and GPT-3 capable of generating natural language texts that follow on from a starting phrase or paragraph, I wonder if a natural evolution for essay mills is not that they are places where people will write your essay for you, but that they are machines that will (or perhaps they already are?). And would it then become illegal to sell “personal essay generator” applications which you download to your desktop and then use to write your essay for you?
I suspect that copyright law might also become a weapon to use in the arms race against students, as they upload the course materials they’ve been provided with for a bit of top-up transfer learning on top of GPT-3 that will add in some subject matter specifics to the essay generating model. (An arms race against students: that sounds wrong, doesn’t it? That’s where we’re at, or soon will be. And it’s not the fault of the students: it’s the fault of the sucky assessment strategy and sucky forms of assessment.) Hmm, thinks… is this a new way of using text books? Buy them as transfer learning top up packs to tune your essay generator with some source specifics? Will text books and educational materials start including the equivalent of trap streets in maps, non-existent or incorrect elements that are hidden in plain view to trap the unwary copyright infringer or generative plagiarist?!
And finally, in trying to install one of the tools mentioned in the blog post on query generation around tabular data, I observe one of those other dirty little secrets about the AI-powered future that folk keep talking up: the amount of resource it takes…
Related to this, in hacking together a simple, and unofficial, local environment that students could use for some of the activities on a new machine learning module, I noted that one quite simple activity using the CIFAR-10 dataset kept knocking over my test Jupyter environment. The environment was running inside a non-GPU enabled Docker container, resource capped at 2GB of memory (we assume students run low spec machines with 4GB available overall) but that just wasn’t enough: I needed to up the memory available to Docker to 4GB.
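A rough back-of-envelope sketch suggests why a 2GB cap might be tight. It assumes (and this is only an assumption, not something confirmed for the activity in question) that the whole of CIFAR-10, 60,000 colour images of 32×32 pixels, is held in memory as a single dense array, and it ignores the model, the Python runtime, the Jupyter kernel and any temporary copies.

```python
# Back-of-envelope memory footprint for CIFAR-10 held as one dense array.
N_IMAGES = 60_000                      # 50,000 training + 10,000 test images
HEIGHT, WIDTH, CHANNELS = 32, 32, 3    # 32x32 pixels, RGB

# Bytes per stored value for some common array dtypes.
BYTES_PER_VALUE = {"uint8": 1, "float32": 4, "float64": 8}


def footprint_bytes(dtype, n_images=N_IMAGES):
    """Bytes needed to hold n_images as a dense array of the given dtype."""
    return n_images * HEIGHT * WIDTH * CHANNELS * BYTES_PER_VALUE[dtype]


def report():
    """Return one formatted line per dtype giving the footprint in MiB."""
    return [
        f"{dtype:>8}: {footprint_bytes(dtype) / 2**20:8.1f} MiB"
        for dtype in BYTES_PER_VALUE
    ]


if __name__ == "__main__":
    print("\n".join(report()))
```

On these assumptions the raw bytes are modest, but a single float64 copy of the images comes to roughly 1.4 GiB, so one careless type conversion or duplicate array could plausibly exhaust a 2GB container on its own.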