Writing a good (effective?) assessment question can be hard: it needs to be unambiguous, it needs to assess something of what a student might be expected to have learned by studying a particular course, and it needs to not give the answer away whilst still being structured enough that it prompts (prompts…) an answer that can be marked according to a marking guide (particularly where you have large numbers of markers who are, ideally, all marking in a consistent way), and so on. In many cases, there may be different components that are credit bearing. For example, in a numerical question, the “final answer” may get a mark, but showing the working may also gain marks; in an essay question, the central argument may gain credit, but marks might also be lost for poor spelling or poor grammar.
It’s not surprising, then, that once you have a question or question format that works well, it often makes sense to recycle it. Depending on the question structure, there are various ways in which you can mint new questions from old ones. In numerical questions, you might change some of the numbers whilst the working remains largely the same. As a corollary, you might consider “personalised” questions, where each student is given a variant of the same question that tests them equally (typically, it requires the same working) but generates a different “final answer”. For example, and trivially: what is the sum of squares for 2 and 7, for 6 and 3, for 4 and 5, and so on? Making the calculation non-commutative, e.g. “the difference of squares”, provides even more final answer variants for a limited range of variable numbers, but requires more careful phrasing of the question (“find the difference of squares, x**2 - y**2, for x=… and y=…”, etc.; but then you are much of the way to giving away the working part of the answer). An essay style question that requires students to include “examples from the last twelve months” is evergreen in terms of requiring contemporary matters to be included. If I were a scholar, I’d probably try to include a load of references here…
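By way of a trivial sketch (my own made-up illustration, not part of any actual question bank), here’s how you might mint personalised variants of that sort of question in a few lines of Python, seeding the numbers from a student identifier so that each student sees the same working but a different final answer:

```python
import random

def difference_of_squares_variant(seed):
    """Generate a personalised 'difference of squares' question and its answer.

    The seed (e.g. a student identifier) determines the numbers, so each
    student gets a different final answer but the same working.
    """
    rng = random.Random(seed)
    # Pick two distinct small integers, largest first so the answer is positive
    x, y = sorted(rng.sample(range(2, 13), 2), reverse=True)
    question = f"Find the difference of squares, x**2 - y**2, for x={x} and y={y}."
    return question, x**2 - y**2

# One variant per (made-up) student identifier
for student_id in ["A123", "B456", "C789"]:
    q, a = difference_of_squares_variant(student_id)
    print(student_id, "|", q, "Answer:", a)
```

The marking guide stays the same for everyone; only the numbers, and hence the final answers, change.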
One of the reasons for having to create new questions, of course, is to minimise the chances that students can recycle answers. But now, it seems, there is also the additional problem of students using generative AI tools to answer assessment questions. A naive take on this is to consider an assessment to be “genAI safe” if it somehow defends against students using a simple, zero-shot “cut and paste” strategy: paste the question in, get an answer out, and use that as the assessment answer. But that’s a really simplistic way of using genAI. Anyone who has spent any time at all in a genAI conversation knows that there are a few simple tricks you can adopt to improve a prompt. For example, OpenAI guidance suggests the following (I’ll sketch what these can look like in practice after the list):
- Include details in your query to get more relevant answers
- Ask the model to adopt a persona
- Use delimiters to clearly indicate distinct parts of the input
- Specify the steps required to complete a task
- Provide examples
- Specify the desired length of the output
Challenging or critiquing the response and forcing the genAI to review or modify its answer as part of an iterative, “conversational” approach can also dramatically improve an answer. Compare this to an open book assessment, where students are expected to work on their own solutions but may also be encouraged to form study groups to discuss ways in which a particular question might be addressed, without actually sharing their own answers.
Cutting and pasting in a question, then copying the answer out, might be classed as “cheating”; but if a student works with the genAI model to iterate on the answer, under the student’s own direction, is that still cheating? If the student uses their understanding of the question to add guiding context to the original prompt (a description of how to approach answering the question, as the student understands it), is that cheating, compared to copying and pasting generic guidance about how to approach an assessment question from a study skills site?
Over on one of my must-read blogs, Ian Mulvany, a commentator on matters publishing, posted some musings over the weekend on the matter of data to paper, along with some reflections on LLMs, noting that:
What strikes me right now about both of these projects is that there is still a very high level of effort required to get something that begins to be marginally useful. Too much effort to be radically disruptive, but certainly a level of disruption is available that was not available before.
Which is to say: power tools for power users who put the work in when using them.
But perhaps even more interesting were a couple of example prompts that Ian linked to. For example, this prompt to review a legal question in the context of the draft AI Act. The first part of the prompt reads very much like the framing of an assessment question. The prompt then provides a couple of few-shot examples of the form the answer might take, akin to a sample answer in a generic marking guide for a question type where a student is expected to consider a legal question in the context of a particular regulation. The second example Ian gives also embeds prompt elements that are very assessment-question-like.
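To caricature the structure (this is my own invented skeleton, not Ian’s actual prompt), that sort of prompt looks a lot like an assessment question bundled with a marking-guide style statement of the expected form and a worked sample answer:

```python
# An invented prompt skeleton mimicking the structure described above:
# assessment-style framing, a statement of the expected form of answer,
# a worked example, then the actual question to be answered.
prompt = """You are a legal analyst. Assess whether the scenario below falls
within the scope of the regulation described.

A good answer might be expected to take the following form:
- Scope: does the regulation apply, and why?
- Obligations: which requirements are triggered?
- Conclusion: a one-sentence overall judgement.

Example scenario: <an example scenario goes here>
Example answer: <a worked sample answer, in the form above>

Scenario to assess: <the pasted-in assessment question>
"""
```

Which is exactly the sort of “a good answer might be expected to take the following form…” construct I wonder about below.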
So I wonder:
- what are the similarities and differences between a good assessment question and a good prompt, e.g. in leading the co-respondent (the student in the case of an assessment question, a genAI/LLM in the case of a prompt) to provide an answer that is “correct” and of a form the presenter (of the question or prompt) expects?
I also wonder:
- to what extent should a good prompt include “marking guide” constructs such as “a good answer might be expected to take the following form…”?
- if a student adds the “a good answer might be expected to take the following form…” element to a prompt as additional context around a pasted in question, has the student committed the same sort of academic offence as if they just pasted in the question, then copied the generated answer?
- if a student iterates on a generated answer by prompting the genAI to address concerns raised by the student about the answer generated to date, has that student committed the same sort of academic offence as a student using a one-shot, single-step prompting approach to generate answers with an LLM?