Thinking About Things That Might Be Autogradeable or Useful for Automated Marking Support

One of the ideas we keep floating but never progressing is how we might make use of nbgrader. My feeling is we could start to make use of it now on an optional, individual tutor-marker basis. The current workflow is such that students submit assessments centrally and work is then sent to assigned markers; markers mark the work and then return it centrally, whence it is dispatched back to students.

Whilst there has been a recent procurement exercise looking at replacing the central assignment handling system, I doubt that nbgrader even featured as a side note. Although it can be used to release work to students, collect it from them, manage its allocation to markers, etc., I suspect the chance of the institution tolerating more than one assignment handling system is vanishingly small, and I very much doubt that nbgrader would be that system.

Despite that, individual working is still a possibility and it requires the smallest of tweaks. Our data course currently distributes continuous assignments as Jupyter notebooks, and students have been encouraged to return their work as completed notebooks, although they may also return notebooks converted to Word docs, for example. So if we just marked up the notebook with each test cell marked as a manually graded assignment, or manually graded task, markers could individually decide to use the nbgrader tools to support their marking and feedback.
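As a rough sketch of what that markup might involve: nbgrader keeps its settings in each cell's metadata under an `nbgrader` key. The snippet below builds a toy notebook as a plain dictionary and tags the answer cell as a manually graded answer. The field names and values are my reading of the nbgrader metadata schema, and the cell contents, `grade_id` and points are made up, so compare against a notebook tagged via the nbgrader assignment toolbar before relying on it.

```python
import json

def mark_manually_graded(cell, grade_id, points):
    """Tag a notebook cell as a manually graded answer (assumed nbgrader schema)."""
    cell.setdefault("metadata", {})["nbgrader"] = {
        "grade": True,         # a marker assigns points to this cell
        "solution": True,      # the student writes their answer here
        "locked": False,       # students must be able to edit the cell
        "task": False,
        "grade_id": grade_id,  # should be unique within the notebook
        "points": points,
        "schema_version": 3,   # assumption: must match the installed nbgrader
    }
    return cell

# Hypothetical two-cell notebook: a question and an empty answer cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "Q1. Describe the dataset."},
        {"cell_type": "markdown", "metadata": {}, "source": "YOUR ANSWER HERE"},
    ],
}

mark_manually_graded(notebook["cells"][1], grade_id="q1_answer", points=5)
print(json.dumps(notebook["cells"][1]["metadata"], indent=2))
```

Markers who don't want to use nbgrader can ignore the extra metadata; the notebook still opens and runs as normal.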

(We could also use the nbgrader system to generate the released-to-student notebooks and make sure we have stripped the answers out of them…Erm…)

When it comes to automated grading, lots of the questions we ask are not ideally suited to autograding, although with a few tweaks we could make them testable.

The nbgrader docs provide some good advice on writing good test cases, including examples of using mocking to help test whether functions were called or not called, as well as grading charts / plots using plotchecker.
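To get a feel for the "was the function called?" style of test, here is a minimal sketch using `MagicMock` from the standard library's `unittest.mock`. It is not the example from the nbgrader docs: the function names (`squares`, `sum_of_squares`, `calls_helper`) are hypothetical, and I swap the helper by hand rather than using `patch` so the swap is explicit.

```python
from unittest.mock import MagicMock

def squares(n):
    """Helper the student was asked to reuse."""
    return [i ** 2 for i in range(1, n + 1)]

def sum_of_squares(n):
    """Hypothetical student answer that reuses the helper."""
    return sum(squares(n))

def sum_of_squares_inline(n):
    """Hypothetical student answer that ignores the helper."""
    return sum(i ** 2 for i in range(1, n + 1))

def calls_helper(func, helper_name, *args):
    """Return True if func calls the named global helper at least once."""
    namespace = func.__globals__
    real_helper = namespace[helper_name]
    # The spy records calls but still delegates to the real helper.
    spy = MagicMock(side_effect=real_helper)
    namespace[helper_name] = spy
    try:
        func(*args)
    finally:
        namespace[helper_name] = real_helper  # always restore the original
    return spy.called

print(calls_helper(sum_of_squares, "squares", 3))
print(calls_helper(sum_of_squares_inline, "squares", 3))
```

Both answers return the right number, so a result-only test can't tell them apart; the spy can.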

As someone who doesn’t write tests, I started to explore for myself examples of things we can test for autograding and auto-feedback. Note the auto-feedback reference there: one of the things that started to interest me was not the extent to which we could use automated tests to generate a mark per se, but how we could use tests to provide more general and informative forms of feedback.

True, a score is a form of feedback, but quite a blunt one, and may suffer from false positives or, more likely, false negatives. So could we instead explore how tests can be used to provide more constructive feedback? Compare the use of linters in this respect (for example, Nudging Student Coders into Conforming with the PEP8 Python Style Guide Using Jupyter Notebooks, flake8 and pycodestyle_magic Linters). And rather than using autograders as a be-all and end-all, could we use them as feedback generators and as a support tool for markers, making mark suggestions rather than official scores?
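To make the "feedback generator" idea a bit more concrete, here is a minimal sketch in which each check carries a hint that is shown when the check fails, rather than just contributing to a score. Everything in it is hypothetical: the task (averaging a list), the checks, the hints and the function names.

```python
def review_answer(func):
    """Run checks against a student's function and collect feedback messages."""
    checks = [
        # (description, arguments, expected result, hint shown on failure)
        ("typical list", ([1, 2, 3, 4],), 2.5,
         "Check how you divide the total by the count."),
        ("single value", ([7],), 7,
         "A list with one item should return that item."),
        ("empty list", ([],), None,
         "Decide what to return when there is nothing to average."),
    ]
    feedback = []
    for description, args, expected, hint in checks:
        try:
            result = func(*args)
        except Exception as err:
            # Report the error type instead of letting the whole run fail.
            feedback.append(f"{description}: raised {type(err).__name__}. {hint}")
            continue
        if result != expected:
            feedback.append(f"{description}: got {result!r}, expected {expected!r}. {hint}")
    return feedback or ["All checks passed."]

def student_mean(values):
    """Hypothetical student answer that forgets the empty-list case."""
    return sum(values) / len(values)

for line in review_answer(student_mean):
    print(line)
```

The output is a list of messages a marker could paste, edit or ignore, which is rather different from a pass/fail score.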

Once you start thinking about an autograder as a marker support tool, rather than a marker in its own right, it reduces the need for the autograder to be right… that can be left to the judgement of the human marker. All that we would require is that it is mostly useful/helpful, or at least, more helpful/useful than it is a hindrance.

Here’s another example of how we might generate useful feedback, this time as part of a grader that is capable of assigning partial credit: generating partial credit.
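A minimal sketch of the partial credit idea, under my own assumptions: each check is given a weight, the suggested mark is the sum of the weights of the checks that pass, and the per-check notes double as feedback. The task, the weights and the function names are hypothetical, and this is not how any particular grader implements it.

```python
def grade_with_partial_credit(func, checks):
    """Run weighted checks; return (score, max_score, notes)."""
    score = 0
    notes = []
    max_score = sum(points for _, _, _, points in checks)
    for label, arg, expected, points in checks:
        try:
            passed = func(arg) == expected
        except Exception:
            passed = False  # a crash earns no credit for this check
        if passed:
            score += points
        notes.append(f"{'PASS' if passed else 'FAIL'} ({points} pts): {label}")
    return score, max_score, notes

# Hypothetical task: tidy a name by trimming whitespace and capitalising each word.
CHECKS = [
    ("trims surrounding whitespace", "  Ada ", "Ada", 2),
    ("capitalises each word", "ada lovelace", "Ada Lovelace", 2),
    ("does both at once", "  ada lovelace ", "Ada Lovelace", 1),
]

def student_tidy_name(name):
    """Hypothetical student answer that forgets the capitalisation step."""
    return name.strip()

score, max_score, notes = grade_with_partial_credit(student_tidy_name, CHECKS)
print(f"Suggested mark: {score}/{max_score}")
for note in notes:
    print(note)
```

The number that comes out is a suggestion for the marker, who can see from the notes exactly where the credit was won and lost.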

As an example, I wrote up some notes on the crudest of marking support tools for marking free text answers against a specimen answer. I know very little about NLP (natural language processing) and even less about automated marking of free text answers, but I think I can see some utility even with a crappy similarity matcher from an off-the-shelf NLP package (spaCy).
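To show the shape of the idea without needing a language model installed, here is a stand-in that uses only the standard library: a bag-of-words cosine similarity between each student answer and the specimen. This is not what spaCy's `Doc.similarity` does (that compares word vectors), and it is cruder still, but the workflow of scoring and ranking answers against a specimen is the same. The specimen and the student answers are made up.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Count lower-cased word tokens in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def similarity(text_a, text_b):
    """Cosine similarity between word-count vectors, from 0.0 to 1.0."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical specimen answer and student answers.
specimen = "A primary key uniquely identifies each row in a table."
answers = {
    "student_1": "The primary key uniquely identifies every row of the table.",
    "student_2": "It is the first column.",
}

# Rank answers from most to least similar to the specimen.
ranked = sorted(answers, key=lambda s: similarity(specimen, answers[s]), reverse=True)
for student in ranked:
    print(student, round(similarity(specimen, answers[student]), 2))
```

A low score doesn't mean the answer is wrong, only that it is worded differently from the specimen, which is exactly why the human marker stays in the loop.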

PS in passing, I also noticed this tip for nbgrader autograding in a Docker container using envkernel, a tool that can wrap a Docker container so you can launch it as a notebook kernel. (I haven’t managed to get this working yet; I didn’t spot a demo that “just works”, so I figure I need to actually read the docs, which I haven’t made time to do yet… So if you do have a baby steps example that does work, please share it via the comments… Or submit it as a PR to the official docs…)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
