Decision Support for Third Marking Significant Difference Double Marked Assessments

In the OU, courses assessed by project report tend to have the reports double marked – once by the student’s own tutor, and once by another marker from the same marking pool. Despite our best efforts at producing marking guides and running a marker co-ordination event with a few sample scripts, marks often differ. University regulations suggest that marks that differ by more than 15%, or that straddle grade boundaries, should be third marked, although these rules can be tweaked a bit.

With a shedload of third marking about to turn up tomorrow that needs to be turned round by next Tuesday, I thought I’d have a quick look at what information provided by the first two markers is available to support the third marking effort.

For the course I have to third mark, the mark recording tool we use – OSCAR (Online Score Capture for Assessment Records) – makes available the marks for each marker in a table that also identifies the marking categories. The mark scheme we have in this particular case has five unequally weighted categories, with 8 marks available in each. (Note that with the weightings applied there are 64 marks in all, so a delta of 10 weighted marks roughly equates to a 15% difference and an automatic sigdiff/third marking flag. If I am doing my sums right and have the grade boundaries about right, it also means two markers may give the same grade/classification (pass 1, pass 2, etc.) but still raise a sigdiff.)
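The arithmetic is simple enough to sketch in code. Something like the following, assuming made-up category weights that happen to sum to a 64 mark total (the real weights will differ):

# Hypothetical category weights; 8 * (2 + 2 + 1.5 + 1.5 + 1) = 64
WEIGHTS = {"A": 2.0, "B": 2.0, "C": 1.5, "D": 1.5, "E": 1.0}

MAX_TOTAL = 8 * sum(WEIGHTS.values())   # 64 weighted marks in all
SIGDIFF = 0.15 * MAX_TOTAL              # ~10 weighted marks

def weighted_total(raw_marks):
    """Apply the category weights to raw marks (each out of 8)."""
    return sum(WEIGHTS[c] * m for c, m in raw_marks.items())

def is_sigdiff(marker1, marker2):
    """Flag a significant difference if weighted totals differ by more than 15%."""
    return abs(weighted_total(marker1) - weighted_total(marker2)) > SIGDIFF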

To try to make it easier to see where significant differences were arising between two markers, I prototyped a simple spreadsheet-based display that calculated the weighted marks and charted them in a couple of ways:

  • a dodged bar chart, by category, so that we could see which categories the markers differed in;
  • a stacked bar chart that shows the total score awarded by each marker.

The stacked bar chart is also overloaded with another bar that loosely identifies the grade boundaries. (The colours in this bar do not relate to the legend.) Ideally, I’d have used grade boundaries as vertical gridlines in the chart to make it clear which grade band a final mark fell into, but I’m not familiar with Excel charting and couldn’t see how to do that offhand. (Also, I guessed at where the grade boundaries are, so don’t read too much into the ones presented.)
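For what it’s worth, the same pair of views is easy enough to knock up in a notebook, where grade boundary gridlines are trivial. A quick sketch, with made-up marks and guessed boundary values:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical weighted marks for two markers across the five categories
df = pd.DataFrame({"Marker 1": [12, 10, 9, 8, 6],
                   "Marker 2": [16, 9, 10, 6, 5]},
                  index=["A", "B", "C", "D", "E"])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Dodged bar chart: side by side bars, by category
df.plot.bar(ax=ax1, rot=0, title="Weighted marks by category")

# Stacked bar chart: one bar per marker, total score stacked by category
df.T.plot.barh(ax=ax2, stacked=True, title="Total weighted mark")

# Guessed grade boundaries as gridlines (placeholder values)
for boundary in [22, 32, 42, 54]:
    ax2.axvline(boundary, color="grey", linestyle="--", linewidth=1)

plt.tight_layout()
plt.show()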

(I also came across a gotcha in my version of Excel on a Mac… the charts don’t update when I paste the new data in. Instead I have to cut them and paste them back into the sheet, at which point they do update. WTF?!)

A couple of other things that should be quick to add to the prototype (sketched after the list):

  • a statement of the grade awarded by each marker (pass 1, pass 2, fail), perhaps also qualified – strong pass 2 (at the top of the band), bare pass 3 (at the bottom of the band), solid pass 4 (in the middle of the band), for example;
  • a statement of the average mark and the grade that would result. (One of the heuristics for awarding marks from markers that differ by a small amount is to use the average.)
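Continuing the sketch, and again with guessed grade boundaries rather than the real ones:

# Guessed grade bands on the 64 mark scale (placeholders, not the real ones)
BANDS = [("pass 1", 54, 64), ("pass 2", 42, 54), ("pass 3", 32, 42),
         ("pass 4", 22, 32), ("fail", 0, 22)]

def grade(total):
    """Map a weighted total to a grade band, qualified by position within it."""
    for label, lower, upper in BANDS:
        if total >= lower:
            position = (total - lower) / (upper - lower)
            if position >= 0.75:
                return "strong " + label
            if position < 0.25:
                return "bare " + label
            return "solid " + label

marker1, marker2 = 45, 46
print(grade(marker1), grade(marker2))   # grade awarded by each marker
print(grade((marker1 + marker2) / 2))   # grade resulting from the average mark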

I should probably also add a slot for third marker marks to be displayed…

More elaborate would be some rules to generate a brief text report identifying which categories the markers differ significantly on, by how many awarded marks, and what that translates to in terms of weighted marks (or even percentage marks).
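A crude version of such a report generator, reusing the hypothetical weights from earlier:

# Hypothetical category weights, as before (weighted total 64)
WEIGHTS = {"A": 2.0, "B": 2.0, "C": 1.5, "D": 1.5, "E": 1.0}

def difference_report(marker1, marker2, threshold=2):
    """Report categories where the raw marks differ by threshold or more."""
    lines = []
    for category in WEIGHTS:
        delta = abs(marker1[category] - marker2[category])
        if delta >= threshold:
            weighted = delta * WEIGHTS[category]
            lines.append(f"Category {category}: markers differ by {delta} raw marks "
                         f"({weighted:.1f} weighted marks, {100 * weighted / 64:.0f}% of the total)")
    return "\n".join(lines) if lines else "No category differs significantly."

print(difference_report({"A": 6, "B": 5, "C": 4, "D": 5, "E": 6},
                        {"A": 3, "B": 5, "C": 5, "D": 5, "E": 6}))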

One reason for doing this is to try to make life easier – a report may not need completely re-marking if the markers differ in just one particular respect, for example (which may even be the result of an error made when entering the original marks). Such a tool may also be useful at an award board for getting a quick, visually informative view of how the markers awarded marks to a particular script.

But this sort of tool may also help us start to understand better why and how markers are marking differently, and what sorts of change we might need to make to the marking scheme or marking guidance. (Seeing the differences in a particular category in a visual way often leaves you with a different feeling to seeing a table of the numerical marks.)

It also provides an environment for tinkering with some automated feedback generators, powered by the marks.

Of course, I’d rather be developing these views in Jupyter notebooks/dashboards, or R, and if we had easy access to the data it wouldn’t be hard to roll a simple app together. But as a digital-first, cloud organisation, we get to view each set of double marks one HTML page at a time.

PS I don’t think a scraper would be too hard to write to pull down the marker returns for each student on a course, which are handily all linked to from a single page, and pop them into a single dataframe… Hypothetically, here’s how we might be able to get in, for example, using the Python MechanicalSoup package, which works with Python 3 (mechanize requires Python 2)…

!pip3 install MechanicalSoup
import mechanicalsoup

def getSession():
    # A stateful browser keeps session cookies across requests
    browser = mechanicalsoup.StatefulBrowser()
    browser.open(LOGIN_URL)
    # Select the login form by CSS selector, e.g. FORM_ID = "#loginForm"
    browser.select_form(FORM_ID)
    browser["username"] = USERNAME
    browser["password"] = PASSWORD
    browser.submit_selected()
    return browser

browser = getSession()

# The authenticated session can now be used to pull down intranet pages
response = browser.open(INTRANET_URL)
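And, still hypothetically, something like this might then walk the links on the listing page and pull the marks tables into a single dataframe (the listing URL and link selector are made up):

import random
import time
import pandas as pd

# Hypothetical: the course listing page links to one marks page per student
browser.open(LISTING_URL)
links = browser.page.select("a.marks-page")  # made-up CSS selector

frames = []
for link in links:
    browser.open(browser.absolute_url(link["href"]))
    # pandas can lift HTML tables straight into dataframes
    frames.append(pd.read_html(str(browser.page))[0])
    # Random delay between requests - polite, and less bursty looking
    time.sleep(random.uniform(1, 5))

marks = pd.concat(frames, ignore_index=True)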

Of course, this sort of thing probably goes against the Computing Code of Conduct… I’m also not sure if IT folk are paranoid enough to look for rapid bursts of machine-generated requests and lock an account out if they spot them? But that’s not too hard to work around – just put a random delay between page requests when running the scrape (which is polite behaviour anyway).