More Scripted Diagram Extensions For Jupyter Notebook, Sphinx and Jupyter Book

Following on from Previewing Sphinx and Jupyter Book Rendered Mermaid and Wavedrom Diagrams in VS Code, I note several more sphinx extensions for rendering diagrams from source script in appropriately tagged code fenced blocks:

  • blockdiag/sphinxcontrib-blockdiag: a rather dated, but still working, extension that generates PNG images from source scripts. (The resolution of the text in the image is very poor. It would perhaps be useful to be able to specify outputting SVG?) See also this Jupyter notebook renderer extension: innovationOUtside/ipython_magic_blockdiag. I haven’t spotted a VS Code preview extension for blockdiag yet. Maybe this is something I should try to build for myself? Maybe a strike day activity for me when the strikes return…
  • sphinx-contrib/plantuml: I haven’t really looked at PlantUML before, but it looks like it can generate a whole host of diagram types, including sequence diagrams, activity diagrams, state diagrams, deployment diagrams, timing diagrams, network diagrams, wireframes and more.
PlantUML Activity Diagram
PlantUML Deployment Diagram
PlantUML Timing Diagram
PlantUML Wireframe (1)
PlantUML Wireframe (2)

The jbn/IPlantUML IPython extension and the markdown-preview-enhanced VS Code extension will also preview PlantUML diagrams in Jupyter notebooks and VS Code respectively. For example, in a Jupyter notebook we can render a PlantUML sequence diagram via a block magicked code cell.
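Something along the following lines, for example (a minimal sketch; if I’ve read the jbn/IPlantUML README right, importing the package registers a %%plantuml block magic):

# First notebook cell: importing the package should register the magic
import iplantuml

# Second, block magicked, cell containing the PlantUML sequence diagram script
%%plantuml
@startuml
Alice -> Bob: Authentication request
Bob --> Alice: Authentication response
@enduml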

Simple Jupytext Github Action to Update Jupyter .ipynb Notebooks From Markdown

In passing, a simple GitHub Action that will look for updates to markdown files in a GitHub push or pull request and, if it finds any, will run jupytext --sync over them to update any paired files found in the markdown metadata (and/or via jupytext config settings?)
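For reference, the pairing information in such a markdown file lives in its YAML header, roughly along the following lines (a sketch; the exact fields jupytext writes may differ):

---
jupyter:
  jupytext:
    formats: ipynb,md
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---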

Such files might have been modified, for example, by an editor proof reading the markdown materials in a text editor.

If I read the docs right, the --use-source-timestamp flag will set the notebook timestamp to the same as that of the modified markdown file(s)?

The modified markdown files themselves are identified using the dorny/paths-filter action. Any updated .ipynb files are then auto-committed to the repo using the stefanzweifel/git-auto-commit-action action.

name: jupytext-changes

on:
  push

jobs:
  sync-jupytext:
    runs-on: ubuntu-latest
    steps:

    # Checkout
    - uses: actions/checkout@v2

    # Test for markdown
    - uses: dorny/paths-filter@v2
      id: filter
      with:
        # Enable listing of files matching each filter.
        # Paths to files will be available in `${FILTER_NAME}_files` output variable.
        # Paths will be escaped and space-delimited.
        # Output is usable as command-line argument list in Linux shell
        list-files: shell

        # In this example, changed markdown files will be passed to jupytext
        # If we specify we are only interested in added or modified files, deleted files are ignored
        filters: |
            notebooks:
                - added|modified: '**.md'
        # Should we also identify deleted md files
        # and then try to identify (and delete) .ipynb docs otherwise paired to them?
        # For example, remove .ipynb file on same path ($FILEPATH is a file with .md suffix)
        # rm ${FILEPATH%.md}.ipynb

    - name: Install Packages if changed files
      if: ${{ steps.filter.outputs.notebooks == 'true' }}
      run: |
        pip install jupytext

    - name: Synch changed files
      if: ${{ steps.filter.outputs.notebooks == 'true' }}
      run: |
        # If a command accepts a list of files,
        # we can pass them directly
        # This will only synch files if the md doc include jupytext metadata
        # and has one or more paired docs defined
        # The timestamp on the synched ipynb file will be set to the
        # same time as the changed markdown file
        jupytext --use-source-timestamp  --sync ${{ steps.filter.outputs.notebooks_files }}

    # Auto commit any updated notebook files
    - uses: stefanzweifel/git-auto-commit-action@v4
      with:
        # This would be more useful if the git hash were referenced?
        commit_message: Jupytext synch - modified, paired .md files

Note that the action does not execute the notebook code cells (adding --execute to the jupytext command would fix that, although steps would also need to be taken to ensure that an appropriate code execution environment is available): for the use case I’m looking at, the assumption is that edits to the markdown do not include making changes to code.

Supporting Playful Exploration of Data Clustering and Classification Using drawdata

One of the most powerful learning techniques I know that works for me is play, the freedom to explore an idea or concept or principle in an open-ended, personally directed way, trying things out, testing them, making up “what if?” scenarios, and so on.

Playing takes time of course, and the way we construct courses means that we don’t give students time to play, preferring to overload them with lots of stuff to read, presumably on the basis that stuff = value.

If I were to produce a 5 hour chunk of learning material that was little more than three or four pages of text, defining various bits of playful activity, I suspect that questions would be asked on the basis that 5 hours of teaching should include lots more words… I also suspect that the majority of students would not know how to play constructively within the prescribed bounds for that length of time.

Whatever.

In passing, I note this rather neat Python package, drawdata, that plays nice with Jupyter notebooks:

Example of use drawdata.draw_scatter() mode

Select a group (a, b, or c), draw a buffered line, and it will be filled (ish) with random dots. Click the copy csv button to grab the data into the clipboard, and then you can retrieve it from there into a pandas dataframe:

Retrieve data from clipboard into pandas dataframe
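By way of a minimal sketch (assuming the drawdata and pandas packages are installed, and that the copied data is comma separated):

# Launch the drawing widget in a notebook cell
from drawdata import draw_scatter
import pandas as pd

draw_scatter()

# After drawing some groups and clicking the "copy csv" button,
# pull the copied data from the clipboard into a dataframe
df = pd.read_clipboard(sep=",")
df.head()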

At the risk of complicating the UI, I wonder about adding a couple more controls, one to tune the width of the buffered line (and also ensure that points are only generated inside the line envelope), another to set the density of the points.

Another tool allows you to generate randomly sampled points along a line:

I note this could be a limiting case of a zero-width line in a drawdata widget with a controllable buffer size.

Could using such a widget in a learning activity provide an example of technology enhanced learning, I wonder?! (I still don’t know what that phrase is supposed to mean…)

For example, I can easily imagine creating a simple activity where students get to draw different distributions and then run their own simple classifiers over them. The playfulness aspect would come in when students start wondering about how different data groups might interact, or how linear classifiers might struggle with particular multigroup distributions.
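As a sketch of how the classifier end of such an activity might look (assuming a dataframe df retrieved from the widget as above, with x and y co-ordinate columns and a group label column, which I’m assuming here is called z):

# Fit a simple linear classifier to the hand drawn points; a low training
# score hints that the drawn groups aren't linearly separable
from sklearn.linear_model import LogisticRegression

X = df[["x", "y"]]
y = df["z"]  # the group label column (assumed name)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Training accuracy:", clf.score(X, y))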

As a related example of supporting such playfulness, the tensorflow playground provides several different test distributions with different interesting properties:

Distributions in tensorflow playground

To run your own local version of the tensorflow playground via a jupyter-server-proxy, see innovationOUtside/nb_tensorflow_playground_serverproxy.

With drawdata, students could quite easily create their own test cases to test their own understanding of how a particular classifier works. To my mind, developing such an understanding is supported if we can also visualise the evolution of a classifier over time. For example, the following animation (taken from some material I developed for a first year module that never made it past the “optional content” stage) shows the result of training a simple classifier over a small dataset with four groups of points.

Evolution of a classifier

See also: How to Generate Test Datasets in Python with scikit-learn, a post on the Machine Learning Mastery blog, and Generating Fake Data – Quick Roundup, which summarises various other takes on generating synthetic data.

PS This also reminds me a little bit of Google Correlate (for example,
Google Correlate: What Search Terms Does Your Time Series Data Correlate With?), where you could draw a simple timeseries and then try to find search terms on Google Trends with the same timeseries behaviour. On a quick look, none of the original URLs I had for that seem to work anymore. I’m not sure if it’s still available via Google Trends, for example?

PPS Here’s another nice animation from Doug Blank demonstrating a PCA based classification: https://nbviewer.org/github/Calysto/conx-notebooks/blob/60106453bdb66a83da7c2741d7644b7f8ee94517/PCA.ipynb

30 Second Bookmarklet for Saving a Web Location to the Wayback Machine

In passing, I just referenced a web page in another post, the content of which I really don’t want to lose access to if the page disappears. A quick fix is to submit the page to the Internet Archive Wayback Machine, so that at least I know a copy of the page will be available there.

From the Internet Archive homepage, you can paste in a URL and the Archive will check to see if it has a copy of the page. In many cases, the page will have been grabbed multiple times over the years, which also means you can track a page’s evolution over time.

Also on the homepage is a link that allows you to submit a URL to request that the page is saved to the archive:

Here’s the actual save page:

When you save the page, a snapshot is grabbed:

Saving a page to the Wayback Machine

Checking the URL for that page, it looks like we can grab a snapshot by passing the URL https://web.archive.org/save/ followed by the URL of the page we want to save…
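As a quick sketch of that pattern from Python, rather than a bookmarklet (the example URL is purely illustrative, and I’m assuming the save endpoint still behaves this way):

# Request a Wayback Machine snapshot by prefixing a page URL with the
# https://web.archive.org/save/ path described above
import requests

url = "https://example.com/some/page"  # hypothetical page to archive
response = requests.get("https://web.archive.org/save/" + url)
print(response.status_code, response.url)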

Hmmm… 30s bookmarklet time. Many years ago, I spent some of the happiest and most productive months (for me) doing an Arcadia Fellowship with the University Library in Cambridge, tinkering with toys and exploring that incredible place.

During my time there, I posted to the Arcadia Mashups Blog, which still exists as a web fossil. One of the posts there, The ‘Get Current URL’ Bookmarklet Pattern, is a blog post and single page web app in one that lets you generate simple redirection bookmarklets:

Window location bookmarklet form, Arcadia Mashups Blog

If you haven’t come across bookmarklets before, you could think of them as automation web links that run a bit of Javascript to do something useful for you, either by modifying the current web page, or doing something with its web location / URL.

When you save a bookmarklet, you should really check that the bookmarklet javascript code isn’t doing anything naughty, or make sure you only install bookmarklets from trusted locations.

In the above Archive-it example, the code grabs the current page location and passes it to https://web.archive.org/save/ . If you drag the bookmarklet to your browser toolbar, open a web page, and click the bookmarklet, the page is archived:

Oh, happy days…

So, a 30s hack and I have built myself a tool to quickly archive a web URL. (Writing this blog post took much longer than remembering that post existed and generating the bookmarklet.)

There are of course other tools for doing similar things, not least robustlinks.mementoweb.org, but it was as quick to create my own as to try to re-find that…

See also: Fragment – Virtues of a Programmer, With a Note On Web References and Broken URLs and Name (Date) Title, Available at: URL (Accessed: DATE): So What?

Previewing Sphinx and Jupyter Book Rendered Mermaid and Wavedrom Diagrams in VS Code

In VS Code as an Integrated, Extensible Authoring Environment for Rich Media Asset Creation, I linked to a rather magical VS Code extension (shd101wyy.markdown-preview-enhanced) that lets you preview diagrams rendered from various diagram scripts, such as documents defined using Mermaid markdown script or wavedrom.

The diagram script is incorporated in a code fenced block qualified by the scripting language type, such as ```mermaid or ```wavedrom.

Pondering whether this was also a route to live previews of documents rendered from the original markdown using Sphinx (the publishing engine used in Jupyter Book workflows, for example), I had a poke around for related extensions and found a couple of likely candidates: sphinxcontrib.mermaid and sphinxcontrib.wavedrom.

After installing the packages from PyPi, these extensions are enabled in a Jupyter Book workflow by adding the following to the _config.yml file:

sphinx:
  extra_extensions:
    - sphinxcontrib.mermaid
    - sphinxcontrib.wavedrom

Building a Sphinx generated book from a set of markdown files using Jupyter Book (e.g. by running jupyter-book build .) does not render the diagrams… Boo…

However, changing the code fence label to a MyST style label (as suggested here), does render the diagrams in the Sphinx generated Jupyter Book output, albeit at the cost of not now being able to preview the diagram directly in the VS Code editor.
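For example (the flowchart itself is just a placeholder), the plain code fence form that the VS Code extension previews looks something like:

```mermaid
flowchart LR
    A --> B
```

while the MyST flavoured form that the Sphinx/Jupyter Book build renders uses a directive style label instead:

```{mermaid}
flowchart LR
    A --> B
```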

It’s not so much of an overhead to flip between the two, and an automation step could probably be set up quite straightforwardly to convert between the forms as part of a publishing workflow, but I’ve raised an issue anyway suggesting it might be nice if the shd101wyy.markdown-preview-enhanced extension also supported the MyST flavoured syntax…

See also: A Simple Pattern for Embedding Third Party Javascript Generated Graphics in Jupyter Notebooks which shows a simple recipe for adding js diagram generation support to classic Jupyter notebooks, at least, using simple magics. A simple transformation script should be able to map between the magic cells and an appropriately fenced code block that can render the diagram in a Sphinx/Jupyter Book workflow.

Another Automation Step: Spell-Checking Jupyter Notebooks

Another simple automation step, this time to try to add spell checking of notebooks.

I really need to find a robust and useful way of doing this. I’ve previously explored using pyspelling, and started tinkering with a thing to generate tidier reports from it, and have also found codespell to be both quick and effective.

Anyway, here’s a hacky Github Action for spellchecking notebook files in the last commit of a push onto a Github repo (see also this issue regarding getting changed files from all commits in the push that raised the action, which seems to have been addressed by this PR):

name: spelling-partial-test
on:
  push

jobs:
  changednb:
    runs-on: ubuntu-latest
    # Set job outputs to values from filter step
    outputs:
      notebooks: ${{ steps.filter.outputs.notebooks }}
    steps:
    # (For pull requests it's not necessary to checkout the code)
    - uses: actions/checkout@v2
      with:
        fetch-depth: 0
    - uses: dorny/paths-filter@v2
      id: filter
      with:
        filters: |
          notebooks:
            - '**.ipynb'

  nbval-partial-demo:
    needs: changednb
    if: ${{ needs.changednb.outputs.notebooks == 'true' }}
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@master
      with:
        fetch-depth: 0 # or 2?
        #ref: nbval-test-tags

    - id: changed-files
      uses: tj-actions/changed-files@v11.4
      with:
        since_last_remote_commit: 'true'
        separator: ','
        files: |
          .ipynb$

    - name: Install spelling packages
      run: |
        sudo apt-get update && sudo apt-get install -y aspell aspell-en
        python3 -m pip install --upgrade https://github.com/ouseful-PR/nbval/archive/table-test.zip
        python3 -m pip install --upgrade https://github.com/innovationOUtside/nb_spellchecker/archive/main.zip
        python3 -m pip install --upgrade https://github.com/ouseful-PR/pyspelling/archive/th-ipynb.zip
        python3 -m pip install --upgrade codespell

    - name: Codespell
      # Codespell is a really quick and effective spellchecker
      run: |
        touch codespell.txt
        IFS="," read -a added_modified_files <<< "${{ steps.changed-files.outputs.all_modified_files }}"
        # This only seems to find files from the last commit in the push?
        for added_modified_file in "${added_modified_files[@]}"; do
          codespell "$added_modified_file" | tee -a codespell.txt
        done

    - name: pyspelling test of changed files
      # This runs over changed files one at a time, though we could add multiple -S in one call...
      run: |
        touch typos.txt
        touch .wordlist.txt
        IFS="," read -a added_modified_files <<< "${{ steps.changed-files.outputs.all_modified_files }}"
        # This only seems to find files from the last commit in the push?
        for added_modified_file in "${added_modified_files[@]}"; do
          pyspelling -c .ipyspell.yml -n Markdown -S "$added_modified_file" | tee -a typos.txt || continue
          pyspelling -c .ipyspell.yml -n Python -S "$added_modified_file" | tee -a typos.txt || continue
        done
        cat typos.txt
        touch summary_report.txt
        nb_spellchecker reporter -r summary_report.txt
        cat summary_report.txt
      shell: bash
      # We could let the action fail on errors
      continue-on-error: true

    - name: Upload partial typos
      # Create a downloadable bundle of zipped typo reports
      uses: actions/upload-artifact@v2
      with:
        name: typos
        path: |
         codespell.txt
         typos.txt
         summary_report.txt

Typos are displayed inline in the action run:

And a zipped file of spellcheck reports is also available for download:

More Automation Sketches – Creating Student Notebook Releases

Tinkering a bit more with Github Actions, I’ve hacked together some sort of workflow for testing notebooks in a set of specified directories, clearing the notebook output cells, zipping the notebooks into a release zip file, and then publishing the zip file via a Github release page.

The test and release action is triggered by making a release with a body that contains a comma-separated list of directory paths identifying the directories we want in the release. For example:

The following action is triggered by a release creation event:

name: example-release
on:
  release:
    types:
      - created
  workflow_dispatch:
        
jobs:
  release-demo:
    runs-on: ubuntu-latest
    container:
      image: ouvocl/vce-tm351-monolith
    env:
      RELEASE_DIRS: ${{ github.event.release.body }}
    steps:
    - uses: actions/checkout@master
    - name: Install nbval (TH edition) and workflow tools
      run: |
        python3 -m pip install --upgrade https://github.com//ouseful-PR/nbval/archive/table-test.zip
        python3 -m pip install https://github.com/innovationOUtside/nb_workflow_tools/archive/master.zip
    - name: Restart postgres
      run: |
        sudo service postgresql restart
    - name: Start mongo
      run: |
        sudo mongod --fork --logpath /dev/stdout --dbpath ${MONGO_DB_PATH}
    - name: Get directories
      run: |
        #IFS=$"\n" read -a file_paths <<< "${{ github.event.head_commit }}"
        IFS="," read -a file_paths <<< "${{ github.event.release.body }}"
        ls
        # Test all directories
        for file_path in "${file_paths[@]}"; do
          pwd
          py.test  --nbval "$file_path" || continue
        done
      shell: bash
      # For testing...
      continue-on-error: true
    - name: Create zipped files
      run: |
        IFS="," read -a file_paths <<< "${{ github.event.release.body }}"
        for file_path in "${file_paths[@]}"; do
          tm351zip -r clearOutput -a "$file_path" release.zip
        done
        echo "Release paths: $RELEASE_DIRS" > release-files.txt
        echo -e "\n\nRelease zip contents:\n" >> release-files.txt
        tm351zipview release.zip >> release-files.txt
      shell: bash
    - name: Create Release
      id: create_release
      uses: softprops/action-gh-release@v1
      # The commit must be tagged for a release to happen
      # Tags can be added via Github Desktop app
      # https://docs.github.com/en/desktop/contributing-and-collaborating-using-github-desktop/managing-commits/managing-tags#creating-a-tag
      with:
        tag_name: ${{ github.ref }}-files
        name: ${{ github.event.release.name }} files
        #body: "Release files/directories: ${RELEASE_DIRS}"
        body_path: release-files.txt
        files: |
          release.zip

The action then runs the tests and generates another release that includes the cleaned and zipped release files:

Ideally, we’d just add the zip file to the original release but I couldn’t spot a way to do that.

At the moment the action will publish the release files even if some notebook tests fail. A production action should fail if a test fails, or perhaps parse the release name and ignore the fails if the original release name contains a particular flag (for example, --force).

The idea of using the release form to create the release was to try to simplify the workflow and allow a release to be generated quite straightforwardly from a repository on the Github website.

Automated Testing of Educational Jupyter Notebook Distributions Using Github Actions

For years and years and years and years (since 2016) we’ve been updating both the notebooks, and the environment we ship to students on our data management and analysis course, on an annual basis.

The notebooks are substantially the same (we update maybe 20% of the material each presentation) and the environment updates are typically year on year updates to Python packages. There are some breaking changes, but these are generally flagged by deprecation warning messages that start to appear as part of pre-breaking change package updates a year before they become breaking.

The notebooks are saved to a private Github repo with cells pre-run. The distribution process then involves clearing the output cells and zipping the cleaned notebooks and any required data files into a distributed zip file. On my to do list is automating the release steps. More on that when I get round to it…

Checking that the notebooks all run correctly in the updated environment is done as a manual process. It doesn’t have to be, because nbval, which runs notebooks and tests cell outputs against reference outputs in a previously run version of the notebook, has been around for years. But for all the “the OU is a content factory with industrial production techniques” bluster, module teams are cottage industries and centralised production doesn’t come anywhere near our Jupyter notebook workflows. (FWIW, I don’t do the checking… If I did, I’d have got round to automating it properly years ago!;-)

In a recent fragmentary post Structural Testing of Jupyter Notebook Cell Outputs With nbval, I described some of the tweaks I’ve been making to nbval to reduce the false positive cell output test errors and add elements of structural testing so we don’t ignore those cell outputs completely.

Today, I wasted hours and hours not understanding why a simple Github Action automation script to run the tests wasn’t working (answer: I think I was running Linux commands in the wrong shell…). Plus, our file names have spaces in them (some may even have punctuation, but I haven’t hit associated errors with that sort of filename yet… a simple tweak to the delimiter I use to separate filenames, e.g. moving away from a comma separator to |, might be a quick fix for that…).

Anyway, I think I have an automation script that works to check all the notebooks in a repo on demand, and another that checks just the notebooks changed in a push.

On the to do list is coping with things like markdown files that load as notebooks using jupytext, as well as tweaking nbval to allow me to exclude certain notebooks.

So, here’s the “check changed notebooks” script, set up in this case to pull on the container we use for the databases course. Note that for some reason the database services don’t seem to autorun, so I need to manually start them. The action is triggered by a push and a job is then run that tests to see whether there are any .ipynb files in the commits and sets a flag to that effect. If there are notebook files in the push, a second job is run that grabs a list of changed notebook filenames. These files are then separately tested using nbval. (It would be easier if we could just pass the list of files we want to test. But the crappy filenames with spaces and punctuation in them that repeatedly cause ALL SORTS OF PROBLEMS IN ALL SORTS OF WAYS, not least for students, would probably cause issues here too…)

name: nbval-partial-test
on:
  push

jobs:
  changes:
    runs-on: ubuntu-latest
    # Set job outputs to values from filter step
    outputs:
      notebooks: ${{ steps.filter.outputs.notebooks }}
    steps:
    # (For pull requests it's not necessary to checkout the code)
    - uses: actions/checkout@v2
    - uses: dorny/paths-filter@v2
      id: filter
      with:
        filters: |
          notebooks:
            - '**.ipynb'


  nbval-partial-demo:
    needs: changes
    if: ${{ needs.changes.outputs.notebooks == 'true' }}
    runs-on: ubuntu-latest
    container:
      image: ouvocl/vce-tm351-monolith
    steps:
    - uses: actions/checkout@master
      with:
        fetch-depth: 0 # or 2?
#        ref: nbval-test-tags
    - id: changed-files
      uses: tj-actions/changed-files@v11.2
      with:
        separator: ','
        files: |
          .ipynb$
    - name: Install nbval (TH edition)
      run: |
        python3 -m pip install --upgrade https://github.com//ouseful-PR/nbval/archive/table-test.zip
        #python3 -m pip install --upgrade git+https://github.com/innovationOUtside/nb_workflow_tools.git
    - name: Restart postgres
      run: |
        sudo service postgresql restart
    - name: Start mongo
      run: |
        sudo mongod --fork --logpath /dev/stdout --dbpath ${MONGO_DB_PATH}
    - name: test changed files
      run: |
        # The read may be redundant...
        # I re-ran this script maybe 20 times trying to get it to work...
        # Having discovered the shell: switch, we may
        # be able to simplify this back to just the IFS setting
        # and a for loop, without the need to set the array
        IFS="," read -a added_modified_files <<< "${{ steps.changed-files.outputs.all_modified_files }}"
        for added_modified_file in "${added_modified_files[@]}"; do
          py.test  --nbval "$added_modified_file" || continue
        done
      # The IFS commands require we're in a bash shell
      # By default, I think the container may drop users into sh
      shell: bash
      continue-on-error: true

At the moment, the output report exists in the Action report window:

The action will also pass even if there are errors detected: removing the continue-on-error: true line will ensure that if there is an error the action will fail.

I should probably also add an automated test to spell check all modified notebooks and at least publish a spelling report.

The other script will check all the notebooks in the repo based on a manual trigger:

name: nbval-test
on:
  workflow_dispatch:
    inputs:
      logLevel:
        description: 'Log level'     
        required: true
        default: 'warning'
      tags:
        description: 'Testing nbval' 

jobs:
  nbval-demo:
    runs-on: ubuntu-latest
    container:
      image: ouvocl/vce-tm351-monolith
    steps:
    - uses: actions/checkout@master
    - name: Install nbval (TH edition)
      run: |
        python3 -m pip install --upgrade https://github.com//ouseful-PR/nbval/archive/table-test.zip
    - name: Restart postgres
      run: |
        sudo service postgresql restart
    - name: Start mongo
      run: |
        sudo mongod --fork --logpath /dev/stdout --dbpath ${MONGO_DB_PATH}
    - name: Test notebooks in notebooks/ path
      run: |
        py.test --nbval ./notebooks/*
      continue-on-error: true

So… with these scripts, we should be able to:

  • test updated notebooks to check they are correctly committed into the repo;
  • manually test all notebooks in a repo, e.g. when we update the environment.

Still to do is some means of checking notebooks that we want to release. This probably needs doing as part of a release process that allows us to:

  • specify which notebook subdirectories are to form the release;
  • get those subdirectories and the reference pre-run notebooks they contain;
  • test those notebooks; note that we could also run additional tests at this point, such as a spell checker;
  • clear the output cells of those notebooks; we could also run other bits of automation here, such as checking that activity answers are collapsed, etc.
  • zip those cleared cell notebooks into a distribution zip file.

But for today, I am sick of Github f****g Actions.

Fragment: Generative Plagiarism and Technologically Enhanced Cheating?

An interesting post on the Paperspace blog – Solving Question-Answering on Tabular Data: A Comparison – briefly reviews several packages to support the generation of SQL queries from natural language questions that might be asked over a particular data table.

I’m interested in this sort of thing for a couple of reasons. Firstly, how might we use this sort of thing as a generative tool for creating queries automatically in various contexts? One rationale for doing this is in authoring support: if I am an educator writing some instructional material in a particular context, or a journalist writing a data-backed story, a simple query-from-natural-language-question support tool might give me a first draft of a query I can use to interrogate a database or data table. (Note that I see this sort of tool as one people work with, not as a tool to replace people. They’ll draft a query for you, but you might then need to edit it to pull it into shape.)

More experimentally, I can imagine playing with this sort of thing in one of my rally data sketches. For example, lots of rally (motorsport) stage reviews are written in quite a formulaic way. One of the things I’ve been doodling with are simple data2text rules for generating stage summaries. But as one route to creating a summary, I could imagine using rules to generate a set of natural language questions, and then letting the text2sql tool generate SQL queries from those natural language questions in order to answer the questions. (Imagine a “machine interviewer” that can have “machine interviews with a dataset”…) This also suggests a possible fact checking tool, looking for statements or questions in a report, automatically generating queries from them, and then trying to automatically answer those queries.

I can also see this generative approach being useful in terms of supported open learning. An important part of the original OU distance education model of supported open learning (SOL) included the idea that print materials should play the role of a tutor at your side. One particular activity type, the SAQ (self-assessment question) included a question and a worked answer that you could refer to once you have attempted the question yourself. (Some questions also included one or more hint conditions that you could optionally refer to to help you get started on a question.) But the answer didn’t just provide a worked answer. It would often also include a discussion of “likely to be provided, but wrong” answers. This was useful to students who had got the question wrong in that way, but also described the hinterland of the question to students who had successfully completed the question (imagine a corollary of a classroom where a teacher asks a question and several incorrect answers are given, and explanations provided as to why they’re wrong, before the correct answer is furnished).

So for example, in Helping Learners Look at Their Code I suggested how we might make use of a tool that generates a flow chart from a simple Python code function (perhaps created by a student themselves) to help students check (their own understanding of) their own work. Another example might be worked solutions to maths derivations, e.g. as described in Show Your Working, Check Your Working, Check the Units and demonstrated by the handcalcs package.

In a similar way, I could imagine students coming up with their own questions to ask of a dataset (this is one of the skills we try to develop, and one of the activity models we use, in our TM351 Data Management and Analysis module) and then checking their work using an automated or generative SOL tool such as a natural-language-to-SQL generator.

Secondly, I can see scope for this sort of generative approach being used for “cheating” in an educational context in the sense of providing a tool that students can use to do some work for them or that they can use as “unauthorised support”. The old calculator maths exams come to mind here. I well remember two sorts of maths exam: the calculator paper, and the non-calculator paper. In the non-calculator paper, a lazy question (to my mind) would be one that could be answered using the support of a calculator, but that the examiner wanted answering manually. (Rather than a better framed question, where the calculator’s role wouldn’t really help or be relevant in demonstrating what the assessor wanted demonstrated.)

I think one way of understanding “cheating” is to see a lot of formal assessment as a game, with certain arbitrary but enforced rules. Infringing the rules is cheating. For example, in golf, you can swing wildly and miss the ball and not count it as a shot, but that’s cheating: because the rules say it was a shot, even if the ball went nowhere. To my mind, some of the best forms of assessment are more about play than a game, providing invitations for a learner to demonstrate what they have learned, defined in such a way that there are no rules, or where the rules are easily stated, and where there may be a clear goal statement. The assessment then takes the form of rewarding how someone attempted to achieve that goal, as well as how satisfactorily the goal was achieved. In this approach, there is no real sense of cheating because it is the act of performance, as much as anything, that is being assessed.

In passing, I note the UK Department for Education’s recent announcement that Essay mills [are] to be banned under plans to reform post-16 education. For those interested in how the law might work, David Kernohan tried to make sense of the draft legislation here. To help those not keeping up at the back, David starts by trying to pin down just how in the legislation these essay mills are defined:

A service of completing all or part of an assignment on behalf of a student where the assignment completed in that way could not reasonably be considered to have been completed personally by the student

Skills and Post-16 Education Bill [HL], Amendments ; via How will new laws on essay mills work?

This then immediately begs the question of what personally means and you rapidly start losing the will to live rather than try to caveat things in a way that makes any useful sense at all. The draft clause suggests that “personally” allows for “permitted assistance” (presumably as defined by the assessor), which takes us back to how the rules of the assessment game are defined.

Which takes me back to non-calculator papers, where certain forms of assistance that were allowed in the calculator paper (i.e. calculators) were not permitted in the non-calculator exam.

And makes me think again of generative tools as routes not just to automated or technologically enabled supported open learning, but also as routes to cribbing, or “unpermitted assistance”. If I set a question in an electronics assessment to analyse a circuit, would I be cheating if I used lcapy and copied its analysis into my assessment script (for example, see this Electronics Worked Example)?

With tools such as Github CoPilot on the verge of offering automated code generation from natural language text, and GPT-3 capable of generating natural language texts that follow on from a starting phrase or paragraph, I wonder if a natural evolution for essay mills is not that they are places where people will write your essay for you, but that they are machines that will (or perhaps they already are?). And would it then become illegal to sell “personal essay generator” applications which you download to your desktop and then use to write your essay for you?

I suspect that copyright law might also become a weapon to use in the arms race against students – that sounds wrong, doesn’t it? That’s where we’re at, or soon will be. And it’s not the fault of the students: it’s the fault of the sucky assessment strategy and sucky forms of assessment – as they upload the course materials they’ve been provided with for a bit of top-up transfer learning on top of GPT3 that will add in some subject matter specifics to the essay generating model. (Hmm, thinks… is this a new way of using text books? Buy them as transfer learning top up packs to tune your essay generator with some source specifics? Will text books and educational materials start including the equivalent of trap streets in maps, non-existent or incorrect elements that are hidden in plain view to trap the unwary copyright infringer or generative plagiarist?!)

And finally, in trying to install one of the tools mentioned in the blog post on query generation around tabular data, I observe one of those other dirty little secrets about the AI-powered future that folk keep talking up: the amount of resource it takes…

Related to this, in hacking together a simple, and unofficial, local environment that students could use for some of the activities on a new machine learning module, I noted that one quite simple activity using the Cifar-10 dataset kept knocking over my test Jupyter environment. The environment was running inside a non-GPU enabled Docker container, resource capped at 2GB of memory (we assume students run low spec machines with 4GB available overall) but that just wasn’t enough: I needed to up the memory available to Docker to 4GB.

Fragment: Structural Testing of Jupyter Notebook Cell Outputs With nbval

In an attempt to try to automate a bit more of our educational notebook testing and release process, I’ve started looking again at nbval [repo]. This package allows you to take a set of run notebooks and then re-run them, comparing the new cell outputs with the original cell outputs.

This allows for the automated testing of notebooks when our distributed code execution environment is updated, letting us check for code that has stopped working for whatever reason, as well as pick up new warning notices, such as deprecation notices.

It strikes me that it would also be useful to generate a report for each notebook that captures the notebook execution time. Which makes me think, is there also a package that profiles notebook execution time on a per cell basis?
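I haven’t found one yet, but a crude per cell timing report can be hacked together by subclassing nbconvert’s ExecutePreprocessor and timing its per cell execution step. A minimal sketch (the notebook filename is just illustrative):

# Crude per cell timing: time each cell as the notebook is executed,
# printing a simple report as we go
import time

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

class TimedExecutePreprocessor(ExecutePreprocessor):
    def preprocess_cell(self, cell, resources, index):
        start = time.time()
        cell, resources = super().preprocess_cell(cell, resources, index)
        if cell.cell_type == "code":
            print(f"Cell {index}: {time.time() - start:.2f}s")
        return cell, resources

nb = nbformat.read("example.ipynb", as_version=4)
TimedExecutePreprocessor(timeout=600).preprocess(nb, {"metadata": {"path": "."}})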

The basis of comparison I’ve been looking at is the string match on each code cell output area and on each code cell stdout (print) area. In several of the notebooks I’m interested in checking in the first instance, we are raising what are essentially false positive errors in certain cases:

  • printed outputs that have a particular form (for example, a printed output at each iteration of a loop) but where the printed content may differ within a line;
  • database queries that return pandas dataframes with a fixed shape but variable content, or Python dictionaries with a particular key structure but variable values;
  • %%timeit queries that return different times each time the cell is run.

For the timing errors, nbval does support the use of regular expressions to rewrite cell output before comparing it. For example:

[regex1]
regex: CPU times: .*
replace: CPU times: CPUTIME

[regex2]
regex: Wall time: .*
replace: Wall time: WALLTIME

[regex3]
regex: .* per loop \(mean ± std. dev. of .* runs, .* loops each\)
replace: TIMEIT_REPORT

In a fork of the nbval repo, I’ve added these as a default sanitisation option, although it strikes me it might also be useful to capture timing reports and then raise an error if the times are significantly different (for example, an order of magnitude difference either way). This would then also start to give us some sort of quality of service test as well.
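The check I have in mind there is nothing more sophisticated than something like the following (illustrative only; this isn’t nbval API):

# Treat two timings as "the same" unless they differ by more than a given
# factor either way (an order of magnitude by default)
def times_comparable(reference_seconds, test_seconds, factor=10):
    return reference_seconds / factor <= test_seconds <= reference_seconds * factor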

For the dataframes, we can grab the dataframe table output from the text/html cell output data element and parse it back into a dataframe using the pandas pd.read_html() function. We can then compare structural elements of the dataframe, such as its size (number of rows and columns) and the column headings. In my hacky code, this behaviour is triggered using an nbval-test-df cell tag:

    def compare_dataframes(self, item, key="data", data_key="text/html"):
        """Test outputs for dataframe comparison. """
        df_test = False
        test_out = ()
        if "nbval-test-df" in self.tags and key in item and data_key in item[key]:
            df = pd.read_html(item[key][data_key])[0]
            df_test = True
            test_out = (df.shape, df.columns.tolist())
        return df_test, data_key, test_out

The error report separately reports on shape and column name mismatches:

    def format_output_compare_df(self, key, left, right):
        """Format a dataframe output comparison for printing"""
        cc = self.colors

        self.comparison_traceback.append(
            cc.OKBLUE
            + "dataframe mismatch from parsed '%s'" % key
            + cc.FAIL)

        size_match = left[0]==right[0]
        cols_match = left[1]==right[1]
        
        if size_match:
            self.comparison_traceback.append(cc.OKGREEN 
                + f"df size match: {size_match} [{left[0]}]" + cc.FAIL)
        else:
            self.comparison_traceback.append("df size mismatch")
            self.fallback_error_report(left[0], right[0])
        
        if cols_match:
            self.comparison_traceback.append(cc.OKGREEN
                + f"df cols match: {cols_match} [{left[1]}]"+ cc.FAIL)
        else:
            self.comparison_traceback.append("df cols mismatch")
            self.fallback_error_report(left[1], right[1])
        self.comparison_traceback.append(cc.ENDC)

In passing, I also extended the reporting for mismatched output fields to highlight what output was either missing or added:

        missing_output_fields = ref_keys - test_keys
        unexpected_output_fields = test_keys - ref_keys

        if missing_output_fields:
            self.comparison_traceback.append(
                cc.FAIL
                + "Missing output fields from running code: %s"
                % (missing_output_fields)
                + '\n'+'\n'.join([f"\t{k}: {reference_outs[k]}" for k in missing_output_fields])
                + cc.ENDC
            )
            return False
        elif unexpected_output_fields:
            self.comparison_traceback.append(
                cc.FAIL
                + "Unexpected output fields from running code: %s"
                % (unexpected_output_fields)
                + '\n'+'\n'.join([f"\t{k}: {testing_outs[k]}" for k in unexpected_output_fields])
                + cc.ENDC
            )
            return False

For printed output, we can grab the stdout cell output element, and run a simple linecount test to check the broad shape of the output is similar, at least in terms of linecount.

    def compare_print_lines(self, item, key="stdout"):
        """Test line count similarity in print output."""
        linecount_test = False
        test_out = None
        if "nbval-test-linecount" in self.tags and key in item:
            test_out = (len(item[key].split("\n")))
            linecount_test = True
        return linecount_test, test_out

The report is currently just a simple “mismatch” error message:

            for ref_out, test_out in zip(ref_values, test_values):
                # Compare the individual values
                if ref_out != test_out:
                    if df_test:
                        self.format_output_compare_df(key, ref_out, test_out)
                    if linecount_test:
                        self.comparison_traceback.append(
                            cc.OKBLUE
                            + "linecount mismatch '%s'" % key
                            + cc.FAIL)
                    if not df_test and not linecount_test:
                        self.format_output_compare(key, ref_out, test_out)
                    return False

I also added support for some convenience tags: nb-variable-output and folium-map both suppress the comparison of outputs of cells in a behaviour that currently models the NBVAL_IGNORE_OUTPUT case, but with added semantics. (My thinking is this should make it easy to improve the test coverage of notebooks as I figure out how to sensibly test different things, rather than just “escaping” problematic false positive cells with the nbval-ignore-output tag.)

PS I just added a couple more tags: nbval-test-listlen allows you to test a list code cell output to check that it is the same length in test and reference notebooks, even as the list content differs; nbval-test-dictkeys allows you to test the (top level) sorted dictionary keys of dictionary output in test and reference notebooks, even as the actual dictionary values differ.
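To give a flavour of what those checks amount to (this is just the general idea, not the actual code in the fork), the list length comparison boils down to something like:

# Compare the lengths of two printed list outputs (the text/plain cell output
# strings from the reference and test notebooks), ignoring the list contents
import ast

def list_lengths_match(reference_text, test_text):
    return len(ast.literal_eval(reference_text)) == len(ast.literal_eval(test_text))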