Another Automation Step: Spell-Checking Jupyter Notebooks

Another simple automation step, this time an attempt at adding spell checking of notebooks.

I really need to find a robust and useful way of doing this. I’ve previously explored using pyspelling, and started tinkering with a tool to generate tidier reports from it; I’ve also found codespell to be both quick and effective.

Anyway, here’s a hacky GitHub Action for spellchecking notebook files changed in the last commit of a push to a GitHub repo (see also this issue regarding getting changed files from all commits in the push that triggered the action, which seems to have been addressed by this PR):

name: spelling-partial-test
on:
  push

jobs:
  changednb:
    runs-on: ubuntu-latest
    # Set job outputs to values from filter step
    outputs:
      notebooks: ${{ steps.filter.outputs.notebooks }}
    steps:
    # (For pull requests it's not necessary to checkout the code)
    - uses: actions/checkout@v2
      with:
        fetch-depth: 0
    - uses: dorny/paths-filter@v2
      id: filter
      with:
        filters: |
          notebooks:
            - '**.ipynb'

  nbval-partial-demo:
    needs: changednb
    if: ${{ needs.changednb.outputs.notebooks == 'true' }}
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@master
      with:
        fetch-depth: 0 # or 2?
        #ref: nbval-test-tags

    - id: changed-files
      uses: tj-actions/changed-files@v11.4
      with:
        since_last_remote_commit: 'true'
        separator: ','
        files: |
          .ipynb$

    - name: Install spelling packages
      run: |
        sudo apt-get update && sudo apt-get install -y aspell aspell-en
        python3 -m pip install --upgrade https://github.com/ouseful-PR/nbval/archive/table-test.zip
        python3 -m pip install --upgrade https://github.com/innovationOUtside/nb_spellchecker/archive/main.zip
        python3 -m pip install --upgrade https://github.com/ouseful-PR/pyspelling/archive/th-ipynb.zip
        python3 -m pip install --upgrade codespell

    - name: Codespell
      # Codespell is a really quick and effective spellchecker
      run: |
        touch codespell.txt
        IFS="," read -a added_modified_files <<< "${{ steps.changed-files.outputs.all_modified_files }}"
        # This only seems to find files from the last commit in the push?
        for added_modified_file in "${added_modified_files[@]}"; do
          codespell "$added_modified_file" | tee -a codespell.txt
        done

    - name: pyspelling test of changed files
      # This runs over changed files one at a time, though we could add multiple -S in one call...
      run: |
        touch typos.txt
        touch .wordlist.txt
        IFS="," read -a added_modified_files <<< "${{ steps.changed-files.outputs.all_modified_files }}"
        # This only seems to find files from the last commit in the push?
        for added_modified_file in "${added_modified_files[@]}"; do
          # || true (rather than || continue) so a Markdown failure doesn't skip the Python check
          pyspelling -c .ipyspell.yml -n Markdown -S "$added_modified_file" | tee -a typos.txt || true
          pyspelling -c .ipyspell.yml -n Python -S "$added_modified_file" | tee -a typos.txt || true
        done
        cat typos.txt
        touch summary_report.txt
        nb_spellchecker reporter -r summary_report.txt
        cat summary_report.txt
      shell: bash
      # We could let the action fail on errors
      continue-on-error: true

    - name: Upload partial typos
      # Create a downloadable bundle of zipped typo reports
      uses: actions/upload-artifact@v2
      with:
        name: typos
        path: |
         codespell.txt
         typos.txt
         summary_report.txt
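
The two check steps share one pattern: split the comma-separated list that tj-actions/changed-files returns in all_modified_files, then visit each file once. Pulled out as a standalone bash function, that pattern can be tried locally before pushing. This is a minimal sketch: the function name list_changed_notebooks is hypothetical, and the extra .ipynb filter inside the loop is an assumption (the workflow relies on the action's files: pattern instead).

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a comma-separated list of changed files
# (as produced with separator: ',') and print one notebook path per line.
list_changed_notebooks() {
  local csv="$1"
  local -a files=()
  # Split on commas into an array, as the workflow's run steps do
  IFS="," read -r -a files <<< "$csv"
  local f
  for f in "${files[@]}"; do
    # Assumption: keep only .ipynb files (blank entries never match)
    [[ "$f" == *.ipynb ]] && printf '%s\n' "$f"
  done
  # Don't let a final non-matching entry become the function's exit status
  return 0
}
```

Piping its output into a while IFS= read -r nb loop gives one notebook path per iteration, so each checker is called once per changed notebook.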

Typos are displayed inline in the action run log.

A zipped file of the spellcheck reports is also available for download as a run artifact.
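
Once that bundle has been downloaded and unzipped, codespell.txt holds one finding per line, which makes a per-notebook tally easy to produce. A minimal sketch, assuming codespell's usual path:line: typo ==> suggestion output format; summarise_codespell is a hypothetical name, and paths that themselves contain colons would be split incorrectly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise a codespell log (lines like
# "path/to/file.ipynb:12: teh ==> the") as a count of findings per file.
summarise_codespell() {
  local log="${1:-codespell.txt}"
  # Split each line on ":" so $1 is the file path; only count lines
  # that look like codespell findings (they contain " ==> ").
  awk -F: '/ ==> / { counts[$1]++ }
           END { for (f in counts) printf "%s\t%d\n", f, counts[f] }' "$log" | sort
}
```

Calling summarise_codespell codespell.txt prints each notebook path and its finding count, tab-separated, sorted by path.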

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
