Fragment: Author Support Tools – Markdown in VS Code

In passing, some extensions I’m using to support authoring markdown in VS Code:

See also: VS Code as an Integrated, Extensible Authoring Environment for Rich Media Asset Creation and the (should now be deprecated?) Fragment: More Typo Checking for Jupyter Notebooks — Repeated Words and Grammar Checking.

“Save and Reveal” Discussion Activities in Moodle VLEs and Jupyter Notebooks

Having a quick peek at some of the materials that have been produced for a new OU course that I’ve had nothing to do with, I notice a new-to-me VLE interaction style in the form of a FreeResponse text box…

The activity design provides some set up, encourages the learner to make (and record) a free text response, and (once some text has been entered into the free response area) displays a sample discussion.

In the underlying markup, the structure is defined as an activity embedding a question, an interaction and a discussion.

<Activity>
  <Heading></Heading>
  <Timing></Timing>
  <Question>
     <Paragraph>In your own words, write brief definitions of the following terms that you’ve seen so far.</Paragraph>
     <BulletedList>
       <ListItem>supervised learning</ListItem>
       <ListItem>unsupervised learning</ListItem>
     </BulletedList>
  </Question>
  <Interaction>
    <FreeResponse size="formatted" id="act_x"/>
  </Interaction>
  <Discussion>
    <Paragraph>Your descriptions will, of course, differ from mine. That’s OK, so long as you’ve understood the key idea behind each term.</Paragraph>
     <BulletedList>
       <ListItem>Supervised learning: learning where the training data is labelled with the ‘correct answer’, such as the correct classification or final result.</ListItem>
      <ListItem>Unsupervised learning: learning where the training data has no labels so the learning system has to make its own discoveries about what the data means.</ListItem>
    </BulletedList>
  </Discussion>
</Activity>

That, then, is what the underlying OU-XML looks like.

In the materials I produced for a module update last year, I’d made use of a related pattern we’d started exploring in the data management module notebooks: explicit calls to action that get students to engage in note taking and reflection at certain points in the notebook.

My actual aim is to get students to take full ownership of the notebook materials so that they are more likely to annotate them at will; but for a first-year equivalent module at least, or a module where students are exposed to Jupyter notebooks for the first time, we need to coach learners into feeling that they can, and should, make their own inline notes when they feel it is useful to do so…

Here’s an example of a simple callout suggesting the student makes some form of commentary. This is done outside of a “formal” activity and is presented inline, almost as a prompt to make a marginal comment:

Structurally, the coloured section is rendered using the nb_extension_empinken Jupyter notebook extension based on a cell tag:

For other examples of using the nb_extension_empinken extension, see “Try it and See” Interactive Learning Activities in Jupyter Notebooks.

Calls to action for making comments as part of an activity are also used, supported by a hidden example discussion in an activity design that matches the one used in the VLE FreeResponse activity:

The design pattern is actually a very old one that’s been used in OU materials for decades. I first saw it explicitly referenced in the OU materials bible, the SOL (“Supported Open Learning”) guide, along with the notion of worked examples as if voiced by a tutor at your side.

The markup for the call to action in my embedded activity is not a million miles away from the markup used in OU-XML, as can be seen when the Jupyter notebook is saved in the myst-nb format using Jupytext.
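As an illustrative sketch (the tag names here are placeholders rather than the exact tags the extension uses), a pair of tagged markdown cells in that format looks something like this:

+++ {"tags": ["style-activity"]}

*In your own words, write brief definitions of the terms you have met so far.*

+++ {"tags": ["style-solution"]}

*Discussion: your definitions will differ from mine; that’s fine, so long as the key idea behind each term is there.*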

Jupyter notebook tags for certain style markup can also be transformed into myst-nb admonitions that can be rendered natively in a Jupyter Book output. See for example the tags2myst utility in ou-jupyter-book-tools.
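For example, and as a sketch rather than the exact output of tags2myst, an activity-tagged cell might map onto a MyST admonition directive along these lines:

```{admonition} Activity
:class: activity
In your own words, write brief definitions of the terms you have met so far.
```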

Generating this sort of markup from the OU-XML should be easy enough (e.g. using an XSLT transformation), but the reverse may be trickier when it comes to ensuring that all the cells associated with a single activity are hierarchically grouped as part of that activity. (Applying an activity ID to each cell as a tag or other metadata would be one way of solving this. But an issue still arises as to how the grouping should be managed within the notebook UI: selecting all the cells associated with an activity and then clicking a toolbar button to group them would be one solution, but a method would still be needed, perhaps using some optional styling, to indicate that contiguous cells are all part of the same group. I’m not familiar enough with CSS to know if or how that could easily be done using the notebook HTML structure.)
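As a rough sketch of the first part of that (the activity ID tagging, rather than the UI grouping), something along the following lines could stamp an activity identifier onto a contiguous run of cells using nbformat; the file name, tag value and cell indices are all placeholders:

import nbformat

def tag_activity(path, activity_id, first_cell, last_cell):
    """Add an activity ID tag to a contiguous run of cells in a notebook."""
    nb = nbformat.read(path, as_version=4)
    for cell in nb.cells[first_cell:last_cell + 1]:
        tags = cell.metadata.setdefault("tags", [])
        if activity_id not in tags:
            tags.append(activity_id)
    nbformat.write(nb, path)

# For example, tag cells 3..6 as belonging to a (made up) activity
tag_activity("example.ipynb", "activity-act_x", 3, 6)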

PS This has got me wondering whether I should put together a document that demonstrates possible mappings from all OU-XML tags, e.g. as described in the OU-XML Structured Content Tag Guide (OU staff only). It might be useful if I also mined existing OU-XML documents (e.g. as per Reusing Educational Assets) to see if I could pull out various activity designs (such as the question–FreeResponse–discussion pattern) and demonstrate how those might appear in a Jupyter notebook editor, Jupyter Book HTML output, or myst-md “source” document.

Reusing Educational Assets

So it seems that there’s a new internal project running over the next few weeks scoping out how a metadata approach might allow us to exploit our teaching content as a corpus of assets we can search and reuse in order to save time and money in production. Or something.

When OU-XML was first mooted as a gold master document standard for representing course materials, I naturally thought we might use it as the basis of a searchable repository and did some early demos of searching content scraped from OpenLearn OU-XML docs, revisiting related ideas several times over the years: a searchable image gallery, a meta-glossary (scraped from glossary items in all OpenLearn materials), a directory of learning objectives and the OpenLearn units they related to, and so on. I also did various mindmap style navigational surfaces for navigating a single module, or dynamically created to link to related items from multiple units. All the obvious stuff. More recently, I explored converting OU-XML to markdown, and even started sketching out (hard to use!) automated republishing workflows (for more recent thinking on authoring environments, see for example here). (It looks like there may be strike days soon again, so that may provide an opportunity to revisit that stuff…) Anyway, no one thought enough of any of that stuff to think it might be usefully innovative, so it was all just a(nother) OUsefulless invention.

So here we are again…

Another kick off meeting…

Anyway, quick impressions off the back of it.

OU materials are often narrative based, and items can be tightly embedded in a strong narrative. This can make literal reuse (reuse without modification) really hard. It really is often easier to rewrite stuff from scratch. Another issue that is easily forgotten if you are trying to reuse text is voice matching. Whilst it is possible to write in a voice that removes all sense of who the author is, I prefer to write course materials using a particular conversational voice. Other authors have their own voices. So literally reusing content that someone else has written might be jarring to the reader.

So a question I keep coming back to, time after time after time after time after time, is the question of what assets are actually reusable?

Here’s my current take on some of the things that are usefully reusable: often very limited in scope, but things that take time to get right. Which is to say, they are very granular but they take a long time to produce. It can be really quick and easy to generate 500 words of blah text, but it can be really difficult and time consuming to get a figure or figure description right; it can be really fiddly marking up a complicated equation or getting the appropriate steps of a proof in place; and so on.

So when it comes to reuse, what are the granular things that take time to produce, that can be hard to get right, and that someone may have already done? And what other spin-offs or benefits might there be from being able to reuse the asset?

  • learning outcomes (hard to get right);
  • glossary items (fiddly to write);
  • images / figures (often require an artist)
    • already rights cleared
    • may be generic
    • already annotated with figure descriptions (hard to write well)
  • equations (Maths, Chemistry)
    • already marked up eg using LaTeX (may be hard to get right)
    • proofs and derivations (often tricky to sequence to support learning)
  • activities and exercises (may be hard to make interesting and relevant)
    • activity statement
    • example solution and commentary
    • completion time (hard to estimate)
    • (assets associated with activity/exercise)
    • (may be linked to particular learning objective)
  • SAQs, interactive questions
    • statement
    • example solution
  • good quality linked web resources (hard to find; need maintaining in the sense of checking over time)
  • readings / Library items (hard to discover)
    • reading time (needs estimating)
    • Library availability (time/cost)
    • rights clearance
  • datasets (often hard to find ones that tick all the boxes)
    • simple to understand
    • good basis for an example
    • (rights cleared)
    • examples of use
    • (usefully dirty)
  • higher level design patterns (learning design)
    • structured sequences (text, reading, activity)
  • animations and screencasts
    • storyboard
    • transcript
  • audio and video material (expensive to produce)
    • transcripts

As well as these atomic items, there might be paired assets where each component takes time to produce and they also need to be compatible:

  • figure + equation
  • data + figure / chart

So those are some of the things it may be useful to discover and potentially reuse.

OU-XML is structured and allows us to create concordances (metadata) to help us discover or search through a lot of these things. For a quick example, see this interactive Jupyter notebook demo (or this static demo) of scraping images and figure descriptions from an OpenLearn OU-XML document to support search (click the Open demo directly button).
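The sort of scrape involved is simple enough. As a rough sketch (the element names here are from memory and may not exactly match the OU-XML schema), something like the following pulls out figure references along with their captions and descriptions:

from lxml import etree

def scrape_figures(path):
    """Pull image references, captions and descriptions from an OU-XML document."""
    tree = etree.parse(path)
    records = []
    for fig in tree.findall(".//Figure"):
        img = fig.find(".//Image")
        records.append({
            "src": img.get("src") if img is not None else None,
            "caption": fig.findtext(".//Caption", default=""),
            "description": fig.findtext(".//Description", default=""),
        })
    return records

# e.g. build a simple searchable table of figure descriptions
# import pandas as pd
# df = pd.DataFrame(scrape_figures("openlearn_unit.xml"))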

However, OU-XML tagging is often used inconsistently and may not be complete, so the metadata or structured items we can (could) extract may not be as high quality as we might hope. Note that it may be possible to automate / bootstrap some quality improvement, but manual annotation/retagging of legacy materials may also be required to get the most benefit from them in terms of discoverability, reuse etc.

On the question of reuse, direct reuse will often be difficult if not impossible because things may not be exactly suited to the purpose required. In the case of instructional software screencasts, software updates may invalidate the video component, but the shooting script or transcript might be reusable. However, reuse-by/with-modification may offer huge efficiencies in terms of production. But there’s a but. In order to support reuse-with-modification, or “derived re-production”, we ideally need an “editable” form of the asset. We might discover an image by searching through caption text, and may like what we see, almost; but if we need a tweak to the image, how easy is that to achieve? If it requires redrawing from scratch, that will take more time than opening up a source file and changing a text label. If an image is a statistical chart derived from data, or an equation, and we want to tweak a small part of it, do we have access to the equation, or the original data and chart plotting script, for example?

So as well as finding the “finished” published asset, we also need to find the “source” of the asset. Which is one reason I like “generative” production so much, where our gold master document also includes the means of production of the assets that are rendered in the final published document. For examples of what I mean by generative production, see for example Subject matter Authoring Using Jupyter Notebooks.

F**k You WordPress

Two hours spent on a blog post on a day where I am now really rushed, and you don’t back up the content since I started editing, and you throw all the content away and just post the title.

F**k you. Pages filled and filled with crappy Javascript, an editing environment that sucks with Javascript powered interaction, and you can’t do the simplest thing of saving the content I (thought I’d) posted.

F**K YOU.

(Some Scraggy Notes) Pondering Jupyter Book Interactive Code Deployments

This was originally a considered blog post, but the pile of crap that is WordPress threw it all away so now it’s just notes. F**k you, WordPress.

Jupyter Book is cool. And it’s useful. It does good stuff. I spent an hour writing about it and WordPress threw it away. F**k you WordPress.

I also commented on my workflow (early days: ou-jupyter-book-tools) where I process tagged Jupyter notebooks to generate Myst files that Jupyter Book can render in very rich ways, but WordPress threw it away. F**k you, WordPress.

Jupyter Book would be more useful for me if it let you execute code locally. I wrote several considered reasons why that’s really useful for edu purposes, not least because of the range of book types you can produce (eg books with generated assets but the code removed from final output, how such production workflows are good for quality and maintenance etc.) but WordPress threw it away. F**k you, WordPress.

At the heart of my local Jupyter Book deployment, I see the Jupyter Book being generated from a notebook deployment using a jupyter-server-proxy wrapper (jupyter-book-server-proxy). I explained the thinking and rationale behind this (the reduced dependency on trying to get anyone else to run a Jupyter server for you (which is a massive blocker in education), other than one you might already have access to), but WordPress threw it away. F**k you, WordPress.

One way to hook into a local server is just to have a link from the book page to the notebook running in the local server. In the dozens of executablebooks repos (I was more polite in the original that WordPress threw away (f**k you, WordPress) but I am ratty as hell right now) I can’t find where or how to make the change that would let me specify a simple _config.yml declaration to add an “interactive computing” link, so what I do is set up Jupyter Book to add JupyterHub links and then string replace the generated files:

<!--
      <a class="jupyterhub-button" href="localhost:8351/notebooks/notebooks/hub/user-redirect/git-pull?repo=http://github.com/crap/path&urlpath=tree/path/notebooks/notebooks/Part 01 Notebooks/01.3 Basic python data structures.ipynb&branch=master"><button type="button"
                class="btn btn-secondary topbarbtn" title="Launch JupyterHub" data-toggle="tooltip"
                data-placement="left"><img class="jupyterhub-button-logo"
                    src="../_static/images/logo_jupyterhub.svg"
                    alt="Interact on JupyterHub">JupyterHub</button></a>
-->

<a class="jupyterhub-button" href="http://localhost:8351/notebooks/notebooks/Part 01 Notebooks/01.1 Getting started with IPython and Jupyter Notebooks - Bootcamp.ipynb"><button type="button"
                class="btn btn-secondary topbarbtn" title="Open local notebook" data-toggle="tooltip"
                data-placement="left"><img class="jupyterhub-button-logo"
                    src="../_static/images/logo_jupyterhub.svg"
                    alt="Interact on local notebook server">localhost</button></a>

Via @choldgraf, here’s a way in to the launch button templates.

Another way of using Jupyter Book is to execute code inline via a remote MyBinder server using ThebeLab. I wrote various bits about how that can work in edu contexts, before exploring what it means to be able to do that with a purely local server (rather than a remotely launched MyBinder server), but WordPress threw it away. F**k you, WordPress.

My current hack for running the proxied Jupyter Book is to replace the Jupyter Book created ThebeLab settings with local settings:

<script type="text/x-thebe-config">
  {
    requestKernel: true,
    kernelOptions: {
      name: "python3",
      serverSettings: {
        "baseUrl": "http://localhost:8351",
        "token": "letmein"
      }
    },
  }
  </script>
  <script>kernelName = 'python3'</script>

This works, but you get legacy cruft, like the announcement that the kernel has been launched using MyBinder; that text is in the ThebeLab js, not the Jupyter Book HTML, so updating it would probably require a tweak to the ThebeLab code and _config.yml settings. It’s also lacking an explicit start directory, but I haven’t checked to see if (or how) the local server route can accept a start directory path; from a quick test using the Binder start directory settings (setting kernelOptions.path), the setting didn’t seem to be picked up by the local server launch.

I did explore what sorts of settings would be handy a bit more in the original post, but WordPress threw it away. F**k you, WordPress.

I tried to explore some of the security issues around using a local server as the code execution environment (albeit given my limited understanding of security stuff) in the two contexts I am interested in: a containerised environment running locally on a student’s own computer with a simple default token (letmein), where I don’t think there are any real issues, and a hosted environment running the same container behind an authed JupyterHub entrypoint, where it’s a bit trickier. I tried to explore how the jupyter-book-server-proxy might be able to pick up a token from the server, and pondered how that token might be looped from the (authed) Jupyter Book via a public http request back to the authed notebook server, but probably just confused the issue. Anyway, WordPress threw it away. F**k you, WordPress.

I finally reviewed what I think is a really exciting prospect: running Jupyter Book with a JupyterLite code execution environment purely in a web browser. I set this up by considering how Jupyter Book currently requires a Jupyter server as well as a web server, or a Jupyter server and a jupyter-server-proxied web book, but the pure web-server-only play (Jupyter Book Lite, or JupyterLite Book) seems really useful in open education, particularly if you struggle to get tech support. (I did review things like the need for a recent browser, a reasonably powerful machine and the overhead involved in the initial download, but WordPress threw it away. F**k you, WordPress.) I noted my lack of understanding about the plumbing around Pyolite/WASM whilst describing how JupyterLite has already demonstrated a JupyterLab and RetroLab environment running with a full Python (or Javascript, or P5) kernel purely in the browser, and idly wondered about how useful it would be to have something like a ThebeLite or ThebeLabLite version of, or extension to, ThebeLab, but WordPress threw it away. F**k you, WordPress.

Anyway, rather than that considered post, there’s this scraggy notey one. F**k you, WordPress.

PS here’s a minimal demo generated from our Data Management and Analysis module notebooks…

And here’s an example script for hacking a local installation together [gist]:


Using Dummy Python Packages to Distribute Course Code Requirements

One of the approaches I started exploring some time ago for distributing required Python packages to support particular courses (modules) was to create a dummy Python package containing just a list of required packages and push it to PyPi. Installing the single dummy package then installs all the actually required packages. (The dummy package could also include environmental checks, perhaps even some simple module documentation etc. But for now, I’ve been keeping it simple.)

The setup I’m using bundles the requirements into a separate file, rather than explicitly listing them. The generic package source tree looks something like this:

.
├── LICENSE
├── MANIFEST.in
├── README.md
├── requirements.txt
├── ou_MODULECODE_py
│   └── __init__.py
└── setup.py

The MANIFEST.in file is required and ensures that the requirements.txt file is bundled into the package that is uploaded to PyPi. If the MANIFEST.in file is omitted, the package install will work from Github (because the requirements.txt will form part of the cloned download), but not from PyPi (because the requirements file won’t be uploaded to PyPi). (A safer route to installing requirements would be to list them explicitly within the setup.py file. And perhaps not trap for the missing requirements file… Or at least, raise a warning if an anticipated requirements.txt file is not available.)
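In this case, the MANIFEST.in just needs to name the requirements files (the extras files mirror the ones referenced from setup.py below):

include requirements.txt
include requirements_production.txt
include requirements_AL.txt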

The __init__.py file can be empty, but I’ve started exploring simple test scripts within it:

def about():
    """Provide a simple description of the package."""
    # Assumes __version__ is defined elsewhere in this __init__.py
    msg = f"""
# ===== ou_tm351_py, version: {__version__} =====
The `ou_tm351_py` package is an "empty" package that installs Python package requirements 
for the Open University module "Data management and analysis (TM351)" [http://www.open.ac.uk/courses/modules/tm351].
You can test that key required packages are installed by running the command: ou_tm351_py.test_install()
    """
    print(msg)


def test_install(key_packages=None):
    """Test the install of key packages."""
    import importlib

    if key_packages is None:
        key_packages = [
            "pandas",
            "schemadisplay_magic"
        ]
    for p in key_packages:
        try:
            importlib.import_module(p.strip())
            print(f"{p} loaded correctly")
        except ImportError:
            print(f"{p} appears to be missing")

A more elaborate test file could attempt to check that all the requirements are installed. We could also bundle simple command line commands, or even a simple webserver serving module help files / documentation.
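As a rough sketch (not the actual package code), a fuller check might use importlib.metadata (Python 3.8+) to test declared distributions rather than import names:

from importlib import metadata

def test_requirements(packages):
    """Report the installed version of each distribution, or flag it as missing."""
    for p in packages:
        try:
            print(f"{p}=={metadata.version(p)} is installed")
        except metadata.PackageNotFoundError:
            print(f"{p} appears to be missing")

# e.g.
test_requirements(["pandas", "numpy"])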

The setup.py file is largely boilerplate:

from setuptools import setup
from os import path

def get_requirements(fn='requirements.txt'):
    """Get requirements from a requirements file, ignoring blank and comment lines."""
    if path.exists(fn):
        with open(fn, 'r') as f:
            requirements = [r.split()[0].strip() for r in f.read().splitlines() if r and not r.startswith('#')]
    else:
        requirements = []

    return requirements

requirements = get_requirements()

print(f'Requirements: {requirements}')

# Any extra requirements will need bundling via MANIFEST.in
extras = {
    'production': get_requirements('requirements_production.txt'),
    'AL': get_requirements('requirements_AL.txt')
    }
    
setup(
    # Meta
    author='Tony Hirst',
    author_email='tony.hirst@open.ac.uk',
    description='Python package installation for OU module MODULECODE',
    name='ou-MODULECODE-py',
    license='MIT',
    url='https://github.com/innovationOUtside/ou-MODULECODE-py',
    version='0.0.1',
    packages=['ou_MODULECODE_py'],

    # Dependencies
    install_requires=requirements,
    #setup_requires=[],
    extras_require=extras,

    # Packaging
    #entry_points="",
    include_package_data=True,
    zip_safe=False,

    # Classifiers
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Environment :: Web Environment',
        'Intended Audience :: Education',
        'License :: Free For Educational Use',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
        'Programming Language :: Python :: 3.8',
        'Topic :: Education',
        'Topic :: Scientific/Engineering :: Visualization'
    ],
)

Installing the package is then a command along the lines of pip install ou-MODULECODE-py.
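Once installed, a student (or a build script) can then sanity check the environment using the functions defined above; for example, with the (placeholder) module code filled in:

import ou_MODULECODE_py as module_env

module_env.about()          # print a short description of the package
module_env.test_install()   # check that key required packages import correctly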

Sick of Being Stuck Trying to Do Things That Should Be Really Simple With Docker

The set up: trying to cross-build a large Docker container.

Something like this seems to work with Github Actions:

docker buildx build --platform linux/amd64,linux/arm64 -t ouvocl/vce-tm351-monolith:SOME-HASH --push ./tm351-monolith

docker buildx build --platform linux/amd64,linux/arm64 -t ouvocl/vce-tm351-monolith:latest --push ./tm351-monolith

only it doesn’t, because trying to cross-build several images at the same time seems to break the action (out of space?). With smaller images everything is fine and the second build appears to just use the cached layers from the first build.

Trying to run the above commands locally doesn’t work: caching doesn’t seem to apply, so the build is executed twice, which takes ages. Also, if the built layers are different, the push takes double the time it should compared to just pushing layers that had already been pushed. And the images aren’t necessarily identical, which they should be.

When pushing to Dockerhub, a manifest keeps track of tags and platforms. But I can’t find docs that would give me a simple recipe for reliably:

  • cross-build images for two or more platforms;
  • tag and push images for each platform to Dockerhub using e.g. a Github hash tag as well as a latest tag;
  • update the image for one platform and push that with a new Github hash tag and new latest tag without knackering the manifest.

I am sick to f****g death of trying to build images and containers and do the simplest of related image management tasks.

Docker Crossbuilds: Platform Sensitive Dockerfile Commands

Whilst it’s true to a certain extent that Docker containers “run anywhere”, that “anywhere” comes with a proviso: Docker images are built for particular operating system architectures. A wide variety of machines run amd64 processor instruction sets, but other devices, such as the new Mac M1 processor based machines, are arm64 devices; and Raspberry Pis can run as either arm64 or arm/v7 (32 bit).

A single Dockerfile can be used to build images for each architecture using build commands of the form:

docker buildx build --platform linux/amd64,linux/arm/v7,linux/arm64 .

In some cases, you may need to modify the Dockerfile to perform slightly different actions based on the architecture type. Several arguments are available in the Dockerfile (brought into scope by declaring them using ARG) that allow you to modify behaviour based on the architecture:

  • TARGETPLATFORM – platform of the build result (for example, linux/amd64, linux/arm64, linux/arm/v7, windows/amd64)
  • TARGETOS – OS component of TARGETPLATFORM (for example, linux)
  • TARGETARCH – architecture component of TARGETPLATFORM (for example, amd64, arm64)
  • TARGETVARIANT – variant component of TARGETPLATFORM (for example, v7)

In a Dockerfile, we might then have something like:

ARG TARGETPLATFORM

...

# RUN command for specific target platforms
RUN if [ "$TARGETPLATFORM" = "linux/amd64" ] || [ "$TARGETPLATFORM" = "linux/arm64" ] ; \
    then MPLBACKEND=Agg python3 -c "import matplotlib.pyplot" &>/dev/null ; fi

For more discussion and examples, see WIP: Docker --platform translation example for TARGETPLATFORM.

Draft: Glossary of Jupyter Production Workflow Terms

Jupyter: an open source community project focused on the development of the Jupyter ecosystem (tools and architectures for the deployment of arbitrary executable code environments and reproducible "computational essay" documents). The name was coined from the original three programming languages supported by the IPython notebook architecture, which was subsumed into the Jupyter project as Jupyter Notebooks: Julia, Python and R.

Jupyter Notebooks: variously: a browser based interactive Jupyter notebook; a textual document format (.ipynb); and (less frequently) the single user Jupyter notebook server. In the first, most commonly used, sense, the Jupyter notebook is a browser based application within which users can edit, render and save markdown (text rendered as HTML), edit code in a wide variety of languages (including but not limited to Python, Javascript, R, Java, C++, SQL), execute the code on a code server and then return and display the response/code outputs in the interactive notebook. The Jupyter notebook document format is a text (JSON) document format that can embed the markdown text, code and code outputs. The cell based structure of the notebook format supports the use of metadata "tags" to annotate cells, which can then be used to provide extension-supported styling of individual cells (for example, colouring "activity" tagged cells with a blue background to distinguish them from the rest of the content) or to modify cell behaviour in other ways.

JupyterHub: JupyterHub is a multi-user server providing authentication, access to persistent user storage, and a multi-user experience. Logged in users can be presented with a range of available environments associated with their user account. The JupyterHub server is responsible for launching individual notebook servers on demand and providing tools for users to manage their environment as well as tools for administrators to manage all users registered on the hub. JupyterHub can launch environments using remote cloud-hosted servers in an elastic (on-demand and responsive) way.

Jupyter server: a Jupyter server or Jupyter notebook server is a server that connects a Jupyter served computational environment to a Jupyter client (for example, the Jupyter notebook or JupyterLab user interface or the VS Code IDE).

Jupyter kernel: a Jupyter kernel is a code execution environment managed by Jupyter protocols that can execute code requests from a Jupyter notebook environment or IDE and return a code output to the notebook. Jupyter kernels are available for a wide variety of programming languages.

Integrated Development Environment / IDE: a software application providing code editing and debugging tools. IDEs such as Microsoft’s VS Code also provide support for editing and previewing markdown content (as well as generated content, such as VS Code as an Integrated, Extensible Authoring Environment for Rich Media Asset Creation) and showing differences between file versions (see for example Sensible Diff-ing of Jupyter Notebook ipynb Documents Using VS Code).

BinderHub: BinderHub is an on-demand server capable of building and launching temporary / ephemeral environments constructed from configuration files and content contained in an online repository (eg Github or a DOI accessed repository). By default, BinderHub will build a Jupyter notebook environment with preinstalled packages defined as requirements in a specified Github repository and populated with notebooks contained in the repository.

MyBinder: MyBinder is a freely available community service that launches temporary/ephemeral interactive environments from public repositories using donated cloud server resources.

ipywidgets: the ipywidgets Python package provides a set of interactive HTML widgets that can synchronise settings across interactive Javascript applications that are rendered in a web browser with the state of Python programmes running inside a Jupyter computational environment. ipywidgets also provides a toolkit for easily generating end user application interfaces / widgets inside a Jupyter notebook that can interact with Python programme code also defined in the same notebook.

Core package: for the purposes of this document, a core package is one that is managed under the official jupyter namespace under the Jupyter project governance process.

Contributed package: for the purposes of this document, a contributed package is one that is maintained outside of the official Jupyter project namespace and governance process by independent contributors but complements or extends the core Jupyter packages. Many "official" (which is to say core) packages started life as contributed packages.

Jupytext: Jupytext is a contributed package that supports the conversion of Jupyter notebook .ipynb files to/from other text representations (structured markdown files, Python or Rmd (R markdown) code files). A server extension allows markdown and code documents opened from within a Jupyter environment to be edited within the Jupyter environment. Jupytext can also synchronise multiple formats of the same notebook, such as an .ipynb notebook document with populated code output cells and a simple markdown document that represents just the markdown and code input cells.

JupyterLite: JupyterLite is a contributed package that removes the need for a separately hosted Jupyter server. Instead, a simple web server can deploy a JupyterLite distribution which provides a JupyterLab or RetroLab user environment that can execute code against a computational environment that runs purely in the web page/web browser using a WASM compiled Jupyter kernel. With JupyterLite, the user can run a Jupyter environment without the need to install any software other than a web browser and without the need to have a web connection once the environment is loaded in the browser.

Github: Github is an online collaborative development environment owned and operated by Microsoft. Online code repositories provide version controlled file archives that can be accessed individually or by multiple team members. As well as providing a git managed repository with all that involves (the ability to inspect different versions of checked in files, the ability to manage various code branches, management tools for accepting pull requests), Github also provides a wide range of project management and coordination tools: project boards, issue management, discussion forums, code commit comments, wikis, automation.

git: git is a version control system for tracking changes over separate file "commits" (i.e. saved versions of a file). Originally designed as a command line tool, several graphical UI applications (for example, Github Desktop and Sourcetree) and IDE extensions make it easier to manage git environments locally as well as synchronising local code repositories with online code repositories. Many IDEs also integrate git support natively (VS Code, RStudio) as well as providing extended support through additional extensions (for example, the VS Code GitLens extension). Notably, the VS Code environment provides a rich differencing display for Jupyter notebooks.

ThebeLab: ThebeLab is a contributed package that provides a set of Javascript functions that support remote code execution from an HTML web page. Using ThebeLab, code contained in HTML code cells can be edited and executed against a remote Jupyter kernel that is either hosted by a Jupyter notebook server or launched responsively via MyBinder or another BinderHub server.

Jupyter Book: Jupyter Book is a contributed technique for generating an interactive HTML style textbook from a set of markdown documents or Jupyter notebooks using the Sphinx document processing toolchain. Documents can also be rendered into other formats such as e-book formats or PDF. Notebooks can be executed to include code outputs or rendered without code execution. Notebook cell tags can be used to hide (or remove) unwanted code cell inputs or outputs as well as styling particular cells. Inline interactive code execution is also possible using ThebeLab, although in-browser code execution using JupyterLite is not supported. Interactive notebooks can also be launched from Jupyter Books using MyBinder or opened directly in a linked Jupyter notebook server environment. Jupyter Book builds on several community contributed tools managed as part of the Executable Books project for rendering rich and comprehensively styled content from source markdown and notebook documents. Jupyter Book represents the closest thing to an official, rich publication route from notebook content.

Sphinx: Sphinx is a publishing toolchain originally created to support the generation of Python code documentation. Sphinx can render documents in a wide variety of formats including HTML, ebooks, LaTeX and PDF. A wide range of plugins and extensions exists to support formatting and structuring of documentation, including the generation of tables of contents, managing references, handling code syntax highlighting and providing code copying tools.

nbsphinx: nbsphinx is a contributed Sphinx extension for parsing and executing Jupyter notebook .ipynb files. nbsphinx thus represents a simple publishing extension to Sphinx for rendering Jupyter notebooks, compared to Jupyter Book, which provides a complete framework for publishing rich interactive content as part of a Jupyter workflow.

Docker: Docker is a containerisation technology used to deploy virtualised environments on a user’s own computer or via a remote server. A JupyterHub server can be used to manage the deployment of Docker environments running individual Jupyter user environments on remote, scaleable servers.

Docker image / Docker container image: a Docker virtual machine environment is downloaded as an image file. An actual instance of a Docker virtual machine environment is generated from a Docker image. Public Docker images are hosted in a Docker registry such as DockerHub from where they can be downloaded by a Docker client.

Docker container: a Docker container is an instantiated version of a Docker image. A Docker container can be used to deploy a Jupyter notebook server and the Jupyter environments exposed by the server. Just like a "real" computer, Docker containers can also be hibernated / resumed or restarted. A pristine version of the environment can be created by destroying a container and then creating a brand new one from the original Docker container image.

Dockerhub: DockerHub is a hosted Docker image registry that hosts public Docker images that can be downloaded and used by Docker applications running locally or on a cloud server. Github also publish a Docker container registry. In addition, organisations and individuals can self-host a registry. Private image registries are also possible that only allow authenticated users or clients to search for and download particular images.

Python: Python is a general purpose programming language that is widely used in OU modules. A Python environment can be distributed via the Anaconda scientific Python distribution or inside a Docker container.

Anaconda: Anaconda is a scientific Python distribution that bundles the basic Python environment with a wide range of preinstalled scientific Python packages. In many instances, the Anaconda distribution will include all the packages required in order to perform a set of required scientific computing tasks. Anaconda can be installed directly onto the user’s desktop or used inside a Docker container to provide a Python environment inside such a virtualised environment. The appropriateness of using Anaconda as a distribution environment in a distance education context is contested.

IPython: IPython (interactive Python) provides an interactive "REPL" (read, evaluate, print, loop) environment for supporting interactive code execution and code output display. In a Python based Jupyter environment, it is actually IPython that supports the interactive code execution.

R: R is a programming language designed to support statistical analysis and the creation of high quality, data driven scientific charts and graphs. R is used in several OU modules.

Javascript: Javascript is a widely used general purpose programming language. Javascript is also available inside a web browser. Standalone interactive web pages or web applications are typically built from Javascript code that runs inside the web page/web browser. Such applications can often continue to work even in the absence of a network connection.

WASM: WASM (or WebAssembly) is a virtualised programming environment that can run inside a web browser. The JupyterLite package uses WASM to provide an in-browser computational environment for Jupyter environments that allows notebooks to execute Python code cells purely within the browser.

Markdown: Markdown is a simple text markup language that allows you to use simple conventions to indicate style (for example, wrapping a word in asterisks to indicate emphasis, or using a dash at the start of a line to indicate a list item or bullet point). Markdown is typically converted to HTML and then rendered in a browser as a styled document. Many Markdown editors, including Jupyter notebooks and IDEs such as VS Code, provide live, styled previews of raw markdown content within the application.

HTML: HTML (hypertext markup language) is a markup language used to mark up text documents with simple structure and style. Web browsers typically render HTML documents as styled web pages. The actual styling (colour selection, font selection) is typically managed using CSS (cascading style sheets), which can change the look and feel of the page without having to change the underlying HTML. (When a theme is changed on a web page, for example dark mode, a different set of CSS settings is used to render the page whilst the HTML remains unchanged.)

CSS: CSS (cascading style sheets) control the particular visual styles used to render HTML content. Changing the CSS changes the visual rendering of a particular HTML webpage without having to change the underlying structural HTML.

nbgrader: nbgrader is a core Jupyter package providing a range of tools for managing the creation, release, collection, and automated and manual marking of Jupyter notebooks.

Version Control: version control is a technique for tracking changes in one or more documents over time. Changes to individual documents may be uniquely tracked with different document versions (for example, imagine looking at "tracked changes" between two versions of the same document), and collections of versioned documents can themselves be versioned and tracked (for example, a set of documents that make up the documents released to students in a particular presentation of a particular module). In a distributed version control system such as git, mechanisms exist that allow multiple authors or editors to work on their own copies of the same documents at the same time, and then alert each other to the changes they have made and allow them to merge in changes made by other authors/editors. If two people have changed the same piece of content in different ways at the same time, a so-called merge conflict will be generated that identifies the clash and allows a decision to be made as to which change is accepted.

Merge conflict: a merge conflict arises in a collaborative, distributed version control system when conflicting changes are made to the same part of a particular file by different people, or when one person works on or makes changes to a file that another has independently deleted. Resolving the merge conflict means deciding which set of updates you actually want to accept into the modified document.

Github Issue: a Github Issue is a single issue comment thread used to discuss a particular issue such as a specific bug, error or feature request. Issues can be tagged with particular users and/or topics. Github Issues are associated with a particular code repository. "Open issues" are ones that are still to be addressed; once resolved, they are then "closed" providing an archived history of matters arising and how they were addressed. When files are committed to the repository, the commit message may be used to associate the commit (i.e. the changes made to particular files) with a particular issue, and even automatically close the issue if the commit resolves that issue.

Github Discussion: a Github Discussion is a threaded forum associated with a particular repository that allows for more open ended discussions than might be appropriate in an issue.

Github/git commit: a git or Github commit represents a check-in of a particular set of changes to one or more documents. Each commit has a unique reference value allowing you to review just the changes made as part of that commit compared to either the previous version of those documents, or another version of those documents. Making commits at a low level of granularity means that very particular changes can be tracked and if necessary rolled back. A commit message allows a brief summary of the changes made in the commit to be associated with it; this is useful for review purposes and in distributed multi-user settings to communicate what changes have been made (a longer description message may also be attached to each commit). Identifying an appropriate level of granularity for commits is one of the challenges in establishing a good workflow, not least because of the overhead associated with adding a commit message to each commit.

Github/git pull request (PR): a git or Github pull request (PR) represents a request that a set of committed changes is accepted from one branch into another branch of a git repository. Automated checks and tests can be run whenever a PR is made; if they do not pass, the person making the PR is alerted to the fact and invited to address the issue. Merging commits from a PR may be blocked until all tests pass. PRs may also be blocked until the PR has received a review by one or more named individuals.

Automation: automation is the use of automatically or manually triggered events, or manually issued commands, for running scripted tasks. Automation can be used to run a spell-checker over a set of files whenever they are updated, automatically check and style code syntax, or automatically execute and test code. Automation can also be used to automatically build Docker images or render and publish interactive textbooks. Automation could be used to automate the production of material distributions and releases and then publish them to a desired location (such as the location pointed to by a VLE download link).

Autonomation: autonomation (not commonly used in computing context) is a term taken from lean manufacturing that refers to "automation with a human touch". In the case of a Jupyter production system, this might include the running of automated tests (such as spell checkers) that prevent documents being committed to a repository if they contain a spelling mistake. The main idea is that errors should not propagate but be fixed immediately at source. The automation identifies the issue and prevents it being propagated forward, a human fixes the issue then reruns the automated tests. If they pass, the work is then automatically passed forwards.

Github Action: a Github Action forms part of an automation framework for Github. Github Actions can be triggered to run checks and tests in response to particular events such as code commits, PRs or releases, as well as to manual triggers. Github Actions can also be used to render source documents to create distributions as well as publishing distributions to particular locations (for example, creating a Docker image and pushing it to DockerHub, generating a Jupyter Book interactive textbook and publishing it via Github Pages, etc.). A wide range of off-the-shelf Github Actions are available.

git commit hook: a git commit hook is a trigger for automation scripts that are run whenever a git commit is issued. The script runs against the committed files and may augment the commit (for example, automatically checking and correcting code style / layout and adding the style corrections as part of the commit process, or using Jupytext to automatically create a paired markdown document for a committed .ipynb notebook document, or vice versa).

pre-commit: pre-commit is a general purpose contributed framework for creating git precommit scripts. A wide range of off-the-shelf pre-commit actions are defined for performing particular tasks.

Rendering: rendering a file refers to the generation of a styled "output" version of a document from a source format. For example, a markdown document may be rendered as a styled HTML document.

Generative document: a generative document is a document that includes executable source code. The source code provides a complete set of instructions for generating media assets as the source document is rendered into a distribution document.

Generative rendering: a generative document is rendered as a styled document containing media assets that are created by executing some form of source code within the source document as part of the rendering process.

Generated asset: a generated asset is a media asset that has been generated from a source code representation as part of the rendering process. Updates to the media asset (for example, text labels or positioning in a diagram) are made by making changes to the source code and then re-rendering it, not by editing the asset directly.

Distribution: a distribution represents a complete set of version controlled files that could be distributed to an end user. In a content creation context, a distribution might take the form of a complete set of notebooks, a complete set of HTML files, or a set of rendered PDF documents. A distribution might be used as a formal handover in a regimented linear workflow process or as the basis of a set of files released to students. A uniquely identifying hash value can be used to identify each distribution and track exactly which version of each individual file is included in a particular distribution.

Release: a release is a version controlled distribution that can be distributed to end users such as a particular cohort of students on a particular module. A release will be given an explicit version number that should include the module code, year and month of presentation as well as lesser version (edition) numbers that help track releases integrating minor updates etc.

Source / Source files: the source files are the set of files from which a distribution is rendered. The source files might include structural metadata, comment code that is stripped from the source document and does not appear in the rendered document, and even code that is executed to produce generated assets that form part of the distribution, even if the source code does not.

Fragment: Tools of Production – ggalt and encircling scatterplot points in R and Python

In passing, I note ggalt, an R package containing some handy off-the-shelf geoms for use with ggplot2.

Using geom_encircle() you can trivially encircle a set of points, which could be really handy when demonstrating / highlighting various groupings of points in a scatterplot:

See the end of this post for a recipe for creating a similar effect in Python.

You can also encircle and fill by group:

A lollipop chart: the geom_lollipop() geom provides a clean alternative to the bar chart (although with a possible loss of resolution around the actual value being indicated):

A dumbbell chart provides a really handy way of comparing differences between pairs of values. Enter, the geom_dumbbell():

The geom_dumbbell() will also do dodging of duplicate treatment values, which could be really useful:

The geom_xspline() geom provides a good range of controls for generating splines drawn relative to control points: “for each control point, the line may pass through (interpolate) the control point or it may only approach (approximate) the control point”.

The geom_encircle() idea is really handy for annotating charts. I don’t think there’s a native Python seaborn method for this, but there is a hack (via this StackOverflow answer) using the scipy.spatial.ConvexHull() function:

# Via: https://stackoverflow.com/a/44577682

import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
from scipy.spatial import ConvexHull

x1, y1 = np.random.normal(loc=5, scale=2, size=(2,15))
x2, y2 = np.random.normal(loc=8, scale=2.5, size=(2,13))

plt.scatter(x1, y1)
plt.scatter(x2, y2)

def encircle(x, y, ax=None, **kw):
    """Draw a polygon patch around the convex hull of a set of points."""
    if not ax: ax = plt.gca()
    p = np.c_[x, y]                        # points as an (n, 2) array
    hull = ConvexHull(p)                   # convex hull of the point set
    poly = plt.Polygon(p[hull.vertices, :], **kw)
    ax.add_patch(poly)

encircle(x1, y1, ec="k", fc="gold", alpha=0.2)
encircle(x2, y2, ec="orange", fc="none")

plt.show()

It would be handy to add a buffer / margin region so the line encircles the points rather than passing through the envelope loci. From this handy post on Drawing Boundaries in Python, one way of doing this is to cast the points defining the convex hull to a shapely shape (e.g. using boundary = shapely.geometry.MultiLineString(edge_points)) and then buffer it using a shapely shape buffer (boundary.buffer(1)). Alternatively, if the points are cast as shapely points using MultiPoint, then shapely also has a convex hull attribute that returns an object that can be buffered directly.
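As a rough sketch of that shapely route (assuming shapely is installed; the margin value is arbitrary), a buffered convex hull version of the encircle() helper might look something like this:

import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import MultiPoint

def encircle_with_margin(x, y, margin=0.5, ax=None, **kw):
    """Encircle points with their convex hull, expanded outwards by a buffer margin."""
    ax = ax or plt.gca()
    # shapely's convex_hull gives a Polygon; buffer() expands it by the margin
    hull = MultiPoint(np.c_[x, y]).convex_hull.buffer(margin)
    xs, ys = hull.exterior.xy
    ax.add_patch(plt.Polygon(np.c_[xs, ys], **kw))

# e.g. encircle_with_margin(x1, y1, margin=0.5, ec="k", fc="gold", alpha=0.2)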