With Permission: Running Arbitrary Startup Services In Docker Containers

In Running Arbitrary Startup Scripts in Docker Containers, I described a recipe, cribbed from MyBinder/repo2docker, for running arbitrary scripts on startup of a Docker container.

One thing I hadn’t fully appreciated was the role of permissions in making sure that scripts and services called from the startup script have enough permissions to run. The containers we are using, which are inspired by the official Jupyter stack containers, are started with a specified user (in the Jupyterverse, this is the user jovyan with UID 1000, by convention).

The start script runs as the user the container is started with, so trying to start the postgres database service from my start script resulted in a permissions error.

One possible fix is to elevate the permissions of the user so that they can run the desired start commands. This is perhaps not as unreasonable as it might sound, at least in an educational context. The containers are single user environments, even when run from a multi-user JupyterHub environment (at least under Kubernetes, rather than, for example, in The Littlest JupyterHub, where users share a single server). Whilst we don’t want learners to be in a position where they accidentally destroy their environment, my personal belief is that we should allow learners to have as much ownership of the environment as possible. (It should also be noted that if a student does mangle a containerised environment, respawning the environment from the original image as a new container should put everything back in place…)

So how do we go about elevating permissions? The approach I have used to date (for example, in here) is to allocate sudoers privileges to the user in respect of at least the commands used to start services in the startup script.

For example, the following Dockerfile command gives the user permission to start the postgresql service, run the mongo server and fire up an external start script:

RUN echo "$NB_USER ALL=(ALL:ALL) NOPASSWD: /sbin/service postgresql restart" >> /etc/sudoers && \
    echo "$NB_USER ALL=(ALL:ALL) NOPASSWD: /usr/bin/mongod" >> /etc/sudoers && \
    echo "$NB_USER ALL=(ALL:ALL) NOPASSWD: /var/startup/start_jh_extras" >> /etc/sudoers

The extra start script is actually provided as a place for additional startup items required when used in a JupyterHub environment, about which more in a later post. (The sudoers bit for that script should probably really be in the Dockerfile that generates the JupyterHub image, which slightly differs from the local image in my build.)

if [ -f "/var/startup/start_jh_extras" ]; then
    sudo /var/startup/start_jh_extras
fi

PS for a complementary approach to all this, see my colleague Mark Hall’s ou-container-builder.

Fragment: Helping Learners Read Code

Picking up on Helping Learners Look at Their Code, where I showed how we can use the pyflowchart Python package to render a flowchart equivalent of code in a notebook code cell using the flowchart.js package, I started wondering about also generating text based descriptions of simple fragments of code. I half expected there to be a simple package out there that would do this — a Python code summariser, or human readable text description generator — but couldn’t find anything offhand.

So as a really quick proof of concept knocked up over a coffee break, here are some sketches of a really naive way in to parsing some simple Python code (and that’s all we need to handle…) on the way to creating a simple human readable text version of it.

#  Have a look at the AST of some Python code

# Pretty print AST
#https://github.com/clarketm/pprintast
#%pip install pprintast

from pprintast import pprintast as ppast # OR: from pprintast import ppast

# Pretty print the AST of a code string
exp = '''
import os, math
import pandas as pd
from pprintast import pprintast2 as ppast
def test_fn(a, b=1, c=2):
    """Add two numbers"""
    out = a+b
    print(out)
    return out
def test_fn2(a, b=1):
    out = a+b
    if a>b:
        print(a)
    else:
        print(b)
    print(out)
'''

ppast(exp)

This gives a pretty printed output that lets us review the AST:

Module(body=[
    Import(names=[
        alias(name='os', asname=None),
        alias(name='math', asname=None),
      ]),
    Import(names=[
        alias(name='pandas', asname='pd'),
      ]),
    ImportFrom(module='pprintast', names=[
        alias(name='pprintast2', asname='ppast'),
      ], level=0),
    FunctionDef(name='test_fn', args=arguments(posonlyargs=[], args=[
        arg(arg='a', annotation=None, type_comment=None),
        arg(arg='b', annotation=None, type_comment=None),
        arg(arg='c', annotation=None, type_comment=None),
      ], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[
        Constant(value=1, kind=None),
        Constant(value=2, kind=None),
      ]), body=[
        Expr(value=Constant(value='Add two numbers', kind=None)),
        Assign(targets=[
            Name(id='out', ctx=Store()),
          ], value=BinOp(left=Name(id='a', ctx=Load()), op=Add(), right=Name(id='b', ctx=Load())), type_comment=None),
        Expr(value=Call(func=Name(id='print', ctx=Load()), args=[
            Name(id='out', ctx=Load()),
          ], keywords=[])),
        Return(value=Name(id='out', ctx=Load())),
      ], decorator_list=[], returns=None, type_comment=None),
    FunctionDef(name='test_fn2', args=arguments(posonlyargs=[], args=[
        arg(arg='a', annotation=None, type_comment=None),
        arg(arg='b', annotation=None, type_comment=None),
      ], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[
        Constant(value=1, kind=None),
      ]), body=[
        Assign(targets=[
            Name(id='out', ctx=Store()),
          ], value=BinOp(left=Name(id='a', ctx=Load()), op=Add(), right=Name(id='b', ctx=Load())), type_comment=None),
        If(test=Compare(left=Name(id='a', ctx=Load()), ops=[
            Gt(),
          ], comparators=[
            Name(id='b', ctx=Load()),
          ]), body=[
            Expr(value=Call(func=Name(id='print', ctx=Load()), args=[
                Name(id='a', ctx=Load()),
              ], keywords=[])),
          ], orelse=[
            Expr(value=Call(func=Name(id='print', ctx=Load()), args=[
                Name(id='b', ctx=Load()),
              ], keywords=[])),
          ]),
        Expr(value=Call(func=Name(id='print', ctx=Load()), args=[
            Name(id='out', ctx=Load()),
          ], keywords=[])),
      ], decorator_list=[], returns=None, type_comment=None),
  ], type_ignores=[])

We can now parse that into a dict, for example:

#https://www.mattlayman.com/blog/2018/decipher-python-ast/
import re
import ast
from pprint import pprint

# TO DO: update generic_visit to capture other node types
# A NodeVisitor can respond to any type of node in the Python AST.
# To visit a particular type of node, we implement a method named visit_<NodeType>.
class Analyzer(ast.NodeVisitor):
    def __init__(self):
        self.stats = {"import": [], "from": [], "function":[]}

    def visit_Import(self, node):
        for alias in node.names:
            import_ = {'name':alias.name, 'alias':alias.asname}
            self.stats["import"].append(import_)
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        imports = {'from': node.module, 'import':[]}
        for alias in node.names:
            imports['import'].append({'name':alias.name, 'as':alias.asname})
        self.stats["from"].append(imports)
        self.generic_visit(node)
    
    def visit_FunctionDef(self, node):
        ret = None
        # Argument names, and the default values for the trailing optional arguments
        args = [a.arg for a in node.args.args]
        args2 = [c.value for c in node.args.defaults]
        # Render each argument as "name" or "name=default"
        argvals = [a for a in args]
        for (i, v) in enumerate(args2[::-1]):
            argvals[-(i+1)] = f"{args[-(i+1)]}={v}"
        # Recover the expression returned by any top level return statement
        for n in node.body:
            if isinstance(n, ast.Return):
                ret = re.sub(r'^return\s+', '', ast.get_source_segment(exp, n))
        self.stats["function"].append({'name':node.name, 
                                       'docstring': ast.get_docstring(node),
                                       'returns': ret,
                                       'args': args, 'args2': args2, 'argvals':argvals,
                                       'src':ast.get_source_segment(exp,node)})
        self.generic_visit(node)

    def report(self):
        pprint(self.stats)
 

Running the analyzer over the parsed code then generates output of the form:

tree = ast.parse(exp)
analyzer = Analyzer()
analyzer.visit(tree)
analyzer.report()

'''
{'from': [{'from': 'pprintast',
           'import': [{'as': 'ppast', 'name': 'pprintast2'}]}],
 'function': [{'args': ['a', 'b', 'c'],
               'args2': [1, 2],
               'argvals': ['a', 'b=1', 'c=2'],
               'docstring': 'Add two numbers',
               'name': 'test_fn',
               'returns': 'out',
               'src': 'def test_fn(a, b=1, c=2):\n'
                      '    """Add two numbers"""\n'
                      '    out = a+b\n'
                      '    print(out)\n'
                      '    return out'},
              {'args': ['a', 'b'],
               'args2': [1],
               'argvals': ['a', 'b=1'],
               'docstring': None,
               'name': 'test_fn2',
               'returns': None,
               'src': 'def test_fn2(a, b=1):\n'
                      '    out = a+b\n'
                      '    if a>b:\n'
                      '        print(a)\n'
                      '    else:\n'
                      '        print(b)\n'
                      '    print(out)'}],
 'import': [{'alias': None, 'name': 'os'},
            {'alias': None, 'name': 'math'},
            {'alias': 'pd', 'name': 'pandas'}]}
'''

It’s not hard to see how we could then convert that to various text sentences, such as:

# Three packages are imported directly: os and math without any aliases, pandas with the alias pd
# The pprintast2 function is imported from the pprintast module with the alias ppast
# Two functions are defined: test_fn, which will add two numbers, and...
# The test_fn function takes three arguments: one required and two optional
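
As a minimal sketch (the summarise_imports helper is my own naming, not an existing package), the import related sentences might be generated from the analyzer output along the following lines:

def summarise_imports(stats):
    """Generate simple human readable sentences from the Analyzer stats dict."""
    sentences = []
    # Direct imports: "import x" and "import x as y"
    for imp in stats["import"]:
        alias = f" with the alias {imp['alias']}" if imp["alias"] else " without an alias"
        sentences.append(f"The {imp['name']} package is imported{alias}.")
    # From-imports: "from x import y as z"
    for frm in stats["from"]:
        for item in frm["import"]:
            alias = f" with the alias {item['as']}" if item["as"] else ""
            sentences.append(f"{item['name']} is imported from the {frm['from']} module{alias}.")
    return sentences

for sentence in summarise_imports(analyzer.stats):
    print(sentence)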

It would be trivial to create some magic to wrap all that together, then let the user use a block cell magic such as %%summarise_this_code to generate the text description, or play it out using a simple text-to-speech function.
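
As a minimal sketch, reusing the Analyzer class above (note that, as written, it reads source segments from the module level exp variable, so the magic sets that from the cell contents), such a magic might look like:

from IPython.core.magic import register_cell_magic

@register_cell_magic
def summarise_this_code(line, cell):
    """Parse the magicked cell's source and print a crude structural summary."""
    global exp
    exp = cell  # the Analyzer reads source segments from the module level exp
    tree = ast.parse(cell)
    analyzer = Analyzer()
    analyzer.visit(tree)
    analyzer.report()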

PS in passing, it’s also worth noting pindent.py (via), which will add #end of block comments at the end of each code block in a Python program. Backup gist: https://gist.github.com/psychemedia/2c3fe0466aca1f760d67d5ca4f6e00b1

Helping Learners Look at Their Code

One of the ideas I’m using the Subject Matter Authoring With Jupyter Notebooks online textbook to riff around is the notion (which will be familiar to long time readers of this blog) of sharing the means of asset production with students. In many cases, the asset production I am interested in relates to the creation of media assets or content that is used to support or illustrate teaching and/or learning material.

For example, one well known graphical device for helping explain a simple algorithm or process is a flow chart. Here’s an example from an OpenLearn unit on Computers and compute systems:

The image may have been drawn using a flowchart generation package, or “freehand” by an artist in a generic drawing package.

Here’s the same flow chart (originally generated in SVG form) produced by another production route, the flowchart.js Javascript package:

The chart was created from a script with a particular syntax:

st=>start: start
in1=>inputoutput: Accept data from sensor
op1=>operation: Transform sensor data to display data format
out1=>inputoutput: Send display data to display
e=>end: end

st->in1->op1->out1->e

To begin with, each element is defined using a construction of the form uniqueIdentifier=>blocktype: label. (Note that the space after the colon following the blocktype is required or the diagram won’t be laid out properly.) The relationships describing how the various blocks are connected are then declared using those unique identifiers.

Using a piece of simple IPython magic (flowchart_js_jp_proxy_widget), we can create a simple tool for rendering flowcharts created using flowchart.js in a Jupyter notebook, or defining the production of an image asset in an output document format such as an HTML textbook or a PDF document generated from the notebook “source” using a publishing tool such as Jupyter Book.

(In a published output created using Jupyter Book, we could of course hide or remove the originating script from the final document and just display the flow chart.)

To update the flowchart all that is required is that we update the script and rerun it (or reflow the document if we are creating a published format).

One other thing we notice about the OpenLearn document is that it links to a long description of the diagram, which is required for accessibility purposes. Only, in this case, it doesn’t…:

(I really do need to get around to hacking together an OU-XML quality report tool that would pick up things like that… I assume there are such tools internally — the OU-XML gold master format has been around for at least 15 years — but I’ve never seen one, and working on one of my own would help my thinking re: OU-XML2ipynb or OU-XML2MyST conversions anyway.)

A very literal long description might have taken the form of something like “A flow chart diagram showing a particular process. A rounded start block connects to an input/output block (a parallelogram) labeled ‘Accept data from sensor’…” and so on. In one sense, just providing the flowchart.js source would provide an unsighted user with all the information contained in the diagram, albeit in a slightly abstract form. But it’s not hard to see how we might be able to automate the creation of a simple text creation script from the flowchart description. (That is left as an exercise for the reader… Please post a link to your solution in the comments!;-) Or if you know of any reference material with best practice guidance for generating long descriptions of flow charts in particular, and diagrams in general, for blind users, please let me know, again via the comments.)

So, that’s an example of how we can use text based tools to generate simple flow charts. Here’s another example, taken from here:

Simple flow chart (via pyflowchart docs)

This diagram is not ideal: the if a block feels a bit clunky, we might take issue with the while operation block not being cast into a decision loop (related issue – and me not reading the docs! This is a feature, not a bug, an automated simplification that can be disabled…), and why is the print() statement a subroutine? But it’s a start. And given the diagram has been generated from a script, if we don’t like the diagram, we can easily make changes to it by editing the script and improving on it.

At this point, it’s also worth noting that the script the image was generated from was itself generated from a Python function definition using the pyflowchart Python package:
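
For example, a minimal sketch (using pyflowchart’s documented Flowchart.from_code method; the example function is my own) that generates a flowchart.js script from Python source:

#%pip install pyflowchart
from pyflowchart import Flowchart

code = '''
def test_fn2(a, b=1):
    out = a + b
    if a > b:
        print(a)
    else:
        print(b)
    return out
'''

# Parse the Python source and emit the equivalent flowchart.js script
fc = Flowchart.from_code(code)
print(fc.flowchart())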

The script is slightly harder to read than the original example, not least in the way the unique identifiers are designed, but it’s worth remembering this is currently generated primarily for a machine, rather than a person, to read. (We could create simple unique identifiers, for example, if we wanted to make the script more palatable to human readers.)

Whilst the script may not be ideal, it does function as a quick first draft that an author can work up. As with many automation tools, they are most effective when used by a human performing a task in order to support that task, rather than instead of a human performing that task (in this case, the task of producing a flowchart diagram).

We can also generate the diagram from a function in a Jupyter notebook code cell using a bit of IPython magic:

Creating a flowchart with pyflowchart magic

On my to do list is to update the magic to alternatively display the flowchart.js diagram generating script as well as optionally execute the code in the magicked code cell (at the moment, the code is not executed in the notebook Python kernel environment). And having a look at automatically generating human readable text descriptions.

It’s not just functions that we can render into flowcharts either. We can also render flowcharts from simpler scripts:

(Thinks: it might be nice if we could add a switch to add start and stop blocks to top and tail this sort of flowchart. Or maybe allow a pass statement at the very start or very end of a code fragment to get rewritten as a start/stop block as appropriate.)

So, here we have a couple of tools that can be used:

  • to generate a generative text description of an algorithm defined using Python code from an abstract syntax tree (AST) representation of the code;
  • to render a media asset (a flow chart diagram) from a generative text description.

So that’s the means of production part.

We can use this means of production to help support the creation of flowcharts in a relatively efficient and straightforward way. The lower the overhead or cost of doing something, the more likely we are to do it, or at least to consider doing it, because there are fewer blockers to actually doing it. So this sort of route, this means of production, makes it easier for us to make use of graphical flow charts to illustrate materials if we want to.

If we want to.

Or how about if a student wants to?

With an automated means of production available, it becomes easier to create additional examples that we might make optionally available (either in final rendered form, or in generative script form) that some students might find useful.

But if we make the means of production available to students, then it means that they can generate their own examples. And check their own work.

Take the case of self-assessment activities generated around flowcharts. We might imagine:

  • asking a learner to identify which flowchart of several describes a particular code fragment;
  • which code fragment of several implements a particular flowchart;
  • what a flowchart for a particular code fragment might look like;
  • what a piece of code implementing an algorithm described in a provided flow chart might look like.

In each of the above cases, we can use a generative code2flowchart method to help implement such an activity. (Note that we can’t go directly from flowchart to code, though we may be able to use the code to flowchart route to workaround that in some cases with a bit of creative thinking…)

In addition, by providing learners with the means of production for generating flowcharts, they can come up with their own exercises and visualise their own arbitrary code in a flowchart format to self-check their own work or understanding. Whether or not folk really can be said to be “visual learners”, sometimes it can help to draw things out, see them represented in visual form, and read the story back from what the diagram appears to be saying.

And if we can also get a simple flowchart script to human readable text generator going, we can provide yet another different way of looking at (which is to say, reading) the code text. And as with many accessibility tools, that can be useful for everyone, not just the learner who needs an alternative format in order to be able to access the materials in a useful and useable (which is to say, a meaningful) way.

PS Finally, it’s worth noting that some of the layouts flowchart.js generates are a bit broken. Some people will use this to argue that “because it breaks on these really weird edge cases and doesn’t do everything properly, we shouldn’t use it for anything, even the things it does work well on”. This is, unfortunately, an all-too-common response in an organisation with a formal process mentality, and it betrays a misunderstanding of automation used instead of, rather than by and in support of, people. There are two very obvious retorts. Firstly, the flowchart.js code is open, so if we spot something broken we can fix it (erm, like I haven’t) and make it better for everyone, which will in turn encourage more people to use it, and more people to spot and fix broken things. Secondly, the flowchart.js script, which the author has presumably ensured is logically correct, can be used to generate a “first draft” SVG document that an artist could rearrange on the page so that it looks nice, without changing the shape, labels or connectedness of the represented objects, which are a matter of correctness and convention and which the artist should not be allowed to change.

Opening Up Access to Jupyter Notebooks: Serverless Computational Environments Using JupyterLite

A couple of weeks ago, I started playing with jupyterlite, which removes the need for an external Jupyter server and lets you use JupyterLab or RetroLab (the new name for the JupyterLab classic styled notebook UI) purely in the browser using a reasonably complete Python kernel that runs in the browser.

Yesterday, I had a go at porting over some notebooks we’ve used for several years for some optional activities in a first year undergrad equivalent course. You can try them out here: https://ouseful-demos.github.io/TM112-notebooks-jupyterlite.

You can also try out an online HTML textbook version that does require an external server, in the demo case, launched on demand using MyBinder, from here: https://ouseful-course-containers.github.io/ou-tm112-notebooks/Location%20Based%20Computing.html

The notebooks were originally included in the course as a low-risk proof of concept of how we might make use of notebooks in the course. Although Python is the language taught elsewhere in the module, engagement with it is through the IDLE environment, with no dependencies other than the base Python install: expecting students to install a Jupyter server, howsoever packaged, was a no-no. The optional and only limited use of notebooks meant we could also prove out a hosted notebook solution using JupyterHub and Kubernetes in a very light touch way: authentication via a Moodle VLE LTI link gave users preauthenticated access to a JupyterHub server from where students could run the provided notebooks. The environment was not persistent though: if students wanted to save their notebooks to work on them at a future time, they had to export the notebooks and then re-upload them in their next session. We were essentially running just a temporary notebook server. The notebooks were also designed to take this into account, i.e. the activities should be relatively standalone and self-contained, and completable in a short study session.

To the extent that the rest of the university paid no attention, it would be wrong to class this as innovation. On the one hand, the approach we used was taken off-the-shelf (Zero to JupyterHub with Kubernetes), although some contributed docs did result (using the JupyterHub LTI authenticator with Moodle). The deployment was achieved very much as a side project, using personal contacts, and an opportunity to deploy it outside of formal project processes and procedures before anyone realised what had actually just happened. (I suspect they still don’t.) On the other, whilst it worked and did the job required of it, it influenced nothing internally and had zero internal impact other than meeting the needs of several thousand students. And whilst it set a precedent, it wasn’t really one we managed to ever build directly from or invoke to smooth some later campaign.

As well as providing the hosted solution, we also made the environment available via MyBinder, freeloading on that service to provide students with an environment that they could access ex- of university systems. This is important because it meant, and means, that access remains available to students at the end of the course. Unlike traditional print based models of distance education, where students get a physical copy of course materials they can keep for ever, the online first approach that dominates now means that students lose access to the online materials after some cut-off point. So much for being able to dig that old box out of the loft containing your lecture notes and university textbooks. Such is the life of millennials, I guess: a rented, physically artefactless culture. Very much the new Dark Age, as artist James Bridle has suggested elsewhere.

But is there a better way? Until now, to run a Jupyter notebook has placed a requirement on being able to access a Jupyter server. At this point, it’s worth clarifying a key point. Jupyter is not notebooks. At its core, Jupyter is a set of protocols that provide access to arbitrary computational environments, that can run arbitrary code in those environments, and that can return the outputs of that code execution under a REPL (read-eval-print loop) model. The notebooks (or JupyterLab) are just a UI layer. (Actually, they’re a bit more interesting than that, as are ipywidgets, but that’s maybe something for another post.)

So, the server and the computational environment. The Jupyter server is the thing that provides access to the computational environment. And the computational environment has typically needed to run “somewhere else”, as far as the notebook or JupyterLab UI is concerned. This could be on a remote hosted server somewhere in the cloud or provided by your institution, or in the form of an environment that exists and runs on your own desktop or laptop computer.

What JupyterLite neatly does is bring all these components into the browser. No longer does the notebook client, the user interface, need to connect to a computational environment running elsewhere, outside the browser. Now, everything can run inside the browser. (There is one niggle to this: you need to use a webserver to initially deliver everything into the browser, but that can be any old webserver that might also be serving any other old website.)

Now, I see this as A Good Thing, particularly in open online education where you want learners to be able to do computational stuff, or benefit from interactions or activities that require some sort of computational effort on the back end, such as some intelligent tutoring thing that responds to what you’ve just done. But a blocker in open ed has always been: how is that compute provided?

Typically, you either need the learner to install and run something — and this is something that does not scale well in terms of the amount of support you have to provide, because some people will need (a lot of!) support — or you need to host or otherwise provide access to the computational environment. And resource it. And probably also support user authentication. And hence also user registration. And then either keep it running, or prevent folk from accessing it from some unspecified date in the future.

What this also means is that whilst you might reasonably expect folk who want to do computing for computing’s sake to take enough of an interest, and be motivated enough, to install a computing environment of their own to work in, for folk who want to use computing to get stuff done, or who just want to work through some materials without already having the skills to install and run software on their own computer, it all becomes a bit too much, a bit too involved: let’s just not bother. (A similar argument holds when hosting software: think of the skills required to deploy and manage end-user facing software on a small network, for example, the have-a-go teacher who looks after the primary school computer network, or the more considerable skills, and resource, required to deploy environments in a large university with lots of formalised IT projects and processes. When “just do it” becomes a project, with planning and meetings and all manner of institutional crap, it can quickly become “just what is the point in even trying to do any of this?!”)

Which is where running things in the browser makes it easier. No install required. Just publish files via your webserver in the same way you would publish any other web pages. And once the user has opened their page in the browser, that’s you done with it. They can run the stuff offline, on their own computer, on a train, in the garden, in their car whilst waiting for a boat. And they won’t have had to install anything. And nor will you.

Try it here: https://ouseful-demos.github.io/TM112-notebooks-jupyterlite.

TJ Fragment: Sharing Desktop Apps Via Jupyter-Server-Proxy Et Al.

It’s been some time since I last had a play with remote desktops, so here’s a placeholder / round up of a couple of related Jupyter server proxy extensions that seem to fit the bill.

For those at the back who aren’t keeping up, jupyter-server-proxy applications are incredibly useful: they extend the Jupyter server to proxy other services running in the same environment. So if you have a Jupyter server running on example.url/nbserver/, and another application that publishes a web UI in the same environment, you can publish that application, using jupyter-server-proxy, via example.url/myapplication. As an example, for our TM351 Data Management and Analysis course, we proxy OpenRefine using jupyter-server-proxy (example [still missing docs]).
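
By way of illustration, a minimal sketch of a jupyter_server_config.py entry for proxying a service (the command and port here are illustrative assumptions, not our actual TM351 configuration):

# jupyter_server_config.py
# Proxy a local OpenRefine server to /openrefine on the Jupyter server;
# jupyter-server-proxy substitutes the allocated port into {port}
c.ServerProxy.servers = {
    "openrefine": {
        "command": ["refine", "-p", "{port}"],
        "port": 3333,
        # Add a tile for the proxied app to the notebook/JupyterLab launcher
        "launcher_entry": {"title": "OpenRefine"},
    }
}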

Applications that are published using a jupyter-server-proxy wrapper are typically applications that publish an HTML UI. So what do you do if the application you want to share is a desktop application? One way is to share the desktop via a browser (HTML) interface. Two popular ways of doing this are:

  • novnc: an “open source VNC client – it is both a VNC client JavaScript library as well as an application built on top of that library”;
  • xpra: “an open-source multi-platform persistent remote display server and client for forwarding applications and desktop screens”.

Both of these applications allow you to share (Linux) desktop applications via a web browser, and both of them are available as jupyter-server-proxy extensions (subject to the correct operating system packages also being installed).

As far as novnc goes, jupyterhub/jupyter-remote-desktop-proxy will “run a Linux desktop on the Jupyter single-user server, and proxy it to your browser using VNC via Jupyter”. A TightVNC server is bundled with the application as a fallback if no other VNC server is available. One popular application wrapped by several people using jupyter-remote-desktop-proxy is QGIS; for example, giswqs/jupyter-qgis. I used it to demonstrate how we could make a legacy Windows desktop application available via a browser, by running it under Wine on a Linux desktop and then sharing that desktop via jupyter-remote-desktop-proxy.

For xpra, the not very active (but maybe it’s stable enough?!) FZJ-JSC/jupyter-xprahtml5-proxy seems to allow you to “integrate Xpra in your Jupyter environment for a fast, feature-rich and easy to use remote desktop in the browser”. However, no MyBinder demo is provided and I haven’t had a chance yet to give this a go. (I have tried XPRA in other contexts, though, such as here: Running Legacy Windows Desktop Applications Under Wine Directly in the Browser Via XPRA Containers.)

Another way of sharing desktops is to use the Microsoft Remote Desktop Protocol (aka RDP). Again, I’ve used that in various demos (eg This is What I Keep Trying to Say…) but not via a jupyter-server-proxy. I’m not sure if there is a jupyter-server-proxy example out there for publishing a proxied RDP port?

Just in passing, I also note this recipe for a Docker compose configuration that uses a bespoke container to act as a desktop sharing bridge: Viewing Dockerised Desktops via an X11 Bridge, novnc and RDP, Sort of…. I’m not sure how that might fit into a Jupyter set up. Could a Jupyter server container be composed with a bridge container, and then proxy the bridge services?

Finally, another way to share stuff is to use WebRTC. The maartenbreddels/ipywebrtc extension can “expose the WebRTC and MediaStream API in a Jupyter notebook/JupyterLab environment”, allowing you to create a MediaStream out of an ipywidget, a video/image/audio file, or a webcam, and use it as the basis for a movie, image snapshot or audio recording. I keep thinking this might be really useful for recording screencast scenes or other teaching related assets, but I haven’t fully grokked the full use of it. (Something like Jupyter Graffiti also falls into this class, which can be used to record a “tour” or walkthrough of a notebook that can also be interrupted by live interaction or the user going off-piste. The jupyterlab-contrib/jupyterlab-tour extension also provides an example of a traditional UI tour for JupyterLab, although I’m not sure how easy it is to script/create your own tours. Such a thing might be useful for guiding a user around a custom JupyterLab workspace layout, for example. [To my mind, workspaces are the most useful and least talked about feature of the JupyterLab UI…] More generally, shepherd.js looks interesting as a generic website tour supporting JavaScript package.) What I’m not sure about is the extent to which I could share, or proxy access to, a WebRTC MediaStream that could be accessed live by a remote user.

Another way of sharing the content of a live notebook is to use the new realtime collaboration features in JupyterLab (see the official announcement/background post: How we made Jupyter Notebooks collaborative with Yjs). (A handy spin-off of this is that it now provides a hacky workaround way of opening two notebooks on different monitors.) If you prefer more literal screensharing, there’s also yuvipanda/jupyter-videochat which provides a server extension for proxying a Jitsi (WebRTC) powered video chat, which can also support screen sharing.

Fragment: Software Decay From Inside and Out

Over the last couple of weeks, I’ve been dabbling with a new version of the software environment we use for our TM351 Data Management and Analysis course, bundling everything into a single monolithic Docker container (rather than a more elegant docker compose solution, because we haven’t yet figured out how to mount multiple personal volumes from a JupyterHub/k8s config).

Hmm… in a docker compose set up, where I mount a persistent volume onto container A at $SHAREDPATH, can I mount from a path $SHAREDPATH/OTHER in that container into another, docker compose linked container?

At the final hurdle, having fought with various attempts to build a Docker container stack that works, I hit an issue when trying to open a new notebook:

Crap.

The same notebook works fine in JupyterLab, so there is something wrong, somewhere, with launching notebooks in the classic Jupyter notebook UI.

Which made me a bit twitchy. Because the classic notebook is the one we use for teaching in several courses, and we use a wide variety of off-the-shelf extensions, as well as a range of custom developed extensions to customise our notebook authoring and presentation environment (examples). And these customisations are not available in JupyterLab UIs. For a related discussion, see this very opinionated post.

For folk who follow these things, and for folk who have a stake in the classic notebook UI, the question of long term support for the classic UI should be a consideration and a concern. Support for the classic notebook UI is not the focus of the core Jupyter UI project developers’ effort.

And here’s another weak signal of a possible fork in the road:

The classic notebook user community, of which I consider myself a part, which includes education and could well extend into publishing more generally as tools like Jupyter Book mature even further, needs to be mindful that someone has to look after this codebase. And it would be a tragedy if that someone turned out to be someone who forked the codebase for their own (commercial) publishing platform. An Elsevier, for example, or a Blackboard.

Anyway, back to my 500 server error.

Here’s how the error starts to be logged by the Jupyter server:

And here’s where the problem might be:

In a third party package that provides an “export to docx (Microsoft Word)” feature, implemented as a Jupyter notebook custom bundler.

Removing that package seemed to fix things, but it got me wondering about whether I should treat this as a weak signal of software rot in Jupyter notebook. I tweeted to the same effect — with a slight twinge of uncertainty about whether folk might think I was dissing the Jupyter community again! — but then started pondering about what that might actually mean.

Off the top of my head, it seems that one way of slicing the problem is to consider rot that comes from two different directions:

  • inside: which is to say, as packages that the notebook server depends on update, things may start to break inside the notebook server and its environment. Pinning package versions may help, as may making sure you always run the notebook server in its own, very tightly controlled Python environment and always serve kernels from a separate environment. But if you do need to install other things in the same environment as the notebook server, and there is conflict between the dependencies of those things and the notebook server’s dependencies, things might break;
  • outside: which is to say, things that a user or administrator might introduce into the notebook environment to extend it. As in the example of the extension I installed which, in its current version, appears to cause the 500 server error noted above.

Note that in the case of the outside introduced breakage, the error appears to the user as if something inside the notebook server is broken: the user draws the system boundary around the notebook server and its extensions, whilst the developer (core notebook server dev, or the extension developer) sees the world a bit differently:

There are folk who make an academic career out of such concerns, of course, who probably have a far more considered take on how software decays and how software rot manifests itself, so here are a few starters for ten that I’ve added to my reading pile (no idea how good they are: this was just a first quick grab):

  • Le, Duc Minh, et al. “Relating architectural decay and sustainability of software systems.” 2016 13th Working IEEE/IFIP Conference on Software Architecture (WICSA), IEEE, 2016.
  • Izurieta, Clemente, and James M. Bieman. “How software designs decay: A pilot study of pattern evolution.” First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), IEEE, 2007.
  • Izurieta, Clemente, et al. “Organizing the technical debt landscape.” 2012 Third International Workshop on Managing Technical Debt (MTD), IEEE, 2012, pp. 23-26.
  • Hassaine, Salima, et al. “Advise: Architectural decay in software evolution.” 2012 16th European Conference on Software Maintenance and Reengineering, IEEE, 2012, pp. 267-276.
  • Hochstein, Lorin, and Mikael Lindvall. “Combating architectural degeneration: a survey.” Information and Software Technology 47.10 (2005): 643-656.

jupyterlite — “serverless” Jupyter In the Browser Using Pyodide and WASM

Several years ago, a Mozilla project announced pyodide, a full Python stack, compiled to WebAssembly / WASM, running in the browser. Earlier this year, pyodide was spun out into its own community governed project (Pyodide Spin Out and 0.17 Release) which means it will now stand or fall based on its usefulness to the community. I’m hopeful this is a positive step, and it’ll be interesting to see how active the project becomes over the next few months.

Since then, a full scipy stack appeared, runnable via pyodide, along with the odd false start (most notably, jyve) at getting a Jupyter server running in the browser. Originally, pyodide had supported its own notebook client (indeed, had been created for it) but that project — iodide — soon languished.

As I haven’t really been Tracking Jupyter since summer last year, there are probably more than a few projects ticking along whose earliest signs I missed, and that have only now come to my attention through occasional mentions on social media that have passed my way.

One of these is jupyterlite (docs), “a JupyterLab distribution that runs entirely in the browser built from the ground-up using JupyterLab components and extensions”. It’s not classic notebook, but it does suggest there’s a running jupyter server available as a WASM component…

So why is this interesting?

To run a Jupyter notebook requires three things:

  • a client in the browser;
  • a Jupyter server to serve the client and connect it to a kernel process;
  • a computing environment to execute code in code cells (the kernel process).

If you access a hosted Jupyter environment, someone else manages the Jupyter server and computing environment for you. If you run notebooks locally, you need at least a Jupyter server, and then you can either connect to a remote kernel or run one locally.

To run a multi-user hosted server, you need to run the Jupyter server, and potentially also manage authentication, persistent storage for users to save their notebooks, and the compute backend to serve the kernel processes. This means you need infrastructure of the hard kind (servers, storage, bandwidth), and you become a provider of infrastructure of the soft kind (Jupyter notebooks as a service).

With Jupyter running in the browser, using something like Jupyterlite, all you need is a web server. Which you’re probably already running. The notebook server now runs in the browser; the kernel now runs in the browser; and the client (JupyterLab) continues to run in the browser, just as it ever did.

In JupyterLite, storage is provided by the browser’s local storage, which means you need to work with a single browser. (With many browsers, such as Chrome, now offering browser synchronisation, I wonder if the local storage is synced too? If so, then you can work from any browser you can “log in” to in order to enable synchronisation services.)

To my mind, this is a huge win. You don’t need to host any compute or storage services to make interactive computing available to your users/students/learners: you just need a webserver. And you don’t even need to run your own: the jupyterlite demo runs using Github pages.

jupyterlite demo running on Github Pages

For open education, this means you can make a computing environment available, in the browser, using just a webserver, without the overhead, or security concerns, of running a compute backend capable of running arbitrary, user submitted code.

So that’s one thing.

For learners running things locally, they just need a simple web server. (I think serving is required: simply clicking on an HTML document to open it in the browser is likely to hit issues, because browsers expect this sort of content to be served with particular MIME types.)
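
By way of example, a minimal sketch, using nothing but the Python standard library, of serving a built JupyterLite site from the current directory:

# Serve the current directory (e.g. a built JupyterLite site) on http://localhost:8000
# This is the programmatic equivalent of running: python -m http.server 8000
from http.server import HTTPServer, SimpleHTTPRequestHandler

HTTPServer(("", 8000), SimpleHTTPRequestHandler).serve_forever()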

A simple web server is the sort of thing that can be easily packaged and distributed, but it still presents something of an overhead in terms of downloading, installing and then running the service.

Perhaps simpler would be distributing the application as a cross-platform electron app? As far as I know, jupyterlite isn’t (yet?!) packaged that way, but there is at least one demo out there of pyodide bundled inside an electron app: inureyes/pyodide-console (presentation). So it’s not hard to imagine bundling and distributing jupyterlite the same way, although the practicalities may prove fiddly. (Or they may not…)

“So what?”, you may say. “If we’re giving students access to anaconda anyway, what benefit does this bring?” Leaving aside the huge questions I have about using things like Anaconda, not least their lack of generality compared to distributing environments using Docker containers, for example, and notwithstanding the ability to provide computing environments purely within the browser as noted earlier, the availability of a Jupyter server and Jupyter kernel running in the browser makes other things possible, or at least allows us to entertain the idea of other applications with a view to seeing if they are realisable.

Hmm… maybe chrome can provide the webserver itself?

So what might those things be? Off the top of my head, and without any serious thought at all, several things come immediately to mind.

Firstly, the possibility of the cross-platform electron distribution (which is essentially an application container wrapping a Chrome browser and a simple web server).

Secondly, and nosing around a little, there are already VS Code extensions that seem to be riffing on using jupyterlite too; so if you have access to VS Code in the browser, you could perhaps also install a pyodide Jupyter kernel and run notebooks using that in VS Code in the browser. (I’m not sure if you can host VS Code using just a simple web server, or whether it needs a nodejs server app?)

Thirdly, it’s not hard to imagine a route towards making interactive books available and served just via a web browser. For example, a Jupyter Book UI where, rather than having to hook up to a remote Jupyter server to run (editable) code cells from the page using thebelab, you could just run the cells against the WASM run kernel in the browser. (What would be required to make a thebelab like JavaScript package that would allow a Jupyter Book to connect to a jupyterlite server running from the same browser tab?) It would then be possible to publish a fully interactive textbook using just a simple web server and no other dependencies. The only piece missing from that jigsaw would be a Jupyter Book extension to allow you to save edited code cells into browser storage; and maybe also add in some means of adding / editing additional HTML cells (then at a later date adding support for markdown, perhaps).

The availability of a thebelab like package to connect to an “in page” Jupyter environment also means we can support on demand executable code from any code bearing HTML page, such as content pages with code examples in a VLE web page, and without the need for backend server support.

Finally, institutionally, jupyterlite makes it possible to publish a simple Jupyter environment directly from the VLE as a “simple” HTML page, with no compute backend/traditional Jupyter hosting requirement on the backend. The compute/storage requirement must be provided by the end user in the form of a recent browser and a computer that can cope with running the WASM environment inside it.

Related: Fragment – Jupyter Book Electron App.

On the WatchList: VisualPython

A fragmentary note to put a watch on Visual Python, a classic Jupyter notebook extension (note that… a classic notebook extension) to support visual Python programming:

It’s a bit flaky at the moment — the above screenshot shows multiple previews of the selected function code, and the function preview doesn’t properly render things like the function arguments (nor could I get the function to appear in the list of user defined functions), but it’s early days yet.

At first, a blocker to me in terms of suggesting folk internally have a look at it right now included the apparent inability to define a variable by visual means (all I wanted to do was set a=1), or to clear the notebook cells when I wanted to reflow the visual program into the notebook code cell area.

But then I twigged that rather than trying to create a complete program using the visual tools, a better way of using VisualPython might be as a helper for generating code fragments for particular use cases.

In the example below, I created the dataframe manually and then used the editor to create a simple plot command that could be inserted into a notebook code cell. The editor picked up on the dataframe I had defined and used that to prepopulate selection lists in the editor.

If the environment becomes the plaything of devs looking to put complex features into it, seeing it as a rich power tool for themselves (and, contra to their beliefs, an increasingly hostile environment for novices, as more “powerful” features are added and more visual clutter is added to the environment to scare the hell out of users with things that are irrelevant to them), then the basic usability required of a teaching and learning environment will be lost. That is the risk of seeing it as a tool for creating complete programs visually.

For the developers, it’s all too easy to see how the environment could become as much a toy for adding yet more support for yet more packages that can be demonstrated in the environment but never used (because the power users actually prefer using autocomplete in a “proper IDE”), rather than being simplified for use by novices with very, very, very simple programming demands (just think of the two, three, four line code examples that fill the pages of introductory programming textbooks).

If folk do want a visual editor for data related programming, wouldn’t they use something like Orange, enso, or the new JupyterLab based orchest?

orchest: JupyterLab visual pipeline programming environment

But if you see the Visual Python editor as a tool at the side that essentially operationalises documentation lookup in a way that helps you create opinionated code fragments, where the opinion is essentially a by-product of the code templates used to generate code from particular visual UI selections, then I think it could be useful as a support tool for creating code snippets, not as an authoring tool for writing a more complete program or computational analysis.

So what will I be watching for? User uptake (proxied by mentions I see of it), some simple documentation, and perhaps a two minute preview video tour (I’m not willing to spend my time on this right now because I think it needs a bit more time in the oven…). The usability should improve as novices get confused and raise issues with how to perform the most basic of tasks, and as the noosphere finds a way to conceptualise the sorts of usage patterns and workflows that VisualPython supports best.

My initial reaction was a bit negative — it’s too visually complex already for novices, and some really basic usability issues and operations are either missing or broken, if you see it as an editor for creating complete programs. But if you view it as a code generating documentation support tool that lets you hack together a particular code fragment with visual cues that you might otherwise pick up from documentation, documentation code examples or simple tutorials, then I think it could be useful.

Hmmm… Another thing to try to get my head round in the context of generative workflow tools…

Show Your Working, Check Your Working, Check the Units

One of the common refrains in maths, physics and engineering education is to “show your working” and “check your working”. In physics and engineering, “check the units” is also commonly heard.

If you have a calculation to do, show the algebraic steps, then substitute in the numbers as part of the working, showing partial results along the way.

As I slowly start to sketch out examples of how we can use one-piece generative document workflows both to create educational materials and, by sharing the tools of production with learners in the guise of a mechanical tutor, to support self-checking, both worked equations and checking your working seem to provide good examples of this sort of practice.

The handcalcs python package provides a simple but effective way to write simple mathematical expressions and then automate the production of a simple worked example.

handcalcs worked example
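
By way of a minimal sketch (following the package’s documented %%render cell magic usage; the values here are my own):

#%pip install handcalcs
import handcalcs.render

# Then, in a separate notebook code cell, the %%render cell magic
# renders the calculation as LaTeX, with the values substituted in
# as part of the working:

%%render
a = 2
b = 3
c = (2*a + b/3)**0.5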

In physics and engineering settings, dimensional analysis can provide a powerful shortcut for checking that a derived equation produces a thing of the correct dimension: was it V=IR, or V=I/R? A quick dimensional analysis, if you know your SI units, can help check.

There are several packages out there that provide units of measurement that can be used to type numerical values with particular units. The forallpeople package is one such, and it also happens to play nicely with handcalcs.

Another handy benefit of a good units of measurement package, for production as well as student self-checking, is the mechanical support for expressing the units in appropriate form, given the magnitude of an expressed quantity:

Demonstration of forallpeople units of measurement
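
As a minimal sketch (following the package’s documented usage, and assuming the 'default' environment defines the usual SI derived units):

#%pip install forallpeople
import forallpeople as si
si.environment('default')

# SI base units are available as module attributes; composite quantities
# are rendered using derived units and auto-scaled prefixes
mass = 5000 * si.kg
acceleration = 9.81 * si.m / si.s**2
force = mass * acceleration
force  # displays as 49.050 kN

# And a quick dimensional check: V = IR implies volts reduce to kg·m²/(A·s³)
voltage = si.kg * si.m**2 / (si.A * si.s**3)
voltage  # displays as a quantity in V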

Current SubjectMatterNotebooks example: https://opencomputinglab.github.io/SubjectMatterNotebooks/maths/worked-equations.html

Fragment: Factory Edjucashun

A not-thought-out-at-all fragment that came to mind reading the first few pages of Taiichi Ohno’s Toyota Production System book last night.

First up, if we have open book assessment, then for efficient students motivated solely by accreditation, everything in the course material that does not directly help get marks in the assessment is waste. Which made me wonder: to get better coverage of the course material, so that huge chunks of it aren’t waste in a particular presentation, should we set different assessments for different students that cover the whole of the curriculum across the student body? As to why we don’t do this, I suspect “quality” (standardisation) is the answer: by giving everyone the same assessment, we can get a spread of marks to find who the good and bad outliers are. And we also get to fiddle the distributions to fix questions, or markers, that didn’t seem to work so well by manipulating the stats…

Secondly, if teaching universities are factories working on the raw material that is a student, what’s the output? A standard product from each course, where each student can perform the same function, albeit within a range of tolerances? Or a material transformation, where the same processing or transformation steps have been applied to materials of varying quality (different students with different interests, skills, resources, ability, etc.)?