Towards One (or More!) Institutionally Hosted Jupyter Services

When we first presented a module to students that used Jupyter notebooks (which is to say, “IPython notebooks” as they were still referred to back in early 2016) we were keen that students should also be able to access an online hosted solution as an alternative. Last year, we provided an on-request hosted service for the data management module, but it was far from ideal, running as it did an environment that differed from the environment we provided to students via a Docker container.

This year, we are offering a more complete hosted service for the same module that makes use of pretty much the same Dockerised environment that the students can run locally (the difference is that the hosted solution also bundles the JupyterHub package and a shim to provide a slightly different start-up sequence).

The module is hosted by the school's specialist IT service using a customised zero2kubernetes deployment route developed by an academic colleague, Mark Hall, as part of an internal scholarly research project exploring the effectiveness of using a hosted environment for a web technology course. (Note: the developer is an academic…)

The student user route is to click on an LTI auth link in the Moodle VLE that takes the student to an “available images” list:

On starting an image, an animated splash screen shows the deployment progress. This can take a couple of minutes, depending on the deployment policy used (e.g. whether a new server needs to be started, or whether the container can be squeezed onto an already running server).

Cost considerations play a part here in determining resource availability, although the data management module runs on local servers rather than commercial cloud servers.

Once the environment has started up and been allocated, a startup sequence checks that everything is in place. The container is mounted against a persistent user filespace and can deploy files stashed inside the container to that mounted volume on first run.
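The copy-on-first-run step described above can be sketched in a few lines of Python. This is purely illustrative: the paths, the marker-file name and the function itself are my assumptions, not the actual deployment's start-up code.

```python
# Sketch of a "seed the persistent volume on first run" step. The container
# stashes teaching files somewhere inside the image (seed_dir) and copies
# them to the mounted persistent volume (volume_dir) only if a marker file
# is absent. All names here are hypothetical.
import shutil
from pathlib import Path


def seed_user_volume(seed_dir: str, volume_dir: str, marker: str = ".seeded") -> bool:
    """Copy stashed files into the mounted volume on first run only.

    Returns True if files were copied, False if the volume was already
    seeded (marker file present), so the student's own edits are left alone.
    """
    volume = Path(volume_dir)
    volume.mkdir(parents=True, exist_ok=True)
    flag = volume / marker
    if flag.exists():
        return False  # subsequent container starts: do nothing
    for item in Path(seed_dir).iterdir():
        target = volume / item.name
        if item.is_dir():
            shutil.copytree(item, target, dirs_exist_ok=True)
        else:
            shutil.copy2(item, target)
    flag.touch()  # record that seeding has happened
    return True
```

One design point worth noting: keying "first run" off a marker file in the *persistent* volume (rather than anything in the container) means the seed step survives container rebuilds without clobbering student work.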

Once again, there may be some delay as any required files are copied over and the actual user environment services are started up:

When the user environment, for example a Jupyter notebook environment, is available, the student is automatically redirected to it.

The Control Panel link takes the student back to the server image selection page.

In terms of user permissions, the containers are intended to run the user as a permission-limited generic user (i.e. not root). This has some downsides, not least in slightly increasing the complexity of the environment to ensure that permissions are appropriately set for that user. Also note that the user is not a unique user (e.g. all users might be user oustudent inside their own container, rather than being their own user as set using their login ID).

The Docker image used for the data management course is an updated version of the image used for October 2020.

A second module I work on is using the same deployment framework but hosted by the academic developer using Azure servers. The image for that environment is an opinionated one constructed using a templated image building tool, mmh352/ou-container-builder. The use of this tool is intended to simplify image creation and management. For example, the same persistent user file volume is used for all launched computational environments, so care needs to be taken that an environment used for one course doesn’t clobber files or environment settings used by another course in the persistent file volume.

One thing that we haven’t yet bundled in our Jupyter containers is a file browser; but if students are mounting persistent files against different images, and do want to be able to easily upload and download files, I wonder if it makes sense to start adding such a service via a Jupyter server proxy wrapper. For example, imjoy-team/imjoy-elfinder [elfinder repo] provides such a proxy off-the-shelf:
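For a service that doesn't ship its own proxy integration, jupyter-server-proxy lets you register an arbitrary web app against the Jupyter server with a few lines of config. A minimal sketch, assuming the command-line flags and port placeholder shown here (the exact flags for the wrapped tool are my assumption, not documented behaviour):

```python
# jupyter_server_config.py — sketch: expose a file manager through
# jupyter-server-proxy. The command and its flags are illustrative.
c.ServerProxy.servers = {
    "filebrowser": {
        # jupyter-server-proxy substitutes a free port for {port}
        "command": ["imjoy-elfinder", "--port", "{port}", "--root-dir", "/home/oustudent"],
        # adds a tile to the JupyterLab launcher
        "launcher_entry": {"title": "File Browser"},
    }
}
```

With this in place the service appears at `/filebrowser/` under the notebook server's own auth, which is the attraction: no separate login or container is needed.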

The above approach, albeit variously hosted and resourced, uses JupyterHub to manage the (containerised) user environments. A second approach has been taken by central IT in their development of a generic Docker container launching service.

I note that you can use JupyterHub to manage arbitrary services running in containers, not just Jupyter notebook server environments, but as far as I can tell, central IT didn’t even evaluate that route.

As with the previous approach, the student user route in is via an LTI authentication link in the browser. The landing page sets up the expectation of a long wait…

Telling a student who may be grabbing a quick 30 minute study session over lunch that they must wait up to 10 minutes for the environment to appear is not ideal… Multiply that wait by 1000 students on a course, maybe two or three times a week for 20+ weeks, and that is a lot of lost study time… But then, as far as IT goes, I guess you cost it as an “externality”…

To add to the pain, different containers are used for different parts of the module, or at least, different containers are used for teaching and assessment. Since a student can only run one container at a time, if you start the wrong container (and wait 5+ minutes for it to start up) and then have to shut it down to start the container you meant to start, I imagine this could be very frustrating…

As well as the different Jupyter environment containers, a file manager container is also provided (I think this can be run at the same time as one of the Jupyter container images). Rather than providing a container image selection UI on an integrated image launcher page, separate image launching links are provided (and need to be maintained) in the VLE:

The file browser service is a containerised version of TinyFileManager [repo]:

In the (unbranded) Jupyter containers, the environment is password protected using a common Jupyter token:
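For reference, a fixed shared token is a one-line notebook server config setting. A sketch, with a placeholder token value (I don't know how the IT-built image actually sets it):

```python
# jupyter_notebook_config.py — sketch: a module-wide fixed token rather
# than per-user credentials. The value shown is a placeholder.
c.NotebookApp.token = "module-wide-shared-token"
```

The trade-off is that the token identifies the environment, not the student: per-container isolation is doing all the access-control work.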

The (unbranded) environment is wrapped in an IFrame by default:

However, if you click on the Jupyter link you can get a link to the notebook homepage without the IFrame wrapper:

In passing, I note that in this IT maintained image, the JupyterLab UI is not installed, which means students are required to use the notebook UI.

It’s also worth noting that these containers are running on GPU enabled servers. Students are not provided with a means of running environments locally because some of the activities require a GPU if they are to complete in a timely fashion.

In passing, I note that another new module that started this year that also uses Jupyter notebooks does not offer a hosted solution but instead instructs students to download and install Anaconda and run a Jupyter server that way. This module has very simple Python requirements and I suspect that most, if not all, the code related activities could be executed using a JupyterLite/Pyodide kernel run via WASM in the browser. In many cases, the presentation format of the materials (a code text book, essentially) is also suggestive of a Jupyter Book+Thebe code execution model to me, although the ability for students to save any edited code to browser storage, for example, would probably be required [related issue].

One final comment to make about the hosted solutions is that the way they are accessed via an LTI authenticated link using institutional credentials prevents a simple connection to the Jupyter server as a remote kernel provider. For example, it is not possible to trivially launch a remote kernel on the hosted server from a locally running VS Code environment. In the case of the GPU servers, this would be really useful because it would allow students to run local servers most of the time and then only access a GPU powered server when required for a particular activity.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
