Fragment: Bring Your Own Infrastructure (BYOI)

Over the years I posted various fragmentary thoughts on delivering software to students in BYOD (bring your own device) environments (eg Distributing Software to Students in a BYOD Environment from 5 years ago,  BYOA (Bring Your Own Application) – Running Containerised Applications on the Desktop from 3 years ago, or Rethinking: Distance Education === Bring Your Own Device? yesterday).

Adding a couple more pieces to the jigsaw, today I notice this Coding Environment Landing Page at the University of Colorado:

The environment appears to be a JupyterHub environment bundled with VSCode inside using the jupyter_codeserver_proxy extension and the draw.io picture editor bundled as a JupyterLab extension.

Advice is also given on running arbitrary, proxied web apps within a user session using Jupyter server proxy (Proxying Web Applications). This is a great example of one of the points of contention I have with Jim Groom, “Domain of Your Own” evangelist, and that I’ve tried to articulate over the years (not necessarily very successfully), several times previously (eg in Cloudron – Self-Hosted Docker / Containerised Apps (But Still Not a Personal Application Server?) or Publish Static Websites, Docker Containers or Node.js Apps Just by Typing: now): in particular, the desire to (create,) launch and run applications on a temporary per session basis (during a study session, for the specific purposes of launching and reading an interactive paper in a “serverless” way, etc).

The Colorado example is a really nice example of a simple multi-user environment that can be used to support student computing with an intelligent selection of tools bundled inside the container. (I’m guessing increasing numbers of universities offer similar services. Anyone got additional examples?)

Another jigsaw piece comes in the form of eduID, a federated Swedish identity service that students can use to sign in to their university services, whichever university they attend. One advantage of this is that you can create an identity when you start a university application process, and retain that identity throughout an HE career, even if you switch institution (for example, attending one as an undergrad, another as a postgrad). The eduID can also be linked to an ORCID iD, an international identifier scheme used to identify academic researchers.

What eduID does, then, is provide you with an identity that can be registered with an HE provider and used to access that HEI’s services. Your identity is granted, and grants you, access to their services.

So. Domain of Your Own. Hmmm… (I’ve been here before…) Distance education students, and even students in traditional universities, often study on a “bring your own device” basis. But what if that was an “Infrastructure of Your Own” basis? What would that look like?

I can imagine infrastructure being provided in various ways. For example:

  1. identity: a bring-your-own-identity service such as eduID;

  2. storage: I give the institution access to my Dropbox account or Google Drive account or Microsoft Live Onebox, or something like a personal SparkleShare or Nextcloud server; when I load a personal context on an institutional service, if there is a personal user file area linked to it, it synchs to my remote linked storage;

  3. compute: if I need to install and run software as part of my course, I might normally be expected to install it on my own computer. But what if my computer is a spun-up-on-demand server in the cloud?

(It may also be worth trying to compare those to the levels I sketched out in a fragment from a year ago in Some Rambling Thoughts on Computing Environments in Education.)

I’m absolutely convinced that all the pieces are out there to support a simple web UI that would let me log in to and launch temporary services on on-demand servers (remote or local) and link and persist files I was working on using those services to a personal storage server somewhere. And that all it would take is some UI string’n’glue to pull them together.

PS Security wise, something like Tailscale (via @simonw) looks interesting for trying to establish personal private networks between personally hosted services.

PPS anyone else remember PLEs (personal learning environments) and distributed, DIY networked service oriented architecture ideas that were floating around a decade or so ago?

On Stats, and “The Thing”…

Although I work from home pretty much all of the time anyway, I’m rubbish in online meetings and tend to get distracted, often by going on a quick web trawl for things related to whatever happens to be in mind just as the meeting starts…

So for example, yesterday, I started off wondering about mortality stats relating to “the thing”. Public Health England publish a dashboard with some numbers and charts on it but it’s really hard to know what to make of the numbers. You can also get COVID-19 Daily Deaths from NHS England reported against the following constraints:

All deaths are recorded against the date of death rather than the date the deaths were announced. Interpretation of the figures should take into account the fact that totals by date of death, particularly for most recent days, are likely to be updated in future releases. For example as deaths are confirmed as testing positive for COVID-19, as more post-mortem tests are processed and data from them are validated. Any changes are made clear in the daily files.

These figures will be updated at 2pm each day and include confirmed cases reported at 5pm the previous day. Confirmation of COVID-19 diagnosis, death notification and reporting in central figures can take up to several days and the hospitals providing the data are under significant operational pressure. This means that the totals reported at 5pm on each day may not include all deaths that occurred on that day or on recent prior days.

These figures do not include deaths outside hospital, such as those in care homes. This approach makes it possible to compile deaths data on a daily basis using up to date figures.

If the conditions for adding counts to the tally for covid deaths are people who test positive post mortem, then the asymptomatic boy racer who kills themself in a motorbike accident whilst making use of open roads will count, but that’s not really the sort of number we’re interested in.

The ONS (Office for National Statistics) look like they’re trying to capture the number of deaths where “the thing” is the likely cause of death (Counting deaths involving the coronavirus (COVID-19)), adding a Covid19 tab to the weekly provisional mortality stats release along with a related bulletin and covid19 deaths breakout collection:

I don’t think these stats are even labelled as “experimental”, so from my humble position I think the ONS should be commended on the way they’ve managed to pull this new release together so quickly, albeit with a lag in the numbers that results from due process around death certificate registration etc.

One thing that is perhaps unfortunate is that the NHS weekly winter sitreps stopped a few weeks ago; these stats track several hospital and critical care related measures at the hospital level, but they’re only released a few months a year. Whilst continuing the release would have added a burden, I think a lot of planners may have found them useful. (I hope the planners who really need them have access to them anyway.) By the by, some daily data collections relating to managing “the thing” were described in a letter from the PHE Incident Director at Public Health England’s National Infection Service on March 11th.

I also note the suspension of various primary care and secondary care data collections, which means that spotting various side effects of the emergency response activity may not be obvious for some time.

As far as the ONS go, they have published their own statement on ensuring the best possible information during COVID-19 through safe data collection as well as a special ONS — Coronavirus (COVID-19) landing page for all related datasets they are able to release.

[April 2nd, 2020: the ONS have also started publishing several more “faster” society and economic indicators.]

Pondering the ONS response, I started wondering whether there’s a history anywhere of the genesis and evolution of each ONS statistical measure. In my “listen to the online meeting as radio while doing an unrelated web trawl”, I turned up a few potential starting points relating to the history of official stats, including this presented paper on the evolution of the United Kingdom statistical system, which looks like it might have appeared in published form in this (special issue?) of the Statistical Journal of the IAOS – Volume 24, issue 1,2.

The Royal Statistical Society’s StatsLife online magazine also has a History of Statistics Section tag which pulls up more possibly useful starting points…

Broadening my search a little, I also found this briefing on sources of historical statistics from one of my favourite sources ever, the House of Commons Library; in turn this led to a more general web search on "sources of statistics" site:commonslibrary.parliament.uk/research-briefings which turns up a wealth of briefings on sourcing UK stats in particular subject areas.

And finally, for anyone out there who does have proper skills in the area and the ability to commit resource, there are various initiatives out there looking for volunteers. In particular:

Others with less specific or specialist skill might consider many of the other opportunities for technical sprints in communities that might typically lack access to cognitive surplus in developer communities. For example, this initiative exploring and developing Digital Tools for churches during the Coronavirus. (Don’t let “churches” put you off: for “church” read “body of people” or “community” and go from there…)

Rethinking: Distance Education === Bring Your Own Device?

In passing, an observation…

Many OU modules require students to provide their own computer subject to a university wide “minimum computer specification” policy. This policy is a cross-platform one (students can run Windows, Mac or Linux machines) but does allow students to run quite old versions of operating systems. Because some courses require students to install desktop software applications, this also means that tablets and netbooks (eg Chromebooks) do not pass muster.

On the module I work primarily on, we supply students with a virtual machine preconfigured to meet the needs of the course. The virtual machine runs on a cross-platform application (Virtualbox) and will run on a min spec machine, although there is a hefty disk space requirement: 15GB of free space required to install and run the VM (plus another 15-20GB you should always have free anyway if you want your computer to keep running properly, be able to install updates etc.)

Part of the disk overhead comes from another application we require students to use called vagrant. This is a “provisioner” application that manages the operation of the VirtualBox virtual machine from a script we provide to students. The vagrant application caches the raw image of the VM we distribute so that fresh new instances of it can be created. (This means students can throw away the working copy of their VM and create a fresh one if they break things; trust me, in distance edu, this is often the best fix.)

One of the reasons why we (that is, I…) took the vagrant route for managing the VM was that it provided a route to ship VM updates to students, if required: just provide them with a new Vagrantfile (a simple text file) that is used to manage the VM and add in an update routine to it. (In four years of running the course, we haven’t actually done this…)

Another reason for using Vagrant was that it provides an abstraction layer between starting and stopping the virtual machine (via a simple commandline command such as vagrant up, or desktop shortcut that runs a similar command) and the virtual machine application that runs the virtual machine. In our case, vagrant instructs Virtualbox running on the student’s own computer, but we can also create Vagrantfiles that allow students to launch the VM on a remote host if they have credentials (and credit…) for that remote host. For example, the VM could be run on Amazon Web Services/AWS, Microsoft Azure, Google Cloud, Linode, or Digital Ocean. Or on an OU host, if we had one.
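As a minimal sketch of that abstraction layer, the following Python helper builds the same `vagrant up` command for different backends by switching the `--provider` flag. The helper names are hypothetical, and the non-VirtualBox provider names (for example `aws` or `digital_ocean`) are assumptions: they come from third-party Vagrant plugins and only work if the plugin is installed and the Vagrantfile carries the matching credentials.

```python
import subprocess


def vagrant_up_command(provider="virtualbox"):
    """Build the `vagrant up` command line for a given provider.

    "virtualbox" runs the VM on the student's own computer; other
    provider names are assumed to be supplied by Vagrant plugins.
    """
    return ["vagrant", "up", f"--provider={provider}"]


def launch_vm(provider="virtualbox", dry_run=True):
    """Return the command as a string, running it unless dry_run is set."""
    cmd = vagrant_up_command(provider)
    if not dry_run:
        # Must be run from the directory that contains the Vagrantfile.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    print(launch_vm("virtualbox"))  # local machine
    print(launch_vm("aws"))         # remote host (assumed plugin provider)
```

From the student's point of view the command stays the same; only the provider name, and the credentials behind it, change.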

For the next presentation of the module, I am looking to move away from the Virtualbox VM and move the VM into a Docker container†. Docker offers an abstraction layer in much the same way that vagrant does, but using a different virtualisation model. Specifically, a simple Docker command can be used to launch a Dockerised VM on a student’s own computer, or on a remote host (AWS, Azure, Google Cloud, Digital Ocean, etc.)
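By way of a hedged sketch, the same `docker run` command can be pointed at a local or a remote Docker daemon by setting the `DOCKER_HOST` environment variable. The image name, the container port (8888) and the `/home/jovyan/work` mount point are assumptions for illustration (the last follows the Jupyter Docker Stacks convention), not the course's actual settings.

```python
import os


def docker_run_command(image, port=8888, workdir=None):
    """Build a `docker run` command that publishes a notebook server port."""
    cmd = ["docker", "run", "--rm", "-p", f"{port}:8888"]
    if workdir:
        # With a remote daemon this path is resolved on the remote host,
        # not on the student's own machine.
        cmd += ["-v", f"{workdir}:/home/jovyan/work"]
    return cmd + [image]


def docker_env(remote_host=None):
    """Copy the current environment, optionally targeting a remote daemon.

    DOCKER_HOST with an ssh:// URL tells the docker CLI to talk to the
    Docker daemon on that host instead of the local one.
    """
    env = dict(os.environ)
    if remote_host:
        env["DOCKER_HOST"] = f"ssh://{remote_host}"
    return env


if __name__ == "__main__":
    # Hypothetical image name and host address.
    print(" ".join(docker_run_command("example/course-image")))
    print(docker_env("student@203.0.113.5")["DOCKER_HOST"])
```

Passing the returned environment to `subprocess.run(cmd, env=...)` would then launch the container either locally or on the remote host with no change to the command itself.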

We could use separate linked Docker containers for each service used in the course — Jupyter notebooks, PostgreSQL, MongoDB, OpenRefine — or we could use a monolithic container that includes all the services. There are advantages and disadvantages to each that I really do need to set down on paper/in a blog post at some point…
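To make the two options a little more concrete, here is a rough sketch that writes each layout as a Docker Compose style dictionary. The `postgres`, `mongo` and `jupyter/base-notebook` image names refer to public images, but the OpenRefine and monolithic image names, the password and the port choices are all placeholders I've made up for illustration.

```python
import json


def linked_services():
    """One container per service, linked on a shared Compose network."""
    return {
        "services": {
            "notebook": {
                "image": "jupyter/base-notebook",
                "ports": ["8888:8888"],
                "depends_on": ["postgres", "mongo"],
            },
            "postgres": {
                "image": "postgres",
                # Placeholder password, for illustration only.
                "environment": {"POSTGRES_PASSWORD": "example"},
            },
            "mongo": {"image": "mongo"},
            # Hypothetical image name; 3333 is OpenRefine's default port.
            "openrefine": {"image": "example/openrefine", "ports": ["3333:3333"]},
        }
    }


def monolithic_service():
    """A single container (hypothetical image) running every service."""
    return {
        "services": {
            "course": {
                "image": "example/course-monolith",
                "ports": ["8888:8888", "3333:3333"],
            }
        }
    }


def as_compose_text(config):
    """Serialise as JSON, which is also valid YAML."""
    return json.dumps(config, indent=2)
```

The linked layout lets each service be updated or swapped independently, at the cost of more moving parts for the student; the monolithic layout is a single thing to download and run, but has to be rebuilt whenever any one service changes.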

So how does this help in distance education?

I’ve already mentioned that we require students to provide a certain minimum specification computer, but for some courses, this hampers the activities we can engage students in. For example, in our databases course, giving students access to a large database running on their own computer may not be possible; for an upcoming machine learning course, access to a GPU is highly desirable for anything other than really simple training examples; in an updated introductory robotics module, using realistic 3D robot simulators for even simple demos requires access to a gamer level (GPU supported) computer.

In a traditional university, students who can’t run the required software on their own machines may have physical access to computer labs with pre-installed, university licensed software packages on machines capable of running them.

In my (distance learning) institution, access to university hosted software is not the norm: students are expected to provide their own computer hardware (at least to minimum spec level) and install the software on it themselves (albeit software we provide, and software that we often build installers for, at least for users of Windows machines).

What we don’t do, however, is either train students in how to provision their own remote servers, or provide software to them that can easily be provisioned on remote servers. (I noted above that our vagrant manager could be used to deploy VMs to remote servers, and I did produce demo Vagrantfiles to support this, but it went no further than that.)

This has made me realise that we make OUr distance learning students pretty much wholly responsible for meeting any computational needs we require of them, whilst at the same time not helping them develop skills that allow them to avail themselves of self-service, affordable, metered remote computation-on-tap (albeit with the constraint of requiring a network connection to access the remote service).

So what I’m thinking now is that now really is the time to start upskilling OUr distance learners, at least in disciplines that are computationally related, early on and in the following ways:

  1. a nice to have — provide some academic background: teach students about what virtualisation is;

  2. an essential skill, but with a really low floor — technical skills training: show students how to launch virtual servers of their own.

We should also make software available that is packaged in a way that the same environment can be run locally or remotely.

Another nice to have might be helping students reason about personal economic consequences, such as the affordability of different approaches in their local situation, which is to say: buying a computer and running things locally vs. buying something that can run a browser and run things remotely over a network connection.
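The sort of reasoning I have in mind could be as simple as the following back-of-the-envelope sketch. The function name and all the prices in the usage example are hypothetical, picked purely to show the arithmetic.

```python
def breakeven_hours(capable_computer_cost, basic_device_cost, hourly_rate):
    """Hours of metered remote compute the price difference would buy.

    capable_computer_cost: price of a machine that can run everything locally
    basic_device_cost: price of a device that only needs to run a browser
    hourly_rate: metered cost per hour of a suitable remote server
    """
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return (capable_computer_cost - basic_device_cost) / hourly_rate


if __name__ == "__main__":
    # Hypothetical prices: 1200 for a gamer spec desktop, 250 for a
    # Chromebook, 0.50 per hour for a remote GPU server.
    print(breakeven_hours(1200, 250, 0.50))  # 1900.0
```

With those made-up numbers, a student would need more than 1900 hours of remote GPU time before the more expensive computer paid for itself, ignoring network costs and anything else the better machine might be used for.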

As much as anything, this is about real platform independence, being open as to, and agnostic of, what physical compute device a student has available at home (whether it’s a gamer spec desktop computer or a bottom of the range Chromebook) and providing them with both software packages that really can run anywhere and the tools and skills to help students run them anywhere.

In many respects, using abstraction layer provisioning tools like vagrant and Docker, the skills to run software remotely are the same as running them locally, with the additional overhead that students have a once only requirement to sign up to a remote host and set up credentials that allow them to access the remote service from the provisioner service that runs on their local machine.

Simple 2D ev3devsim Javascript Simulator Running as an ipywidget in Jupyter Notebooks

So…

…for a course revision upcoming, I’ve been tweaking a thing.

The thing is ev3devsim [repo], a Javascript powered 2D robot simulator that allows you to execute Python code, via Skulpt, in the browser to control a simple simulated robot.

The Python package used to control the robot is a skulpt port of ev3dev-lang-python, a Python wrapper for the ev3dev Linux distribution for Lego Ev3 robots. (Long time readers may recall I explored ev3dev for use in an OU residential school way back when and posted a few related notebooks.)

Anyway… we want to use Python in the module revision, the legacy activities we want to update look similar, ish, sort of, almost, we may be able to use some of them, and I want to do the activities via Jupyter notebooks.

So I’ve had a poke around and think I’ve managed to make the fumblings of a start around an ipywidget wrapper for the simulator that will allow us to embed it in a notebook.

Because I don’t understand ipywidgets at all, I’m using the jp_proxy_widget, which I first played with in the context of wrapping wavesurfer.js (Rapid ipywidgets Prototyping Using Third Party Javascript Packages in Jupyter Notebooks With jp_proxy_widget).

Here’s where I’m at [nbev3devsim; Binder demo available, if the code I checked in works!]:

The first thing to notice is that the terminal has gone. The idea is that you write the code in a code cell and inject it into the simulator. My model for doing this is via cell block magic or by passing code in a variable into the simulator (for generality, I should probably also allow a link to a .py file).
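As a minimal sketch of the cell magic idea (not the actual nbev3devsim code), the following registers a cell magic that hands the cell body to a simulator object. `StubSimulator`, `set_program` and the `sim_magic` name are hypothetical stand-ins; the only real API used is IPython's `register_magic_function`.

```python
class StubSimulator:
    """Hypothetical stand-in for the simulator widget."""

    def __init__(self):
        self.program = None

    def set_program(self, code):
        # A real widget would pass the code through to the Javascript side.
        self.program = code


def make_sim_magic(sim):
    """Return a cell magic function bound to a particular simulator."""

    def sim_magic(line, cell):
        sim.set_program(cell)
        return "code posted to simulator"

    return sim_magic


def register(sim, name="sim_magic"):
    """Register the magic if running inside IPython; report success."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    ip = get_ipython()
    if ip is None:
        return False
    ip.register_magic_function(
        make_sim_magic(sim), magic_kind="cell", magic_name=name
    )
    return True
```

In a notebook, after calling `register(sim)`, a cell starting with `%%sim_magic` would have the rest of its contents assigned to the simulator rather than run in the notebook kernel.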

The cell block magic still needs some work, I think (eg a temporary alert with a coloured background to say “code posted to simulator” that disappears on its own after a couple of seconds). I probably also need an easy way to preview the code currently assigned to the simulated robot.

You might also notice a chart display. This is actually a plotly streaming line chart that updates with sensor values (at the moment, just the ultrasound sensor; other sensors have different ranges, so do I scale those, or use different charts perhaps?)
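If I went down the scaling route, one option would be to map each sensor onto a common 0 to 1 range so they can share a chart. In this sketch the sensor names and ranges (0 to 255 for the ultrasonic sensor, 0 to 100 for reflected light) are assumptions for illustration, not values taken from the simulator.

```python
# Assumed (min, max) range for each sensor type; illustrative only.
SENSOR_RANGES = {
    "ultrasonic": (0.0, 255.0),
    "reflected_light": (0.0, 100.0),
}


def scale_reading(sensor, value):
    """Clamp a raw reading to the sensor's range and scale it to 0..1."""
    lo, hi = SENSOR_RANGES[sensor]
    clamped = min(max(value, lo), hi)
    return (clamped - lo) / (hi - lo)


if __name__ == "__main__":
    print(scale_reading("reflected_light", 50))  # 0.5
    print(scale_reading("ultrasonic", 255))      # 1.0
```

The downside is that the chart's y-axis then loses its units, which is an argument for separate charts instead.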

There is also an output window your code can print messages to, as the following hello-world magic shows:

We can read state out of the simulator, though the way the callbacks work this seems to require running code across two cells to get the result into the Python environment?

I’ve also experimented with another approach where the widget’s parent object grabs (or could be regularly updated to mirror, maybe?) logged sensor readings from inside the simulator, which means I can interrogate that object, even as the simulator runs. (I also started to explore using a streaming dataframe for this data, but I’m not convinced that’s the best approach; certainly trying to stream logged data from the simulator into a streaming chart in the notebook context is laggy compared to the chart embedded in the simulator context.)

With the data in the Python context, we can do what we like with it, like chart it etc.

There’s a lot of tweaks that need to be made and things to be added to run the full complement of activities we ran in the original presentation of the course.

I’d already started to explore what’s required to add Python functions to skulpt (eg Simple Text to Speech With Skulpt), although I’m not sure if that’s blocking (could it be handled asynchronously if so?) and today managed to learn enough from this SO answer on making objects draggable to make the robot draggable on the canvas (I think; a PR is here but not tested in a fresh/isolated environment yet (I only made a PR to give me s/thing to link to here!); the biggest issue I had was converting mouse co-ordinates to robot world canvas co-ordinates. There are still issues there, eg in getting the size of the robot correctly, but the co-ordinate management in the simulator looks a bit involved to me and I want to get my head round it enough that if I do start trying to simplify things, I don’t break other things!)
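For reference, the general shape of the mouse to canvas conversion is sketched below. It is written in Python for consistency with the notebook code; in the browser the equivalent would be Javascript using the canvas element's bounding rectangle. It does not attempt to reproduce the simulator's own co-ordinate handling, and the function names are my own.

```python
def mouse_to_canvas(client_x, client_y, rect_left, rect_top,
                    rect_width, rect_height, canvas_width, canvas_height):
    """Convert page mouse co-ordinates to canvas pixel co-ordinates.

    rect_* describe where the canvas is displayed on the page (its
    bounding rectangle); canvas_* are the canvas's internal pixel
    dimensions, which differ if the canvas is scaled by CSS.
    """
    scale_x = canvas_width / rect_width
    scale_y = canvas_height / rect_height
    return ((client_x - rect_left) * scale_x,
            (client_y - rect_top) * scale_y)


def hit_robot(point, robot_centre, robot_radius):
    """True if the point lies within a circle approximating the robot."""
    dx = point[0] - robot_centre[0]
    dy = point[1] - robot_centre[1]
    return dx * dx + dy * dy <= robot_radius * robot_radius


if __name__ == "__main__":
    # An 800x400 canvas displayed at 400x200, offset by (10, 10) on the page.
    print(mouse_to_canvas(110, 60, 10, 10, 400, 200, 800, 400))  # (200.0, 100.0)
```

Treating the robot as a circle for hit testing sidesteps the need to know its exact size and orientation, at the cost of a slightly generous drag target.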

Other things that really need adding:

  • ability to reset canvas in one go;
  • ability to rotate robot using mouse;
  • ability to add noise to motors and sensors;
  • configure robot from notebook code cell rather than simulator UI? (This could also be seen as an issue about whether to strip as much out of the widget as possible.)
  • predefine sensible robot configurations; (can we also have a single, centreline front mounted light sensor?)
  • add pen-up / pen-down support (perhaps have a drawing layer in the simulator for this?)
  • explore supporting multiple simulators embedded in one notebook (currently it’s at most one, I suspect in large part because of specific id values assigned to DOM elements?)
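On the noise item in that list, one simple approach (a sketch under my own assumptions rather than anything in the simulator) is to add Gaussian noise to each value and clamp the result to the valid range. The function name and parameters are hypothetical.

```python
import random


def noisy(value, sigma, lo, hi, rng=None):
    """Return value with Gaussian noise of std dev sigma, clamped to [lo, hi].

    rng can be a seeded random.Random instance for repeatable runs;
    by default the shared random module generator is used.
    """
    rng = rng or random
    if sigma > 0:
        value = rng.gauss(value, sigma)
    return min(max(value, lo), hi)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run can be repeated
    # A light sensor reading of 50 on a 0..100 scale with some jitter.
    print([round(noisy(50, 5, 0, 100, rng), 1) for _ in range(5)])
```

Being able to seed the generator matters for teaching: students can rerun an activity and see the same "noisy" behaviour each time.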

The layout is also really clunky, the main issue being how to see the code against the simulator (if you need to). Two columns might be better (notebook text and code cells in one, the simulator windows stacked in the other?) but then again, a wide simulator window is really useful. A floating / draggable simulator window might be another option? I did think the simulator window might be tear-offable in JupyterLab, but I have never managed to successfully tear off any jp_proxy_widget in JupyterLab (my experiences using JupyterLab for anything are generally really miserable ones).

The original module simulator allowed you to step through the code, but: a) I don’t know if that would be possible; b) I suspect my coding knowledge / skills aren’t up to it; c) I really should be trying to write the activities, not sinking yet more time into the simulator. (One thing I do need to do is see if any of the code I wrote years ago when scoping things for the residential school is reusable, which could save some time…)

I also need to see if the simulator is actually any good for the practical activities we used in the original version of the course, or whether I need to write a whole new set of activities that do work in this simulator… Erm…

Notes in Advance of a Meeting About The Possibility of Getting an Institutional Jupyter Server Up and Running

Notes on Jupyter Deployments in an OU Context.

Notes made in advance of an internal workshop to discuss supporting “Jupyter notebooks” in the OU.

My intention is to split the content over several documents.

Needless to say, I think “notebooks”: a) are not really the point; b) offer way more potential for doing all sorts of things than folk might think.

pandoc -o output.docx -f markdown -t docx filename.md

The content is split into three main sections:

  • Current Jupyter Deployments in the OU
  • Architectural Models
  • Use Cases for Jupyter Services

Current Jupyter Deployments in the OU

Jupyter notebooks are already used in several OU courses. The following summarises what I’ve managed to learn, or imagine may be the case. Folk involved with the respective modules may well disagree.

TM351

TM351 is a 30 point, third level module on data management and analysis. Approximately 50% of study is spent working on practical activities delivered via Jupyter notebooks.

Essential requirements:

  • notebook server that allows students to complete notebook based activities in a Python environment;
  • ability to create and run new Python backed notebooks;
  • pre-configured Python computational environment with preinstalled packages (Python and Linux package dependencies);
  • access to PostgreSQL server with permissions to add and delete users, roles, databases, tables; read and write to tables from notebooks;
  • access to MongoDB server with read permissions on a seeded database; permissions to create / read / write / delete databases and collections from notebooks;
  • access to OpenRefine application;
  • access to public internet addresses in order to download data files from arbitrary URLs;
  • ability to persist notebooks;
  • ability to access all service GUIs through a web browser;
  • ability to work on a student’s own computer in an offline mode;
  • ability to work cross-platform (Windows, Mac, Linux).

Desirable Requirements:

  • ability to backup and restore databases;
  • ability to take away the complete computing environment so it can be run outside the OU;
  • ability to access the environment on a remote host (eg an OU hosted solution); (this solution would in turn create requirements based on scaleability, affordability, peak load, resource (processor, memory, storage, bandwidth), uptime etc.)
  • ability to access the environment from a terminal / command line;
  • side effect free on user desktop (i.e. the environment should not clash with any services already running on the student’s computer; the environment should not require changes to the student’s computer; any applications installed should be capable of being uninstalled cleanly).

Optional Requirements:

  • headless operation (access to desktop applications inside the provided environment is not required);

Solution (16B, 16J-19J):

A VirtualBox virtual machine (VM) managed using vagrant provides a self-contained, preconfigured environment running all required applications. User files are mounted into the VM from the user desktop and saved out of the VM back onto the desktop.

Proposed Solution (20J+):

A Docker container (rather than VirtualBox VM) built using repo2docker, runnable via Docker Desktop / ContainDS on a student computer or on a remote host either as a standalone service or via JupyterHub.

Requirements not met:

  • side effects (VirtualBox and vagrant must both be installed; on Windows, virtualisation may need enabling, HyperV needs disabling. A fix to the latter would be to ship a VM built natively for HyperV);
  • hosted solution: an OU hosted solution is not available. DIY solutions for students to self-host on Azure, AWS, Digital Ocean are provided.

TM112

TM112 is a 30 point, first level module that in part provides an introduction to Python programming. Whilst the Python environment used for most of the activities is a simple, user-installed Python environment without embellishment, an optional “notebook experience” activity is also provided to deliver enrichment activities.

Essential requirements:

  • ability to run provided Jupyter notebooks within a single study session (no requirement to persist changed notebooks);
  • install-free / hosted solution;
  • scaleable (capable of coping with peak demand: 1k concurrent users)
  • affordable (<£X per student presentation, <£Y per student per activity);
  • available (24hr, instant / on-demand access over activity period);

Desirable Requirements:

  • enforced spend limit for autoscaled delivery;
  • authenticated access (without additional sign-on requirements) from inside Moodle VLE.

Solution (18J, 19B, 19J):

JupyterHub+Kubernetes running prebuilt Docker container on Microsoft Azure. Disposable single user notebook environments launched on demand from a preconfigured, LTI authenticating link on a module webpage in Moodle VLE. Docker container also runs via MyBinder or on local machine with Docker installed.
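For anyone unfamiliar with how such a deployment is wired together, here is a hedged sketch of the relevant parts of a `jupyterhub_config.py`. It is not the configuration actually used: the image name, consumer key and secret, and resource limits are placeholders, and the authenticator class path is an assumption that depends on the installed version of the LTI authenticator package.

```python
from types import SimpleNamespace

try:
    # JupyterHub injects get_config() when it loads this file as config.
    c = get_config()  # noqa: F821
except NameError:
    # Stand-in so the sketch can also be run as a plain script.
    c = SimpleNamespace(
        JupyterHub=SimpleNamespace(),
        KubeSpawner=SimpleNamespace(),
        LTIAuthenticator=SimpleNamespace(),
    )

# Authenticate users arriving via an LTI launch link in the VLE.
# Assumed class path; newer releases of the package use a different one.
c.JupyterHub.authenticator_class = "ltiauthenticator.LTIAuthenticator"
c.LTIAuthenticator.consumers = {"example-key": "example-secret"}  # placeholders

# Launch each user's single-user server as a Kubernetes pod.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"
c.KubeSpawner.image = "example/tm112-notebook:latest"  # hypothetical image

# Cap per-user resources to keep autoscaled costs predictable.
c.KubeSpawner.cpu_limit = 1
c.KubeSpawner.mem_limit = "1G"
```

Leaving out any persistent storage settings is what makes the environments disposable: each launch starts from the prebuilt image and nothing is carried over between sessions.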

S818

S818 is a masters level module on space science. Students are required to use the Python programming language to complete a small number of programming activities in supplied notebooks.

TMA01 requires students to use simple notebook computations, and TMA02 requires students to complete calculations that might sensibly be computed in a notebook context.

Essential requirements:

  • ability to run and edit provided notebooks in a Python / pandas environment;
  • ability to create and run new notebooks;
  • ability to persist created / edited notebooks.

Desirable Requirements:

  • None?

Solution (??, 18B, 19B, 20B):

Students are referred to the OpenLearn Learn to Code for Data Analysis course which recommends installing an Anaconda scientific Python environment locally. This environment includes a local notebook server; the scipy stack, including pandas, is available as part of the default Anaconda environment.

Planned Deployments

Jupyter notebooks are planned for use in several modules currently in production, including:

M269 (new edition) Algorithms

Essential requirements:

  • ability to upload data files and notebooks and run them, like the current TM351 server;
  • NetworkX installed.

Desirable requirements:

  • direct link from the VLE, not a separate login;
  • ability to create an image before the start of the module that already contains all notebooks and TMAs;
  • direct submission, marking and return of TMAs via nbgrader;
  • JupyterLab;
  • remember open notebooks and files from the last session.

Intended solution: online hosted notebook environment; Anaconda for local use.

TM358 Machine Learning

Essential requirements:

  • ability to run GPU powered kernels;
  • pre-configured Python computational environment with preinstalled packages (Python and Linux package dependencies);
  • access to public internet addresses in order to download data files from arbitrary URLs;
  • ability to persist notebooks;
  • ability to access all service GUIs through a web browser;
  • ability to work cross-platform (Windows, Mac, Linux).
  • access to data storage for large datasets (tens of MB up to perhaps a few GB) (mostly read-only, but also need to save and load trained models).

Desirable requirements:

M348 Linear Models

Essential requirements:

  • ability to run and edit provided notebooks;
  • ability to execute code in notebook code cells in a preconfigured R environment;

Architectural Models

When delivering computational environments to students, particularly ones that expose services to students, we might characterise several architectural models:

  • student / user-focused / standalone environments (1 to N):
      • a single base environment (1) is downloaded by N students; this provides all tools and services required by the student;
      • example: TM351VM: contains Jupyter server, PostgreSQL server, OpenRefine server and MongoDB server; requires VirtualBox to run the VM, and vagrant to manage its deployment.

  • institutional / centralised environments (S to N):
      • centralised multi-user services;
      • S denotes multiple central services provided into the student environment (for example, a shared database, a shared JupyterHub).

The 1-N approach means that students can take away their computing environment and work with it offline. It also means that we can’t track activity inside the student environment unless we enable logging and log data collection inside the environment along with some sort of data log return mechanism.

The S-N approach means that students require online access and cannot take away their computing environment. It also means that we can log any transaction that goes through a server.

Note that a localised, temporary, site deployment model may be possible in the S-N approach: for example, from a standalone physical server at a day school, or on a local network in a prison (though prison IT might forbid such an architecture; it would be interesting to know what policies govern how we make software available to students in prisons).

A Note on Installing Software on Student Computers

In the first case, we should note that the OU supports:

  • platform independence (Windows, Mac, Linux);
  • low minimum specification machine (old operating system, minimal memory, basic CPU, no GPU);

When deploying environments to student machines, we should aim to isolate or encapsulate the provided environment from the student’s own environment so that it does not interfere with any applications or services they are already running, and in a way that it can be easily and comprehensively removed from their computer at the end of their studies.

Jupyter Architectural Components

The Jupyter project oversees several components that can be used as part of an integrated notebook hosting service:

  • single user Jupyter server (aka Jupyter notebook server or simple Jupyter server) [PRODUCTION STABLE]: serves notebook and JupyterLab UIs via a browser to a single user, with password or token authentication if required;
  • multi-user JupyterHub server [PRODUCTION STABLE]: provides authenticated access for multiple users to single-user Jupyter servers. Plugins exist to support a wide range of authentication types. Persistent user accounts supported. Single user environments can be created using various “spawner” types, for example: Docker, Kubernetes.
  • Jupyter Enterprise Gateway [PRODUCTION STABLE]: single user Jupyter servers connect a user facing notebook or JupyterLab UI with a backend Jupyter kernel that contains the runtime object environment within which code in a particular notebook is executed. The Jupyter Enterprise Gateway launches kernels at the request of a single user server using Kubernetes; the single user server then manages communications between the UI and the Jupyter Enterprise Gateway managed kernel.
  • BinderHub: launches “temporary” single person notebook servers based on environment definitions contained in a public repository (Github, Zenodo DOI indicated repositories, etc).

Arbitrary web applications (that is, applications that present an HTML over HTTP user interface) can also be accessed in a Jupyter environment in two ways: first, proxied via a single user Jupyter notebook server; second, via a recent community contribution (jhsingle-native-proxy), by being wrapped with a proxy service that can communicate with a JupyterHub or BinderHub server in a similar way to a single user Jupyter notebook server, but without the need to run a notebook server.

Single User Jupyter Notebook Server

  • Jupyter notebook server: a standalone server that can be run locally and that is capable of:
  • providing password or token enabled access to the server via the web UI;
  • serving a Jupyter notebook or JupyterLab HTML UI over http on an arbitrary port;
  • from the UI, each separate notebook can launch a single computational environment (a Jupyter kernel) that is responsible for executing on demand, and in a REPL way, code contained in notebook code cells.

Architecturally, a notebook server presents the user with a notebook management interface for launching individual notebooks; and the notebook interface then provides a way of launching and managing a code executing kernel associated with the notebook.

The notebook server can be used to support computation in several ways:

  • via an interactive browser based notebook UI;
  • as a headless kernel provider to provide a computational environment that can be used to execute code:
  • from within a code editor, such as PyCharm, VSCode, Atom etc.
  • displayed in an arbitrary HTML page;
  • as a proxy server providing access to other, arbitrary web applications via a single notebook server URL path (i.e. down a single path on a single port).

Let’s consider each of those in more detail in turn.

Using a Notebook Server to Serve Interactive Notebooks

This is the limit of what most people think of when they think of Jupyter notebooks: as a read/write/execute/display interactive notebook environment, accessed via a web browser.

Notebooks can be used in various ways, including but not limited to:

  • all explanatory text and code provided; all code is run in one go and used to deploy interactive widgets and displays in the page to support UI driven interactive activities (the code can optionally be hidden from view);
  • all code provided and users run one cell at a time; instructional text guides their activity and they see the result of executing the code one code cell at a time;
  • all code provided, but users encouraged to edit, change and execute code repeatedly to explore a particular code idea;
  • some code provided; users are led through an activity but have to supply some code themselves;
  • structured slate: text used to develop ideas and set up practical activities but students provide all the code;
  • blank slate: users create and run all their own text and code.

Using a Notebook Server as a Provider of Computational Environments for Interactive Code Activities in Arbitrary Web Pages

Javascript packages such as thebelab.js allow code areas in arbitrary HTML documents to be executed against a known Jupyter server endpoint. This allows instructional HTML text to include activities where:

  • students execute provided code and see the results returned to, and embedded in, the page at the point they executed the code;
  • students edit code in the HTML page, execute that code against the notebook served headless Jupyter kernel, and see the results returned to and embedded in the page.

Using a Notebook Server as a Proxy to Other Web Applications

The Jupyter single user server can be extended with a server extension (jupyter-server-proxy) that will allow it to proxy other HTML/http UIs.

Use case example: a Jupyter single user server with the jupyter-server-proxy enabled can be used to proxy an RStudio or OpenRefine application via the Jupyter user interface. A notebook server on example.com/nbserver can trivially serve applications against example.com/nbserver/proxy/PORTNUMBER or aliased as eg example.com/nbserver/rstudio. This means that a Jupyter notebook server can be used to provide an authentication layer in front of an arbitrary web application served from the same local network.
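As a rough sketch of the URL scheme described above, the following helper builds the addresses at which a notebook server on example.com/nbserver would expose a proxied application. The function name proxied_url, the https scheme and the port number 3333 are my own illustrative assumptions; they are not part of jupyter-server-proxy itself.

```python
def proxied_url(server_base, target):
    """Return the URL at which a notebook server would expose a proxied app.

    `target` is either a port number (served under /proxy/PORT/) or a
    named alias such as "rstudio" (served under /ALIAS/).
    """
    base = server_base.rstrip("/")
    if isinstance(target, int):
        # Unnamed application: addressed by the local port it listens on.
        return f"{base}/proxy/{target}/"
    # Named application: addressed by its alias.
    return f"{base}/{target.strip('/')}/"


if __name__ == "__main__":
    base = "https://example.com/nbserver"
    print(proxied_url(base, 3333))       # hypothetical app on local port 3333
    print(proxied_url(base, "rstudio"))  # aliased application
```

In a real deployment the alias would be registered through the extension's own configuration rather than computed by hand; the point here is only that everything hangs off the one notebook server URL, and so sits behind the notebook server's authentication.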

The jupyter-desktop-server server extension extends the jupyter-server-proxy extension to allow desktop environments (such as XFCE) to be proxied via the Jupyter single user server.

Use case example: a Docker container running a Jupyter server provides authenticated access to a GUI based Java application via a web browser. The Java application runs on the virtualised desktop and is proxied by the notebook server using the jupyter-desktop-server.

Use case example: a Jupyter server provides browser based access to a Windows desktop application running under Wine on an XFCE desktop. The application is launched from a Jupyter notebook UI and proxied via the jupyter-desktop-server extension.

JupyterHub

JupyterHub is a multi-user service that can provide authenticated access to personalised computational environments for multiple users. Each user may also be provided with their own persistent account managed by the server.

A range of authentication schemes are supported, including OAuth, LTI, login from Github etc. (TM112 uses LTI to allow students to access a single sign on authenticating JupyterHub server that launches temporary notebook servers from a Moodle VLE web page. Contact: Rod Norfor for technical details.)

The JupyterHub server provides access to a range of computational environments through environment spawners. For example, the DockerSpawner will launch a Docker container in response to a user login that runs a personalised computational environment in a Docker container.

JupyterHub can be configured to provide users with a range of alternative possible environments, so a student could log in and be presented with options to launch different environments relating to different modules, for example.
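A minimal sketch of what such a configuration might look like, assuming the DockerSpawner is used and that the installed version supports an allowed_images setting for offering a choice of images. The TM351 image name is the one mentioned elsewhere in this post; the TM112 image name and the configure helper are hypothetical, included only for illustration.

```python
from types import SimpleNamespace

# Module code -> Docker image. The TM112 entry is a made-up placeholder.
MODULE_IMAGES = {
    "TM351": "ousefuldemos/tm351-binderised",
    "TM112": "example/tm112-notebook",  # hypothetical image name
}


def configure(c):
    """Apply spawner settings to a JupyterHub-style config object `c`."""
    # Launch each user's environment as a Docker container.
    c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
    # Offer the user a choice of images at launch time (assumes a
    # DockerSpawner version that supports `allowed_images`).
    c.DockerSpawner.allowed_images = dict(MODULE_IMAGES)
    return c


if __name__ == "__main__":
    # Stand-in for the config object that JupyterHub provides as `c`.
    demo = SimpleNamespace(
        JupyterHub=SimpleNamespace(), DockerSpawner=SimpleNamespace()
    )
    configure(demo)
    print(demo.DockerSpawner.allowed_images)
```

In an actual jupyterhub_config.py the same assignments would be made against the c object that JupyterHub supplies; wrapping them in a function here just makes the sketch runnable on its own.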

JupyterHub can scale a service offering with increasing numbers of users using a well supported and well proven Kubernetes deployment model. (The TM112 JupyterHub server uses Kubernetes on Microsoft Azure to service the required number of students in a scaleable way.)

Whilst JupyterHub nominally expects to manage launched environments via a Jupyter notebook server running in the environment, a recent community contribution (jhsingle-native-proxy) allows arbitrary containerised web applications to be launched and managed from a JupyterHub server.

Architecturally, JupyterHub can launch individual Jupyter notebook servers, notebook servers then launch notebooks, and notebooks launch kernels.

Jupyter Enterprise Gateway Server

The Jupyter Enterprise Gateway Server is a middleware service, originally developed by IBM, that provides the ability to launch kernels on behalf of remote notebooks in a scaleable way (eg scaling for large numbers of users; allowing kernels to run with different amounts of computational resource (CPUs, GPUs, memory etc)).

One possible architectural model would be for a JupyterHub server to provide multi-user access to the Jupyter environment, and JupyterHub to launch kernels via the Jupyter Enterprise Gateway Server.

Alternatively, a student running their own personal Jupyter notebook server at home on a computer with limited computational resource could use an institutional Jupyter Enterprise Gateway Server to launch a kernel on a well provisioned server, eg one with a large amount of memory and a GPU.

(At the moment, I don’t think the same personal Jupyter server can launch kernels locally as well as via a Jupyter Enterprise Gateway server; I think the provisioner is one or the other.)
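To make the "personal server, institutional kernels" idea a little more concrete, here is a sketch of the command a student's machine might run. It assumes a notebook server version that accepts a --gateway-url option; the gateway address is a made-up placeholder and gateway_notebook_command is a hypothetical helper name.

```python
import shlex


def gateway_notebook_command(gateway_url, port=8888):
    """Build a command that starts a local notebook server whose kernels
    are launched on a remote gateway rather than on the local machine."""
    return [
        "jupyter",
        "notebook",
        f"--gateway-url={gateway_url}",  # where kernels are requested from
        f"--port={port}",                # local port for the browser UI
    ]


if __name__ == "__main__":
    # Hypothetical institutional gateway address.
    cmd = gateway_notebook_command("https://gateway.example.org")
    print(shlex.join(cmd))
```

The UI and the notebook files stay on the student's computer; only the kernel (and so the heavy computation) runs on the institutional server.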

BinderHub

BinderHub is a variant of JupyterHub that allows an unauthenticated user to launch a containerised Jupyter notebook server, on demand, built according to a specification linked to in a remote repository.

The most common way of using BinderHub is to create a Github repository containing environment definition files as well as user files (eg Jupyter notebooks) and then use MyBinder (a free and open federated Binderhub service) to build a Docker image based on the contents of the repository. Once built, or if a cached version already exists, MyBinder spawns a Docker container from the image and serves it to the user.
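As a sketch, a MyBinder launch link for a Github repository can be put together as follows. The repository used in the example is the innovationOUtside/tm351vm-binder repo mentioned later in this post; the function name mybinder_url and the choice of HEAD as the default ref are my assumptions.

```python
from urllib.parse import quote


def mybinder_url(owner, repo, ref="HEAD", urlpath=None):
    """Build a MyBinder launch URL for a Github repository.

    `ref` is a branch, tag or commit; `urlpath` optionally names the
    path to open inside the launched server (for example "lab").
    """
    url = (
        "https://mybinder.org/v2/gh/"
        f"{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    )
    if urlpath:
        url += f"?urlpath={quote(urlpath, safe='')}"
    return url


if __name__ == "__main__":
    print(mybinder_url("innovationOUtside", "tm351vm-binder"))
    print(mybinder_url("innovationOUtside", "tm351vm-binder", urlpath="lab"))
```

Visiting the generated link triggers the build (or reuses the cached image) and then drops the user into the running container.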

Currently, BinderHub provides only a “temporary” service – the container is built, deployed, served to the user, and then destroyed at the end of the session. However, one of the Binder Federation nodes does have an experimental persistent BinderHub deployment that provides authenticated user access and a persistent user file area. Users can launch Binder containers from their account, save files to their account, and share their own files into their launched Binder containers.

BinderHub / MyBinder is also used as an ad hoc provider of computational environments for a variety of online “interactive textbooks” and online courses. For example, Jupyter Book and “the spacy course”, as well as the LibreTexts interactive book platform. Published as HTML websites, the contents of code cells embedded (and editable) within the HTML page can be executed against a remotely launched MyBinder kernel, with the result of the computation returned to the page and displayed within it.

Several Javascript packages (thebelab.js, juniper.js) exist to “enable” the code cells in an HTML page and manage the MyBinder connection.

nbgallery

nbgallery is not an official Jupyter project but it does provide a range of interesting features that are worth exploring if we want to be open-minded about what sort of user environment we want to use to provide people with access to Jupyter notebooks.

The nbgallery application (TH review and video review) was developed by the US Department of Defense and the NSA to provide multi-user access to a wide range of notebooks. The gallery provides search tools over a wide collection of notebooks and allows users to rate and review notebooks. Users can launch notebooks in a connected Jupyter environment. A healthcheck facility checks that code cells execute as expected (and if not may flag maintenance or student difficulty issues).

Exploring the use of nbgallery either as a social application, visited by all students, or as a personal application, used by a student to access notebooks in a personal environment, may turn up a way of providing access to notebooks in a way that is useful not just for linear courses, but also resource based / problem based learning courses.

Institutional Vs Local Provision

Whilst it is possible to consider Jupyter mediated environments in either the local user context (eg TM351 students using their own VMs) or the institutional context (for example, TM112 students launching temporary notebook servers), I think there is most to be gained from considering them as two different ways of exposing students to the same computational environments.

For example, consider the following three situations:

1) TM351 students run a Virtualbox virtual machine containing multiple “personal” servers: a Jupyter notebook server, a Postgres database server; the student “owns” all services and all services are integrated.

2) TM351 students access a virtualised Jupyter environment for notebook access, and log in to a shared Postgres database server. The student does not own their computational environment provided by the notebook server, though they may be able to export their files; nor do they own their database server: they are one of many users accessing the same service, though they may be able to export a dump of their database contents.

3) a TM351 Docker environment is defined in a public repository such as innovationOUtside/tm351vm-binder. The definition is public and can be shared (owned, edited) by anyone. The repository can be launched using a BinderHub instance and used to provide temporary access to an integrated, personal TM351 environment running a personal Jupyter notebook server as well as a Postgres server. Continuous integration tools build a Docker container image from the repo and push it to Docker Hub (ousefuldemos/tm351-binderised). An institutional JupyterHub server allows students to seamlessly log in from the VLE and launch the TM351 environment pulled from Docker Hub either directly or via an institutional Jupyter Enterprise Gateway Server. A student with a powerful computer at home installs Docker and launches their own local container instance of the TM351 environment pulled from Docker Hub. Perhaps more conveniently, they use the ContainDS desktop application to launch the container locally, again either pulling the prebuilt image from Docker Hub, or building a version themselves either directly from the original repository or from a local clone of it. (ContainDS greatly simplifies the practicalities of running Dockerised notebook servers on the desktop, providing a useful graphical user interface for managing containers, managing server authentication tokens, mounting files from the desktop into the container, etc.)

In the third case, the same environment definition is used to:

  • build and deploy temporary environments on MyBinder;
  • build a public image deposited on Docker Hub;

The public image on Docker Hub is then used to:

  • deploy environments from an institutional JupyterHub service;
  • deploy local environments on the students’ own desktop.

In each case, students gain personal access to a commonly defined environment and have “ownership” of all the services running inside the environment. Students are free to take away the environment and use it in other contexts, or access it solely via hosted solutions. (There is an issue of synchronising user files and environment updates across multiple services if a student works in that way, eg sometimes at home on their computer, sometimes from their desk on an OU remote host, etc.)

Use Cases for Jupyter Services

Jupyter environments can be used to support a range of activities across the institution, most notably:

  1. Delivering interactive teaching materials to students (module delivery);
  2. Authoring teaching materials (module production);
  3. Supporting computational academic research (research);
  4. Disseminating academic research (reproducible research publications) (research publishing);
  5. Supporting institutional data analysis and reporting (business analytics)

A sensible question to ask is: “what benefits or differences does a ‘Jupyter solution’ bring to each of these activities?” So let’s quickly review them:

Delivering Interactive Teaching Materials to Students

We have been using Jupyter notebooks since before they were Jupyter notebooks to deliver 4 hours of teaching per week to students on TM351. Over that time, the pedagogy has delivered but is still largely unexplored.

The notebooks we wrote then are not the notebooks we would write now. The notebooks we might write now are not the notebooks we could write if we spent some time exploring them properly as a medium for both teaching and learning, eg considering how they might be used to support formative assessment, summative assessment (touched on in TM351), automated testing / grading, personal note taking and portfolio development.

Computation supports interactivity in two ways:

  • it supports the execution of provided code and as such can be used to create what are effectively end user applications;
  • it allows students to create and execute their own code, for whatever purpose.

The Jupyter environment can support both use cases.

Authoring Teaching Materials

Irrespective of whether notebooks are used to deliver teaching to students, they can be used to develop teaching materials, interactive and otherwise, in a direct authoring way.

For example, document conversion tools allow authored notebooks to be rendered in a variety of formats: as .docx word processor documents, as .pdf files, as simple .md markdown/text files, as HTML pages.

The notebook user interface is web based and, via notebook extensions, supports WYSIWYG editing as well as direct editing of markdown and HTML text. Mathematical and chemical equations written in LaTeX are rendered natively by the notebook.

A wide range of display methods allow rich media assets to be embedded in the text by simple wrapping of a file reference (a local file reference or a web URL) that points to the object: images (Image(URL)), videos (Video(LOCAL_FILE)) and audio clips (Audio(URL)) are all readily embedded in the document, for example, as can more complex objects such as interactive maps.

Code, often little more than a single simple line, can also be used to generate media outputs, from rich interactive embedded javascript applications to simple charts and tables.

More ambitious authors may choose to create their own asset generating code, or even go so far as to create de facto end user applications within the notebook context.

Using code to generate charts and tables has benefits for module maintenance, because charts are generated from source datasets or equations. If they need to be updated, a simple change to the code or data file is all that’s required for the rendered asset to be updated.
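By way of illustration, here is a small sketch of a table being generated from a source data file rather than typed by hand. The data values are invented placeholders standing in for a module dataset, and markdown_table is a hypothetical helper; the point is only that editing the data and re-running the cell is all it takes to refresh the rendered asset.

```python
import csv
import io

# Stand-in for a module data file; the values are invented for illustration.
DATA_CSV = """x,y
1,2
3,4
"""


def markdown_table(csv_text):
    """Render CSV text (header row first) as a simple markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    print(markdown_table(DATA_CSV))
```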

The executed notebook can then be exported as a final document (as interactive HTML or ePub, as a flat PDF or docx etc) from the original notebook. (Of course, not all renderings will be as rich as the original interactive notebook form or HTML converted form.)

I was hoping to make more progress on openlearn-publish-test to demonstrate how we could use Jupytext to support direct authoring and rich / interactive updating of converted OpenLearn OU-XML content in a notebook UI, but I’ve run out of time and only got as far as how to get the OpenLearn content into a notebook enabled environment.

Supporting Computational Academic Research

Lots of academics write ad hoc code in their research; lots of academics make notes around their ad hoc code; lots of academics use code for exploration; notebooks are a powerful environment that lets you do each of those in the context of all the others.

Jupyter notebooks are arguably not the best environment for developing research software packages, although the provision of computational environments may support that activity. However, workflows are emerging that do better support traditional software engineering / code development practices. For example, tools such as Jupytext provide support for working with simple text document formats (.py and .R files, for example) directly within the Jupyter notebook environment.
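To give a flavour of what "a simple text document format" means here, the following toy sketch splits a Python script that uses "# %%" cell markers into a minimal notebook structure. It is emphatically not how Jupytext itself is implemented, and percent_script_to_notebook is a hypothetical name; Jupytext handles far more cases (metadata, round-tripping, other languages) than this does.

```python
import json

CELL_MARKER = "# %%"


def percent_script_to_notebook(script):
    """Split a "# %%" delimited script into a minimal nbformat 4 style dict.

    Lines before the first cell marker are ignored in this toy version.
    """
    cells = []
    current = None
    for line in script.splitlines():
        if line.startswith(CELL_MARKER):
            kind = "markdown" if "[markdown]" in line else "code"
            current = {"cell_type": kind, "metadata": {}, "source": []}
            if kind == "code":
                current.update(execution_count=None, outputs=[])
            cells.append(current)
        elif current is not None:
            if current["cell_type"] == "markdown":
                # Markdown cells are stored as comment lines in the script.
                line = line[2:] if line.startswith("# ") else line.lstrip("#")
            current["source"].append(line)
    for cell in cells:
        cell["source"] = "\n".join(cell["source"]).strip("\n")
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    demo = "# %% [markdown]\n# # A title\n\n# %%\nx = 1\nprint(x)\n"
    print(json.dumps(percent_script_to_notebook(demo), indent=2))
```

Because the script is just a .py file, it can be diffed, linted and version controlled like any other source file, which is where the software engineering benefit comes from.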

Disseminating Academic Research

An increasing number of academic journals require researchers to deposit reproducible code scripts with their submissions.

Several journals are exploring the use of Jupyter notebooks as a first-class document format for submitting papers, as well as developing review and comment tools around them.

Supporting Institutional Analysis and Reporting

Lots of financial companies use notebooks as an analysis environment. The Ministry of Justice moved to a Jupyter fronted platform (MoJ Analytical Platform) for their analysts.

A recent OU job ad for Head of Data Analytics identified skills in things like Python and R. Such folk might reasonably expect to use Jupyter notebooks for their analysis and reporting. One thing not covered in this review is the range of rich interactive dashboarding tools that the Jupyter ecosystem supports (eg Voilà).

Appropriating OpenLearn Content and Republishing Edited Versions Of It Via a “Simple” Automated Text Blogging Workflow

I had intended on using my (unpaid) strike days to catch up with some books and harp practice, and maybe even the garden, and keep away from the keyboard; or failing that, to have a push on my rally data tinkering and get another LeanPub book started to try to reboot the £50 a quarter or so my previous publication (Wrangling F1 Data With R) generated, which keeps things like recurring Dropbox and Flickr etc etc charges covered (no-one has ever bought me a KoFi, as far as I can tell…).

And I was determined not to do any of the mounting workload associated with the day job, no matter how much fun some of it is likely to be (like getting Ev3devSim working as an ipywidget in Jupyter notebooks).

Whilst I did manage to stick to the determined not to path, I never even really started down the intended one, instead spending hours and hours in front of a keyboard trying to hack something together around my OpenLearn publishing workflow.

So here’s what I’ve come up with…

An OpenLearn Unit Text Publishing Thing

Firstly, it’s a thing that lets you grab the “source” content of an OpenLearn unit (at least, some of it; I still haven’t got round to grabbing things like video files or audio files, or scraping PDFs etc.) and churn it into a simple text format, markdown, which looks like this:

Headers are prefixed with a #, you can emphasise things by wrapping it in * characters, eg *italics* -> italics, or **double them up** for strong emphasis. Embedding links — [link text](link/path/file.html) — and images — ![Alt text](path/to/image.file) — is also pretty easy when you get the hang of it.

So how do you get started? First, you need a Github account (sign up here; just get one: you’re not going to have to do any hard Github stuff, you’re just making use of their free hosting). Get one, and sign in.

Second, visit my demo repo — psychemedia/openlearn-publish-test — (the URL will change at some point, but I’ll archive the original and link the new address from it…) and grab a copy of your own repo from mine by clicking the big green Use this template button:

You’ll be presented with a form:

Give your repo a name (no spaces). Optionally add a description. Keep the repo public. And click the big green Create repository from template button.

Things will churn for a moment or two:

And then you’ll have your own repo, containing a copy of the files in mine:

Behind the scenes, there is work going on…

Click the Actions tab on your copy of the repo to see what…

At first, it may look like nothing… but wait a moment or two and refresh the page:

A couple of actions will start running to initialise, and customise, your repo for you.

When the actions are done, you’ll be informed… (you shouldn’t have to refresh the page, the status indicators should update when things are done…):

If you go back to your repo homepage, you’ll see it’s been updated with a new README that’s slightly different to the original copy from my repo, and that has been personalised to yours:

So… now you can grab some OpenLearn content into your repo.

Click on the SET_UP.md file link in your repo:

You will be presented with a list of units on OpenLearn.

Find one you like the look of and click the Grab Unit into this repo link:

This will open a new issue for you in the Issues tab of your repo, and prepopulate it with a title that will tell a Github Action you want to grab some OpenLearn content, and an issue body that tells the action where the unit can be found.
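For the curious, a prepopulated issue link of this sort can be built by passing the title and body as query parameters on the repo's new issue page. The sketch below shows the general idea; the function name new_issue_url, the example repo name and the title and body strings are all hypothetical, and are not the exact values my repo uses.

```python
from urllib.parse import urlencode


def new_issue_url(owner, repo, title, body):
    """Build a link that opens a prepopulated "new issue" form on Github."""
    query = urlencode({"title": title, "body": body})
    return f"https://github.com/{owner}/{repo}/issues/new?{query}"


if __name__ == "__main__":
    # Placeholder values: the title is what an Action would key on, and
    # the body carries the address of the unit to be fetched.
    print(new_issue_url("me", "my-oer", "Fetch unit", "https://example.org/unit"))
```

Nothing is created until the Submit new issue button is clicked; the link only fills in the form.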

Click the Submit new issue button to get things started.

Back in the Actions tab, you can see the helper elves have started doing their thing again…

If you click on a running Action, you can check its progress in more detail:

Click through on the actual job name to see what’s happening inside:

You can expand a step by clicking the arrow to see what each step is doing or has already done…

If you read through the steps, you’ll see several things are done: for example, we grab some OUXML (the OpenLearn content), convert to markdown, build some HTML files (these are what gets published), and deploy them, then build a LaTeX version of the material (which is used to generate a PDF), and an ePub ebook. (The LaTeX step takes some time; I should perhaps simplify things so that only the HTML build is done by default.)

When the Actions are green circled / green ticked and done (which may take a few minutes…), or at least, when the Deploy HTML to gh-pages step has run, go back to the repo home page, where you should see a new commit has been made to your repo:

If you click into the content folder you’ll see one or more session folders:

If you click into a session folder, you see some markdown files:

If you click on one of those, you’ll see some scraped and converted OpenLearn content:

So the content has been grabbed from OpenLearn and saved to your repo.

But that’s not all.

If you scroll down on your README page (I really should make this link more prominent in the README…) you’ll see a link to a github.io site published from your repo:

Click it…

If you see a “404”, page not found, don’t panic.

On the repo home page, select the Settings tab:

and scroll down to the Github Pages area:

Change the Source from gh-pages branch to master branch:

And then, select the master branch:

And set the Source back to the gh-pages branch:

When you see something like this, you know it’s all good to go:

Note that caching of a previous build of the site may last for up to 10 minutes, so grab yourself a cup of tea, or perhaps look through the markdown files in the content directory, or even go back to the Actions tab to see whether the actions have completed.

If the Actions have completed, select the OpenLearnXML2 action (or a completed nbsphinx publisher action if you have committed your own changes to the markdown files) and you should see that an Artifacts download is available.

Download and unzip the artifacts file. If the build process has been able to build a PDF file and/or an ePub file from the content, it will be found in the unzipped downloaded directory.

Right… time to try your github.io site link again:

An OpenLearn Editing Thing

This will have to be in a part two to this post… I’ve run out of time for now and need to get back to the day job…

If you are itching to get started, this may work, if I’ve got my autopublishing things fixed…

In the content folder (on the default master branch of the repo), find the markdown file you want to edit, and click on the pencil icon to open the editor:

Edit the file / make the changes you want, and commit it (you may want to set a meaningful commit message title summarising the changes, and perhaps even a longer description about the motivation for the changes, but both are optional…):

Click the big green Commit changes button to commit the changes. If you look in the Actions tab, you should see that an nbsphinx publisher action has started that should publish your changes to your site.

Note that even when the publishing action has generated and pushed updated site pages to where they need to be, the site may take a few minutes to update because of page caching on the Github site.

The Future

One of the spinoffs of this for me was the realisation that I could use Github Actions to run arbitrary code in response to particular events, such as commits or issue postings. The current machinery uses a Sphinx / nbSphinx publishing route, but I’ve also started exploring a recipe for Jekyll based Jupyter Book publishing. (Next on the to do list will be an Executable Book project / MyST workflow [https://ebp.jupyterbook.org/en/latest/]; I also need to split out the workflows into actions of their own, but I haven’t figured out how to do that for myself yet.) It strikes me that I could bundle all these in the same repo with some way of flagging which build process I want to use. This would allow the user to then republish their material using the publishing tool, and its various peculiarities, customisations and affordances, of their choice.
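As a sketch of the "arbitrary code in response to events" idea, here is the sort of Python script a workflow step might run. It relies on the GITHUB_EVENT_NAME and GITHUB_EVENT_PATH environment variables that Github Actions sets for a job; the describe_event function and what it chooses to report are hypothetical, and are not taken from my actual workflows.

```python
import json
import os


def describe_event(event_name, payload):
    """Summarise the triggering event so a script can decide what to do."""
    if event_name == "issues":
        issue = payload.get("issue", {})
        return f"issue: {issue.get('title', '')}"
    if event_name == "push":
        # Collect the files added or modified across the pushed commits.
        changed = sorted({
            path
            for commit in payload.get("commits", [])
            for path in commit.get("added", []) + commit.get("modified", [])
        })
        return "push: " + ", ".join(changed)
    return f"ignored: {event_name}"


def main():
    event_name = os.environ.get("GITHUB_EVENT_NAME", "")
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    payload = {}
    if event_path and os.path.exists(event_path):
        with open(event_path, encoding="utf-8") as f:
            payload = json.load(f)
    print(describe_event(event_name, payload))


if __name__ == "__main__":
    main()
```

A script like this could branch on the issue title to kick off a content grab, or on the changed file paths to decide whether a republish is needed.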

Immediate to dos, that may not happen because I’m the only user, I know it’s possible, and I’m not that interested, are to: make the Github Pages / published site link more prominent in the README; get movie and audio downloads and embeds working. Also a way of handling PDFs linked from the OpenLearn materials, and perhaps extracting text from those, even, to support republishing…

On the publish side, it would be useful to be able to publish to HTML only by default, with some optional way of invoking the PDF and ePub builds. The ePub build also needs things like title and author setting. The PDF build sometimes breaks, eg due to the inability to detect a bounding box size round a gif image. I maybe need to use another PDF generator, eg some hints here.

I also need to refactor the code, two ways: firstly, a simplification, that uses the bare minimum of packages and just churns the markdown direct from XML in one simple step. Secondly, fixing the current workflow, which stages the XML in a SQLite database, so that the database can properly handle content from multiple units and I can reliably churn the md from the database for any single unit. At the moment, I think things pretty much assume there’s content from just a single unit in the database. Putting the md into the db might be useful too… Then I could imagine a datasette powered publishing route too…
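A minimal sketch of what a multi-unit staging table might look like: every row is keyed by a unit identifier, so the markdown for any one unit can be pulled out on its own. The table name, column names and helper functions are all hypothetical; this is not the schema the current workflow uses.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a table keyed by unit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE sections (
               unit_id  TEXT NOT NULL,
               position INTEGER NOT NULL,
               title    TEXT,
               md       TEXT,
               PRIMARY KEY (unit_id, position)
           )"""
    )
    return conn


def add_section(conn, unit_id, position, title, md):
    """Stage one converted section of a unit."""
    conn.execute(
        "INSERT INTO sections VALUES (?, ?, ?, ?)",
        (unit_id, position, title, md),
    )


def unit_markdown(conn, unit_id):
    """Return the markdown for a single unit, sections in order."""
    rows = conn.execute(
        "SELECT title, md FROM sections WHERE unit_id = ? ORDER BY position",
        (unit_id,),
    )
    return "\n\n".join(f"# {title}\n\n{md}" for title, md in rows)


if __name__ == "__main__":
    db = make_db()
    add_section(db, "unit-a", 2, "Next", "More")
    add_section(db, "unit-b", 1, "Other", "Elsewhere")
    add_section(db, "unit-a", 1, "Intro", "Hello")
    print(unit_markdown(db, "unit-a"))
```

With the unit identifier in the key, content from several units can sit side by side in the one database, which is also the shape a datasette powered route would want.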

As a recent tweet from Martin Hawksey reveals, he’s been blogging about how we can turn Google’s App Script to our own purposes as a hosted code runner, and I think Github now provides a similar opportunity for anyone who wants to appropriate it to that end…

OpenLearn OER (Re)Publishing the Text Way

In response to a provocation, I built a thing that will let you grab an OpenLearn unit, convert it to a simple text format, and publish it on your own website.

[For the next step in this journey, see: Appropriating OpenLearn Content and Republishing Edited Versions Of It Via a “Simple” Automated Text Blogging Workflow.]

It doesn’t require much:

  • if you haven’t got one already, create a Github account (just don’t “ooh, Github, that’s really hard, so I won’t be able to do it…”; just f***ing get an account);
  • visit my repo and read down the page to see what to do…

And what to do essentially boils down to:

As for changing the content – it’s not that hard once you’ve done it a few times and just go with the flow of writing what feels natural… “Easy” to edit text files are in the content directory and you can edit them via the Github website.