Category: OU2.0

Fragment – Running Multiple Services, such as Jupyter Notebooks and a Postgres Database, in a Single Docker Container

Over the last couple of days, I’ve been fettling the build scripts for the TM351 VM, which typically uses vagrant to build a VirtualBox VM from a set of shell scripts, so they can be used to build a single Docker container that runs all the TM351 services, specifically Jupyter notebooks, OpenRefine, PostgreSQL and MongoDB.

Docker containers are typically constructed to a run a single service, with compositions of containers wired together using Docker Compose to create applications that deliver, or rely on, more than one running service. For example, in a previous post (Setting up a Containerised Desktop API server (MySQL + Apache / PHP 5) for the ergast Motor Racing Data API) I showed how to set up a couple of containers to work together, one running a MySQL database server, the other an http service that provided an API to the database.

So how to run multiple services in the same container? Docs on the Docker website suggest using supervisord to run multiple services in a single container, so here’s a fragment on how I’ve done that from my TM351 build.

To begin with, I’ve built the container up as a tiered set of containers, in a similar way to the way the stack of opinionated Jupyter notebook Docker containers are constructed:

#Define a stub to identify the images in this image stack
IMAGESTUB=psychemedia/tm361testm

# minimal
## Define a minimal container, eg a basic Linux container
## using whatever flavour of Linux we prefer
docker build --rm -t ${IMAGESTUB}-minimal-test ./minimal

# base
## The base container installs core packages
## The intention is to define a common build environment
## populated with packages likely to be common to many courses
docker build --rm --build-arg BASE=${IMAGESTUB}-minimal-test -t ${IMAGESTUB}-base-test ./base

#...

One of the things I’ve done to try to generalise the build steps is allow the name a base container to be used to bootstrap a new one by passing the name of the base image in via an optional variable (in the above case, --build-arg BASE=${IMAGESTUB}-minimal-test). Each Dockerfile in a build step directory uses the following construction to work out which image to use as the FROM basis:

#Set ARG values using --build-arg =
#Each ARG value can also have a default value
ARG BASE=psychemedia/ou-tm351-base-test
FROM ${BASE}

Using the same approach, I have used separate build tiers for the following components:

  • jupyter base: minimal Jupyter notebook install;
  • jupyter custom: add some customisation onto a pre-existing Jupyter notebook install;
  • openrefine: add the OpenRefine application; (note, we could just use BASE=ubuntu to create this a simple, standalone OpenRefine container);
  • postgres: create a seeded PostgreSQL database; note, this could be split into two: a base postgres tier and then a customisation that adds users, creates and seed databases etc;
  • mongodb: add in a seeded mongo database; again, the seeding could be added as an extra tier on a minimal database tier;
  • topup: a tier to add in anything I’ve missed without having to go back to rebuild from an earlier step…

The intention behind splitting out these tiers is that we might want to have a battle hardened OU postgres tier, for example, that could be shared between different courses. Alternatively, we might want to have tiers offering customisations for specific presentations of a course, whilst reusing several other fixed tiers intended to last out the life of the course.

By the by, it can be quite handy to poke inside an image once you’ve created it to check that everything is in the right place:

#Explore inside animage by entering it with a shell command
docker run -it --entrypoint=/bin/bash psychemedia/ou-tm351-jupyter-base-test -i

Once the services are in place, I add a final layer to the container that ensures supervisord is available and set up with an appropriate supervisord.conf configuration file:

##Dockerfile
#Final tier Dockerfile
ARG BASE=psychemedia/testpieces
FROM ${BASE}

USER root
RUN apt-get update && apt-get install -y supervisor

RUN mkdir -p /openrefine_projects  && chown oustudent:100 /openrefine_projects
VOLUME /openrefine_projects

RUN mkdir -p /notebooks  && chown oustudent:100 /notebooks
VOLUME /notebooks

RUN mkdir -p /var/log/supervisor
COPY monolithic_container_supervisord.conf /etc/supervisor/conf.d/supervisord.conf

EXPOSE 3334
EXPOSE 8888

CMD ["/usr/bin/supervisord"]

The supervisord.conf file is defined as follows:

##supervisord.conf
##We can check running processes under supervisord with: supervisorctl

[supervisord]
nodaemon=true
logfile=/dev/stdout
loglevel=trace
logfile_maxbytes=0
#The HOME envt needs setting to the correct USER
#otherwise jupyter throws: [Errno 13] Permission denied: '/root/.local'
#https://github.com/jupyter/notebook/issues/1719
environment=HOME=/home/oustudent

[program:jupyternotebook]
#Note the auth is a bit ropey on this atm!
command=/usr/local/bin/jupyter notebook --port=8888 --ip=0.0.0.0 --y --log-level=WARN --no-browser --allow-root --NotebookApp.password= --NotebookApp.token=
#The directory we want to start in
#(replaces jupyter notebook parameter: --notebook-dir=/notebooks)
directory=/notebooks
autostart=true
autorestart=true
startsecs=5
user=oustudent
stdout_logfile=NONE
stderr_logfile=NONE

[program:postgresql]
command=/usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
user=postgres
autostart=true
autorestart=true
startsecs=5

[program:mongodb]
command=/usr/bin/mongod --dbpath=/var/lib/mongodb --port=27351
user=mongodb
autostart=true
autorestart=true
startsecs=5

[program:openrefine]
command=/opt/openrefine-3.0-beta/refine -p 3334 -i 0.0.0.0 -d /vagrant/openrefine_projects
user=oustudent
autostart=true
autorestart=true
startsecs=5
stdout_logfile=NONE
stderr_logfile=NONE

One thing I need to do better is to find a way to stage the construction of the supervisord.conf file, bearing in mind that multiple tiers may relate to the same servicel for example, I have a jupyter-base tier to create a minimal Jupyter notebook server and then a jupyter-base-custom tier that adds in specific customisations, such as branding and course related notebook extensions.

When the final container is built, the supervisord command is run and the multiple services started.

One other thing to note: we’re hoping to run TM351 environments on an internal OpenStack cluster. The current cluster only allows students to expose a single port, and port 80 at that, from the VM (IP addresses are in scant supply, and network security lockdowns are in place all over the place). The current VM exposes at least two http services: Jupyter notebooks and OpenRefine, so we need a proxy in place if we are to expose them both via a single port. Helpfully, the nbserverproxy Jupyter extension (as described in Exposing Multiple Services Via a Single http Port Using Jupyter nbserverproxy), allows us to do just that. One thing to note, though – I had to enable it via the same user that launches the notebook server in the suoervisord.conf settings:

##Dockerfile fragment

RUN $PIP install nbserverproxy

USER oustudent
RUN jupyter serverextension enable --py nbserverproxy
USER root

To run the VM, I can call something like:

docker run -p 8899:8888 -d psychemedia/tm351dockermonotest

and then to access the additional services, I can browse to e.g. localhost:8899/proxy/3334/ to see the OpenRefine application.

PS in case you’re wondering why I syndicated this through RBloggers too, the same recipe will work if you’re using Jupyter notebooks with an R kernel, rather than the default IPython one.

Interactive Authoring Environments for Reproducible Media: Stencila

One of the problems associated with keeping up with tech is that a lot of things that “make sense” are not the result of the introduction or availability of a new tool or application in and of itself, but in the way that it might make a new combination of tools possible that support a complete end to end workflow or that can be used to reengineer (a large part of) an existing workflow.

In the OU, it’s probably fair to say that the document workflow associated with creating course materials has its issues. I’m still keen to explore how a Jupyter notebook or Rmd workflow would work, particularly if the authored documents included recipes for embedded media objects such as diagrams, items retrieved from a third party API, or rendered from a source representation or recipe.

One “obvious” problem is that the Jupyter notebook or RStudio Rmd editor is “too hard” to work with (that is, it’s not Word).

A few days ago I saw a tweet mentioning the use of Stencila with Binderhub. Stencila? Apparently, *”[a]n open source office suite for reproducible research”. From the blurb:

[T]oday’s tools for reproducible research can be intimidating – especially if you’re not a coder. Stencila make reproducible research more accessible with the intuitive word processor and spreadsheet interfaces that you and your colleagues are already used to.

That sounds appropriate… It’s available as a desktop app, but courtesy of minrk/jupyter-dar (I think?), it runs on binderhub and can be accessed via a browser too:

 

You can try it here.

As with Jupyter notebooks, you can edit and run code cells, as well as authoring text. But the UI is smoother than in Jupyter notebooks.

(This is one of the things I don’t understand about colleagues’ attitude towards emerging tech projects: they look at today’s UX and think that’s it, because that’s how it is inside an organisation – you take what you’re given and it stays the same for decades. In a living project, stuff tends to get better if it’s being used and there are issues with it…)

The Jupyter-Dar strapline pitches “Jupyter + DAR compatibility exploration for running Stencila on binder”. Hmm. DAR? That’s also new to me:

Dar stands for (Reproducible) Document Archive and specifies a virtual file format that holds multiple digital documents, complete with images and other assets. A Dar consists of a manifest file (manifest.xml) that describes the contents.

Dar is being designed for storing reproducible research publications, but the underlying concepts are suitable for any kind of digital publications that can be bundled together with their assets.

Repo: [substance/dar](https://github.com/substance/dar)

Sounds interesting. And which reminds me: how’s OpenCreate coming along, I wonder? (My permissions appear to have been revoked again; or the URL has changed.)

PS seems like there’s more activity in the “pure web” notebook application world. Hot on the heels of Mike Bostock’s Observable notebooks (rationale) comes iodide, “[a] frictionless portable notebook-style interface for literate scientific computing in the browser” (examples).

I don’t know if these things just require you to use Javascript, or whether they can also embed things like Brython.

I’m not sure I fully get the js/browser notebooks yet? I like the richer extensibility of things like Jupyter in terms of arbitrary language/kernel availability, though I suppose the web notebooks might be able to hook into other kernels using similar mechanics to those used by things like Thebelab?

I guess one advantage is that you can do stuff on a Chromebook, and without a network connection if you cache all the required JS packages locally? Although with new ChromeOS offering support for Linux – and hence, Docker containers – natively, Chromebooks could get a whole lot more exciting over the next few months. From what I can tell, corsvm looks like a ChromeOS native equivalent to something like Virtualbox (with an equivalent of Guest Additions?). It’ll be interesting how well things like audio works? Reports suggest that graphical UIs will work, presumably using some sort of native X11 support rather than noVNC, so now could be a good time to start looking out for souped up Pixelbook…

Generative Assessment Creation

It’s coming round to that time of year where we have to create the assessment material for courses with an October start date. In many cases, we reuse question forms from previous presentations but change the specific details. If a question is suitably defined, then large parts of this process could be automated.

In the OU, automated question / answer option randomisation is used to provide iCMAs (interactive computer marked assessments) via the student VLE using OpenMark. As well as purely text based questions, questions can include tables or images as part of the question.

One way of supporting such question types is to manually create a set of answer options, perhaps with linked media assets, and then allow randomisation of them.

Another way is to define the question in a generative way so that the correct and incorrect answers are automatically generated.(This seems to be one of those use cases for why ‘everyone should learn to code’;-)

Pinching screenshots from an (old?) OpenMark tutorial, we can see how a dynamically generated question might be defined. For example, create a set of variables:

and then generate a templated question, and student feedback generator, around them:

Packages also exist for creating generative questions/answers more generally. For example, the R exams package allows you to define question/answer templates in Rmd and then generate questions and solutions in a variety of output document formats.


You can also write templates that include the creation of graphical assets such as charts:

 

Via my feeds over the weekend, I noticed that this package now also supports the creation of more general diagrams created from a TikZ diagram template. For example, logic diagrams:

Or automata diagrams:

(You can see more exam templates here: www.r-exams.org/templates.)

As I’m still on a “we can do everything in Jupyter” kick, one of the things I’ve explored is various IPython/notebook magics that support diagram creation. At the moment, these are just generic magics that allow you to write TikZ diagrams, for example, that make use of various TikZ packages:

One the to do list is to create some example magics that template different question types.

I’m not sure if OpenCreate is following a similar model? (I seem to have lost access permissions again…)

FWIW, I’ve also started looking at my show’n’tell notebooks again, trying to get them working in Azure notebooks. (OU staff should be able to log in to noteooks.azure.com using OUCU@open.ac.uk credentials.) For the moment, I’m depositing them at https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted, although some tidying may happen at some point. There are also several more basic demo notebooks I need to put together (e.g. on creating charts and using interactive widgets, digital humanities demos, R demos and (if they work!) polyglot R and python notebook demos, etc.). To use the notebooks interactively, log in and clone the library into your own user space.

Seeding Shared Folders With Files Distributed via a VM

For the first few presentations of our Data Management and Analysis course, the course VM has been distributed to students via a USB mailing. This year, I’m trying to move to a model whereby the primary distribution is via a download from VagrantCloud (students manage the VM using Vagrant), though we’re also hoping to be able to offer access to an OU OpenStack hosted VM to any student’s who really need it.

For students on Microsoft Windows computers, an installer installs Virtualbox and vagrant from installers distributed via the USB memory stick. This in part derives from the policy of fixing versions of as much as we can so that it can be tested in advance. The installer also creates a working directory for the course that will be shared by the VM, and copies required files, again from the memory stick, into the shared folder. On Macs and Linux, students have to do this setup themselves.

One of the things I have consciouslystarted trying to do is move the responsibility for satisficing of some of the installation requirements into the Vagrantfile. (I’m also starting to think they should be pushed even deeper into the VM itself.)

For example, as some of the VM services expect particular directories to exist in the shared directory, we have a couple of defensive measures in place:

  • the Vagrantfile creates any required, yet missing, subdirectories in the shared directory;
            #Make sure that any required directories are created
            config.vm.provision :shell, :inline => <<-SH
                mkdir -p /vagrant/notebooks
                mkdir -p /vagrant/openrefine_projects
                mkdir -p /vagrant/logs
                mkdir -p /vagrant/data
                mkdir -p /vagrant/utilities
                mkdir -p /vagrant/backups
                mkdir -p /vagrant/backups/postgres-backup/
                mkdir -p /vagrant/backups/mongo-backup/	
            SH
    

  • start up scripts for services that require particular directories check they exist before they are started and create them if they are missing. For example, in the service file, go defensive with something like ExecStartPre=mkdir -p /vagrant/notebooks.

The teaching material associated with the (contents of) the VM is distributed using a set of notebooks downloaded from the VLE. Part of the reason for this is that it delays the point at which the course notebooks must be frozen: the USB is mastered late July/early August for a mailing in September and course start in October.

As well as the course notebooks are a couple of informal installation test notebooks. This can be frozen along with the VM and distributed inside it, but the question then arises. So this year I am trying out a simple pattern that bakes test files into the VM and then uses the Vagranfile to copy the files into the shared directory on its first run with a particular shared folder:

config.vm.provision :shell, :inline => <<-SH
    if [ ! -f /vagrant/.firstrun_nbcopy.done ]; then
        # Trust notebooks in immediate child directories of notebook directory
        files=(`find /opt/notebooks/* -maxdepth 2 -name "*.ipynb"`)
        if [ ${#files[@]} -gt 0 ]; then
            jupyter trust /opt/notebooks/*.ipynb;
            jupyter trust /opt/notebooks/*/*.ipynb;
        fi
        #Copy notebooks into shared directory
        cp -r /opt/notebooks/. /vagrant/notebooks
        touch /vagrant/.firstrun_nbcopy.done
    fi
   SH

This pattern allows files shipped inside the VM to be copied into the shared folder once it is mounted into the VM from host. The files will then persist inside the shared directory, along with a hidden flag file to say the files have been copied. I’m not sure about the benefits of auto-running something inside the VM to manage this copying? Or whether to check that a more recent copy of the files to be copied doesn’t already exist in the shared folder before copying on the first run in the folder?

[Not] Running Jupyter Notebook Containers Via Jupyterhub Container Under Kubernetes Under Docker on My Desktop

…(sic).

One of the things I’ve been exploring lately (actually, that my colleague Rod Norfor in IT has been exploring lately) has been a way of offering a self-service, disposable Jupyter notebook service, pre-packaged with customised, pre-prepared teaching notebooks for students to work through. The hope is that this will go live to some TM112 students later this month.

The solution we’ve come up with is running Jupyterhub under Kubernetes on Microsoft Azure Cloud. The Jupyterhub server serves a prebuilt, public Docker container pulled from Docker Hub. (As a back-up, the same notebooks can be run via Microsoft Azure Notebooks cloned from a library on Microsoft Azure Notebooks or from the Github repo used to build the notebook container served by Jupyterhub.) On the to do list is to explore whether we should use a private registry or pull from an OU hosted registry, either private or public.

One thing we have noticed is that tagging the source notebook container image as latest breaks things with Kubernetes if we try to serve an updated image. The solution is to tag each updated version of the container differently (?and restart the Jupyterhub server?). The image is rebuilt using the same tag as a Github release tag, and the build triggered from a Github release. The Github release is handled manually.

As things currently stand, authentication is required to get into the OU VLE in the normal way, and then a link to the Jupterhub server containing a secret is used to take students to the Jupyterhub server. From there, users self-start the Jupyter notebook container.

In the default setup, dummy auth is in place, letting users through an explicit check with any user/password combination:

 

Thanks to Rod, the Azure route appears to be working so far, but  I thought I’d also try to set up a local development environment on my desktop.

The Docker Community Edition Edge release](https://www.docker.com/kubernetes) allows you to run Kubernetes locally on your desktop under Docker. Having got Kubernetes installed and up running straightforwardly enough, it was time to give Jupyterhub a go in its default state. The following recipes is via a get started crib from David Currie and the Zero to JupyterHub with Kubernetes docs:

## Install helm
## - download and install appropriate distribution (I grabbed the Mac distribution from [kubernetes/helm](https://github.com/kubernetes/helm), unzipped it, and put the app in my `Applications` folder, which is in my path)

##Check we can see it
helm

## Create a working directory, just in case!
mkdir code/dockerk8sjupyterhub
cd code/dockerk8sjupyterhub

## Set up things to work with my local docker powered kubernetes service
kubectl config use-context docker-for-desktop
kubectl get nodes

##Tiller is already installed - I'm not sure if this is necessary?
helm init --service-account tiller

## Security voodoo
kubectl --namespace=kube-system patch deployment tiller-deploy --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'

## Create a config file
touch config.yaml
## Create a secret hash value
openssl rand -hex 32
## And add it to the YAML file:
nano config.yaml
|------
proxy:
  secretToken: "MY_TOKEN"
|------

## Add helm chart
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm install jupyterhub/jupyterhub --version=v0.6 --name=thhelm1juphub --namespace=ouseful -f config.yaml
##locally:
##helm install path/to/my/files
##helm install my_chart.tgz

## When it's running, find the IP address
kubectl --namespace=ouseful get svc proxy-public
|------
NAME           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
proxy-public   10.104.156.186   localhost     80:32292/TCP,443:32125/TCP   1h
|------
## In this case, we can find Jupyterhub at: 127.0.0.1:80

##Bring everything down with:
helm delete ouseful --purge
kubectl delete namespace ouseful

One of the things I’d like to do is explore things like customise the notebook container start page:

…and the holding page as the server starts up:

and the error pages (at the moment, this is as far as I can get:-(

(I seem to remember there were issues setting up the helm chart to work with Azure the first time, so I’m hoping there’s a similar simple fix to get my local environment working. If you can point me to possible solutions via the comments, that would be appreciated:-)

I can also easily raise a 404 error page:

It looks as if the template pages are in the jupyterhub repo here: share/jupyterhub/templates so while things are still broken, now might be a good time for me to try some simple (branded) customisations of them.

At the moment, there are two things I need to clarify for myself (help via the comments appreciated!):

  • where is the path to the Jupyterhub container specified? Ideally, I want to build a local image containing customised start/error pages.
  •  where is the path to the Jupyter notebook image served by Jupyterhub specified so that I can serve my own test containers?

Clarification here: only one container is required – the one that will be served – but :

The Docker image you are using must have the jupyterhub package installed in order to work. Moreover, the version of jupyterhub must match the version installed by the helm chart that you’re using. For example, v0.5 of the helm chart usesjupyterhub==0.8.

To use a new image, tweak the config.yaml file, for example:

singleuser:
  image:
    name: jupyter/scipy-notebook
    tag: c7fb6660d096

DIT4C Inspired RobotLab Container

A couple of days ago I posted a quick demo of how to run an old OU Windows app under wine in a Docker container that exposed the app running on a graphical desktop via a browser.

One of the the problems with that demo is that it doesn’t provide a way of uploading or downloading files, so unless the user runs the container with a volume mounted against something that does support file transfer, users couldn’t persist files outside the container.

I don’t know whether the audio works either (though I didn’t try…) but as that recipe was based on a container I’d used to run the Audacity audio editor previously, I’m guessing it should…?

Anyway… regarding the file transfer issue, I recalled putting together a container last year for my PhD student who was looking for a portable way of demoing a desktop Ruby app. Digging that out, it was easily repurposed to run RobotLab. The base container is the DIT4C X11 container. The DIT4C project lost its funding last year, I think, but the repos and Docker images still exist and they provide a really great set of examples of prebuilt containerised apps. Exactly the sort of apps I’d want on my HE Library Digital App shelf. A bit dated now, perhaps, but as I when I get a chance I’ll start trying to refresh them.

The base Dockerfile is relatively straightforward:

#Base container
FROM dit4c/dit4c-container-x11:debian

#Required in order to add backports repo
RUN apt-get update && apt-get install -y software-properties-common 

# Install wine
RUN dpkg --add-architecture i386
RUN echo "deb http://httpredir.debian.org/debian jessie-backports main" | sudo tee /etc/apt/sources.list.d/docker.list
RUN apt-get update && apt-get install -y -t jessie-backports wine

# /var contains HTML site for container homepage
COPY var /var

# RobotLab windows files
COPY Apps/ /opt/Apps

# profile.d environment vars
COPY etc /etc

# Desktop shortcuts and application icons
COPY usr /usr
RUN chmod +x /usr/share/applications/robotlab.desktop
RUN chmod +x /usr/share/applications/neural.desktop
RUN chmod +x /usr/share/applications/remote.desktop

#Add items to top toolrail on desktop
RUN LNUM=$(sed -n '/launcher_item_app/=' /etc/tint2/panel.tint2rc | head -1) && \
  sed -i "${LNUM}ilauncher_item_app = /usr/share/applications/robotlab.desktop" /etc/tint2/panel.tint2rc && \
sed -i "${LNUM}ilauncher_item_app = /usr/share/applications/neural.desktop" /etc/tint2/panel.tint2rc && \
sed -i "${LNUM}ilauncher_item_app = /usr/share/applications/remote.desktop" /etc/tint2/panel.tint2rc

The local build / run / push to dockerhub process is trivial:

cp Dockerfile_dit4c_x11 Dockerfile
docker build -t psychemedia/robtolab2 .
docker run -d -p 8082:8080 psychemedia/robtolab2
docker push psychemedia/robtolab2

As to what it looks like, here’s the home page:

Files can be uploaded to / downloaded from the container using the File Management link:

The Linux Desktop Session link takes you to a Linux desktop. Several tools are available in the toolbar at the top of the desktop, including a terminal and the RobotLab application.

Clicking on the RobotLab icon runs it under wine – on first run we seem to get an update message. Maybe I need to run wine as part of the start-up procedure to handle this as part of the build process?

As well as the RobotLab application, we can run other tools in the RobotLab suite, such as the Neural neural network package:

Note that I haven’t tested the student activities yet – this is still early days in just trying to work out what the tech requirements are and how the workflow / student user experience might play out…

Repo is here (it contains redundant RobotLab stuff such as drivers for the original Lego Mindstorms infra-red towers, but I haven’t yet worked out what can be dropped and what RobotLab requires…): ouseful-course-containers/ou-tm129-robotlab.

A container is also available on Dockerhub as ousefulcoursecontainers/ou-tm129-robotlab but I haven’t had chance to check it yet… (it’s still in the build queue…).

The container can be imported into a desktop Docker environment using Kitematic:

robotlab_kitematic0

Once added, the container can also be run using Kitematic, with the user clicking on a link to take then directly to an appropriate location in their web browser:

robotlab_kitematic

The desktop / Kitematic route also enables file sharing with a local directory mounted into the container (though by the looks of it, I may need to tidy that up a little?).

The pieces are slowly starting to come together…?

Scripted Forms Magic

One of the things I think we could be looking at more in open ed in general, and the OU in particular, are authoring environments that allow educators to create their own interactives.

A significant blocker is the skills required to create and deploy such things:

  • activity design in general;
  • front end / UI coding (HTML / js)
  • back end coding (application logic, js, or py, for example)
  • runtime management (js can run in a browser, R or py likely to require a backend kernel to execute the code).

One of the things that appeals to me about the Jupyter and RStudio environments / ecosystems is the availability of packages that support end-user development by wrapping high level UI components as templated widgets that can be automatically generated from code. (See for example my own attempts at creating an HTML hexjson widget for rendering a hex map given an R dataframe.)

One of the things I started to come round to when creating various binderhub demos to show off how Jupyter notebooks can be used to generate and embed rich media assets for different topic areas (see slides 25-33 of the presentation from Dev.ac.uk embedded below, for example) was that a good collection of IPython magics could provide the basis for end-user developed online materials featuring embedded interactives.

For example, the py / folium magic I started working on makes it relatively straightforward to create an embedded map within a Jupyter notebook, that can then be saved and rendered as a standalone HTML page featuring an embedded interactive map.

It struck me that the magic should also work in the context of Scriptedforms, eg as per Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter.

And it does…

The markdown (text) file, scriptedforms_magic.md, on the left can be fired up as the HTML + interactive form on the right from a single command line command:

scriptedforms scriptedforms_magic.md

You can also do “live development” by editing and saving the markdown and then reloading the browser page.

By developing a sensible set of IPython magics, authors with some limited technical skills should be able to write enough “code”, as per the “code” in the markdown document above, to:

  • identify form elements;
  • bind those form elements as parameters in an IPython magic call.

After all, we keep saying everyone should be able to code… and that’s a level of coding where you can be immediately productive…

Everybody Should Code? Well, Educators Working in Online Learning in a “Digital First” Organisation Probably Should?

Pondering Sheila’s thoughts on Why I Don’t Code?, and mulling over some materials we’re prepping as revisions to the database course…

I see the update as an opportunity to try to develop the ed tech a bit, make it a bit more interactive, make it a bit “responsive” to students so they can try stuff out and get meaningful responses back that helps them check what they’ve been doing has worked. (I’ve called such approaches “self-marking” before, eg in the context of robotics, where it’s obvious if your robot has met the behavioural goal specified in some activity (eg go forwards and backwards the same distance five times, or traverse a square).)

Some the best activities in this regard, to my mind at least, can be defined as challenges, with a “low floor, high ceiling” as I always remember Mike Reddy saying in the context of educational robotics outreach. Eg how perfect a square, or how quickly can your robot traverse it. And can it reverse it?

Most kids/students can get the robot to do something, but how well can they get it to do something in particular, and how efficiently? There are other metrics too – how elegant does the code look, or how spaghetti like?!

So prepping materials, I keep getting sidetracked by things we can do to try to support the teaching. One bit I’m doing is getting students to link database tables together, so it seems sensible to try to find, or come up with, a tool that lets students make changes to the database and then look at the result in some meaningful way to check their actions have led to the desired outcome.

There are bits and bobs of things I can reuse, but they require:

1) gluing stuff together to make the student use / workflow appropriate. That takes a bit of code (eg to make some IPython magic that provides a oneliner capable of graphically depicting some updated configuration or other );
2) tweaking things that don’t work quite how want them, either in terms of presentation / styling, or functionality – which means being able to read other people’s code and change the bits we want to change;
3) keeping track of work – and thinking – in progress, logging issues, sketching out possible fixes, trying to explore the design space, and use space, including looking for alternative – possibly better – ways of achieving the same thing; which means trying to get other people’s code to work, which in turn means coming up with and trying out small examples, often based on reading documentation, looking for examples on Stack Overflow, or picking through tests to find out how to call what function with what sorts of parameter values.

Code stuff; very practical programmingy code-y stuff. That’s partly how I see code being used – on a daily basis – by educators working in a institution like the OU that claims to pride itself on digital innovation: educators coming up with small bits of code to implement micro digital innovations that directly support/benefit the teaching/learning materials and enhance them, technologically… Code as part of everyday practice for online digital educators. (Remember: “everyone should code”…)

It’s also stuff that tests my own knowledge of how things are supposed to work in principle – as well as practice. This plays out through trying to make sure that the tech remains relevant to the educational goal, and ideally helps reinforce it, as well as providing a means for helping the students act even more effectively as independent learners working at a distance…

And to my mind, it’s also part of our remit, if we’re supposed to be trying to make the most use of the whole digital thing… And if we can’t do this in a computing department, then where can – or should – we do it?

PS I should probably have said something about this being relevant to, or an example of, technology enhanced learning, but I think that phrase has been lost to folk who want to collect every bit of data they can from students? Ostensibly, this is for “learning analytics”, which is to say looking at ways of “optimising” students, rather than optimising our learning materials (by doing the basic stuff like looking at web stats as I’ve said repeatedly). I suspect there’s also an element of plan B – if we have a shed load of data, we can sell it to some one as part of every other fire sale.

Potential Issues With Institutionally Mediated Reproducible Research Environments

One of the advantages, for me, of the Jupyter Binderhub enviornment is that it provides with a large amount of freedom to create my own computational environment in the context of a potentially managed institutional service.

At the moment, I’m lobbying for an OU hosted version of Binderhub, probably hosted via Azure Kubernetes, for internal use in the first instance. (It would be nice if we could also be part of an open and federated MyBinder provisioning service, but I’m not in control of any budgets.) But in the meantime, I’m using the open MyBinder service (and very appreciative of it, too).

To test the binder builds locally, I use repo2docker, which is also used as part of the Binderhub build process.

What this all means is that I should be able to write – and test – notebooks locally, and know that I’ll be able to run them “institutionally” (eg on Binderhub).

However, one thing I noticed today was that notebooks in a binder container that was running okay, and that still builds and runs okay locally, have broken when run through Binderhub.

I think the error is a permissions error in creating temporary directories or writing temporary image files in either the xelatex commandline command used to generate a PDF from the LaTeX script, or the ImageMagick convert command used produce an image from the PDF which are both used as part of some IPython magic that renders LaTeX tikz diagram generating scripts. It certainly affects a couple of my magics. (It might be an issue with the way the magics are defined too. But whatever the case, it works for me locally but not “institutionally”.)

Broken notebook: https://mybinder.org/v2/gh/psychemedia/showntell/maths?filepath=Mechanics.ipynb
magic code: https://github.com/psychemedia/showntell/tree/maths/magics/tikz_magic
Error is something to do with the ImageMagick convert command not converting the .pdf to an image. At least one of issues seems to be that ghostscript is lost somewhere?

So here’s the issue. Whilst the notebooks were running fine in a container generated from an image that was itself created presumably before a Binderhub update, rebuilding the image (potentially without making any changes to the source Github repository) can lead to notebooks that were running fine to break.

Which is to say, there may be a dependency in the way a repository defines an environment on some of the packages installed by the repo2docker build process. (I don’t know if we can fully isolate out these dependencies by using a Dockerfile to define the environment rather than apt.txt and requirements.txt?)

This raises a couple of questions for me about dependencies:

  • what sort of dependency issues might there be in components or settings introduced by the jupyter2repo process, and how might we mitigate against these?
  • are there other aspects of the Binderhub process that can produce breaking changes that impact on notebooks running in a repository that specifies a computational environment run via Binderhub?

Institutionally, it also means that environments run via an institutionally supported Binderhub environment could break downstream environments (that is, ones run via Binderhub) through updates to the Binderhub environment.

This is a really good time for this to happen to me, I think, because it gives me more things to think about when considering the case for providing a Binderhub service institutionally.

On the other hand, it means I can’t update any of the other repos that use the tikz or asymptote magic until I find the fix because otherwise they will break too…

Should users of the institutional service, for example, be invited to define test areas in their Binder repositories (for example, using nbval) that the institution can use as test cases when making updates to the institutional service? If errors are detected through the running of these tests by the institutional service provider against their users’ tests, then the institutional service provider could explore whether the issue can be addressed by their update strategy, or alert the Binderhub user there may be breaking changes and how to explore what they are or mitigate against them. (That is, perhaps it falls to the institutional provider to centrally explore the likely common repercussions of a particular update and identify fixes to address them?)

For example, there might be dependencies on particular package version numbers. In this case, the user might then either want to update their own code, or add in a build requirement that regresses the package to the desired version. (Institutional providers might have something to say about that if the upgrade was for valid security reasons, though running things in isolation in containers should reduce that risk?) Lists of affected packages could also be circulated to other users using the same packages, along with mitigation strategies for coping with updates to the institutionally provided service.

There are also updating issues associated with a workflow strategy I am exploring around Binderhub which relates to using “base containers” to seed Binderhub builds (Note On My Emerging Workflow for Working With Binderhub). For example, if a build uses a “latest” tagged base image, any updates to that base image may break things built on top of it. In this case, mitigating against update risk to the base container is achieved by building from a specifically tagged version of the container. However, if an update to the Binderhub environment can break notebooks running on top of a particularly labelled base container, the fix for the notebooks may reside in making a fix to the environment in the base container (for example, which specifically acts to enforce a package version). This suggests that the base container might need doubly tagging – one tag paying heed to the downstream end users (“buildForExptXYZ”) – and the other that captures the upstream Binderhub environment (“BinderhubBuildABC”).

I’m also wondering know about where responsibility arises for maintaining the integrity of the user computing environment (that is, the local computational environment within which code in notebooks should continue to operate once the user has defined their environment). Which is to say, if there are changes to the wider environment that somehow break that local user environment, who should help fix it? If the changes are likely to impact widely, it makes sense to try to fix it once and then share the change, rather than expecting every user suffering from the break to have to find the fix independently?

Also, I’m wondering about classes of error that might arise. For example, ones that can be fixed purely by changing the environmental definition (baking package versions into config files, for example, which is probably best practice anyway) and ones that require changes to code in notebooks?

PS Hmm.. noting… are whitelists and blacklists also specifiable in Binderhub config? eg https://github.com/jupyterhub/mybinder.org-deploy/pull/239/files

binderhub:
  extraConfig:
    bans:
        c.GitHubRepoProvider.banned_specs = [
          '^GITHUBUSER/REPO.*'
        ]

Fragment – Virtues of a Programmer, With a Note On Web References and Broken URLs

Ish-via @opencorporates, I came across the “Virtues of a Programmer”, referenced from a Wikipedia page, in a Nieman Lab post by Brian Boyer on Hacker Journalism 101,, and stated as follows:

  • Laziness: I will do anything to work less.
  • Impatience: The waiting, it makes me crazy.
  • Hubris: I can make this computer do anything.

I can buy into those… Whilst also knowing (from experience) that any of the above can lead to a lot of, erm, learning.

For example, whilst you might think that something is definitely worth automating:

the practical reality may turn out rather differently:

The reference has (currently) disappeared from the Wikipedia page, but we can find it in the Wikipedia page history:

Larry_Wall_-_Wikipedia_old

The date of the NiemanLab article was 

Larry_Wall__Revision_history_-_Wikipedia

So here’s one example of a linked reference to a web resource that we know is subject to change and that has a mechanism for linking to a particular instance of the page.

Academic citation guides tend to suggest that URLs are referenced along with the date that the reference was (last?) accessed by the person citing the reference, but I’m not sure that guidance is given that relates to securing the retrievability of that resource, as it was accessed, at a later date. (I used to bait librarians a lot for not getting digital in general and the web in particular. I think they still don’t…;-)

This is an issue that also hits us with course materials, when links are made to third party references by URI, rather than more indirectly via a DOI.

I’m not sure to what extent the VLE has tools for detecting link rot (certainly, they used to; now it’s more likely that we get broken link reports from students failing to access a particular resource…) or mitigating against broken links.

One of the things I’ve noticed from Wikipedia is that it has a couple of bots for helping maintain link integrity: InternetArchiveBot and Wayback Medic.

Bots help preserve link availability in several ways:

  • if a link is part of a page, that link can be submitted to an archiving site such as the Wayback machine (or if it’s a UK resource, the UK National Web Archive);
  • if a link is spotted to be broken (header / error code 404), it can be redirected to the archived link.

One of the things I think we could do in the OU is add an attribute to the OU-XML template that points to an “archive-URL”, and tie this in with service that automatically makes sure that linked pages are archived somewhere.

If a course link rots in presentation, students could be redirected to the archived link, perhaps via a splash screen (“The original resource appears to have disappeared – using the archived link”) as well as informing the course team that the original link is down.

Having access to the original copy can be really helpful when it comes to trying to find out:

  • whether a simple update to the original URL is required (for example, the page still exists in its original form, just at a new location, perhaps because of a site redesign); or,
  • whether a replacement resource needs to be found, in which case, being able to see the content of the original resource can help identify what sort of replacement resource is required.

Does that count as “digital first”, I wonder???