Running lOCL

So I think I have the bare bones of a lOCL (local Open Computing Lab) thing’n’workflow running…

I’m also changing the name… to VOCL — Virtual Open Computing Lab … which is an example of a VCL, Virtual Computing Lab, that runs VCEs, Virtual Computing Environments. I think…

If you are on Windows, Linux, Mac or a 32 bit Raspberry Pi, you should be able to do the following:

Next, we will install a universal browser based management tool, portainer:

  • install portainer:
    • on Mac/Linux/RPi, run: docker run -d -p 80:8000 -p 9000:9000 --name=portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock portainer/portainer-ce
    • on Windows, the start up screen suggests docker run -d -p 80:8000 -p 9000:9000 --name=portainer --restart=always -v \\.\pipe\docker_engine:\\.\pipe\docker_engine portainer/portainer-ce may be the way to go?

On my to do list is to customise portainer a bit and call it something more lOCL-ish.

On first run, portainer will prompt you for an admin password (at least 8 characters).

You’ll then have to connect to a Docker Engine. Let’s use the local one we’re actually running the application with…

When you’re connected, select to use that local Docker Engine:

Once you’re in, grab the feed of lOCL containers (originally at https://raw.githubusercontent.com/ouseful-demos/templates/master/ou-templates.json, but NOW IN the OpenComputingLab/locl-templates Github repo) and use it to feed the portainer templates listing:

From the App Templates, you should now be able to see a feed of example containers:

The [desktop only] containers can only be run on desktop (amd64) processors, but the others should run on a desktop computer or on a Raspberry Pi using docker on a 32 bit Raspberry Pi operating system.

By default, when you launch a container, it is opened onto the domain 0.0.0.0. This can be changed to the actual required domain via the Endpoints configuration page. For example, my Raspberry Pi appears on raspberrypi.local, so if I’m running portainer against that local Docker endpoint, I can configure the path as follows:

>I should be able to generate Docker images for the 64 bit RPi O/S too, but need to get a new SD card… Feel free to chip in to help pay for bits and bobs — SD cards, cables, server hosting, an RPi 8GB and case, etc — or a quick virtual coffee along the way…

The magic that allows containers to be downloaded to Raspberry Pi devices or desktop machines is based on:

  • Docker cross-builds (buildx), which allow you to build containers targeted to different processors;
  • Docker manifest lists that let you create an index of images targeted to different processors and associate them with a single "virtual" image. You can then docker pull X and depending on the hardware you’re running on, the appropriate image will be pulled down (a quick sketch of both steps follows this list).
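The sketch, using illustrative image names rather than the actual lOCL ones:

# Build and push an image for several target platforms in one step
docker buildx build --platform linux/amd64,linux/arm/v7 -t example/vce-demo:latest --push .

# Alternatively, stitch separately built per-architecture images
# into a manifest list by hand
docker manifest create example/vce-demo:latest example/vce-demo:amd64 example/vce-demo:armv7
docker manifest push example/vce-demo:latest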

For more on cross built containers and multiple architecture support, see Multi-Platform Docker Builds. This describes the use of manifest lists which let us pull down architecture appropriate images from the same Docker image name. See also Docker Multi-Architecture Images: Let docker figure the correct image to pull for you.

To cross-build the images, and automate the push to Docker Hub, along with an appropriate manifest list, I used a Github Action workflow using the recipe described here: Shipping containers to any platforms: multi-architectures Docker builds.
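For reference, a minimal sketch of the sort of workflow file involved, using the official Docker actions (the action versions and image name here are illustrative, not necessarily what that recipe uses):

name: buildx
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Check out the repo containing the Dockerfile
      - uses: actions/checkout@v2
      # QEMU provides the Arm emulation that the cross-build runs on
      - uses: docker/setup-qemu-action@v1
      - uses: docker/setup-buildx-action@v1
      - uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      # Build for both platforms and push the manifest list to Docker Hub
      - uses: docker/build-push-action@v2
        with:
          platforms: linux/amd64,linux/arm/v7
          push: true
          tags: example/vce-demo:latest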

Here’s a quick summary of the images so far; generally, they either run just on desktop machines (specifically, these are amd64 images, though I think that’s the default for Docker images anyway, at least until folk start buying the new M1 Macs…) or run on both desktop machines and the Raspberry Pi:

  • Jupyter notebook (oulocl/vce-jupyter): a notebook server based on andresvidal/jupyter-armv7l because it worked on RPi; this image runs on desktop and RPi computers. I guess I can now start iterating on it to make a solid base Jupyter server image. The image also bundles pandas, matplotlib, numpy, scipy and sklearn. These seem to take forever to build using buildx so I built wheels natively on an RPi and added them to the repo so the packages can be installed directly from the wheels. Python wheels are named according to a convention which bakes in things like the Python version and processor architecture that the wheel is compiled for (a name like kiwisolver-1.1.0-cp38-cp38-linux_armv7l.whl, for example, identifies a wheel built for CPython 3.8 on 32 bit Arm).

  • the OpenRefine container should run absolutely everywhere: it was built using support for a wide range of processor architectures;

  • the TM351 VCE image is the one we shipped to TM351 students in October; desktop machines only at the moment…

  • the TM129 Robotics image is the one we are starting to ship to TM129 students right now; it needs a rebuild because it’s a bit bloated, but I’m wary of doing that with students about to start; hopefully I’ll have a cleaner build for the February start;

  • the TM129 POC image is a test image to try to get the TM129 stuff running on an RPi; it seems to, but the container is full of all sorts of crap as I tried to get it to build the first time. I should now try to build a cleaner image, but I should really refactor the packages that bundle the TM129 software first because they distribute the installation weight and difficulty in the wrong way.

  • the Jupyter Postgres stack is a simple Docker Compose proof of concept that runs a Jupyter server in one container and a PostgreSQL server in a second, linked container. This is perhaps the best way to actually distribute the TM351 environment, rather than the monolithic bundle. At the moment, the Jupyter environment is way short of the TM351 environment in terms of installed Python packages etc., and the Postgres database is unseeded.

  • TM351 also runs a Mongo database, but there are no recent or supported 32 bit Mongo databases any more, so that will have to wait till I get a 64 bit O/S running on my RPi. A test demo with an old/legacy 32 bit Mongo image did work okay in a docker-compose portainer stack, and I could talk to it from the Jupyter notebook. It’s a bit of a pain because it means we won’t be able to have the same image running on 32 and 64 bit RPis. And TM351 requires a relatively recent version of Mongo (old versions lack some essential functionality…).

Imagining a Local Open Computing Lab Server (lOCL)

In the liminal space between sleep and wakefulness of earlier this morning, several pieces of things I’ve been pondering for years seemed to come together:

  • from years ago, digital application library shelves;
  • from months ago, a containerised Open Computing Lab (now very deprecated; things have moved on…);
  • from the weekend, trying to get TM129 and TM351 software running in containers on a Raspberry Pi 400;
  • from yesterday, a quick sketch with Portainer.

And today, the jigsaw assembled itself in the form of a local Open Computing Lab environment.

This combines a centrally provided, consistently packaged approach to the delivery of self-contained virtualised computational environments (VCEs), in the form of Docker containerised services and applications accessed over http, with a self-hosted (locally or remotely) environment that provides discovery, retrieval and deployment of these VCEs.

At the moment, for TM351 (live in October 2020), we have a model where students on the local install route (we also have a hosted offering) proceed as follows:

  • download and install Docker
  • from the command line, pull the TM351 VCE
  • from the command line, start up the VCE and then access it from the browser.

In TM129, there is a slightly simpler route available:

  • download and install Docker
  • from the command line, pull the TM129 VCE
  • using the graphical Docker Dashboard (Windows and Mac), launch and manage the container.

The difference arises from the way a default Jupyter login token is set in the TM129 container, but a random token is generated in the TM351 VCE, which makes logging in trickier. At the moment, we can’t set the token (as an environment variable) via the Docker Dashboard, so getting into the TM351 container is fiddly. (Easily done, but another step that requires fiddly instructions.)
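(From the command line, setting the token is just a case of passing an environment variable at startup, something along the lines of the following, with an illustrative image name:)

docker run -p 8888:8888 -e JUPYTER_TOKEN="letmein" example/tm351vce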

Tinkering with the Raspberry Pi over the last few days, and thinking about the easiest route to setting it up as a Docker engine server that can be connected to over a network, I was reminded by @kleinee in a Twitter exchange of the open-source portainer application [repo], a browser based UI for managing Docker environments.

Portainer offers several things:

  • the ability to connect to local or remote Docker Engines
  • the ability to manage Docker images, including pulling them from a specified repo (DockerHub, or a private repo)
  • the ability to manage containers (start, stop, inspect, view logs, etc); this includes the ability to set environment variables at start-up
  • the ability to run compositions
  • a JSON feed powered menu listing a curated set of images / compositions (a sketch of the feed format follows below).
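For a flavour of what such a feed looks like, here’s a minimal single-template sketch (the field values are illustrative, and the exact schema varies across portainer versions):

{
  "version": "2",
  "templates": [
    {
      "type": 1,
      "title": "Jupyter notebook",
      "description": "Minimal Jupyter notebook server",
      "image": "oulocl/vce-jupyter",
      "ports": ["8888/tcp"]
    }
  ]
}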

So how might this work?

  • download and install Docker
  • pull a lOCL portainer Docker image and set it running
  • login
  • connect to a Docker Engine; in the screenshot below, the Portainer application is running on my RPi
  • view the list of course VCEs
  • select one of the listed VCEs to run it using its predefined settings
  • customise container settings via advanced options
  • we can also create a container from scratch, including setting environment variables, start directories etc
  • we can specify volume mounts etc

All the above uses the default UI, with custom settings via a control panel to set the logo and specify the application template feed (the one I was using is here).

I’m also using the old portainer UI (I think) and need to try out the new one (v2.0).

So… next steps?

  • fork the portainer repo and do a simplified version of it (or at least, perhaps use CSS display:none to hide some of the more confusing elements, perhaps toggled with a ‘simple/full UI’ button somewhere)
  • cross build the image for desktop machines (Win, Mac etc) and RPi
  • cross build TM351 and TM129 VCE images for desktop machines and RPi, and perhaps also some other demo containers, such as a minimal OU branded Jupyter notebook server and perhaps an edge demo OU branded Jupyter server with lots of my extensions pre-installed. Maybe also package up an environment and teaching materials for the OpenLearn Learn to Code for Data Analysis course as a demo for how OpenLearn might be able to make use of this approach
  • instructions for set up on:
    • desktop computer (Mac, Win, Linux)
    • RPi
    • remote host (Digital Ocean is perhaps simplest; portainer does have a tab for setting up against Azure, but it seems to require finding all sorts of fiddly tokens)

Two days, I reckon, to pull bits together (so four or five, when it doesn’t all "just" work ;-)

But is it worth it?

Keeping Track of TM129 Robotics Block Practical Activity Updates…

Not being the sort of person to bid for projects that require project plans and progress reports and end of project reports, and administration, and funding that you have to spend on things that don’t actually help, I tend to use this blog to keep track of things I’ve done in the form of blog posts, as well as links to summary presentations of work in progress (there is nothing ever other than work in progress…).

But I’ve not really been blogging as much as I should, so here’s a couple of links to presentations that I gave last week relating to the TM129 update:

  • Introducing RoboLab: an integrated robot simulator and Jupyter notebook environment for teaching and learning basic robot programming: this presentation relates to the RoboLab environment I’ve been working on that integrates a Javascript robot simulator based on ev3devsim, wrapped in a jp_proxy_widget, with Jupyter notebook based instructional material. RoboLab makes heavy use of Jupyter magics to control the simulator, download programs to it, and retrieve logged sensor data from it. I think it’s interesting but no-one else seems to. I had to learn a load of stuff along the way: Javascript, HTML and CSS not the least among them.
  • Using Docker to deliver virtual computing environments (VCEs) to distance education students: this represents some sort of summary of my thinking around delivering virtualised software to students in Docker containers. We’ve actually been serving containers via Kubernetes on Azure, using LTI-authed links from the Moodle VLE to launch temporary Jupyter notebook servers via an OU hosted JupyterHub server, since Spring, 2018, and shipped the TM351 VCE (virtual computing environment) to TM351 students this October, with TM129 students just starting to access their Dockerised VCE, which also bundles all the practical activity instructional notebooks. I believe that the institution is looking to run a "pathfinder" project (?) regarding making containerised environments available to students in October, 2021. #ffs Nice to know when your work is appreciated, not… At least someone will make some internal capital in promotion and bonus rounds from that groundbreaking pathfinder work via OU internal achievement award recognition in 2022.

The TM129 materials also bundle some neural network / MLP / CNN activities that I intend to write up in a similar way at some point (next week maybe; but the notes take f****g hours to write…). I think some of the twists in the way the material is presented are quite novel, but then, wtf do I know.

There’s also bits and bobs I explored relating to embedding audio feedback into RoboLab, which I thought might aid accessibility as well as providing a richer experience for all users. You’d have thought I might be able to find someone, anyone, in the org who might be interested in bouncing more ideas around that (we talk up our accessibilitiness(?!)), or maybe putting me straight about why it’s a really crappy and stupid thing to do, but could I find a single person willing to engage on that? Could I f**k…

In passing, I note I ranted about TM129 last year (Feb 2019) in I Just Try to Keep On Keeping On Looking at This Virtual(isation) Stuff…. Some things have moved on, some haven’t. I should probably do a reflective thing comparing that post with the things I ended up tinkering with as part of the TM129 Robotics block practical activity update, but then again, maybe I should go read a book or listen to some rally podcasts instead…

Quick Tinker With Raspberry Pi 400

Some quick notes on a quick play with my Raspberry Pi 400 keyboard thing…

Plugging it in to my Mac (having found the USB2ethernet dongle because Macs are too "thin" to have proper network sockets), and having realised the first USB socket I tried on my Mac doesn’t seem to work (no idea if the socket is dead generally, or just with the dongle), I plugged an ethernet cable between the Mac and the RPi 400 and tried a ping, which seemed to work:

ping raspberrypi.local

then tried to SSH in:

ssh pi@raspberrypi.local

No dice… seems that SSH is not enabled by default, so I had to find the mouse and HDMI cable, rewire the telly, go into the Raspberry Pi Configuration tool, Interfaces tab, and check the ssh option, unwire everything, reset the telly, redo the ethernet cable between Mac and RPi 400 and try again:

ssh pi@raspberrypi.local

and with the default raspberry password (unchanged, of course, or I might never get back in again!), I’m in. Yeah:-)

> I think the set-up just requires a mouse, but not a keyboard. If you buy a bare bones RPi, I think this means to get running you need: RPi+PSU+ethernet cable, then for the initial set-up: mouse + micro-HDMI cable + access to screen with HDMI input.

> You should also be able to just plug your RPi 400 into your home wifi router using an ethernet cable, and the device should appear (mine did…) under the hostname raspberrypi.local.

> Security may be an issue, so we need to tell users to change the pi password when they have keyboard access. During setup, users could unplug the incoming broadband cable from their home router until they have a chance to reset the password, or switch off wifi on their laptop etc. if they set up via an ethernet cable connection to the laptop etc.

Update everything (I’d set up the Raspberry Pi’s connection settings to our home wifi network when I first got it, though with a direct ethernet cable connection, you shouldn’t need to do that?):

sudo apt update && sudo apt upgrade -y

and we’re ready to go…

Being of a trusting nature, I’m lazy enough to use the Docker convenience installation script:

curl -sSL https://get.docker.com | sh

then add the pi user to the docker group:

sudo usermod -aG docker pi

and: logout

then ssh back in again…

I’d found an RPi Jupyter container previously at andresvidal/jupyter-armv7l (Github repo: andresvidal/jupyter-armv7l), so does it work?

docker run -p 8877:8888 -e JUPYTER_TOKEN="letmein" andresvidal/jupyter-armv7l

It certainly does… the notebook server is there and running on http://raspberrypi.local:8877 and the token letmein does what it says on the tin…

> For a more general solution, just install portainer (docker run -d -p 80:8000 -p 9000:9000 --name=portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock portainer/portainer-ce) and then go to http://raspberrypi.local:9000 via a browser and you should be able to install / manage Docker images and containers via that UI.

Grab a bloated TM129 style container (no content):

docker pull outm351dev/nbev3devsimruns

(note that you may need to free up space on the SD card; suggested deletions appear further down this post).

To autostart the container, edit /etc/rc.local (sudo nano /etc/rc.local) and add the following before the exit 0 line:

docker run -d -p 80:8888 --name tm129vce  -e JUPYTER_TOKEN="letmein" outm351dev/nbev3devsimruns

Switch off / unplug the RPi and switch it on again, and the server should be viewable at http://raspberrypi.local with the token letmein. Note that files are not mounted onto the desktop. They could be, but I think I heard somewhere that repeated backup writes every few seconds may degrade the SD card over time?
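(For reference, a bind mounted variant of that startup command would look something like the following, assuming the notebook directory inside the image is /home/jovyan, which will vary from image to image:)

docker run -d -p 80:8888 --name tm129vce -e JUPYTER_TOKEN="letmein" -v /home/pi/notebooks:/home/jovyan outm351dev/nbev3devsimruns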

How about if we try docker-compose?

This isn’t part of the docker package, so we need to install it separately:

pip3 install docker-compose

(I think that pip may be set up to implicitly use --extra-index-url=https://www.piwheels.org/simple which seems to try to download prebuilt RPi wheels from piwheels.org…?)
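(On Raspberry Pi OS, that default typically lives in /etc/pip.conf, along the lines of:)

[global]
extra-index-url=https://www.piwheels.org/simple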

The following docker-compose.yaml file should load a notebook container wired to linked PostgreSQL and MongoDB containers.

version: "3.5"

services:
  tm351:
    image: andresvidal/jupyter-armv7l
    environment:
      JUPYTER_TOKEN: "letmein"
    volumes:
      - "$PWD/TM351VCE/notebooks:/home/jovyan/notebooks"
      - "$PWD/TM351VCE/openrefine_projects:/home/jovyan/openrefine"
    networks:
      - tm351
    ports:
      - 8866:8888

  postgres:
    image: arm32v7/postgres
    environment:
      POSTGRES_PASSWORD: "PGPass"
    ports:
      - 5432:5432
    networks:
      - tm351

  mongo:
    image: apcheamitru/arm32v7-mongo
    ports:
      - 27017:27017
    networks:
      - tm351

networks:
  tm351:

Does it work?

docker-compose up

It does, I can see the notebook server on http://raspberrypi.local:8866/.

Can we get a connection to the database server? Try the following in a notebook code cell:

# Let's install a host of possibly useful helpers...
%pip install psycopg2-binary sqlalchemy ipython-sql

# Load in the magic...
%load_ext sql

# Set up a connection string
PGCONN='postgresql://postgres:PGPass@postgres:5432/'

# Connect the magic...
%sql {PGCONN}

Then in a new notebook code cell:

%%sql
DROP TABLE IF EXISTS quickdemo CASCADE;
DROP TABLE IF EXISTS quickdemo2 CASCADE;
CREATE TABLE quickdemo(id INT, name VARCHAR(20), value INT);
INSERT INTO quickdemo VALUES(1,'This',12);
INSERT INTO quickdemo VALUES(2,'That',345);

SELECT * FROM quickdemo;

And that seems to work too:-)

How about the Mongo stuff?

%pip install pymongo
from pymongo import MongoClient
#Monolithic VM addressing - 'localhost',27351
# docker-compose connection - 'mongo', 27017

MONGOHOST='mongo'
MONGOPORT=27017
MONGOCONN='mongodb://{MONGOHOST}:{MONGOPORT}/'.format(MONGOHOST=MONGOHOST,MONGOPORT=MONGOPORT)

c = MongoClient(MONGOHOST, MONGOPORT)

# And test
db = c.get_database('test-database')
collection = db.test_collection
post_id = collection.insert_one({'test':'test record'}).inserted_id

c.list_database_names()

A quick try at installing the ou-tm129-py package seemed to get stuck on the Installing build dependencies ... step, though I could install most packages separately, even if the builds were a bit slow (scikit-learn seemed to cause the grief?).

Running pip3 wheel PACKAGENAME seems to build .whl files into the local directory, so it might be worth creating some wheels and popping them on Github… The Dockerfile for the Jupyter container I’m using gives a crib:

# Copyright (c) Andres Vidal.
# Distributed under the terms of the MIT License.
FROM arm32v7/python:3.8

LABEL created_by=https://github.com/andresvidal/jupyter-armv7l
ARG wheelhouse=https://github.com/andresvidal/jupyter-armv7l/raw/master/wheelhouse

#...

RUN pip install \
    $wheelhouse/kiwisolver-1.1.0-cp38-cp38-linux_armv7l.whl # etc

Trying to run the nbev3devsim package to load the nbev3devsimwidget, jp_proxy_widget threw an error, so I raised an issue and it’s already been fixed… (thanks, Aaron:-)

Trying to install jp_proxy_widget from the repo threw an error — npm was missing — but the following seemed to fix that:

#https://gist.github.com/myrtleTree33/8080843

wget https://nodejs.org/dist/latest-v15.x/node-v15.2.0-linux-armv7l.tar.gz 

#unpack
tar xvzf node-v15.2.0-linux-armv7l.tar.gz 

mkdir -p /opt/node
cp -r node-v15.2.0-linux-armv7l/* /opt/node

#Add node to your path so you can call it with just "node"
#by appending these lines to your .bash_profile

PROFILE_TEXT="
PATH=\$PATH:/opt/node/bin
export PATH
"
echo "$PROFILE_TEXT" >> ~/.bash_profile
source ~/.bash_profile

# linking for sudo node (TO FIX THIS - NODE DOES NOT NEED SUDO!!)
ln -s /opt/node/bin/node /usr/bin/node
ln -s /opt/node/lib/node /usr/lib/node
ln -s /opt/node/bin/npm /usr/bin/npm
ln -s /opt/node/bin/node-waf /usr/bin/node-waf

From the notebook code cell, the nbev3devsim install requires way too much (there’s a lot of crap for the NN packages which I need to separate out… crap, crap, crap:-( Everything just hangs on sklearn AND I DON’T NEED IT.

%pip install https://github.com/AaronWatters/jp_proxy_widget/archive/master.zip nest-asyncio seaborn tqdm  nb-extension-empinken Pillow
%pip install sklearn
%pip install --no-deps nbev3devsim

So I have to stop now – way past my Friday night curfew… why the f**k didn’t I do the packaging more (c)leanly?! :-(

Memory Issues

There’s a lot of clutter on the memory card supplied with the Raspberry Pi 400, but we can free up some space quite easily:

sudo apt-get purge wolfram-engine libreoffice* scratch -y
sudo apt-get clean
sudo apt-get autoremove -y

# Check free space
df -h

Check O/S: cat /etc/os-release

Installing scikit-learn is an issue. Try adding more build support inside the container:

! apt-get update && apt-get install gfortran libatlas-base-dev libopenblas-dev liblapack-dev -y
%pip install scikit-learn

There is no Py3.8 wheel for sklearn on piwheels at the moment (only 3.7)?

More TM351 Components

Look up the processor:

cat /proc/cpuinfo

Returns: ARMv7 Processor rev 3 (v7l)

And:

uname -a

Returns:

Linux raspberrypi 5.4.72-v7l+ #1356 SMP Thu Oct 22 13:57:51 BST 2020 armv7l GNU/Linux

Better, get the family as:

PROCESSOR_FAMILY=`uname -m`
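That value can then be mapped onto a Docker platform string; a quick sketch covering the common cases (the variable name is made up):

case $PROCESSOR_FAMILY in
    # the 32 bit RPi o/s reports armv7l even on 64 bit capable hardware
    armv7l)  DOCKER_PLATFORM=linux/arm/v7 ;;
    aarch64) DOCKER_PLATFORM=linux/arm64 ;;
    x86_64)  DOCKER_PLATFORM=linux/amd64 ;;
esac
echo $DOCKER_PLATFORM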

I do think there are 64 bit RPis out there though, using ARMv8? And the RPi 400 advertises as "Featuring a quad-core 64-bit processor"? So what am I not understanding? Ah… https://raspberrypi.stackexchange.com/questions/101215/why-raspberry-pi-4b-claims-that-its-processor-is-armv7l-when-in-official-specif

So presumably, with the simple 32 bit O/S, we can’t use arm64v8/mongo and instead need a 32 bit Mongo, support for which was deprecated in Mongo 3.2? An old version is available here: https://hub.docker.com/r/apcheamitru/arm32v7-mongo

But TM351 has a requirement on a much more recent MongoDB… So maybe we do need a new SD card image? That could also be built as a much lighter custom image, perhaps with an OU customised desktop…

In the meantime, maybe worth moving straight to Ubuntu 64 bit server? https://ubuntu.com/download/raspberry-pi

Ubuntu installation guide for RPi: https://ubuntu.com/tutorials/how-to-install-ubuntu-on-your-raspberry-pi#1-overview

There also looks to be a 64 bit RPi / Ubuntu image with Docker already baked in here: https://github.com/guysoft/UbuntuDockerPi

Building an OpenRefine Docker container for Raspberry Pi

I’ve previously posted a cribbed Dockerfile for building an Alpine container that runs OpenRefine ( How to Create a Simple Dockerfile for Building an OpenRefine Docker Image), so let’s have a go at one for building an image that can run on an RPi:

FROM arm32v7/alpine

#We need to install git so we can clone the OpenRefine repo
RUN apk update && apk upgrade && apk add --no-cache git bash openjdk8

MAINTAINER tony.hirst@gmail.com
 
#Download a couple of required packages
RUN apk update && apk add --no-cache wget bash
 
#We can pass variables into the build process via --build-arg variables
#We name them inside the Dockerfile using ARG, optionally setting a default value
#ARG RELEASE=3.1
ARG RELEASE=3.4.1

#ENV vars are environment variables that get baked into the image
#We can pass an ARG value into a final image by assigning it to an ENV variable
ENV RELEASE=$RELEASE
 
#There's a handy discussion of ARG versus ENV here:
#https://vsupalov.com/docker-arg-vs-env/
 
#Download a distribution archive file
RUN wget --no-check-certificate https://github.com/OpenRefine/OpenRefine/releases/download/$RELEASE/openrefine-linux-$RELEASE.tar.gz
 
#Unpack the archive file and clear away the original download file
RUN tar -xzf openrefine-linux-$RELEASE.tar.gz && rm openrefine-linux-$RELEASE.tar.gz
 
#Create an OpenRefine project directory
RUN mkdir /mnt/refine
 
#Mount a Docker volume against the project directory
VOLUME /mnt/refine
 
#Expose the server port
EXPOSE 3333
 
#Create the start command.
#Note that the application is in a directory named after the release
#We use the environment variable to set the path correctly
CMD openrefine-$RELEASE/refine -i 0.0.0.0 -d /mnt/refine

Following the recipe here — Building Multi-Arch Images for Arm and x86 with Docker Desktop — we can build an arm32v7 image as follows:

# See what's available...
docker buildx ls

# Create a new build context (what advantage does this offer?)
docker buildx create --name rpibuilder

# Select the build context
docker buildx use rpibuilder

# And cross build the image for the 32 bit RPi o/s:
docker buildx build --platform linux/arm/v7 -t outm351dev/openrefinetest:latest --push .

For more on cross built containers and multiple architecture support, see Multi-Platform Docker Builds. This describes the use of manifest lists which let us pull down architecture appropriate images from the same Docker image name. For more on this, see Docker Multi-Architecture Images: Let docker figure the correct image to pull for you. For an example Github Action workflow, see Shipping containers to any platforms: multi-architectures Docker builds. For issues around new Mac Arm processors, see eg Apple Silicon M1 Chips and Docker.

With the image built and pushed, we can add the following to the docker-compose.yaml file to launch the container via port 3333:

  openrefine:
    image: outm351dev/openrefinetest
    ports:
      - 3333:3333

which seems to run okay:-)

Installing the ou-tm129-py package

Trying to build the ou-tm129-py package into an image is taking forever on the sklearn build step. I wonder about setting up a buildx process to use something like Docker custom build outputs to generate wheels. I wonder if this could be done via a Github Action with the result pushed to a Github repo?

Hmmm.. maybe this will help for now? oneoffcoder/rpi-scikit (and Github repo). There is also a cross-building Github Action demonstrated here: Shipping containers to any platforms: multi-architectures Docker builds. Official Docker Github Action here: https://github.com/docker/setup-buildx-action#quick-start

Then install node in a child container for the patched jp_proxy_widget build (for some reason, pip doesn’t run in the Dockerfile: need to find the correct py / pip path):

FROM oneoffcoder/rpi-scikit

RUN wget https://nodejs.org/dist/latest-v15.x/node-v15.2.0-linux-armv7l.tar.gz 

RUN tar xvzf node-v15.2.0-linux-armv7l.tar.gz 

RUN mkdir -p /opt/node
RUN cp -r node-v15.2.0-linux-armv7l/* /opt/node

RUN ln -s /opt/node/bin/node /usr/bin/node
RUN ln -s /opt/node/lib/node /usr/lib/node
RUN ln -s /opt/node/bin/npm /usr/bin/npm
RUN ln -s /opt/node/bin/node-waf /usr/bin/node-waf

and in a notebook cell try:

!pip install --upgrade https://github.com/AaronWatters/jp_proxy_widget/archive/master.zip
!pip install --upgrade tqdm
!apt-get update && apt-get install -y libjpeg-dev zlib1g-dev
!pip install --extra-index-url=https://www.piwheels.org/simple  Pillow #-8.0.1-cp37-cp37m-linux_armv7l.whl 
!pip install nbev3devsim
from nbev3devsim.load_nbev3devwidget import roboSim, eds

%load_ext nbev3devsim

Bah.. the Py 3.low’ness of it is throwing an error in nbev3devsim around character encoding of loaded in files. #FFS

There is actually a whole stack of containers at: https://github.com/oneoffcoder/docker-containers

Should I fork this and start to build my own, more recent versions? They seem to use conda, which may simplify the sklearn installation? But it looks like recent Py supporting packages aren’t there? https://repo.anaconda.com/pkgs/free/linux-armv7l/ ARRGGHHHH.

Even the "more recent" https://github.com/jjhelmus/berryconda is now deprecated.

Bits and pieces

Where do the packages used in your current Python environment live when you’re working in a Jupyter notebook?

from distutils.sysconfig import get_python_lib
print(get_python_lib())

So… I got scikit to pip install after who knows how long by installing from a Jupyter notebook code cell into a container that I think was based on the following:

FROM andresvidal/jupyter-armv7l

RUN pip3 install --extra-index-url=https://www.piwheels.org/simple myst-nb numpy pandas matplotlib jupytext plotly
RUN wget https://nodejs.org/dist/latest-v15.x/node-v15.2.0-linux-armv7l.tar.gz && tar xvzf node-v15.2.0-linux-armv7l.tar.gz && mkdir -p /op$


# Pillow support?
RUN apt-get install -y libjpeg-dev zlib1g-dev libfreetype6-dev  libopenjp2-7 libtiff5
RUN mkdir -p wheelhouse && pip3 wheel --wheel-dir=./wheelhouse Pillow && pip3 install --no-index --find-links=./wheelhouse Pillow
RUN pip3 install --extra-index-url=https://www.piwheels.org/simple blockdiag blockdiagMagic

#RUN apt-get install -y gfortran libatlas-base-dev libopenblas-dev liblapack-dev
#RUN pip3 install --extra-index-url=https://www.piwheels.org/simple scipy

RUN pip3 wheel --wheel-dir=./wheelhouse durable-rules && pip3 install --no-index --find-links=./wheelhouse durable-rules

#RUN pip3 wheel --wheel-dir=./wheelhouse scikit-learn && pip3 install --no-index --find-links=./wheelhouse scikit-learn
RUN pip3 install https://github.com/AaronWatters/jp_proxy_widget/archive/master.zip
RUN pip3 install --upgrade tqdm && pip3 install --no-deps nbev3devsim

In the notebook, I tried to generate wheels along the way:

!apt-get install -y  libopenblas-dev gfortran libatlas-base-dev liblapack-dev libblis-dev
%pip install --no-index --find-links=./wheelhouse scikit-learn
%pip wheel  --log skbuild.log  --wheel-dir=./wheelhouse scikit-learn

I downloaded the wheels from the notebook home page (select the files, click Download), so at the next attempt I’ll see if I can copy the wheels in via the Dockerfile and install sklearn from the wheel.

The image is way too heavy – and there is a lot of production crap in the ou-tm129-py image that could be removed. But I got the simulator to run :-)

So now’s the decision as to whether to try to pull together as lite a container as possible. Is it worth the effort?

Mounting but not COPYing wheels into a container

The Docker build secret Dockerfile feature looks like it will mount a file into the container and let you use it but not actually leave the mounted file in a layer. So could we mount a wheel into the container and install from it, essentially getting the effect of a COPY … RUN … && rm *.whl sequence without the wheel ever appearing in a layer?
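Something like the following BuildKit flavoured Dockerfile fragment might do the trick (an untested sketch, assuming a local wheelhouse/ directory of prebuilt wheels sitting next to the Dockerfile):

# syntax=docker/dockerfile:1.2
FROM arm32v7/python:3.8

# The mount only exists for the duration of this RUN step,
# so the wheels are never baked into an image layer
RUN --mount=type=bind,source=wheelhouse,target=/wheelhouse \
    pip install /wheelhouse/*.whl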

A recipe from @kleinee for building wheels (I think):

- run pipdeptree | grep -P '^\w+' > requirements.txt in an installation that works (Python 3.7.3)
- shift requirements.txt into your 64 bit container with Python x.x and build-deps
- in the dir containing requirements.txt, run pip3 wheel --no-binary :all: -w . -r ./requirements.txt

In passing, tags for wheels: https://www.python.org/dev/peps/pep-0425/ and then https://packaging.python.org/specifications/platform-compatibility-tags/ See also https://www.python.org/dev/peps/pep-0599/ which goes as far as linux-armv7l (what about arm8???)

Pondering "can we just plug an RPi into an iPad / Chromebook via an ethernet cable?", via @kleinee again, seems like yes, for iPad at least, or at least, using USB-C cable…: https://magpi.raspberrypi.org/articles/connect-raspberry-pi-4-to-ipad-pro-with-a-usb-c-cable See also https://www.hardill.me.uk/wordpress/2019/11/02/pi4-usb-c-gadget/

For Chromebook, there are lots of USB2Ethernet adapters (which is what I am using with my Mac).. https://www.amazon.co.uk/chromebook-ethernet-adapter/s?k=chromebook+ethernet+adapter And there are also USB-C to ethernet dongles? https://www.amazon.co.uk/s?k=usbc+ethernet+adapter

Via @kleinee: example RPi menu driven tool for creating docker-compose scripts: https://github.com/gcgarner/IOTstack and walkthrough video: https://www.youtube.com/watch?v=a6mjt8tWUws Also refers to:

Portainer overview: Codeopolis: Huge Guide to Portainer for Beginners. For a video: https://www.youtube.com/watch?v=8q9k1qzXRk4 For an RPi example: https://homenetworkguy.com/how-to/install-pihole-on-raspberry-pi-with-docker-and-portainer/ (not sure if you must set up the volume?)

Barebones RPi?

So to get this running on a home network, you also need to add an ethernet cable (to connect to the home router) and a mouse (so you can point and click to enable ssh during setup), and have a micro-HDMI-to-HDMI cable and access to a TV/monitor with HDMI input during setup, and then you’d be good to go?

https://thepihut.com/collections/raspberry-pi-kits-and-bundles

Pi 4 may run hot, so maybe replace the standard case with a passive heatsink case such as https://thepihut.com/products/aluminium-armour-heatsink-case-for-raspberry-pi-4 (h/t @kleinee again)? Via @svenlatham, “[m]ight be sensible to include an SD reader/writer in requirements? Not only does it “solve” the SSH issue from earlier, Pis are a pain for SD card corruption if (for instance) the power is pulled. Giving students the ability to quickly recover in case of failure.” eg https://thepihut.com/products/usb-microsd-card-reader-writer-microsd-microsdhc-microsdxc maybe?

Idly Wondering – RPi 400 Keyboard

A couple of days ago, I noticed a new release from the Raspberry Pi folks, a "$70 computer" bundling a Raspberry Pi 4 inside a keyboard, with a mouse, power supply and HDMI cable all as part of the under a hundred quid "personal computer kit". Just add screen and network connection (ethernet cable, as well as screen, NOT provided).

Mine arrived today:

Raspberry Pi 400 Personal Computer Kit

So… over the last few months, I’ve been working on some revised material for a level 1 course. The practical computing environment, which is to be made available to students any time now for scheduled use from mid-December, is being shipped via a Docker image. Unlike the TM351 Docker environment (repo), which is just the computing environment, the TM129 Docker container (repo) also contains the instructional activity notebooks.

One of the issues with providing materials this way is that students need a computer that can run Docker. Laptops and desktop computers running Windows or MacOS are fine, but if you have a tablet, cheap Chromebook, or just a phone, you’re stuck. Whilst the software shipped in the Docker image is all accessed through a browser, you still need a "proper" computer to run the server…

Challenges using Raspberry Pis as home computing infrastructure

For a long time, several of us have muttered about the possibility of distributing software to students that can run on a Raspberry Pi, shipping the software on a custom programmed SD card. This is fine, but there are several hurdles to overcome, and the target user (eg someone whose only computer is a phone or a tablet) is likely to be the least confident sort of computer user at a "system" level. For example, to use the Raspberry Pi from a standing start, you really need access to:

  • a screen with an HDMI input (many TVs offer this);
  • a USB keyboard (yeah, right: not in my house for years, now…)
  • a USB mouse (likewise);
  • an ethernet cable (because that’s the easiest way to connect to your home broadband router, at least to start with).

You might also try to get your Raspberry Pi to speak to you if you can connect it to your local network, for example to tell you what IP address it’s on, but then you need an audio output device (the HDMI’d screen may do it, or you need a headset/speaker/set of earphones with an appropriate connector).

Considering these hurdles, the new RPi-containing keyboard makes it easier to "just get started", particularly if purchased as part of the Personal Computer Kit: all you need is a screen if the SD card has all the software you need.

So I’m wondering again if this is a bit closer to the sort of thing we might give to students when they sign up with the OU: a branded RPi keyboard, with a custom SD card for each module or perhaps a single custom SD card and then a branded USB memory stick with additional software applications required for each module.

All the student needs to provide is a screen. And if we can ship a screensharing or collaborative editing environment (think: Google docs collaborative editing) as part of the software environment, then if a student can connect their phone or tablet to a server running from their keyboard, we could even get away without a dedicated screen.

The main challenges are still setting up a connection to a screen or setting up a network connection to a device with a screen.

> Notes from setting up my RPi 400: don’t admit to having a black border on your screen: I did, and lost the desktop upper toolbar off the top of my TV and had to dive into a config file to reset it, eg as per the instructions here: run sudo nano /boot/config.txt then comment out the line: #disable_overscan=1

Building software distributions

As far as getting software environments to work on the RPi goes, I had a quick poke around and there are several possible tools that could help.

One approach I am keen on is using Docker containers to distribute images, particularly if we can build images for different platforms from the same build scripts.

Installing Docker on the RPi looks simple enough, eg this post suggests the following is enough:

sudo apt update -y
curl -fsSL get.docker.com -o get-docker.sh && sh get-docker.sh

although if that doesn’t work, a more complete recipe can be found here: Installing Docker on the Raspberry Pi .

A quick look for arm32 images on DockerHub turns up a few handy looking images, particularly if you work with a docker-compose architecture to wire different containers together to provide the student computing environment, and there are examples out there of building Jupyter server RPi Docker images from these base containers (that repo includes some prebuilt arm python package wheels; other pre-built packages can be found on piwheels.org). See also: jupyter-lab-docker-rpi (h/t @dpmcdade) for a Docker container route and kleinee/jns for a desktop install route.

The Docker buildx cross-builder offers another route. This is available via Docker Desktop if you enable experimental CLI features and should support cross-building of images targeted to the RPi Arm processor, as described here.

Alternatively, it looks like there is at least one possible Github Action out there that will build an Arm7 targeted image and push it to a Docker image hub? (I’m not sure if the official Docker Github Action supports cross-builds?)

There’s also a Github Action for building pip wheels, but I’m not convinced this will build for RPi/Arm too? This repo — Ben-Faessler/Python3-Wheels — may be more informative?

For building full SD card images, this Dockerised pi-builder should do the trick?

So, for my to do list, I think I’ll have a go at seeing whether I can build an RPi runnable docker-compose version of the TM351 environment (partitioning the services into separate containers that can then be composed back together has been on my to do list for some time, so this way I can explore two things at once…) and also have a go at building an RPi Docker image for the TM129 software. The TM129 release might also be interesting to try in the context of an SD card image with the software installed on the desktop and, following the suggestion of a couple of colleagues, accessed via an xfce4 desktop in kiosk mode, perhaps via a minimal o/s such as DietPi.

PS also wonders – anyone know of a jupyter-rpi-repodocker that will do the repo2docker thing on Raspberry Pi platforms…?!

Finding the Path to a Jupyter Notebook Server Start Directory… Or maybe not…

For the notebook search engine I’ve been tinkering with, I want to be able to index notebooks rooted on the same directory path as a notebook server the search engine can be added to as a Jupyter server proxy extension. There doesn’t seem to be a reliably set or accessible environment variable containing this path, so how can we create one?

Here’s a recipe that I think may help: it uses the nbclient package to run a minimal notebook that just executes a simple, single %pwd command against the available Jupyter server.

import nbformat
from nbclient import NotebookClient

_nb =  '''{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pwd"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}'''

nb = nbformat.reads(_nb, as_version=nbformat.NO_CONVERT)

client = NotebookClient(nb, timeout=600)
# Available parameters include:
# kernel_name='python3'
# resources={'metadata': {'path': 'notebooks/'}})

client.execute()

path = nb['cells'][0]['outputs'][0]['data']['text/plain'].strip("'").strip('"')

Or maybe it doesn’t? Maybe it actually just runs in the directory you run the script from, in which case it’s just a labyrinthine pwd… Hmmm…

Sketching a datasette powered Jupyter Notebook Search Engine: nbsearch

Every so often, I’ve pondered the question of "notebook search": how can we easily support searches over Jupyter notebooks. I don’t really understand why this area seems so underserved, especially given the explosion in the number of notebooks and the way in which notebooks are increasingly used as a document for writing technical documentation, tutorial and instructional material.

One approach I have seen as a workaround is to produce an HTML site from a set of notebooks using something like nbsphinx or Jupyter Book, simply to get access to an inbuilt search engine. But that somehow feels redundant to me. The HTML Jupyter book form is not a collection of notebooks, nor does it provide a satisfying search environment. To access runnable notebooks you need to click through to open the notebook in another environment (for example, a MyBinder environment built from a repository of notebooks that created the HTML pages), or return to the HTML environment and run code cells inline using something like Thebelab.

So I finally got round to considering this whole question again in the form of a quick sketch to see what an integrated Jupyter notebook server search engine might feel like. It’s still early days — the nbsearch tool is provided as a Jupyter server proxy application, rather than integrated as a Jupyter server extension available via an integrated tab, but that does mean it also works in a standalone mode.

The search engine is built on top of a SQLite database, served using datasette. The base UI was stolen wholesale from Simon Willison’s Fast Autocomplete Search for Your Website demo.

The repo is currently here.

The search index is currently based on a full text search index of notebook code and markdown cells. (At the moment, you have to manually generate the index from a command line command. On the to do list for another sketch is an indexer that monitors the file system.) Cells are returned in a cell-type sensitive way:

Screenshot of initial nbsearch UI.
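Under the hood, the index is essentially the sort of thing SQLite’s FTS5 extension provides; a minimal sketch of the idea (the schema and filenames here are illustrative, not nbsearch’s actual ones):

import sqlite3

conn = sqlite3.connect("nbindex.db")

# One row per notebook cell, with the cell source as searchable text
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS cells USING fts5(notebook, cell_type, source)")
conn.execute("INSERT INTO cells VALUES (?, ?, ?)",
             ("demo.ipynb", "code", "import pandas as pd"))
conn.commit()

# Full text query, returning relevance ranked matches with a short
# snippet of the matching cell source
q = ("SELECT notebook, cell_type, snippet(cells, 2, '[', ']', '…', 10) "
     "FROM cells WHERE cells MATCH ? ORDER BY rank")
for row in conn.execute(q, ("pandas",)):
    print(row)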

Code cells are syntax highlighted using Prism.js, and feature a Copy button for copying the (unstyled) code (clipboard.js). Markdown cells are styled using a simple Javascript markdown parser (marked.js).

The code cells should also have line numbers, but this seems a little erratic at the moment; I can’t get local static js and css files to load properly under the Jupyter server proxy at the moment, so I’m using a CDN. The prism.js line number extension is a separate CDN delivered script to the main Prism script, and it seems that the line number extension doesn’t necessarily load correctly? A race condition maybe?

Each result item displays a link to the original notebook (although this doesn’t necessarily resolve correctly at the moment), and a description of which cell in the notebook the result corresponds to. An inline graphic depicts the structure of the notebook (markdown cells are blue, and code cells pink). Clicking the graphic toggles the display (show / hide) of that results cell group.

The contents of a cell are limited in terms of the number of characters displayed. Clicking the Show all cell button displays the full range of content. Two other buttons — Show previous cell and Show next cell — allow you to repeatedly grab additional cells that surround the originally retrieved results cell.

I’ve also started experimenting with Thebelab code execution support. At the moment this is hardwired to use a MyBinder backend, but the intention is that if a local Jupyter server is available (eg as in the case when running nbsearch as a Jupyter server proxy application), it will use the local Jupyter server. (Ideally, it would also ensure the correct kernel is selected for any given notebook result.)

nbsearch UI with ThebeLab code execution example.

At the moment, things don’t work completely properly with Thebelab. If you run a query, and "activate" Thebelab in the normal way, things work fine. But when I dynamically add new cells, they aren’t activated.

If I try to manually activate them via a cell-centric button:

then the run/restart buttons appear, but trying to run the cell just hangs on the "Waiting for kernel…" message.

At the moment, the code cell is non-editable, but making it editable should just be a case of tweaking the code cell attributes.

There are lots of other issues to consider regarding cell execution, such as when a cell requires other cells to have run previously. This could be managed by running another query to grab all the previous code cells associated with a particular code cell, and running those cells on a restarted kernel using Thebelab before running the current cell.

Providing an option to grab and display (and even copy) all the previous code in a notebook, or perhaps explore the gather package for finding precursor cells, might be a useful facility anyway, even without the ability to execute the code directly.

At the moment, results are limited to the first ten. This needs tweaking, perhaps with a slider ranged to the total number of results for a particular query and then letting you slide to select how many of them you want to display.

A switch to limit results to just code or just markdown cells might also be useful, as would an indicator somewhere that shows the grouped number of hits per notebook, perhaps with selection of this group acting as a facet: selecting a particular notebook would then limit cell results to just that notebook, perhaps grouping and ordering cells within a notebook by cell order.

The ranking algorithm is something else that may be worth exploring more generally. One simple ranking tweak that may be useful in an educational setting could be to order results by notebook and cell order (for example, if notebooks are named according to some numbering convention: 01.1 Introduction to X, 01.2 X in more detail, 02.1 etc). Again, Simon Willison has led the way in some of the practicalities associated with exploring custom ranking schemes in his post Exploring search relevance algorithms with SQLite.

Way back when, when I originally started blogging, search was one of my favourite topics. I’ve neglected it over the years, but still think it has a lot to offer as a teaching and learning tool (eg things like Search Engine Powered Courses… and search hubs / discovered custom search engines). Many educators disagree with this approach because they like to think they are in control of the narrative, whereas I think that search, with a properly tuned ranking algorithm, can help support a student demand led, query result constructed, personalised structured narrative. Maybe it’s time for me to start playing with these ideas again…

Rally Review Charts Recap

At the start of the year, I was planning on spending free time tinkering with rally data visualisations and trying to pitch some notebook originated articles to Racecar Engineering. The start of lockdown brought some balance, and time away from screen and keyboard in the garden, but then pointless made up organisational deadlines kicked in and my 12 hour plus days in front of the screen kicked back in. As Autumn sets in, I’m going to try to cut back working hours, if not screen hours, by spending my mornings playing with rally data.

The code I have in place at the moment is shonky as anything and in desperate need of restarting from scratch. Most of the charts are derived from multiple separate steps, so I’m thinking about pipeline approaches, perhaps based around simple web services (so a pipeline step is actually a service call). This will give me an opportunity to spend some time seeing how production systems actually build pipeline and microservice architectures, to see if there is anything I can pinch.

One class of charts, shamelessly stolen from @WRCStan of @PushingPace / pushingpace.com, are pace maps that use distance along the x-axis and a function of time on the y-axis.

One flavour of pace map uses a line chart to show the cumulative gap between a selected driver and other drivers. The following chart, for example, shows the progress of Thierry Neuville on WRC Rally Italia Sardegna, 2020. On the y-axis is the accumulated gap in seconds, at the end of each stage, to the identified drivers. Dani Sordo was ahead for the whole of the rally, building a good lead over the first four stages, then holding pace, then losing pace in the back end of the rally. Neuville trailed Seb Ogier over the first five stages, then, after borrowing from Sordo’s set-up, made a comeback and had a good battle with Ogier from SS6 right until the end.

Off the pace chart.

The different widths of the stage identify the stage length; the graph is thus a transposed distance-time graph, with the gradient showing the difference in pace in seconds per kilometer.

The other type of pace chart uses lines to create some sort of stylised onion skin histogram. In this map, the vertical dimension is seconds per km gained / lost relative to each driver on the stage, with stage distance again on the x-axis. The area is thus an indicator of the total time gained / lost on the stage, which marks this out as a histogram.

In the example below, bars are filled relative to a specific driver. In this case, we’re plotting pace deltas relative to Thierry Neuville, and highlighting Neuville’s pace gap to Dani Sordo.

Pace mapper.

The other sort of chart I’ve been producing is more of a chartable:

Rally review chart.

This view combines a tabular chart with in-cell bar charts and various embedded charts. The intention was to combine glanceable visual +/- deltas with actual numbers attached, as well as graphics that could depict trends and spikes. The chart also allows you to select which driver to rebase the other rows to: this allows you to generate a report that tells the rally story from the perspective of a specified driver.

My thinking is that to create the Rally Review chart, I should perhaps have a range of services that each create one component, along with a tool that lets me construct the finished table from component columns in the desired order.

Some columns may contain a graphical summary over values contained in a set of neighbouring columns, in which case it would make sense for the service to itself be a combination of services: one to generate rebased data, others to generate and return views over the data. (Rebasing data may be expensive computationally, so if we can do it once, rather than repeatedly, that makes sense.)

In terms of transformations, then, there are at least two sorts of transformation service required (a minimal sketch of both follows the list):

  • pace transformations, that take time quantities on a stage and make pace out of them by taking the stage distance into account;
  • driver time rebasing transformations, that rebase times relative to a specified driver.
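The sketch, using made up stage times (in seconds) and stage distances (in km):

stage_km = [12.3, 8.7, 22.1]
stage_times = {
    "NEU": [420.1, 300.4, 760.2],
    "SOR": [418.3, 301.0, 765.8],
}

def pace(times, distances):
    """Turn per-stage time quantities into pace, in seconds per km."""
    return [t / d for t, d in zip(times, distances)]

def rebase(times, reference):
    """Rebase per-stage times against a reference driver
    (positive delta = time lost to the reference on that stage)."""
    return [t - r for t, r in zip(times, reference)]

# Sordo's stage time deltas rebased to Neuville...
deltas = rebase(stage_times["SOR"], stage_times["NEU"])
# ...and the same deltas as pace (s/km), as plotted in the pace maps
pace_deltas = pace(deltas, stage_km)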

Rummaging through other old graphics (I must have the code somewhere but am not sure where; this will almost certainly need redoing to cope with the data I currently have in the form I have it…), I also turned up some things I’d like to revisit.

First up, stage split charts that are inspired by seasonal subseries plots:

Stage split subseries

These plots show the accumulated delta on a stage relative to a specified driver. Positions on the stage at each split are shown by overplotted labels. If we made the x-axis a split distance dimension, the gradient would show pace difference. As it is, the subseries just indicate trend over splits.

Another old chart I used to quite like is based on a variant of quantised slope charts, or bump charts (a bit like position charts).

This is fine for demonstrating changes in position for a particular loop but gets cluttered if there are more than three or four stages. The colour indicates whether a driver gained or lost time relative to the overall leader (I think!). The number represents the number of consecutive stage wins in the loop by that driver. The bold font on the right indicates the driver improved overall position over the course of the loop. The rows at the bottom are labelled with position numbers if they rank outside the top 10 (this should really be if they rank lower than the number of entries in the WRC class).

One thing I haven’t tried, but probably should, is a slope graph comparing times for each driver where there are two passes of the same stage.

I did have a go in the past at more general position / bump charts too but these are perhaps a little too cluttered to be useful:

Again, the overprinted numbers on the first position row indicate the number of consecutive stage wins for that driver; the labels on the lower rows are out-of-top-10 position labels.

What may be more useful would be adding the gap to leader or diff to car ahead, either on stage or overall, with colour indicating either that quantity, heatmap style, or, for the overall case, whether that gap / difference increased or decreased on the stage.

Deconstructing the TM351 Virtual Computing Environment via VS Code

For 2020J, which is to say, the 2020 October presentation, of our TM351 Data Management and Analysis course, we’ve deprecated the original VirtualBox packaged virtual machine and moved to a monolithic Docker container that packages all the required software applications and services (a Jupyter notebook server, postgres and mongoDB database servers, and OpenRefine).

As with the VM, the container is headless and exposes applications over http via browser based user interfaces. We also rebranded away from “TM351 VM” to “TM351 VCE”, where VCE stands for Virtual Computing Environment.

Once Docker is installed, the environment is installed and launched purely from the command line using a docker run command. Students early into the forums have suggested moving to docker compose, which simplifies the command line command significantly, but at the cost of having to supply a docker-compose.yaml file. With OU workflows, it can take weeks, if not months, to get files onto the VLE for the first time, and days to weeks to post updates (along with a host of news announcements and internal strife about the possibility of tutors/ALs and students having different versions of the file). As we need to support cross-platform operation, and as the startup command specifies file paths for volume mounts, we’d need different docker-compose files (I think?) because file paths on Mac/Linux hosts, versus Windows hosts, use a different file path syntax (forward vs. back slashes as path delimiters). [If anyone can tell me how to write a docker-compose.yaml file with arbitrary paths on the volume mounts, please let me know via the comments… one possible approach is sketched below.]
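
For what it’s worth, one candidate approach, untested, would be to lean on docker-compose’s environment variable substitution with a default value, plus the fact that relative paths in a compose file are resolved relative to the file itself on all platforms. The image name, port mapping and container mount point below are placeholders, not the actual TM351 values:

# docker-compose.yaml sketch; image name, port mapping and container
# mount point are invented for illustration.
version: "3"
services:
  tm351vce:
    image: example/tm351vce
    ports:
      - "35180:8888"
    volumes:
      # Use $TM351_NOTEBOOKS if the student sets it, else fall back to
      # a ./notebooks directory next to this file; a relative path
      # works the same on Windows, Mac and Linux.
      - "${TM351_NOTEBOOKS:-./notebooks}:/home/student/notebooks"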

Something else that has cropped up early in the forums is mention of VS Code, which offers a way to personalise how the course materials are used.

By default, the course materials we provide for practical activities are all based on Jupyter notebooks, delivered via the Jupyter notebook server in the VCE (or via an OU hosted notebook server we are also exploring this year). The activities are essentially inlined using notebook code cells within a notebook that presents a linear teaching text narrative.

Students access the notebooks via their web browser, wherever the notebook server is situated. For students running the Docker VCE, notebook files (and OpenRefine project files) exist in a directory on the student’s own computer that is then mounted into the container; make changes to the notebooks in the container and those changes are saved in the notebooks mounted from the host. Delete the container, and the notebooks are still on your desktop. For students using the online hosted notebook server, there is no way of synchronising files back to the student desktop, as far as I am aware; there was an opportunity to explore how we might allow students to use something like private Github repositories to persist their files in a space they control, but to my knowledge that has not been explored (a missed opportunity, to my mind…).

Using the VS Code Python extension, students installing VS Code on their own computer can connect to the Jupyter server running in the containerised VCE and run notebook code against it (I don’t know if the permissions allow this on the hosted server).

The following tm351vce.code-workspace file describes the required settings:

{
    "folders": [
        {
            "path": "."
        }
    ],
    "settings": {
        "python.dataScience.jupyterServerURI": "http://localhost:35180/?token=letmein"
    }
}

The VS Code Python extension renders notebooks, so students can open local copies of files from their own desktop and execute code cells against the containerised kernel. If permissions on the hosted Jupyter service allow remote/external connections, this would provide a workaround for syncing notebook files: students would work with notebook files saved on their own computer but executed against the hosted server kernel.

Queries can be run against the database servers via the code cells in the normal way (we use some magic to support this for the postgres database).
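
By “magic” I mean Jupyter cell magic, something along the lines of the ipython-sql extension; the connection details below are placeholders, not the actual course settings:

# In one notebook code cell: load the extension and connect.
%load_ext sql
%sql postgresql://tm351:tm351@localhost:5432/tm351

# In a subsequent cell, %%sql must be the first line:
%%sql
SELECT table_name FROM information_schema.tables LIMIT 5;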

If we make some minor tweaks to the config files for the PostgreSQL and MongoDB database servers, we can use the VS Code PostgreSQL extension and MongoDB extension to run queries from VS Code directly against the databases.
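
The tweaks are of the “accept connections from outside” variety; for example (these are typical default-style settings, not necessarily the actual TM351 config):

# postgresql.conf: listen on all interfaces, not just the loopback
listen_addresses = '*'

# mongod.conf: likewise, bind beyond localhost
net:
  bindIp: 0.0.0.0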

For example, the postgres database:

[Screenshot: running a query against the PostgreSQL database from VS Code]

and the mongo database:

[Screenshot: running a query against the MongoDB database from VS Code]

Note that this is now outside the narrative context of the notebooks, although it strikes me that we could generate .sql and .json text files from notebooks that show code literally and comment out the narrative text (the markdown text in the notebooks).
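
That conversion would be straightforward to script. A minimal sketch for the SQL case, using nbformat (file names invented):

import nbformat

# Flatten a notebook to a .sql file: keep code cells as-is, turn the
# markdown narrative into SQL comments.
nb = nbformat.read("example.ipynb", as_version=4)
lines = []
for cell in nb.cells:
    if cell.cell_type == "markdown":
        lines += [f"-- {line}" for line in cell.source.splitlines()]
    elif cell.cell_type == "code":
        lines += cell.source.splitlines()
    lines.append("")

with open("example.sql", "w") as f:
    f.write("\n".join(lines))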

However, we wouldn’t be able to work directly with the data returned from the database via Python/pandas dataframes, as we do in the notebook case. (Note also that in the notebooks we use a Python API for querying the mongo database, rather than directly issuing Javascript based queries.)
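
(For example, a pymongo-style query, with invented connection details and collection names, looks like this rather than like a Mongo shell one-liner:)

from pymongo import MongoClient

# Connection details, database and collection names are placeholders.
client = MongoClient("localhost", 27017)
db = client["tm351db"]
for doc in db["records"].find({"year": 2012}).limit(5):
    print(doc)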

At this point you might ask why we would want to deconstruct / decompose the original structured notebook + notebook UI environment and allow students to use VS Code to access the computational environment, not least when we are in the process of updating the notebooks and the notebook environment to use extensions that add additional style and features to the user environment. Several reasons come to mind, all motivated by finding ways in which we can essentially lose control, as educators, of the user interface whilst still being reasonably confident that the computational environment will continue to perform as we intend (this stance will probably make many of my colleagues shudder; I call it supporting personalisation…):

  • we want students to take ownership of their computational environment; this includes being able to access it from their own clients that may be better suited to their needs, eg in terms of familiarity, accessibility, productivity, etc;
  • a lot of our students are already working in software development and already have toolchains they are working with. Whilst we see benefits of using the notebook UI from a teaching and learning perspective, the fact remains that students can also complete the activities in other user environments. We should not hinder them from using their own environments — the code should still continue to run in the same way — as long as we explain how the experience may not be the same as the one we are providing, and also noting that some of the graphics / extensions we use in the notebooks may not work in the same way, or may not even work at all, in the VS Code environment.

If students encounter issues when using their own environment, rather than the one we provide, we can’t offer support. If the personalised learning environment is not as supportive for teaching and learning as the environment we provide, it is the student’s choice to use it. As with the Jupyter environment, the VS Code environment sits at the centre of a wide ecosystem of third party extensions. If we can make our materials available in that environment, particularly for students already familiar with it, they may be able to help us by identifying and demonstrating new, perhaps even more effective, ways of using the VS Code tooling to support their learning. (One example might be the better support VS Code has for code linting and debugging, which are things we don’t teach, and that our chosen environment perhaps even prevents students who know how to use such tools from making use of. Of course, you could argue we are doing students a service by grounding them back in the basics where they have to do their own linting and print() statement debugging… Another might be the Live Share collaboration service, which lets two or more users work collaboratively in the same notebook, and might be useful for personal tutorial sessions etc.)

From my perspective, I believe that, over time, we should try to create materials that continue to work effectively to support both teaching and learning in environments that students may already be working in, and not just the user interface environments we provide, not least because we potentially increase the number of ways in which students can see how they might make use of those tools / environments.

PS I do note that there may be licensing related issues with VS Code and the VS Code extensions store, which are not as open as they could be; VSCodium perhaps provides a way around that.