Seeding Shared Folders With Files Distributed via a VM

For the first few presentations of our Data Management and Analysis course, the course VM was distributed to students on a USB memory stick sent out by post. This year, I’m trying to move to a model in which the primary distribution route is a download from VagrantCloud (students manage the VM using Vagrant), though we’re also hoping to be able to offer access to an OU OpenStack hosted VM to any students who really need it.

For students on Microsoft Windows computers, an installer installs VirtualBox and Vagrant from installers distributed on the USB memory stick. This derives in part from our policy of fixing the versions of as much of the software as we can, so that everything can be tested in advance. The installer also creates a working directory for the course that will be shared with the VM, and copies required files, again from the memory stick, into the shared folder. On Macs and Linux, students have to do this setup themselves.

One of the things I have consciously started trying to do is move responsibility for satisfying some of the installation requirements into the Vagrantfile. (I’m also starting to think they should be pushed even deeper, into the VM itself.)

For example, as some of the VM services expect particular directories to exist in the shared directory, we have a couple of defensive measures in place:

  • the Vagrantfile creates any required, yet missing, subdirectories in the shared directory;
            #Make sure that any required directories are created
            config.vm.provision :shell, :inline => <<-SH
                mkdir -p /vagrant/notebooks
                mkdir -p /vagrant/openrefine_projects
                mkdir -p /vagrant/logs
                mkdir -p /vagrant/data
                mkdir -p /vagrant/utilities
                mkdir -p /vagrant/backups
                mkdir -p /vagrant/backups/postgres-backup/
                mkdir -p /vagrant/backups/mongo-backup/	
            SH
    

  • start-up scripts for services that require particular directories check that those directories exist before the service is started, and create them if they are missing. For example, go defensive in the service file with something like ExecStartPre=/bin/mkdir -p /vagrant/notebooks (older systemd releases insist on an absolute path to the executable).
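The same defensive idea can be sketched in Python, for example as a helper run at start-up inside the VM. This is a minimal sketch, assuming the /vagrant mount point and the directory names used in the provisioner above; ensure_dirs is a hypothetical name, not part of any existing tooling.

```python
from pathlib import Path

# Subdirectories the VM services expect to find in the shared folder
# (names taken from the Vagrantfile provisioner; adjust as required)
REQUIRED_DIRS = [
    "notebooks",
    "openrefine_projects",
    "logs",
    "data",
    "utilities",
    "backups/postgres-backup",
    "backups/mongo-backup",
]


def ensure_dirs(shared_root="/vagrant", required=REQUIRED_DIRS):
    """Create any missing required directories; safe to run repeatedly.

    Returns the list of directories that had to be created.
    """
    created = []
    for rel in required:
        path = Path(shared_root) / rel
        if not path.is_dir():
            # parents=True mirrors `mkdir -p`; exist_ok guards against races
            path.mkdir(parents=True, exist_ok=True)
            created.append(str(path))
    return created
```

Because it only creates what is missing, it can be called every time a service starts, which is the same idempotency that mkdir -p gives the shell version.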

The teaching material associated with the VM (or rather, with its contents) is distributed as a set of notebooks downloaded from the VLE. Part of the reason for this is that it delays the point at which the course notebooks must be frozen: the USB is mastered in late July/early August for a mailing in September and course start in October.

As well as the course notebooks, there are a couple of informal installation test notebooks. These can be frozen along with the VM and distributed inside it, but then the question arises of how to get them into the shared folder where students can work with them. So this year I am trying out a simple pattern that bakes the test files into the VM and then uses the Vagrantfile to copy them into the shared directory the first time the VM runs with a particular shared folder:

config.vm.provision :shell, :inline => <<-SH
    if [ ! -f /vagrant/.firstrun_nbcopy.done ]; then
        # Trust notebooks in the notebook directory and its immediate child directories
        # (use the list find returns, so an unmatched glob is never passed to jupyter)
        mapfile -t files < <(find /opt/notebooks -maxdepth 2 -name "*.ipynb")
        if [ ${#files[@]} -gt 0 ]; then
            jupyter trust "${files[@]}"
        fi
        # Copy notebooks into the shared directory
        mkdir -p /vagrant/notebooks
        cp -r /opt/notebooks/. /vagrant/notebooks
        # Flag file records that this shared folder has been seeded
        touch /vagrant/.firstrun_nbcopy.done
    fi
   SH

This pattern allows files shipped inside the VM to be copied into the shared folder once it is mounted into the VM from the host. The files then persist in the shared directory, along with a hidden flag file recording that they have been copied. I’m not sure whether there would be any benefit in auto-running something inside the VM to manage this copying instead, or whether to check that a more recent copy of the files doesn’t already exist in the shared folder before copying on the first run in that folder.
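On the question of not clobbering newer files, here is a minimal Python sketch of the copy-once pattern that also skips any file whose copy in the shared folder is more recent than the one baked into the VM. It assumes the same /opt/notebooks source, /vagrant/notebooks target and flag file name as the provisioner above; first_run_copy is a hypothetical helper, and it leaves out the jupyter trust step.

```python
import shutil
from pathlib import Path

FLAG_NAME = ".firstrun_nbcopy.done"  # same flag file name as the provisioner


def first_run_copy(src="/opt/notebooks", dest_root="/vagrant", subdir="notebooks",
                   keep_newer=True):
    """Copy files baked into the VM into the shared folder, once per folder.

    If keep_newer is True, a file that already exists in the shared folder and
    has a more recent modification time than the VM copy is left untouched.
    Returns the list of destination paths written.
    """
    dest_root = Path(dest_root)
    flag = dest_root / FLAG_NAME
    if flag.exists():
        return []  # this shared folder has already been seeded

    src = Path(src)
    dest = dest_root / subdir
    written = []
    for item in src.rglob("*"):
        if not item.is_file():
            continue
        target = dest / item.relative_to(src)
        if keep_newer and target.exists() and target.stat().st_mtime > item.stat().st_mtime:
            continue  # the copy in the shared folder is newer; don't overwrite it
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(item, target)  # copy2 preserves timestamps
        written.append(str(target))
    flag.touch()
    return written
```

Comparing modification times is only a heuristic for "more recent"; a content hash would be stricter, but would not tell you which copy the student actually wants to keep.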

[Not] Running Jupyter Notebook Containers Via Jupyterhub Container Under Kubernetes Under Docker on My Desktop

…(sic).

One of the things I’ve been exploring lately (actually, that my colleague Rod Norfor in IT has been exploring lately) has been a way of offering a self-service, disposable Jupyter notebook service, pre-packaged with customised, pre-prepared teaching notebooks for students to work through. The hope is that this will go live to some TM112 students later this month.

The solution we’ve come up with is running Jupyterhub under Kubernetes on Microsoft Azure Cloud. The Jupyterhub server serves a prebuilt, public Docker container pulled from Docker Hub. (As a back-up, the same notebooks can be run via Microsoft Azure Notebooks cloned from a library on Microsoft Azure Notebooks or from the Github repo used to build the notebook container served by Jupyterhub.) On the to do list is to explore whether we should use a private registry or pull from an OU hosted registry, either private or public.

One thing we have noticed is that tagging the source notebook container image as latest breaks things with Kubernetes if we try to serve an updated image. The solution is to tag each updated version of the container differently (?and restart the Jupyterhub server?). Each rebuilt image is given the same tag as the corresponding Github release, and the build is triggered by that release. Github releases are made manually.

As things currently stand, students authenticate into the OU VLE in the normal way, and then follow a link containing a secret that takes them to the Jupyterhub server. From there, users self-start the Jupyter notebook container.

In the default setup, dummy auth is in place, which lets users past the login check with any user/password combination:

 

Thanks to Rod, the Azure route appears to be working so far, but I thought I’d also try to set up a local development environment on my desktop.

The Docker Community Edition Edge release (https://www.docker.com/kubernetes) allows you to run Kubernetes locally on your desktop under Docker. Having got Kubernetes installed and up and running straightforwardly enough, it was time to give Jupyterhub a go in its default state. The following recipe is based on a getting-started crib from David Currie and the Zero to JupyterHub with Kubernetes docs:

## Install helm
## - download and install appropriate distribution (I grabbed the Mac distribution from [kubernetes/helm](https://github.com/kubernetes/helm), unzipped it, and put the app in my `Applications` folder, which is in my path)

##Check we can see it
helm

## Initialise helm
helm init

## Create a working directory, just in case!
mkdir code/dockerk8sjupyterhub
cd code/dockerk8sjupyterhub

## Set up things to work with my local docker powered kubernetes service
kubectl config use-context docker-for-desktop
kubectl get nodes

##Tiller is already installed - I'm not sure if this is necessary?
helm init --service-account tiller

## Security voodoo
kubectl --namespace=kube-system patch deployment tiller-deploy --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'

## Create a config file
touch config.yaml
## Create a secret hash value
openssl rand -hex 32
## And add it to the YAML file:
nano config.yaml
|------
proxy:
  secretToken: "MY_TOKEN"
|------

## Add helm chart
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm install jupyterhub/jupyterhub --version=v0.6 --name=thhelm1juphub --namespace=ouseful -f config.yaml
##locally:
##helm install path/to/my/files
##helm install my_chart.tgz

## When it's running, find the IP address
kubectl --namespace=ouseful get svc proxy-public
|------
NAME           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
proxy-public   10.104.156.186   localhost     80:32292/TCP,443:32125/TCP   1h
|------
## In this case, we can find Jupyterhub at: 127.0.0.1:80

##Bring everything down with (helm delete takes the release name set with --name, not the namespace):
helm delete thhelm1juphub --purge
kubectl delete namespace ouseful
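If you want to script the "find the IP address" step, something like the following sketch parses the table that kubectl get svc prints and turns the proxy-public row into URLs. It assumes the column layout shown above, and that port 443 means https; kubectl get svc proxy-public -o json gives a sturdier, machine-readable alternative. The function names are hypothetical.

```python
def parse_svc_table(text):
    """Turn the whitespace-separated table printed by `kubectl get svc`
    into a list of dicts keyed by column heading."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def hub_urls(row):
    """Build URLs from a service row whose PORT(S) column looks like
    '80:32292/TCP,443:32125/TCP' (service port:node port/protocol)."""
    host = row["EXTERNAL-IP"]  # may read <pending> until the service is ready
    urls = []
    for mapping in row["PORT(S)"].split(","):
        port = mapping.split(":")[0].split("/")[0]
        scheme = "https" if port == "443" else "http"
        urls.append(f"{scheme}://{host}:{port}")
    return urls
```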

One of the things I’d like to explore is customising the notebook container start page:

…and the holding page as the server starts up:

…and the error pages. At the moment, this is as far as I can get :-(

(I seem to remember there were issues setting up the helm chart to work with Azure the first time, so I’m hoping there’s a similar simple fix to get my local environment working. If you can point me to possible solutions via the comments, that would be appreciated:-)

I can also easily raise a 404 error page:

It looks as if the template pages are in the jupyterhub repo, in share/jupyterhub/templates, so while things are still broken, now might be a good time for me to try some simple (branded) customisations of them.

At the moment, there are two things I need to clarify for myself (help via the comments appreciated!):

  • where is the path to the Jupyterhub container specified? Ideally, I want to build a local image containing customised start/error pages.
  • where is the path to the Jupyter notebook image served by Jupyterhub specified, so that I can serve my own test containers?

Clarification here: only one container is required (the one that will be served), but:

The Docker image you are using must have the jupyterhub package installed in order to work. Moreover, the version of jupyterhub must match the version installed by the helm chart that you’re using. For example, v0.5 of the helm chart uses jupyterhub==0.8.
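A quick way to check a candidate image before pointing the chart at it is to run a small script inside the image and compare its jupyterhub version with the one the chart expects. This sketch assumes that agreeing on major.minor (for example 0.8.x for chart v0.5, as in the quote above) is close enough; check_image and its helpers are hypothetical names.

```python
import re
from importlib import metadata


def major_minor(version):
    """Reduce a version string such as '0.8.1' to a (major, minor) tuple."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if not m:
        raise ValueError(f"unrecognised version {version!r}")
    return int(m.group(1)), int(m.group(2))


def hub_version_matches(installed, required):
    """True if installed and required agree on major.minor, e.g. '0.8.1' vs '0.8'."""
    return major_minor(installed) == major_minor(required)


def check_image(required="0.8"):
    """Run inside the candidate image: is jupyterhub installed at the right version?"""
    try:
        installed = metadata.version("jupyterhub")
    except metadata.PackageNotFoundError:
        return False, "jupyterhub is not installed"
    ok = hub_version_matches(installed, required)
    return ok, f"jupyterhub {installed} (chart expects {required}.x)"
```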

To use a new image, tweak the config.yaml file, for example:

singleuser:
  image:
    name: jupyter/scipy-notebook
    tag: c7fb6660d096
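The config.yaml steps (generating a secret token, then naming the singleuser image) can be rolled into one small Python sketch. It assumes only the two keys shown in this post, proxy.secretToken and singleuser.image; secrets.token_hex(32) yields the same kind of 64-character hex string as openssl rand -hex 32. make_config is a hypothetical helper, and it refuses the latest tag for the reasons given earlier.

```python
import secrets


def make_config(image_name, image_tag, token=None):
    """Return config.yaml text for the Jupyterhub helm chart.

    Generates a random 32-byte hex secret unless one is supplied, and
    refuses the 'latest' tag, which caused problems when serving updated images.
    """
    if not image_tag or image_tag == "latest":
        raise ValueError("use an explicit, unique image tag rather than 'latest'")
    token = token or secrets.token_hex(32)
    return (
        "proxy:\n"
        f'  secretToken: "{token}"\n'
        "singleuser:\n"
        "  image:\n"
        f"    name: {image_name}\n"
        f'    tag: "{image_tag}"\n'  # quoted so an all-digit tag stays a string
    )
```

For example, write the output of make_config("jupyter/scipy-notebook", "c7fb6660d096") to config.yaml and pass it to helm with -f config.yaml as before.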

DIT4C Inspired RobotLab Container

A couple of days ago I posted a quick demo of how to run an old OU Windows app under wine in a Docker container that exposed the app running on a graphical desktop via a browser.

One of the problems with that demo is that it doesn’t provide a way of uploading or downloading files, so unless the user runs the container with a volume mounted against something that does support file transfer, they can’t persist files outside the container.

I don’t know whether the audio works either (though I didn’t try…) but as that recipe was based on a container I’d used to run the Audacity audio editor previously, I’m guessing it should…?

Anyway… regarding the file transfer issue, I recalled putting together a container last year for my PhD student, who was looking for a portable way of demoing a desktop Ruby app. Digging that out, it was easily repurposed to run RobotLab. The base container is the DIT4C X11 container. The DIT4C project lost its funding last year, I think, but the repos and Docker images still exist and they provide a really great set of examples of prebuilt containerised apps. Exactly the sort of apps I’d want on my HE Library Digital App shelf. A bit dated now, perhaps, but as and when I get a chance I’ll start trying to refresh them.

The base Dockerfile is relatively straightforward:

#Base container
FROM dit4c/dit4c-container-x11:debian

#Required in order to add backports repo
RUN apt-get update && apt-get install -y software-properties-common

# Install wine
RUN dpkg --add-architecture i386
RUN echo "deb http://httpredir.debian.org/debian jessie-backports main" | tee /etc/apt/sources.list.d/docker.list
RUN apt-get update && apt-get install -y -t jessie-backports wine

# /var contains HTML site for container homepage
COPY var /var

# RobotLab windows files
COPY Apps/ /opt/Apps

# profile.d environment vars
COPY etc /etc

# Desktop shortcuts and application icons
COPY usr /usr
RUN chmod +x /usr/share/applications/robotlab.desktop
RUN chmod +x /usr/share/applications/neural.desktop
RUN chmod +x /usr/share/applications/remote.desktop

#Add items to top toolrail on desktop
RUN LNUM=$(sed -n '/launcher_item_app/=' /etc/tint2/panel.tint2rc | head -1) && \
sed -i "${LNUM}ilauncher_item_app = /usr/share/applications/robotlab.desktop" /etc/tint2/panel.tint2rc && \
sed -i "${LNUM}ilauncher_item_app = /usr/share/applications/neural.desktop" /etc/tint2/panel.tint2rc && \
sed -i "${LNUM}ilauncher_item_app = /usr/share/applications/remote.desktop" /etc/tint2/panel.tint2rc
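The sed dance above finds the first launcher_item_app line and inserts new entries above it. Because each ${LNUM}i inserts at the same line number, later entries end up above earlier ones, so the toolbar order is remote, neural, robotlab. A Python equivalent, for checking the result or generating the file before the build, might look like this; add_launchers is a hypothetical helper that keeps entries in the order given.

```python
def add_launchers(rc_text, desktop_files, key="launcher_item_app"):
    """Insert launcher entries above the first existing launcher item.

    Entries appear in the order given (unlike repeated `sed NNNi`,
    which reverses them).
    """
    lines = rc_text.splitlines(keepends=True)
    for idx, line in enumerate(lines):
        if key in line:
            break
    else:
        raise ValueError(f"no {key} line found")
    new = [f"{key} = {path}\n" for path in desktop_files]
    return "".join(lines[:idx] + new + lines[idx:])
```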

The local build / run / push to dockerhub process is trivial:

cp Dockerfile_dit4c_x11 Dockerfile
docker build -t psychemedia/robtolab2 .
docker run -d -p 8082:8080 psychemedia/robtolab2
docker push psychemedia/robtolab2

As to what it looks like, here’s the home page:

Files can be uploaded to / downloaded from the container using the File Management link:

The Linux Desktop Session link takes you to a Linux desktop. Several tools are available in the toolbar at the top of the desktop, including a terminal and the RobotLab application.

Clicking on the RobotLab icon runs it under wine; on first run we seem to get an update message. Maybe I need to run wine once as part of the build (or container start-up) process to handle this?

As well as the RobotLab application, we can run other tools in the RobotLab suite, such as the Neural neural network package:

Note that I haven’t tested the student activities yet – this is still early days in just trying to work out what the tech requirements are and how the workflow / student user experience might play out…

Repo is here (it contains redundant RobotLab stuff such as drivers for the original Lego Mindstorms infra-red towers, but I haven’t yet worked out what can be dropped and what RobotLab requires…): ouseful-course-containers/ou-tm129-robotlab.

A container is also available on Dockerhub as ousefulcoursecontainers/ou-tm129-robotlab:dit4c but I haven’t had a chance to check it yet… (it’s still in the build queue…).

The container can be imported into a desktop Docker environment using Kitematic:


Once added, the container can also be run using Kitematic, with the user clicking on a link to take them directly to an appropriate location in their web browser:


The desktop / Kitematic route also enables file sharing with a local directory mounted into the container (though by the looks of it, I may need to tidy that up a little?).

The pieces are slowly starting to come together…?

Scripted Forms Magic

One of the things I think we could be looking at more in open ed in general, and the OU in particular, is authoring environments that allow educators to create their own interactives.

A significant blocker is the skills required to create and deploy such things:

  • activity design in general;
  • front end / UI coding (HTML / js)
  • back end coding (application logic, js, or py, for example)
  • runtime management (js can run in a browser, R or py likely to require a backend kernel to execute the code).

One of the things that appeals to me about the Jupyter and RStudio environments / ecosystems is the availability of packages that support end-user development by wrapping high level UI components as templated widgets that can be automatically generated from code. (See for example my own attempts at creating an HTML hexjson widget for rendering a hex map given an R dataframe.)

One of the things I started to come round to when creating various binderhub demos to show off how Jupyter notebooks can be used to generate and embed rich media assets for different topic areas (see slides 25-33 of the presentation from Dev.ac.uk embedded below, for example) was that a good collection of IPython magics could provide the basis for end-user developed online materials featuring embedded interactives.

For example, the py / folium magic I started working on makes it relatively straightforward to create an embedded map within a Jupyter notebook, that can then be saved and rendered as a standalone HTML page featuring an embedded interactive map.

It struck me that the magic should also work in the context of Scriptedforms, eg as per Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter.

And it does…

The markdown (text) file, scriptedforms_magic.md, on the left can be fired up as the HTML + interactive form on the right from a single command line command:

scriptedforms scriptedforms_magic.md

You can also do “live development” by editing and saving the markdown and then reloading the browser page.

By developing a sensible set of IPython magics, authors with some limited technical skills should be able to write enough “code”, as per the “code” in the markdown document above, to:

  • identify form elements;
  • bind those form elements as parameters in an IPython magic call.
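As a sketch of the second point, the core of such a magic is turning a line of key=value pairs into parameters. The following assumes a hypothetical %mapmagic whose body here just returns the parsed values; a real one would pass them on to a library such as folium. register_magic_function is IPython's hook for registering a plain function as a magic. As far as I know, IPython also expands {variable} in magic lines, so a form-bound variable could be passed along as %mapmagic zoom={zoom}.

```python
import shlex


def parse_magic_args(line):
    """Turn a magic line such as 'lat=52.02 lon=-0.71 zoom=12 title="Milton Keynes"'
    into a dict, converting numeric values where possible."""
    params = {}
    for token in shlex.split(line):  # shlex keeps quoted values together
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                continue
        params[key] = value
    return params


def mapmagic(line):
    """Hypothetical line magic: a real version would hand these values to a
    mapping library and return the rendered map."""
    return parse_magic_args(line)


def register():
    """Register %mapmagic when running inside IPython/Jupyter; no-op otherwise."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    ip = get_ipython()
    if ip is None:
        return False
    ip.register_magic_function(mapmagic, magic_kind="line", magic_name="mapmagic")
    return True
```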

After all, we keep saying everyone should be able to code… and that’s a level of coding where you can be immediately productive…

Everybody Should Code? Well, Educators Working in Online Learning in a “Digital First” Organisation Probably Should?

Pondering Sheila’s thoughts on Why I Don’t Code?, and mulling over some materials we’re prepping as revisions to the database course…

I see the update as an opportunity to try to develop the ed tech a bit, make it a bit more interactive, make it a bit “responsive” to students so they can try stuff out and get meaningful responses back that help them check that what they’ve been doing has worked. (I’ve called such approaches “self-marking” before, eg in the context of robotics, where it’s obvious if your robot has met the behavioural goal specified in some activity (eg go forwards and backwards the same distance five times, or traverse a square).)

Some of the best activities in this regard, to my mind at least, can be defined as challenges with a “low floor, high ceiling”, as I always remember Mike Reddy saying in the context of educational robotics outreach. Eg how perfect a square can your robot trace, or how quickly can it traverse it? And can it reverse it?

Most kids/students can get the robot to do something, but how well can they get it to do something in particular, and how efficiently? There are other metrics too – how elegant does the code look, or how spaghetti like?!

So prepping materials, I keep getting sidetracked by things we can do to try to support the teaching. One bit I’m doing is getting students to link database tables together, so it seems sensible to try to find, or come up with, a tool that lets students make changes to the database and then look at the result in some meaningful way to check their actions have led to the desired outcome.

There are bits and bobs of things I can reuse, but they require:

1) gluing stuff together to make the student use / workflow appropriate. That takes a bit of code (eg to make some IPython magic that provides a one-liner capable of graphically depicting some updated configuration or other);
2) tweaking things that don’t work quite how we want them, either in terms of presentation / styling, or functionality – which means being able to read other people’s code and change the bits we want to change;
3) keeping track of work – and thinking – in progress, logging issues, sketching out possible fixes, trying to explore the design space, and use space, including looking for alternative – possibly better – ways of achieving the same thing; which means trying to get other people’s code to work, which in turn means coming up with and trying out small examples, often based on reading documentation, looking for examples on Stack Overflow, or picking through tests to find out how to call what function with what sorts of parameter values.

Code stuff; very practical programmingy code-y stuff. That’s partly how I see code being used – on a daily basis – by educators working in an institution like the OU that claims to pride itself on digital innovation: educators coming up with small bits of code to implement micro digital innovations that directly support/benefit the teaching/learning materials and enhance them, technologically… Code as part of everyday practice for online digital educators. (Remember: “everyone should code”…)

It’s also stuff that tests my own knowledge of how things are supposed to work in principle – as well as practice. This plays out through trying to make sure that the tech remains relevant to the educational goal, and ideally helps reinforce it, as well as providing a means for helping the students act even more effectively as independent learners working at a distance…

And to my mind, it’s also part of our remit, if we’re supposed to be trying to make the most use of the whole digital thing… And if we can’t do this in a computing department, then where can – or should – we do it?

PS I should probably have said something about this being relevant to, or an example of, technology enhanced learning, but I think that phrase has been lost to folk who want to collect every bit of data they can from students? Ostensibly, this is for “learning analytics”, which is to say looking at ways of “optimising” students, rather than optimising our learning materials (by doing the basic stuff like looking at web stats, as I’ve said repeatedly). I suspect there’s also an element of plan B – if we have a shed load of data, we can sell it to someone as part of every other fire sale.

Potential Issues With Institutionally Mediated Reproducible Research Environments

One of the advantages, for me, of the Jupyter Binderhub environment is that it provides me with a large amount of freedom to create my own computational environment in the context of a potentially managed institutional service.

At the moment, I’m lobbying for an OU hosted version of Binderhub, probably hosted via Azure Kubernetes, for internal use in the first instance. (It would be nice if we could also be part of an open and federated MyBinder provisioning service, but I’m not in control of any budgets.) But in the meantime, I’m using the open MyBinder service (and very appreciative of it, too).

To test the binder builds locally, I use repo2docker, which is also used as part of the Binderhub build process.

What this all means is that I should be able to write – and test – notebooks locally, and know that I’ll be able to run them “institutionally” (eg on Binderhub).

However, one thing I noticed today was that notebooks in a binder container that was running okay, and that still builds and runs okay locally, have broken when run through Binderhub.

I think the error is a permissions error in creating temporary directories or writing temporary image files, in either the xelatex command used to generate a PDF from the LaTeX script or the ImageMagick convert command used to produce an image from the PDF; both are used as part of some IPython magic that renders LaTeX tikz diagram generating scripts. It certainly affects a couple of my magics. (It might be an issue with the way the magics are defined too. But whatever the case, it works for me locally but not “institutionally”.)

Broken notebook: https://mybinder.org/v2/gh/psychemedia/showntell/maths?filepath=Mechanics.ipynb
magic code: https://github.com/psychemedia/showntell/tree/maths/magics/tikz_magic
The error is something to do with the ImageMagick convert command not converting the .pdf to an image. At least one of the issues seems to be that ghostscript has gone missing somewhere?
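One way of narrowing down this sort of "works locally, breaks on Binderhub" problem is to run the same diagnostic in both environments and compare. A minimal sketch, assuming the pipeline needs xelatex, ImageMagick's convert and Ghostscript's gs on the PATH (convert hands PDF reading off to Ghostscript); the tool list and helper names are assumptions for illustration.

```python
import shutil
import subprocess

# Executables the tikz/LaTeX magic pipeline is assumed to rely on:
# xelatex builds the PDF, ImageMagick's convert rasterises it,
# and convert delegates PDF reading to Ghostscript (gs).
TOOLS = ["xelatex", "convert", "gs"]


def missing_tools(tools=TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def tool_version(tool, flag="--version"):
    """Return the first line of `tool --version`, or None if it can't run."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        out = subprocess.run([path, flag], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    text = (out.stdout or out.stderr).strip()
    return text.splitlines()[0] if text else None
```

Printing missing_tools() and the version of each tool in a notebook cell, locally and on MyBinder, should at least show whether the two environments differ in what they provide.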

So here’s the issue. Whilst the notebooks were running fine in a container generated from an image that was itself created presumably before a Binderhub update, rebuilding the image (potentially without making any changes to the source Github repository) can cause notebooks that were running fine to break.

Which is to say, there may be a dependency in the way a repository defines an environment on some of the packages installed by the repo2docker build process. (I don’t know if we can fully isolate out these dependencies by using a Dockerfile to define the environment rather than apt.txt and requirements.txt?)

This raises a couple of questions for me about dependencies:

  • what sort of dependency issues might there be in components or settings introduced by the repo2docker process, and how might we mitigate against these?
  • are there other aspects of the Binderhub process that can produce breaking changes that impact on notebooks running in a repository that specifies a computational environment run via Binderhub?

Institutionally, it also means that updates to an institutionally supported Binderhub service could break the downstream environments that users run through it.

This is a really good time for this to happen to me, I think, because it gives me more things to think about when considering the case for providing a Binderhub service institutionally.

On the other hand, it means I can’t update any of the other repos that use the tikz or asymptote magic until I find the fix because otherwise they will break too…

Should users of the institutional service, for example, be invited to define test areas in their Binder repositories (for example, using nbval) that the institution can use as test cases when making updates to the institutional service? If errors are detected through the running of these tests by the institutional service provider against their users’ tests, then the institutional service provider could explore whether the issue can be addressed by their update strategy, or alert the Binderhub user there may be breaking changes and how to explore what they are or mitigate against them. (That is, perhaps it falls to the institutional provider to centrally explore the likely common repercussions of a particular update and identify fixes to address them?)

For example, there might be dependencies on particular package version numbers. In this case, the user might then either want to update their own code, or add in a build requirement that regresses the package to the desired version. (Institutional providers might have something to say about that if the upgrade was for valid security reasons, though running things in isolation in containers should reduce that risk?) Lists of affected packages could also be circulated to other users using the same packages, along with mitigation strategies for coping with updates to the institutionally provided service.

There are also updating issues associated with a workflow strategy I am exploring around Binderhub which relates to using “base containers” to seed Binderhub builds (Note On My Emerging Workflow for Working With Binderhub). For example, if a build uses a “latest” tagged base image, any updates to that base image may break things built on top of it. In this case, mitigating against update risk to the base container is achieved by building from a specifically tagged version of the container. However, if an update to the Binderhub environment can break notebooks running on top of a particularly labelled base container, the fix for the notebooks may reside in making a fix to the environment in the base container (for example, which specifically acts to enforce a package version). This suggests that the base container might need doubly tagging – one tag paying heed to the downstream end users (“buildForExptXYZ”) – and the other that captures the upstream Binderhub environment (“BinderhubBuildABC”).
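A small check along these lines could flag base images that float. This Python sketch scans a Dockerfile's FROM lines and reports any base image with no tag or a latest tag, treating digest-pinned images as fixed. It's a heuristic: multi-stage references such as FROM builder, or FROM scratch, would be flagged too. unpinned_bases is a hypothetical name.

```python
import re

# Matches: FROM [--option=...] image[:tag][@digest] [AS name]
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)


def unpinned_bases(dockerfile_text):
    """Return base images that float: no tag, or tagged 'latest'.

    Images pinned by digest (image@sha256:...) count as pinned.
    """
    flagged = []
    for line in dockerfile_text.splitlines():
        m = FROM_RE.match(line)
        if not m:
            continue
        image = m.group(1)
        if "@" in image:
            continue
        # a ':' after the last '/' is a tag; one before it belongs to a registry port
        name_part = image.rsplit("/", 1)[-1]
        tag = name_part.split(":", 1)[1] if ":" in name_part else None
        if tag is None or tag == "latest":
            flagged.append(image)
    return flagged
```

Since one image can carry several tags, the double tagging idea needs no rebuild: docker tag can attach both a buildForExptXYZ style tag and a BinderhubBuildABC style tag to the same image.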

I’m also wondering about where responsibility lies for maintaining the integrity of the user computing environment (that is, the local computational environment within which code in notebooks should continue to operate once the user has defined their environment). Which is to say, if there are changes to the wider environment that somehow break that local user environment, who should help fix it? If the changes are likely to have a wide impact, it makes sense to try to fix them once and then share the change, rather than expecting every user suffering from the break to find the fix independently?

Also, I’m wondering about classes of error that might arise. For example, ones that can be fixed purely by changing the environmental definition (baking package versions into config files, for example, which is probably best practice anyway) and ones that require changes to code in notebooks?
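As a crude first check on the "baking package versions into config files" side of things, this sketch lists requirements.txt entries that aren't pinned with ==. It assumes plain name[extras]==version lines, skipping comments, pip options and URL installs; unpinned_requirements is a hypothetical helper, and it knows nothing about apt.txt.

```python
import re

# name, optional [extras], then whatever version specifier follows
REQ_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?\s*(.*)$")


def unpinned_requirements(text):
    """List packages in a requirements.txt that are not pinned with '=='.

    Ranges such as '>=' or '~=' still let upgrades through, so they are
    reported as unpinned too.
    """
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue  # blanks, comments, pip options (-r, -e) and URL installs
        m = REQ_RE.match(line)
        if not m:
            continue
        name, _extras, spec = m.groups()
        if "==" not in spec:
            loose.append(name)
    return loose
```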

PS Hmm.. noting… are whitelists and blacklists also specifiable in Binderhub config? eg https://github.com/jupyterhub/mybinder.org-deploy/pull/239/files

binderhub:
  extraConfig:
    bans: |
        c.GitHubRepoProvider.banned_specs = [
          '^GITHUBUSER/REPO.*'
        ]
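As I understand it, BinderHub checks each banned_specs entry as a regular expression against the start of the repo spec (user/repo/ref), ignoring case. This sketch mimics that behaviour with re.match; is_banned and the pattern list are illustrative, not BinderHub's own code.

```python
import re

# Patterns in the same form as the banned_specs list above (placeholder values)
BANNED_SPECS = [r"^GITHUBUSER/REPO.*"]


def is_banned(spec, banned_specs=BANNED_SPECS):
    """Return True if a GitHub repo spec (user/repo/ref) matches any ban pattern."""
    return any(re.match(pattern, spec, re.IGNORECASE) for pattern in banned_specs)
```

One consequence: ^GITHUBUSER/REPO.* also catches GITHUBUSER/REPO-fork, so a pattern such as ^GITHUBUSER/REPO/.* is needed to ban just the one repository.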

Fragment – Virtues of a Programmer, With a Note On Web References and Broken URLs

Ish-via @opencorporates, I came across the “Virtues of a Programmer”, referenced from a Wikipedia page, in a Nieman Lab post by Brian Boyer on Hacker Journalism 101, and stated as follows:

  • Laziness: I will do anything to work less.
  • Impatience: The waiting, it makes me crazy.
  • Hubris: I can make this computer do anything.

I can buy into those… Whilst also knowing (from experience) that any of the above can lead to a lot of, erm, learning.

For example, whilst you might think that something is definitely worth automating:

the practical reality may turn out rather differently:

The reference has (currently) disappeared from the Wikipedia page, but we can find it in the Wikipedia page history:


The date of the NiemanLab article was 


So here’s one example of a linked reference to a web resource that we know is subject to change and that has a mechanism for linking to a particular instance of the page.

Academic citation guides tend to suggest that URLs are referenced along with the date that the reference was (last?) accessed by the person citing the reference, but I’m not sure that guidance is given that relates to securing the retrievability of that resource, as it was accessed, at a later date. (I used to bait librarians a lot for not getting digital in general and the web in particular. I think they still don’t…;-)

This is an issue that also hits us with course materials, when links are made to third party references by URI, rather than more indirectly via a DOI.

I’m not sure to what extent the VLE has tools for detecting link rot (certainly, they used to; now it’s more likely that we get broken link reports from students failing to access a particular resource…) or mitigating against broken links.

One of the things I’ve noticed from Wikipedia is that it has a couple of bots for helping maintain link integrity: InternetArchiveBot and Wayback Medic.

Bots help preserve link availability in several ways:

  • if a link is part of a page, that link can be submitted to an archiving site such as the Wayback machine (or if it’s a UK resource, the UK National Web Archive);
  • if a link is spotted to be broken (header / error code 404), it can be redirected to the archived link.
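Both bot behaviours can be sketched against the Wayback Machine's public endpoints: prefixing a URL with https://web.archive.org/save/ asks for a capture, and https://archive.org/wayback/available?url=... returns JSON describing the closest snapshot. This is a minimal sketch, assuming that response shape (archived_snapshots.closest with available and url fields); the function names are hypothetical, and only lookup touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_SAVE = "https://web.archive.org/save/"
WAYBACK_AVAILABLE = "https://archive.org/wayback/available?"


def save_url(url):
    """URL that asks the Wayback Machine to capture a page when requested."""
    return WAYBACK_SAVE + url


def closest_snapshot(payload):
    """Extract the closest archived copy from an availability API response."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(url):
    """Query the availability API (needs network access)."""
    with urlopen(WAYBACK_AVAILABLE + urlencode({"url": url}), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


def resolve_link(url, status_code, archived_url):
    """Serve the original link unless it has rotted and an archived copy exists."""
    if status_code in (404, 410) and archived_url:
        return archived_url, "The original resource appears to have disappeared - using the archived link"
    return url, None
```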

One of the things I think we could do in the OU is add an attribute to the OU-XML template that points to an “archive-URL”, and tie this in with a service that automatically makes sure that linked pages are archived somewhere.

If a course link rots in presentation, students could be redirected to the archived link, perhaps via a splash screen (“The original resource appears to have disappeared – using the archived link”) as well as informing the course team that the original link is down.

Having access to the original copy can be really helpful when it comes to trying to find out:

  • whether a simple update to the original URL is required (for example, the page still exists in its original form, just at a new location, perhaps because of a site redesign); or,
  • whether a replacement resource needs to be found, in which case, being able to see the content of the original resource can help identify what sort of replacement resource is required.

Does that count as “digital first”, I wonder???

PS See also: https://blog.ouseful.info/2017/03/14/computer-spirits/

PPS Handy tool for creating backup links: create a link that submits the link to the Internet Archive and adds a versionurl attribute to your anchor: https://robustlinks.mementoweb.org/ (about: https://www.infodocket.com/2021/02/10/new-journal-article-robustifying-links-to-combat-reference-rot/), via Stephen’s Lighthouse.


https://robustlinks.mementoweb.org/robustify/?anchor_text=Visualising%20Rally%20Stages&url=https%3A%2F%2Frallydatajunkie.com%2Fvisualising-rally-stages%2F
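The robustify link above is just a URL carrying the anchor text and the target URL as query parameters, so it's easy enough to generate. The sketch below does that, and also builds an anchor that carries a link to an archived version alongside the original. The `data-versionurl` and `data-versiondate` attribute names are my understanding of the Robust Links convention, and the query parameter names are simply read off the example URL; check the Robust Links documentation before relying on either.

```python
from html import escape
from urllib.parse import quote, urlencode

def robustify_request(anchor_text, url):
    """Build a robustlinks.mementoweb.org 'robustify' URL like the example above."""
    # quote (rather than the default quote_plus) encodes spaces as %20
    query = urlencode({"anchor_text": anchor_text, "url": url}, quote_via=quote)
    return "https://robustlinks.mementoweb.org/robustify/?" + query

def robust_anchor(url, version_url, version_date, text):
    """An <a> element pointing at the original, annotated with an archived version."""
    return (f'<a href="{escape(url)}" data-versionurl="{escape(version_url)}" '
            f'data-versiondate="{escape(version_date)}">{escape(text)}</a>')
```

Calling `robustify_request("Visualising Rally Stages", "https://rallydatajunkie.com/visualising-rally-stages/")` reproduces the example link.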

Scratch Materials – Using Blockly Style Resources in Jupyter Notebooks

One of the practical issues associated with using the Scratch desktop application (or its OU fork, OUBuild) for teaching programming is that it runs on the desktop (or perhaps a tablet? It’s an Adobe Air app, which I think runs on iOS?). This means that the instructional material is likely to be separated from the application, either as print or as screen-based instructional material.

OUBuild

If delivered via the same screen as the application, there can be a screen real estate problem when trying to display both the instructional material and the application.

In OUBuild, there can also be issues if you want to have two projects open at the same time, for example to compare a provided solution with your own, or to look at an earlier project as you create a new one. The workaround is to run two copies of the application, each with its own project.

Creating instructional materials can also be tricky: screenshots have to be captured from the application and inserted into the materials, and when the materials are updated there is an attendant risk that the screenshots in the course materials have drifted from what the application actually looks like.

So here are a couple of ways that we might be able to integrate Scratch-like activities and guidance into instructional materials.

Calysto/Metakernel Jigsaw Extension for Jupyter Notebooks

The Calysto/Metakernel Jigsaw extension for Jupyter notebooks wraps the Google Blockly package for use in a Jupyter notebook.

Program code is saved as an XML file, which means you can save and embed multiple copies of the editor within the same Jupyter notebook. This means an example programme can be provided in one embed, and the learner can build up the programme themselves in another, all in the same page.

The code cell input (the bit that contains the %jigsaw line) can be hidden using the notebook Hide Input Cell extension so only the widget is displayed.

The editor is a bit tricky to use (it’s easy to zoom in and out by accident, and I’m guessing it’s not very accessible), but it’s great as a scratchpad, and perhaps as an instructional material authoring environment?

Live example on Binderhub

For more examples, see the original Jigsaw demo video playlist.

For creating instructional materials, we should be able to embed multiple steps of a programme in separate cells, hiding the code input cell (that is, the %jigsaw line) and then export or print off the notebook view.
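One way of setting up such a notebook is to generate it. The sketch below builds a minimal nbformat v4 notebook (plain JSON, no Jupyter libraries needed) with one code cell per programme step. Two assumptions: that the Hide Input extension looks for a `hide_input` flag in each cell's metadata, and the exact `%jigsaw` arguments in the test strings (I believe the magic takes a language and an optional `--workspace` name, but check the Metakernel documentation).

```python
import json

def steps_notebook(step_sources, hide_input=True):
    """Build a minimal nbformat v4 notebook with one code cell per programme step."""
    cells = []
    for source in step_sources:
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            # Assumed: the Hide Input extension reads this flag from cell metadata
            "metadata": {"hide_input": hide_input},
            "outputs": [],
            "source": source,
        })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
```

Writing the result out with `json.dump(nb, f, indent=1)` to a `.ipynb` file gives a notebook that can be opened in Jupyter, run, and then exported or printed as usual.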

LaTeX Scratch Package

The LaTeX Scratch package provides a way of embedding Blockly-style blocks in a document using simple LaTeX scripts.

Using a suitable magic, we can easily add scripts to the document.

(Once again, the code cell input (the cell that contains the lines of LaTeX code) can be hidden using the notebook Hide Input Cell extension so only the rendered blocks are displayed.)

We can also create scripts as strings and then render them using a line magic.
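For example, a script kept as a list of strings can be wrapped into a standalone LaTeX document ready for rendering. This is a sketch that assumes the `scratch3` package and its `\blockinit`, `\blockmove`, `\greenflag` and `\ovalnum` macros; the function name is made up, and how the resulting string gets handed to a rendering magic will depend on the magic you use.

```python
def scratch_document(blocks):
    """Wrap scratch3 block macros (one per string) in a standalone LaTeX document."""
    body = "\n".join("  " + block for block in blocks)
    return (
        "\\documentclass{standalone}\n"
        "\\usepackage{scratch3}\n"
        "\\begin{document}\n"
        "\\begin{scratch}\n"
        f"{body}\n"
        "\\end{scratch}\n"
        "\\end{document}\n"
    )

# A script kept as plain strings, so it can be edited (or generated) in code
script = [
    r"\blockinit{when \greenflag clicked}",
    r"\blockmove{move \ovalnum{10} steps}",
]
latex_source = scratch_document(script)
```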

Live example on Binderhub

One thing that might be quite interesting is a parser that can take the XML generated from the Jigsaw extension and generate LaTeX script from it, as well as generating a Jigsaw XML file from the LaTeX script?
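A first stab at the XML-to-LaTeX direction might look something like the following. It walks the `<next>` chain of each top-level block in Blockly-style XML and looks up a LaTeX macro for each block type. The block type names and the mapping table are hypothetical (Jigsaw's block set is Blockly's rather than Scratch's, so a real mapping would need working out), nested blocks such as loop bodies are ignored, and the reverse direction isn't attempted.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical mapping from block types to scratch3 macros; $NAME placeholders
# are filled from the block's <field name="NAME"> values.
MACROS = {
    "event_whenflagclicked": r"\blockinit{when \greenflag clicked}",
    "motion_movesteps": r"\blockmove{move \ovalnum{$STEPS} steps}",
}

def local(tag):
    """Drop any XML namespace, e.g. '{http://www.w3.org/1999/xhtml}block' -> 'block'."""
    return tag.rsplit("}", 1)[-1]

def child(elem, name):
    """First direct child with the given local tag name, or None."""
    for c in elem:
        if local(c.tag) == name:
            return c
    return None

def blockly_to_scratch(xml_text, macros=MACROS):
    """Follow each top-level stack's <next> chain, emitting one LaTeX line per block."""
    root = ET.fromstring(xml_text)
    lines = []
    for top in (c for c in root if local(c.tag) == "block"):
        block = top
        while block is not None:
            fields = {f.get("name"): f.text or "" for f in block if local(f.tag) == "field"}
            template = macros.get(block.get("type"), "% unmapped block: " + block.get("type", "?"))
            lines.append(Template(template).safe_substitute(fields))
            nxt = child(block, "next")
            block = child(nxt, "block") if nxt is not None else None
    return lines
```

Unmapped blocks come out as LaTeX comments rather than errors, so a partial mapping still produces something renderable.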

Historical Context

The Scratch rebuild – OUBuild – used in the OU’s new level 1 introductory computing course is a cross-platform Adobe Air application. I’d originally argued that if the decision taken earlier to use a blocks-style environment was irreversible, the browser-based BlockPy (review and code) application might be a more interesting choice: the application was browser based, allowed users to toggle between blocks and Python code views, displayed Python error messages in a simplified form, and used a data analysis, rather than animation, context, which meant we could also start to develop data handling skills.

BlockPy

One argument levelled against adopting BlockPy was that it looked to be a one-man band in terms of support, compared with the established Scratch community. I’m not sure how much we benefit from, or are of benefit to, the Scratch community, though. If OUBuild is a fork, we may or may not be able to benefit directly from any future support updates to the Scratch codebase. I don’t think we commit back?

If the inability to render animations had also been a blocker, adding an animation canvas as well as the charting canvas would have been a possibility? (My actual preference was that we should do a bigger project and look to turn BlockPy into a Jupyter client.)

Another approach that is perhaps more interesting from a “killing two birds with one stone” perspective is to teach elementary programming and machine learning principles at the same time. For example, using something like Dale Lane’s excellent Scratch driven Machine Learning for Kids resources.

PS The context coda is not intended to upset, besmirch or provoke anyone involved with OUBuild. It’s self-critical, directed at myself for not managing to engage others with, or advocate, my position/vision in a more articulate or compelling way.

PPS There’s a new JupyterLab Blockly extension that supports converting blocks to code and back again: https://olney.ai/category/2020/01/20/intelliblocks.html Repo: aolney/fable-jupyterlab-blockly-extension

Maybe Programming Isn’t What You Think It Is? Creating Repurposable & Modifiable OERs

With all the “everyone needs to learn programming” hype around, I am trying to be charitable when it comes to what I think folk might mean by this.

For example, whilst trying to get some IPython magic working, I started having a look at TikZ, a LaTeX package that supports the generation of scientific and mathematical diagrams (and which has been around for years…).

Getting LaTeX environments up and running can be a bit of a pain, but several of the Binderhub builds I’ve been putting together include LaTeX and TikZ, which means I have an install-free route for trying snippets of TikZ code out.

For example, my showntell/maths demo includes an OpenLearn_Geometry.ipynb notebook with a few worked examples of how to “write” some of the figures that appear in an OpenLearn module on geometry.

From the notebook:

The notebook includes several hidden code cells that generate a range of geometric figures. To render the images, go to the Cell menu and select Run All.

To view/hide the code used to generate the figures, click on the Hide/Reveal Code Cell Inputs button in the notebook toolbar.

To make changes to the diagrams, click in the appropriate code input cell, make your change, and then run the cell using the Run Cell (“Play”) button in the toolbar or via the keyboard shortcut SHIFT-ENTER.

Pressing Ctrl-Z (or Cmd-Z) in the code cell will undo your edits…

Launch the demo notebook server on Binder here.

Here’s an example of one of the written diagrams (there may be better ways; I only started learning how to write this stuff a couple of days ago!)

Whilst tinkering with this, a couple of things came to mind.

Firstly, this is programming, but perhaps not as you might have thought of it. If we taught adult novices some of the basic programming and coding skills using TikZ rather than turtle graphics, they’d at least be able to create professional-looking diagrams. (Okay, so the syntax is admittedly probably a bit scary and confusing to start with… But it could be simplified with some higher level, more abstracted, custom-defined macros that learners could then peek inside.)
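To give a flavour of the “higher level macro that learners can peek inside” idea, here's a Python function standing in for such a macro: it hides the TikZ drawing commands for a labelled right-angled triangle behind a few parameters, while the generated TikZ stays visible and editable. The function name and parameters are made up for illustration; a LaTeX `\newcommand` could do a similar job inside the document itself.

```python
def right_triangle(base=4, height=3, labels=("A", "B", "C")):
    """Return TikZ code for a triangle with its right angle at the second vertex."""
    a, b, c = labels
    m = 0.3  # size of the right-angle marker
    return "\n".join([
        r"\begin{tikzpicture}",
        rf"  \draw (0,0) -- ({base:g},0) -- ({base:g},{height:g}) -- cycle;",
        # small square marking the right angle at (base, 0)
        rf"  \draw ({base - m:g},0) -- ({base - m:g},{m:g}) -- ({base:g},{m:g});",
        rf"  \node[below left] at (0,0) {{{a}}};",
        rf"  \node[below right] at ({base:g},0) {{{b}}};",
        rf"  \node[above right] at ({base:g},{height:g}) {{{c}}};",
        r"\end{tikzpicture}",
    ])
```

A learner can start with `print(right_triangle(labels=("P", "Q", "R")))`, then look at (and tweak) the TikZ it produces.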

So when folk talk about teaching programming, maybe we need to think about this sort of thing as well as enterprise Java. (I spent plenty of time last night on the Stack Exchange TeX site!)

Secondly, the availability of things like Binderhub make it easier to build preloaded distributions that can be run by anyone, from anywhere (or at least, for as long as public Binderhub services exist). Simply by sharing a link, I can point you to a runnable notebook, in this case, the OpenLearn geometry demo notebook mentioned above.

One of the things that excites me, but which I can’t seem to convince others about, is the desirability of constructing documents in the way the OpenLearn geometry demo notebook is constructed: all the assets displayed in the document are generated by the document. This means that if I want to tweak an image asset, I can. The means of production (in the example, the TikZ code) is provided; it’s also editable and executable within the Binder Jupyter environment.

When HTML first appeared, web pages were shonky as anything, but there were a couple of saving graces: the HTML parsers were forgiving, and would do their best to render whatever corruption of HTML was thrown at them; and the browsers supported the ability to View Source (which still exists today; for example, in Chrome, go to the View menu then select Developer -> View Source).

Taken together, this meant that: a) folk could copy and paste other people’s HTML and try out tweaks to “cool stuff” they’d seen on other pages; b) if you got it wrong, the browser would have a go at rendering it anyway; you also wouldn’t feel as if you’d break anything serious by trying things out yourself.

So with things like Binder, where we can build disposable “computerless computing environments” (which is to say, pre-configured computing environments that you can run from anywhere, with just a browser to hand), there are now lots of opportunities to do powerful computer-ingy things (technical term…) from a simple, line at a time notebook interface, where you (or others) can blend notes and/or instruction text along with code – and code outputs.

For things like the OpenLearn demo notebook, we can see how the notebook environment provides a means by which educators can produce repurposeable documents, sharing not only educational materials for use by learners, or appropriation and reuse by other educators, but also the raw ingredients for producing customised forms of the sorts of diagrams contained in the materials: if the figure doesn’t have the labels you want, you can change them and re-render the diagram.

In a sense, sharing repurposeable, “reproducible” documents that contain the means to generate their own media assets (at least, when run in an appropriate environment: which is why Binderhub is such a big thing…) is a way of sharing your working. That is, it encourages open practice, and the sharing of how you’ve created something (perhaps even with comments in the “code” explaining why you’ve done something in a particular way, or where the inspiration/prior art came from), as well as the what of the things you have produced.

That’s it, for now… I’m pretty much burned out on trying to persuade folk of the benefits of any of this any more…

PS TikZ and PGF: TeX packages for creating graphics programmatically. Far more useful than turtle and Scratch?

Open Education Versions of Open Source Software: Adding Lightness and Accessibility to User Interfaces?

In a meeting a couple of days ago discussing some of the issues around what sort of resources we might want to provide students to support GIS (geographical information system) related activities, I started chasing the following idea…

The OU has, for a long time, developed software applications in-house that are provided to students to support one or more courses. More often than not, the code is developed and maintained in-house, and not released / published as open source software.

There are a couple of reasons for this. Firstly, the applications typically offer a clean, custom UI that minimises clutter and is designed to support usability for learners learning about a particular topic. Secondly, we require software provided to students to be accessible.

For example, the RobotLab software, originally developed, and still maintained, by my colleague Jon Rosewell, was created to support a first year undergrad short course, T184 Robotics and the Meaning of Life, elements of which are still used in one of our level 1 courses today. The simulator was also used for many years to support first year undergrad residential schools, as well as a short “build a robot fairground” activity in the masters level team engineering course.

As well as the clean design, and features that support learning (such as a code stepper button in RobotLab that lets students step through code a line at a time), the interfaces also pay great attention to accessibility requirements. Whilst these features are essential for students with particular accessibility needs, they also benefit all our students by improving the usability of the software as a whole.

So those are two very good reasons for developing software in-house. The downside, though, is that it limits students’ exposure to “real” software.

That’s not to say all our courses use in-house software: many courses also provide industry standard software as part of the course offering. But this can present problems too: third party software may come with complex user interfaces, or interfaces that suffer from accessibility issues. And software versions used in the course may drift from latest releases if the software version is fixed for the life of the course. (In fact, the software version may be adopted a year before the start of the course and then expected to last for 5 years of course presentation). Or if software is updated, this may cause significant updates to be made to the course material wrapping the software.

Another issue with professional software is that much of it is mature, and has added features over its life. This is fine for early adopters: the initial versions of the software are probably feature light, and add features slowly over time, allowing the user to grow with them. Indeed, many latterly added features may have been introduced to address issues surrounding a lack of functionality, power or “expressiveness” in use identified by, and frustrating to, the early users, particularly as they became more expert in using the application.

For a novice coming to the fully featured application, however, the wide range of features of varying levels of sophistication, from elementary, to super-power user, can be bewildering.

So what can be done about this, particularly if we want to avail ourselves of some of the powerful (and perhaps, hard to develop) features of a third party application?

To steal from a motorsport engineering design principle, maybe we can add lightness?

For example, QGIS is a powerful, cross-platform GIS application. (We have a requirement for platform neutrality; some of us also think we should be browser first, but let’s for now accept the use of an application that needs to be run on a computer with a “desktop” operating system (Windows, OS X, Linux) rather than one running a mobile operating system (iOS, Android) or developed for use on a netbook (Chrome OS).)

The interface is quite busy, and arguably hard to quickly teach around from a standing start:

However, as well as being cross-platform, QGIS also happens to be open source.

That is, the source code is available [github: qgis/QGIS].


Which means that as well as the code that does all the clever geo-number crunching stuff, we have access to the code that defines the user interface.

[UPDATE: in this case, we don’t need to customise the UI by forking the code and changing the UI definition files – QGIS provides a user interface configuration / customisation tool.]

For example, if we look for some menu labels in the UI:

we can then search the source code to find the files that contribute to building the UI:
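Finding those files is essentially a recursive text search, which `grep -r` would do just as well; the sketch below does the same thing in Python. The file suffixes searched are an assumption about where UI labels live (Qt Designer `.ui` files plus C++ and Python sources). Menu mnemonics mean a label such as “&Vector” may appear in a `.ui` file as `&amp;Vector`, so it's safer to search for the plain word.

```python
from pathlib import Path

def find_label(root, label, suffixes=(".ui", ".cpp", ".h", ".py")):
    """Return paths (relative to root) of source files whose text contains `label`."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if label in text:
                hits.append(path.relative_to(root).as_posix())
    return hits
```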

In turn, this means we can take that code, strip out all the menu options and buttons we don’t need for a particular course, and rebuild QGIS with the simplified UI. Simples. (Or maybe not that simples when you actually start getting into the detail, depending on how the software is designed!)
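Qt applications such as QGIS typically define at least some of their window layout in Qt Designer `.ui` files, where a menu is a `<widget class="QMenu">` element listing its entries as `<addaction name="…"/>` children. A minimal sketch of the stripping step, assuming that structure and using made-up action names, might be:

```python
import xml.etree.ElementTree as ET

def strip_actions(ui_xml, unwanted):
    """Remove the named <addaction/> entries from every QMenu and QToolBar in a .ui file."""
    root = ET.fromstring(ui_xml)
    # Collect the widgets first so we don't modify the tree while iterating over it
    containers = [w for w in root.iter("widget") if w.get("class") in ("QMenu", "QToolBar")]
    for widget in containers:
        for action in widget.findall("addaction"):
            if action.get("name") in unwanted:
                widget.remove(action)
    return ET.tostring(root, encoding="unicode")
```

Removing an entry from the menu definition only hides it; the underlying action still exists in the code, which is part of what makes this kind of “lightening” relatively low risk.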

And if the user interface isn’t as accessible as we’d like it to be, we can try to improve that, and contribute the improvements back to the parent project. The advantage there is that if students go on to use the full QGIS application outside of the course, they can continue to benefit from the accessibility improvements. As can every other user, whether they have accessibility needs or not.

So here’s what I’m wondering: if we’re faced with a choice between building a custom learning app and using an open source, third party “real” application that has usability and access issues, why build the custom learning app, especially if we’re going to keep the code closed and have to maintain it ourselves? Why not join the developer community and produce a simplified, accessible skin for the “real” application, and feed accessibility improvements at least back to the core?

On reflection, I realised we do, of course, do the first part of this already (forking and customising), but we’re perhaps not so good at the latter (contributing accessibility or alt-UI patterns back to the community).

For operational systems, OU developers have worked extensively on Moodle, for example (and I think, committed to the parent project)… And in courses, the recent level 1 computing course uses an OU fork of Scratch called OUBuild, a cross-platform Adobe Air application (as is the original), to teach basic programming, but I’m not sure if any of the code changes have been openly published anywhere, or design notes on why the original was not appropriate as a direct/redistributed download?

Looking at the Scratch open source repos, Scratch looks to be licensed under BSD 3-clause “New” or “Revised” License (“a permissive license similar to the BSD 2-Clause License, but with a 3rd clause that prohibits others from using the name of the project or its contributors to promote derived products without written consent”). Although the licence doesn’t require it, I’m not sure whether the OUBuild source code has been released anywhere or whether commits were made back to the original project. (If you know differently, please let me know:-)) At the very least, it’d be really handy if there was a public document somewhere that identifies the changes that were made to the original and why, which could be useful from a “design learning” perspective. (Maybe there is a paper being worked up somewhere about the software development for the course?) By sharing this information, we could perhaps influence future software design, for example by encouraging developers to produce UIs that are defined from configuration files that can be easily customised and selected from, in the same way that users can often select language packs.
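As a sketch of what a configuration-driven UI might look like, the Python below filters a full menu definition through a JSON “UI profile”, so a course could ship a profile that shows only the menu items it needs, much as a language pack is selected. The menu names, item names and profile format are all hypothetical; they're not taken from QGIS, Scratch or OUBuild.

```python
import json

# Full menu definition, as a developer might ship it (hypothetical names)
FULL_MENUS = {
    "Layer": ["Add Vector Layer", "Add Raster Layer", "Add WMS Layer", "Georeferencer"],
    "Vector": ["Buffer", "Clip", "Dissolve", "Voronoi Polygons"],
}

def apply_profile(menus, profile_json):
    """Keep only the menus and items a JSON UI profile enables, in the original order."""
    profile = json.loads(profile_json)
    return {
        menu: [item for item in items if item in profile[menu]]
        for menu, items in menus.items()
        if menu in profile
    }
```

A menu missing from the profile is hidden entirely, and items keep the order the developer defined, so the simplified UI still matches the layout of the full application.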

I can think of a handful of flippant, really negative reasons why we might not want to release code, but they’re rather churlish… So they’re hopefully not the reasons…

But there are good reasons too (for some definition of “good”..): getting code into a state that is of “public release quality”; the overheads of having to support an open code repository (though there are benefits: other people adding suggestions, finding bugs, maybe even suggesting fixes). And legal copyright and licensing issues. Plus the ever present: if we give X away, we’re giving part of the value of doing our courses away.

At the end of the day, seeing open education in part as open and shared practice, I wonder what the real challenges are to working on custom educational software in a more open and collaborative way?