Category: OU2.0

A Peek Inside the TM351 VM

So this is how I currently think of the TM351 VM:


What would be nice would be a drag’n’drop tool to let me draw pictures like that which would then generate the build scripts… (a docker compose file, or set of puppet scripts, for the architectural bits on the left, and a Vagrantfile to set up the port forwarding, for example).

For docker, I wouldn’t have thought that would be too hard – a docker compose file could describe most of that picture, right? Not sure how fiddly it would be for a more traditional VM, though, depending on how it was put together?
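By way of a sketch, here’s the sort of docker-compose file I have in mind – the image names are borrowed from the builds described later in this post, but the service names, links and port mappings are illustrative guesses rather than the actual TM351 configuration:

notebook:
  image: psychemedia/tm351_scipystacknserver
  ports:
    - "8888:8888"
  links:
    - devpostgres:postgres
    - mongo:mongodb
openrefine:
  image: psychemedia/openrefine_ou
  ports:
    - "3334:3333"
devpostgres:
  image: postgres
  ports:
    - "5432:5432"
mongo:
  image: mongo
  ports:
    - "27017:27017"

The port forwarding on the right hand side of the picture would then just be a handful of config.vm.network forwarded_port lines in the Vagrantfile.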

Running Executable Jupyter/IPython Notebooks Directly from Github With Binder

It’s taken me way too long to get round to posting this, but it’s a compelling idea that I think deserves more notice… binder ([code]).

The idea is quite simple – specify a public github project (username/repo) that contains one or more Jupyter (IPython) notebooks, hit “go”, and the service will automatically create a docker container image that includes a Jupyter notebook server and a copy of the files contained in the repository.


(Note that you can specify any public Github repository – it doesn’t have to be one you have control over at all.)

Once the container image is created, visiting the associated binder URL will launch a new container based on that image and display a Jupyter notebook interface at the redirected-to URL. Any Jupyter notebooks contained within the original repository can then be opened, edited and executed as an active notebook document.

What this means is that we could pop a set of course related notebooks into a repository, and share a link to the corresponding binder launch URL. Whenever the link is visited, a container is fired up from the image and the user is redirected to that container. If I go to the URL again, another container is fired up. Within the container, a Jupyter notebook server is running, which means you can access the notebooks that were hosted in the Github repo as interactive, “live” (that is, executable) notebooks.

Alternatively, a user could clone the original repository, create a container image based on their copy, and then launch live notebooks from their own copy of the repository.

I’m still trying to find out what’s exactly going on under the covers of the binder service. In particular, a couple of questions came immediately to mind:

  • how long do containers persist? For example, at the moment we’re running a FutureLearn course (Learn to Code for Data Analysis) that makes use of IPython/Jupyter notebooks, but it requires learners to install Anaconda (which has caused a few issues). The course lasts 4 weeks, with learners studying a couple of hours a day, maybe two days a week. Presumably, the binder containers are destroyed as a matter of course according to some schedule or rule – but what rule? I guess learners could always save and download their notebooks to the desktop and then upload them to a running server, but it would be more convenient if they could bookmark their container and return to it over the life of the course? (So for example, if FutureLearn were operating a binder service, joining the course could provide authenticated access to a container for the duration of the course, and maybe a week or two after? Following ResBaz Cloud – Containerised Research Apps as a Service, it might also allow a user to export a copy of their container?)
  • how does the system scale? The FutureLearn course has several thousand students registered to it. To use the binder approach to provide any student who wants one with a web-accessible, containerised version of the notebook application, so they don’t have to install one of their own, how easily would it scale? eg how easy is it to give a credit card to some back-end hosting company, get some keys, plug them in as binder settings and just expect it to work? (You can probably guess at my level of devops/sysadmin ability/knowledge!;-)

Along with those immediate questions, a handful of more roadmap style questions also came to mind:

  • how easy would it be to set up the Jupyter notebook system to use an alternative kernel? e.g. to support a Ruby or R course? (I notice that some hosted notebook services offer a variety of kernels, for example?)
  • how easy would it be to provide alternative services to the binder model? eg something like RStudio, for example, or OpenRefine? I notice that the binder repository initialisation allows you to declare the presence of a custom Dockerfile within the repo that can be used to fire up the container – so maybe binder is not so far off being a general purpose docker-container-from-online-Dockerfile launcher? Which could be really handy?
  • does binder make use of Docker Compose to tie multiple applications together, as for example in the way it allows you to link in a Postgres server? How extensible is this? Could linkages of a similar form to arbitrary applications be configured via a custom Dockerfile?
  • is closer integration with github on the way? For example, if a user logged in to binder with github credentials, could files then be saved or synched back from the notebook to that user’s corresponding repository?

Whatever – it will be interesting to see what other universities may do with this, if anything…

See also Seven Ways of Running IPython Notebooks and ResBaz Cloud – Containerised Research Apps as a Service.

PS I just noticed an interesting looking post from @KinLane on API business models: I Have A Bunch Of API Resources, Now I Need A Plan, Or Potentially Several Plans. This has got me wondering: what sort of business plan might support a “Studyapp” – applications on demand, as a service – form of hosting?

Several FutureLearn courses, for all their web first rhetoric, require learners to install software onto their own computers. (From what I can tell, FutureLearn aren’t interested in helping “partners” do anything that takes eyeballs away from their own site.) So I don’t understand why they seem reluctant to explore ways of using tech to provide interactive experiences within the FutureLearn context, like using embedded IPython notebooks, for example. (Trying to innovate around workflow is also a joke.) And IMVHO, the lack of innovation foresight within the OU itself (FutureLearn’s parent…) seems just as bad at the moment… As I’ve commented elsewhere, “[m]y attitude is that folk will increasingly have access to the web, but not necessarily access to a computer onto which they can install software applications. … IMHO, we are now in a position where we can offer students access to “computer lab” machines, variously flavoured, that can run either on a student’s own machine (if it can cope with it) or remotely (and then either on OU mediated services or via a commercial third party on which students independently run the software). But the lack of imagination and support for trying to innovate in our production processes and delivery models means it might make more sense to look to working with third parties to try to find ways of (self-)supporting our students.” (See also: What Happens When “Computers” Are Replaced by Tablets and Phones?) But I’m not sure anyone else agrees… (So maybe I’m just wrong!;-)

That said, it’s got me properly wondering – what would it take for me to set up a service that provides access to MOOC or university course software, as a service, at least for uncustomised, open source software, accessible via a browser? And would anybody pay to cover the server costs? How about if web hosting and a domain were bundled up with it, which could also be used to store copies of the software based activities once the course had finished? A “personal, persistent, customised, computer lab machine”, essentially?

Possibly related to this thought are Jim Groom’s reflections on The Indie EdTech Movement, although I’m thinking more of educators doing the institution stuff for themselves as a way of helping the students-do-it-for-themselves. (Which in turn reminds me of this hack around the idea of THEY STOLE OUR REVOLUTION LEARNING ENVIRONMENT. NOW WE’RE STEALING IT BACK!)

ResBaz Cloud – Containerised Research Apps as a Service

Just over three years or so ago, the OU’s KMi started experimenting with a service to support researchers that made RStudio – and a linked MySQL database – available as an online service (Open Research Data Processes: KMi Crunch – Hosted RStudio Analytics Environment).

I’m not sure if they’ve also started exploring the provision of other browser accessed applications – Jupyter notebooks, for example – but developing online personal application delivery models is something I’ve felt the OU should be exploring for a long time – for undergraduate and postgraduate teaching, as well as research.

I don’t know whether KMi have been looking at delivering apps via self-service launching of dockerised/containerised applications, or whether there are any HE or Research Council infrastructure projects looking at supporting this sort of thing, but it seems that other enlightened agencies are… For example, a few weeks ago I came across a tweet from ex-JISC disrupter Dave Flanders mentioning the Australian ResBaz cloud service:


The service is offered free to the Australian academic research community (I’m grateful to the team for providing me with reviewer access:-); early stage researchers can request access (or configure access?) to a named research cluster, and then deploy containers to it:


The containerised applications on offer are initially configured by the ResBaz team – I don’t think there’s a way of pointing to your own Dockerfile/setupconfig/image on Dockerhub – but this means there is an element of support that will help you get set up with an application that you know will run!


The containers you create persist – you can turn them off and on again, as well as deleting them and creating new ones – which means you can save project and data files within the container. There’s also an option to export the container, which supports portability, I guess.

The platform itself is reminiscent of a minimal take on the various hosted services that provide access to IPython notebooks within a claimed workbench environment. To my mind, KMi Crunch has more of a workbench feel to it, because it provides application integration (RStudio + MySQL), albeit baked in. At the moment, ResBaz doesn’t seem to offer that. (However, another service that I’ll be blogging about in a day or two, binder, does provide support for 1-click created linked containers, although again, the configuration options are limited.) I think binder builds on elements of existing Jupyter deployment tooling, which itself demonstrates support for a full blown Jupyter install capable of running several kernels – which may be something for the ResBaz folk to think about (for example, offering at least an R kernel within the notebooks, and maybe Python 3 as well as Python 2.7?).


One of the great things about the ResBaz set-up seems to be its support for training events. From my own personal experience, it’s really handy to be able to point workshop participants to online, browser reachable versions of the applications covered in the workshop you’re running.

For OU teaching, I think we really should be looking seriously at using software packages that can be accessed via a browser and run either as a local virtualised service or as a remotely hosted service, to try to mitigate software install issues/hassles. For OU postgrad research students, I think that running applications via containers has a lot to recommend it. And for academic researchers, including the growing number of digital humanities researchers, I think that the range of benefits associated with being able to run research software using what is essentially a software-application-as-a-service model is increasing.

But then, what do I know? I just watched a bunch of folk wasting much of the day trying to work out how to support a raft of remote, informal learners installing some remotely hosted and maintained third party s/w onto all manner of personally managed weird and wonderful Windows machines. (The ones on company machines tend not to have the privileges they need to install the software, so we just forget about them. The ones on notebooks wondering why their machines start to fall over when they have to run more than a browser, or the ones who have tablets that can’t install anything other than custom built applications, are also discounted… If the OU is set on becoming a global, online provider, someone needs to start doing something about this…)

See also: Seven Ways of Running IPython Notebooks

Exporting and Distributing Docker Images and Data Container Contents

Although it was a beautiful day today, and I should really have spent it in the garden, or tinkering with F1 data, I lost the day to the screen and keyboard pondering various ways in which we might be able to use Kitematic to support course activities.

One thing I’ve had on pause for some time is the possibility of distributing docker images to students via a USB stick, and then loading them into Kitematic. To do this, we need to get tarballs of the appropriate images so we can then distribute them:

docker save psychemedia/openrefine_ou:tm351d2test | gzip -c > test_openrefine_ou.tgz
docker save psychemedia/tm351_scipystacknserver:tm351d3test | gzip -c > test_ipynb.tgz
docker save psychemedia/dockerui_patch:tm351d2test | gzip -c > test_dockerui.tgz
docker save busybox:latest | gzip -c > test_busybox.tgz
docker save mongo:latest | gzip -c > test_mongo.tgz
docker save postgres:latest | gzip -c > test_postgres.tgz

On the to do list is getting these to work with the portable Kitematic branch (I’m not sure if that branch will continue, or whether the interest is too niche?!), but in the meantime, I could load one into the Kitematic VM from the Kitematic CLI using:

docker load < test_mongo.tgz

assuming the test_mongo.tgz file is in the current working directory.

Another thing I need to explore is how to set up the data volume containers on the students’ machines.

The current virtual machine build scripts aim to seed the databases from raw data, but to set up the student machines it would seem more sensible to either rebuild a database from a backup, or just load in a copy of the seeded data volume container. (All the while we have to be mindful of providing a route for the students to recreate the original, as distributed, setup, just in case things go wrong. At the same time, we also need to start thinking about backup strategies for the students so they can checkpoint their own work…)

The traditional backup and restore route for PostgreSQL seems to be something like the following:

#Use docker exec to run a postgres export
docker exec -t vagrant_devpostgres_1 pg_dumpall -Upostgres -c > dump_`date +%d-%m-%Y"_"%H_%M_%S`.sql
#If it's a large file, maybe worth zipping: pg_dump dbname | gzip > filename.gz

#The restore route would presumably be something like:
cat postgres_dump.sql | docker exec -i vagrant_devpostgres_1 psql -Upostgres
#For the compressed backup: cat postgres_dump.gz | gunzip | psql -Upostgres

For mongo, things seem to be a little bit more complicated. Something like:

docker exec -t vagrant_mongo_1 mongodump

#Complementary restore command is: mongorestore

would generate a dump in the container, but then we’d have to tar it and get it out? Something like these mongodump containers may be easier? (mongo seems to have issues with mounting data containers on the host, on a Mac at least?)
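A rough sketch of the tar-and-copy route might look something like this (the /dump and /tmp paths inside the container are just my guesses at convenient places to write to):

#Dump the databases inside the running mongo container
docker exec -t vagrant_mongo_1 mongodump --out /dump
#Tar the dump up inside the container, then copy it out to the host
docker exec -t vagrant_mongo_1 tar cvf /tmp/mongodump.tar /dump
docker cp vagrant_mongo_1:/tmp/mongodump.tar .
#The complementary route back in would be to copy/untar the archive and run mongorestore against it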

By the by, if you need to get into a container within a Vagrant launched VM (I use vagrant with vagrant-docker-compose), the following shows how:

#If you need to get into a container:
vagrant ssh
#Then in the VM:
  docker exec -it CONTAINERNAME bash

Another way of getting to the data is to export the contents of the seeded data volume containers from the build machine. For example:

#  Export data from a data volume container that is linked to a database server

docker run --volumes-from vagrant_devpostgres_1 -v $(pwd):/backup busybox tar cvf /backup/postgresbackup.tar /var/lib/postgresql/data 

#I wonder if these should be run with --rm to dispose of the temporary container once run?

docker run --volumes-from vagrant_mongo_1 -v $(pwd):/backup busybox tar cvf /backup/mongobackup.tar /data/db

We can then take the tar file, distribute it to students, and use it to seed a data volume container.

Again, from the Kitematic command line, I can run something like the following to create a couple of data volume containers:

#Create a data volume container
docker create -v /var/lib/postgresql/data --name devpostgresdata busybox true
#Restore the contents
docker run --volumes-from devpostgresdata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/postgresbackup.tar"
#Note - the docker helpfiles don't show how to use sh -c - which appears to be required...
#Again, I wonder whether this should be run with --rm somewhere to minimise clutter?
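(For what it’s worth, --rm does seem to do the clutter minimising job – the temporary container is removed as soon as the tar command completes. For example:)

docker run --rm --volumes-from devpostgresdata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/postgresbackup.tar"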

Unfortunately, things don’t seem to run so smoothly with mongo?

#Unfortunately, when trying to run a mongo server against a data volume container
#the presence of a mongod.lock seems to break things
#We probably shouldn't do this, but if the database has settled down and completed
#  all its writes, it should be okay?!
docker run --volumes-from vagrant_mongo_1 -v $(pwd):/backup busybox tar cvf /backup/mongobackup.tar /data/db --exclude=*mongod.lock
#This generates a copy of the distributable file without the lock...

#Here's an example of the reconstitution from the distributable file for mongo
docker create -v /data/db --name devmongodata busybox true
docker run --volumes-from devmongodata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/mongobackup.tar"

(If I’m doing something wrong wrt getting the mongo data out of the container, please let me know… I also wonder, given the cavalier way I treat the lock file, whether the mongo container should be started up in repair mode?!)

If we have a docker-compose.yml file in the working directory like the following:

mongo:
  image: mongo
  ports:
    - "27017:27017"
  volumes_from:
    - devmongodata

##We DO NOT need to declare the data volume here
#We have already created it
#Also, if we leave it in, a "docker-compose rm" command
#will destroy the data volume container...
#...which means we wouldn't persist the data in it
#devmongodata:
#    command: echo created
#    image: busybox
#    volumes: 
#        - /data/db

We can then run docker-compose up and it should fire up a mongo container and link it to the seeded data volume container, making the data contained in that data volume container available to us.

I’ve popped some test files here. Download and unzip them; from the Kitematic CLI, cd into the unzipped dir, create and populate the data containers as above, then run: docker-compose up
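In other words, from the Kitematic CLI, something like the following (where unzipped_dir stands in for whatever your unzipped folder is actually called):

cd unzipped_dir
docker create -v /var/lib/postgresql/data --name devpostgresdata busybox true
docker run --rm --volumes-from devpostgresdata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/postgresbackup.tar"
docker create -v /data/db --name devmongodata busybox true
docker run --rm --volumes-from devmongodata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/mongobackup.tar"
docker-compose up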

You should be presented with some application containers including OpenRefine and an OU customised IPython notebook server. You’ll need to mount the IPython notebooks folder onto the unzipped folder. The example notebook (if everything works!) should demonstrate calls to prepopulated mongo and postgres databases.


Fighting With docker – and Pondering Innovation in an Institutional Context

I spent my not-OU day today battling with trying to bundle up a dockerised VM, going round in circles trying to simplify things a bit, and getting confused by docker-compose not working quite so well following an upgrade.

I think there’s still some weirdness going on (eg docker-ui showing messed up container names?) but I’m now way too confused to care or try to unpick it…

I also spent a chunk of time considering the 32 bit problem, but got nowhere with it… Docker is predominantly a 64 bit thing, but the course has decided in its wisdom that we have to support 32 bit machines, which means I need to find a way of getting a 32 bit version of docker into a base box (apt-get install, I think?), finding a way of getting the vagrant docker provisioner to use it (would an alias help?), and checking that vagrant-docker-compose works in a 32 bit VM, then tracking down 32 bit docker images for PostgreSQL, MongoDB, dockerUI and OpenRefine (or finding build files for them so I can build my own 32 bit images).

We then need to be able to test the VM in a variety of regimes: 32 bit O/S on a 32 bit machine, 32 bit O/S on a 64 bit machine, 64 bit O/S on a 64 bit machine, with a variety of hardware virtualisation settings we might expect on students’ machines. I’m on a highly specced Macbook Pro, though, so my testing is skewed…

And I’m not sure I have it in me to try to put together 32 bit installs…:-( Perhaps that’s what LTS are for…?!;-)

(I keep wondering if we could get access to stats about the sorts of machines students are using to log in to the OU VLE from the user-agent strings of their browsers that can be captured in webstats? And take that two ways: 1) look to see how it’s evolving over time; 2) look to see what the profile of machines is for students in computing programmes, particularly those coming up to level 3 option study? That’s the sort of practical, useful data that could help inform course technology choices but that doesn’t have learning analytics buzzword kudos or budget attached to it, so I suspect it’s not often championed…)

When LTS was an educational software house, I think there was also more opportunity, support and willingness to try to explore what the technology might be able to do for us and OUr students? Despite the continual round of job ads to support corporate IT, I fear that exploring the educational uses of software has not had much developer support in recent years…

An example of the sort of thing I think we could explore – if only we could find a forum to do so – is the following docker image that contains an OU customised IPython notebook: psychemedia/ouflpy_scipystacknserver

The context is a forthcoming FutureLearn course on introductory programming. We’re currently planning on getting students to use Anaconda to run the IPython Notebooks that provide the programming environment for the course, but I idly wondered what a Kitematic route might be like. (The image is essentially the scipystack and notebook server with a few notebook extensions and OU customisations installed.)

There are some sample (testing) notebooks here that illustrate some of the features.

Here’s the installation recipe:

– download and unzip the notebooks (double click the downloaded file) and keep a note of where you unzipped the notebook directory to.

– download and install Kitematic. This makes use of docker and Virtualbox – but I think it should install them both for you if you don’t already have them installed.

– start Kitematic, search for psychemedia/ouflpy_scipystacknserver and create an application container.


It should download and start up automatically.

When it’s running, click on the Notebooks panel and Enable Volumes. This allows the container to see a folder on your computer (“volumes” are a bit like folders that can be aliased or mapped on to other folders across devices).


Click the cog (settings) symbol in the Notebooks panel to get to the Volumes settings. Select the directory that you created when you unzipped the downloaded notebooks bundle.


Click on the Ports tab. If you click on the link that’s displayed, it should open an IPython notebook server homepage in your browser.


Here’s what you should see…


Click on a notebook link to open the notebook.


The two demo notebooks are just simple demonstrations of some custom extensions and styling features I’ve been experimenting with. You should be able to create your own notebooks, open other people’s notebooks, etc.
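(If you’d rather skip Kitematic and drive docker directly from the command line, I’d expect something along the following lines to work – though note that the container port and the internal notebooks path are guesses on my part rather than documented settings:)

docker run -d -p 8888:8888 -v /path/to/unzipped/notebooks:/notebooks psychemedia/ouflpy_scipystacknserver
#Then point a browser at port 8888 on the docker VM's IP address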

You can also run the container in the cloud. Tweak the following recipe to try it out on Digital Ocean: Getting Started With Personal App Containers in the Cloud or Running RStudio on Digital Ocean, AWS etc Using Tutum and Docker Containers. (That latter example you could equally well run in Kitematic – just search for and install rocker/rstudio.)

The potential of using containers still excites me, even after 6 months or so of messing around the fringes of what’s possible. In the case of writing a new level 3 computing course with a major practical element, limiting ourselves to a 32 bit build seems a backward step to me? I fully appreciate the need to make our courses as widely accessible as possible, and in as affordable a way as possible (ahem…), but here’s why I think supporting 32 bit machines for a new level 3 computing course is a backward step.

In the first case, I think we’re making life harder for OUrselves. (Trying to offer backwards compatibility is prone to this.) Docker is built for 64 bit and most of the (reusable) images are 64 bit. If we had the resource to contribute to a 32 bit docker ecosystem, that might be good for making this sort of technology accessible more widely internationally, as well as domestically, but I don’t think there’s the resource to do that? Secondly, we arguably worsen the experience for students with newer, more powerful machines (though perhaps this could be seen as levelling the playing field a bit?). I always liked the idea of making use of progressive enhancement as a way of trying to offer students the best possible experience using the technology they have, though we’d always have to ensure we weren’t then favouring some students over others. (That said, the OU celebrates diversity across a whole range of dimensions in every course cohort…)

Admittedly, students on a computing programme may well have bought a computer to see them through their studies – if the new course is the last one they do, that might mean the machine they bought for their degree is now 6 years old. But on the other hand, students buying a new computer recently may well have opted for an affordable netbook, or even a tablet computer, neither of which can support the installation of “traditional” software applications.

The solution I’d like to explore is a hybrid offering, where we deliver software that makes use of browser based UIs and software services that communicate using standard web protocols (http, essentially). Students who can install software on their computers can run the services locally and access them through their browser. Students who can’t install the software (because they have an older spec machine, or a newer netbook/tablet spec machine, or because they do their studies on a public access machine in a library, or using an IT crippled machine in their workplace (cough, optimised desktop, cOUgh..)) can access the same applications running in the cloud, or perhaps even from one or more dedicated hardware app runners (docker’s been running on a Raspberry Pi for some time, I think?). Whichever you opt for, exactly the same software would be running inside the container and exposed in the same way through a browser… (Of course, this does mean you need a network connection. But if you bought a netbook, that’s the point, isn’t it?!)

There’s a cost associated with running things in the cloud, of course – someone has to pay for the hosting, storage and bandwidth. But in a context like FutureLearn, that’s an opportunity to get folk paying and then top slice them with a (profit generating…) overhead, management or configuration fee. And in the context of the OU – didn’t we just get a shed load of capital investment cash to spend on remote experimentation labs and yet another cluster?

There are also practical consequences – running apps on your own machine makes it easier to keep copies of files locally. When running in the cloud, the files have to live somewhere (unless we start exploring fast routes to filesharing – Dropbox can be a bit slow at synching large files, I think…)

Anyway – docker… 32 bit… ffs…

If you give the container a go, please let me know how you get on… I did half imagine we might be able to try this for a FutureLearn course, though I fear the timescales are way too short in OU-land to realistically explore this possibility.

Literate DevOps? Could We Use IPython Notebooks To Build Custom Virtual Machines?

A week or so ago I came across a couple of IPython notebooks produced by Catherine Devlin covering the maintenance and tuning of a PostgreSQL server: DB Introspection Notebook (example 1: introspection, example 2: tuning, example 3: performance checklist). One of the things we have been discussing in the TM351 course team meetings is the extent to which we “show our working” to students in terms how the virtual machine and the various databases used in the course were put together, even if we don’t actually teach that stuff.

Notebooks make an ideal way of documenting the steps taken to set up a particular system, blending commentary with executable command line and code cells.

The various approaches I’ve explored to build the VM have arguably been over-complex – vagrant, puppet, docker and docker-compose – but I’ve always seen the OU as a place where we explore the technologies we’re teaching – or might teach – in the context of both course production and course delivery (that is, we can often use a reflexive approach whereby the content of the teaching also informs the development and delivery of the teaching).

In contrast, in A DevOps Approach to Common Environment Educational Software Provisioning and Deployment I referred to a couple of examples of a far simpler approach, in which common research, or research and teaching, VM environments were put together using simple scripts. This approach is perhaps more straightforwardly composable, in that if someone has additional requirements of the machine, they can just add a simple configuration script to bring in those elements.
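By way of illustration, the sort of composable configuration fragment I have in mind is nothing more than a few lines of shell script, along the lines of the following (package names purely illustrative):

#additional-requirements.sh - extra bits needed for my part of the course
sudo apt-get update
sudo apt-get install -y imagemagick
sudo pip install requests beautifulsoup4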

In our current course example, where the multiple authors have a range of skill and interest levels when it comes to installing software and exploring different approaches to machine building, I’m starting to wonder whether I should have started with a simple base machine running just an IPython notebook server and no additional libraries or packages, and then created a series of notebooks, one for each part of the course (which broadly breaks down to one part per author), containing instructions for installing all the bits and pieces required for just that part of the course. If there’s duplication across parts, trying to install the same thing for each part, that’s fine – the various package managers should be able to cope with that. (The only issue would arise if different authors needed different versions of the same package, for some reason, and I’m not sure what we’d do in that case?)

The notebooks would then include explanatory documentation and code cells to install Linux packages and python packages. Authors could take over the control of setup notebooks, or just make basic requests. At some level, we might identify a core offering (for example, in our course, this might be the inclusion of the pandas package) that might be pushed up into a core configuration installation notebook executed prior to the installation notebook for each part.
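A cell in one of those part-specific installation notebooks might then be as simple as something like the following (package names again illustrative), wrapped in markdown cells explaining what is being installed and why:

#Install the Linux and Python packages needed for this part of the course
#(assumes the notebook server is running with the privileges needed to install things)
!apt-get update && apt-get install -y mongodb-clients
!pip install pandas pymongo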

Configuring the machine would then be a case of running the separate configuration notebooks for each part (perhaps preceded by a core configuration notebook), perhaps by automated means. For example, ipython nbconvert --to=html --ExecutePreprocessor.enabled=True configNotebook_1.ipynb will do the job [via StackOverflow]. This generates an output HTML report from running the code cells in the notebook (which can include command line commands) in a headless IPython process (I think!).

The following switch may also be useful (it clears the output cells): ipython nbconvert --to=pdf --ExecutePreprocessor.enabled=True --ClearOutputPreprocessor.enabled=True RunMe.ipynb (note in this case we generate a PDF report).
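So the automated build step might boil down to a simple loop along the following lines (the notebook filenames are hypothetical):

#Run the core configuration notebook first, then the notebook for each course part
for nb in config_core.ipynb configNotebook_*.ipynb; do
    ipython nbconvert --to=html --ExecutePreprocessor.enabled=True "$nb"
done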

To build the customised VM box, the following route should work:

– set up a simple Vagrantfile to import a base box
– install IPython into the box
– copy the configuration notebooks into the box
– run the configuration notebooks
– export the customised box
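As a rough, untested sketch of that route (the base box name and notebook filenames are placeholders):

#Import a base box and bring it up
vagrant init ubuntu/trusty64
vagrant up
#Install IPython (with the notebook machinery) into the box
vagrant ssh -c "sudo apt-get update && sudo apt-get install -y python-pip python-dev && sudo pip install 'ipython[notebook]'"
#The /vagrant share maps to the host folder containing the Vagrantfile and the configuration notebooks
vagrant ssh -c "cd /vagrant && ipython nbconvert --to=html --ExecutePreprocessor.enabled=True configNotebook_*.ipynb"
#Export the customised box
vagrant package --output tm351_custom.box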

This approach has the benefits of using simple, literate configuration scripts described within a notebook. This makes them perhaps a little less “hostile” than shell scripts, and perhaps makes it easier to build in tests inline, and report on them nicely. (If running a cell results in an error, I think the execution of the notebook will stop at that point?) The downside is that to run the notebooks, we also need to have IPython installed first.