Eighteen months or so ago, I started looking at ways in which we might use a virtual machine to bundle up a variety of interoperating software applications for a distance education course on databases and data management. (This VM would run IPython notebooks as the programming surface, PostgreSQL and MongoDB as the databases. I was also keen that OpenRefine should be made available, and as everything in the VM was being accessed via a browser, I added a browser based terminal app (tty.js) to the mix as well). The approach I started to follow was to use vagrant as a provisioner and VM manager, and puppet scripts to build the various applications. One reason for this approach is that the OU is an industrial scale educator, and (to my mind) it made sense to explore a model that would support the factory line production model we have in a way that would scale vertically as a way of maintaining VMs for a course that runs over several ways as well as horizontally across other courses with other software application requirements. You can see how my thinking evolved across the following posts: posts tagged “VM” on OUseful.info.
Since then, a lot has changed. IPython notebooks have forked into the Jupyter notebook server and IPython, and Jupyter has added a browser based terminal app to the base offerings of the notebook server. (It’s not as good a flexible as tty.js, which allowed for multiple terminals in the same browser window, but I guess there’s nothing to stop you loading multiple terminals into separate browser tabs.) docker has also become a thing…
To recap on some of thinking about how we might provide software to students, I was pre-occupied at various times with the following (not necessarily exhaustive) list of considerations:
- how could we manage the installation and configuration of different software applications on students’ self-managed, remote computers, running arbitrary versions of arbitrary operating systems on arbitrarily specced machines over networks with unknown and perhaps low bandwidth internet connections;
- how could we make sure those applications interoperated correctly on the students’ own machines;
- how could we make sure the students retained access to local copies of all the files they had created as part of their studies, and that those local copies would be the ones they actually worked on in the provided software applications; (so for example, IPython notebook files, and perhaps even database data directories);
- how could we manage the build of each application in the OU production context, with OU course teams requiring access to a possibly evolving version of the machine 18 months in advance of student first use date and an anticipated ‘gold master’ freeze date on elements of the software build ~9 months prior to students’ first use;
- how could we manage the maintenance of VMs within a single presentation of a 9 month long course and across several presentations of the course spanning 1 presentation a year over a 5 year period;
- how could the process support the build and configuration of the same software application for several courses (for example, an OU-standard PostgreSQL build);
- how could the same process/workflow support the development, packaging, release to students, maintenance workflow for other software applications for other courses;
- could the same process be used to manage the deployment of application sets to students on a cloud served basis, either through a managed OU cloud, or on a self-served basis, perhaps using an arbitrary cloud service provider.
All this bearing in mind that I know nothing about managing software packaging, maintenance and deployment in any sort of environment, let alone a production one…;-) And all this bearing in mind that I don’t think anybody else really cares about any of the above…;-)
Having spent a few weeks away from the VM, I’m now thinking that we would be better served by using a more piecemeal approach based around docker containers. These still require the use of something like Virtualbox, but rather than using vagrant to provision the necessary environment, we could use more of an appstore approach to starting and stopping services. So for example, today I had a quick play with Kitematic, a recent docker acquisition, and an app that doesn’t run on Windows yet but for which Windows supported is slated for June, 2015 in the Kitematic roadmap on github…
So what’s involved? Install Kitematic (if Virtualbox isn’t already installed, I think it’ll grab it down for you?) and fire it up…
It starts up a dockerised virtual machine into which you can install various containers. Next up, you’re presented with an “app dashboard”, as well as the ability to search dockerhub for additional “apps”:
Find a container you want, and select it – this will download the required components and fire up the container.
The port tells you where you can find any services exposed by the container. In this case, for scipyserver, it’s an IPython notebook (HTML app) running on top of a scipy stack.
By default the service runs over https with a default password; we can go into the Settings for the container, reset the Jupyter server password, force it to use http rather than https, and save to force the container to use the new settings:
So for example…
In the Kitematic container homepage, if I click on the notebooks folder icon in the Edit Files panel, I can share the notebook folder across to my host machine:
I can also choose what directory on host to use as the shared folder:
I can also discover and fire up some other containers – a PostgreSQL database, for example, as well as a MongoDB database server:
From within my notebook, I can install additional packages and libraries and then connect to the databases. So for example, I can connect to the PostgreSQL database:
or to mongo:
Looking at the container Edit Files settings, it looks like I may also be able to share across the database datafiles – though I’m not sure how this would work if I had a default database configuration to being with? (Working out how to pre-configure and then share database contents from containerised DBMS’ is something that’s puzzled me for a bit and something I haven’t got my head round yet).
So – how does this fit into the OU model (that doesn’t really exist yet?) for using VMs to make interoperating software collections available to students on their own machines?
First up, no Windows support at the moment, though that looks like it’s coming; secondly, the ability to mount shares with host seems to work, though I haven’t tested what happens if you shutdown and start up containers, or delete a scipyserver container and then fire up a clean replacement for example. Nor do I know (yet?!) how to manage shares and pre-seeding for the database containers. One original argument for the VM was that interoperability between the various software applications could be hardwired and tested. Kitematic doesn’t support fig/Docker compose (yet?) but it’s not too hard to lookup up the addresses paste them into a notebook. I think it does mean we can’t provide hard coded notebooks with ‘guaranteed to work’ configurations (i.e. ones prewritten with service addresses and port numbers) baked in, but it’s not too hard to do this manually. In the docker container Dockerfiles, I’m not sure if we could fix the port number mappings to initial default values?
One thing we’d originally envisioned for the VM was shipping it on a USB stick. It would be handy to be able to point Kitematic to a local dockerhub, for example, a set of prebuilt containers on a USB stick with the necessary JSON metadata file to announce what containers were available there, so that containers could be installed from the USB stick. (Kitematic currently grabs the container elements down from dockerhub and pops the layers into the VM (I assume?), so it could do the same to grab them from the USB stick?) In the longer term, I could imagine an OU branded version of Kitematic that allows containers to be installed from a USB stick or pulled down from an OU hosted dockerhub.