Ever since I joined the OU, I’ve believed in trying to deliver distance education courses in an agile and responsive way, which is to say: making stuff up for students whilst the course is in presentation.
This is generally not done (by course/module teams at least) because the aim of most course/module teams is to prepare the course so thoroughly that it can “just” be presented to students.
I personally think we should try to improve the student experience of the course as it presents if we can by being responsive and reactive to student questions and issues.
So… TM351, the data management course that uses a VM, has started again, and issues / questions are already starting to hit the forums.
One of the questions – which I’d half noted but never really thought through in previous presentations (my bad for not iterating on, or improving, the course experience in, or between, previous presentations) – related to sharing Jupyter notebooks across different machines using Google Drive (equally, Dropbox or Microsoft OneDrive).
The VirtualBox VM we use is fired up using the vagrant provisioner. A Vagrantfile defines various configuration settings – which ports are exposed by the VM, for example. By default, the contents of the folder in which vagrant is started up are shared into the VM. At the same time, vagrant creates a hidden .vagrant folder that contains state relating to the instance of that VM.
The set up on a single machine is something like this:
If a student wants to work across several machines, they need to share their working course files (Jupyter notebooks, and so on) but not the VM machine state. Which is to say, they need a set up more like the following:
For students working across several machines, it thus makes sense to have all project files in one folder and a separate .vagrant settings folder on each separate machine.
Checking the vagrant docs, it seems as if this is quite manageable using the synced folder configuration settings.
The default shares the current project folder (the one containing the Vagrantfile, and from which vagrant is run), which I’m guessing is a setting something like:
config.vm.synced_folder "./", "/vagrant"
By explicitly setting this parameter, we can decide how we want the mapping to occur. For example:
config.vm.synced_folder "/PATH/ON/HOST", "/vagrant"
allows you to specify the folder you want to share into the VM. Note that the /PATH/ON/HOST folder needs to be created before trying to share it.
To put the new shared directory into effect, reload and reprovision the VM. For example:
vagrant reload --provision
Student notebooks located in the notebooks folder of that shared directory should now be available in the VM. Furthermore, if the shared folder is itself in a web-shared folder (for example, a synced Dropbox, Google Drive or Microsoft OneDrive folder) it should be available wherever that folder is synced to.
For example, on a Mac (where ~ is an alias to my home directory), I can create a directory in my Dropbox folder, ~/Dropbox/TM351VMshare, and then map this into the VM by adding the following line to the Vagrantfile:
config.vm.synced_folder "~/Dropbox/TM351VMshare", "/vagrant"
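Since the host folder has to exist before it can be shared, the host-side preparation could be scripted. The following is a minimal Python sketch, not part of the course materials: the function name prepare_shared_folder is made up for illustration, and it assumes you want a notebooks subfolder inside the shared folder. It creates the folders and returns the line to paste into the Vagrantfile, using the expanded absolute path rather than ~.

```python
from pathlib import Path


def prepare_shared_folder(host_path, subfolders=("notebooks",)):
    """Create the host folder to be shared into the VM, plus any subfolders,
    and return the matching synced_folder line for the Vagrantfile.

    prepare_shared_folder is a hypothetical helper, not a vagrant feature.
    """
    # Expand a leading ~ to the current user's home directory
    root = Path(host_path).expanduser()
    # Make sure the shared folder itself exists, even with no subfolders
    root.mkdir(parents=True, exist_ok=True)
    for name in subfolders:
        (root / name).mkdir(parents=True, exist_ok=True)
    # as_posix() keeps forward slashes in the generated line on every platform
    return f'config.vm.synced_folder "{root.as_posix()}", "/vagrant"'
```

Calling prepare_shared_folder("~/Dropbox/TM351VMshare") on each machine should leave the same folder layout in place before vagrant is started.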
Note the possibility of slight confusion – the shared folder will not now be the folder from which vagrant is run (unless you run vagrant from the shared folder itself). Furthermore, the only things that need to be in the folder from which vagrant is run are the Vagrantfile and the hidden .vagrant folder that vagrant creates.
Fingers crossed this recipe works…;-)
When we put together the virtual machine for TM351, the data management and analysis course, we built a headless virtual machine that did not contain a graphical desktop, but instead ran a set of services that could be accessed within the machine at a machine level, and via a browser based UI at the user level.
Some applications, however, don’t expose an HTML based graphical user interface over http; instead, they require access to a native windowing system.
One way round this is to run a system that can generate an HTML based UI within the VM and then expose that via a browser. For an example, see Accessing GUI Apps Via a Browser from a Container Using Guacamole.
Another approach is to expose an X11 window connection from the VM and connect to that on the host, displaying the windows natively on host as a result. See for example the Viewing Application UIs and Launching Applications from Shortcuts section of BYOA (Bring Your Own Application) – Running Containerised Applications on the Desktop.
The problem with the X11 approach is that it requires gubbins (technical term!) on the host to make it work. (I’d love to see a version of Kitematic extended not only to support docker-compose but also pre-packaged with something that could handle X11 connections…)
So another alternative is to create a virtual machine that does expose a desktop, and run the applications on that.
Here’s how I think the different approaches look:
As an example of the desktop VM idea, I’ve put together a build script for a virtual machine containing a Linux graphical desktop that runs the V-REP robot simulator. You can find it here: ou-robotics-vrep.
The script uses one Vagrant script to build the VM and another to launch it.
Along with the simulator, I packaged a Jupyter notebook server that can be used to create Python notebooks that can connect to the simulator and control the simulated robots running within it. These notebooks could be viewed via a browser running on the virtual machine desktop, but instead I expose the notebook server so notebooks can be viewed in a browser on host.
The architecture thus looks something like this:
I’d never used Vagrant to build a Linux desktop box before, so here are a few things I learned about and observed along the way:
- ubuntu-desktop naively installs a whole range of applications as well. I wanted a minimal desktop that contained just the simulator application (though I also added in a terminal). For the minimal desktop:
apt-get install -y ubuntu-desktop --no-install-recommends
- by default, Ubuntu requires a user to login (user: vagrant; password: vagrant). I wanted to have as simple an experience as possible so wanted to log the user in automatically. This could be achieved by adding the following to
[SeatDefaults]
autologin-user=vagrant
autologin-user-timeout=0
user-session=ubuntu
greeter-session=unity-greeter
- a screensaver kept kicking in and kicking back to the login screen. I got round this by creating a desktop settings script (
#dock location
gsettings set com.canonical.Unity.Launcher launcher-position Bottom
#screensaver disable
gsettings set org.gnome.desktop.screensaver lock-enabled false
and then pointing to that from a desktop_settings.desktop file in the /home/vagrant/.config/autostart/ directory (I set execute permissions on the script and the
[Desktop Entry]
Name=Apply Gnome Settings
Exec=/opt/set-gnome-settings.sh
Hidden=false
NoDisplay=false
X-GNOME-Autostart-enabled=true
Type=Application
- because the point of the VM is largely to run the simulator, I thought I should autostart the simulator. This can be done with another .desktop file in the autostart directory:
[Desktop Entry]
Name=V-REP Simulator
Exec=/opt/V-REP_PRO_EDU_V3_4_0_Linux/vrep.sh
Type=Application
X-GNOME-Autostart-enabled=true
- the Jupyter notebook server is started as a service and reuses the installation I used for the TM351 VM;
- I thought I should also add a desktop shortcut to run the simulator, though I couldn’t find an icon to link to. Create an executable run_vrep.desktop file and place it on the desktop:
[Desktop Entry]
Name=V-REP Simulator
Comment=Run V-REP Simulator
Exec=/opt/V-REP_PRO_EDU_V3_4_0_Linux/vrep.sh
Icon=
Terminal=false
Type=Application
Here’s how it looks:
If you want to give it a try, comments on the build/install process would be much appreciated: ou-robotics-vrep.
I will also be posting a set of activities based on the RobotLab activities used in TM129, in case we start using V-REP on TM129. The activity notebooks will be posted in the repo and via the associated uncourse blog if you want to play along.
One issue I have noticed is that if I resize the VM window, V-REP crashes… I also can’t figure out how to open a V-REP scene file from script (issue) or how to connect using a VM hostname alias rather than IP address (issue).
Trying to get my thoughts in order and lay bare some of my assumptions…
Comments / sanity checking appreciated…
So this is how I currently think of the TM351 VM:
What would be nice would be a drag’n’drop tool to let me draw pictures like that, which would then generate the build scripts… (a docker compose script, or set of puppet scripts, for the architectural bits on the left, and a Vagrantfile to set up the port forwarding, for example).
For docker, I wouldn’t have thought that would be too hard – a docker compose file could describe most of that picture, right? Not sure how fiddly it would be for a more traditional VM, though, depending on how it was put together?
It’s getting to that time when we need to freeze the virtual machine build we’re going to use for the new (postponed) data course, which should hopefully go live to students in February, 2016, and I’ve been having a rethink about how to put it together.
The story so far has been documented in several blog posts and charts my learning journey from knowing nothing about virtual machines (not sure why I was given the task of putting it together?!) to knowing how little I know about Linux administration, PostgreSQL, MongoDB, Linux networking, virtual machines and virtualisation (which is to say, knowing I don’t know enough to do any of this stuff properly…;-)
The original plan was to put everything into a single VM and wire all the bits together. One of the activities needed to fire up several containers as part of a mongo replica set, and I opted to use containers to do that.
Over the last few months, I started to wonder whether we should containerise everything separately, then deploy compositions of containers. The rationale behind this approach is that it means we could make use of a single VM to host applications for several users if we get as far as cloud hosting services/applications for our students. It also means students can start, stop or “reinstall” particular applications in isolation from the other VM applications they’re running.
I think I’ve got this working in part now, though it’s still very much tied to the single user – I’m doing things with permissions that would never be allowed (and that would possibly break things..) if we were running multiple users in the same VM.
So what’s the solution? I posted the first hints in Kiteflying Around Containers – A Better Alternative to Course VMs? where I proved to myself I could fire up an IPython notebook server on top of a scientific distribution stack, and get the notebooks talking to a DBMS running in another container. (This was point and click easy, once you know what to click and what numbers to put where.)
The next step was to see if I could automate this in some way. As Kitematic is still short of a Windows client, and doesn’t (yet?) support Docker Compose, I thought I’d stick with vagrant (which I was using to build the original VM using a Puppet provisioner and puppet scripts for each app) and see if I could get it to provision a VM to run containerised apps using docker. There are still a few bits to do – most notably trying to get the original dockerised mongodb stuff working, checking the mongo link works, working out where to try to persist the DBMS data files (possibly in a shared folder on host?) in a way that doesn’t trash them each time a DBMS container is started, and probably a load of other stuff – but the initial baby steps seem promising…
In the original VM, I wanted to expose a terminal through the browser, which meant pfaffing around with tty.js and node.js. The latest Jupyter server includes the ability to launch a browser based shell client, which meant I could get rid of tty.js. However, moving the IPython notebook into a container means that the terminal presumably has scope only within that container, rather than having access to the base VM command line? For various reasons, I intend to run the IPython/Jupyter notebook server container as a privileged container, which means it can reach outside the container (I think? The reason? eg to fire up containers for the mongo replica set activity) but I’m not sure if this applies to the command line/terminal app too? Though offhand, I can’t think why we might want to provide students with access to the base VM command line?
Anyway, the local set-up looks like this…
A simple Vagrantfile, called using vagrant up or vagrant reload. I have extended vagrant using the vagrant-docker-compose plugin that supports Docker Compose (fig, as was) and lets me fire up wired-together container configurations from a single script:
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network(:forwarded_port, guest: 9000, host: 9000)
  config.vm.network(:forwarded_port, guest: 8888, host: 8351, auto_correct: true)
  config.vm.provision :docker
  config.vm.provision :docker_compose, yml: "/vagrant/docker-compose.yml", rebuild: true, run: "always"
end
The YAML file identifies the containers I want to run and the composition rules between them:
ui:
  image: dockerui/dockerui
  ports:
    - "9000:9000"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true
ipynb:
  build: ./tm351_scipystacknserver
  ports:
    - "8888:8888"
  volumes:
    - ./notebooks/:/notebooks/
  links:
    - devpostgres:postgres
  privileged: true
devpostgresdata:
  command: echo created
  image: busybox
  volumes:
    - /var/lib/postgresql/data
devpostgres:
  environment:
    - POSTGRES_PASSWORD=whatever
  image: postgres
  ports:
    - "5432:5432"
  volumes_from:
    - devpostgresdata
At the moment, Mongo is still missing and I haven’t properly worked out what to do with the PostgreSQL datastore – the idea is that students will be given a pre-populated, pre-indexed database, in part at least.
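As a rough illustration of what the devpostgres:postgres link buys us inside the notebook container: legacy docker links add a hosts entry for the alias and inject environment variables named after it (for example POSTGRES_PORT_5432_TCP_ADDR, and POSTGRES_ENV_POSTGRES_PASSWORD for variables set on the linked container). The sketch below builds a SQLAlchemy-style connection URL from those variables. The function name is made up, and the default user and database name (both postgres) are assumptions based on the stock postgres image rather than anything set in the compose file.

```python
import os
from urllib.parse import quote_plus


def postgres_url_from_link(env=None, alias="postgres", port=5432,
                           user="postgres", dbname="postgres"):
    """Build a PostgreSQL URL from legacy docker link environment variables.

    Hypothetical helper: falls back to the link alias as the hostname if the
    link variables are not present in the environment.
    """
    env = os.environ if env is None else env
    prefix = f"{alias.upper()}_PORT_{port}_TCP"
    host = env.get(f"{prefix}_ADDR", alias)
    link_port = env.get(f"{prefix}_PORT", str(port))
    # Variables set on the linked container are re-exposed as <ALIAS>_ENV_<NAME>
    password = env.get(f"{alias.upper()}_ENV_POSTGRES_PASSWORD", "")
    auth = f"{user}:{quote_plus(password)}" if password else user
    return f"postgresql://{auth}@{host}:{link_port}/{dbname}"
```

In a notebook, the returned string could then be handed to something like sqlalchemy.create_engine, so that no address or password needs to be typed into the notebook itself.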
One additional component that sort of replaces the command line/terminal app requirement from the original VM is the dockerui app. This runs in its own container with privileged access to the docker environment and provides a simple control panel over all the containers:
What else? The notebook stuff has a shared notebooks directory with host, and is built locally (from a Dockerfile in the local tm351_scipystacknserver directory) on top of the ipython/scipystack image; extensions include some additional package installations (requiring both apt-get and pip installs) and copying across and running a custom IPython notebook template configuration.
FROM ipython/scipystack
MAINTAINER OU
ADD build_tm351_stack.sh /tmp/build_tm351_stack.sh
RUN bash /tmp/build_tm351_stack.sh
ADD ipynb_style /tmp/ipynb_style
ADD ipynb_custom.sh /tmp/ipynb_custom.sh
RUN bash /tmp/ipynb_custom.sh
## Extremely basic test of install
RUN python2 -c "import psycopg2, sqlalchemy"
RUN python3 -c "import psycopg2, sqlalchemy"
# Clean up from build
RUN rm -f /tmp/build_tm351_stack.sh
RUN rm -f /tmp/ipynb_custom.sh
RUN rm -f -r /tmp/ipynb_style
VOLUME /notebooks
WORKDIR /notebooks
EXPOSE 8888
ADD notebook.sh /
RUN chmod u+x /notebook.sh
CMD ["/notebook.sh"]
If we need to extend the PostgreSQL build, that can presumably be done using a Dockerfile that pulls in the core image and then runs an additional configuration script over it?
So where am I at? No f****g idea. I thought that between the data course and the new web apps course we might be able to explore some interesting models of using virtual machines (originally) and containers (more recently) in a distance education setting, that could cope with single user home use, computer training room/lab use, cloud use, but, as ever, I have spectacularly failed to demonstrate any sort of “academic leadership” in developing these ideas within the OU, or even getting much of a conversation going in the first place. Not in my skill set, I guess!;-) Though perhaps not in the institution’s interests either. Recamp. Retrench. Lockdown. As per some of the sentiments in Reflections on the Closure of Yahoo Pipes, perhaps? Don’t Play Here.
Eighteen months or so ago, I started looking at ways in which we might use a virtual machine to bundle up a variety of interoperating software applications for a distance education course on databases and data management. (This VM would run IPython notebooks as the programming surface, PostgreSQL and MongoDB as the databases. I was also keen that OpenRefine should be made available, and as everything in the VM was being accessed via a browser, I added a browser based terminal app (tty.js) to the mix as well). The approach I started to follow was to use vagrant as a provisioner and VM manager, and puppet scripts to build the various applications. One reason for this approach is that the OU is an industrial scale educator, and (to my mind) it made sense to explore a model that would support the factory line production model we have in a way that would scale vertically as a way of maintaining VMs for a course that runs over several years as well as horizontally across other courses with other software application requirements. You can see how my thinking evolved across the following posts: posts tagged “VM” on OUseful.info.
Since then, a lot has changed. IPython notebooks have forked into the Jupyter notebook server and IPython, and Jupyter has added a browser based terminal app to the base offerings of the notebook server. (It’s not as flexible as tty.js, which allowed for multiple terminals in the same browser window, but I guess there’s nothing to stop you loading multiple terminals into separate browser tabs.) docker has also become a thing…
To recap on some of my thinking about how we might provide software to students, I was pre-occupied at various times with the following (not necessarily exhaustive) list of considerations:
- how could we manage the installation and configuration of different software applications on students’ self-managed, remote computers, running arbitrary versions of arbitrary operating systems on arbitrarily specced machines over networks with unknown and perhaps low bandwidth internet connections;
- how could we make sure those applications interoperated correctly on the students’ own machines;
- how could we make sure the students retained access to local copies of all the files they had created as part of their studies, and that those local copies would be the ones they actually worked on in the provided software applications; (so for example, IPython notebook files, and perhaps even database data directories);
- how could we manage the build of each application in the OU production context, with OU course teams requiring access to a possibly evolving version of the machine 18 months in advance of student first use date and an anticipated ‘gold master’ freeze date on elements of the software build ~9 months prior to students’ first use;
- how could we manage the maintenance of VMs within a single presentation of a 9 month long course and across several presentations of the course spanning 1 presentation a year over a 5 year period;
- how could the process support the build and configuration of the same software application for several courses (for example, an OU-standard PostgreSQL build);
- how could the same process/workflow support the development, packaging, release to students, maintenance workflow for other software applications for other courses;
- could the same process be used to manage the deployment of application sets to students on a cloud served basis, either through a managed OU cloud, or on a self-served basis, perhaps using an arbitrary cloud service provider.
All this bearing in mind that I know nothing about managing software packaging, maintenance and deployment in any sort of environment, let alone a production one…;-) And all this bearing in mind that I don’t think anybody else really cares about any of the above…;-)
Having spent a few weeks away from the VM, I’m now thinking that we would be better served by using a more piecemeal approach based around docker containers. These still require the use of something like Virtualbox, but rather than using vagrant to provision the necessary environment, we could use more of an appstore approach to starting and stopping services. So for example, today I had a quick play with Kitematic, a recent docker acquisition, and an app that doesn’t run on Windows yet but for which Windows support is slated for June, 2015 in the Kitematic roadmap on github…
So what’s involved? Install Kitematic (if Virtualbox isn’t already installed, I think it’ll grab it down for you?) and fire it up…
It starts up a dockerised virtual machine into which you can install various containers. Next up, you’re presented with an “app dashboard”, as well as the ability to search dockerhub for additional “apps”:
Find a container you want, and select it – this will download the required components and fire up the container.
The port tells you where you can find any services exposed by the container. In this case, for scipyserver, it’s an IPython notebook (HTML app) running on top of a scipy stack.
By default the service runs over https with a default password; we can go into the Settings for the container, reset the Jupyter server password, force it to use http rather than https, and save to force the container to use the new settings:
So for example…
In the Kitematic container homepage, if I click on the notebooks folder icon in the Edit Files panel, I can share the notebook folder across to my host machine:
I can also choose what directory on host to use as the shared folder:
I can also discover and fire up some other containers – a PostgreSQL database, for example, as well as a MongoDB database server:
From within my notebook, I can install additional packages and libraries and then connect to the databases. So for example, I can connect to the PostgreSQL database:
or to mongo:
Looking at the container Edit Files settings, it looks like I may also be able to share across the database datafiles – though I’m not sure how this would work if I had a default database configuration to begin with? (Working out how to pre-configure and then share database contents from containerised DBMS’ is something that’s puzzled me for a bit and something I haven’t got my head round yet).
So – how does this fit into the OU model (that doesn’t really exist yet?) for using VMs to make interoperating software collections available to students on their own machines?
First up, no Windows support at the moment, though that looks like it’s coming; secondly, the ability to mount shares with host seems to work, though I haven’t tested what happens if you shutdown and start up containers, or delete a scipyserver container and then fire up a clean replacement for example. Nor do I know (yet?!) how to manage shares and pre-seeding for the database containers. One original argument for the VM was that interoperability between the various software applications could be hardwired and tested. Kitematic doesn’t support fig/Docker compose (yet?) but it’s not too hard to look up the addresses and paste them into a notebook. I think it does mean we can’t provide hard coded notebooks with ‘guaranteed to work’ configurations (i.e. ones prewritten with service addresses and port numbers) baked in, but it’s not too hard to do this manually. In the docker container Dockerfiles, I’m not sure if we could fix the port number mappings to initial default values?
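One way of handling the “look up the addresses and paste them in” step might be to keep all the service addresses in a single cell at the top of a notebook and check they respond before going any further. The following is only a sketch: the IP address and port numbers are placeholders I’ve made up, to be replaced with whatever Kitematic reports for each container, and the function names are hypothetical.

```python
import socket

# Placeholder values (assumptions): replace with the address and port
# that Kitematic shows for each running container.
SERVICES = {
    "postgres": ("192.168.99.100", 5432),
    "mongo": ("192.168.99.100", 27017),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(services):
    """Map each service name to whether it currently accepts connections."""
    return {name: is_reachable(host, port)
            for name, (host, port) in services.items()}
```

Running report(SERVICES) in the first cell would give a quick yes/no per service, which at least separates “the container isn’t running” from “my notebook code is wrong”.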
One thing we’d originally envisioned for the VM was shipping it on a USB stick. It would be handy to be able to point Kitematic to a local dockerhub, for example, a set of prebuilt containers on a USB stick with the necessary JSON metadata file to announce what containers were available there, so that containers could be installed from the USB stick. (Kitematic currently grabs the container elements down from dockerhub and pops the layers into the VM (I assume?), so it could do the same to grab them from the USB stick?) In the longer term, I could imagine an OU branded version of Kitematic that allows containers to be installed from a USB stick or pulled down from an OU hosted dockerhub.
The buzz phrase for elements of this (I think?) is microservices or microservice architecture (“a particular way of designing software applications as suites of independently deployable services”, [ref.]) but the idea of being able to run apps anywhere (yes, really, again…!;-) seems to have been revitalised by the recent excitement around, and rapid pace of development of, docker containers.
Essentially, docker containers are isolated/independent containers that can be run in a single virtual machine. Containers can also be linked together so that they can talk to each other and yet remain isolated from other containers in the same VM. Containers can also expose services to the outside world.
In my head, this is what I think various bits and pieces of it look like…
A couple of recent announcements from docker suggest, to me at least, one direction of travel that could be interesting for delivering distance education and remote and face-to-face training:
- docker compose (fig, as was) – “with Compose, you define your application’s components – their containers, their configuration, links, volumes, and so on – in a single file, then you can spin everything up with a single command that does everything that needs to be done to get your application running.”
- docker machine – “a tool that makes it really easy to go from ‘zero to Docker’. Machine creates Docker Engines on your computer, on cloud providers, and/or in your data center, and then configures the Docker client to securely talk to them.” [Like boot2docker, but supports cloud as well?]
- Kitematic UI – “Kitematic completely automates the Docker installation and setup process and provides an intuitive graphical user interface (GUI) for running Docker containers on the Mac.” [Windows version coming soon]
I don’t think there is GUI support for configuration management provided out of docker directly, but presumably if they don’t buy up something like panamax they’ll be releasing their own version of something similar at some point soon?!
(With the data course currently in meltdown, I’m tempted to add a bit more to the confusion by suggesting we drop the monolithic VM approach and instead go for a containerised approach, which feels far more elegant to me… It seems to me that with a little bit of imagination, we could come up with a whole new way of supporting software delivery to students. eg an OU docker hub with an app container for each app we make available to students, container compositions for individual courses, a ‘starter kit’ DVD (like the old OLA CD-ROM) with a local docker hub to get folk up and running without big downloads etc etc. ..) It’s unlikely to happen of course – innovation seems to be too risky nowadays, despite the rhetoric…:-(
As well as being able to run docker containers locally or in the cloud, I also wonder about ‘plug and play’ free running containers that run on a wifi enabled Raspberry Pi that you can grab off the shelf, switch on, and immediately connect to? So for example, a couple of weeks ago Wolfram and Raspberry announced the Wolfram Language and Mathematica on Raspberry Pi, for free [Wolfram’s Raspberry Pi pages]. There are also crib sheets for how to run docker on a Raspberry Pi (the downside of this being that you need ARM based images rather than x86 ones), which could be interesting?
So pushing the thought a bit further, for the mythical submariner student who isn’t allowed to install software onto their work computer, could we give them a Raspberry Pi running their OU course software as a service they could remotely connect to?!
PS by the by, at the Cabinet Office Code Club I help run for Open Knowledge last week, we had an issue with folk not being able to run OpenRefine properly on their machines. Fortunately, I’d fired up a couple of OpenRefine containers on a cloud host so we could still run the session as planned…