Exporting and Distributing Docker Images and Data Container Contents

Although it was a beautiful day today, and I should really have spent it in the garden, or tinkering with F1 data, I lost the day to the screen and keyboard pondering various ways in which we might be able to use Kitematic to support course activities.

One thing I’ve had on pause for some time is the possibility of distributing docker images to students via a USB stick, and then loading them into Kitematic. To do this we need to get tarballs of the appropriate images so we could then distribute them.

docker save psychemedia/openrefine_ou:tm351d2test | gzip -c > test_openrefine_ou.tgz
docker save psychemedia/tm351_scipystacknserver:tm351d3test | gzip -c > test_ipynb.tgz
docker save psychemedia/dockerui_patch:tm351d2test | gzip -c > test_dockerui.tgz
docker save busybox:latest | gzip -c > test_busybox.tgz
docker save mongo:latest | gzip -c > test_mongo.tgz
docker save postgres:latest | gzip -c > test_postgres.tgz

On the to do list is getting these to work with the portable Kitematic branch (I’m not sure if that branch will continue, or whether the interest is too niche?!), but in the meantime, I could load one into the Kitematic VM from the Kitematic CLI using:

docker load < test_mongo.tgz

assuming the test_mongo.tgz file is in the current working directory.

Another thing I need to explore is how to set up the data volume containers on the students’ machines.

The current virtual machine build scripts aim to seed the databases from raw data, but to set up the student machines it would seem more sensible to either rebuild a database from a backup, or just load in a copy of the seeded data volume container. (All the while we have to be mindful of providing a route for the students to recreate the original, as distributed, setup, just in case things go wrong. At the same time, we also need to start thinking about backup strategies for the students so they can checkpoint their own work…)

The traditional backup and restore route for PostgreSQL seems to be something like the following:

#Use docker exec to run a postgres export
docker exec -t vagrant_devpostgres_1 pg_dumpall -Upostgres -c > dump_`date +%d-%m-%Y"_"%H_%M_%S`.sql
#If it's a large file, maybe worth zipping: pg_dump dbname | gzip > filename.gz
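As an aside, the backtick date expression above can be written with the (slightly more readable) $(…) form; here’s a quick sketch of the filename it generates (the quoted "_" in the original isn’t strictly needed, I think):

```shell
# Build the timestamped dump filename used above, with $(...) rather than backticks
fname="dump_$(date +%d-%m-%Y_%H_%M_%S).sql"
echo "$fname"
# e.g. dump_23-05-2015_14_02_33.sql
```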

#The restore route would presumably be something like:
cat postgres_dump.sql | docker exec -i vagrant_devpostgres_1 psql -Upostgres
#For the compressed backup: cat postgres_dump.gz | gunzip | psql -Upostgres

For mongo, things seem to be a little bit more complicated. Something like:

docker exec -t vagrant_mongo_1 mongodump

#Complementary restore command is: mongorestore

would generate a dump in the container, but then we’d have to tar it and get it out? Something like these mongodump containers may be easier? (mongo seems to have issues with mounting data containers on the host, on a Mac at least?)
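One way round the “tar it and get it out” step might be to stream the archive over docker exec’s stdout rather than writing it inside the container first – with docker that might look something like `docker exec vagrant_mongo_1 tar cf - /dump > mongodump.tar` (container and dump path names here are hypothetical). The stdout-streaming trick itself can be sketched locally, without docker:

```shell
# Local sketch of tar-over-stdout: archive on one side of a pipe, unarchive on the other
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/dump" && echo "bson-data" > "$src/dump/test.bson"
# tar writes the archive to stdout (-f -); the second tar reads it from stdin
tar -C "$src" -cf - dump | tar -C "$dst" -xf -
cat "$dst/dump/test.bson"
# prints: bson-data
```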

By the by, if you need to get into a container within a Vagrant launched VM (I use vagrant with vagrant-docker-compose), the following shows how:

#If you need to get into a container:
vagrant ssh
#Then in the VM:
  docker exec -it CONTAINERNAME bash

Another way of getting to the data is to export the contents of the seeded data volume containers from the build machine. For example:

#  Export data from a data volume container that is linked to a database server

#postgres
docker run --volumes-from vagrant_devpostgres_1 -v $(pwd):/backup busybox tar cvf /backup/postgresbackup.tar /var/lib/postgresql/data 

#I wonder if these should be run with --rm to dispose of the temporary container once run?

#mongo - BUT SEE CAVEAT BELOW
docker run --volumes-from vagrant_mongo_1 -v $(pwd):/backup busybox tar cvf /backup/mongobackup.tar /data/db

We can then take the tar file, distribute it to students, and use it to seed a data volume container.

Again, from the Kitematic command line, I can run something like the following to create a couple of data volume containers:

#Create a data volume container
docker create -v /var/lib/postgresql/data --name devpostgresdata busybox true
#Restore the contents
docker run --volumes-from devpostgresdata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/postgresbackup.tar"
#Note - the docker helpfiles don't show how to use sh -c - which appears to be required...
#Again, I wonder whether this should be run with --rm somewhere to minimise clutter?

Unfortunately, things don’t seem to run so smoothly with mongo?

#Unfortunately, when trying to run a mongo server against a data volume container
#the presence of a mongod.lock seems to break things
#We probably shouldn't do this, but if the database has settled down and completed
#  all its writes, it should be okay?!
docker run --volumes-from vagrant_mongo_1 -v $(pwd):/backup busybox tar cvf /backup/mongobackup.tar /data/db --exclude=*mongod.lock
#This generates a copy of the distributable file without the lock...
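A quick docker-free check that tar’s --exclude really does drop the lock file from the archive (GNU tar syntax shown; busybox tar accepted --exclude in the command above too – file names here are made up):

```shell
# Fake a mongo data directory containing a lock file
src=$(mktemp -d)
mkdir -p "$src/data/db"
touch "$src/data/db/collection.0" "$src/data/db/mongod.lock"
# Archive it, excluding the lock file, then list what actually went in
tar -C "$src" -cf "$src/mongobackup.tar" --exclude='*mongod.lock' data/db
tar -tf "$src/mongobackup.tar"
# lists data/db/ and data/db/collection.0 - but no mongod.lock
```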

#Here's an example of the reconstitution from the distributable file for mongo
docker create -v /data/db --name devmongodata busybox true
docker run --volumes-from devmongodata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/mongobackup.tar"

(If I’m doing something wrong wrt getting the mongo data out of the container, please let me know… I wonder as well, given the cavalier way I treat the lock file, whether the mongo container should be started up in repair mode?!)

If we have a docker-compose.yml file in the working directory like the following:

mongo:
  image: mongo
  ports:
    - "27017:27017"
  volumes_from:
    - devmongodata

##We DO NOT need to declare the data volume here
#We have already created it
#Also, if we leave it in, a "docker-compose rm" command
#will destroy the data volume container...
#...which means we wouldn't persist the data in it
#devmongodata:
#    command: echo created
#    image: busybox
#    volumes: 
#        - /data/db

We can then run docker-compose up and it should fire up a mongo container and link it to the seeded data volume container, making the data contained in that data volume container available to us.

I’ve popped some test files here. Download and unzip, from the Kitematic CLI cd into the unzipped dir, create and populate the data containers as above, then run: docker-compose up

You should be presented with some application containers including OpenRefine and an OU customised IPython notebook server. You’ll need to mount the IPython notebooks folder onto the unzipped folder. The example notebook (if everything works!) should demonstrate calls to prepopulated mongo and postgres databases.

Hopefully!

Fighting With docker – and Pondering Innovation in an Institutional Context

I spent my not-OU day today battling with trying to bundle up a dockerised VM, going round in circles trying to simplify things a bit, and getting confused by docker-compose not working quite so well following an upgrade.

I think there’s still some weirdness going on (eg docker-ui showing messed-up container names?) but I’m now way too confused to care or try to unpick it…

I also spent a chunk of time considering the 32 bit problem, but got nowhere with it…. Docker is predominantly a 64 bit thing, but the course has decided in its wisdom that we have to support 32 bit machines, which means I need to find a way of getting a 32 bit version of docker into a base box (apt-get install docker.io, I think?), get the vagrant docker provisioner to use it (would an alias help?), check that vagrant-docker-compose works in a 32 bit VM, and then track down 32 bit docker images for PostgreSQL, MongoDB, dockerUI and OpenRefine (or find build files for them so I can build my own 32 bit images).

We then need to be able to test the VM in a variety of regimes: 32 bit O/S on a 32 bit machine, 32 bit O/S on a 64 bit machine, 64 bit O/S on a 64 bit machine, with a variety of hardware virtualisation settings we might expect on students’ machines. I’m on a highly specced Macbook Pro, though, so my testing is skewed…

And I’m not sure I have it in me to try to put together 32 bit installs…:-( Perhaps that’s what LTS are for…?!;-)

(I keep wondering if we could get access to stats about the sorts of machines students are using to log in to the OU VLE from the user-agent strings of their browsers that can be captured in webstats? And take that two ways: 1) look to see how it’s evolving over time; 2) look to see what the profile of machines is for students in computing programmes, particularly those coming up to level 3 option study? That’s the sort of practical, useful data that could help inform course technology choices but that doesn’t have learning analytics buzzword kudos or budget attached to it though, so I suspect it’s not often championed…)

When LTS was an educational software house, I think there was also more opportunity, support and willingness to try to explore what the technology might be able to do for us and OUr students? Despite the continual round of job ads to support corporate IT, I fear that exploring the educational uses of software has not had much developer support in recent years…

An example of the sort of thing I think we could explore – if only we could find a forum to do so – is the following docker image, which contains an OU customised IPython notebook: psychemedia/ouflpy_scipystacknserver

The context is a forthcoming FutureLearn course on introductory programming. We’re currently planning on getting students to use Anaconda to run the IPython Notebooks that provide the programming environment for the course, but I idly wondered what a Kitematic route might be like. (The image is essentially the scipystack and notebook server with a few notebook extensions and OU customisations installed.)

There are some sample (testing) notebooks here that illustrate some of the features.

Here’s the installation recipe:

– download and unzip the notebooks (double click the downloaded file) and keep a note of where you unzipped the notebook directory to.

– download and install Kitematic. This makes use of docker and Virtualbox – but I think it should install them both for you if you don’t already have them installed.

– start Kitematic, search for psychemedia/ouflpy_scipystacknserver and create an application container.

Kitematic_fl_1

It should download and start up automatically.

When it’s running, click on the Notebooks panel and Enable Volumes. This allows the container to see a folder on your computer (“volumes” are a bit like folders that can be aliased or mapped on to other folders across devices).

Kitematic_fl_2

Click the cog (settings) symbol in the Notebooks panel to get to the Volumes settings. Select the directory that you created when you unzipped the downloaded notebooks bundle.

Kitematic_fl_3

Click on the Ports tab. If you click on the link that’s displayed, it should open an IPython notebook server homepage in your browser.

Kitematic_fl_4

Here’s what you should see…

Kitematic_fl_5

Click on a notebook link to open the notebook.

Kitematic_fl_6

The two demo notebooks are just simple demonstrations of some custom extensions and styling features I’ve been experimenting with. You should be able to create your own notebooks, open other people’s notebooks, etc.

You can also run the container in the cloud. Tweak the following recipe to try it out on Digital Ocean: Getting Started With Personal App Containers in the Cloud or Running RStudio on Digital Ocean, AWS etc Using Tutum and Docker Containers. (That latter example you could equally well run in Kitematic – just search for and install rocker/rstudio.)

The potential of using containers still excites me, even after 6 months or so of messing around the fringes of what’s possible. In the case of writing a new level 3 computing course with a major practical element, limiting ourselves to a 32 bit build seems a backward step to me? I fully appreciate the need to make our courses as widely accessible as possible, and in as affordable a way as possible (ahem…) but here’s why I think supporting 32 bit machines for a new level 3 computing course is a backward step.

In the first case, I think we’re making life harder for OUrselves. (Trying to offer backwards compatibility is prone to this.) Docker is built for 64 bit and most of the (reusable) images are 64 bit. If we had the resource to contribute to a 32 bit docker ecosystem, that might be good for making this sort of technology accessible more widely internationally, as well as domestically, but I don’t think there’s the resource to do that? Secondly, we arguably worsen the experience for students with newer, more powerful machines (though perhaps this could be seen as levelling the playing field a bit?). I always liked the idea of making use of progressive enhancement as a way of trying to offer students the best possible experience using the technology they have, though we’d always have to ensure we weren’t then favouring some students over others. (That said, the OU celebrates diversity across a whole range of dimensions in every course cohort…)

Admittedly, students on a computing programme may well have bought a computer to see them through their studies – if the new course is the last one they do, that might mean the machine they bought for their degree is now 6 years old. But on the other hand, students buying a new computer recently may well have opted for an affordable netbook, or even a tablet computer, neither of which can support the installation of “traditional” software applications.

The solution I’d like to explore is a hybrid offering, where we deliver software that makes use of browser based UIs and software services that communicate using standard web protocols (http, essentially). Students who can install software on their computers can run the services locally and access them through their browser. Students who can’t install the software (because they have an older spec machine, or a newer netbook/tablet spec machine, or who do their studies on a public access machine in a library, or using an IT crippled machine in their workplace (cough, optimised desktop, cOUgh..)) can access the same applications running in the cloud, or perhaps even from one or more dedicated hardware app runners (docker’s been running on a Raspberry Pi for some time I think?). Whichever you opt for, exactly the same software would be running inside the container and exposed in the same way through a browser… (Of course, this does mean you need a network connection. But if you bought a netbook, that’s the point, isn’t it?!)

There’s a cost associated with running things in the cloud, of course – someone has to pay for the hosting, storage and bandwidth. But in a context like FutureLearn, that’s an opportunity to get folk paying and then top slice them with a (profit generating…) overhead, management or configuration fee. And in the context of the OU – didn’t we just get a shed load of capital investment cash to spend on remote experimentation labs and yet another cluster?

There are also practical consequences – running apps on your own machine makes it easier to keep copies of files locally. When running in the cloud, the files have to live somewhere (unless we start exploring fast routes to filesharing – Dropbox can be a bit slow at synching large files, I think…)

Anyway – docker… 32 bit… ffs…

If you give the container a go, please let me know how you get on… I did half imagine we might be able to try this for a FutureLearn course, though I fear the timescales are way too short in OU-land to realistically explore this possibility.

Kiteflying Around Containers – A Better Alternative to Course VMs?

Eighteen months or so ago, I started looking at ways in which we might use a virtual machine to bundle up a variety of interoperating software applications for a distance education course on databases and data management. (This VM would run IPython notebooks as the programming surface, with PostgreSQL and MongoDB as the databases. I was also keen that OpenRefine should be made available, and as everything in the VM was being accessed via a browser, I added a browser based terminal app (tty.js) to the mix as well.) The approach I started to follow was to use vagrant as a provisioner and VM manager, and puppet scripts to build the various applications. One reason for this approach is that the OU is an industrial scale educator, and (to my mind) it made sense to explore a model that would support the factory line production model we have in a way that would scale vertically, as a way of maintaining VMs for a course that runs over several years, as well as horizontally across other courses with other software application requirements. You can see how my thinking evolved across the following posts: posts tagged “VM” on OUseful.info.

Since then, a lot has changed. IPython notebooks have forked into the Jupyter notebook server and IPython, and Jupyter has added a browser based terminal app to the base offerings of the notebook server. (It’s not as flexible as tty.js, which allowed for multiple terminals in the same browser window, but I guess there’s nothing to stop you loading multiple terminals into separate browser tabs.) docker has also become a thing…

To recap on some of my thinking about how we might provide software to students, I was pre-occupied at various times with the following (not necessarily exhaustive) list of considerations:

  • how could we manage the installation and configuration of different software applications on students’ self-managed, remote computers, running arbitrary versions of arbitrary operating systems on arbitrarily specced machines over networks with unknown and perhaps low bandwidth internet connections;
  • how could we make sure those applications interoperated correctly on the students’ own machines;
  • how could we make sure the students retained access to local copies of all the files they had created as part of their studies, and that those local copies would be the ones they actually worked on in the provided software applications; (so for example, IPython notebook files, and perhaps even database data directories);
  • how could we manage the build of each application in the OU production context, with OU course teams requiring access to a possibly evolving version of the machine 18 months in advance of student first use date and an anticipated ‘gold master’ freeze date on elements of the software build ~9 months prior to students’ first use;
  • how could we manage the maintenance of VMs within a single presentation of a 9 month long course and across several presentations of the course spanning 1 presentation a year over a 5 year period;
  • how could the process support the build and configuration of the same software application for several courses (for example, an OU-standard PostgreSQL build);
  • how could the same process/workflow support the development, packaging, release to students, maintenance workflow for other software applications for other courses;
  • could the same process be used to manage the deployment of application sets to students on a cloud served basis, either through a managed OU cloud, or on a self-served basis, perhaps using an arbitrary cloud service provider.

All this bearing in mind that I know nothing about managing software packaging, maintenance and deployment in any sort of environment, let alone a production one…;-) And all this bearing in mind that I don’t think anybody else really cares about any of the above…;-)

Having spent a few weeks away from the VM, I’m now thinking that we would be better served by using a more piecemeal approach based around docker containers. These still require the use of something like Virtualbox, but rather than using vagrant to provision the necessary environment, we could use more of an appstore approach to starting and stopping services. So for example, today I had a quick play with Kitematic, a recent docker acquisition, and an app that doesn’t run on Windows yet, though Windows support is slated for June 2015 in the Kitematic roadmap on github.

So what’s involved? Install Kitematic (if Virtualbox isn’t already installed, I think it’ll grab it down for you?) and fire it up…

Kitematic_1

It starts up a dockerised virtual machine into which you can install various containers. Next up, you’re presented with an “app dashboard”, as well as the ability to search dockerhub for additional “apps”:

Kitematic_2

Find a container you want, and select it – this will download the required components and fire up the container.

Kitematic_3

The port tells you where you can find any services exposed by the container. In this case, for scipyserver, it’s an IPython notebook (HTML app) running on top of a scipy stack.

Kitematic_4

By default the service runs over https with a default password; we can go into the Settings for the container, reset the Jupyter server password, force it to use http rather than https, and save to force the container to use the new settings:

Kitematic_5

So for example…

kitematic_ipynb

In the Kitematic container homepage, if I click on the notebooks folder icon in the Edit Files panel, I can share the notebook folder across to my host machine:

scipyserver_share

I can also choose what directory on host to use as the shared folder:

Kitematic_7

I can also discover and fire up some other containers – a PostgreSQL database, for example, as well as a MongoDB database server:

Kitematic_6

From within my notebook, I can install additional packages and libraries and then connect to the databases. So for example, I can connect to the PostgreSQL database:

kitematic_ipynb_postgres

or to mongo:

kitematic_ipynb_mongodb

Looking at the container Edit Files settings, it looks like I may also be able to share across the database datafiles – though I’m not sure how this would work if I had a default database configuration to begin with? (Working out how to pre-configure and then share database contents from containerised DBMSs is something that’s puzzled me for a bit and something I haven’t got my head round yet.)

So – how does this fit into the OU model (that doesn’t really exist yet?) for using VMs to make interoperating software collections available to students on their own machines?

First up, no Windows support at the moment, though that looks like it’s coming; secondly, the ability to mount shares with host seems to work, though I haven’t tested what happens if you shut down and start up containers, or delete a scipyserver container and then fire up a clean replacement, for example. Nor do I know (yet?!) how to manage shares and pre-seeding for the database containers. One original argument for the VM was that interoperability between the various software applications could be hardwired and tested. Kitematic doesn’t support fig/Docker compose (yet?) but it’s not too hard to look up the addresses and paste them into a notebook. I think it does mean we can’t provide hard coded notebooks with ‘guaranteed to work’ configurations (i.e. ones prewritten with service addresses and port numbers) baked in, but it’s not too hard to do this manually. In the docker container Dockerfiles, I’m not sure if we could fix the port number mappings to initial default values?

One thing we’d originally envisioned for the VM was shipping it on a USB stick. It would be handy to be able to point Kitematic to a local dockerhub, for example, a set of prebuilt containers on a USB stick with the necessary JSON metadata file to announce what containers were available there, so that containers could be installed from the USB stick. (Kitematic currently grabs the container elements down from dockerhub and pops the layers into the VM (I assume?), so it could do the same to grab them from the USB stick?) In the longer term, I could imagine an OU branded version of Kitematic that allows containers to be installed from a USB stick or pulled down from an OU hosted dockerhub.

But then again, I also imagined an OU USB study stick and an OU desktop software updater 9 years or so ago and they never went anywhere either..;-)