Exporting and Distributing Docker Images and Data Container Contents

Although it was a beautiful day today, and I should really have spent it in the garden, or tinkering with F1 data, I lost the day to the screen and keyboard pondering various ways in which we might be able to use Kitematic to support course activities.

One thing I’ve had on pause for some time is the possibility of distributing docker images to students via a USB stick, and then loading them into Kitematic. To do this we need to get tarballs of the appropriate images so we could then distribute them.

docker save psychemedia/openrefine_ou:tm351d2test | gzip -c > test_openrefine_ou.tgz
docker save psychemedia/tm351_scipystacknserver:tm351d3test | gzip -c > test_ipynb.tgz
docker save psychemedia/dockerui_patch:tm351d2test | gzip -c > test_dockerui.tgz
docker save busybox:latest | gzip -c > test_busybox.tgz
docker save mongo:latest | gzip -c > test_mongo.tgz
docker save postgres:latest | gzip -c > test_postgres.tgz

On the to do list is getting to these to with the portable Kitematic branch (I’m not sure if that branch will continue, or whether the interest is too niche?!), but in the meantime, I could load it into the Kitematic VM from the Kitematice CLI using:

docker load < test_mongo.tgz

assuming the test_mongo.tgz file is in the current working directory.

Another I need to explore is how to get the set up the data volume containers on the students’ machine.

The current virtual machine build scripts aim to seed the databases from raw data, but to set up the student machines it would seem more sensible to either rebuild a database from a backup, or just load in a copy of the seeded data volume container. (All the while we have to be mindful of providing a route for the students to recreate the original, as distributed, setup, just in case things go wrong. At the same time, we also need to start thing about backup strategies for the students so they can checkpoint their own work…)

The traditional backup and restore route for PostgreSQL seems to be something like the following:

#Use docker exec to run a postgres export
docker exec -t vagrant_devpostgres_1 pg_dumpall -Upostgres -c &gt; dump_`date +%d-%m-%Y"_"%H_%M_%S`.sql
#If it's a large file, maybe worth zipping: pg_dump dbname | gzip > filename.gz

#The restore route would presumably be something like:
cat postgres_dump.sql | docker exec -i vagrant_devpostgres_1 psql -Upostgres
#For the compressed backup: cat postgres_dump.gz | gunzip | psql -Upostgres

For mongo, things seem to be a little bit more complicated. Something like:

docker exec -t vagrant_mongo_1 mongodump

#Complementary restore command is: mongorestore

would generate a dump in the container, but then we’d have to tar it and get it out? Something like these mongodump containers may be easier? (mongo seems to have issues with mounting data containers on host, on a Mac at least?

By the by, if you need to get into a container within a Vagrant launched VM (I use vagrant with vagrant-docker-compose), the following shows how:

#If you need to get into a container:
vagrant ssh
#Then in the VM:
  docker exec -it CONTAINERNAME bash

Another way of getting to the data is to export the contents of the seeded data volume containers from the build machine. For example:

#  Export data from a data volume container that is linked to a database server

docker run --volumes-from vagrant_devpostgres_1 -v $(pwd):/backup busybox tar cvf /backup/postgresbackup.tar /var/lib/postgresql/data 

#I wonder if these should be run with --rm to dispose of the temporary container once run?

docker run --volumes-from vagrant_mongo_1 -v $(pwd):/backup busybox tar cvf /backup/mongobackup.tar /data/db

We can then take the tar file, distribute it to students, and use it to seed a data volume container.

Again, from the Kitematic command line, I can run something like the following to create a couple of data volume containers:

#Create a data volume container
docker create -v /var/lib/postgresql/data --name devpostgresdata busybox true
#Restore the contents
docker run --volumes-from devpostgresdata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/postgresbackup.tar"
#Note - the docker helpfiles don't show how to use sh -c - which appears to be required...
#Again, I wonder whether this should be run with --rm somewhere to minimise clutter?

Unfortunately, things don’t seem to run so smoothly with mongo?

#Unfortunately, when trying to run a mongo server against a data volume container
#the presence of a mongod.lock seems to break things
#We probably shouldn't do this, but if the database has settled down and completed
#  all its writes, it should be okay?!
docker run --volumes-from vagrant_mongo_1 -v $(pwd):/backup busybox tar cvf /backup/mongobackup.tar /data/db --exclude=*mongod.lock
#This generates a copy of the distributable file without the lock...

#Here's an example of the reconstitution from the distributable file for mongo
docker create -v /data/db --name devmongodata busybox true
docker run --volumes-from devmongodata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/mongobackup.tar"

(If I’m doing something wrong wrt the getting the mongo data out of the container, please let me know… I wonder as well with the cavalier way I treat the lock file whether the mongo container should be started up in repair mode?!)

If have a docker-compose.yml file in the working directory like the following:

  image: mongo
    - "27017:27017"
    - devmongodata

##We DO NOT need to declare the data volume here
#We have already created it
#Also, if we leave it in, a "docker-compose rm" command
#will destroy the data volume container...
#...which means we wouldn't persist the data in it
#    command: echo created
#    image: busybox
#    volumes: 
#        - /data/db

We can the run docker-compose up and it should fire up a mongo container and link it to the seeded data volume container, making the data contains in that data volume container available to us.

I’ve popped some test files here. Download and unzip, from the Kitematic CLI cd into the unzipped dir, create and populate the data containers as above, then run: docker-compose up

You should be presented with some application containers including OpenRefine and an OU customised IPython notebook server. You’ll need to mount the IPython notebooks folder onto the unzipped folder. The example notebook (if everything works!) should show demonstrate calls to prepopulated mongo and postgres databases.