The story so far… We’re looking at using a virtual machine (VM) preconfigured with all sorts of software and services for a distance education course. The VM runs in headless mode (no graphical desktop) and exposes the applications we want students to be able to run as services accessed through a web browser. The VM is built from a vagrant script using puppet, in order to support maintenance and as a demonstration of a potentially generic production model. It also means that we should be able to build VMs for different VM runners (Virtualbox, VMWare etc), as well as being able to generate machine images for use in cloud hosted VMs as well as getting them up and running. The activities to be run inside the VM include a demonstration of a distributed MongoDB network into which network partitions are introduced. The separate database instances run inside individual docker containers and firewall rules are used to network partition them.
The set-up looks something like this – green blocks are services running in the VM, orange blocks are containers:
The containers are started up from within the IPython notebook using a python wrapper to docker.io.
One problem with this approach is that we have two Mongo DB downloads – one in the VM and one for use in containers. This makes me think that it might make more sense to run all the applications in containers of their own. For example, the notebook server in a container of its own, the original MongoDB instance in a container of its own, PostgreSQL in a container of its own, and finally a data container or other area of the VM that can be used to persist data within the VM such that it can be accessed by one or more of the other services as and when required.
I’m not sure what the best strategy would be for persisting state, for example as used by the database services? If we mount a database’s datastore in a volume within the VM, we can destroy the container that is used to run a database service and our database datastore will be preserved. If the mounted volume is located in the VM/host share area, if the VM itself is destroyed the data will persist in a volume on the host. This is perhaps a bit scrappy because it means that we might nominally take up a large amount of space “on the host”, compared with the situation of providing students with a pre-populated database in which the data volume was mounted “inside” the VM proper (i.e. away from the share area). In such a case, it would nice if any data tables that students built were mounted into the host shared area (so the students can clearly “retain ownership” of those tables), but I suspect that DBMS don’t like putting different data tables or databases into different volumes..?
This approach to using docker is perhaps at odds with the typical way of thinking about how we might make use of it. It is very much in the style of a VM acting as an app runner for a host, and docker containers being used to run the individual apps.
One problem with this approach is that we need a control panel that:
- is always running;
- is exposed via an http/HTML service that can be accessed on the host machine;
- allows students to bring up and shut down containers/services/”apps” as required.
The container approach is nice because if something breaks with one of the databases, for example, it should be easy for a student to switch it off and switch it back on again… That said, if we are popping up and ripping down containers, we need to work out a connection manager so that eg IPython knows where to find a particular database (I think that docker starts services up on essentially arbitrary internal IP addresses?). This could possibly be done by naming docker processes sensibly and then creating a little python library to look up the docker processes by name so that we know where to find them?
As ever, there are tradeoffs in terms of making this easy approach easy for us (as production engineers), making them easy for students, and making them easy for the helpdesk to support…
I don’t know enough about any of this stuff to know whether it makes sense rebuilding the VM we currently have, in which services roaming around the base VM, with a completely containerised version. The core requirement from the user perspective is that a student should be able to download a base box, fire it up as easily as possible in Virtualbox, and then access a control panel (ideally in a browser) that allows them to start up and shut down applications-in-containers, as well as seeing a clear dashboard view of what services are up and running and what (localhost) ports they can be accessed on through their browser.
If you can help talk me through what the issues are or might be, and whether any of the above makes sense or is complete nonsense, I would be most grateful…
PS Mongo network partition activity example