Seeding Shared Folders With Files Distributed via a VM

For the first few presentations of our Data Management and Analysis course, the course VM has been distributed to students via a USB mailing. This year, I’m trying to move to a model whereby the primary distribution is via a download from VagrantCloud (students manage the VM using Vagrant), though we’re also hoping to be able to offer access to an OU OpenStack hosted VM to any student’s who really need it.

For students on Microsoft Windows computers, an installer installs Virtualbox and vagrant from installers distributed via the USB memory stick. This in part derives from the policy of fixing versions of as much as we can so that it can be tested in advance. The installer also creates a working directory for the course that will be shared by the VM, and copies required files, again from the memory stick, into the shared folder. On Macs and Linux, students have to do this setup themselves.

One of the things I have consciouslystarted trying to do is move the responsibility for satisficing of some of the installation requirements into the Vagrantfile. (I’m also starting to think they should be pushed even deeper into the VM itself.)

For example, as some of the VM services expect particular directories to exist in the shared directory, we have a couple of defensive measures in place:

  • the Vagrantfile creates any required, yet missing, subdirectories in the shared directory;
            #Make sure that any required directories are created
            config.vm.provision :shell, :inline => <<-SH
                mkdir -p /vagrant/notebooks
                mkdir -p /vagrant/openrefine_projects
                mkdir -p /vagrant/logs
                mkdir -p /vagrant/data
                mkdir -p /vagrant/utilities
                mkdir -p /vagrant/backups
                mkdir -p /vagrant/backups/postgres-backup/
                mkdir -p /vagrant/backups/mongo-backup/	
            SH
    

  • start up scripts for services that require particular directories check they exist before they are started and create them if they are missing. For example, in the service file, go defensive with something like ExecStartPre=mkdir -p /vagrant/notebooks.

The teaching material associated with the (contents of) the VM is distributed using a set of notebooks downloaded from the VLE. Part of the reason for this is that it delays the point at which the course notebooks must be frozen: the USB is mastered late July/early August for a mailing in September and course start in October.

As well as the course notebooks are a couple of informal installation test notebooks. This can be frozen along with the VM and distributed inside it, but the question then arises. So this year I am trying out a simple pattern that bakes test files into the VM and then uses the Vagranfile to copy the files into the shared directory on its first run with a particular shared folder:

config.vm.provision :shell, :inline => <<-SH
    if [ ! -f /vagrant/.firstrun_nbcopy.done ]; then
        # Trust notebooks in immediate child directories of notebook directory
        files=(`find /opt/notebooks/* -maxdepth 2 -name "*.ipynb"`)
        if [ ${#files[@]} -gt 0 ]; then
            jupyter trust /opt/notebooks/*.ipynb;
            jupyter trust /opt/notebooks/*/*.ipynb;
        fi
        #Copy notebooks into shared directory
        cp -r /opt/notebooks/. /vagrant/notebooks
        touch /vagrant/.firstrun_nbcopy.done
    fi
   SH

This pattern allows files shipped inside the VM to be copied into the shared folder once it is mounted into the VM from host. The files will then persist inside the shared directory, along with a hidden flag file to say the files have been copied. I’m not sure about the benefits of auto-running something inside the VM to manage this copying? Or whether to check that a more recent copy of the files to be copied doesn’t already exist in the shared folder before copying on the first run in the folder?