A DevOps Approach to Common Environment Educational Software Provisioning and Deployment

In Distributing Software to Students in a BYOD Environment, I briefly reviewed a paper that described a paper that reported on the use of Debian metapackages to support the configuration of Linux VMs for particular courses (each course has its own Debian metapackage that could install all the packages required for that course).

This idea of automating the build of machines comes under the wider banner of DevOps (development and operations). In a university setting, we might view this in several ways:

  • the development of course related software environments during course production, the operational distribution and deployment of software to students, updating and support of the software in use, and maintenance and updating of software between presentations of a course;
  • the development of software environments for use in research, the operation of those environments during the lifetime of a resarch project, and the archiving of those environments;
  • the development and operation of institutional IT services.

In an Educause review from 2014 (Cloud Strategy for Higher Education: Building a Common Solution, EDUCAUSE Center for Analysis and Research (ECAR) Research Bulletin, November, 2014 [link]), a pitch for universities making greater use of cloud services, the authors make the observation that:

In order to make effective use of IaaS [Infrastructure as a Service], an organization has to adopt an automate-first mindset. Instead of approaching servers as individual works of art with artisan application configurations, we must think in terms of service-level automation. From operating system through application deployment, organizations need to realize the ability to automatically instantiate services entirely from source code control repositories.

This is the approach I took from the start when thinking about the TM351 virtual machine, focussing more on trying to identify production, delivery, support and maintenance models that might make sense in a factory production model that should work in a scaleable way not only across presentations of the same course, as well as across different courses, but also across different platforms (students own devices, OU managed cloud hosts, student launched commercial hosts) rather than just building a bespoke, boutique VM for a single course. (I suspect the module team would have preferred my focussing on the latter – getting something that works reliably, has been rigorously tested, and can be delivered to students rather than pfaffing around with increasingly exotic and still-not-really-invented-yet tooling that I don’t really understand to automate production of machines from scratch that still might be a bit flaky!;-)

Anyway, it seems that the folk at Berkeley have been putting together a “Common Scientific Compute Environment for Research and Education” [Clark, D., Culich, A., Hamlin, B., & Lovett, R. (2014). BCE: Berkeley’s Common Scientific Compute Environment for Research and Education, Proceedings of the 13th Python in Science Conference (SciPy 2014).]

The BCE – Berkeley Common Environment – is “a standard reference end-user environment” consisting of a simply skinned Linux desktop running in virtual machine delivered as a Virtualbox appliance that “allows for targeted instructions that can assume all features of BCE are present. BCE also aims to be stable, reliable, and reduce complexity more than it increases it”. The development team adopted a DevOps style approach customised for the purposes of supporting end-user scientific computing, arising from the recognition that they “can’t control the base environment that users will have on their laptop or workstation, nor do we wish to! A useful environment should provide consistency and not depend on or interfere with users’ existing setup”, further “restrict[ing] ourselves to focusing on those tools that we’ve found useful to automate the steps that come before you start doing science. Three main frames of reference were identified:

  • instructional: students could come from all backgrounds and often unlikely to have sys admin skills over and above the ability to use a simple GUI approach to software installation: “The most accessible instructions will only require skills possessed by the broadest number of people. In particular, many potential students are not yet fluent with notions of package management, scripting, or even the basic idea of commandline interfaces. … [W]e wish to set up an isolated, uniform environment in its totality where instructions can provide essentially pixel-identical guides to what the student will see on their own screen.”
  • scientific collaboration: that is, the research context: “It is unreasonable to expect any researcher to develop code along with instructions on how to run that code on any potential environment.” In addition, “[i]t is common to encounter a researcher with three or more Python distributions installed on their machine, and this user will have no idea how to manage their command-line path, or which packages are installed where. … These nascent scientific coders will have at various points had a working system for a particular task, and often arrive at a state in which nothing seems to work.”
  • Central support: “The more broadly a standard environment is adopted across campus, the more familiar it will be to all students”, with obvious benefits when providing training or support based on the common environment.

Whilst it was recognised the personal laptop computers are perhaps the most widely used platform, the team argued that the “environment should not be restricted to personal computers”. Some scientific computing operations are likely to stretch the resources of a personal laptop, so the environment should also be capable of running on other platforms such as hefty workstations or on a scientific computing cloud.

The first consideration was to standardise on an O/S: Linux. Since the majority of users don’t run Linux machines, this required the use of a virtual machine (VM) to host the Linux system, whilst still recognising that “one should assume that any VM solution will not work for some individuals and provide a fallback solution (particularly for instructional environments) on a remote server”.

Another issue that can arise is dealing with mappings between host and guest OS, which vary from system to system – arguing for the utility of an abstraction layer for VM configuration like Vagrant or Packer … . This includes things like portmapping, shared files, enabling control of the display for a GUI vs. enabling network routing for remote operation. These settings may also interact with the way the guest OS is configured.

Reflecting on the “traditional” way of building a computing environment, the authors argued for a more automated approach:

Creating an image or environment is often called provisioning. The way this was done in traditional systems operation was interactively, perhaps using a hybrid of GUI, networked, and command-line tools. The DevOps philosophy encourages that we accomplish as much as possible with scripts (ideally checked into version control!).

The tools explored included Ansible, packer, vagrant and docker:

  • Ansible: to declare what gets put into the machine (alternatives include shell scripts, puppet etc. (For the TM351 monolithic VM, I used puppet.) End-users don’t need to know anything about Ansible, unless they want to develop a new, reproducible, custom environment.
  • packer: used to run the provisioners and construct and package up a base box. Again, end-users don’t need to know anything about this. (For the TM351 monolithic VM, I used vagrant to build a basebox in Virtualbox, and then package it; the power of Packer is that is lets you generate builds from a common source for a variety of platforms (AWS, Virtualbox, etc etc).)
  • vagrant: their description is quite a handy one: “a wrapper around virtualization software that automates the process of configuring and starting a VM from a special Vagrant box image … . It is an alternative to configuring the virtualization software using the GUI interface or the system-specific command line tools provided by systems like VirtualBox or Amazon. Instead, Vagrant looks for a Vagrantfile which defines the configuration, and also establishes the directory under which the vagrant command will connect to the relevant VM. This directory is, by default, synced to the guest VM, allowing the developer to edit the files with tools on their host OS. From the command-line (under this directory), the user can start, stop, or ssh into the Vagrant-managed VM. It should be noted that (again, like Packer) Vagrant does no work directly, but rather calls out to those other platform-specific command-line tools.” However, “while Vagrant is conceptually very elegant (and cool), we are not currently using it for BCE. In our evaluation, it introduced another piece of software, requiring command-line usage before students were comfortable with it”. This is one issue we are facing with the TM351 VM – current the requirement to use vagrant to manage the VM from the commandline (albeit this only really requires a couple of commands – we can probably get away with just: vagrant up && vagrant provision and vagrant suspend – but also has a couple of benefits, like being able to trivially vagrant ssh in to the VM if absolutely necessary…).
  • docker: was perceived as adding complexity, both computationally and conceptually: “Docker requires a Linux environment to host the Docker server. As such, it clearly adds additional complexity on top of the requirement to support a virtual machine. … the default method of deploying Docker (at the time of evaluation) on personal computers was with Vagrant. This approach would then also add the complexity of using Vagrant. However, recent advances with boot2docker provide something akin to a VirtualBox-only, Docker-specific replacement for Vagrant that eliminates some of this complexity, though one still needs to grapple with the cognitive load of nested virtual environments and tooling.” The recent development of Kitematic addresses some of the use-case complexity, and also provides GUI based tools for managing some of the issues described above associate with portmapping, file sharing etc. Support for linked container compositions (using Docker Compose) is still currently lacking though…

At the end of the day, Packer seems to rule them all – coping as it does with simple installation scripts and being able to then target the build for any required platform. The project homepage is here: Berkeley Common Environment and the github repo here: Berkeley Common Environment (Github).

The paper also reviewed another common environment – OSGeo. Once again built on top of a common Linux base, well documented shell scripts are used to define package installations: “[n]otably, the project uses none of the recent DevOps tools. OSGeo-Live is instead configured using simple and modular combinations of Python, Perl and shell scripts, along with clear install conventions and examples. Documentation is given high priority. … Scripts may call package managers, and generally have few constraints (apart from conventions like keeping recipes contained to a particular directory)”. In addition, “[s]ome concerns, like port number usage, have to be explicitly managed at a global level”. This approach contrasts with the approach reviewed in Distributing Software to Students in a BYOD Environment where Debian metapackages were used to create a common environment installation route.

The idea of a common environment is a good one, and that would work particularly well in a curriculum such as Computing, I think? One main difference between the BCE approach and the TM351 approach is that BCE is self-contained and runs a desktop environment within the VM, whereas the TM351 environment uses a headless VM and follows more of a microservice approach that publishes HTML based service UIs via http ports that can be viewed in a browser. One disadvantage of the latter approach is that you need to keep a more careful eye on port assignments (in the event of collisions) when running the VM locally.