Tagged: TM351

First Attempt at Running the TM351 VM as an AMI on Amazon Web Services

One of the things that’s been on my to do list for ages is trying to get a version of the TM351 virtual machine (VM) up and running on Amazon Web Services (AWS) as an Amazon Machine Instance (AMI). This would allow students who are having trouble running the VM on their own computer to access the services running in the cloud.

(Obviously, it would be preferable if we could offer such a service via OU operated servers, but I can’t do politics well enough, and don’t have the mentality to attend enough of the necessary say-the-same-thing-again-again meetings, to make that sort of thing happen.)

So… a first attempt is up on the eu-west-1 region in all its insecure glory: TM351 AMI v1. The security model is by obscurity as much as anything – there’s no model for setting separate passwords for separate students, for example, or checking back agains an OU auth layer. And I suspect everything runs as root…

(One of the things we have noticed in (brief) testing is that the Getting Started instructions don’t work inside the OU, at least if you try to limit access to your (supposed) IP address. Reminds of when we gave up trying to build the OU VM from machines on the OU network because solving proxy and blocked port issues was an irrelevant problem to have to worry about when working from the outside…)

Open Refine doesn’t seem to want to run with the other services in the free tier micro (1GB) machine instance, but at 2GB everything seems okay. (I don’t know if possible race conditions in starting services means that Open Refine could start and then block the Jupyter service’s request for resource.  I need to do an Apollo 13 style startup sequence exploration to see if all services can run in 1GB, I guess!) One thing I’ve added to the to do list is to split things out so into separate AMIs that will work on the 1GB free tier machines. I also want to check that I can provision the AMI from Vagrant, so students could then launch a local VM or an Amazon Instance that way, just by changing the vagrant provider. (Shared folders/volumes might get a bit messed up in that case, though?)

If services can run one at a time in the 1GB machines, it’d be nice to provide a simple dashboard to start and stop the services to make that easier to manage. Something that looks a bit like this, for example, exposed via an authenticated web page:

This needn’t be too complex – I had in mind a simple Python web app that could run under nginx (which currently provides a simple authentication layer for Open Refine to sit behind) and then just runs simple systemctl start, stop and restart commands on the appropriate service.

import os
os.system('systemctl restart jupyter.service')

I’m not sure how the status should be updated (based on whether a service is running or not) or what heartbeat it should update to. There may be better ways, of course, in which case please let me know via the comments:-)

I did have a quick look round for examples, but the dashboards/monitoring tools that do exist, such as pydash, are far more elaborate than what I had in mind. (If you know of a simple example to do the above, or can knock one up for me, please let me know via the comments. And the simpler the better ;-)

If we are to start exploring the use of browser accessed applications running inside user-managed VMs, this sort of simple application could be really handy… (Another approach would be to use a VM running docker, and then have a container manager running, such as portainer.)

Creating a Jupyter Bundler Extension to Download Zipped Notebook and HTML Files

In the first version of the TM351 VM, we had a simple toolbar extension that would download a zipped ipynb file, along with an HTML version of the notebook, so it could be uploaded and previewed in the OU Open Design Studio. (Yes, I know, it would have been much better to have an nbviewer handler as an ODS plugin, but the we don’t do that sort of tech innovation, apparently…)

Looking at updating the extension today for the latest version of Jupyter notebooks, I noticed the availability of custom bundler extensions that allow you to add additional tools to support notebook downloads and deployment (I’m not sure what deployment relates to?). Adding a new download option allows it to be added to the notebook Edit -> Download menu:

The extension is created as a python package:

# odszip/setup.py
from setuptools import setup

      description='Save Jupyter notebook and HTML in zip file with .nbk suffix',

# Copyright (c) The Open University, 2017
# Copyright (c) Jupyter Development Team.

# Distributed under the terms of the Modified BSD License.
# Based on: https://github.com/jupyter-incubator/dashboards_bundlers/

import os
import shutil
import tempfile

def _jupyter_bundlerextension_paths():
    '''API for notebook bundler installation on notebook 5.0+'''
    return [{
                'name': 'odszip_download',
                'label': 'ODSzip (.nbk)',
                'module_name': 'odszip.download',
                'group': 'download'

def make_download_bundle(abs_nb_path, staging_dir, tools):
	Assembles the notebook and resources it needs, returning the path to a
	zip file bundling the notebook and its requirements if there are any,
	the notebook's path otherwise.
	:param abs_nb_path: The path to the notebook
	:param staging_dir: Temporary work directory, created and removed by the caller
	# Clean up bundle dir if it exists
	shutil.rmtree(staging_dir, True)
	# Get name of notebook from filename
	notebook_basename = os.path.basename(abs_nb_path)
	notebook_name = os.path.splitext(notebook_basename)[0]
	# Add the notebook
	shutil.copy2(abs_nb_path, os.path.join(staging_dir, notebook_basename))
	# Include HTML version of file
	cmd='jupyter nbconvert --to html "{abs_nb_path}" --output-dir "{staging_dir}"'.format(abs_nb_path=abs_nb_path,staging_dir=staging_dir)

	zip_file = shutil.make_archive(staging_dir, format='zip', root_dir=staging_dir, base_dir='.')
	return zip_file

def bundle(handler, model):
	Downloads a notebook as an HTML file and zips it with the notebook
	# Based on https://github.com/jupyter-incubator/dashboards_bundlers
	abs_nb_path = os.path.join(
	notebook_basename = os.path.basename(abs_nb_path)
	notebook_name = os.path.splitext(notebook_basename)[0]
	tmp_dir = tempfile.mkdtemp()

	output_dir = os.path.join(tmp_dir, notebook_name)
	bundle_path = make_download_bundle(abs_nb_path, output_dir, handler.tools)
	handler.set_header('Content-Disposition', 'attachment; filename="%s"' % (notebook_name + '.nbk'))
	handler.set_header('Content-Type', 'application/zip')
	with open(bundle_path, 'rb') as bundle_file:


	# We read and send synchronously, so we can clean up safely after finish
	shutil.rmtree(tmp_dir, True)

We can then create the python package and install the extension, remmebering to restart the Jupyter server for the extension to take effect.

#Install the ODSzip extension package
pip3 install --upgrade --force-reinstall ./odszip

#Enable the ODSzip extension
jupyter bundlerextension enable --py odszip.download  --sys-prefix

Getting Web Services Up and Running on MicroSoft Azure Using Vagrant and the Azure CLI

As well as Getting Web Services Up and Running on Amazon Web Services (AWS) Using Vagrant and the AWS CLI, we can also use Vagrant to provision machines on other web hosts, such as the Microsoft Azure cloud paltform. In this post, I’ll describe a command line based recipe for doing just that.

To start with, you’ll need to get a Microsoft Azure account.

When you’ve done that, install the Azure command line interface (CLI). On a Mac:

curl -L https://aka.ms/InstallAzureCli | bash

For me, this installed to ~/bin/az.

With the client installed, login:

~/bin/az login

This requires a token based handshake with a Microsoft authentication website.

List the range of machine images available (if you haven’t set the path to az, use the full ~/bin/az):

az vm image list

There was only one that looked suitable to me for my purposes: Canonical:UbuntuServer:16.04-LTS:latest.

To run the provisioner, we need a Subscription ID; this will be used to set the vagrant .subscription_id parameter. These are listed on the Azure Portal.

We also need to create an Active Directory Service Principal:

az ad sp create-for-rbac

This information will be used to configure the Vagrantfile: the appId sets the vagrant .client_id, the password the .client_secret, and the tenant the .tenant_id.

You can also inspect the application in the App Registrations area of the Azure Portal.

Now let’s set up Vagrant. We’ll use the vagrant-azure plugin:

vagrant plugin install vagrant-azure --plugin-version '2.0.0.pre6'

We need to add a dummy box:

vagrant box add azure https://github.com/azure/vagrant-azure/raw/v2.0/dummy.box

Now let’s set up the Vagrantfile:

config.vm.provider :azure do |azure, override|
    #The path to your ssh keys
    override.ssh.private_key_path = '~/.ssh/id_rsa'

    #The default box we added
    override.vm.box = 'azure'
    #Set a territory

    #Provide your own group and VM name

    # Set an appropriate image (the UbuntuServer is actually the current default value)

    #Use a valid subscription ID
    azure.subscription_id = ENV['AZURE_SUBSCRIPTION_ID']

    # Using details from the Active Directory Service Principal setup
    azure.tenant_id = ENV['AZURE_TENANT_ID']
    azure.client_id = ENV['AZURE_CLIENT_ID']
    azure.client_secret = ENV['AZURE_CLIENT_SECRET']


With the Vagrantfile parameters in place, we should then be able call the Azure provider using the command:

vagrant up --provider=azure

But we’re still not quite done… If you’re running services on the VM, populated from elsewhere in the Vagranfile, you’ll need to add some security rules to make the ports accessible. I’m running services on ports 80,35180 and 35181 for example:

az vm open-port -g tm351azuretest -n tm351azuretest --port 80 --priority 130
az vm open-port -g tm351azuretest -n tm351azuretest --port 35180 --priority 140
az vm open-port -g tm351azuretest -n tm351azuretest --port 35181 --priority 150

Now we can lookup the IP address of the server:

az vm list-ip-addresses

and see if our applications are there :-)

Getting Web Services Up and Running on Amazon Web Services (AWS) Using Vagrant and the AWS CLI

From past experience of trying to get things up and running with AWS (Amazon Web Services), it can be a bit of a faff trying to work out what to set where the first time. So here’s an example of how to get a browser based application up and running on EC2 using vagrant from the command line.

(If you want to work through sorting the settings out via the AWS online management console, try Oliver Veits’ tutorial AWS Automation based on Vagrant — Part 2: Installation and Usage of the Vagrant AWS Plugin; you might also need to refer to Part 1: Getting started with AWS.)

This post in part assumes you know how to provision your own virtual machine locally using Vagrant. Here are the steps you need to take to be able to run an AWS provisioner (on a Mac or Linux machine… not sure about Windows?).

First up – sign up for AWS (get credit via the Github Education Pack)…

Pick up some credentials via AWS root Security Credentials (Access Keys (Access Key ID and Secret Access Key)):

Ensure that the key is active (Make Active).

There’s quite a bit of set up to do to configure the provisioner script. This can be done on the command line using the Amazon Command Line Interface (AWS CLI):

pip install --upgrade --user awscli

Now you need to configure the AWS CLI:

aws configure

Use the security credentials you picked up to configure the client*.

When we launch the AWS machine, vagrant needs to be able to access it via ssh using the public IP address automatically assigned to the machine. In deployment too, if we’re building specific services we want to be able to access over the web, we need to open up access to the ports those services are listening on.

By default, the machine will be locked down, so we need to open up specific ports by setting security rules. These are assigned on the basis of a security group. So lets create one of those (mine is named after the course VM I’m building…):

aws ec2 create-security-group --group-name tm351cloud --description "Security group for tm351 services"

We’re going to use this group in the .security_groups parameter in the Vagrantfile.

Now we need to create the security group rules. In my case, I want to open up ssh (port 22) to allow incoming traffic from my IP address, and ports 80, 35180 and 35181 to allow http traffic from anywhere. (The /0 suffix in the rules allows any IP format.)

MYIP=$(curl http://checkip.amazonaws.com/)
aws ec2 authorize-security-group-ingress --group-name tm351cloud --protocol tcp --port 22 --cidr ${MYIP}/0
aws ec2 authorize-security-group-ingress --group-name tm351cloud --protocol tcp --port 80 --cidr
aws ec2 authorize-security-group-ingress --group-name tm351cloud --protocol tcp --port 35180 --cidr
aws ec2 authorize-security-group-ingress --group-name tm351cloud --protocol tcp --port 35181 --cidr

# Check the policies
aws ec2 describe-security-groups --group-names tm351cloud

Having opened up at least the ssh port 22, we need to set up some SSH keys with a particular name (vagrantaws) that we will use with the vagrant .keypair_name parameter, and save them to a local file (vagrantaws.pem) with the appropriate permissions.

aws ec2 create-key-pair --key-name vagrantaws --query 'KeyMaterial' --output text > vagrantaws.pem
chmod 400 vagrantaws.pem

The vagrant provisioner also requires specific access tokens (.access_key_id, .secret_access_key, .session_token) to access EC2. Create these tokens, entering your own duration (in seconds):

aws sts get-session-token --duration-seconds 129600

Now we can start to look at the Vagrant set up. Install the vagrant AWS provisioner:

vagrant plugin install vagrant-aws

After setting up the Vagrantfile, you will be able to provision your machine on AWS using:

vagrant up --provider=aws

Add a dummy box:

vagrant box add awsdummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box

Now let’s look at the Vagrantfile:

#Set up the provider block
config.vm.provider :aws do |aws, override|

    #Use the ec2 security group set previously
    aws.security_groups = ["tm351cloud"]

    #Whatever name we want
    override.vm.hostname = "tm351aws"

    #The name of the dummy box we added
    override.vm.box = "awsdummy"
    #Set up machine access using our keypair name and ssh key path
    override.ssh.username = "ubuntu"
    override.ssh.private_key_path = "vagrantaws.pem"

    #Use the values generated by the session token generator
    aws.access_key_id = "YOUR_KEY_ID"
    aws.secret_access_key = "YOUR_SECRET_ACCESS_KEY"
    aws.session_token = "YOUR_SESSION_TOKEN"

    #Specify a region and valid ami for that region, along with the desired instance size
    aws.region = "eu-west-1"
    aws.ami = "ami-971238f1" 


Running vagrant up --provider=aws should run the Vagrant provisioner with the AWS provider. Running vagrant destroy will tear down the machine (so you don’t keep paying for it… I think the users, security groups and keypairs are free?)

To check on the IP address of your instance, run:

aws ec2 describe-instances

or check on the AWS EC2 console. You can also check the machine is ripped down correctly when you have finished with it from there.

(I need to check what happens if you vagrant suspend and then vagrant resume. Presumably, the state is preserved, but you are billed for storage, if not running time?)

*Alternatively, we could create a specific user with more limited credentials.

Create a user we can use to help set up the credentials to use with the vagrant provisioner:

aws iam create-user --user-name vagrant

Now we need to give that user permissions to build our EC2 instance, by attaching an appropriate security policy (AmazonEC2FullAccess). In other words, the Vagrantfile will make use of the AWS vagrant user to provision the machine, so we need to give that AWS user the appropraite permissions on AWS:

aws iam attach-user-policy --user-name vagrant --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess

Generate some keys:

aws iam create-access-key --user-name vagrant

Then run aws configure with the new keys.

Course Apps in the the Cloud – Experimenting With Open Refine on Digital Ocean, Linode and AWS / Amazon EC2 Web Services

With OUr data management and analysis course coming up to its third presentation start in October, various revisions and updates are currently being made to the materials, in part based on feedback from students, in part based the module team’s reflections on how the course material is performing.

We also have an opportunity to update the virtual machine supplied to students, so I’ve spent the last couple of days poking around in the various script rewrites I’ve toyed with over the last couple of years. When we started the course, Jupyter notebooks were still called IPython notebooks, and the ecosystem was still in its infancy. But whilst the module review process means changes are supposed to be kept to a minimum, there is still an opportunity to bake a few more tools into the VM that didn’t exist a couple of years ago when the VM was first gold mastered. (I’ll do a review of some of the Jupyter notebook features that I think should be released into the VM in another post.)

When the VM was first put together, I took it as an opportunity to explore automated build processes. The VM itself was built from Puppet scripts orchestrated from Vagrant, with another Vagrant script managing the machine we delivered to students (setting up shared folders, handling port forwarding, and giving the internal services a kick if required). I also explored a dockerised version, but Docker too was still in its infancy when we first looked at how to best virtualise the services and apps distributed as part of the course materials (IPython/Jupyter notebooks, PostgreSQL, MongoDB and OpenRefine). With Docker now having native versions for recent Macs and Windows platforms, I thought it might be worth exploring again; but OUr student computing policy means we have to build to lowest common denominator machines that are years old (though I’m ignoring the 32 bit hardware platform constraint and we’ll post an online workaround – or ship a Raspberry Pi version of the VM – if we have to!).

So… to demo where I’m at in terms of process, and keep a note to myself, the build has forsaken Puppet and I’ve gone back to simple shell scripts. As an example of most of the tricks I’ve had to invoke, I’ll post recipes for getting OpenRefine up and running on several virtual hosts in several different ways. Still to do is a dockerised version and and RPi version of the TM351 VM config, but I’m hoping the shell scripts will all be reusable (and if not, I’ll try to tweak them so they work as is as part of whatever build process is required…

To begin with, the builder shell scripts are as follows (.sh files all end up requiring execute permissions granted somehow…).

Structure is:


The main build script calls a script to add in base packages, and scripts for each application (in their own folder). I really should have had the same invocation filename or filename pattern (e.g. reusing the directory name) in each build folder.

## ./quickbuild/quick_build.sh
#chmod ugo+x on this file

#!/usr/bin/env bash
#Set the base build directory to the one containing this script
THISDIR=$(dirname "$0")

chmod ugo+x $THISDIR/basepackages.sh
chmod ugo+x $THISDIR/openrefine/openrefine.sh

#Build script for building machine


#tidy up
apt-get autoremove -y && apt-get clean && updatedb

The base packages script does some updating of package lists and then pulls in a range of essential utility packages, some of which are actually required for builds further down the line.

## ./quickbuild/basepackages.sh

#!/usr/bin/env bash

#Build script for building machine
apt-get clean && apt-get -y update && apt-get -y upgrade && apt-get install -y bash-completion vim curl zip unzip bzip2 && apt-get install -y build-essential gcc && apt-get install -y g++ gfortran && apt-get install -y libatlas-base-dev libfreetype6-dev libpng-dev libhdf5-serial-dev && apt-get install -y git python3 python3-dev python3-pip && pip3 install --upgrade pip

The application build files install additional packages specific to the application or its build process. We had some issues with service starts in the original VM (Ubuntu 14.04 LTS), but the service management in Ubuntu 16.04 LTS is much cleaner – and in my own testing so far, much more reliable.

# ./quickbuild/openrefine/openrefine.sh

THISDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

apt-get -y update && apt-get install -y wget ant unzip openjdk-8-jre-headless && apt-get clean -y

echo "Setting up OpenRefine: "

#Prep for download
mkdir -p /opt
mkdir -p /root

if [ ! -f /opt/openrefine.done ]; then
	echo "Downloading OpenRefine..."
	wget -q --no-check-certificate  -P /root https://github.com/OpenRefine/OpenRefine/releases/download/2.7-rc.2/openrefine-linux-2.7-rc.2.tar.gz
	echo "...downloaded OpenRefine"

	echo "Unpacking OpenRefine..."
	tar -xzf /root/openrefine-linux-2.7-rc.2.tar.gz -C /opt  && rm /root/openrefine-linux-2.7-rc.2.tar.gz
	#Unpacks to: /opt/openrefine-2.7-rc.2
	touch /opt/openrefine.done
	echo "...unpacked OpenRefine"
	echo "...already downloaded and unpacked OpenRefine"

cp $THISDIR/services/refine.service /lib/systemd/system/refine.service

# Enable autostart
sudo systemctl enable refine.service

# Refresh service config
sudo systemctl daemon-reload

#(Re)start service
sudo systemctl restart refine.service

Applications are run as services, where possible. If I get a chance – and space/resource requirements allow – I made add some service monitoring to try to ensure application services are always running when the VM is running.

## ./quickbuild/openrefine/services/refine.service

#When to bring the service up
#via https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
#Wait for a network stack to appear
#If we actually need the network to have a routable IP address:

ExecStart=/opt/openrefine-2.7-rc.2/refine -p 3334 -d /vagrant/openrefine_projects


Everything can be packaged up in a zip file with a command (tuned to omit Mac cruft, in part) of the form:

zip -r quickbuild.zip quickbuild -x *.vagrant* -x *.DS_Store -x *.git* -x *.ipynb_checkpoints*

So those are the files and the basic outline. Our initial plan is to run the VMs once again locally on a student’s own machine, using Virtualbox. I think we’ll stick with vagrant to manage this, not least because we can issue updates via new Vagrantfiles, not that we’ve done that to date…

By the by, I’m running vagrant with a handful of plugins:

#Speed up repeated builds
vagrant plugin install vagrant-cachier

#Use correct Virtualbox Guest Additions
vagrant plugin install vagrant-vbguest

#Help with provisioning to virtual hosts
vagrant plugin install vagrant-digitalocean
vagrant plugin install vagrant-linode
vagrant plugin install vagrant-aws

The following Vagrantfile builds the local Virtualbox instance by default. To build to DOgital Ocean or Linode, use the following:

  • vagrant up --provider=digital_ocean
  • vagrant up --provider=linode

I didn’t get the AWS vagrant provisioner to work (too many things to go wrong in terms of settings!)

The Linode build also required a hack to get the box to build correctly…

# ./quickbuild/Vagrantfile

#Vagrantfile for building machine from build scripts

Vagrant.configure("2") do |config|

#------------------------- PROVIDER: VIRTUALBOX (BUILD) ------------------------------

  config.vm.provider :virtualbox do |virtualbox|

      #ubuntu/xenial bug? https://bugs.launchpad.net/cloud-images/+bug/1569237
      config.vm.box = "bento/ubuntu-16.04"
      #Stick with the default key

      #For local testing:
      #config.vm.box = "tm351basebuild"
      #override.vm.box_url = "eg URL on dropbox"
      #config.vm.box_url = "../boxes/test.box"

      config.vm.hostname = "tm351base"

      virtualbox.name = "tm351basebuildbuild"
      #We need the memory to install scipy and build indexes on seeded mongodb
      #After the build it can be reduced back down to 1024
      virtualbox.memory = 2048
      #virtualbox.cpus = 1
      # virtualbox.gui = true

      #---- START PORT FORWARDING ----
      #Registered ports: https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
      config.vm.network :forwarded_port, guest: 3334, host: 35101, auto_correct: true

      #---- END PORT FORWARDING ----

#------------------------- END PROVIDER: VIRTUALBOX (BUILD) ------------------------------

#------------------------- PROVIDER: DIGITAL OCEAN ------------------------------

config.vm.provider :digital_ocean do |provider, override|
        override.ssh.private_key_path = '~/.ssh/id_rsa'
        override.vm.box = 'digital_ocean'
        override.vm.box_url = "https://github.com/devopsgroup-io/vagrant-digitalocean/raw/master/box/digital_ocean.box"
        provider.token = 'YOUR_TOKEN'
        provider.image = 'ubuntu-16-04-x64'
        provider.region = 'lon1'
        provider.size = '2gb'


#------------------------- END PROVIDER: DIGITAL OCEAN ------------------------------

#------------------------- PROVIDER: LINODE ------------------------------

config.vm.provider :linode do |provider, override|
    override.ssh.private_key_path = '~/.ssh/id_rsa'
    override.vm.box = 'linode/ubuntu1604'

    provider.api_key = 'YOUR KEY'
    provider.distribution = 'Ubuntu 16.04 LTS'
    provider.datacenter = 'london'
    provider.plan = 'Linode 2048'

    #grub needs updating - but want's to do it interactively
    #this bit of voodoo from Stack Overflow hacks a non-interactive install of it
    override.vm.provision :shell, :inline => <<-SH
    	apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y -o DPkg::options::="--force-confdef" -o DPkg::options::="--force-confold"  install grub-pc


#------------------------- END PROVIDER: LINODE ------------------------------

#------------------------- PROVIDER: AWS ------------------------------


  config.vm.provider :aws do |aws, override|
  	config.vm.hostname = "tm351aws"
  	#vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box
    override.vm.box = "dummy"
    aws.access_key_id = ""
    aws.secret_access_key = ""

    #Download and install the Amazon Command Line Interface
    #Configure the command line interface
    #$aws configure
    #Request the session token
    #$aws sts get-session-token --duration-seconds 129600 (enter your own duration)
    aws.session_token = ""

    #Keypair also generated via AWS console?
    aws.keypair_name = "vagrantAWSkeypair"

    aws.region = "eu-west-2a"
    aws.ami = "ami-ed908589"

    override.ssh.username = "ubuntu"
    override.ssh.private_key_path =  '~/.ssh/id_rsa'



#------------------------- END PROVIDER: AWS ------------------------------


  config.vm.provision :shell, :inline => <<-SH
  	#Add build scripts here
  	cd /vagrant/build
  	source ./quick_build.sh


(The vagrant script can be tidied to hide keys by setting eg export DIGITAL_OCEAN_TOKEN="YOUR TOKEN HERE" from the command line you call vagrant from, and in the Vagrantfile setting provider.token = ENV['DIGITAL_OCEAN_TOKEN']).)

One of the nice things about the current version of vagrant is that you have to destroy a machine before launching another one of the same name with a different provisioners (though this looks set to change in forthcoming versions of vagrant). Why nice? Because the vagrant destroy command kills the node the machine is running on – so it won’t be left running and you won’t forget to turn it off (and won’t keep the meter running….)

Firing up the boxes on various hosts, go to port 3334 at the appropriate IP address and you should see OpenRefine running there…

Having failed to get the machine up and running on AWS, I though I’d try the simple route of packaging an AMI using Packer.

The build script was remarkably simple – once I got one that worked!


  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "eu-west-1",
    "source_ami": "ami-971238f1",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "openrefine",
    "security_group_id": "OPTIONAL_YOUR_VAGRANT_GROUP"

  "provisioners": [

      "destination": "/tmp/",
      "source": "./toupload/",
      "type": "file"
      "inline": [
        "cd /tmp && sudo apt-get update && sudo apt-get install unzip && sudo unzip /tmp/quickbuild.zip -d /tmp && sudo chmod ugo+x /tmp/quickbuild/quick_build.sh && sudo /tmp/quickbuild/quick_build.sh "
      "type": "shell"


(The eu-west-2 (London) region wasn’t recognised by Packer for some reason…)

The machine can now be built on AWS and packaged as an AMI using Packer as follows (top level security tokens can be generated from the AWS Security Credentials console):

#Package the build files
mkdir -p toupload && zip -r toupload/quickbuild.zip quickbuild -x *.vagrant* -x *.DS_Store -x *.git* -x *.ipynb_checkpoints*

#Pack the machine
packer build -var 'aws_access_key=YOUR_KEY' -var 'aws_secret_key=YOUR_SECRET' awspacker.json

Launching an instance of this AMI, I found that I couldn’t connect to the OpenRefine port (it just hung). The fix was to amend the automatically created security group rules (which by default just allow ssh on port 22) with a a Custom TCP rule that allowed incoming traffic on port 3334 from All Domains.

Which meant success:

To simplify matters, I then copied this edited security group to my own “openrefine” security group that I could use as the basis of the AMI packaging.

Just one thing to note about creating an AMI – Amazon will start billing you for it… As the Packer Getting Started guide suggests:

After running the above example, your AWS account now has an AMI associated with it. AMIs are stored in S3 by Amazon, so unless you want to be charged about $0.01 per month, you’ll probably want to remove it. Remove the AMI by first deregistering it on the AWS AMI management page. Next, delete the associated snapshot on the AWS snapshot management page.

Next up, I need to try a full build of the TM351 VM on AWS (a full build without the Mongo shard activity (which I couldn’t get to work yesterday – though this looks like it could provide a handy helper script (and I maybe also need to work through this.) The fuller build seems fine from the vagrant script in Virtualbox, Digital Ocean and Linode.

After that (and fixing the Mongo sharding thing), I’ll see if I can weave the build scripts into a set of interconnected Docker containers, one Dockerfile per application and a docker-compose.yml to weave them together. (See the original test from way back when.)

And then there’ll just be the look-see to see whether we can get the machine built and running on a Raspberry Pi 3 model B.

I also started wondering about whether I should pop a simple Flask app into the VM on port 80, showing an OU splash screen and a “Welcome to TM351” message… If I can get that running, then we have a means of piping stuff into a web page on the students’ own machines that is completely out of the controlling hands of LTS:-)

PS for an example of how to set up authentication over these services, see: Simple Authenticated Access to VM Services Using NGINX and Vagrant Port Forwarding.

Running Docker Container Compositions in the Cloud on Digital Ocean

With TM351 about to start, I thought I’d revisit the docker container approach to delivering the services required by the course to see if I could get something akin to the course software running in the cloud.

Copies of the dockerfiles used to create the images can be found on Github, with prebuilt images available on dockerhub (https://hub.docker.com/r/psychemedia/ou-tm351-*).

A couple of handy ones out of the can are:

That said, at the current time, the images are not intended for use as part of the course

The following docker-compose.yml file will create a set of linked containers that resemble (ish!) the monolithic VM we distributed to students as a Virtualbox box.

    container_name: tm351-dockerui
    image: dockerui/dockerui
        - "35183:9000"
        - /var/run/docker.sock:/var/run/docker.sock
    privileged: true

    container_name: tm351-devmongodata
    command: echo mongodb_created
    #Share same layers as the container we want to link to?
    image: mongo:3.0.7
        - /data/db

    container_name: tm351-mongodb
    image: mongo:3.0.7
        - "27017:27017"
        - devmongodata
    command: --smallfiles

    container_name: tm351-mongodb-seed
    image: psychemedia/ou-tm351-mongodb-simple-seed
        - mongodb

    container_name: tm351-devpostgresdata
    command: echo created
    image: busybox
        - /var/lib/postgresql/data
    container_name: tm351-postgres
    image: psychemedia/ou-tm351-postgres
        - "5432:5432"

    container_name: tm351-openrefine
    image: psychemedia/tm351-openrefine
        - "35181:3333"
    privileged: true
    container_name: tm351-notebook
    #build: ./tm351_scipystacknserver
    image: psychemedia/ou-tm351-pystack
        - "35180:8888"
        - postgres:postgres
        - mongodb:mongodb
        - openrefine:openrefine
    privileged: true

Place a copy of the docker-compose.yml YAML file somewhere, from Kitematic, open the command line, cd into the directory containing the YAML file, and enter the command docker-compose up -d – the images are on dockerhub and should be downloaded automatically.

Refer back to Kitematic and you should see running containers – the settings panel for the notebooks container shows the address you can find the notebook server at.


The notebooks and OpenRefine containers should also be linked to shared folders in the directory you ran the Docker Compose script from.

Running the Containers in the Cloud – Docker-Machine and Digital Ocean

As well as running the linked containers on my own machine, my real intention was to see how easy it would be to get them running in the cloud and using just the browser on my own computer to access them.

And it turns out to be really easy. The following example uses cloud host Digital Ocean.

To start with, you’ll need a Digital Ocean account with some credit in it and a Digital Ocean API token:


(You may be able to get some Digital Ocean credit for free as part of the Github Education Student Developer Pack.)

Then it’s just a case of a few command line instructions to get things running using Docker Machine:

docker-machine ls
#kitematic usess: default

#Create a droplet on Digital Ocean
docker-machine create -d digitalocean --digitalocean-access-token YOUR_ACCESS_TOKEN --digitalocean-region lon1 --digitalocean-size 4gb ou-tm351-test 

#Check the IP address of the machine
docker-machine ip ou-tm351-test

#Display information about the machine
docker-machine env ou-tm351-test
#This returns necessary config details
#For example:
##export DOCKER_TLS_VERIFY="1"
##export DOCKER_HOST="tcp://IP_ADDRESS:2376"
##export DOCKER_CERT_PATH="/Users/YOUR-USER/.docker/machine/machines/ou-tm351-test"
##export DOCKER_MACHINE_NAME="ou-tm351-test"
# Run this command to configure your shell: 
# eval $(docker-machine env ou-tm351-test)

#Set the environment variables as recommended
export DOCKER_HOST="tcp://IP_ADDRESS:2376"
export DOCKER_CERT_PATH="/Users/YOUR-USER/.docker/machine/machines/ou-tm351-test"

#Run command to set current docker-machine
eval "$(docker-machine env ou-tm351-test)"

#If the docker-compose.yml file is in .
docker-compose up -d
#This will launch the linked containers on Digital Ocean

#The notebooks should now be viewable at:

#OpenRefine should now be viewable at:

#To stop the machine
docker-machine stop ou-tm351-test
#To remove the Digital Ocean droplet (so you stop paying for it...
docker-machine rm ou-tm351-test

#Reset the current docker machine to the Kitematic machine
eval "$(docker-machine env default)"

So that’s a start. Issues still arise in terms of persisting state, such as the database contents, notebook files* and OpenRefine projects: if you leave the containers running on Digital Ocean to persist the state, the meter will keep running.

(* I should probably also build a container that demos how to bake a few example notebooks into a container running the notebook server and TM351 python distribution.)

A Peek Inside the TM351 VM

So this is how I currently think of the TM351 VM:


What would be nice would be a drag’n’drop tool to let me draw pictures like that that would then generate the build scripts… (a docker compose script, or set of puppter scripts, for the architectural bits on the left, and a Vagrantfile to set up the port forwarding, for example).

For docker, I wouldn’t have thought that would be too hard – a docker compose file could describe most of that picture, right? Not sure how fiddly it would be for a more traditional VM, though, depending on how it was put together?