Tagged: TM351

Creating a Jupyter Bundler Extension to Download Zipped Notebook and HTML Files

In the first version of the TM351 VM, we had a simple toolbar extension that would download a zipped ipynb file, along with an HTML version of the notebook, so it could be uploaded and previewed in the OU Open Design Studio. (Yes, I know, it would have been much better to have an nbviewer handler as an ODS plugin, but we don't do that sort of tech innovation, apparently…)

Looking at updating the extension today for the latest version of Jupyter notebooks, I noticed the availability of custom bundler extensions that allow you to add additional tools to support notebook downloads and deployment (I'm not sure what deployment relates to?). Registering a new download bundler adds an entry to the notebook's File -> Download as menu:

The extension is created as a Python package:

# odszip/setup.py
from setuptools import setup

setup(name='odszip',
      version='0.0.1',
      description='Save Jupyter notebook and HTML in zip file with .nbk suffix',
      author='',
      author_email='',
      license='MIT',
      packages=['odszip'],
      zip_safe=False)
#odszip/odszip/download.py

# Copyright (c) The Open University, 2017
# Copyright (c) Jupyter Development Team.

# Distributed under the terms of the Modified BSD License.
# Based on: https://github.com/jupyter-incubator/dashboards_bundlers/

import os
import shutil
import tempfile

#THIS IS A REQUIRED FUNCTION
def _jupyter_bundlerextension_paths():
    '''API for notebook bundler installation on notebook 5.0+'''
    return [{
                'name': 'odszip_download',
                'label': 'ODSzip (.nbk)',
                'module_name': 'odszip.download',
                'group': 'download'
            }]


def make_download_bundle(abs_nb_path, staging_dir, tools):
    '''
    Assembles the notebook and resources it needs, returning the path to a
    zip file bundling the notebook and its requirements if there are any,
    the notebook's path otherwise.
    :param abs_nb_path: The path to the notebook
    :param staging_dir: Temporary work directory, created and removed by the caller
    '''

    # Clean up bundle dir if it exists
    shutil.rmtree(staging_dir, True)
    os.makedirs(staging_dir)

    # Get name of notebook from filename
    notebook_basename = os.path.basename(abs_nb_path)
    notebook_name = os.path.splitext(notebook_basename)[0]

    # Add the notebook
    shutil.copy2(abs_nb_path, os.path.join(staging_dir, notebook_basename))

    # Include HTML version of file
    cmd = 'jupyter nbconvert --to html "{abs_nb_path}" --output-dir "{staging_dir}"'.format(abs_nb_path=abs_nb_path, staging_dir=staging_dir)
    os.system(cmd)

    zip_file = shutil.make_archive(staging_dir, format='zip', root_dir=staging_dir, base_dir='.')
    return zip_file

#THIS IS A REQUIRED FUNCTION
def bundle(handler, model):
    '''
    Downloads a notebook as an HTML file and zips it with the notebook
    '''

    # Based on https://github.com/jupyter-incubator/dashboards_bundlers

    abs_nb_path = os.path.join(
        handler.settings['contents_manager'].root_dir,
        model['path']
    )

    notebook_basename = os.path.basename(abs_nb_path)
    notebook_name = os.path.splitext(notebook_basename)[0]

    tmp_dir = tempfile.mkdtemp()

    output_dir = os.path.join(tmp_dir, notebook_name)
    bundle_path = make_download_bundle(abs_nb_path, output_dir, handler.tools)

    handler.set_header('Content-Disposition', 'attachment; filename="%s"' % (notebook_name + '.nbk'))

    handler.set_header('Content-Type', 'application/zip')

    with open(bundle_path, 'rb') as bundle_file:
        handler.write(bundle_file.read())

    handler.finish()

    # We read and send synchronously, so we can clean up safely after finish
    shutil.rmtree(tmp_dir, True)

We can then create the Python package and install the extension, remembering to restart the Jupyter server for the extension to take effect.

#Install the ODSzip extension package
pip3 install --upgrade --force-reinstall ./odszip

#Enable the ODSzip extension
jupyter bundlerextension enable --py odszip.download  --sys-prefix
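
If you need to check which bundlers are registered, or remove the extension again, the bundlerextension command also supports (if I recall correctly) list and disable subcommands:

#List the enabled bundler extensions
jupyter bundlerextension list

#Disable the ODSzip extension
jupyter bundlerextension disable --py odszip.download --sys-prefix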

Getting Web Services Up and Running on Microsoft Azure Using Vagrant and the Azure CLI

As well as Getting Web Services Up and Running on Amazon Web Services (AWS) Using Vagrant and the AWS CLI, we can also use Vagrant to provision machines on other web hosts, such as the Microsoft Azure cloud platform. In this post, I'll describe a command line based recipe for doing just that.

To start with, you’ll need to get a Microsoft Azure account.

When you’ve done that, install the Azure command line interface (CLI). On a Mac:

curl -L https://aka.ms/InstallAzureCli | bash

For me, this installed to ~/bin/az.

With the client installed, login:

~/bin/az login

This requires a token based handshake with a Microsoft authentication website.

List the range of machine images available (if you haven’t set the path to az, use the full ~/bin/az):

az vm image list

There was only one that looked suitable to me for my purposes: Canonical:UbuntuServer:16.04-LTS:latest.
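
By default, the CLI only lists a small, curated set of images; if I recall correctly, you can also search the full marketplace (which can take a while) with something like:

az vm image list --all --publisher Canonical --output table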

To run the provisioner, we need a Subscription ID; this will be used to set the vagrant .subscription_id parameter. These are listed on the Azure Portal.

We also need to create an Active Directory Service Principal:

az ad sp create-for-rbac

This information will be used to configure the Vagrantfile: the appId sets the vagrant .client_id, the password the .client_secret, and the tenant the .tenant_id.
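
For reference, the create-for-rbac command returns a blob of JSON along the following lines (the values here are just placeholders):

{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "azure-cli-...",
  "name": "http://azure-cli-...",
  "password": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}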

You can also inspect the application in the App Registrations area of the Azure Portal.

Now let’s set up Vagrant. We’ll use the vagrant-azure plugin:

vagrant plugin install vagrant-azure --plugin-version '2.0.0.pre6'

We need to add a dummy box:

vagrant box add azure https://github.com/azure/vagrant-azure/raw/v2.0/dummy.box

Now let’s set up the Vagrantfile:

config.vm.provider :azure do |azure, override|
    #The path to your ssh keys
    override.ssh.private_key_path = '~/.ssh/id_rsa'

    #The default box we added
    override.vm.box = 'azure'
    
    #Set a territory
    azure.location="uksouth"

    #Provide your own group and VM name
    azure.resource_group_name="tm351azuretest"
    azure.vm_name="tm351azurevmtest"

    # Set an appropriate image (the UbuntuServer is actually the current default value)
    azure.vm_image_urn="Canonical:UbuntuServer:16.04-LTS:latest"

    #Use a valid subscription ID
    #https://portal.azure.com/#blade/HubsExtension/MyAccessBlade/resourceId/
    azure.subscription_id = ENV['AZURE_SUBSCRIPTION_ID']

    # Using details from the Active Directory Service Principal setup
    azure.tenant_id = ENV['AZURE_TENANT_ID']
    azure.client_id = ENV['AZURE_CLIENT_ID']
    azure.client_secret = ENV['AZURE_CLIENT_SECRET']

end
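
The Vagrantfile picks the credentials up from environment variables, so these need setting in the shell you call vagrant from (the values below are placeholders):

export AZURE_SUBSCRIPTION_ID="YOUR_SUBSCRIPTION_ID"
export AZURE_TENANT_ID="YOUR_TENANT_ID"
export AZURE_CLIENT_ID="YOUR_APP_ID"
export AZURE_CLIENT_SECRET="YOUR_PASSWORD"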

With the Vagrantfile parameters in place, we should then be able to call the Azure provider using the command:

vagrant up --provider=azure

But we're still not quite done… If you're running services on the VM, populated from elsewhere in the Vagrantfile, you'll need to add some security rules to make the ports accessible. I'm running services on ports 80, 35180 and 35181, for example:

az vm open-port -g tm351azuretest -n tm351azurevmtest --port 80 --priority 130
az vm open-port -g tm351azuretest -n tm351azurevmtest --port 35180 --priority 140
az vm open-port -g tm351azuretest -n tm351azurevmtest --port 35181 --priority 150

Now we can look up the IP address of the server:

az vm list-ip-addresses

and see if our applications are there :-)

Getting Web Services Up and Running on Amazon Web Services (AWS) Using Vagrant and the AWS CLI

From past experience of trying to get things up and running with AWS (Amazon Web Services), it can be a bit of a faff trying to work out what to set where the first time. So here’s an example of how to get a browser based application up and running on EC2 using vagrant from the command line.

(If you want to work through sorting the settings out via the AWS online management console, try Oliver Veits’ tutorial AWS Automation based on Vagrant — Part 2: Installation and Usage of the Vagrant AWS Plugin; you might also need to refer to Part 1: Getting started with AWS.)

This post in part assumes you know how to provision your own virtual machine locally using Vagrant. Here are the steps you need to take to be able to run an AWS provisioner (on a Mac or Linux machine… not sure about Windows?).

First up – sign up for AWS (get credit via the Github Education Pack)…

Pick up some credentials via the AWS root Security Credentials page (Access Keys: an Access Key ID and Secret Access Key):

Ensure that the key is active (Make Active).

There’s quite a bit of set up to do to configure the provisioner script. This can be done on the command line using the Amazon Command Line Interface (AWS CLI):

pip install --upgrade --user awscli

Now you need to configure the AWS CLI:

aws configure

Use the security credentials you picked up to configure the client*.
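
For reference, aws configure prompts for the Access Key ID, Secret Access Key, default region and output format, and stashes the credentials in a ~/.aws/credentials file along these lines:

#~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY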

When we launch the AWS machine, vagrant needs to be able to access it via ssh using the public IP address automatically assigned to the machine. In deployment too, if we’re building specific services we want to be able to access over the web, we need to open up access to the ports those services are listening on.

By default, the machine will be locked down, so we need to open up specific ports by setting security rules. These are assigned on the basis of a security group. So let's create one of those (mine is named after the course VM I'm building…):

aws ec2 create-security-group --group-name tm351cloud --description "Security group for tm351 services"

We’re going to use this group in the .security_groups parameter in the Vagrantfile.

Now we need to create the security group rules. In my case, I want to open up ssh (port 22) to allow incoming traffic from my IP address, and ports 80, 35180 and 35181 to allow http traffic from anywhere. (The /32 suffix restricts a rule to a single IP address; the 0.0.0.0/0 range allows traffic from any address.)

MYIP=$(curl http://checkip.amazonaws.com/)
aws ec2 authorize-security-group-ingress --group-name tm351cloud --protocol tcp --port 22 --cidr ${MYIP}/32
aws ec2 authorize-security-group-ingress --group-name tm351cloud --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name tm351cloud --protocol tcp --port 35180 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name tm351cloud --protocol tcp --port 35181 --cidr 0.0.0.0/0

# Check the policies
aws ec2 describe-security-groups --group-names tm351cloud

Having opened up at least the ssh port 22, we need to set up some SSH keys with a particular name (vagrantaws) that we will use with the vagrant .keypair_name parameter, and save them to a local file (vagrantaws.pem) with the appropriate permissions.

aws ec2 create-key-pair --key-name vagrantaws --query 'KeyMaterial' --output text > vagrantaws.pem
chmod 400 vagrantaws.pem

The vagrant provisioner also requires specific access tokens (.access_key_id, .secret_access_key, .session_token) to access EC2. Create these tokens, entering your own duration (in seconds):

aws sts get-session-token --duration-seconds 129600
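
This should return a JSON response along the following lines (values elided); the AccessKeyId, SecretAccessKey and SessionToken values are the ones the Vagrantfile needs:

{
    "Credentials": {
        "AccessKeyId": "...",
        "SecretAccessKey": "...",
        "SessionToken": "...",
        "Expiration": "..."
    }
}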

Now we can start to look at the Vagrant set up. Install the vagrant AWS provisioner:

vagrant plugin install vagrant-aws

After setting up the Vagrantfile, you will be able to provision your machine on AWS using:

vagrant up --provider=aws

Add a dummy box:

vagrant box add awsdummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box

Now let’s look at the Vagrantfile:

#Set up the provider block
config.vm.provider :aws do |aws, override|

    #Use the ec2 security group set previously
    aws.security_groups = ["tm351cloud"]

    #Whatever name we want
    override.vm.hostname = "tm351aws"

    #The name of the dummy box we added
    override.vm.box = "awsdummy"
    
    #Set up machine access using our keypair name and ssh key path
    override.ssh.username = "ubuntu"
    aws.keypair_name="vagrantaws"
    override.ssh.private_key_path = "vagrantaws.pem"

    #Use the values generated by the session token generator
    aws.access_key_id = "YOUR_KEY_ID"
    aws.secret_access_key = "YOUR_SECRET_ACCESS_KEY"
    aws.session_token = "YOUR_SESSION_TOKEN"

    #Specify a region and valid ami for that region, along with the desired instance size
    aws.region = "eu-west-1"
    aws.ami = "ami-971238f1" 
    aws.instance_type="t2.small"

  end

Running vagrant up --provider=aws should run the Vagrant provisioner with the AWS provider. Running vagrant destroy will tear down the machine (so you don’t keep paying for it… I think the users, security groups and keypairs are free?)

To check on the IP address of your instance, run:

aws ec2 describe-instances

or check on the AWS EC2 console. You can also check from there that the machine has been torn down correctly when you have finished with it.

(I need to check what happens if you vagrant suspend and then vagrant resume. Presumably, the state is preserved, but you are billed for storage, if not running time?)


*Alternatively, we could create a specific user with more limited credentials.

Create a user we can use to help set up the credentials to use with the vagrant provisioner:

aws iam create-user --user-name vagrant

Now we need to give that user permissions to build our EC2 instance, by attaching an appropriate security policy (AmazonEC2FullAccess). In other words, the Vagrantfile will make use of the AWS vagrant user to provision the machine, so we need to give that AWS user the appropriate permissions on AWS:

aws iam attach-user-policy --user-name vagrant --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess

Generate some keys:

aws iam create-access-key --user-name vagrant

Then run aws configure with the new keys.


Course Apps in the Cloud – Experimenting With Open Refine on Digital Ocean, Linode and AWS / Amazon EC2 Web Services

With OUr data management and analysis course coming up to its third presentation start in October, various revisions and updates are currently being made to the materials, in part based on feedback from students, in part based on the module team's reflections on how the course material is performing.

We also have an opportunity to update the virtual machine supplied to students, so I’ve spent the last couple of days poking around in the various script rewrites I’ve toyed with over the last couple of years. When we started the course, Jupyter notebooks were still called IPython notebooks, and the ecosystem was still in its infancy. But whilst the module review process means changes are supposed to be kept to a minimum, there is still an opportunity to bake a few more tools into the VM that didn’t exist a couple of years ago when the VM was first gold mastered. (I’ll do a review of some of the Jupyter notebook features that I think should be released into the VM in another post.)

When the VM was first put together, I took it as an opportunity to explore automated build processes. The VM itself was built from Puppet scripts orchestrated from Vagrant, with another Vagrant script managing the machine we delivered to students (setting up shared folders, handling port forwarding, and giving the internal services a kick if required). I also explored a dockerised version, but Docker too was still in its infancy when we first looked at how to best virtualise the services and apps distributed as part of the course materials (IPython/Jupyter notebooks, PostgreSQL, MongoDB and OpenRefine). With Docker now having native versions for recent Macs and Windows platforms, I thought it might be worth exploring again; but OUr student computing policy means we have to build to lowest common denominator machines that are years old (though I’m ignoring the 32 bit hardware platform constraint and we’ll post an online workaround – or ship a Raspberry Pi version of the VM – if we have to!).

So… to demo where I'm at in terms of process, and keep a note to myself, the build has forsaken Puppet and I've gone back to simple shell scripts. As an example of most of the tricks I've had to invoke, I'll post recipes for getting OpenRefine up and running on several virtual hosts in several different ways. Still to do is a dockerised version and an RPi version of the TM351 VM config, but I'm hoping the shell scripts will all be reusable (and if not, I'll try to tweak them so they work as is as part of whatever build process is required…).

To begin with, the builder shell scripts are as follows (.sh files all end up requiring execute permissions granted somehow…).

Structure is:

./quickbuild/quick_build.sh
./quickbuild/basepackages.sh
./quickbuild/openrefine/openrefine.sh
./quickbuild/openrefine/services/refine.service

The main build script calls a script to add in base packages, and scripts for each application (in their own folder). I really should have had the same invocation filename or filename pattern (e.g. reusing the directory name) in each build folder.

## ./quickbuild/quick_build.sh
#chmod ugo+x on this file

#!/usr/bin/env bash
#Set the base build directory to the one containing this script
THISDIR=$(dirname "$0")

chmod ugo+x $THISDIR/basepackages.sh
chmod ugo+x $THISDIR/openrefine/openrefine.sh

#Build script for building machine
$THISDIR/basepackages.sh

$THISDIR/openrefine/openrefine.sh

#tidy up
apt-get autoremove -y && apt-get clean && updatedb

The base packages script does some updating of package lists and then pulls in a range of essential utility packages, some of which are actually required for builds further down the line.

## ./quickbuild/basepackages.sh

#!/usr/bin/env bash

#Build script for building machine
apt-get clean && apt-get -y update && apt-get -y upgrade && \
    apt-get install -y bash-completion vim curl zip unzip bzip2 && \
    apt-get install -y build-essential gcc && \
    apt-get install -y g++ gfortran && \
    apt-get install -y libatlas-base-dev libfreetype6-dev libpng-dev libhdf5-serial-dev && \
    apt-get install -y git python3 python3-dev python3-pip && \
    pip3 install --upgrade pip

The application build files install additional packages specific to the application or its build process. We had some issues with service starts in the original VM (Ubuntu 14.04 LTS), but the service management in Ubuntu 16.04 LTS is much cleaner – and in my own testing so far, much more reliable.

# ./quickbuild/openrefine/openrefine.sh
#!/bin/bash

THISDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

apt-get -y update && apt-get install -y wget ant unzip openjdk-8-jre-headless && apt-get clean -y

echo "Setting up OpenRefine: "

#Prep for download
mkdir -p /opt
mkdir -p /root

if [ ! -f /opt/openrefine.done ]; then
	echo "Downloading OpenRefine..."
	wget -q --no-check-certificate  -P /root https://github.com/OpenRefine/OpenRefine/releases/download/2.7-rc.2/openrefine-linux-2.7-rc.2.tar.gz
	echo "...downloaded OpenRefine"

	echo "Unpacking OpenRefine..."
	tar -xzf /root/openrefine-linux-2.7-rc.2.tar.gz -C /opt  && rm /root/openrefine-linux-2.7-rc.2.tar.gz
	#Unpacks to: /opt/openrefine-2.7-rc.2
	touch /opt/openrefine.done
	echo "...unpacked OpenRefine"
else
	echo "...already downloaded and unpacked OpenRefine"
fi

cp $THISDIR/services/refine.service /lib/systemd/system/refine.service

# Enable autostart
sudo systemctl enable refine.service

# Refresh service config
sudo systemctl daemon-reload

#(Re)start service
sudo systemctl restart refine.service

Applications are run as services, where possible. If I get a chance – and space/resource requirements allow – I may add some service monitoring to try to ensure application services are always running when the VM is running.

## ./quickbuild/openrefine/services/refine.service
[Unit]
Description=Refine

#When to bring the service up
#via https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
#Wait for a network stack to appear
After=network.target
#If we actually need the network to have a routable IP address:
#After=network-online.target 

[Service]
Environment=REFINE_HOST=0.0.0.0
ExecStart=/opt/openrefine-2.7-rc.2/refine -p 3334 -d /vagrant/openrefine_projects
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
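
Once the box has been built, something like the following should confirm the service has come up and show its recent log output:

#Check the service status
sudo systemctl status refine.service

#Review the service logs
sudo journalctl -u refine.service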

Everything can be packaged up in a zip file with a command (tuned to omit Mac cruft, in part) of the form:

zip -r quickbuild.zip quickbuild -x *.vagrant* -x *.DS_Store -x *.git* -x *.ipynb_checkpoints*

So those are the files and the basic outline. Our initial plan is to run the VMs once again locally on a student’s own machine, using Virtualbox. I think we’ll stick with vagrant to manage this, not least because we can issue updates via new Vagrantfiles, not that we’ve done that to date…

By the by, I’m running vagrant with a handful of plugins:

#Speed up repeated builds
vagrant plugin install vagrant-cachier

#Use correct Virtualbox Guest Additions
vagrant plugin install vagrant-vbguest

#Help with provisioning to virtual hosts
vagrant plugin install vagrant-digitalocean
vagrant plugin install vagrant-linode
vagrant plugin install vagrant-aws

The following Vagrantfile builds the local Virtualbox instance by default. To build to Digital Ocean or Linode, use the following:

  • vagrant up --provider=digital_ocean
  • vagrant up --provider=linode

I didn’t get the AWS vagrant provisioner to work (too many things to go wrong in terms of settings!)

The Linode build also required a hack to get the box to build correctly…

# ./quickbuild/Vagrantfile

#Vagrantfile for building machine from build scripts

Vagrant.configure("2") do |config|

#------------------------- PROVIDER: VIRTUALBOX (BUILD) ------------------------------

  config.vm.provider :virtualbox do |virtualbox|

      #ubuntu/xenial bug? https://bugs.launchpad.net/cloud-images/+bug/1569237
      config.vm.box = "bento/ubuntu-16.04"
      #Stick with the default key
      config.ssh.insert_key=false

      #For local testing:
      #config.vm.box = "tm351basebuild"
      #override.vm.box_url = "eg URL on dropbox"
      #config.vm.box_url = "../boxes/test.box"

      config.vm.hostname = "tm351base"

      virtualbox.name = "tm351basebuildbuild"
      #We need the memory to install scipy and build indexes on seeded mongodb
      #After the build it can be reduced back down to 1024
      virtualbox.memory = 2048
      #virtualbox.cpus = 1
      # virtualbox.gui = true

      #---- START PORT FORWARDING ----
      #Registered ports: https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
      #openrefine
      config.vm.network :forwarded_port, guest: 3334, host: 35101, auto_correct: true

      #---- END PORT FORWARDING ----
    end

#------------------------- END PROVIDER: VIRTUALBOX (BUILD) ------------------------------

#------------------------- PROVIDER: DIGITAL OCEAN ------------------------------

config.vm.provider :digital_ocean do |provider, override|
    override.ssh.insert_key=true
    override.ssh.private_key_path = '~/.ssh/id_rsa'
    override.vm.box = 'digital_ocean'
    override.vm.box_url = "https://github.com/devopsgroup-io/vagrant-digitalocean/raw/master/box/digital_ocean.box"
    provider.token = 'YOUR_TOKEN'
    provider.image = 'ubuntu-16-04-x64'
    provider.region = 'lon1'
    provider.size = '2gb'

  end

#------------------------- END PROVIDER: DIGITAL OCEAN ------------------------------

#------------------------- PROVIDER: LINODE ------------------------------

config.vm.provider :linode do |provider, override|
    override.ssh.insert_key=true
    override.ssh.private_key_path = '~/.ssh/id_rsa'
    override.vm.box = 'linode/ubuntu1604'

    provider.api_key = 'YOUR KEY'
    provider.distribution = 'Ubuntu 16.04 LTS'
    provider.datacenter = 'london'
    provider.plan = 'Linode 2048'
    provider.size=2048

    #grub needs updating - but wants to do it interactively
    #this bit of voodoo from Stack Overflow hacks a non-interactive install of it
    override.vm.provision :shell, :inline => <<-SH
    	apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y -o DPkg::options::="--force-confdef" -o DPkg::options::="--force-confold"  install grub-pc
	SH

  end

#------------------------- END PROVIDER: LINODE ------------------------------

#------------------------- PROVIDER: AWS ------------------------------

  #  I DIDN'T GET THIS TO WORK - MAYBE SEVERAL THINGS WRONG HERE - AND IN AWS SETTINGS ????

  config.vm.provider :aws do |aws, override|
  	config.vm.hostname = "tm351aws"
  	#vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box
    override.vm.box = "dummy"
    aws.access_key_id = ""
    aws.secret_access_key = ""

    #https://github.com/mitchellh/vagrant-aws/issues/405#issuecomment-130342371
    #Download and install the Amazon Command Line Interface
    #http://docs.aws.amazon.com/cli/latest/userguide/installing.html
    #Configure the command line interface
    #http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
    #$aws configure
    #Request the session token
    #$aws sts get-session-token --duration-seconds 129600 (enter your own duration)
    aws.session_token = ""

    #Keypair also generated via AWS console?
    aws.keypair_name = "vagrantAWSkeypair"

    aws.region = "eu-west-2a"
    aws.ami = "ami-ed908589"
    aws.instance_type="t2.small"

    override.ssh.username = "ubuntu"
    override.ssh.private_key_path =  '~/.ssh/id_rsa'

  end

  # NOTE THAT RUNNING THIS PROVISIONER MAY LEAVE THINGS BILL INCURRING ON AWS... SO CHECK

#------------------------- END PROVIDER: AWS ------------------------------

#------------------------------

  config.vm.provision :shell, :inline => <<-SH
  	#Add build scripts here
  	cd /vagrant/build
  	source ./quick_build.sh
  SH

end

(The vagrant script can be tidied to hide keys by setting e.g. export DIGITAL_OCEAN_TOKEN="YOUR TOKEN HERE" from the command line you call vagrant from, and in the Vagrantfile setting provider.token = ENV['DIGITAL_OCEAN_TOKEN'].)
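
So, for example (LINODE_API_KEY is just a variable name I'm making up here):

#On the command line, before calling vagrant
export DIGITAL_OCEAN_TOKEN="YOUR TOKEN HERE"
export LINODE_API_KEY="YOUR KEY HERE"

#...and then in the corresponding provider blocks of the Vagrantfile
provider.token = ENV['DIGITAL_OCEAN_TOKEN']
provider.api_key = ENV['LINODE_API_KEY']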

One of the nice things about the current version of vagrant is that you have to destroy a machine before launching another one of the same name with a different provider (though this looks set to change in forthcoming versions of vagrant). Why nice? Because the vagrant destroy command kills the node the machine is running on – so it won't be left running and you won't forget to turn it off (and won't keep the meter running…).

Firing up the boxes on various hosts, go to port 3334 at the appropriate IP address and you should see OpenRefine running there…

Having failed to get the machine up and running on AWS, I thought I'd try the simple route of packaging an AMI using Packer.

The build script was remarkably simple – once I got one that worked!

#awspacker.json

{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "eu-west-1",
    "source_ami": "ami-971238f1",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "openrefine",
    "security_group_id": "OPTIONAL_YOUR_VAGRANT_GROUP"
  }],

  "provisioners": [

    {
      "destination": "/tmp/",
      "source": "./toupload/",
      "type": "file"
    },
    {
      "inline": [
        "cd /tmp && sudo apt-get update && sudo apt-get install unzip && sudo unzip /tmp/quickbuild.zip -d /tmp && sudo chmod ugo+x /tmp/quickbuild/quick_build.sh && sudo /tmp/quickbuild/quick_build.sh "
      ],
      "type": "shell"
    }
  ]

}

(The eu-west-2 (London) region wasn’t recognised by Packer for some reason…)

The machine can now be built on AWS and packaged as an AMI using Packer as follows (top level security tokens can be generated from the AWS Security Credentials console):

#Package the build files
mkdir -p toupload && zip -r toupload/quickbuild.zip quickbuild -x *.vagrant* -x *.DS_Store -x *.git* -x *.ipynb_checkpoints*

#Pack the machine
packer build -var 'aws_access_key=YOUR_KEY' -var 'aws_secret_key=YOUR_SECRET' awspacker.json
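
It can also be worth sanity checking the template before kicking off a build; if I recall correctly, Packer has a validate command for just that:

#Check the template parses correctly
packer validate awspacker.json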

Launching an instance of this AMI, I found that I couldn't connect to the OpenRefine port (it just hung). The fix was to amend the automatically created security group rules (which by default just allow ssh on port 22) with a Custom TCP rule that allowed incoming traffic on port 3334 from All Domains.

Which meant success:

To simplify matters, I then copied this edited security group to my own “openrefine” security group that I could use as the basis of the AMI packaging.

Just one thing to note about creating an AMI – Amazon will start billing you for it… As the Packer Getting Started guide suggests:

After running the above example, your AWS account now has an AMI associated with it. AMIs are stored in S3 by Amazon, so unless you want to be charged about $0.01 per month, you’ll probably want to remove it. Remove the AMI by first deregistering it on the AWS AMI management page. Next, delete the associated snapshot on the AWS snapshot management page.
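
The tidy up can presumably also be done from the AWS CLI with something along the lines of the following (the image and snapshot IDs are placeholders):

#Deregister the AMI
aws ec2 deregister-image --image-id ami-XXXXXXXX

#Delete the associated snapshot
aws ec2 delete-snapshot --snapshot-id snap-XXXXXXXX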

Next up, I need to try a full build of the TM351 VM on AWS (a full build without the Mongo shard activity, which I couldn't get to work yesterday – though this looks like it could provide a handy helper script, and I maybe also need to work through this). The fuller build seems fine from the vagrant script in Virtualbox, Digital Ocean and Linode.

After that (and fixing the Mongo sharding thing), I’ll see if I can weave the build scripts into a set of interconnected Docker containers, one Dockerfile per application and a docker-compose.yml to weave them together. (See the original test from way back when.)

And then there’ll just be the look-see to see whether we can get the machine built and running on a Raspberry Pi 3 model B.

I also started wondering about whether I should pop a simple Flask app into the VM on port 80, showing an OU splash screen and a “Welcome to TM351” message… If I can get that running, then we have a means of piping stuff into a web page on the students’ own machines that is completely out of the controlling hands of LTS:-)

PS for an example of how to set up authentication over these services, see: Simple Authenticated Access to VM Services Using NGINX and Vagrant Port Forwarding.

Running Docker Container Compositions in the Cloud on Digital Ocean

With TM351 about to start, I thought I’d revisit the docker container approach to delivering the services required by the course to see if I could get something akin to the course software running in the cloud.

Copies of the dockerfiles used to create the images can be found on Github, with prebuilt images available on dockerhub (https://hub.docker.com/r/psychemedia/ou-tm351-*).

A couple of handy ones out of the can are:

That said, at the current time, the images are not intended for use as part of the course.

The following docker-compose.yml file will create a set of linked containers that resemble (ish!) the monolithic VM we distributed to students as a Virtualbox box.

dockerui:
    container_name: tm351-dockerui
    image: dockerui/dockerui
    ports:
        - "35183:9000"
    volumes:
        - /var/run/docker.sock:/var/run/docker.sock
    privileged: true

devmongodata:
    container_name: tm351-devmongodata
    command: echo mongodb_created
    #Share same layers as the container we want to link to?
    image: mongo:3.0.7
    volumes: 
        - /data/db

mongodb:
    container_name: tm351-mongodb
    image: mongo:3.0.7
    ports:
        - "27017:27017"
    volumes_from:
        - devmongodata
    command: --smallfiles

mongodb-seed:
    container_name: tm351-mongodb-seed
    image: psychemedia/ou-tm351-mongodb-simple-seed
    links:
        - mongodb

devpostgresdata:
    container_name: tm351-devpostgresdata
    command: echo created
    image: busybox
    volumes: 
        - /var/lib/postgresql/data
 
postgres:
    container_name: tm351-postgres
    environment:
        - POSTGRES_PASSWORD=PGPass
    image: psychemedia/ou-tm351-postgres
    ports:
        - "5432:5432"

openrefine:
    container_name: tm351-openrefine
    image: psychemedia/tm351-openrefine
    ports:
        - "35181:3333"
    privileged: true
    
notebook:
    container_name: tm351-notebook
    #build: ./tm351_scipystacknserver
    image: psychemedia/ou-tm351-pystack
    ports:
        - "35180:8888"
    links:
        - postgres:postgres
        - mongodb:mongodb
        - openrefine:openrefine
    privileged: true

Place a copy of the docker-compose.yml file somewhere; from Kitematic, open the command line, cd into the directory containing the YAML file, and enter the command docker-compose up -d – the images are on dockerhub and should be downloaded automatically.

Refer back to Kitematic and you should see running containers – the settings panel for the notebooks container shows the address you can find the notebook server at.

[Kitematic screenshot]

The notebooks and OpenRefine containers should also be linked to shared folders in the directory you ran the Docker Compose script from.
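
If you prefer the command line to Kitematic, you should also be able to check on the containers with:

#List the containers defined in the docker-compose.yml file
docker-compose ps

#Peek at the logs for a particular container, e.g. the notebook server
docker-compose logs notebook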

Running the Containers in the Cloud – Docker-Machine and Digital Ocean

As well as running the linked containers on my own machine, my real intention was to see how easy it would be to get them running in the cloud and using just the browser on my own computer to access them.

And it turns out to be really easy. The following example uses cloud host Digital Ocean.

To start with, you’ll need a Digital Ocean account with some credit in it and a Digital Ocean API token:

[Digital Ocean control panel screenshot]

(You may be able to get some Digital Ocean credit for free as part of the Github Education Student Developer Pack.)

Then it’s just a case of a few command line instructions to get things running using Docker Machine:

docker-machine ls
#kitematic uses: default

#Create a droplet on Digital Ocean
docker-machine create -d digitalocean --digitalocean-access-token YOUR_ACCESS_TOKEN --digitalocean-region lon1 --digitalocean-size 4gb ou-tm351-test 

#Check the IP address of the machine
docker-machine ip ou-tm351-test

#Display information about the machine
docker-machine env ou-tm351-test
#This returns necessary config details
#For example:
##export DOCKER_TLS_VERIFY="1"
##export DOCKER_HOST="tcp://IP_ADDRESS:2376"
##export DOCKER_CERT_PATH="/Users/YOUR-USER/.docker/machine/machines/ou-tm351-test"
##export DOCKER_MACHINE_NAME="ou-tm351-test"
# Run this command to configure your shell: 
# eval $(docker-machine env ou-tm351-test)

#Set the environment variables as recommended
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://IP_ADDRESS:2376"
export DOCKER_CERT_PATH="/Users/YOUR-USER/.docker/machine/machines/ou-tm351-test"

#Run command to set current docker-machine
eval "$(docker-machine env ou-tm351-test)"

#If the docker-compose.yml file is in .
docker-compose up -d
#This will launch the linked containers on Digital Ocean

#The notebooks should now be viewable at:
#http://IP_ADDRESS:35180

#OpenRefine should now be viewable at:
#http://IP_ADDRESS:35181

#To stop the machine
docker-machine stop ou-tm351-test
#To remove the Digital Ocean droplet (so you stop paying for it...)
docker-machine rm ou-tm351-test

#Reset the current docker machine to the Kitematic machine
eval "$(docker-machine env default)"

So that’s a start. Issues still arise in terms of persisting state, such as the database contents, notebook files* and OpenRefine projects: if you leave the containers running on Digital Ocean to persist the state, the meter will keep running.

(* I should probably also build a container that demos how to bake a few example notebooks into a container running the notebook server and TM351 python distribution.)

A Peek Inside the TM351 VM

So this is how I currently think of the TM351 VM:

[Diagram: TM351 VM architecture]

What would be nice would be a drag'n'drop tool to let me draw pictures like that which would then generate the build scripts… (a docker compose script, or set of puppet scripts, for the architectural bits on the left, and a Vagrantfile to set up the port forwarding, for example).

For docker, I wouldn’t have thought that would be too hard – a docker compose file could describe most of that picture, right? Not sure how fiddly it would be for a more traditional VM, though, depending on how it was put together?

Exporting and Distributing Docker Images and Data Container Contents

Although it was a beautiful day today, and I should really have spent it in the garden, or tinkering with F1 data, I lost the day to the screen and keyboard pondering various ways in which we might be able to use Kitematic to support course activities.

One thing I’ve had on pause for some time is the possibility of distributing docker images to students via a USB stick, and then loading them into Kitematic. To do this we need to get tarballs of the appropriate images so we could then distribute them.

docker save psychemedia/openrefine_ou:tm351d2test | gzip -c > test_openrefine_ou.tgz
docker save psychemedia/tm351_scipystacknserver:tm351d3test | gzip -c > test_ipynb.tgz
docker save psychemedia/dockerui_patch:tm351d2test | gzip -c > test_dockerui.tgz
docker save busybox:latest | gzip -c > test_busybox.tgz
docker save mongo:latest | gzip -c > test_mongo.tgz
docker save postgres:latest | gzip -c > test_postgres.tgz

On the to do list is getting these to work with the portable Kitematic branch (I'm not sure if that branch will continue, or whether the interest is too niche?!), but in the meantime, I could load one of them into the Kitematic VM from the Kitematic CLI using:

docker load < test_mongo.tgz

assuming the test_mongo.tgz file is in the current working directory.
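
Running docker images should then show the loaded image in the local image list:

#Check the image has been loaded
docker images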

Another thing I need to explore is how to set up the data volume containers on the students' machines.

The current virtual machine build scripts aim to seed the databases from raw data, but to set up the student machines it would seem more sensible to either rebuild a database from a backup, or just load in a copy of the seeded data volume container. (All the while we have to be mindful of providing a route for the students to recreate the original, as distributed, setup, just in case things go wrong. At the same time, we also need to start thinking about backup strategies for the students so they can checkpoint their own work…)

The traditional backup and restore route for PostgreSQL seems to be something like the following:

#Use docker exec to run a postgres export
docker exec -t vagrant_devpostgres_1 pg_dumpall -Upostgres -c > dump_`date +%d-%m-%Y"_"%H_%M_%S`.sql
#If it's a large file, maybe worth zipping: pg_dump dbname | gzip > filename.gz

#The restore route would presumably be something like:
cat postgres_dump.sql | docker exec -i vagrant_devpostgres_1 psql -Upostgres
#For the compressed backup: cat postgres_dump.gz | gunzip | psql -Upostgres

For mongo, things seem to be a little bit more complicated. Something like:

docker exec -t vagrant_mongo_1 mongodump

#Complementary restore command is: mongorestore

would generate a dump in the container, but then we'd have to tar it and get it out? Something like these mongodump containers may be easier? (mongo seems to have issues with mounting data containers on host, on a Mac at least?)
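
One possible route for getting the dump out of the container might be docker cp, though I haven't tested this properly (the paths are placeholders):

#Run the dump inside the container, writing to a known directory
docker exec -t vagrant_mongo_1 mongodump --out /dump

#Copy the dump directory out of the container onto the host
docker cp vagrant_mongo_1:/dump ./mongodump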

By the by, if you need to get into a container within a Vagrant launched VM (I use vagrant with vagrant-docker-compose), the following shows how:

#If you need to get into a container:
vagrant ssh
#Then in the VM:
  docker exec -it CONTAINERNAME bash

Another way of getting to the data is to export the contents of the seeded data volume containers from the build machine. For example:

#  Export data from a data volume container that is linked to a database server

#postgres
docker run --volumes-from vagrant_devpostgres_1 -v $(pwd):/backup busybox tar cvf /backup/postgresbackup.tar /var/lib/postgresql/data 

#I wonder if these should be run with --rm to dispose of the temporary container once run?

#mongo - BUT SEE CAVEAT BELOW
docker run --volumes-from vagrant_mongo_1 -v $(pwd):/backup busybox tar cvf /backup/mongobackup.tar /data/db

We can then take the tar file, distribute it to students, and use it to seed a data volume container.

Again, from the Kitematic command line, I can run something like the following to create a couple of data volume containers:

#Create a data volume container
docker create -v /var/lib/postgresql/data --name devpostgresdata busybox true
#Restore the contents
docker run --volumes-from devpostgresdata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/postgresbackup.tar"
#Note - the docker helpfiles don't show how to use sh -c - which appears to be required...
#Again, I wonder whether this should be run with --rm somewhere to minimise clutter?

Unfortunately, things don’t seem to run so smoothly with mongo?

#Unfortunately, when trying to run a mongo server against a data volume container
#the presence of a mongod.lock seems to break things
#We probably shouldn't do this, but if the database has settled down and completed
#  all its writes, it should be okay?!
docker run --volumes-from vagrant_mongo_1 -v $(pwd):/backup busybox tar cvf /backup/mongobackup.tar /data/db --exclude=*mongod.lock
#This generates a copy of the distributable file without the lock...

#Here's an example of the reconstitution from the distributable file for mongo
docker create -v /data/db --name devmongodata busybox true
docker run --volumes-from devmongodata -v $(pwd):/backup ubuntu sh -c "tar xvf /backup/mongobackup.tar"

(If I'm doing something wrong wrt getting the mongo data out of the container, please let me know… I wonder as well, with the cavalier way I treat the lock file, whether the mongo container should be started up in repair mode?!)

If we have a docker-compose.yml file in the working directory like the following:

mongo:
  image: mongo
  ports:
    - "27017:27017"
  volumes_from:
    - devmongodata

##We DO NOT need to declare the data volume here
#We have already created it
#Also, if we leave it in, a "docker-compose rm" command
#will destroy the data volume container...
#...which means we wouldn't persist the data in it
#devmongodata:
#    command: echo created
#    image: busybox
#    volumes: 
#        - /data/db

We can then run docker-compose up and it should fire up a mongo container and link it to the seeded data volume container, making the data contained in that data volume container available to us.

I’ve popped some test files here. Download and unzip, from the Kitematic CLI cd into the unzipped dir, create and populate the data containers as above, then run: docker-compose up

You should be presented with some application containers including OpenRefine and an OU customised IPython notebook server. You'll need to mount the IPython notebooks folder onto the unzipped folder. The example notebook (if everything works!) should demonstrate calls to prepopulated mongo and postgres databases.

Hopefully!