…aka “how to run OpenRefine in the cloud in just a couple of clicks and for a few pennies an hour”…
Tutum cloud is now Docker Cloud. The Docker Cloud UI is the same as the tutum UI (because it is the tutum UI), so the following instructions should still work…
I managed to get my first container up and running in the cloud today (yeah!:-), using tutum to launch a container I’d defined on Dockerhub and run it on a linked DigitalOcean server (or as they call them, “droplet”).
This sort of thing is probably a “so what?” to many devs, or even folk who do the self-hosting thing, where for example you can launch your own web applications using CPanel, setting up your own WordPress site, perhaps, or an online database.
The difference for me is that the instance of OpenRefine I got up and running in the cloud via a web browser was the result of composing several different, loosely coupled services together:
- I’d already published a container on dockerhub that launches the latest release version of OpenRefine: psychemedia/docker-openrefine. This lets me run OpenRefine in a boot2docker virtual machine running on my own desktop and access it through a browser on the same computer.
- Digital Ocean is a cloud hosting service with simple billing (I looked at Amazon AWS but it was just too complicated) that lets you launch cloud hosted virtual machines of a variety of sizes and in a variety of territories (including the UK). Billing is per hour with a monthly cap with different rates for different machine specs. To get started, you need to register an account and make a small ($5 or so) downpayment using Paypal or a credit card. So that’s all I did there – created an account and made a small payment. [Affiliate Link: sign up to Digital Ocean and get $10 credit]
- tutum an intermediary service that makes it easy to launch servers and containers running inside them. By linking a DigitalOcean account to tutum, I can launch containers on DigitalOcean in a relatively straightforward way…
Launching OpenRefine via tutum
I’m going to start by launching a 2GB machine which comes in a 3 cents an hour, capped at $20 a month.
Now we need to get a container – which I’m thinking of as if it was a personal app, or personal app server:
I’m going to make use of a public container image – here’s one I prepared earlier…
We need to do a tiny bit of configuration. Specifically, all I need to do is ensure that I make the port public so I can connect to it; by default, it will be assigned to a random port in a particular range on the publicly viewable service. I can also set the service name, but for now I’ll leave the default.
If I create and deploy the container, the image will be pulled from dockerhub and a container launched based on it that I should be able to access via a public URL:
If the URL you copy or click on starts with tcp:// change it to http://
The first time I pull the container into a specific machine it takes a little time to set up as the container files are imported into the machine. If I create another container using the same image (another OpenRefine instance, for example), it should start really quickly because all the required files have already been loaded into the node.
Unfortunately, when I go through to the corresponding URL, there’s nothing there. Looking at the logs, I think maybe there wasn’t enough memory to launch a second OpenRefine container… (I could test this by launching a second droplet/server with more memory, and then deploying a couple of containers to that one.)
The billing is calculated on DigitalOcean on a hourly rate, based on the number and size of servers running. To stop racking up charges, you can terminate the server/droplet (so you also lose the containers).
Note than in the case of OpenRefine, we could allow several users all to access the same OpenRefine container (the same URL) and just run different projects within them.
Although this is probably not the way that dev ops folk think of containers, I’m seeing them as a great way of packaging service based applications that I might one to run at a personal level, or perhaps in a teaching/training context, maybe on a self-service basis, maybe on a teacher self-service basis (fire up one application server that everyone in a cohort can log on to, or one container/application server for each of them; I noticed that I could automatically launch as many containers as I wanted – a 64GB 20 core processor costs about $1 per hour on Digital Ocean, so for an all day School of Data training session, for example, with 15-20 participants, that would be about $10, with everyone in such a class of 20 having their own OpenRefine container/server, all started with the same single click? Alternatively, we could fire up separate droplet servers, one per participant, each running its own set of containers? That might be harder to initialise though (i.e. more than one or two clicks?!) Or maybe not?)
One thing I haven’t explored yet is mounting data containers/volumes to link to application containers. This makes sense in a data teaching context because it cuts down on bandwidth. If folk are going to work on the same 1GB file, it makes sense to just load it in to the virtual machine once, then let all the containers synch from that local copy, rather than each container having to download its own copy of the file.
The advantage of the approach described in the walkthrough above over “pre-configured” self-hosting solutions is the extensibility of the range of applications available to me. If I can find – or create – a Dockerfile that will configure a container to run a particular application, I can test it on my local machine (using boot2docker, for example) and then deploy a public version in the cloud, at an affordable rate, in just a couple of steps.
Whilst templated configurations using things like fig or panamax which would support the 1-click launch of multiple linked containers configurations aren’t supported by tutum yet, I believe they are in the timeline… So I look forward to trying out a click cloud version of Using Docker to Build Linked Container Course VMs when that comes onstream:-)
In an institutional setting, I can easily imagine a local docker registry that hosts images for apps that are “approved” within the institution, or perhaps tagged as relevant to particular courses. I don’t know if it’s similarly possible to run your own panamax configuration registry, as opposed to pushing a public panamax template for example, but I could imagine that being useful institutionally too? For example, I could put container images on a dockerhub style OU teaching hub or OU research hub, and container or toolchain configurations that pull from those on a panamax style course template register, or research team/project reregister? To front this, something like tutum, though with an even easier interface to allow me to fire up machines and tear them down?
Just by the by, I think part of the capital funding the OU got recently from HEFCE was slated for a teaching related institutional “cloud”, so if that’s the case, it would be great to have a play around trying to set up a simple self-service personal app runner thing ?;-) That said, I think the pitch in that bid probably had the forthcoming TM352 Web, Mobile and Cloud course in mind (2016? 2017??), though from what I can tell I’m about as persona non grata as possible with respect to even being allowed to talk to anyone about that course!;-)