Running OpenRefine in the Clear on Digital Ocean

In a couple of earlier posts, I’ve described how to get OpenRefine up and running remotely over the web by installing the OpenRefine server onto a Digital Ocean Linux server and running it there behind a simple authenticating proxy(Running OpenRefine On Digital Ocean Using Simple Auth and the more automated Authenticated OpenRefine Server on Digital Ocean, Redux).

In this post I’ll show how to set up a simple OpenRefine server, without authentication, using Docker (I’ll show how to add in the authenicating nginx proxy in a follow on post).

Docker is a virtualisation technology that heavily draws on the idea of “containers”, isolated computational environments that provide just enough operating system to run a particular application within them.

As well as hosting raw Linux servers, Digital Ocean also provides Linux-servers-with-docker as a one-click application.

Here’s how to start a docker machine on Digital Ocean.

Creating a Digital Ocean Docker Droplet

First up, create a new droplet as a one-click app, selecting docker as the one-click application type:

To give ourselves some space to work with, I’m going to choose the 3GB server (it may work with default settings in a 2GB server, or it may ruin your day…). It’s metered by the hour, but it’ll still only cost a few pennies for a quick demo. (You can also get $100 free credit as a new user if you sign up here.)

DigitalOcean_-_Create_Droplets_

Select a data center region (I typically go for a local one):

If you want to, add your SSH key (recipe here, but it’s not really necessary: the ssh key just makes it easier for you to login to the server from your own computer if you need to. If you haven’t heard of ssh keys before, ignore this step!)

Hit the big green button to create your droplet (if you want to, give the sever a nicer hostname first…).

Accessing the Digital Ocean Droplet Server Terminal

Your one-click docker server will now start up. Once its there (it should take less than a minute) click through to its admin page. Assuming you haven’t added ssh keys, you’ll need to log in through the console. The login details for your server should have been emailed to the email address associated with your Digital Ocean account. Use them to login.

On first login, you’ll be prompted to change the password (it was emailed to you in plain text after all!)

If you choose a really simple replacement password, you may need to choose another one. Also note that the (current) UNIX password was the one you were emailed, so you’ll essentially be providing this password twice in quick succession (once for the first login, then again to authorise the enforced password change). Copy and pasting the password into the console from your email should work…

Once you’ve changed your password, you’ll be logged out and you’ll have to log back in again with your new password. (Isn’t security a faff?! That’s why ssh keys…!;-)

Now you get to install and launch OpenRefine. I’ve got an example image here, and the recipe for creating it here, but you don’t need really need to look at that if you trust me…

What you do need to do is run:

docker run -d -p 3333:3333 --name openrefine psychemedia/openrefinedemo

What this command does is download and run the container psychemedia/openrefinedemo, naming it (purely for our convenience) as openrefine.

You can learn how to create an OpenRefine docker image here: How to Create a Simple Dockerfile for Building an OpenRefine Docker Image.

The -d flags runs the container in “detached”, standalone mode (in the background, essentially). The -p 3333:3333 is read as -p PUBLICPORT:INTERNALPORT. The OpenRefine server is started on INTERNALPORT=3333 and we’re also going to view it on a URL port 3333.

The container will take a few seconds to download if this is the first time you’ve called for it:

and then it’ll a print out a long id number once it’s launched and running the background.

(You can check it’s running by running the command docker ps.)

In both the terminal and the droplet admin pages, as well as the droplet status line in the current droplet listing pages, you should see the public IP address associated with the droplet. Copy that address into your browser and add the port mapping (:3333). You should now be able to see a running version of OpenRefine. (And so should anyone else who wanders by that URL:PORT combination…)

Let’s now move the application to another port. We could do this by launching another container, with a new unique name (container names, when we assign them, need to be unique) and assigned to another port. (The OpenRefine internal service port will remain the same). For example:

docker run -d -p 3334:3333 --name openrefine2 psychemedia/openrefinedemo

This creates a new container running a fresh instance of the OpenRefine server. You should see it on IPADDRESS:3334.

(Alternatively we can omit the name and a random one will be assigned, for example, docker run -d -p 3335:3333)

Note that the docker image does not need to be downloaded again. We simply reuse the one we downloaded previously, and spawn a new instance of it as a new container.

Each container does take up memory though, so kill the original container:

docker kill openrefine

and remove it:

docker rm openrefine.

For a last quick demo, let’s create a new instance of the contain, once again called openrefine (assuming we’ve removed the one previously called that) and run it on port 80, the default http port, which means we should be able to see it directly by going to just the IPADDRESS (with no port specified) in our browser:

docker run -d -p 80:3333 --name openrefine psychemedia/openrefinedemo

When you’re done, you can halt the droplet (in which case, you’ll keep on paying rent for it) or destroy it (which means you won’t be billed for any additional hours, or parts thereof, on top of the time you’ve already been running the droplet):

You don’t need to tidy up around the docker containers, they’ll die with the droplet.

So, not all that hard, is it? Probably a darn sight easier than trying to get anything out of your local IT unit?!

In the next post, I’ll show how to combine the container with another one containing nginx to provide some simple authentication. (There are lots of prebuilt containers out there we can just take “off-the-shelf”, and nginx is one of them.) I’ll maybe also have a look at how you might persist projects in hibernating container / droplet, perhaps look at how we might be able to upload files that OpenRefine can work on, and maybe even try to figure out a way to simply synch your project files from Digital Ocean to your own file storage location somewhere. Maybe…

PS third party nginx proxy example: https://github.com/beevelop/docker-nginx-basic-auth

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

3 thoughts on “Running OpenRefine in the Clear on Digital Ocean”

    1. Thanks Stephen. Good to know. As I suspect you know all too well, it’s easy to miss a step in the form of a “just do this” comment which actually unpacks a bit further and stymies someone who doesn’t know how to “just”…

Comments are closed.

%d bloggers like this: