As a cloud host, Digital Ocean provides an easy way to get services up and running on the web.
Here’s a quick recipe for getting OpenRefine up and running behind a simple authentication scheme.
Creating a Server With Digital Ocean
First up, create a Digital Ocean account if you don’t already have one (this link will get you started with $100 free credit).
Creating and launching a server is easy… Select Create then Droplet, choose the server type you want (let’s use a simple Ubuntu box) and choose the server size. For lots of quick tests, I use the smallest box, but from experience I think OpenRefine prefers more than 2GB of memory…
Really… If you pick a 2GB server, you may find that OpenRefine hangs on start and ruins your day when you end up trying to debug what you think are other problems… Be warned, kids… stay unfrustrated out there…
The servers are charged at a metered rate, and you can stop them any time, so for a quick test, it’ll cost you pennies… (A $100 credit can go a long way…!)
Next, choose a data center region; I generally pick a local one…
You also have the option of adding an SSH key. This makes life much easier when trying to log in to the server from your own machine using ssh (you can just run ssh root@IP_ADDRESS and it’ll log you straight in; there’s a recipe for setting up an SSH key here).
If you don’t want to set up an SSH key, a root password will be emailed to you. You can use this password to log in to your server via a web terminal, which means you can do everything via a web UI if you need to…
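If you do go the SSH route, an entry in your local ~/.ssh/config saves retyping the address each time. Here’s a minimal sketch that prints such an entry; the function name ssh_host_entry and the nickname refine-box are my own inventions for illustration, and 203.0.113.10 is a documentation-only placeholder for your droplet’s IP address.

```shell
# Print an ssh_config "Host" entry for a droplet.
# $1 is a nickname of your choosing, $2 is the droplet's public IP address.
# (Hypothetical helper, not part of any Digital Ocean tooling.)
ssh_host_entry() {
    nickname="$1"
    address="$2"
    cat <<EOF
Host ${nickname}
    HostName ${address}
    User root
EOF
}

# Example run with a placeholder IP address reserved for documentation.
ssh_host_entry refine-box 203.0.113.10
```

Appending the output to your config (ssh_host_entry refine-box IP_ADDRESS >> ~/.ssh/config) should then let you log in with ssh refine-box.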
Create your server by clicking the big green button…
It should only take a few seconds to start up… And when it has, you’ll be presented with its public IP address.
If you need a web terminal, click through on the server name, and you should see a link to launch a web console.
Installing OpenRefine
From the console, we can install all we need to run OpenRefine. This is a minimum viable example — we should probably find a better place to install OpenRefine, and may want to run it as a particular user with limited permissions. Working as root with everything wide open makes life easier, though not necessarily safer…!
OpenRefine requires a Java environment, so we need to install that:
apt-get update && apt-get install -y openjdk-8-jre
We can download the OpenRefine application via the command-line using a command of the form wget -q -O DOWNLOADED_FILE_NAME URL; the download link for each release can be found on the OpenRefine releases page:
wget -q -O openrefine-2.8.tar.gz https://github.com/OpenRefine/OpenRefine/releases/download/2.8/openrefine-linux-2.8.tar.gz
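If you expect to try more than one release, it may help to build the file name and URL from a single version variable. The sketch below only prints the wget command rather than running it; the variable names are mine, and the URL pattern is simply copied from the 2.8 link above, so check the releases page before assuming it holds for other versions.

```shell
# Build the download command from one version number.
# Assumption: other releases follow the same URL pattern as 2.8.
REFINE_VERSION="2.8"
REFINE_ARCHIVE="openrefine-${REFINE_VERSION}.tar.gz"
REFINE_URL="https://github.com/OpenRefine/OpenRefine/releases/download/${REFINE_VERSION}/openrefine-linux-${REFINE_VERSION}.tar.gz"

# Print the command so it can be checked before it is run.
echo "wget -q -O ${REFINE_ARCHIVE} ${REFINE_URL}"
```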
The downloaded file is provided as a tar archive file, which we need to unpack:
tar xzf openrefine-2.8.tar.gz
This will unbundle the files into the directory ./openrefine-2.8.
Let’s define a shell variable for a working directory in which to place the OpenRefine project files:
OPENREFINE_DIR="$HOME/openrefine"
And create that directory:
mkdir -p "$OPENREFINE_DIR"
You should now be able to run OpenRefine in the background using the command:
nohup openrefine-2.8/refine -p 3333 -d "$OPENREFINE_DIR" > /dev/null 2>&1 &
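This will run OpenRefine on port 3333. If you copy the IP address of your Digital Ocean server, and go to http://IP.ADDRESS:3333, you should see OpenRefine running there. (Note that if you’re in Chrome, Google may well tell you that the address is dangerous…) If the page doesn’t load at all, it may be that OpenRefine is only listening on the local interface; the refine script’s -i option sets the interface it binds to.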
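One snag with a bare nohup … & launch is that it leaves you hunting for the process when you want to stop it. Here’s a rough sketch of a pair of helpers that record the process ID in a file; the names start_detached and stop_detached are hypothetical, and I’m assuming the PID of the launched command is the one that needs signalling, which may not hold if the launcher script hands off to a separate Java process.

```shell
# Start a command in the background, detached from the terminal,
# and record its process ID in the given file.
# (Hypothetical helper names, for illustration only.)
start_detached() {
    pidfile="$1"
    shift
    nohup "$@" > /dev/null 2>&1 &
    echo "$!" > "$pidfile"
}

# Stop the process recorded in the given file and remove the file.
stop_detached() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
}
```

For OpenRefine, that might look like start_detached "$HOME/refine.pid" openrefine-2.8/refine -p 3333 -d "$OPENREFINE_DIR", with stop_detached "$HOME/refine.pid" to shut it down again.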
Adding Simple Authentication
The OpenRefine server is running as a public service on the public web. If you want to add a simple layer of authentication, you can add a web server proxy to the server that will prompt for a password when a new visit is made to the server.
One of the easiest proxies to get up and running is nginx. Let’s install it, along with a simple Apache toolkit that will help us create a simple password:
apt-get install -y nginx apache2-utils
Now create a simple user/password combination. Ever secure, I’ll go with user test and password letmein:
htpasswd -b -c /etc/nginx/.htpasswd test letmein
Now we need to define the proxy. A Digital Ocean tutorial (How To Install Nginx on Ubuntu 18.04) describes how to set up a firewall. I’m selecting the 'Nginx Full' option (which opens ports 80 and 443) because I’m working via SSH, but if you’re working in the web terminal, the more restrictive 'Nginx HTTP' (port 80 only) may be more appropriate:
sudo ufw allow 'Nginx Full'
If you try loading the OpenRefine server on port 3333, you should now find that it’s blocked: the firewall is only letting web traffic through on the standard web ports. (This assumes the firewall is actually enabled; ufw status will tell you.)
We now need to open access back up to the OpenRefine server, albeit via a password challenge. The following will create a default nginx configuration file that will expose the OpenRefine service running on port 3333 via default http port 80, mediated by a simple authorisation challenge:
config='''
server {
  listen 80;
  auth_basic Protected...;
  auth_basic_user_file /etc/nginx/.htpasswd;
  location / {
    proxy_pass http://127.0.0.1:3333;
  }
}
'''
echo "$config" > /etc/nginx/sites-available/default
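If you’d rather not overwrite the live configuration blind, the same settings can be written with a heredoc to a scratch file first and inspected. This is only a sketch: the function name write_refine_proxy_config and its optional port argument are my own additions, and I’ve quoted the auth_basic realm string, which nginx should accept either way.

```shell
# Write the proxy configuration to a file of your choosing.
# $1 is the target file, $2 is the OpenRefine port (defaults to 3333).
# (Hypothetical helper, for illustration only.)
write_refine_proxy_config() {
    target="$1"
    port="${2:-3333}"
    cat > "$target" <<EOF
server {
    listen 80;
    auth_basic "Protected...";
    auth_basic_user_file /etc/nginx/.htpasswd;
    location / {
        proxy_pass http://127.0.0.1:${port};
    }
}
EOF
}

# Dry run: write to a scratch file and look it over before touching /etc/nginx.
scratch="$(mktemp)"
write_refine_proxy_config "$scratch" 3333
cat "$scratch"
```

When you’re happy with it, write_refine_proxy_config /etc/nginx/sites-available/default followed by nginx -t should confirm that nginx can parse the file before you reload.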
Restart the nginx proxy to put the new settings into effect:
nginx -s reload
If you now go to http://IPADDRESS you should be presented with a challenge. Enter the credentials you defined, and you should see your OpenRefine server.
Finishing Up
When you’ve finished your session, you can destroy the droplet. This will tear the server down and you won’t be billed for it anymore.
Alternatively, you can switch the droplet off, but keep it in a shutdown state that you can restart in the future.
However, as the Digital Ocean prompt warns when you do this, you will continue to be billed, even if the service is not running, because it is still consuming Digital Ocean resources.