AutoStarting A Headless OpenRefine Server in MyBinder Using Repo2Docker and a start Config File

When I first started using MyBinder in 2015, it came with the option of autostarting selectable services (PostgreSQL and a Spark server) within the container alongside the Jupyter notebook server (early review). I'm not sure when that feature disappeared (the GitHub repo commit history should show it, if anyone's feeling forensically inclined and wants to let me know via a comment), but for some time I've been wondering how to start my own services, such as a database server or an OpenRefine server, in a Binderised container.

I guess I shoulda read the docs again…

…although the pace of change of a) documented features and b) undocumented features means it can be hard to keep up with what's possible in the Jupyterverse (I'm nowhere close to cracking that with TrackingJupyter in its current formulation…).

So, via this issue, I picked up some handy leads…

repo2docker start

Binderised containers are built using repo2docker. A new-to-me config file for repo2docker is the start file, “a script that can contain simple commands to be run at runtime. If you want this to be a shell script, make sure the first line is #!/bin/bash. The last line must be exec "$@" equivalent”.
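
As I understand it, repo2docker runs the start file with the container's main command (typically the Jupyter server) passed in as arguments, so the closing exec "$@" replaces the start script's shell with that command. The following sketch mimics that locally: it writes a toy start file to a temporary directory and calls it with `echo` standing in for the notebook server. The paths and messages are purely illustrative, not anything repo2docker itself uses.

```shell
#!/bin/bash
# Sketch: mimic how a repo2docker start file wraps the container's main command.
# The toy start file and the stand-in "main command" are illustrative only.

demo_dir="$(mktemp -d)"

# Write a toy start file: do some setup, then hand over via exec "$@".
cat > "$demo_dir/start" <<'EOF'
#!/bin/bash
# Setup work (e.g. launching background services) happens first...
echo "setting up services"
# ...then this shell is replaced by whatever command was passed in.
exec "$@"
EOF

# Call it the way an entrypoint would: main command appended as arguments.
# Here `echo` stands in for the Jupyter notebook server.
output="$(bash "$demo_dir/start" echo "main command running")"
echo "$output"

rm -rf "$demo_dir"
```

Running it should print the setup message followed by the output of the stand-in command. If the exec "$@" line were missing, the setup would still run but the main command never would.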

So, for example, if we want to autorun a headless OpenRefine instance in MyBinder that we can access from a notebook using an OpenRefine Python client (see an example notebook here), we can just add the following start file to the repo:

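#!/bin/bash

# Start OpenRefine headless in the background, listening on port 3333
# and keeping its workspace data in $OPENREFINE_DIR
OPENREFINE_DIR="$HOME/openrefine"
mkdir -p "$OPENREFINE_DIR"
nohup openrefine-2.8/refine -p 3333 -d "$OPENREFINE_DIR" > /dev/null 2>&1 &

# Do the normal Binder start thing here (hand over to the notebook server)...
exec "$@"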
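
Because the start file launches OpenRefine in the background and then immediately hands over to the notebook server, a notebook that tries to connect straight away may find OpenRefine isn't listening yet. Here's a minimal sketch of a helper you could run from a terminal in the Binder container (or a %%bash cell) to wait until something is accepting connections on port 3333. The wait_for_port name is my own hypothetical helper, and it relies on bash's /dev/tcp redirection rather than anything OpenRefine-specific.

```shell
#!/bin/bash
# Sketch: wait until a TCP port accepts connections, e.g. the headless
# OpenRefine server the start file launches on port 3333.
# wait_for_port is a hypothetical helper name; it uses bash's /dev/tcp
# redirection, so it needs bash (not sh) but no curl or nc.

wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local waited=0
  # Try to open a connection once a second until it succeeds or we time out.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "nothing listening on ${host}:${port} after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "something is listening on ${host}:${port}"
}

# Example usage inside the Binder container:
# wait_for_port 127.0.0.1 3333 60
```

Note that this only checks that something is listening on the port; it doesn't confirm that the something is OpenRefine.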

Demo here: Binder

PS Thanks, as always, to the ever-helpful Jupyter devs for putting up with my “issues”…

PPS I got stuck trying to generalise the auto-starting of arbitrary services any further. You can see the state of my ignorance here. As is often the case, “permissions” makes me realise I don't really understand how any of this stuff actually works. That's why I tend to work: a) locally; b) as root…!

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...