Python packages such as the Jupyter Server Proxy allow you to use a Jupyter notebook server as a proxy for other services running in the same environment, such as a MyBinder container.
The jupyter-server-proxy package represents a generalisation of earlier demonstrations that showed how to proxy RStudio and OpenRefine in a MyBinder container.
In this post, I’ll serve up several ways of getting OpenRefine running in a MyBinder container. You can find all the examples in branches off psychemedia/jupyterserverproxy-openrefine.
Early original work on getting OpenRefine running in MyBinder was done by @betatim (betatim/openrefineder) using an earlier package, nbserverproxy; @yuvipanda helped me get my head round various bits of jupyterhub/jupyter-server-proxy/, which is key to proxying web services via Jupyter (jupyter-server-proxy docs). @manics provided the jupyter-server-proxy PR for handling predefined, rather than allocated, port mappings, which also made life much easier…
Common Installation Requirements
The following steps are pretty much common to all the recipes, and are responsible for installing OpenRefine and its dependencies.
First, in a binder/apt.txt file, the Java dependency:
openjdk-8-jre
A binder/postBuild step to install a specific version of OpenRefine:

    #!/bin/bash
    set -e

    VERSION=2.8
    wget -q -O openrefine-$VERSION.tar.gz https://github.com/OpenRefine/OpenRefine/releases/download/$VERSION/openrefine-linux-$VERSION.tar.gz
    mkdir -p $HOME/.openrefine
    tar xzf openrefine-$VERSION.tar.gz -C $HOME/.openrefine
    rm openrefine-$VERSION.tar.gz

    mkdir -p $HOME/openrefine
A binder/requirements.txt file to install the Python OpenRefine API client:
git+https://github.com/dbutlerdb/refine-client-py
Note that this is a fork of the original client that supports Python 3. It works with OpenRefine 2.8 but I’m not sure if it works properly with OpenRefine 3. There are multiple forks of the client and from what I can tell they are differently broken. It would be great if the OpenRefine repo took on one fork as the official client that everyone could contribute to.
start definition – Autostarting headless OpenRefine Server
This start branch (repo) demonstrates:
- using a binder/start file to auto start OpenRefine;
- a notebook/client demo; this essentially runs in a headless mode.
The binder/start file extends the MyBinder start CMD to run the commands included in the file in addition to the default command to start the Jupyter notebook server. (The binder/start file can also be used to run things like the setting of environment variables. I’m not sure how to make an environment variable defined in binder/postBuild available inside binder/start?)
    #!/bin/bash

    # Start OpenRefine in the background
    OPENREFINE_DIR="$HOME/openrefine"
    mkdir -p "$OPENREFINE_DIR"
    nohup $HOME/.openrefine/openrefine-2.8/refine -p 3333 -d "$OPENREFINE_DIR" > /dev/null 2>&1 &

    # Hand over to the default MyBinder start command
    exec "$@"
You won’t be able to see the OpenRefine GUI in this demo. Instead, you can access the server via its API using an OpenRefine Python client. An included notebook gives a worked example (note that at the moment you can’t run the first few parts of the demo because they assume the presence of a pre-existing OpenRefine project. Instructions appear further down the notebook for creating a project and working with it using the API client; I’ll do a separate post on the OpenRefine Python client at some point…)
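As a rough check that the headless server is answering, here is a minimal sketch that asks OpenRefine for its project list over HTTP using only the Python standard library. It assumes the server is listening on http://localhost:3333, as started by binder/start, and that the command/core/get-all-project-metadata endpoint returns a JSON object with a "projects" mapping; the function names are my own and are not part of the client package.

```python
import json
from urllib.request import urlopen

# Assumption: OpenRefine was started on port 3333 by binder/start.
REFINE_URL = "http://localhost:3333"


def project_names(metadata):
    """Return the sorted project names found in a project metadata payload."""
    projects = metadata.get("projects", {})
    return sorted(info.get("name", "") for info in projects.values())


def fetch_project_metadata(base_url=REFINE_URL, timeout=10):
    """Fetch the project metadata JSON from a running OpenRefine server."""
    url = base_url.rstrip("/") + "/command/core/get-all-project-metadata"
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, inside the container while OpenRefine is running:
#   print(project_names(fetch_project_metadata()))
```

On a fresh container there are no projects yet, so an empty list is what I would expect to see until one has been created.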
simpleproxy definition
The simpleproxy branch (repo) extends the start branch with a proxy that can be used to render the OpenRefine GUI.
The binder/requirements.txt file needs an additional package: jupyter-server-proxy. (I’m using the repo version because at the time of writing the PyPI released version doesn’t include all the features we need…)
    git+https://github.com/dbutlerdb/refine-client-py
    git+https://github.com/jupyterhub/jupyter-server-proxy
If you launch the Binder, it uses the serverproxy to proxy the OpenRefine port to proxy/3333/; note that the trailing slash is important. Without it, the static files (CSS etc) required to render the page are not resolved correctly.
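To avoid forgetting the trailing slash, the proxied address can be built programmatically. This is a small sketch: the helper name and the example notebook URL are hypothetical, and the only thing taken from the setup above is the proxy/3333/ path.

```python
from urllib.parse import urljoin


def proxied_url(notebook_base_url, port):
    """Build the proxy URL for a local port, always ending in a slash."""
    # urljoin only appends to the base path if the base itself ends in a slash.
    if not notebook_base_url.endswith("/"):
        notebook_base_url += "/"
    return urljoin(notebook_base_url, "proxy/{}/".format(port))


# Hypothetical notebook server address, for illustration only.
print(proxied_url("https://hub.example.org/user/demo", 3333))
```

The same helper works for any other service proxied by port number.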
traitlet-nolab definition
The traitlet-nolab branch (repo) uses the traitlet method (docs) to add a menu option to the Jupyter notebook homepage that allows OpenRefine to be started and launched from the notebook home New menu.
OpenRefine will also be started automatically if you start the MyBinder container with ?urlpath=openrefine or navigate directly to http://MYBINDERURL/openrefine.
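For sharing a link that opens straight into OpenRefine, the launch URL can be assembled as below. This is a sketch that assumes the usual mybinder.org/v2/gh/ address pattern for GitHub repositories; the function name is made up, and the repo and branch names are the ones used in this post.

```python
from urllib.parse import urlencode


def binder_launch_url(repo, branch, urlpath=None):
    """Build a MyBinder launch URL for a GitHub repo, optionally opening a path."""
    url = "https://mybinder.org/v2/gh/{}/{}".format(repo, branch)
    if urlpath:
        url += "?" + urlencode({"urlpath": urlpath})
    return url


print(binder_launch_url(
    "psychemedia/jupyterserverproxy-openrefine", "traitlet-nolab", "openrefine"
))
```

Leaving out the urlpath argument gives the plain launch link that opens on the notebook homepage.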
Start on Jupyter notebook homepage:
In this case, the binder/start invocation is not required.
Once started, OpenRefine will appear on a named proxy path, openrefine (the slash may be omitted in this case).
The traitlet is defined in a jupyter_notebook_config.py file:
    # Traitlet configuration file for jupyter-notebook.

    c.ServerProxy.servers = {
        'openrefine': {
            'command': ['/home/jovyan/.openrefine/openrefine-2.8/refine', '-p', '{port}', '-d', '/home/jovyan/openrefine'],
            'port': 3333,
            'timeout': 120,
            'launcher_entry': {
                'title': 'OpenRefine'
            },
        },
    }
This is copied into the correct location by an additional binder/postBuild step:

    mkdir -p $HOME/.jupyter/

    # Although located in binder/,
    # this bash file runs in $HOME rather than $HOME/binder
    mv jupyter_notebook_config.py $HOME/.jupyter/
The traitlet definition file is loaded in as a notebook server configuration file prior to starting the notebook server.
Note that the definition file uses the port: 3333 attribute to explicitly set the port that the server will be served against. If this is omitted, then a port will be dynamically allocated by the proxy server. In the case of OpenRefine, I am defining a port explicitly so that the Python API client can connect to it directly on the assumed default port 3333.
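To make the difference between a fixed and an allocated port concrete, here is a rough illustration of how a {port} placeholder in the command could be filled in. It is not jupyter-server-proxy’s own code: both function names are made up, and asking the operating system for a free port by binding to port 0 is simply one common way to allocate one.

```python
import socket


def free_port():
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 means "pick any free port"
        return sock.getsockname()[1]


def render_command(server_config):
    """Fill the {port} placeholder, preferring a fixed port if one is set."""
    port = server_config.get("port") or free_port()
    command = [arg.format(port=port) for arg in server_config["command"]]
    return port, command


fixed = {"command": ["refine", "-p", "{port}"], "port": 3333}
print(render_command(fixed))  # the fixed port is used as given

dynamic = {"command": ["refine", "-p", "{port}"]}
print(render_command(dynamic))  # the port varies from run to run
```

With the dynamic variant a client has no way of knowing the port in advance, which is why I pin it to 3333 here.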
Note that if we try to use the Python client without first launching the OpenRefine server, the connection will fail because there will be no running OpenRefine server for the client to connect to.
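One way to guard against that in a notebook is to check that something is listening on the port before creating the client. This is a minimal standard-library sketch; the function names, the default of localhost:3333 and the retry settings are my own assumptions.

```python
import socket
import time


def port_is_open(host="localhost", port=3333, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host="localhost", port=3333, attempts=30, delay=1.0):
    """Poll the port until it accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


# Usage in a notebook, after launching OpenRefine from the New menu:
#   if not wait_for_port():
#       print("OpenRefine does not seem to be running on port 3333")
```

An open port only shows that something is listening there, not that OpenRefine has finished starting, so treat it as a first check rather than a guarantee.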
Python package setup definition
The setup branch (repo) demonstrates:
- using serverproxy (setup definition (docs)) to add an OpenRefine menu option to the notebook start menu. The configuration uses a fixed port assignment once again so that we can work with the client package using default port settings.
Start in Jupyter notebook homepage:
For this build, we go back to the base setup (no binder/start, no traitlet definition files) and add a setup.py file:
    import setuptools

    setuptools.setup(
        name="jupyter-openrefine-server",
        # py_modules rather than packages, since we only have 1 file
        py_modules=['openrefine'],
        entry_points={
            'jupyter_serverproxy_servers': [
                # name = packagename:function_name
                'openrefine = openrefine:setup_openrefine',
            ]
        },
    )
This calls on an openrefine.py file to define the configuration:
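    import os

    def setup_openrefine():
        home = os.environ['HOME']
        path = os.path.join(home, 'openrefine')
        # Assumption: the command is executed directly rather than through a shell,
        # so a literal $HOME would not be expanded; build the path in Python instead.
        refine = os.path.join(home, '.openrefine', 'openrefine-2.8', 'refine')
        return {
            'command': [refine, '-p', '{port}', '-d', path],
            'port': 3333,
            'launcher_entry': {
                'title': 'OpenRefine',
            },
        }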
As before, an OpenRefine option is added to the start menu and can be used to start the OpenRefine server and launch the UI client on the path openrefine. (As we started the server on a known port, we can also find it explicitly at proxy/3333.)
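If the menu option doesn’t appear, one thing worth checking is whether the package actually registered its entry point when it was installed. The sketch below lists whatever is registered under the jupyter_serverproxy_servers group using the standard library’s importlib.metadata (Python 3.8 or later); the function name is my own, and this is just a way to inspect the installation, not a claim about how jupyter-server-proxy itself looks the entries up.

```python
from importlib.metadata import entry_points


def list_proxy_entry_points(group="jupyter_serverproxy_servers"):
    """Return 'name = module:function' strings registered under a group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a select() method for filtering.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted("{} = {}".format(ep.name, ep.value) for ep in selected)


print(list_proxy_entry_points())
```

In a container built from the setup branch I would expect the list to include the openrefine = openrefine:setup_openrefine entry declared in setup.py; an empty list suggests the package was not installed.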
Calling the aliased URL directly will also start the server. This means we can tell MyBinder to open on the openrefine path (or add ?urlpath=openrefine to the Binder URL) and the container will open into the OpenRefine application.
Once again, we need to launch the OpenRefine app before we can connect to it from the Python client.
master branch – traitlet definition, Notebook and JupyterLab Support
The master branch (repo) builds on the traitlet definition branch and demonstrates:
- using serverproxy (traitlet definition) to add an OpenRefine menu option to the notebook start menu. The configuration uses a fixed port assignment so that we can work with the client package.
- a button is also enabled and added to the JupyterLab launcher.
OpenRefine can now be started and launched from the notebook homepage New menu or from the JupyterLab launcher, via a ?urlpath=openrefine MyBinder launch invocation, or by navigating directly to the proxied path openrefine.
In this case, we need to enable the JupyterLab extension with the following addition to the binder/postBuild file:

    # Enable the OpenRefine icon in the JupyterLab desktop launcher
    jupyter labextension install jupyterlab-server-proxy
This will enable a default start button in the JupyterLab launcher.
We can also provide an icon for the start button. Further modify the binder/postBuild file to copy the logo to a desired location:

    # Although located in binder/,
    # this bash file runs in $HOME rather than $HOME/binder
    mv open-refine-logo.svg $HOME/.jupyter/
and modify the jupyter_notebook_config.py file with the addition of a path to the start logo, also ensuring that the launcher entry is enabled:

    # Traitlet configuration file for jupyter-notebook.

    c.ServerProxy.servers = {
        'openrefine': {
            'command': ['/home/jovyan/.openrefine/openrefine-2.8/refine', '-p', '{port}', '-d', '/home/jovyan/openrefine'],
            'port': 3333,
            'timeout': 120,
            'launcher_entry': {
                'enabled': True,
                'icon_path': '/home/jovyan/.jupyter/open-refine-logo.svg',
                'title': 'OpenRefine',
            },
        },
    }
We should now see a start button for OpenRefine in the JupyterLab launcher.
Clicking on the button will autostart the server and open a browser tab onto the OpenRefine application GUI.
Summary
Running OpenRefine in MyBinder using JupyterServerProxy allows us to use OpenRefine as part of a shareable, on demand, serverless, Jupyter mediated workbench, defined via a public GitHub repository.
As well as being accessed as a GUI application, and via the Python API client, OpenRefine can also connect to a PostgreSQL server running inside the MyBinder container. For running PostgreSQL inside MyBinder, see Running a PostgreSQL Server in a MyBinder Container; for connecting OpenRefine to a Postgres server running in the same MyBinder container, see OpenRefine Database Connections in MyBinder.