Running a PostgreSQL Server in a MyBinder Container

The original MyBinder service used to run an optional PostgreSQL DBMS alongside the Jupyter notebook service inside a Binder container (my original review).

But if you want to run a Postgres database in the same MyBinder environment nowadays, you need to add it in yourself.

Here are some recipes with different pros and cons. As @manics comments here, “[m]ost distributions package postgres to be run as a system service, so the user permissions are locked down.”, which means that you can’t run Postgres as an arbitrary user. The best approach is probably the last one, which uses an Anaconda packaged version of Postgres that has a more liberal attitude…

Recipe the First – Hacking Permissions

I picked up this approach from dchud/datamanagement-notebook/ based around Docker. It gets around the problem that the Postgres Linux package requires a particular user (postgres) or an alternative user with root permissions to start and stop the server.

Use a Dockerfile to install postgres and create a simple database test user, as well as escalating default user notebook jovyan to sudoers (along with the password redspot). The jovyan user can then start / stop the Postgres server via an appropriate entrypoint script.

USER root

RUN chown -R postgres:postgres /var/run/postgresql
RUN echo "jovyan ALL=(ALL)   ALL" >> /etc/sudoers
RUN echo "jovyan:redspot" | chpasswd

COPY ./entrypoint.sh /
RUN chmod +x /entrypoint.sh

USER $NB_USER
ENTRYPOINT ["/entrypoint.sh"]

The entrypoint.sh script will start the Postgres server and then continue with any other start-up actions required to start the Jupyter notebook server install by repo2docker/MyBinder by default:

#!/bin/bash
set -e

echo redspot | sudo -S service postgresql start

exec "$@"

Try it on MyBinder from here.

A major issue with this approach is that you may not want jovyan, or another user, to have root privileges.

Recipe The Second – Hacking Fewer Permissions

The second example comes from @manics/@crucifixkiss and is based on manics/omero-server-jupyter.

In this approach, which also uses a Dockerfile, we again escalate the privileges of the jovyan user, although this time in a more controlled way:

USER root

#The trick in this Dockerfile is to change the ownership of /run/postgresql
RUN  apt-get update && \
    apt-get install -qq -y \
        postgresql postgresql-client && apt-get clean && \
    chown jovyan /run/postgresql/

COPY ./entrypoint.sh  /
RUN chmod +x /entrypoint.sh

In this case, the entrypoint.sh script doesn’t require any tampering with sudo:

#!/bin/bash
set -e

PGDATA=${PGDATA:-/home/jovyan/srv/pgsql}

if [ ! -d "$PGDATA" ]; then
  /usr/lib/postgresql/10/bin/initdb -D "$PGDATA" --auth-host=md5 --encoding=UTF8
fi
/usr/lib/postgresql/10/bin/pg_ctl -D "$PGDATA" status || /usr/lib/postgresql/10/bin/pg_ctl -D "$PGDATA" -l "$PGDATA/pg.log" start

psql postgres -c "CREATE USER testuser PASSWORD 'testpass'"
createdb -O testuser testdb

exec "$@"

You can try it on MyBinder from here.

Recipe the Third – An Alternative Distribution

The third approach is again via @manics and uses an Anaconda packaged version of Postgres, installing the postgresql package via an environment.yml file.

A postbuild step initialises everything and pulls in a script to set up a dummy user and database.

#!/bin/bash
set -eux

#Make sure that everything is initialised properly
PGDATA=${PGDATA:-/home/jovyan/srv/pgsql}
if [ ! -d "$PGDATA" ]; then
  initdb -D "$PGDATA" --auth-host=md5 --encoding=UTF8
fi

#Start the database during the build process
# so that we can seed it with users, a dummy seeded db, etc
pg_ctl -D "$PGDATA" -l "$PGDATA/pg.log" start

#Call a script to create a dummy user and seeded dummy db
#Make sure that the script is executable...
chmod +x $HOME/init_db.sh
$HOME/init_db.sh

For example, here’s a simple init_db.sh script:

#!/bin/bash
set -eux

THISDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

#Demo PostgreSQL Database initialisation
psql postgres -c "CREATE USER testuser PASSWORD 'testpass'"

#The -O flag below sets the user: createdb -O DBUSER DBNAME
createdb -O testuser testdb

psql -d testdb -U testuser -f $THISDIR/seed_db.sql

which in turn pulls in a simple .sql file to seed the dummy database:

-- Demo PostgreSQL Database initialisation

DROP TABLE IF EXISTS quickdemo CASCADE;
CREATE TABLE quickdemo(id INT, name VARCHAR(20), value INT);
INSERT INTO quickdemo VALUES(1,'This',12);
INSERT INTO quickdemo VALUES(2,'That',345);

Picking up on the recipe described in an earlier post (AutoStarting A Headless OpenRefine Server in MyBinder Using Repo2Docker and a start Config File), the database is autostarted using a start file:

#!/bin/bash
set -eux
PGDATA=${PGDATA:-/home/jovyan/srv/pgsql}
pg_ctl -D "$PGDATA" -l "$PGDATA/pg.log" start

exec "$@"

In a Jupyter notebook, we can connect to the database in several ways.

For example, we can connect directly using the the psycopg2 package:

import psycopg2

conn = psycopg2.connect("dbname='postgres'")
cur = conn.cursor()
cur.execute("SELECT datname from pg_database")

cur.fetchall()

Alternatively we can connect using something like ipython-sql magic, using a connection string that attaches us using a passwordless connection string as the default (jovyan) user and default connection details (we use default ports etc.): postgresql:///postgres

Or we can go to the other extreme, and use a connection string that connects us using the test user credentials, explicit host/port details, and a specified database: postgresql://testuser:testpass@localhost:5432/testdb

You can try it on MyBinder from here.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

2 thoughts on “Running a PostgreSQL Server in a MyBinder Container”

    1. @Ian :-) One thing I’ve noticed is that Google discoverability of my recent posts is going down because no-one creates persistent web links any more…

Comments are closed.

%d bloggers like this: