The original MyBinder service used to run an optional PostgreSQL DBMS alongside the Jupyter notebook service inside a Binder container (my original review).
But if you want to run a Postgres database in the same MyBinder environment nowadays, you need to add it in yourself.
Here are some recipes with different pros and cons. As @manics comments here, “[m]ost distributions package postgres to be run as a system service, so the user permissions are locked down.”, which means that you can’t run Postgres as an arbitrary user. The best approach is probably the last one, which uses an Anaconda packaged version of Postgres that has a more liberal attitude…
Recipe the First – Hacking Permissions
I picked up this approach from dchud/datamanagement-notebook/ based around Docker. It gets around the problem that the Postgres Linux package requires a particular user (postgres
) or an alternative user with root permissions to start and stop the server.
Use a Dockerfile to install postgres and create a simple database test user, as well as escalating default user notebook jovyan
to sudoers (along with the password redspot
). The jovyan
user can then start / stop the Postgres server via an appropriate entrypoint script.
USER root RUN chown -R postgres:postgres /var/run/postgresql RUN echo "jovyan ALL=(ALL) ALL" >> /etc/sudoers RUN echo "jovyan:redspot" | chpasswd COPY ./entrypoint.sh / RUN chmod +x /entrypoint.sh USER $NB_USER ENTRYPOINT ["/entrypoint.sh"]
The entrypoint.sh
script will start the Postgres server and then continue with any other start-up actions required to start the Jupyter notebook server install by repo2docker/MyBinder by default:
#!/bin/bash set -e echo redspot | sudo -S service postgresql start exec "$@"
Try it on MyBinder from here.
A major issue with this approach is that you may not want jovyan
, or another user, to have root privileges.
Recipe The Second – Hacking Fewer Permissions
The second example comes from @manics/@crucifixkiss and is based on manics/omero-server-jupyter.
In this approach, which also uses a Dockerfile, we again escalate the privileges of the jovyan
user, although this time in a more controlled way:
USER root #The trick in this Dockerfile is to change the ownership of /run/postgresql RUN apt-get update && \ apt-get install -qq -y \ postgresql postgresql-client && apt-get clean && \ chown jovyan /run/postgresql/ COPY ./entrypoint.sh / RUN chmod +x /entrypoint.sh
In this case, the entrypoint.sh
script doesn’t require any tampering with sudo
:
#!/bin/bash set -e PGDATA=${PGDATA:-/home/jovyan/srv/pgsql} if [ ! -d "$PGDATA" ]; then /usr/lib/postgresql/10/bin/initdb -D "$PGDATA" --auth-host=md5 --encoding=UTF8 fi /usr/lib/postgresql/10/bin/pg_ctl -D "$PGDATA" status || /usr/lib/postgresql/10/bin/pg_ctl -D "$PGDATA" -l "$PGDATA/pg.log" start psql postgres -c "CREATE USER testuser PASSWORD 'testpass'" createdb -O testuser testdb exec "$@"
You can try it on MyBinder from here.
Recipe the Third – An Alternative Distribution
The third approach is again via @manics and uses an Anaconda packaged version of Postgres, installing the postgresql
package via an environment.yml
file.
A postbuild
step initialises everything and pulls in a script to set up a dummy user and database.
#!/bin/bash set -eux #Make sure that everything is initialised properly PGDATA=${PGDATA:-/home/jovyan/srv/pgsql} if [ ! -d "$PGDATA" ]; then initdb -D "$PGDATA" --auth-host=md5 --encoding=UTF8 fi #Start the database during the build process # so that we can seed it with users, a dummy seeded db, etc pg_ctl -D "$PGDATA" -l "$PGDATA/pg.log" start #Call a script to create a dummy user and seeded dummy db #Make sure that the script is executable... chmod +x $HOME/init_db.sh $HOME/init_db.sh
For example, here’s a simple init_db.sh
script:
#!/bin/bash set -eux THISDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" #Demo PostgreSQL Database initialisation psql postgres -c "CREATE USER testuser PASSWORD 'testpass'" #The -O flag below sets the user: createdb -O DBUSER DBNAME createdb -O testuser testdb psql -d testdb -U testuser -f $THISDIR/seed_db.sql
which in turn pulls in a simple .sql
file to seed the dummy database:
-- Demo PostgreSQL Database initialisation DROP TABLE IF EXISTS quickdemo CASCADE; CREATE TABLE quickdemo(id INT, name VARCHAR(20), value INT); INSERT INTO quickdemo VALUES(1,'This',12); INSERT INTO quickdemo VALUES(2,'That',345);
Picking up on the recipe described in an earlier post (AutoStarting A Headless OpenRefine Server in MyBinder Using Repo2Docker and a start Config File), the database is autostarted using a start
file:
#!/bin/bash set -eux PGDATA=${PGDATA:-/home/jovyan/srv/pgsql} pg_ctl -D "$PGDATA" -l "$PGDATA/pg.log" start exec "$@"
In a Jupyter notebook, we can connect to the database in several ways.
For example, we can connect directly using the the psycopg2
package:
import psycopg2 conn = psycopg2.connect("dbname='postgres'") cur = conn.cursor() cur.execute("SELECT datname from pg_database") cur.fetchall()
Alternatively we can connect using something like ipython-sql magic, using a connection string that attaches us using a passwordless connection string as the default (jovyan
) user and default connection details (we use default ports etc.): postgresql:///postgres
Or we can go to the other extreme, and use a connection string that connects us using the test user credentials, explicit host/port details, and a specified database: postgresql://testuser:testpass@localhost:5432/testdb
You can try it on MyBinder from here.
Was casting about for an example using Postgres, should have known to look for you first!
@Ian :-) One thing I’ve noticed is that Google discoverability of my recent posts is going down because no-one creates persistent web links any more…