Fragment – Jupyter For Edu

With more and more core components, as well as user contibutions, being added to the Jupyter framework, I’m starting to lose track of what’s possible. One of the things I might be useful for the OU, and Institute of Coding, context is to explore various architectural patterns that can be constructed in a Jupyter mediated environment that are particular useful for education.

In advance of getting a Github repo / wiki together to start that, here are a few fragments my my feeds, several of which have appeared in just the last couple of days:

Jupyter Enterprise Gateway Now a Top Level Jupyter Project

Via the Jupyter blog, I see the Jupyter Enterprise Gateway is now a top-level Jupyter project.

The Jupyter Enterprise Gateway “enables Jupyter Notebook to launch remote kernels in a distributed cluster“, which provides a handy separation between a notebook server (or Jupyterhub multi-user notebook server) and the kernel that a notebook runs against. For example, Jupyter Enterprise Gateway can be used to create kernels in a scaleable way using Kubernetes, or (I’m guessing…?) to do things like launch remote kernels running on a GPU cluster. From the docs it looks like Jupyter Enterprise Gateway  should work in a Jupyterhub context, although I can’t offhand find a simple howto / recipe for how to do that. (Presumably, Jupyterhub creates and launches user specific notebook server containers, and these then create and connect to arbitrary kernel running back-ends via the Jupyter Enterprise Gateway? Here’s a related issue I found.)

Running Notebook Cells One at a Time in a Terminal

The ever productive Doug Blank has a recipe for stepping through notebook cells in a terminal [code: nbplayer]. The player launches an IPython terminal that displays the first cell in the notebook and lets you step through them (executing or skipping the cell) one at a time. You can also run your own commands in between stepping through the notebook cells.

I can imagine using this to create a fixed set of steps for an activity that I want a student to work through, whilst giving them “free time” to explore the state of current execution environment, for example, or try out particular “given” functions with different parameters. This approach also provides a workaround for using notebook authored exercises in the terminal environment, which I know some colleagues favour over the notebook environment.

On my to do list is recast some of the activities from the new TM112 course to see how they feel using this execution model, and then compare that to the original activity and the activity run using the same notebook in a notebook environment.

Adding Multiple Student Users to a Jupyterhub Environment

Also via Doug Blank, a recipe for adding multiple users to a Jupyterhub environment using a form that allows you to simply add a list of user names: a more flexible way of adding accounts to Jupyterhub. User account details and random passwords are created automatically and then emailed to students.

To allow users to change passwords, e.g. on first run, I think the NotebookApp.allow_password_change=True notebook server parameter (Jupyter notebook – Config file and command line options) allows that?

The repo also shows a way of bundling nbviewer to allow users to “publish” HTML versions of their notebooks.

Doug also points to yuvipanda/jupyterhub-firstuseauthenticator, a first use authenticator for Jupyterhub that allows new users to create an account and then set a password on it. This could be really handy for workshops, where you want to allow uses to self-serve an environment that persists over a couple of workshop sessions, for example. (One thing we still need to do in the OU is get a Jupyterhub server up and running with persistent user storage; for TM112, we ran a temporary notebook server, which meant students couldn’t save and return to notebooks on the server – they’d have to download notebooks and then re-upload them into a new session if they wanted to return to working on a notebook they had modified. That said, the activity was designed as a “displosable” activity…)

Zip All Notebooks

This handy extension — nbzipprovides a button to zip and download a Jupyter notebook server folder.  If you’re working on a temporary notebook server, this provides and easy way of grabbing all the notebooks in one go. What might be even nicer would be to select a sub-folder, or selected set of files, using checkbox selectors? I’m not sure if there’s a complementary tool that will let you upload a zipped archive and unpack it in one go?

Fragment – Running Multiple Services, such as Jupyter Notebooks and a Postgres Database, in a Single Docker Container

Over the last couple of days, I’ve been fettling the build scripts for the TM351 VM, which typically uses vagrant to build a VirtualBox VM from a set of shell scripts, so they can be used to build a single Docker container that runs all the TM351 services, specifically Jupyter notebooks, OpenRefine, PostgreSQL and MongoDB.

Docker containers are typically constructed to a run a single service, with compositions of containers wired together using Docker Compose to create applications that deliver, or rely on, more than one running service. For example, in a previous post (Setting up a Containerised Desktop API server (MySQL + Apache / PHP 5) for the ergast Motor Racing Data API) I showed how to set up a couple of containers to work together, one running a MySQL database server, the other an http service that provided an API to the database.

So how to run multiple services in the same container? Docs on the Docker website suggest using supervisord to run multiple services in a single container, so here’s a fragment on how I’ve done that from my TM351 build.

To begin with, I’ve built the container up as a tiered set of containers, in a similar way to the way the stack of opinionated Jupyter notebook Docker containers are constructed:

#Define a stub to identify the images in this image stack
IMAGESTUB=psychemedia/tm361testm

# minimal
## Define a minimal container, eg a basic Linux container
## using whatever flavour of Linux we prefer
docker build --rm -t ${IMAGESTUB}-minimal-test ./minimal

# base
## The base container installs core packages
## The intention is to define a common build environment
## populated with packages likely to be common to many courses
docker build --rm --build-arg BASE=${IMAGESTUB}-minimal-test -t ${IMAGESTUB}-base-test ./base

#...

One of the things I’ve done to try to generalise the build steps is allow the name a base container to be used to bootstrap a new one by passing the name of the base image in via an optional variable (in the above case, --build-arg BASE=${IMAGESTUB}-minimal-test). Each Dockerfile in a build step directory uses the following construction to work out which image to use as the FROM basis:

#Set ARG values using --build-arg =
#Each ARG value can also have a default value
ARG BASE=psychemedia/ou-tm351-base-test
FROM ${BASE}

Using the same approach, I have used separate build tiers for the following components:

  • jupyter base: minimal Jupyter notebook install;
  • jupyter custom: add some customisation onto a pre-existing Jupyter notebook install;
  • openrefine: add the OpenRefine application; (note, we could just use BASE=ubuntu to create this a simple, standalone OpenRefine container);
  • postgres: create a seeded PostgreSQL database; note, this could be split into two: a base postgres tier and then a customisation that adds users, creates and seed databases etc;
  • mongodb: add in a seeded mongo database; again, the seeding could be added as an extra tier on a minimal database tier;
  • topup: a tier to add in anything I’ve missed without having to go back to rebuild from an earlier step…

The intention behind splitting out these tiers is that we might want to have a battle hardened OU postgres tier, for example, that could be shared between different courses. Alternatively, we might want to have tiers offering customisations for specific presentations of a course, whilst reusing several other fixed tiers intended to last out the life of the course.

By the by, it can be quite handy to poke inside an image once you’ve created it to check that everything is in the right place:

#Explore inside animage by entering it with a shell command
docker run -it --entrypoint=/bin/bash psychemedia/ou-tm351-jupyter-base-test -i

Once the services are in place, I add a final layer to the container that ensures supervisord is available and set up with an appropriate supervisord.conf configuration file:

##Dockerfile
#Final tier Dockerfile
ARG BASE=psychemedia/testpieces
FROM ${BASE}

USER root
RUN apt-get update && apt-get install -y supervisor

RUN mkdir -p /openrefine_projects  && chown oustudent:100 /openrefine_projects
VOLUME /openrefine_projects

RUN mkdir -p /notebooks  && chown oustudent:100 /notebooks
VOLUME /notebooks

RUN mkdir -p /var/log/supervisor
COPY monolithic_container_supervisord.conf /etc/supervisor/conf.d/supervisord.conf

EXPOSE 3334
EXPOSE 8888

CMD ["/usr/bin/supervisord"]

The supervisord.conf file is defined as follows:

##supervisord.conf
##We can check running processes under supervisord with: supervisorctl

[supervisord]
nodaemon=true
logfile=/dev/stdout
loglevel=trace
logfile_maxbytes=0
#The HOME envt needs setting to the correct USER
#otherwise jupyter throws: [Errno 13] Permission denied: '/root/.local'
#https://github.com/jupyter/notebook/issues/1719
environment=HOME=/home/oustudent

[program:jupyternotebook]
#Note the auth is a bit ropey on this atm!
command=/usr/local/bin/jupyter notebook --port=8888 --ip=0.0.0.0 --y --log-level=WARN --no-browser --allow-root --NotebookApp.password= --NotebookApp.token=
#The directory we want to start in
#(replaces jupyter notebook parameter: --notebook-dir=/notebooks)
directory=/notebooks
autostart=true
autorestart=true
startsecs=5
user=oustudent
stdout_logfile=NONE
stderr_logfile=NONE

[program:postgresql]
command=/usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
user=postgres
autostart=true
autorestart=true
startsecs=5

[program:mongodb]
command=/usr/bin/mongod --dbpath=/var/lib/mongodb --port=27351
user=mongodb
autostart=true
autorestart=true
startsecs=5

[program:openrefine]
command=/opt/openrefine-3.0-beta/refine -p 3334 -i 0.0.0.0 -d /vagrant/openrefine_projects
user=oustudent
autostart=true
autorestart=true
startsecs=5
stdout_logfile=NONE
stderr_logfile=NONE

One thing I need to do better is to find a way to stage the construction of the supervisord.conf file, bearing in mind that multiple tiers may relate to the same servicel for example, I have a jupyter-base tier to create a minimal Jupyter notebook server and then a jupyter-base-custom tier that adds in specific customisations, such as branding and course related notebook extensions.

When the final container is built, the supervisord command is run and the multiple services started.

One other thing to note: we’re hoping to run TM351 environments on an internal OpenStack cluster. The current cluster only allows students to expose a single port, and port 80 at that, from the VM (IP addresses are in scant supply, and network security lockdowns are in place all over the place). The current VM exposes at least two http services: Jupyter notebooks and OpenRefine, so we need a proxy in place if we are to expose them both via a single port. Helpfully, the nbserverproxy Jupyter extension (as described in Exposing Multiple Services Via a Single http Port Using Jupyter nbserverproxy), allows us to do just that. One thing to note, though – I had to enable it via the same user that launches the notebook server in the suoervisord.conf settings:

##Dockerfile fragment

RUN $PIP install nbserverproxy

USER oustudent
RUN jupyter serverextension enable --py nbserverproxy
USER root

To run the VM, I can call something like:

docker run -p 8899:8888 -d psychemedia/tm351dockermonotest

and then to access the additional services, I can browse to e.g. localhost:8899/proxy/3334/ to see the OpenRefine application.

PS in case you’re wondering why I syndicated this through RBloggers too, the same recipe will work if you’re using Jupyter notebooks with an R kernel, rather than the default IPython one.

Exposing Multiple Services Via a Single http Port Using Jupyter nbserverproxy

Over the last couple of weeks I’ve been circling, but failing to make much actual progress on using, OpenStack as a platform for making self-serve OU hosted VMs available to students. (I’m increasingly starting to think this is not sensible, but I’m struggling to find someone I can chat to about it… OpenStack is too enterprise, like a heavy “Java” thing where I need a just works “Python” thing…).

Anyway.

One of the issues with the OU Faculty OpenStack setup is the way the security model locks everything down. Not only is no API access available, there is also a limit on IP address allocation and open ports are limited to port 80 (and maybe port 22? Or maybe not.)

For the TM351 VM – which is what we’re looking to put onto OU OpenStack – we have been exposing services on at least two http ports, one for the Jupyter notebooks and one for OpenRefine. (The latest build also has a simple VM webserver, and I’m experimenting with a notebook search engine. Optionally, we have also allowed students to open up ports to the PostgreSQL and MongoDB services.)

If I do find a sensible way to get the VM running on OpenStack, finding a way to shove all the http services through port 80 looks like a necessary requirement. Previously, I’d noticed that @betatim’s openrefineder demo made use of a proxy to expose the OpenRefine service via the Jupyter notebook port, and looking at it again today I noticed that the nbopenrefineproxy package it was using is available as a Jupyterhub project package: jupyterhub/nbserverproxy.

In the current TM351 VM set-up, we have the following:

  • Jupyter notebook on guest port 8888, host port 35180
  • OpenRefine on guest port 3334, host port 35181

However, if I install and enable nbserverproxy, and restart the Jupyter notebook server, I can now find OpenRefine proxied as http://localhost:35180/proxy/3334/ as well as on http://localhost:35181.

One gotcha to note is that the OpenRefine page doesn’t render properly from that URL without the trailing slash because the OpenRefine HTML includes relative links to assets:

...
<link type="text/css" rel="stylesheet" href="externals/select2/select2.css" />
 <link type="text/css" rel="stylesheet" href="externals/tablesorter/theme.blue.css" />
...

which resolve as e.g. http://localhost:35180/proxy/externals/select2/select2.css (404).

However, with the trailing slash, the links do resolve correctly (e.g. as http://localhost:35180/proxy/3334/externals/select2/select2.css) when the trailing slash is added.

Handy… and the way to go if we do get this running on OpenStack.

PS if you know of a baby steps tutorial that shows how I can build a custom VM image on a Mac that I can upload to OpenStack, please let me know via the comments. Or otherwise get in touch if you can talk me through the various approaches.

Web Terminals and X11 Remote Desktop Access for Personal Virtual Applications

One of the components I’ve found useful when putting together virtual machines / Docker containers is a simple web based terminal that gives me terminal access into the machine.

This is a living post where I can keep notes on the various solutions I’ve tried…

Jupyter Notebooks

If you run a Jupyter notebook server, or JupyterLab, in a container, you get a web based terminal for free, albeit running under whatever user/privileges the Jupyter user environment is using.

Gritty

Browser based terminal; requires a node install, which is a faff, but it seems to work quite well…  Here’s a Dockerfile fragment:

#Install node.js
RUN apt-get update && apt-get install -y build-essential
RUN curl -sL https://deb.nodesource.com/setup_8.x | bash && \
apt-get install -y nodejs && apt-get clean
RUN npm i npm@latest -g

#Install web terminal - gritty
#Provides access into container via in-browser terminal
RUN npm i gritty -g --unsafe-perm

#Use the terminal as the default running service for the container
#When the container is running, we should be able to access browser terminal into it
EXPOSE 1337
CMD gritty

The --no-perms thing for me was required to get it to install cleanly…

You’d probably want to run this alongside another process. See Docker docs – Run multiple services in a container for now. I need to work on my own reusable recipe to support this general sort of activity. (The supervisord route is probable the best way..? Example (untested), untested simple example – use /usr/bin/supervisord as the CMD , untested tutorial)

Here’s another example in a simple Vagrantfile:

Vagrant.configure("2") do |config|

#------------------------- PROVIDER: VIRTUALBOX (BUILD) ------------------------------

    config.vm.provider :virtualbox do |virtualbox|

        #Stick with the default key
        config.ssh.insert_key=false

        #For local testing:
        config.vm.box = "ubuntu/xenial64"

        config.vm.hostname = "gritty-test"

        virtualbox.name = "gritty_test"
        virtualbox.memory = 1024
        virtualbox.cpus = 1
        # virtualbox.gui = true

        #Don't bother updating guest additions
        config.vbguest.auto_update = false

        #---- START PORT FORWARDING ----
        #Registered ports: https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers

        #Gritty
        config.vm.network :forwarded_port, guest: 1337, host: 8898, auto_correct: true

        #---- END PORT FORWARDING ----

     end
#------------------------- END PROVIDER: VIRTUALBOX (BUILD) ------------------------------

  config.vm.provision :shell, :inline => <<-SH

    if [ ! -f /vagrant/.firstrun_setup.done ]; then
        apt-get update && apt-get install -y build-essential
        curl -sL https://deb.nodesource.com/setup_8.x | bash && apt-get install -y nodejs && apt-get clean
        npm i npm@latest -g

        #Install web terminal - gritty
        #Provides access into container via in-browser terminal
        npm i gritty -g --unsafe-perm

        cp /vagrant/gritty.service /lib/systemd/system/gritty.service

    	# Enable autostart
	systemctl enable gritty.service

	# Refresh service config
	systemctl daemon-reload

	#(Re)start service
	systemctl restart gritty.service

        touch /vagrant/.firstrun_setup.done
    fi
   SH

   #config.vm.provision "shell", inline: "gritty --port 1337 &", run: "always"

end

And here's an example gritty.service file:


[Unit]
Description=OU Gritty Terminal Demo

#When to bring the service up
#via https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
#Wait for a network stack to appear
After=network.target
#If we actually need the network to have a routable IP address:
#After=network-online.target

[Service]
#User=oustudent
Environment=GRITTYPORT=1337

#ExecStartPre=mkdir -p /vagrant/openrefine_projects

ExecStart=/usr/bin/gritty --port ${GRITTYPORT}
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Alternatives: xterm.js seems to provide the basis of several terminal servers such as ttyd; here’s an example with authentication from Duke: mccahill/xterm-js-docker; and a minimal example server: simon-engledew/dockerweb. Here’s an example of a web ssh terminal with a ore complete config file example here.

novnc

Another route to a web based terminal via the browser is to cheat – and allow access to VM desktop through the browser. This can be done with using a virtualised desktop. For example, novnc eg https://github.com/psharkey/docker/tree/master/novnc with other containers to compose with it.

The dit4c/dockerfile-dit4c-container-x11 container is a nice example of a set of Docker container layers to support X11 exposure via a browser using novnc.

The new Docker Desktop looks like it will have some GUI tools to help assemble applications in Docker Desktop so it’d be great if that included a robust, templated containers to handle both X11, and terminal access, through a browser.

See also: Apache Guacamole; example – Guacamole and Wine – Running Small Legacy Windows Apps Via a Browser or Accessing GUI Apps (Audacity) Via a Browser from a Container Using Guacamole.

PS here’s a set-up for running under supervisord in a container.

Docker – multiple services in same container recipe

This uses a docker-compose file (run using docker-compose up) that uses a local Dockerfile that calls on build.sh script to create a container image that runs a couple of services as specified in the supervisord.conf file. Note the requirement on installing the supervisor package.

First, the supervisord.conf script; this is responsible for starting the running applications (and replaces any services; which means my build files need to separate our services and copy them over / start them up if we are not doing the docker build).

##supervisord.conf
[supervisord]
nodaemon=true

[program:gritty]
command = /usr/bin/gritty --port 80

[program:gritty2]
command = /usr/bin/gritty --port 8899

The build.sh script is responsible for installing the necessary packages into the container.

##build.sh
apt-get update && apt-get install -y build-essential curl
curl -sL https://deb.nodesource.com/setup_8.x | bash && apt-get install -y nodejs && apt-get clean
npm i npm@latest -g

#Install web terminal - gritty
#Provides access into container via in-browser terminal
npm i gritty -g --unsafe-perm

The Dockerfile orchestrates the build of the container.

##Dockerfile
FROM ubuntu:16.04

RUN apt-get update && apt-get install -y supervisor

ADD build.sh /root/build.sh
RUN chmod +x /root/build.sh && /root/build.sh

RUN apt-get clean

COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf

CMD ["/usr/bin/supervisord"]

The docker-compose.yaml orchestrates the deployment of the container, including port mappings.

##docker-compose.yaml
version: '3'
services:
  multidemo:
    build: .
    ports:
     - "8087:80"
     - "8097:8899"

Running the container should publish terminals running as separate services on different ports (host ports 8087 and 8097.

Seven Ways of Making Use of SQLite

SQLite is a really handy file based database engine. When you put data into a database, it can make it easier to search; it also provides a great tool for looking for stories or story leads hidden in the data. So here are seven ways of getting started with SQLite.

Querying SQLite Databases Using Rich User Interfaces

Whilst SQLite comes with it’s own command line client, many people will find the easiest way of getting started with querying SQLite databases is to use an application, either on the desktop or accessed via a browser based UI.

franchise is a browser based UI that you can access via the cloud or run locally (

(https://github.com/hvf/franchise)). If you have a SQLite database file (make sure the suffix is .sql) you can upload it and explore it using the franchise application. (If you have a CSV or Excel data file, you can upload that too and it will add it to its own temporary SQLite database). Here’s a review: Asking Questions of CSV Data, Using SQL In the Browser, With Franchise.

If you prefer something on the desktop, the cross-platform sqlitebrowser might suit your needs.

Another great way of making use of SQLite is bring it alive using datasette. A single command allows you to publish an interactive, browser based user interface to one or more databases, either on your own computer or via an online host such as Zeit Now, or Heroku. For example, I’ve popped up three databases I scrape together on Heroku and pointed my own URL at them (unfortunately, I tend to run out of free Heroku credits in the last week of the month at which point the site goes down!)

datasette allows you to query the databases through a customisable webpage and supports a range of plugins. For example, the datasette-cluster-map will detect latitude and longitude columns in a datatable and present the results using an interactive map. I gave it a spin with this map of UK Food Hygiene Ratings. You can find several other examples of datasettes published in the wild on the datasette wiki.

Finding Data: SQLite Databases in the Wild

Whilst tools such as datasette are great for quickly getting started with querying a SQLite database, one obvious question that arises is: what SQLite database?

Once you start poking around, however, you can start to find examples of SQLite databases working quietly behind the scenes on you own computer. (Searching your computer for files with a .sqlite suffix is one way of finding them!)

As a specific example, the Safari, Chrome and Firefox web browsers all keep track of your browser history using a SQLite database on your computer (this gist – dropmeaword/browser_history.md – tells you where you can find the files. You can then simply query them using datasette. On a Mac, I can simply run:

datasette ~/Library/Application\ Support/Google/Chrome/Default/History

and I can then start to query my browser history using the datasette browser based UI. Here’s an ‘inspect your browser history’ tutorial to get you started…

Ingesting Data

Applications such as franchise allow you to upload a CSV or Excel datafile and automatically import it into a SQLite database so that it can be queried using SQLite. The datasette ecosystem also includes and application for uploading CSV files and ingesting them into a SQlite database: datasette publish.

Behind the scenes of that application is a Python command line utility called csvs-to-sqlite. A simple command lets yoiu convert a CSV file to a SQLite database:

csvs-to-sqlite myfile.csv mydatabase.db

Whilst csvs-to-sqlite focusses on the conversion of CSV files into a SQLite database, the more general sqlitebiter command line utility can convert CSV, Excel, HTML tables (eg from a URL), JSON, Jupyter Notebooks, Markdown, TSV and Google-Sheets to a SQLite database file using a similar command format.

Using SQLite from the Commandline

Natively, SQLite comes with its own command line shell that allows you to connect to and query a SQLite database from the command line.

As well as command line tools for converting data contained in various file formats into a SQLite database, several command line tools embed that conversion within a command line interface that allows you convert a data file to an in-memory SQLite database and query it directly.

For example, using the csvsql command from csvkit:

csvsql --query "select * from iris as i join irismeta as m on (i.species = m.species)" examples/iris.csv examples/irismeta.csv

Or textql:

Or the simply named q:

q -H "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"

Querying SQLite Databases from Programming Languages Such as Python and R

If you are developing your own data-centric reproducible research pipelines, it’s likely that you will be using a programming language such as R or the Python pandas library.

There are two main ways for using SQLite in this context. The first is to connect to the database from the programming language and then query it from within the language. For example, in R, you might use the RSQLite package. In Python, you can connect to a SQLite database using the base sqlite3 package.

The other approach is to use SQLite as an in-memory database that provides a SQL query interface to a dataframe. In R, this can be achieved using the sqldf package:

library(sqldf)
sqldf("select * from iris limit 5")

In Python/pandas, you can use the pandasql package:

from pandasql import sqldf, load_births
pysqldf = lambda q: sqldf(q, globals())
births = load_births()
print(pysqldf("SELECT * FROM births LIMIT 10;").head())

In many respects, sqldf and pandasql behave like programming language native versions of command-line utilities such as csvsql, textql and q, although rather than importing a data file into a SQLite database so that it can be queried, they import the contents of the referenced dataframe.

pandas also provides native support for adding dataframes as tables to a connected SQLite database, as well as reading results from queries onto the database back into a dataframe. Once you start getting into the swing of putting data into a database, and then running JOINed queries over multiple tables, you’ll start to wonder why you spent so much time getting hassled by VLOOKUP. As an example, here’s a way of Making a Simple Database to Act as a Lookup for the ONS Register of Geographic Codes.

Using SQLite in Your Own Applications

If you are comfortable with using programming code to manipulate your data, then you may want to explore ways of using SQLite to create your own data driven applications.

One way yo start is to use SQLite completely within the browser. Accessing desktop applications from a webpage is typically a no-no because of browser security restrictions, but SQLite is quite a light application, so it can – and has been – compiled to Javascript so that it can be imported as a Javascript library and run from within a webpage: sql.js. You can see an example of how it can be used to provide a simple browser based, SQLite powered data explorer, running solely within a browser here: official demo or SQLite Viewer.

As well as running SQLite in a browser, SQLite can also be used to power an API. One of the advantages of running a datasette service is that it also exposes a datasette API. This means you can publish your datasette to a web host then other computers can querying it programmatically.

If you are working in a Python Jupyter environment, it’s simple enough to use the Jupyer kernel gateway to create your own APIs. Here’s an example of building a service to allow the lookup of ONS codes from a simple SQLite database: Building a JSON API Using Jupyter Notebooks in Under 5 Minutes.

Another way of using SQLite databases in a Jupyter environment is to use Scripted Forms to For example, here’s one of my own recipes for Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter that shows how to create a simple interactive form for querying a SQLite database containing descriptions of images used in OpenLearn courses.

SQLite Database Admin Tools

As well as providing a simple explorer and query interface, the sqlitebrowser tool also supports a range of other SQLite database administration functions such as the ability to create, define, modify and delete tables and indexes, or edit, add and delete individual records.

The browser based sqlite-web application provides a similar range of utulities via a browser based, rather than desktop client, UI.

Summary

SQLite is lightweight, in-memory and file based database that allows you to run SQL queries over a wide range of tabular datasets. If you work with data, knowing how to write even simple SQL queries can add a powerful tool to your toolbox. SQLite, and the associated tools created around it, is almost certainly one of the easiest ways in to using this most versatile, portable, and personal data power tool.

PS Here are some more ways of using SQLite:

Jigsaw Pieces – Linux Service Indicators, Jupyter Kernel Monitoring and Environment Management

Something I’ve been pondering for some time is how to set up some simple Linux service monitoring so that I can display an in indicator light in a web page to show whether a Linux service is running or not.

For example, in the TM351 VM, it could be handy to display some indicator lights in a Jupyter notebook status bar showing whether the database services we connect to from the notebooks are running correctly,

So here are some pieces that may contribute to that:

My thinking is:

  • use monit to monitor a process; if the process is down, write to a service status file in my www server directory, eg service_servicename_status.txt. If a service is running the contents of this file are 1, otherwise 0;
  • use the JQuery fragment to poll the status file every few seconds;
  • if the status file returns 0, display a red indicator, otherwise green.

Here are some other monitoring / environment managing fragments I’m pondering:

  • something like ps_mem, a Python utility *to accurately report the in core memory usage for a program*. I’m wondering if I could use that to track how much memory each Jupyter notebook python kernel is taking up (or maybe monit can do that?) There’s an old extnesion that looks like ti shows reports: nbtop. Or perhaps use psutil (via this issue, which seems to offer a solution?);
  • a minimal example of setting up notebook homepage tab for a hello world webpage; Writing a notebook server extension looks like it has the ingredients, and nb_conda provides a fuller working example. Actually, that extension looks useful for *Jupyter-as-a-learning-environment* because it lets you select different conda environments, which could be handy for running different activities.

Any other examples out there of Jupyter monitoring / environment management?

Interactive Authoring Environments for Reproducible Media: Stencila

One of the problems associated with keeping up with tech is that a lot of things that “make sense” are not the result of the introduction or availability of a new tool or application in and of itself, but in the way that it might make a new combination of tools possible that support a complete end to end workflow or that can be used to reengineer (a large part of) an existing workflow.

In the OU, it’s probably fair to say that the document workflow associated with creating course materials has its issues. I’m still keen to explore how a Jupyter notebook or Rmd workflow would work, particularly if the authored documents included recipes for embedded media objects such as diagrams, items retrieved from a third party API, or rendered from a source representation or recipe.

One “obvious” problem is that the Jupyter notebook or RStudio Rmd editor is “too hard” to work with (that is, it’s not Word).

A few days ago I saw a tweet mentioning the use of Stencila with Binderhub. Stencila? Apparently, *”[a]n open source office suite for reproducible research”. From the blurb:

[T]oday’s tools for reproducible research can be intimidating – especially if you’re not a coder. Stencila make reproducible research more accessible with the intuitive word processor and spreadsheet interfaces that you and your colleagues are already used to.

That sounds appropriate… It’s available as a desktop app, but courtesy of minrk/jupyter-dar (I think?), it runs on binderhub and can be accessed via a browser too:

 

You can try it here.

As with Jupyter notebooks, you can edit and run code cells, as well as authoring text. But the UI is smoother than in Jupyter notebooks.

(This is one of the things I don’t understand about colleagues’ attitude towards emerging tech projects: they look at today’s UX and think that’s it, because that’s how it is inside an organisation – you take what you’re given and it stays the same for decades. In a living project, stuff tends to get better if it’s being used and there are issues with it…)

The Jupyter-Dar strapline pitches “Jupyter + DAR compatibility exploration for running Stencila on binder”. Hmm. DAR? That’s also new to me:

Dar stands for (Reproducible) Document Archive and specifies a virtual file format that holds multiple digital documents, complete with images and other assets. A Dar consists of a manifest file (manifest.xml) that describes the contents.

Dar is being designed for storing reproducible research publications, but the underlying concepts are suitable for any kind of digital publications that can be bundled together with their assets.

Repo: [substance/dar](https://github.com/substance/dar)

Sounds interesting. And which reminds me: how’s OpenCreate coming along, I wonder? (My permissions appear to have been revoked again; or the URL has changed.)

PS seems like there’s more activity in the “pure web” notebook application world. Hot on the heels of Mike Bostock’s Observable notebooks (rationale) comes iodide, “[a] frictionless portable notebook-style interface for literate scientific computing in the browser” (examples).

I don’t know if these things just require you to use Javascript, or whether they can also embed things like Brython.

I’m not sure I fully get the js/browser notebooks yet? I like the richer extensibility of things like Jupyter in terms of arbitrary language/kernel availability, though I suppose the web notebooks might be able to hook into other kernels using similar mechanics to those used by things like Thebelab?

I guess one advantage is that you can do stuff on a Chromebook, and without a network connection if you cache all the required JS packages locally? Although with new ChromeOS offering support for Linux – and hence, Docker containers – natively, Chromebooks could get a whole lot more exciting over the next few months. From what I can tell, corsvm looks like a ChromeOS native equivalent to something like Virtualbox (with an equivalent of Guest Additions?). It’ll be interesting how well things like audio works? Reports suggest that graphical UIs will work, presumably using some sort of native X11 support rather than noVNC, so now could be a good time to start looking out for souped up Pixelbook…