R HTMLWidgets in Jupyter Notebooks

A quick note for displaying R htmlwidgets in Jupyter notebooks without requiring pandoc – there may be a more native way but this acts as a workaround in the meantime if not:

library(htmlwidgets)
library(IRdisplay)

library(leaflet)
m = leaflet() %>% addTiles()
saveWidget(m, 'demo.html', selfcontained = FALSE)
display_html('<iframe src="demo.html"></iframe>')

PS and from the other side, using reticulate for Python powered Shiny apps.

Generative Assessment Creation

It’s coming round to that time of year where we have to create the assessment material for courses with an October start date. In many cases, we reuse question forms from previous presentations but change the specific details. If a question is suitably defined, then large parts of this process could be automated.

In the OU, automated question / answer option randomisation is used to provide iCMAs (interactive computer marked assessments) via the student VLE using OpenMark. As well as purely text based questions, questions can include tables or images as part of the question.

One way of supporting such question types is to manually create a set of answer options, perhaps with linked media assets, and then allow randomisation of them.

Another way is to define the question in a generative way so that the correct and incorrect answers are automatically generated.(This seems to be one of those use cases for why ‘everyone should learn to code’;-)

Pinching screenshots from an (old?) OpenMark tutorial, we can see how a dynamically generated question might be defined. For example, create a set of variables:

and then generate a templated question, and student feedback generator, around them:

Packages also exist for creating generative questions/answers more generally. For example, the R exams package allows you to define question/answer templates in Rmd and then generate questions and solutions in a variety of output document formats.


You can also write templates that include the creation of graphical assets such as charts:

 

Via my feeds over the weekend, I noticed that this package now also supports the creation of more general diagrams created from a TikZ diagram template. For example, logic diagrams:

Or automata diagrams:

(You can see more exam templates here: www.r-exams.org/templates.)

As I’m still on a “we can do everything in Jupyter” kick, one of the things I’ve explored is various IPython/notebook magics that support diagram creation. At the moment, these are just generic magics that allow you to write TikZ diagrams, for example, that make use of various TikZ packages:

One the to do list is to create some example magics that template different question types.

I’m not sure if OpenCreate is following a similar model? (I seem to have lost access permissions again…)

FWIW, I’ve also started looking at my show’n’tell notebooks again, trying to get them working in Azure notebooks. (OU staff should be able to log in to noteooks.azure.com using OUCU@open.ac.uk credentials.) For the moment, I’m depositing them at https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted, although some tidying may happen at some point. There are also several more basic demo notebooks I need to put together (e.g. on creating charts and using interactive widgets, digital humanities demos, R demos and (if they work!) polyglot R and python notebook demos, etc.). To use the notebooks interactively, log in and clone the library into your own user space.

First Class R Support in Binder / Binderhub – Shiny Apps As Well as R-Kernels and RStudio

I notice from the binder-examples/r repo that Binderhub now appears to offer all sorts of R goodness out of the can, if you specify a particular R build.

From the same repo root, you can get:

And from previously, here’s a workaround for displaying R/HTMLwidgets in a Jupyter notebook.

OpenRefine is also available from a simple URL – https://mybinder.org/v2/gh/betatim/openrefineder/master?urlpath=openrefine – courtesy of betatim/openrefineder:

Perhaps it’s time for me to try to get my head round what the Jupyter notebook proxy handlers are doing…

PS see also Scripted Forms for a simple markdown script way of building interactive Jupyter widget powered UIs.

Embedded Audio Players in Jupyter Notebooks Running IRKernel

For ref, when running IRkernel Jupyter R notebooks, media objects can be embedded by making use of the shiny::tags function, that can return HTML elements with appropriate MIME types, and are renderable using _repr_html machinery (h/t @flying-sheep):

For example:

PS By the by, I notice the existence of another R kernel for Jupyter notebooks, JuniperKernel. An advantage of this over IRkernel is that it supports xwidgets, a C++ widget implementation for Jupyter notebooks akin to ipywidgets. (As far as I know, there isn’t a widget implementation for IRkernel, although maybe something can be finessed using shiny::tags and other bits of shiny machinery? ) Although I could install JuniperKernel on my Mac, I had some issues getting it to work that I haven’t had a chance to explore yet, so I don’t yet have a widgets demo…

PS having to save an audio file to then load back into the player may be a faff; that said, it looks like there may be a route to using a data URI? Not tried this yet; if you have a short reproducible demo that works, please share via the comments:-)

Fragment – Running Multiple Services, such as Jupyter Notebooks and a Postgres Database, in a Single Docker Container

Over the last couple of days, I’ve been fettling the build scripts for the TM351 VM, which typically uses vagrant to build a VirtualBox VM from a set of shell scripts, so they can be used to build a single Docker container that runs all the TM351 services, specifically Jupyter notebooks, OpenRefine, PostgreSQL and MongoDB.

Docker containers are typically constructed to a run a single service, with compositions of containers wired together using Docker Compose to create applications that deliver, or rely on, more than one running service. For example, in a previous post (Setting up a Containerised Desktop API server (MySQL + Apache / PHP 5) for the ergast Motor Racing Data API) I showed how to set up a couple of containers to work together, one running a MySQL database server, the other an http service that provided an API to the database.

So how to run multiple services in the same container? Docs on the Docker website suggest using supervisord to run multiple services in a single container, so here’s a fragment on how I’ve done that from my TM351 build.

To begin with, I’ve built the container up as a tiered set of containers, in a similar way to the way the stack of opinionated Jupyter notebook Docker containers are constructed:

#Define a stub to identify the images in this image stack
IMAGESTUB=psychemedia/tm361testm

# minimal
## Define a minimal container, eg a basic Linux container
## using whatever flavour of Linux we prefer
docker build --rm -t ${IMAGESTUB}-minimal-test ./minimal

# base
## The base container installs core packages
## The intention is to define a common build environment
## populated with packages likely to be common to many courses
docker build --rm --build-arg BASE=${IMAGESTUB}-minimal-test -t ${IMAGESTUB}-base-test ./base

#...

One of the things I’ve done to try to generalise the build steps is allow the name a base container to be used to bootstrap a new one by passing the name of the base image in via an optional variable (in the above case, --build-arg BASE=${IMAGESTUB}-minimal-test). Each Dockerfile in a build step directory uses the following construction to work out which image to use as the FROM basis:

#Set ARG values using --build-arg =
#Each ARG value can also have a default value
ARG BASE=psychemedia/ou-tm351-base-test
FROM ${BASE}

Using the same approach, I have used separate build tiers for the following components:

  • jupyter base: minimal Jupyter notebook install;
  • jupyter custom: add some customisation onto a pre-existing Jupyter notebook install;
  • openrefine: add the OpenRefine application; (note, we could just use BASE=ubuntu to create this a simple, standalone OpenRefine container);
  • postgres: create a seeded PostgreSQL database; note, this could be split into two: a base postgres tier and then a customisation that adds users, creates and seed databases etc;
  • mongodb: add in a seeded mongo database; again, the seeding could be added as an extra tier on a minimal database tier;
  • topup: a tier to add in anything I’ve missed without having to go back to rebuild from an earlier step…

The intention behind splitting out these tiers is that we might want to have a battle hardened OU postgres tier, for example, that could be shared between different courses. Alternatively, we might want to have tiers offering customisations for specific presentations of a course, whilst reusing several other fixed tiers intended to last out the life of the course.

By the by, it can be quite handy to poke inside an image once you’ve created it to check that everything is in the right place:

#Explore inside animage by entering it with a shell command
docker run -it --entrypoint=/bin/bash psychemedia/ou-tm351-jupyter-base-test -i

Once the services are in place, I add a final layer to the container that ensures supervisord is available and set up with an appropriate supervisord.conf configuration file:

##Dockerfile
#Final tier Dockerfile
ARG BASE=psychemedia/testpieces
FROM ${BASE}

USER root
RUN apt-get update && apt-get install -y supervisor

RUN mkdir -p /openrefine_projects  && chown oustudent:100 /openrefine_projects
VOLUME /openrefine_projects

RUN mkdir -p /notebooks  && chown oustudent:100 /notebooks
VOLUME /notebooks

RUN mkdir -p /var/log/supervisor
COPY monolithic_container_supervisord.conf /etc/supervisor/conf.d/supervisord.conf

EXPOSE 3334
EXPOSE 8888

CMD ["/usr/bin/supervisord"]

The supervisord.conf file is defined as follows:

##supervisord.conf
##We can check running processes under supervisord with: supervisorctl

[supervisord]
nodaemon=true
logfile=/dev/stdout
loglevel=trace
logfile_maxbytes=0
#The HOME envt needs setting to the correct USER
#otherwise jupyter throws: [Errno 13] Permission denied: '/root/.local'
#https://github.com/jupyter/notebook/issues/1719
environment=HOME=/home/oustudent

[program:jupyternotebook]
#Note the auth is a bit ropey on this atm!
command=/usr/local/bin/jupyter notebook --port=8888 --ip=0.0.0.0 --y --log-level=WARN --no-browser --allow-root --NotebookApp.password= --NotebookApp.token=
#The directory we want to start in
#(replaces jupyter notebook parameter: --notebook-dir=/notebooks)
directory=/notebooks
autostart=true
autorestart=true
startsecs=5
user=oustudent
stdout_logfile=NONE
stderr_logfile=NONE

[program:postgresql]
command=/usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
user=postgres
autostart=true
autorestart=true
startsecs=5

[program:mongodb]
command=/usr/bin/mongod --dbpath=/var/lib/mongodb --port=27351
user=mongodb
autostart=true
autorestart=true
startsecs=5

[program:openrefine]
command=/opt/openrefine-3.0-beta/refine -p 3334 -i 0.0.0.0 -d /vagrant/openrefine_projects
user=oustudent
autostart=true
autorestart=true
startsecs=5
stdout_logfile=NONE
stderr_logfile=NONE

One thing I need to do better is to find a way to stage the construction of the supervisord.conf file, bearing in mind that multiple tiers may relate to the same servicel for example, I have a jupyter-base tier to create a minimal Jupyter notebook server and then a jupyter-base-custom tier that adds in specific customisations, such as branding and course related notebook extensions.

When the final container is built, the supervisord command is run and the multiple services started.

One other thing to note: we’re hoping to run TM351 environments on an internal OpenStack cluster. The current cluster only allows students to expose a single port, and port 80 at that, from the VM (IP addresses are in scant supply, and network security lockdowns are in place all over the place). The current VM exposes at least two http services: Jupyter notebooks and OpenRefine, so we need a proxy in place if we are to expose them both via a single port. Helpfully, the nbserverproxy Jupyter extension (as described in Exposing Multiple Services Via a Single http Port Using Jupyter nbserverproxy), allows us to do just that. One thing to note, though – I had to enable it via the same user that launches the notebook server in the suoervisord.conf settings:

##Dockerfile fragment

RUN $PIP install nbserverproxy

USER oustudent
RUN jupyter serverextension enable --py nbserverproxy
USER root

To run the VM, I can call something like:

docker run -p 8899:8888 -d psychemedia/tm351dockermonotest

and then to access the additional services, I can browse to e.g. localhost:8899/proxy/3334/ to see the OpenRefine application.

PS in case you’re wondering why I syndicated this through RBloggers too, the same recipe will work if you’re using Jupyter notebooks with an R kernel, rather than the default IPython one.

Looking Up R / CRAN Package Maintainers With an ac.uk Affiliation

Trying to find an examiner for a particular PhD thesis relating to a rather interesting datastructure for wrangling messy datatables, I wondered whether we might find a likely suspect amongst the R package maintainer community.

We can get a list of R package maintainers here and a list of package name / short descriptions here.

FWIW, here’s the code fragment:

import pandas as pd

maintainers = pd.read_html('https://cran.r-project.org/web/checks/check_summary_by_maintainer.html')[0]
maintainers_email = maintainers.dropna(subset=[0])
maintainers_email[maintainers_email[0].str.contains('.ac.uk')][[0,1]]

packages = pd.read_html('https://cran.r-project.org/web/packages/available_packages_by_name.html')[0]
packages

maintainers_email_acuk = maintainers_email[maintainers_email[0].str.contains('.ac.uk')][[0,1]]
maintainers_email_acuk.merge(packages,left_on=1,right_on=0)

See also: What Do you Mean You Write Code EVERY DAY?, examples of which I’ve just turned into a new blog category: WDYMYWCED.

Running R Projects in MyBinder – Dockerfile Creation With Holepunch

For those who don’t know it, MyBinder is a reproducible research automation tool that will take the contents of a Github repository, build a Docker container based on requirements files found inside the repo, and then present the user with a temporary, running container that can serve a Jupyter notebook, JupyterLab or RStudio environment to the user. All at the click of a button.

Although the primary, default, UI is the original Jupyter notebook interface, it is also possible to open a MyBinder environment into JupyterLab or, if the R packaging is install, RStudio.

For example, using the demo https://github.com/binder-examples/r repository, which contains a simple base R environment, with RStudio installed, we can use my Binder to launch RStudio running over the contents of that repository:

When we launch the binderised repo, we get — RStudio in the browser:

Part of the Binder magic is to install a set of required packages into the container, along with “content” documents (Jupyter notebooks, for example, or Rmd files), based on requirements identified in the repo. The build process is managed using a tool called repo2docker, and the way requirements / config files need to be defined can be found here.

To make building requirements files easier for R projects, the rather wonderful holepunch package will automatically parse the contents of an R project looking for package dependencies, and will then create a DESCRIPTION metadata file itemising the found R package dependencies. (holepunch can also be used to create install.R files.) Alongside it, a Dockerfile is created that references the DESCRIPTION file and allows Binderhub to build the container based on the project’s requirements.

For an example of how holepunch can be used in support of academic publishing, see this repo — rgayler/scorecal_CSCC_2019 — which contains the source documents for a recent presentation by Ross Gayler to the Credit Scoring & Credit Control XVI Conference. This repo contains the Rmd document required to generate the presentation PDF (via knitr) and Binder build files created by holepunch.

Clicking the repo’s MyBinder  button takes you, after a moment or two, to a running instance of RStudio, within which you can open, and edit, the presentation .Rmd file and knitr it to produce a presentation PDF.

In this particular case, the repository is also associated with a Zenodo DOI.

As well as launching Binderised repositories from the Github (or other repository) URL, MyBinder can also launch a container from a Zenodo DOI reference.

The screenshot actually uses the incorrect DOI…

For example, https://mybinder.org/v2/zenodo/10.5281/zenodo.3402938/?urlpath=rstudio.