Feeding a MyBinder Container Built From One Github Repository With the Contents of Another

Long-time readers should be well aware by now of MyBinder, the Jupyter project service that will build a Docker image from the contents of a git repository and then launch a container based on that image so you can work with a live, running, albeit temporary, instance of it.

But that’s not all it can do…

Via Chris Holdgraf on the Jupyter discourse community site (Tip: embed custom github content in a Binder link with nbgitpuller), comes a magical trick whereby you can launch a MyBinder instance built from one repository and populate it with files from another.

Why’s this useful? Well, if you’ve had a play with your own repos using MyBinder, you’ll know that each time you make a change to a repository, MyBinder will want to rebuild the Docker image next time you try to launch the repo there.

So if your repo defines a complex build that takes some time to install all of its dependencies, you have to wait for that build even if all you did was correct a typo in the markdown of a notebook file.

So here’s the trick…

nbgitpuller is a Jupyter server extension that supports the “one-way synchronization of a remote git repository to a local git repository”.

There are other approaches to git syncing too. See the next edition of Tracking Jupyter to find out what they are…

Originally developed as a tool to help distribute notebooks to students, nbgitpuller can be called via a Jupyter server URL. For example, if you have nbgitpuller installed in a local Jupyter server running on the default port 8888, a URL of the following form will pull the contents of the specified repo into the base directory the notebook server points to:

localhost:8888/git-pull?repo=https://github.com/USER/NOTEBOOK_REPO
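If you want to generate links like that programmatically, something along the following lines should do it. This is a minimal sketch rather than anything taken from nbgitpuller itself: the function name local_git_pull_url is made up for illustration, the server address is the assumed local default, and the optional branch argument relies on nbgitpuller's branch query parameter.

```python
from urllib.parse import urlencode


def local_git_pull_url(repo_url, server="http://localhost:8888", branch=None):
    """Build an nbgitpuller git-pull URL for a running Jupyter server.

    Hypothetical helper: `server` is assumed to be a Jupyter server with
    nbgitpuller installed; `branch` is only added if one is given.
    """
    params = {"repo": repo_url}
    if branch:
        params["branch"] = branch
    # urlencode percent-encodes the repo URL so it is a single query value
    return f"{server.rstrip('/')}/git-pull?{urlencode(params)}"


print(local_git_pull_url("https://github.com/USER/NOTEBOOK_REPO"))
```

The repo URL comes out percent-encoded, which is a more defensive form of the link shown above.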

One of the neat things about Binderhub / MyBinder is that you can pass a git-pull? argument through as part of a MyBinder launch URL, so if the repo you want to build from installs and enables nbgitpuller, you can then pull notebooks into the launched container from a second, nbgitpulled repository.

For example, yesterday I came across the Python show_ast package, and its incorporated IPython magic, which will render the abstract syntax tree of a Python command:

Such a thing may be useful in an introductory programming course (TBH, I’m never really sure what people try to teach in introductory programming courses, what the useful mental models are, how best to help folk learn them, and how to figure out how to teach them…).

As with most Python-based repos, particularly ones that contain Jupyter notebooks (that is, .ipynb files [….thinks… ooh… I has a plan to try s/thing else too….]) I generally try to “run” them via MyBinder. In this case, the repo didn’t work because it depends on the Linux graphviz apt package and the Python graphviz package, which weren’t installed in the image MyBinder built.

At this point, I’d generally fork the repo, create a binderise branch containing the dependencies, then try that out on MyBinder, sometimes adding an issue and/or making a pull request to the original repository suggesting they Binderise it…

…but nbgitpuller provides a different opportunity. Suppose I create a base container that contains the Graphviz Linux application and the graphviz Python package. Something like this: ouseful-testing/binder-graphviz.
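As a rough idea of what such a base repo might hold, here is a sketch that writes out the two configuration files MyBinder's build tool (repo2docker) looks for: apt.txt for Linux packages and requirements.txt for Python ones. The file contents are my assumption of a minimal setup, not a copy of what is actually in ouseful-testing/binder-graphviz, and the helper name write_binder_env is invented for the example. Older nbgitpuller releases may also need the server extension enabling in a postBuild step, which is not shown.

```python
from pathlib import Path

# Assumed minimal contents; the real binder-graphviz repo may differ.
BINDER_FILES = {
    "apt.txt": "graphviz\n",                        # Graphviz Linux application
    "requirements.txt": "graphviz\nnbgitpuller\n",  # Python package, plus the puller
}


def write_binder_env(target_dir):
    """Write the Binder config files into target_dir and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in BINDER_FILES.items():
        path = target / name
        path.write_text(content)
        written.append(path)
    return written
```

Calling write_binder_env("binder-graphviz") would give you a directory you could commit as the base environment repo.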

Then I can create a MyBinder session from that repo and pull in the show_ast package from its repo and run the notebook directly:

https://mybinder.org/v2/gh/ouseful-testing/binder-graphviz/master/?urlpath=git-pull?repo=https://github.com/hchasestevens/show_ast
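The same link can be put together in code. The sketch below is one way of doing it, under a few assumptions: the function name binder_git_pull_url is hypothetical, and I percent-encode the nested git-pull query so that it travels as a single urlpath value. The unencoded link above evidently works too, so treat the encoding as a defensive choice rather than a requirement.

```python
from urllib.parse import quote, urlencode


def binder_git_pull_url(env_repo, content_repo_url, env_ref="master",
                        binder="https://mybinder.org"):
    """Build a MyBinder launch URL that pulls a second repo via nbgitpuller.

    env_repo is the GitHub USER/REPO that defines the environment (and must
    install nbgitpuller); content_repo_url is the repo to pull in.
    """
    # Inner query: what nbgitpuller sees once the container is running
    inner = "git-pull?" + urlencode({"repo": content_repo_url})
    # Outer query: encode the whole inner path so '?' and '=' survive intact
    return f"{binder}/v2/gh/{env_repo}/{env_ref}?urlpath={quote(inner, safe='')}"


print(binder_git_pull_url("ouseful-testing/binder-graphviz",
                          "https://github.com/hchasestevens/show_ast"))
# https://mybinder.org/v2/gh/ouseful-testing/binder-graphviz/master?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252Fhchasestevens%252Fshow_ast
```

Note that the content repo URL ends up encoded twice, once for nbgitpuller and once for MyBinder, which is why the %25 sequences appear.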

Fortuitously, things work seamlessly in this case because the example notebook lives in a directory from which we can import show_ast without the need to install it (otherwise we’d have needed to run pip install . at the top level of the repo). In general, where notebooks are kept in a notebooks or docs directory, for example, the path to import the package would break. (Hmmm… I need to think about protocols for handling that… It’s better practice to put the notebooks in a directory of their own, but that means we need to install the package or change the import path to it, which is one more step for folk to stumble over…)
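One possible way of changing the import path, sketched here under the assumption that the notebooks live in a directory one level below the top of the repo and that the package sits at the top level, is to push the repo root onto sys.path in the first notebook cell. The helper name add_package_root is made up for the example; running pip install . instead would avoid the need for it.

```python
import sys
from pathlib import Path


def add_package_root(notebook_dir=".", levels_up=1):
    """Put the directory `levels_up` above notebook_dir at the front of sys.path.

    Assumes the importable package lives in that ancestor directory.
    """
    root = Path(notebook_dir).resolve()
    for _ in range(levels_up):
        root = root.parent
    root_str = str(root)
    # Only insert once, so re-running the cell does not grow sys.path
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return root_str
```

From a notebook in a notebooks directory, add_package_root() followed by the package import should then find a package kept at the top level of the pulled repo.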

Thinking about my old show’n’tell repo, the branches of which scruffily define various Binder environments suited to particular topic areas (environments for working on chemistry notebooks, for example, or astronomy notebooks, or classical language or music notebooks) and also contain demo notebooks, I could instead just define a set of base Binder environment containers, slow to build but built infrequently, and then lighter weight notebook repos containing just demo notebooks for a particular topic area. These could then be quickly and easily updated, and run on MyBinder having been nbgitpulled by a base container, without having to rebuild the base container each time I update a notebook in a notebook repo.

A couple of other things to note here. First, nbgitpuller has its own helper for creating nbgitpuller URLs, the nbgitpuller link generator:

It’s not hard to imagine a similar UI, or another tab to that UI, that can build a MyBinder link from a “standard” base container selected from a dropdown menu (or an optional link to a git repo) and then a provided git repo link for the target content repo.

Second, this has got me thinking about how we (don’t) handle notebook distribution very well in the OU.

For our TM351 internal course, we control the students’ computing environment via a VM we provide them with, so we could install nbgitpuller in it, but the notebooks are stored in a private Github repo and we don’t want to give students any keys to it at all. (For some reason, I seem to be the only person who doesn’t have a problem with the notebooks being in a public repo!;-)

For our public, notebook-utilising courses on FutureLearn or OpenLearn, the notebooks are in a public repo, but we don’t have control of the learners’ computing environments (which is to say, we can’t preinstall nbgitpuller and can’t guarantee that learners will have the permissions or network access to install it themselves).

It’s almost as if various pieces keep appearing, but the jigsaw never quite seems to fit together…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
