Running Executable Jupyter/IPython Notebooks Directly from Github With Binder

It’s taken me way too long to get round to posting this, but it’s a compelling idea that I think more notice should be taken of… binder ([code]).

The idea is quite simple – specify a public github project (username/repo) that contains one or more Jupyter (IPython) notebooks, hit “go”, and the service will automatically create a docker container image that includes a Jupyter notebook server and a copy of the files contained in the repository.


(Note that you can specify any public Github repository – it doesn’t have to be one you have control over at all.)

Once the container image is created, visiting mybinder.org/repo/gitusername/gitrepo will launch a new container based on that image and redirect you to a Jupyter notebook interface running inside it. Any Jupyter notebooks contained within the original repository can then be opened, edited and executed as an active notebook document.

What this means is we could pop a set of course related notebooks into a repository, and share a link to mybinder.org/repo/gitusername/gitrepo. Whenever the link is visited, a container is fired up from the image and the user is redirected to that container. If I go to the URL again, another container is fired up. Within the container, a Jupyter notebook server is running, which means you can access the notebooks that were hosted in the Github repo as interactive, “live” (that is, executable) notebooks.
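By way of a quick sketch, here's how a set of course links might be generated from the mybinder.org/repo/gitusername/gitrepo pattern described above. The function name, the http:// scheme and the example repository names are my own placeholders for illustration, not anything defined by binder itself.

```python
def binder_launch_url(username: str, repo: str,
                      base: str = "http://mybinder.org") -> str:
    """Build a launch link of the form base/repo/username/repo.

    The path pattern is the one quoted in this post; the function name
    and the default scheme are assumptions made for illustration.
    """
    # Tidy up stray whitespace and slashes in the two path components.
    username = username.strip().strip("/")
    repo = repo.strip().strip("/")
    if not username or not repo:
        raise ValueError("both a Github username and a repository name are needed")
    return f"{base.rstrip('/')}/repo/{username}/{repo}"


# Hypothetical course repositories, as (username, repo) pairs.
course_repos = [("gitusername", "gitrepo")]

for user, repo in course_repos:
    print(binder_launch_url(user, repo))
```

Each link printed this way could be dropped into course materials; every visit to it should fire up a fresh container, as described above.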

Alternatively, a user could fork the original repository on Github, create a container image based on their own copy of the repository, and then launch live notebooks from that copy.

I’m still trying to find out what’s exactly going on under the covers of the binder service. In particular, a couple of questions came immediately to mind:

  • how long do containers persist? For example, at the moment we’re running a FutureLearn course (Learn to Code for Data Analysis) that makes use of IPython/Jupyter notebooks (https://www.futurelearn.com/courses/learn-to-code), but it requires learners to install Anaconda (which has caused a few issues). The course lasts 4 weeks, with learners studying a couple of hours a day maybe two days a week. Presumably, the binder containers are destroyed as a matter of course according to some schedule or rule – but what rule? I guess learners could always save and download their notebooks to the desktop and then upload them to a running server, but it would be more convenient if they could bookmark their container and return to it over the life of the course. (So for example, if FutureLearn was operating a binder service, joining the course could provide authenticated access to a container at http://www.futurelearn.com/courses/learn-to-code/USERID/notebook for the duration of the course, and maybe a week or two after? Following ResBaz Cloud – Containerised Research Apps as a Service, it might also allow for a user to export a copy of their container?)
  • how does the system scale? The FutureLearn course has several thousand students registered to it. To use the binder approach towards providing any student who wants one with a web-accessible, containerised version of the notebook application so they don’t have to install one of their own, how easily would it scale? eg how easy is it to give a credit card to some back-end hosting company, get some keys, plug them in as binder settings and just expect it to work? (You can probably guess at my level of devops/sysadmin ability/knowledge!;-)

Along with those immediate questions, a handful of more roadmap style questions also came to mind:

  • how easy would it be to set up the Jupyter notebook system to use an alternative kernel? e.g. to support a Ruby or R course? (I notice that tmpnb.org offers a variety of kernels, for example?)
  • how easy would it be to provide alternative services to the binder model? eg something like RStudio, for example, or OpenRefine? I notice that the binder repository initialisation allows you to declare the presence of a custom Dockerfile within the repo that can be used to fire up the container – so maybe binder is not so far off a general purpose docker-container-from-online-Dockerfile launcher? Which could be really handy?
  • does binder make use of Docker Compose to tie multiple applications together, as for example in the way it allows you to link in a Postgres server? How extensible is this? Could linkages of a similar form to arbitrary applications be configured via a custom Dockerfile?
  • is closer integration with github on the way? For example, if a user logged in to binder with github credentials, could files then be saved or synched back from the notebook to that user’s corresponding repository?

Whatever – will be interesting to see what other universities may do with this, if anything…

See also Seven Ways of Running IPython Notebooks and ResBaz Cloud – Containerised Research Apps as a Service.

PS I just noticed an interesting looking post from @KinLane on API business models: I Have A Bunch Of API Resources, Now I Need A Plan, Or Potentially Several Plans. This has got me wondering: what sort of business plan might support a “Studyapp” – applications on demand, as a service – form of hosting?

Several FutureLearn courses, for all their web first rhetoric, require learners to install software onto their own computers. (From what I can tell, FutureLearn aren’t interested in helping “partners” do anything that takes eyeballs away from FutureLearn.com, so I don’t understand why they seem reluctant to explore ways of using tech to provide interactive experiences within the FutureLearn context, like using embedded IPython notebooks, for example. Trying to innovate around workflow is also a joke.) And IMVHO, the lack of innovation foresight within the OU itself (FutureLearn’s parent…) seems just as bad at the moment… As I’ve commented elsewhere, “[m]y attitude is that folk will increasingly have access to the web, but not necessarily access to a computer onto which they can install software applications. … IMHO, we are now in a position where we can offer students access to “computer lab” machines, variously flavoured, that can run either on a student’s own machine (if it can cope with it) or remotely (and then either on OU mediated services or via a commercial third party on which students independently run the software). But the lack of imagination and support for trying to innovate in our production processes and delivery models means it might make more sense to look to working with third parties to try to find ways of (self-)supporting our students.” (See also: What Happens When “Computers” Are Replaced by Tablets and Phones?) But I’m not sure anyone else agrees… (So maybe I’m just wrong!;-)

That said, it’s got me properly wondering – what would it take for me to set up a service that provides access to MOOC or university course software, as a service, at least, for uncustomised, open source software, accessible via a browser? And would anybody pay to cover the server costs? How about if web hosting and a domain was bundled up with it, that could also be used to store copies of the software based activities once the course had finished? A “personal, persistent, customised, computer lab machine”, essentially?

Possibly related to this thought, Jim Groom’s reflections on The Indie EdTech Movement, although I’m thinking more of educators doing the institution stuff for themselves as a way of helping the students-do-it-for-themselves. (Which in turn reminds me of this hack around the idea of THEY STOLE OUR REVOLUTION LEARNING ENVIRONMENT. NOW WE’RE STEALING IT BACK !)

PS see also this by C. Titus Brown on Is mybinder 95% of the way to next-gen computational science publishing, or only 90%?

3 comments

  1. Helen

    I’m actually on the Future Learn course though I haven’t got so far with it as I’m on a Chromebook and not brave enough to experiment with installing the Python back end at the moment. I think you would have to really consider how many of your students would get as far as getting going on a MOOC with Jupyter/Python notebooks if they weren’t interested in tech / coding. I’m mainly looking into all of this as I want to get into the field!

    • Tony Hirst

      Helen – You won’t be able to run the notebooks directly on a Chromebook, but there are ways of running notebooks in the cloud – as per the example described in this post, but also using things like Wakari.com or tmpnb.org.
