Merging Several Binder Configurations

As more and more repositories start to incorporate MyBinder / repo2docker build specifications, more and more building blocks start to appear for how to get particular things running in MyBinder. For example, I have several ouseful-template-repos with various building blocks for getting different databases running in MyBinder, and occasionally require an environment that also loads in a Jupyter-server-proxied application, such as OpenRefine. Other times, I might want to pull in the config for a particular install, or merge configs someone else has developed to run different sets of notebooks in the same Binderised repo.

But: a problem arises if you want to combine multiple Binder specifications from various repos into a single Binder setup in a single repo – how do you do it?

One way might be for repo2docker to iterate through multiple build steps, one for each Binder specification. There may be clashes, of course, such as conflicting package versions from different specifications, but it would then fall to the user to try to resolve the issue. Which is fine, if Binder is making a best-effort attempt rather than guaranteeing to work.

Assuming that such a facility does not exist, adding it would require updates to repo2docker, so it's not something we can easily hack around ourselves. So how about something where we try to combine the contents of multiple binder/ setup directories ourselves? This is something we can start to do easily enough, and as a personal tool it doesn't necessarily have to work “properly” or “for everything”: for starters, it only has to work with what we want it to work with. And if it only gets 80% of the way to a working combined configuration, that's fine too.

So what would we need to do?

Simple list files like apt.txt and requirements.txt could simply be concatenated together, leaving it up to apt or pip to do whatever they do with any clashes in pinned package versions, for example (though we may want to report possible clashes, perhaps via a comment in the file, to help the user debug things).

In a shell script, something like the following would concatenate the apt.txt files found in directories binder_1, binder_2, etc.:

for i in binder_*
do
   # skip any directory that doesn't provide an apt.txt file
   [ -f "$i/apt.txt" ] || continue
   echo "" >> binder/apt.txt
   echo "# $i" >> binder/apt.txt
   cat "$i/apt.txt" >> binder/apt.txt
done

In Python, something like:

import os

with open('binder/requirements.txt', 'w') as outfile:
    for d in [d for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)]:
        # Skip directories that don't provide a requirements.txt file
        if not os.path.isfile(os.path.join(d, 'requirements.txt')):
            continue
        with open(os.path.join(d, 'requirements.txt')) as infile:
            outfile.write(f'\n# {d}\n')
            outfile.write(infile.read())

Merging environment.yml files is a little trickier — the structure within the file is hierarchical — but a package like hiyapyco can help us with that:

import os
import fnmatch

import hiyapyco

# Collect any environment.yml / .yaml files from the binder_* directories
_dirs = [d for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)]
_envs = [os.path.join(d, e) for d in _dirs
         for e in os.listdir(d) if fnmatch.fnmatch(e, '*.y*ml')]

merged = hiyapyco.load(_envs,
                       method=hiyapyco.METHOD_MERGE,
                       interpolate=True)

with open('binder/environment.yml', 'w') as f:
    f.write(hiyapyco.dump(merged))

There is an issue with environments where we have both environment.yml and requirements.txt files, because the environment.yml trumps the requirements.txt: the former will be processed but the latter won't. A workaround I have used in the past for installing from both is to call pip install against the requirements.txt file from a directive in the postBuild file.

I’ve also had to use a related trick to install a heavily depended-upon Python package explicitly via postBuild and then install from a renamed requirements.txt, also via postBuild: the pip installer installs packages in whatever order it wants, and doesn’t necessarily follow any order “specified” in the requirements.txt file. This means that on certain occasions a build can fail because one Python package relies on another that is specified in the requirements.txt file but hasn’t been installed yet.
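For example, a minimal postBuild sketch along these lines (the pre-installed package name and the renamed requirements file are illustrative assumptions, not values from any particular repo) would look something like:

#!/bin/bash
# postBuild: illustrative sketch only
# Install a package that other requirements depend on at install time first,
# so pip's arbitrary install ordering can't break the build...
pip install --no-cache-dir numpy

# ...then install everything else from a renamed requirements file; renaming it
# means repo2docker doesn't process it itself, leaving postBuild in control
pip install --no-cache-dir -r binder/requirements_pb.txt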

Another approach might be to grab any requirements from a (merged) requirements.txt file into an environment.yml file. For example, we can create a “dummy” _environment.yml file that will install elements from our requirements file, and then merge that into an existing environment.yml file. (We’d probably guard this with a check that both environment.y*ml and requirements.txt are in binder/):

_yaml = '''dependencies:
  - pip
  - pip:
'''

# if 'requirements.txt' in os.listdir() and 'environment.yml' in os.listdir():

with open('binder/requirements.txt') as f:
    for item in f.readlines():
        # skip blank lines and comments
        if item.strip() and not item.startswith('#'):
            _yaml = f'{_yaml}    - {item.strip()}\n'

with open('binder/_environment.yml', 'w') as f:
    f.write(_yaml)

merged = hiyapyco.load('binder/environment.yml', 'binder/_environment.yml',
                       method=hiyapyco.METHOD_MERGE,
                       interpolate=True)

with open('binder/environment.yml', 'w') as f:
    f.write(hiyapyco.dump(merged))

# Maybe also now delete requirements.txt?

For postBuild elements, different postBuild files may well operate in different shells (for example, we may have one that executes bash code, another that contains Python code). Perhaps the simplest way of “merging” this is to just copy over the separate postBuild files and generate a new one that calls each of them in turn.

import os
import shutil

postBuild = ''

for d in [d for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)]:
    if os.path.isfile(os.path.join(d, 'postBuild')):
        _from = os.path.join(d, 'postBuild')
        _to = os.path.join('binder', f'postBuild_{d}')
        # copy() rather than copyfile() so any executable bit is preserved
        shutil.copy(_from, _to)
        postBuild = f'{postBuild}\n./{_to}\n'

with open('binder/postBuild', 'w') as outfile:
    outfile.write(postBuild)

I’m guessing we could do the same for start?
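One wrinkle with start is that repo2docker expects a start script to end by handing control on with exec "$@", so a generated start file can't just call each copied script in turn. A hedged sketch of one way round this, assuming each copied script does end with exec "$@", is to chain them, passing each one the next script (and finally the real server command) as its arguments:

#!/bin/bash
# binder/start: illustrative sketch only
# Each copied start script does its own setup and then runs `exec "$@"`,
# so control is handed down the chain until the real command is run
exec ./binder/start_binder_1 ./binder/start_binder_2 "$@"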

If you want to have a play, the beginnings of a test file can be found here. (For some reason, WordPress craps all over it and deletes half of it if I try to embed it in a sourcecode block; I really should move to a blogging platform that does what I need…)

Installing Applications via postBuild in MyBinder and repo2docker

A note on downloading and installing things into a Binderised repo, or a container built using repo2docker.

If you save files into $HOME as part of the container build process, then when you try to use the image outside of MyBinder you will find that your saved files are clobbered whenever storage volumes or local directories are mounted onto $HOME.

The MyBinder / repo2docker build is pretty limiting in terms of the permissions the default jovyan user has over the file system. $HOME is one place you can write to, but if you need somewhere outside that path, then $CONDA_DIR (which defaults to /srv/conda) is handy…

For example, I just tweaked my neo4j binder repo to install a downloaded neo4j server into that path.
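By way of illustration, a postBuild fragment along the following lines would do that sort of thing; the download URL and version number are placeholder assumptions rather than the values used in that repo:

#!/bin/bash
# postBuild: illustrative sketch of installing an application outside $HOME
# so that it survives having a volume mounted over the home directory
APP_DIR="$CONDA_DIR/neo4j"
mkdir -p "$APP_DIR"

# Placeholder URL/version: substitute whatever release you actually need
wget -q -O /tmp/neo4j.tar.gz "https://dist.neo4j.org/neo4j-community-3.5.14-unix.tar.gz"
tar -xzf /tmp/neo4j.tar.gz -C "$APP_DIR" --strip-components=1
rm /tmp/neo4j.tar.gz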

Accessing MyBinder Kernels Remotely from IPython Magic and from VS Code

One of the issues facing us as a distance learning organisation is how to support the computing needs of distance learning students, on a cross-platform basis, and over a wide range of computer specifications.

The approach we have taken for our TM351 Data Management and Analysis course is to ship students a Virtualbox virtual machine. This mostly works. But in some cases it doesn’t. So in the absence of an institutional online hosted notebook, I started wondering about whether we could freeload on MyBinder as a way of helping students run the course software.

I’ve started working on an image here though it’s still divergent from the shipped VM (I need to sort out things like database seeding, and maybe fix some of the package versions…), but that leaves open the question of how students would then access the environment.

One solution would be to let students work on MyBinder directly, but this raises the question of how to get the course notebooks into the Binder environment (the notebooks are in a repo, but it’s a private repo) and out again at the end of a session. One solution might be to use a Jupyter github extension, but this would require students to set up a Github repository, install and configure the extension, remember to sync (unless auto-save-and-commit is available, or could be added to the extension), and so on…

An alternative solution would be to find a way of treating MyBinder like an Enterprise Gateway server, launching a kernel via MyBinder from a local notebook server extension. But I don’t know how to do that.

Some fragments I have had lying around for a bit were the first fumblings towards a Python MyBinder client API, based on the Sage Cell client for running a chunk of code on a remote server… So I wondered whether I could do another pass over that code to create some IPython magic that lets you create a MyBinder environment from a repo and then execute code against it from a magicked code cell. Proof of concept code for that is here: innovationOUtside/ipython_binder_magic.

One problem is that the connection seems to time out quite quickly. The code is really hacky and could probably be rebuilt from functions in the Jupyter client package, but making sense of that code is beyond my limited cut-and-paste abilities. But: it does offer a minimal working demo of what such a thing could be like. At a push, a student could install a minimal Jupyter server on their machine, install the magic, and then write notebooks using the magic to run code against a Binder kernel, albeit one that keeps dying. Whilst this would be inconvenient, it’s not a complete catastrophe because the notebook would be being saved to the student’s local machine.

Another alternative struck me today when I saw that Yuvi Panda had posted to the official Jupyter blog a recipe on how to connect to a remote JupyterHub from Visual Studio Code. The mechanics are quite simple — I posted a demo here about how to connect from VS Code to a remote Jupyter server running on Digital Ocean, and the same approach works for connecting to our VM notebook server, if you tweak the VM notebook server’s access permissions — but it requires you to have a token. Yuvi’s post says how to find that from a remote JupyterHub server, but can we find the token for a MyBinder server?

If you open your browser’s developer tools and watch the network traffic as you launch a MyBinder server, then you can indeed see the URL used to launch the environment, along with the necessary token:

But that’s a bit of a faff if we want students to launch a Binder environment, watch the network traffic, grab the token and then use that to create a connection to the Binder environment from VS Code.

Searching the contents of pages from a running Binder environment, it seems that the token is hidden in the page:

And it’s not that hard to find… it’s in the link from the Jupyter log. The URL needs a tiny bit of editing (cut the /tree path element) but then the URL is good to go as the kernel connection URL in VS Code:
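For example (a made-up illustration: the host, user path and token differ for every launch), a log link of the form:

https://hub.gke.mybinder.org/user/SOME-USER-PATH/tree?token=SOMETOKEN

becomes, with the /tree element removed, the connection URL:

https://hub.gke.mybinder.org/user/SOME-USER-PATH/?token=SOMETOKEN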

Then you can start working on your notebook in VS Code (open a new notebook from the settings menu), executing the code against the MyBinder environment.

You can also see the notebooks listed in the remote MyBinder environment.

So that’s another way… and now it’s got me thinking… how hard would it be to write a VS Code extension to launch a MyBinder container and then connect to it?

PS By the by, I notice that the developer tools in Firefox became even more useful with the Firefox 71 release, in the form of a websocket inspector.

This lets you inspect traffic sent across a websocket connection. For example, if we force a page reload on a running Jupyter notebook, we can see a websocket connection:

We can then click on that connection and monitor the messages being passed over it…

I thought this might help me debug / improve my Binder magic, but it hasn’t. The notebook looks like it sends an empty ping as a heartbeat (as per the docs), but if I try to send an empty message from the magic it closes the connection? Instead, I send a message to the heartbeat channel…

Tinkering With Neo4j and Cypher

I am so bored of tech at the moment — I just wish I could pluck up the courage to go into the garden and start working on it again (it was, after all, one of the reasons for buying the house we’ve been in for several years now, and months go by without me setting foot into it; for the third year in a row the apples and pears have gone to rot, except for the ones the neighbours go scrumping for…) Instead, I sit all day, every day, in front of a screen, hacking at a keyboard… and I f*****g hate it…

Anyway… here’s some of the stuff that I’ve been playing with yesterday and today, in part prompted by a tweet doing the rounds again on:

#Software #Analytics with #Jupyter notebooks using a prefilled #Neo4j database running on #MyBinder by @softvisresearch
Created with building blocks from @feststelltaste and @psychemedia
#knowledgegraph #softwaredevelopment
https://github.com/softvis-research/BeLL

Impact.

Yeah:-)

Anyway… It prompted me to revisit my binder-neo4j repo that demos how to launch a neo4j database in a MyBinder container, to provide some more baby-steps ways in to actually getting started running queries.

So yesterday I added a third party cypher kernel to the build, HelgeCPH/cypher_kernel, that lets you write cypher queries in code cells; and today I hacked together some simple magic — innovationOUtside/cypher_magic — that lets you write cypher queries in block magic cells in a “normal” (Python kernel) notebook. This magic really should be extended a bit more, e.g. to allow connections to arbitrary neo4j databases, and perhaps crib from the cypher_kernel to include graph conversions to a networkx graph object format as well as graphical visualisations.
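Usage in a notebook looks roughly like the following; this is a sketch from memory rather than from the package docs, so the extension name and the default connection settings are assumptions:

# Load the magic (assumed extension name)
%load_ext cypher_magic

Then, in a code cell:

%%cypher
MATCH (n) RETURN n LIMIT 5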

The cypher_kernel uses vis.js, as does an earlier cypher magic that appears to have rotted (ipython-cypher). But if we can get the graph objects into a networkx format, then we could also use netwulf to make pretty diagrams…

The tweet-linked repo also looks interesting (although I don’t speak German at all, so, erm…); there may be things I can pull out of there to add to my binder-neo4j repo, although I may need to rethink that: the binder-neo4j repo had started out as a minimal template repo for just getting started with neo4j in MyBinder/repo2docker. But it’s started creeping… Maybe I should pare it back again, install the magic from its own repo, and put the demos in a more disposable place.

Fragment – Jupyter Kernels / MyBinder as a Remote Code Execution Sandbox for Moodle

Although I don’t know for sure, I suspect that administrators of computing infrastructure in educational establishments are wary of requests from academics for compute services that allow students to run arbitrary code.

One of the main reasons why an educator would want to support this is that setting up an environment can be hard: if you want a student to focus on writing code that makes use of particular packages, you probably don’t want them engaging in arcane sysadmin practices and spending all their time trying to install those packages in the first place.

For the IT department, the thought of running arbitrary code produced either by novices or by deliberately malicious users is likely to raise several well-founded concerns: how do we stop users using the code environment to attack the server or network the code is running on; how do we stop folk from running code on our servers that could be used to attack external sites; and how do we control the resource requirements (storage, compute, network) when mistakes happen and folk try to repeatedly download the internet to our server?

One way of making hosted compute available to students is to execute code within isolated sandboxed environments that you can park in a safe area of the network and monitor closely.

In our Moodle VLE, the Moodle CodeRunner environment is used to allow students to run small fragments of code within just such an environment when completing interactive quiz questions. (I provide a quick review of the Moodle CodeRunner plugin in post [A] Quick First Look At Moodle CodeRunner.)

Presumably, someone somewhere has done a security audit and decided that the sandboxed code execution environment is a safe one and signed off on its use.

Another approach, described in this fragment on Jupyter Notebooks and Moodle, the SageCell filter for Moodle, allows you to run code against an external (stateless) SageCell server:

<?php
/**
 * SageCell filter for Moodle 3.4+
 *
 *  This filter will replace any Sage code in [sage]...[/sage]
 *  with a Ajax code from http://sagecell.sagemath.org
 *
 * @package    filter_sagecell
 * @copyright  2015-2018 Eugene Modlo, Sergey Semerikov
 * @license    http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later
 */

defined('MOODLE_INTERNAL') || die();

/**
 * Automatic SageCell embedding filter class.
 *
 * @package    filter_sagecell
 * @copyright  2015-2016 Eugene Modlo, Sergey Semerikov
 * @license    http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later
 */
class filter_sagecell extends moodle_text_filter {

    /**
     * Check text for Sage code in [sage]...[/sage].
     *
     * @param string $text
     * @param array $options
     * @return string
     */
    public function filter($text, array $options = array()) {

        if (!is_string($text) or empty($text)) {
            // Non string data can not be filtered anyway.
            return $text;
        }

        if (strpos($text, '[sage]') === false) {
            // Performance shortcut - if there is no [sage] tag, nothing can match.
            return $text;
        }

        $newtext = $text; // Fullclone is slow and not needed here.

        $search = '/\[sage](.+?)\[\/sage]/is';
        $newtext = preg_replace_callback($search, 'filter_sagecell_callback', $newtext);

        if (is_null($newtext) or $newtext === $text) {
            // Error or not filtered.
            return $text;
        }

        return $newtext;
    }

}

/**
 * Replace Sage code with embedded SageCell, if possible.
 *
 * @param array $sagecode
 * @return string
 */
function filter_sagecell_callback($sagecode) {

    // SageCell code from [sage]...[/sage].
    $output = $sagecode[1];
    $output = str_ireplace("", "\n", $output);
    $output = str_ireplace("

", "\n", $output);
    $output = str_ireplace("
", "\n", $output);
    $output = str_ireplace("
", "\n", $output);
    $output = str_ireplace("
", "\n", $output);
    $output = str_ireplace("&nbsp;", "\x20", $output);
    $output = str_ireplace("\xc2\xa0", "\x20", $output);
    $output = clean_text($output);
    $output = str_ireplace("&lt;", "", $output);

    $id = uniqid("");

    $output = "" .
    "" .
        "sagecell.makeSagecell({inputLocation: \"#" . $id . "\"," .
        "evalButtonText: \"Evaluate\"," .
        "autoeval: true," .
        "hide: [\"evalButton\", \"editor\", \"messages\", \"permalink\", \"language\"] }" .
    ");" .
    "" .
    "
<div id="">". $output. "</div>
";

    return $output;
}

This looks to me like the SageCell Moodle filter essentially rewrites a [sage]...[/sage] delimited code block within a Moodle environment as a Javascript backed SageCell form and then lets users run the code embedded in the form against the remote server. This sort of thing could presumably be used to support interactive, executable code activities within a Moodle hosted web page, for example.

As I remarked previously, it’s not hard to imagine doing something similar to provide a [mybinder repository="..."]...[/mybinder] filter that could use a Javascript library such as ThebeLab or Juniper to provide a similar style of interaction backed by a MyBinder launched repository, though minor tweaks may be required around those packages to handle stateless rather than stateful transactions if repeated calls are made to the server.

Going back to the CodeRunner plugin (as described here):

[i]nternally CodeRunner is designed to support multiple sandboxes, implemented as subclasses of the abstract class qtype_coderunner_sandbox – see sandbox.php. Sandboxes are essentially plugins to CodeRunner. Several different ones have been used over the years but the only current ones are the jobe sandbox (file jobesandbox.php) and the ideone sandbox. The latter interfaces to the Sphere On-line judge server but is now more-or-less defunct. Both of those sandboxes run as services. CodeRunner can support multiple sandboxes at the same time and questions can be configured to select a particular sandbox (if desired). By default the first available sandbox that supports the language required by the question is used.

So could we use a MyBinder launched Jupyter server to provide sandboxed code execution?

One advantage of this would be that we could define a Jupyter environment that students could use on their own machines, or that we could host via a hosted notebook server, and that same environment could be used for CodeRunner style assessment.

Another advantage would be that if we want to run student created arbitrary code for teaching activities as well as CodeRunner based assessment activities, we’d only need to sign off on one sandboxed code execution environment rather than several.

So what’s required?

It’s years since I last used PHP, but I thought I’d have a go at creating a simple Python client that would let me:

  • start a MyBinder server against a specified Github repo;
  • start a kernel;
  • run a small code sample in the kernel and get a code execution response back.

Cribbing heavily from juniper.js and this rather handy sagecell-client.py, I came up with a hacky recipe that works as a minimal proof of concept here: mybinder_py_client-ipynb.
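Stripped right back, the flow is something like the following sketch; the endpoint details are as I understand the public mybinder.org build API and the Jupyter server REST API, error handling and the websocket execution step are omitted, and the repo named at the end is purely illustrative:

import json
import requests

def launch_binder(user, repo, ref='master'):
    # Ask mybinder.org to build/launch the repo and watch the event stream
    # until the server is ready, then return its URL and access token
    build_url = f'https://mybinder.org/build/gh/{user}/{repo}/{ref}'
    with requests.get(build_url, stream=True) as r:
        for line in r.iter_lines():
            if line and line.startswith(b'data:'):
                event = json.loads(line[len(b'data:'):])
                if event.get('phase') == 'ready':
                    return event['url'], event['token']
    raise RuntimeError('Binder launch failed')

def start_kernel(server_url, token):
    # Start a kernel on the launched Jupyter server via its REST API
    resp = requests.post(f'{server_url}api/kernels',
                         headers={'Authorization': f'token {token}'})
    resp.raise_for_status()
    return resp.json()['id']

# e.g. (illustrative):
# url, token = launch_binder('ouseful-demos', 'binder-base-boxes')
# kernel_id = start_kernel(url, token)
# Code execution then goes over the kernel's websocket channel at
#   {url}api/kernels/{kernel_id}/channels?token={token}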

I think this is stateful, in that we can execute several code blocks one after the other and exploit state set up by previous calls to the same kernel. It would probably also make sense to have a call that forces a new kernel for each code execution call, as well as providing a recipe for killing a kernel.

The next step in trying to use this approach for CodeRunner sandbox would presumably be to try to create a simple PHP based MyBinder client; then the next step would be to use that in a CodeRunner sandbox subclass.

But that’s out of scope for me atm…

Please let me know in the comments if you have a go at this… or know of any other Moodle / Jupyter integrations…

Binder Base Boxes, Several Ways…

A couple of weeks ago, Chris Holdgraf published a handy tip on the Jupyter Discourse site about how to embed custom github content in a Binder link with nbgitpuller.

One of the problems with (features of…) MyBinder is that if you make a change to a repo, even if it’s just a change to the README, it will spawn a rebuild of the Docker image built from the repo the next time the repo is launched onto MyBinder.

With the recent announcement of the Binder Federation, whereby there are multiple clusters (currently two…) onto which MyBinder launch requests are mapped, if each cluster maintains its own Docker image hub, this could mean that with N clusters available, your next N launches may all require a rebuild if each launch request is mapped to a different cluster.

So how does nbgitpuller help? If you install nbgitpuller into a Binderised repository, you can launch a container on MyBinder with a git-pull? argument. This will grab the contents of a specified repository into a notebook server environment before presenting you with the notebook homepage.

What this means is that we can construct a MyBinder URL that will:

  • launch a container built from one repo; and
  • populate it with files pulled from another.

The advantage of this is that you can create one repo with a complex set of build requirements and build a MyBinder image from that once and once only. If you also maintain a second repository with notebook files, or a package definition, with frequent changes, but run it in a Binderised container launched from the “fixed” build repo, you won’t need to rebuild the container each time: just launch from the pre-built one and then synch the changed content in from the other repo.

To pull the contents of a repo http://github.com/USER/REPO into a MyBinder container built from a particular binder-base-boxes branch, use a MyBinder URL of the form:

https://mybinder.org/v2/gh/ouseful-demos/binder-base-boxes/BASEBOXBRANCH/?urlpath=git-pull?repo=https://github.com/USER/REPO

To pull the contents from a particular branch of a repo http://github.com/USER/REPO/tree/BRANCH, use a MyBinder URL of the form:

https://mybinder.org/v2/gh/ouseful-demos/binder-base-boxes/BASEBOXBRANCH/?urlpath=git-pull?repo=https://github.com/USER/REPO%26branch=BRANCH

Note the escaping of the & conjunction (as %26) between the repo and branch arguments, which keeps it inside the scope of the git-pull?repo phrase.

To pull the contents from a particular branch of a repo http://github.com/USER/REPO/tree/BRANCH and launch into a particular notebook, use a MyBinder URL of the form:

https://mybinder.org/v2/gh/ouseful-demos/binder-base-boxes/BASEBOXBRANCH/?urlpath=git-pull?repo=https://github.com/USER/REPO%26branch=BRANCH%26subPath=FILENAME.ipynb

You can see several examples in the various branches of https://github.com/ouseful-demos/binder-base-boxes.

See Feeding a MyBinder Container Built From One Github Repository With the Contents of Another for an earlier review of this approach (which I have to admit, I’d forgotten I’d posted when I started this post!).

On my to-do list is to try to add a tab to the nbgitpuller link generator to simplify the process of link creation. But in addition to a helper tool, is there a convention we might adopt to make it clearer when we are using this sort of split build/content repo approach?

Github conventionally uses the gh-pages branch as a “reserved” branch for constructing Github Pages docs related to a particular repo. Could we take a similar approach for defining a “Binder build” branch?

The binder/ directory in a repo can be used to partition Binder build requirements in a repo, but there are a couple of problems associated with this:

  • a maintainer may not want to have the binder/ directory cluttering their package repo;
  • any updates to the repo will force a rebuild of the Binder image next time the repo is run on a particular Binder node. (With Binder federation, if there are N hosts in the federation, then after updating a repo it’s possible that my next N attempts to run the repo on MyBinder may each require a rebuild if I am directed to a different host each time.)

If by convention something like a binder-build branch was used to contain the build requirements for a repo, then the process for calling a build (by default) could be simplified.

Eg rather than having something like:

https://mybinder.org/v2/gh/colinleach/binder-box/master/?urlpath=git-pull?repo=https://github.com/colinleach/astro-Jupyter

we would have something like:

https://mybinder.org/v2/gh/colinleach/astro-Jupyter/binder-build/?urlpath=git-pull?repo=https://github.com/colinleach/astro-Jupyter

which could simplify to something that defaults to a build from binder-build branch (the “build” branch) and nbgitpull from master (the “content” branch):

https://mybinder.org/v2/gh/colinleach/astro-Jupyter?binder-build=True

Complications could be added to support changing the build branch, the nbgitpull branch, the commit/ID of a particular build, etc?

It might overly complicate things further, but I could also imagine:

  • automatically injecting nbgitpuller into the Binder image and enabling it;
  • providing some sort of directive support so that if the content directory has a setup.py file the package from that content directory is installed.

Binder Buildpacks

As well as defining dynamically constructed Binder base boxes built from one repo and used to provide an environment within which to run the contents of another, there is a second sense in which we might define Binder base boxes and that is to consider the base environment on which repo2docker constructs a Binder image.

In the nbgitpuller approach, I am treating the Binder base box (sense 1) as the environment that the git-pulled content runs in. In the buildpack approach, the Binder base box (sense 2) is the image that repo2docker uses to bootstrap the Binder image build process. Binder base box sense 1 = Binder base box sense 2 + Binder repo build process. Maybe it’d make more sense to swap those senses, so sense 2 builds on sense 1?!

This approach is discussed in the repo2docker issue #487 Make it possible to configure the base image with an example implementation in pangeo-stacks/pull/27. The implementation allows users to create a Dockerfile in which they specify a required base Docker image upon which the normal apt.txt, environment.yml, requirements.txt and postBuild steps can be applied.

The Dockerfile FROM statement takes the form:

FROM yuvipanda/pangeo-base-notebook-onbuild:2019.04.15-4

and then other build files (requirements.txt etc) are declared as normal.

The -onbuild component marks out the base image as one that should be built on (I think). I’m not sure how the date component applies (or whether it is required or optional). I’m not sure if the base box itself also needs some custom configuration? I think an example of the code used to build it is in the base-notebook directory of this repo: https://github.com/yuvipanda/pangeo-stacks.

Summary

Installing nbgitpuller into a Binderised repo allows us to pull the contents of a second Github repository into the first. This means we can build a complex environment from one repository once, and pull regularly updated content from another repo into it without needing a rebuild step. Using the -onbuild approach, BinderHub can use repo2docker to build a Binder image from a user-defined base image and then apply the normal build steps to it. This means that optimised base boxes can be defined, on which additional customisations can be layered. This can also make development of Binder boxes more efficient, by starting rebuilds further up the image layer stack and building on top of prebuilt boxes rather than having to build images from scratch.

MyBinder Launches From Any git Repository: Github, Gists, GitLab, Bitbucket etc

By default, MyBinder looks to repositories on Github for its builds, but it can also build from Github gists, GitLab.com repositories, and, well, any git repository with a networked endpoint, it seems:

What prompted me to look at this was searching for a way to launch a MyBinder container from Bitbucket. (For the archaeologists, there are various issues and PRs, such as here and here, as well as this recent forum post — How to use bitbucket repositories on mybinder.org — that trace some of the history…)

So what’s the trick?

For now, you need to get hold of the URL to a particular Bitbucket repo commit. For example, to try running this repo you need to go to the Commits page and grab the URL for the most recent master commit (or whichever one you want), which will contain the commit hash:

For example, something like https://bitbucket.org/ueacomputervision/image-labelling-tool/commits/f3ddb33e4839f8a0fe73c168993b405adc13daf0 gives the commit hash f3ddb33e4839f8a0fe73c168993b405adc13daf0.

For the repo base URL https://bitbucket.org/ueacomputervision/image-labelling-tool, the MyBinder launch link then takes on the form:

https://mybinder.org/v2/git/https%3A%2F%2Fbitbucket.org%2Fueacomputervision%2Fimage-labelling-tool.git/f3ddb33e4839f8a0fe73c168993b405adc13daf0

which is to say:

https://mybinder.org/v2/git/ESCAPED_REPO_URL.git/COMMIT_HASH

But it does look like things may get easier in the near future…