Merging Several Binder Configurations

As more and more repositories start to incorporate MyBinder / repo2docker build specifications, more and more building blocks start to appear for how to get particular things running in MyBinder. For example, I have several ouseful-template-repos with various building blocks for getting different databases running in MyBinder, and occasionally require an environment that also loads in a Jupyter-server-proxied application, such as OpenRefine. Other times, I might want to pull in the config for a partculalry ast install, or merge configs someone else has developed to run different sets of notbooks in the same Binderised repo.

But: a problem arises if you want to combine multiple Binder specifications from various repos into a single Binder setup in a single repo – how do you do it?

One way might be for repo2docker to itereate through multiple build steps, one for each Binder specification. There may be clashes, of course, such as conflicting package versions from different specifications, but it would then fall to the user to try to resolve the issue. Which is fine, if Binder is making a best attempt rather than guaranteeing to work.

Assuming that such a facility does not exist, it would require updates to repo2docker, so that’s not something we can easily hack around with ourselves. So how about something where we try to combine the contents of multiple binder/ setup directories ourselves. This is something we can start to do easily enough ourselves, and as a personal tool doesn’t necessarily have to work “properly” and “for everything”: for starters, it only has to work with what we want it to work with. And if it only works so far, getting 80% of the way to a working combined configuration that’s fine too.

So what would we need to do?

Simple list files like apt.txt and requirements.txt could be simply concatenated together, leaving it up to pip to do whatever it does with any clashes in pinned package numbers, for example (though we may want to report possible clashes, perhaps via a comment in the file, to help the user debug things).

In a shell script, something like the following would concatenate files in directories binder_1, binder_2, etc.:

for i in $(ls -d binder_*)
   echo >> binder/apt.txt
   echo "# $i" >> binder/apt.txt
   cat "$i/requirements.txt" >> binder/apt.txt

In Python, something like:

import os

with open('binder/requirements.txt', 'w') as outfile:
    for d in [d for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)]:
        # Should test: if 'requirements.txt' in os.listdir(d)
        with open(os.path.join(d, 'requirements.txt')) as infile:

Merging environment.yml files is a little trickier — the structure within the file is hierarchical — but a package like hiyapyco can help us with that:

import hiyapyco
import fnmatch

_envs = [os.path.join(d, e) for e in [d for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)] if fnmatch.fnmatch(e, '*.y*ml')]

merged = hiyapyco.load(_envs,

with open('binder/environment.yml', 'w') as f:

There is an issue with environments where we have both environment.yml and requirements.txt files because the environments.yml trumps requirements.txt: the former will run but the latter won’t. A workaround I have used in the past for installing from both is to call install from the requirements.txt file by using a directive in the postBuild file to handle the requirements.txt installation.

I’ve also had to use a related trick to install a really dependent Python package explicitly via postBuild and then install from a renamed requirements.txt also via postBuild: the pip installer installs packages in whatever order it wants, and doesn’t necessarily follow any order “specified” in the requirements.txt file. This means that on certain occasions, a build can fail becuase one Python package is relying on another which is specified in the requirements.txt file but hasn’t been installed yet.

Another approach might be to grab any requirements from a (merged) requirements.txt file into an environment.yml file. For example, we can create a “dummy” _environment.yml file that will install elements from our requirements file, and then merge that into an existing environments.yml file. (We’d probbaly guard this with a check that both environment.y*ml and requirements.txt are in binder/):

_yaml = '''dependencies:
  - pip
  - pip:

# if 'requirements.txt' in os.listdir() and 'environment.yml' in os.listdir():

with open('binder/requirements.txt') as f:
    for item in f.readlines():
        if item and not item.startswith('#'):
            _yaml = f'{_yaml}    - {item.strip()}\n'

with open('binder/_environment.yml', 'w') as f:

merged = hiyapyco.load('binder/environment.yml', 'binder/_environment.yml',

with open('binder/environment.yml', 'w') as f:

# Maybe also now delete requirements.txt?

For postBuild elements, different postBuild files may well operate in different shells (for example, we may have one that executes bash code, another that contains Python code). Perhaps the simplest way of “merging” this is to just copy over the separate postBuild files and generate a new one that calls each of them in turn.

import shutil

postBuild = ''

for d in [d for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)]:
    if 'postBuild' in os.listdir(d) and os.path.isfile(os.path.join(d, 'postBuild')):
        _from = os.path.join(d, 'postBuild')
        _to = os.path.join('binder', f'postBuild_{d}')
        shutil.copyfile(_from, _to)
        postBuild = f'{postBuild}\n./{_to}\n'

with open('binder/postBuild', 'w') as outfile:

I’m guessing we could do the same for start?

If you want to have a play, the beginnings of a test file can be found here (for some reason, WordPress craps all over it and deletes half of it if I try to embed it in the sourcecode block etc. (I really should move to a blogging platform that does what I need…)

PoC: Using Git Commit Messages As a CLI

Following an idle wonder last week on Using Git Commit Messages as a Command Line?, I had a play and came up with a demo of sorts: ouseful-testing/action-steps.

The idea is that by creating a Github Action that performs actions based, in part at least, on the contents of a Github commit message, we can start to use commit messages as as a CLI to invoke particular Github Action mediated activities.

My first couple of proofs of concept were:

  • a simple script that replaces one file (the README) with the contents of another. At the moment, both files need to be in the same branch (ideally, the replacement files would be pulled in from another branch but I couldn’t figure out how to do that offhand). If you just make a “dummy” commit to any old file with the commit message Update Readme the README will be updated with the contents of one file. If you use the commit message Reset Readme it will be replaced with the contents of another. My thinking in part, here, is that you could “commit” progress messages as you work through a thing and the README keeps getting updated with the next thing you have to do as you commit to say you’ve done the previous thing.
name: Updates
on: push

    if: (github.event.commits[0].message == 'Update Readme')
    runs-on: ubuntu-latest

    - name: Copy Repository Contents
      uses: actions/checkout@v2
    - name: commit changes
      run: |
        git config --global "${GH_EMAIL}"
        git config --global "${GH_USERNAME}"
        # git checkout -B fastpages-automated-setup
        git add
        git commit -m'Update README'
        git push
        GH_EMAIL: ${{ github.event.commits[0] }}
        GH_USERNAME: ${{ github.event.commits[0].author.username }}
    if: (github.event.commits[0].message == 'Reset Readme')
    runs-on: ubuntu-latest

    - name: Copy Repository Contents
      uses: actions/checkout@v2
    - name: commit changes
      run: |
        git config --global "${GH_EMAIL}"
        git config --global "${GH_USERNAME}"
        # git checkout -B fastpages-automated-setup
        git add
        git commit -m'Reset README'
        git push
        GH_EMAIL: ${{ github.event.commits[0] }}
        GH_USERNAME: ${{ github.event.commits[0].author.username }}
  • a simple script that lets you upload one or more zip files as part of a push; if the commit message starts with Unzip, in response to the commit, unzip the committed zip files, and then delete the .zip archive files you had just committed, pushing the unzipped files in their place.
name: Unzip
    - '**.zip'

    if: startsWith(github.event.commits[0].message, 'Unzip')
    runs-on: ubuntu-latest

    - name: Copy Repository Contents
      uses: actions/checkout@v2
        fetch-depth: 2

    - name: handle zip
      run: |
        git config --global "${GH_EMAIL}"
        git config --global "${GH_USERNAME}"
        for f in $(git diff HEAD^..HEAD --no-commit-id --name-only | grep -E '.zip$')
              echo $f
              fn=`unzip $f | grep -m1 'creating:' | cut -d' ' -f5-`
              echo $fn
              git rm $f
              git add $fn
              git commit -m"unzip $f"
        git push
        GH_EMAIL: ${{ github.event.commits[0] }}
        GH_USERNAME: ${{ github.event.commits[0].author.username }}

I did also wonder about whether it would be possible to implement something like Adventure, played by issuing instructions through Git commit messages and maybe updating the readme with the game response to each step… Stepping through the hstory of READMEs would be your game transcript…

Are fastpages Really an EASY Way to Publish a Blog From Jupyter Notebooks?

I tried to submit this to the discourse forum, having been invited to do so, but after handing over credentials to get an account so I could log in, then having to go to my email client to click the confirmation code, then not being able to create a new topic (new user policy, maybe?) then having my post quarantined and my account largely suspended, I thought I’d post the text of the post here (glad I took a copy…)

(I appreciate that my behaviour / attitude around this may be seen as both childish and churlish,  but I was originally riled by the “easy” hype around fastpages (because I don’t think it necessarily is easy for anyone other than a particularly select population…) and since then, things have just gone downhill in terms of ease of use / communication!;-) “Just do X” has (just) so much baggage associated with it…

Original post and replies thread

Picking up on a Twitter thread, some comments around the “fastpages supports really easy Jupyter blogging” effusiveness on Twitter.

(Note this isn’t meant to be hostile, it’s meant to be usefully critical ;-)

For any seasoned Github user and developer who’s also been responsible for maintaining documentation sites using Jekyll, fastpages “just” requires folk to use Github and Jekyll style publishing to publish a blog site from notebook files and markdown docs.

For anyone familiar with Github, git, and Jekyll publishing, the fastpages automation simplifies some of the faff required in getting that stuff working. (Other approaches, such as Jupyter Book, ipypublish and nbsphinx offer related publishing routes but less hype. A proper comparison of all the approaches might be useful…)

So if you’re familiar with Github and Jekyll, the benefits are quite possibly both clear and enticing. But if you aren’t a Github user or a Jekyll user, things are pretty much as opaque as every they were.

The fastpages mechanic of generating a PR on the first commit generated when cloning the template repo is really neat, and an idea I’ll likely steal. But for a novice, without mental model of how Github works, this doesn’t in and of itself make things that much easier. The naive user is faced with a complex UI, using complex jargon, and probably doesn’t know where to go looking for the PR, how to handle it, what it means when they do handle it, etc etc.

The file listing on the master home page you’re faced with when cloning the repo is also intimidating. There are a lot of files, there’s lots of directory names starting with scary underscores, lots of `.whatever` hidden files. That’s fine if you’re creating a workflow that’s “easy” for folk who are happy with all this stuff, but if the claim is that this is an “easy route into blogging with Jupyter” in general, it isn’t.

One of the attractive features of the Jupyter notebook UI and infrastructure is that someone with little technical knowledge on the command line can quickly start using magics and high level commands, a line at a time, to get stuff done. Just because someone can plot a chart a from a pandas data frame populate[d] from a loaded in CSV file doesn’t necessarily mean they know how to set up the Jupyterhub server they’re actually a user of, nor even how to install pandas into the environment they’re using. As a *user*, why should they? The same goes for their familiarity, or otherwise, with Github and Jekyll. (By the by, it’s probably best to leave the “but they ought to…” arguments aside…)

I’m all for folk developing skills, but onboarding is really hard. And oftentimes, when trying to persuade people to adopt new tech in conservative institutions, you only get infrequent opportunities to entice them in. If you claim something is easy, that you “just” this and that, then watch their face as confusion and terror reigns, and you’ve lost your conversion opportunity. They won’t try again.

To make things *really* easy means taking things much slower. Cloning the repo and showing a clean page with a very simple set of instructions, and all the scary stuff hidden in branches, provides an opportunity for generating an easy way in. The initial readme could provide a set of very clear instructions about setting up tokens etc, along with why they’re necessary (eg Stephen Downes had a go at simplifying them [here, part 1]( and [here, part 2](

Things would also be simple if the all[simpler if all] the Jekyll scaffolding were hidden away somewhere, and the user could just slowly introduce things into the top level directory, the homepage for their blog source files, with all the scaffolding hidden away and built on via branches.

This level of simplicity may or may not be desirable for a (semi-)professional, if ad hoc, tool, but if the desire is to find a way to make it easier for novices (to Github, to Jekyll) to publish in what is still quite a low level way, I think more scaffolding is required. (A limiting case of easy is probably to just click a button on your Jupyter notebook and have the file posted somewhere, from where it magically then appears on a public URL.)

Inspired by the initial commit handling Github Action, I started some baby steps explorations of a way of making “performative” Github commit actions ([action-steps]( that might (or might not!) make things simpler for a novice user (they also run the risk of them developing bad mental models, but I’m just exploring ideas).

For example, you might encourage someone via the readme to create a new file from the Github web UI with a particular filename or particular commit message, and then handle that in a particular way, perhaps updating the README with the next step; this might include some description of how you could then compare the original readme with the updated one. (I did start wondering whether I could code Adventure to be played via commit messages! Has that been done before I wonder?)

You might have additional commit messages that introduce new files into the top level repo, a file at a time. (Where to put simple documentation describing commit performative commands would be another issue!)

I appreciate this is probably *not* how Github is traditionally used, where a principle of least surprise about what appears in the repo compared to the files you actually commit is a sound one (that said, a lot of workflows do make use of commit hooks that do change files…) But I would argue that using Github for the primary purpose of making use of its Github Pages publishing mechanism is not using Github in a traditional version control application way either. Version control is NOT the aim. So what I’m thinking of here is where the user can instruct Git to add in very particular new files at particular times in response to particular commands issued via a particular commit message for a particular reason: to allow them to incrementally develop the complexity of their environment from within the environment as they grow familiar with it. Along the way, the mechanism could coach an introduce the user to features of Github that may be useful in a blogging context, such as the ability to “track changes” and maintain different versions of a content as you draft it etc. This would then introduce them to version control as a side effect of them developing particular blogging workflow practices in an environment that can coach them as they use it.

This may all just be nonsense, of course!

For some definition of “just”…

Fragment: Hard to Use OpenLearn OU-XML to Markdown Tool, If You Fancy Trying It…

Over the years, I’ve dabbled on and off with OU-XML, the XML document format that OU and OpenLearn texts are mastered in. Over the last year I’ve been exploring convertng OU-XML to the simple markdown text format (eg here).

There are a several advantages to using markdown: firstly, it’s a simple text format; secondly, you can open and edit markdown docs in a Jupyter notebook UI via Jupytext; thirdly, there are well proven (though still fiddly…) workflows for publising websites from markdown source docs (eg on of my experiments here).

As to why editing markdown docs in a notebook UI is useful: for one, you can edit — and preview — Latex, which means you can write maths equations and chemical formulae in a simple text way; for another, you can add code into your document that can embed interactives: for example, my folium magic lets you embed maps with markers or shaperfiles in to the document with a single, relatively straightforward, one-liner; or code to generate charts from data; or create simple interactive applications using ipywidgets. And so on. In short, the notebook is a medium that affords you lots of possibilities for incorporating generated, as well as interactive, content.

Following a proviocation by Marco Kalz / @mkalz yesterday, I cobbled together various bits of code into this repo — innovationOUtside/open-ouxml-tools — which doubles as the src for an installable Python package’n’CLI, that lets you:

  • download and grab the OU-XML for an OpenLearn unit, along with all its image assets, into a SQLite database;
  • generate a set of markdown files from the SQLite database.

With the single test unit I tried it on, it seems to work okay in MyBinder (just click on the button on the repo homepage, than click on the file when the notebook UI loads).

To get the files out, the nbarchive extension is preinstalled into the Binderised environment so you should be able to zip and export the all the generated files.

They could then be uploaded into a clone of something like ouseful-template-repos/oer-md-publish for autopublishing. (That example uses CircleCI as per this). I’ll try to figure out a Github Action way of doing something similar over the next few days, perhaps in a repo that will also grab a specified OpenLEarn unit for you (eg by using a Git commit performative CLI call, for example…?!;-)

Note that I’m still not claiming that this is easy, but I think the pieces are there if anyone wants to work through it and try it out. If folk do play with it, I’m more likely to try to make it a bit easier. But I know that because it isn’t easy, most folk won’t try it. (S’like a built in defense mechanism for me; matched time. If no-one else bothers, I don’t have to either… So if you want this thing to become real, you have to invest time into it now, too…)

PS I’m working on a new way of introducing recipes like this, as TINEWY (tin yui) ones: There Is No Easy Way Yet.

Fragment: Using Git Commit Messages as a Command Line?

Pondering the way in which the fastai/fastpages repo (as described here) generates a PR from the first commit after the repo is cloned, I started pondering this:

name: Setup
on: push

    if: (github.event.commits[0].message == 'Initial commit') && (github.run_number == 1)
    runs-on: ubuntu-latest

    - name: Set up Python
      uses: actions/setup-python@v1
        python-version: 3.6

    - name: Copy Repository Contents
      uses: actions/checkout@v2
    - name: modify files
      run: |
        import re, os
        from pathlib import Path
        nwo = os.getenv('GITHUB_REPOSITORY')
        username, repo_name = nwo.split('/')
        readme_template_path = Path('')
        readme_path = Path('')
        config_path = Path('_config.yml')
        pr_msg_path = Path('')
        assert readme_template_path.exists(), 'Did not find in the current directory!'
        assert readme_path.exists(), 'Did not find in the current directory!'
        assert config_path.exists(), 'Did not find _config.yml in the current directory!'
        assert pr_msg_path.exists(), 'Did not find in the current directory!'
        # replace content of README with template
        readme = readme_template_path.read_text().replace('{_username_}', username).replace('{_repo_name_}', repo_name)
        # update _config.yml
        cfg = config_path.read_text()
        cfg = re.sub(r'^(github_username: )(fastai)', r'\1{}'.format(username), cfg, flags=re.MULTILINE)
        cfg = re.sub(r'^(baseurl: )("")', r'\1"/{}"'.format(repo_name), cfg, flags=re.MULTILINE)
        cfg = re.sub(r'^(github_repo: ")(fastpages)', r'\1{}'.format(repo_name), cfg, flags=re.MULTILINE)
        cfg = re.sub(r'^(url: "https://)(")', r'\1{}\3'.format(username), cfg, flags=re.MULTILINE)
        # prepare the pr message
        pr = pr_msg_path.read_text().replace('{_username_}', username).replace('{_repo_name_}', repo_name)
      shell: python

    - name: commit changes
      run: |
        git config --global "${GH_EMAIL}"
        git config --global "${GH_USERNAME}"
        git checkout -B fastpages-automated-setup
        git rm CNAME action.yml _checkbox.png
        git rm _notebooks/2020-02-21-introducing-fastpages.ipynb
        git rm .github/workflows/chatops.yaml
        git rm -rf .github/ISSUE_TEMPLATE
        git add _config.yml
        git commit -m'setup repo'
        git push -f --set-upstream origin fastpages-automated-setup
        GH_EMAIL: ${{ github.event.commits[0] }}
        GH_USERNAME: ${{ github.event.commits[0].author.username }}

    - name: Open a PR
      uses: actions/github-script@0.5.0
        github-token: ${{secrets.GITHUB_TOKEN}}
        script: |
          var fs = require('fs');
          var contents = fs.readFileSync('', 'utf8');
                        owner: context.repo.owner,
                        repo: context.repo.repo,
                        title: 'Initial Setup',
                        head: 'fastpages-automated-setup',
                        base: 'master',
                        body: `${contents}`

In particular, the line if: (github.event.commits[0].message == 'Initial commit') got me wondering: what if we use commit messages to perform some other actions?

For example, something I keep wondering about is how to generate Binder environment specifications that can be easily reused. I’ve pondered this before in the context of “Binder base boxes”, the most useful approach (I think) being to define a basebox repo that is prebuilt and then nbgitpull your own repo into it.

Another approach I’ve idly wondered about was a simple script that could generate binder/ setups for you. For example binder_base chemistry might generate you a binder/ directory with apt.txt, requirements.txt and postBuild files preconfigured with packages that are relevant to working with chemistry related content; binder_base astronomy might create you a binder/environment.yml that will pull in a load of astronomy packages. Other switches might let you automatically add-in config info around package installation and setup for various extensions, and so on.

Putting these two together, I can imagine a commit message that would call an action that could:

  • create a domain relevant set of binder/ files;
  • commit them to the repo (if permissions available), or create a PR.

Smithsonian 3D Museum Artefacts

Via an O’Reilly Radar / Four short links post (via my RSS reader, obvs…), I learn about the Smithsonian Open Access site (and from that I remember I used to love the whole GLAM / open api thang. Why did I ever stop playing around with that stuff?)

One area of the site provides a view over datasets (lots of weather/meteorology data?!), another access to 3D models (though no models of skeleton clocks that I could see, as yet?!).

The 3D model viewer — Voyager — is open source (smithsonian/dpo-voyager) and available as a standalone or embedded web component.

There’s also a tool and workflow for creating a “story” around a 3D model that lets you:

  • set the pose of the object;
  • capture a 2D rendering of the object;
  • tweak background settings;
  • annotate the model in 3D space;
  • associate an HTML article with an object so it can be displayed alongside the object in an intergrated view;
  • create an interactive tour that provides “an animated walk through a Voyager scene [consisting of] a number of steps”.

The document JSON based SVX format used by the Smithsonian Voyager “resembles glTF, the standard for serving 3D scenes on the web”.

This might be really interesting thing to explore in the context of refreshing some OpenLearn materials?

PS by the by, following through on some of the glTF stuff, I come across this gallery of glTF models — Sketchfab — and some models from the University of Exeter:  exeterdigitalhumanities. Good to see an HEI getting their warez into public spaces…

Clock Watching

Last week, as something of rather an impulse purchase, I bought a 19th century skeleton clock from a clock shop wandered past, by chance, in Teignmouth (“Tinmuth”, I think?) — Time Flies:

The clock’s back home now, and I’m slowly starting to learn about it (so if I talk nonsense in this post, please feel free to pick me up on it via the comments!):

As a first time clock owner, it’s fascinating trying to set it up. The period is tweaked via the pendulum — lengthen the pendulum and you slow down time (i.e. fix a fast running clock). It seems to be running a bit slow at the moment, so I need to raise the pendulum slightly, but I figure waiting another 18 hours or so to give it another full day’s run to see what the daily error is. (I suspect it’s still getting used to ambient temperatures etc., and settling in after it’s trip home.) There is some (deliberate? consequence of age?) freedom in how the wheels align, and one of those definitely seemed out, so I pushed it back, only to have the clock stop after 20 mins or so as various bits of my tinkering seem to have compounded the wrong way: the energy supply must be sensitively tuned relative to the amount of friction that can be introduced into the system.

Slightly more off-putting was a clunk on the rise to the hour, increasing in frequency, and then a slowing after the hour. There’s a single strike (I guess that’s an example of a complication, unless complications only refer to watches???), so I wondered if it could be something to do with the eccentricity of that; but it had more of a sound of something slipping or giving way, which I fancied might have something to do with the fusee powertrain:

Having emailed a quick audio grab to the clockshop:

a response quickly came back  that, firstly, it was very off-beat, (which I’d been introduced to in the shop as one of the things that could go “wrong” with it), and of less concern that the clunk was likely a thing, perhaps with the fusee mechanism, that would probably start to settle down as the clock found its way and tempered in:

Taking a look at the audio clip in Audacity, it’s easy to see that the tick and the tock were not evenly spaced:

The fix, as I’d been shown in the shop, and clarified via the “Andrew Clayton, Clock Repairs” website, from which the below image was taken, was to “bend the crutch”:

My warped logic for which way to bend the crutch (the bit at the back) was towards from the tock side, figuring that the clock need to spend less time getting back to the centrepoint from that side. So right hand high and push low with the left, counter to the above example.

Things are a bit better now (though a little bit more adjustment is still required), and the clunking seemed to have settled a bit too although it seems to have just come back now the temperature in the house is changing as night falls and the heating does whatever the heating does:

One thing I did notice having got the beat (I thought) sorted was that it really needs setting in situ. I’d got a pretty good beat going with the clock sat on a rug, but when I moved it back it went off again: presumably the level was slightly off one location relative to the other. A small two-asix spirit level is now on my “must-get-one-of-those” list.

Quite a fascinating machine, really, and something to learn the ways of over the days, weeks, months and years. It’s an eight-day wind and needs a service at least every 20 years, apparently…

In passing, and in trying to start looking for services of vocabulary (pretty much all learning is based, in part at least, at getting the vocabulary down and relating that to what you can see, and hear…), I came across various menions of Parliament clocks, named after the short lived Duties on Clocks and Watches Act, 1797, and the idea of a marriage, a clock in a non-original case,  (from a “marriage of unrelated parts”).

There’s a lot there that might be interesting to explore for a story or two, methinks…

And it’s far more interesting than digital tech…