Datadive Reproducibility – Time for a DataBox?

Whilst at the Global Witness “Beneficial Ownership” datadive a couple of weeks ago, one of the things I was pondering  – how to make the weekend’s discoveries reproducible on the one hand, useful as a set of still working legacy tooling on the other – blended into another: how to provide an on-ramp for folk attending the event who were not familiar with the data or the way in which t was provided.

Event facilitators DataKind worked in advance with Global Witness to produce an orientation exercise based around a sample dataset. Several other prepped datasets were also made available via USB memory sticks distributed as required to the three different working groups.

The orientation exercise was framed as a series of questions applied to a core dataset, a denormalised flat 250MB or so CSV file containing just over a million or so rows, with headers. (I think Excel could cope with this – not sure if that was by design or happy accident.)

For data wranglers expert at working with raw datafiles and their own computers, this doesn’t present much of a problem. My gut reaction was to open the datafile into a pandas dataframe in a Jupyter notebook and twiddle with it there; but as pandas holds dataframes in memory, this may not be the best approach, particularly if you have multiple large dataframes open at the same time. As previously mentioned, I think the data also fit into Excel okay.

Another approach after previewing the data, even if just by looking at it on the command line with a head command, was to load the data into a database and look at it from there.

This immediately begs several questions of course  – if I have a database set up on my machine and import the database without thinking about it, how can someone else recreate that? If I don’t have a database on my machine (so I need to install one and get it running) and/or I don’t then know how to get data into the database, I’m no better off. (It may well be that there are great analysts who know how to work with data stored in databases but don’t know how to do the data engineering stuff of getting the database up and running and populated with data in the first place.)

My preferred solution for this at the moment is to see whether Docker containers can help. And in this case, I think they can. I’d already had a couple of quick plays looking at getting the Companies House significant ownership data into various databases (Mongo, neo4j) and used a recipe that linked a database container with a Jupyter notebook server that I could write my analysis scripts in (linking RStudio rather than Jupyter notebooks is just as straightforward).

Using those patterns, it was easy enough to create a similar recipe to link a Postgres database container to a Jupyter notebook server. The next step – loading the data in. Now it just so happens that in the days before the datadive, I’d been putting together some revised notebooks for an OU course on data management and analysis that dealt with quick ways of loading data into a Postgres data, so I wondered whether those notes provided enough scaffolding to help me load the sample core data into a database: a) even if I was new to working with databases, and b) in a reproducible way. The short answer was “yes”. Putting the two steps together, the results can be found here: Getting started – Database Loader Notebook.

With the data in a reproducibly shareable and “live” queryable form, I put together a notebook that worked through the orientation exercises. Along the way, I found a new-to-me HTML5/d3js package for displaying small  interactive network diagrams, visjs2jupyter. My attempt at the orientation exercises can be found here: Orientation Activities.

Whilst I am all in favour of experts datawranglers using their own recipes, tools and methods for working with the data – that’s part of the point of these expert datadives – I think there may also be mileage in providing a base install where the data is in some sort of immediately queryable form, such as in a minimal, even if not properly normalised, database. This means that datasets too large to be manipulated in memory or loaded into Excel can be worked with immediately. It also means that orientation materials can be produced that pose interesting questions that can be used to get a quick overview of the data, or tutorial materials produced that show how to work with off-the-shelf powertool combinations (Jupyter notebooks / Python/pandas / PostgreSQL, for example, or RStudio /R /PostgreSQL ).

Providing a base set up to start from also acts as an invitation to extend that environment in a reproducible way over the course of the datadive. (When working on your own computer with your own tooling, it can be way too easy to forget what packages (apt-get, pip and so on) you have pre-installed that will cause breaking changes to any outcome code you show with others who do not have the same environment. Creating a fresh environment for the datadive, and documenting what you add to it, can help with that, but testing in a linked container, but otherwise isolated, context really helps you keep track of what you needed to add to make things work!

If you also keep track of what you needed to do handle undeclared file encodings, weird separator characters, or password protected zip files from the provided files, it means that others should be able to work with the files in a reliable way…

(Just a note on that point for datadive organisers – metadata about file encodings, unusual zip formats, weird separator encodings etc is a useful thing to share, rather than have to painfully discover….)

Using tools like Docker is one way of improving the shareability of immediately queryable data, but is there an even quick way? One thing I want to explore on my to do list is the idea of a “databox”, a Raspberry Pi image that when booted runs a database server and Jupyter notebook (or RStudio) environment. The database can be pre-seeded with data for the datadive, so all that should be required is for an individual to plug the Raspberry Pi into their computer with an ethernet cable, and run from there. (This won’t work for really large datasets – the Raspberry Pi lacks grunt – but it’s enough to get you started.)

Note that these approaches scale out to other domains, such as data journalism projects (each project on its own Raspberry PI SD card or docker-compose setup…)

Jupyter Notebooks as Part of a Publishing System – “Executable” Inline Maths and Music Notations

One of the books I’m reading at the moment is Michael Hiltzik’s Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age (my copy is second hand, ex-library stock…), birthplace to ethernet and the laser printer, as well as many of the computer user interactions we take for granted today. One thing I hadn’t fully appreciated was Xerox’s interests in publishing systems, which is in part what put it in mind for this post. The chapter I just finished reading tells of their invention of a modeless, WYSIWYG word processor, something that would be less hostile than the mode based editors of the time (I like the joke about accidentally entering command mode and typing edit – e: select entire document, d: delete selection, i:insert, t: the letter inserted. Oops – you just replaced your document with the letter t).

It must have been a tremendously exciting time there, having to invent the tools you wanted to use because they didn’t exist yet (some may say that’s still the case, but in a different way now, I think: we have many more building blocks at our disposal). But it’s still an exciting time, because while a lot of stuff has been invented, whether or not there is more to come, there are still ways of figuring out how to make it work easier, still ways of figuring out how to work the technology into our workflows in more sensible way, still many, many ways of trying to figure out how to use different bits of tech in combination with each other in order to get what feels like much more than we might reasonably expect from considering them as a set of separate parts, piled together.

One of the places this exploration could – should – take place is in education. Whilst at HE we often talk down tools in place of concepts, introducing new tools to students provides one way of exporting ideas embodied as tools into wider society. Tools like Jupyter notebooks, for example.

The  more I use Jupyter notebooks, the more I see their potential as a powerful general purpose tool not just for reproducible research, but also as general purpose computational workbench and as a powerful authoring medium.

Enlightened publishers such as O’Reilly seem to have got on board with using interactive notebooks in a publishing context (for example, Embracing Jupyter Notebooks at O’Reilly) and colleges such as Bryn Mawr in the US keep coming up with all manner of interesting ways of using notebooks in a course context – if you know of other great (or even not so great) use case examples in publishing or education, please let me know via the comments to this post – but I still get the feeling that many other people don’t get it.

“Initially the reaction to the concept [of the Gypsy, GUI powered wordprocessor that was to become part of the Ginn publishing system] was ‘You’re going to have to drag me kicking and screaming,'” Mott recalled. “But everyone who sat in front of that system and used it, to a person, was a convert within an hour.”
Michael Hiltzik, Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age, p210

For example, in writing computing related documents, the ability to show a line of code and the output of that code, automatically generated by executing the code, and then automatically inserted into the document, means that when writing code examples, “helpful corrections” by an over-zealous editor go out of the window. The human hand should go nowhere near the output text.


Similarly when creating charts from data, or plotting equations: the charts should be created from the data or the equation by running a script over a source dataset, or plotting an equation directly.


Again, the editor, or artist, should have no hand in “tweaking” the output to make it look better.

If the chart needs restyling, the artist needs to learn how to use a theme (like this?!) or theme generator rather then messing around with a graphics package (wrong sort of graphic). To add annotations, again, use code because it makes the graphic more maintainable.


We can also use various off-the-shelf libraries to generate HTML/Javascript fragments for creating inline interactives that can be embedded within the notebook, or saved and then reused elsewhere.


There are also several toolkits around for creating other sorts of diagram from code, as I’ve written about previously, such as the tools provided on


Aside from making diagrams more easily maintainable, rendering them inline within a Jupyter notebook that also contains the programmatic “source code” for the diagram, written diagrams also provide a way in to the automatic generation of figure londesc text.

Electrical circuit schematics can also be written and embedded in a Jupyter notebook, as this Schemdraw example shows:


So far, I haven’t found an example of a schematic plotting library that also allows you to simulate the behaviour of the circuit from the same definition though (eg I can’t simulate(d, …) in the above example, though I could presumably parameterise a circuit definition for a simulation package and use the same parameter values to label a corresponding Schemdraw circuit).

There are some notations that are “executable”, though. For example, the sympy (symbolic Python) package lets you write texts using python variables that can be rendered either as a symbol using mathematical notation, or by their value.


(There’s a rendering bug in the generated Mathjax in the notebook I was using – I think this has been corrected in more recent versions.)

We can also use interactive widgets to help us identify and set parameter values to generate the sort of example we want:


Sympy also provides support for a wide range of calculations. For example, we can “write” a formula, render it using mathematical notation, and then evaluate it. A Jupyter notebook plugin (not shown) allows python statements to be included and executed inline, which means that expressions and calculations can be included – and evaluated – inline. Changing the parameters in an example is then easy to achieve, with the added benefit that the guaranteed correct result of automatically evaluating the modified expression can also be inlined.


(For interactive examples, see the notebooks in the sympy folder here; the notebooks are also runnable by launching a mybinder container – click on the launch:binder button to fire one up.) 

It looks like there are also tools out there for converting from LateX math expressions to sympy equivalents.

As well as writing mathematical expressions than can be both expressed using mathematical notation, and evaluated as a mathematical expression, we can also write music, expressing a score in notational form or creating an admittedly beepy audio file corresponding to it.


(For an interactive example, run the midiMusic.ipynb notebook by clicking through on the launch:binder button from here.)

We can also generate audio files from formulae (I haven’t tried this in a sympy context yet, though) and then visualise them as data.


Packages such as librosa also seem to provide all sorts of tools for analysing an visualising audio files.

When we put together the Learn to Code MOOC for FutureLearn, which uses Jupyter notebooks as an interactive exercise environment for learners, we started writing the materials in (web pages for the FutureLearn teaching text, notebooks for the interactive exercises) in Jupyter notebooks. The notebooks can export as markdown, the FutureLearn publishing systems is based around content entered as a markdown, so we should have been able to publish direct from the notebooks to FutureLearn, right? Wrong. The workflow doesn’t support it: editor takes content in Microsoft Word, passes it back to authors for correction, then someone does something to turn it into markdown for FutureLearn. Or at least, that’s the OU’s publishing route (which has plenty of other quirks too…).

Or perhaps will be was the OU’s publishing route, because there’s a project on internally (the workshops around which I haven’t been able to make, unfortunately) to look at new authoring environments for producing OU content, though I’m not sure if this is intended to feed into the backend of the current route – Microsoft Word, Oxygen XML editor, OU-XML, HTML/PDF etc output – or envisages a different pathway to final output. I started to explore using Google docs as an OU XML exporter, but that raised little interest – it’ll be interesting to see what sort of authoring environment(s) the current project delivers.

(By the by, I remember being really excited about the OU-XML a publishing system route when it was being developed, not least because I could imagine its potential for feeding other use cases, some of which I started to explore a few years later; I was less enthused by its actual execution and the lack of imagination around putting it to work though… I also thought we might be able to use FutureLearn as a route to exploring how we might not just experiment with workflows and publishing systems, but also the tech – and business models around the same – for supporting stateful and stateless interactive, online student activities. Like hosting a mybinder style service, for example, or embedded interactions like the O’Reily Thebe demo, or even delivering a course as a set of linked Jupyter notebooks. You can probably guess how successful that’s been…)

So could Jupyter notebooks have a role to play in producing semi-automated content (automated, for example in the production of graphical objects and the embedding of automatically evaluated expressions)? Markdown support is already supported and it shouldn’t take someone too long (should it?!) to put together an nbformat exporter that could generate OU-XML (if that is still the route we’re going?)? It’d be interesting to hear how O’Reilly are getting on…

Whatever, again…

Simple Demo of Green Screen Principle in a Jupyter Notebook Using MyBinder

One of my favourite bits of edtech  in the form of open educational technology infrastucture at the moment is mybinder (code), which allows you to fire up a semi-customised Docker container and run Jupyter notebooks based on the contents of a github repository. This makes is trivial to share interactive, Jupyter notebook demos, as long as you’re happy to make your notebooks public and pop them into github.

As an example, here’s a simple notebook I knocked up yesterday to demonstrate how we could created a composited image from a foreground image captured against a green screen, and a background image we wanted to place behind our foregrounded character.

The recipe was based on one I found in a Bryn Mawr College demo (Bryn Mawr is one of the places I look to for interesting ways of using Jupyter notebooks in an educational context.)

The demo works by looking at each pixel in turn in the foreground (greenscreened) image and checking its RGB colour value. If it looks to be green, use the corresponding pixel from the background image in the composited image; if it’s not green, use the colour values of the pixel in the foreground image.

The trick comes in setting appropriate threshold values to detect the green coloured background. Using Jupyter notebooks and ipywidgets, it’s easy enough to create a demo that lets you try out different “green detection” settings using sliders to select RGB colour ranges. And using mybinder, it’s trivial to share a copy of the working notebook – fire up a container and look for the Green screen.ipynb notebook: demo notebooks on mybinder.


(You can find the actual notebook code on github here.)

I was going to say that one of the things I don’t think you can do at the moment is share a link to an actual notebook, but in that respect I’d be wrong… The reason I thought was that to launch a mybinder instance, eg from the psychemedia/ou-tm11n github repo, you’d use a URL of the form; this then launches a container instance at a dynamically created location – eg http://SOME_IP_ADDRESS/user/SOME_CONTAINER_ID – with a URL and container ID that you don’t know in advance.

The notebook contents of the repo are copied into a notebooks folder in the container when the container image is built from the repo, and accessed down that path on the container URL, such as http://SOME_IP_ADDRESS/user/SOME_CONTAINER_ID/notebooks/Green%20screen%20-%20tm112.ipynb.

However, on checking, it seems that any path added to the mybinder call is passed along and appended to the URL of the dynamically created container.

Which means you can add the path to a notebook in the repo to the notebooks/ path when you call mybinder – – and the path will will passed through to the launched container.

In other words, you can share a link to a live notebook running on dynamically created container – such as this one – by calling mybinder with the local path to the notebook.

You can also go back up to the Jupyter notebook homepage from a notebook page by going up a level in the URL to the notebooks folder, eg .

I like mybinder a bit more each day:-)

Making Music and Embedding Sounds in Jupyter Notebooks

It’s looking as if the new level 1 courses won’t be making use of Jupyter notebooks (unless I can find a way of sneaking them in via the single unit I’be put together!;-) but I still think they’re worth spending time exploring for course material production as well as presentation.

So to this end, as I read through the materials being drafted by others for the course, I’ll be looking for opportunities to do the quickest of quick demos, whenever the opportunity arises, to flag things that might be worth exploring more in future.

So here’s a quick example. One of the nice design features of TM112, the second of the two new first level courses, is that it incorporates some mimi-project activities for students work on across the course. One of the project themes relates to music, so I wondered what doing something musical in a Jupyter notebook might look like.

The first thing I tried was taking the outlines of one of the activities – generating an audio file using python and MIDI – to see how the embedding might work in a notebook context, without the faff of having to generate an audio file from python and then find a means of playing it:


Yep – that seems to work… Poking around music related libraries, it seems we can also generate musical notation…


In fact, we can also generate musical notation from a MIDI file too…


(I assume the mappings are correct…)

So there may be opportunities there for creating simple audio files, along with the corresponding score, within the notebooks. Then any changes required to the audio file, as well as the score, can be effected in tandem.

I also had a quick go at generating audio files “from scratch” and then embedding the playable audio file



That seems to work too…

We can also plot the waveform:


This might be handy for a physics or electronics course?

As well as providing an environment for creating “media-ful” teaching resources, the code could also provide the basis of interactive student explorations. I don’t have a demo of any widget powered examples to hand in a musical context (maybe later!), but for now, if you do want to play with the notebooks that generated the above, you can do so on mybinder – – in the midiMusic.ipynb and Audio.ipynb notebooks. The original notebooks are here:

PS this looks like it could be handy:, a Jupyter notebook extension to render abc notation, h/t Antti Kaihola.

Accessible Jupyter Notebooks?

Pondering the extent to which Jupyter notebooks provide an accessible UI, I had a naive play with the Mac VoiceOver app run over Jupyter notebooks the other day: markdown cells were easy enough to convert to speech, but the code cells and their outputs are nested block elements which seemed to take a bit more navigation (I think I really need to learn how to use VoiceOver properly for a proper test!). Suffice to say, I really should learn how to use screen-reader software, because as it stands I can’t really tell how accessible the notebooks are…

A quick search around for accessibility related extensions turned up the jupyter-a11y: reader extension [code], which looks like it could be a handy crib. This extension will speak aloud a the contents of a code cell or markdown cell as well as navigational features such as whether you are in the cell at the top or the bottom of the page. I’m not sure it speaks aloud the output of code cell though? But the code looks simple enough, so this might be worth a play with…

On the topic of reading aloud code cell outputs, I also started wondering whether it would be possible to generate “accessible” alt or longdesc text for matplotlib generated charts and add those to the element inserted into the code cell output. This text could also be used to feed the reader narrator. (See also First Thoughts on Automatically Generating Accessible Text Descriptions of ggplot Charts in R for some quick examples of generating textual descriptions from matplotlib charts.)

Another way of complementing the jupyter-a11y reader extension might be to use the python pindent [code] tool to annotate the contents of code cells with accessible comments (such as comments that identify the end of if/else blocks, and function definitions). Another advantage of having a pindent extension to annotate the content of notebook python code cells is that it might help improve the readability of code for novices. So for example, we could have a notebook toolbar button that will toggle pindent annotations on a selected code cell.

For code read aloud by the reader extension, I wonder if it would be worth running the content of any (python) code cells through pindent first?

PS FWIW, here’s a related issue on Github.

PPS another tool that helps make python code a bit more accessible, in an active sense, in a Jupyter notebook is this pop-up variable inspector widget.

Querying Panama Papers Neo4j Database Container From a Linked Jupyter Notebook Container

A few weeks ago I posted some quick doodles showing, on the one hand, how to get the Panama Papers data into a simple SQLite database and in another how to link a neo4j graph database to a Jupyter notebook server using Docker Compose.

As the original Panama Papers investigation used neo4j as its backend database, I thought putting the data into a neo4j container could give me the excuse I needed to start looking at neo4j.

Anyway, it seems as if someone has already pushed a neo4j Docker container image preseeded with the Panama Papers data, so here’s my quickstart.

To use it, you need to have Docker installed, download the docker-compose.yaml file and then run:

docker-compose up

If you do this from a command line launched from Kitematic, Kitematic should provide you with a link to the neo4j database, running on the Docker IP address and port 7474. Log in with the default credentials ( neo4j / neo4j ) and change the password to panamapapers (all lower case).

Download the quickstart notebook into the newly created notebooks directory, and you should be able to see it from the notebooks homepage on Docker IP address port 8890 (or again, just follow the link from Kitematic).

image: ryguyrg/neo4j-panama-papers
image: jupyter/scipy-notebook
image: rocker/rstudio
# image: jupyter/r-notebook
# ports:
# – "8889:8888"
# links:
# – neo4j:neo4j
# volumes:
# – ./notebooks:/home/jovyan/work

view raw
hosted with ❤ by GitHub

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

view raw
hosted with ❤ by GitHub

I’m still trying to find my way around both the py2neo Python wrapper and the neo4j Cypher query language, so the demo thus far is not that inspiring!

And I’m not sure when I’ll get a chance to look at it again…:-(

Using IPython on Lego EV3 Robots Running Ev3Dev

In part so I don’t lose the recipe, here are some notes for getting up and running with IPython on a Lego EV3 brick.

The command lines are prefixed to show whether we’re running them on the Mac or the brick…

To start with, we need to flash a microSD card with an image of the ev3dev operating system. The instructions are on the ev3dev site – Writing an SD Card Image Using Command Line Tools on OS X. – , but I repeat the ones for a Mac here for convenience.

  1. Download an image from the repository – I used the ev3dev-jessie-2015-12-30 image because the current development version is in a state of flux and the python bindings don’t necessarily work with it just at the moment…
  2. Assuming you have downloaded the file to your home Downloads directory (that is, ~/Downloads), launch a new terminal and run: cd Downloads
    [Mac] unzip
    [Mac] diskutil list
  3. Put the microSD card (at least 2GB, but no more than 16GB, I think? I used a 4GB microSD (HC) card) into an SD card adapter and pop it into the Mac SD card reader. Check that it’s there and what it’s called; (you’re looking for the new disk…):
    [Mac] diskutil list
  4. Now unmount the new disk corresponding the the SD card: diskutil unmountDisk /dev/disk1s1Downloads_—_robot_ev3dev____—_ssh_—_112×25
    If you don’t see the /dev/desk1 listing, check that the write protect slider on your SD card holder isn’t in the write protect position.
  5. We’re going to write some raw bits to the card (/dev/disk1 in my example), so we need to write to /dev/rdisk1 (note the r before the disk name). The write will take some time – 5 minutes or so – but if you’re twitchy, ctrl-T will show you a progress message). You’ll also need to enter your Mac password. (NOTE: if you use the wrong disk name, you can completely trash your computer. So be careful;-)
    [Mac] sudo dd if=~Downloads/ev3-ev3dev-jessie-2015-12-30.img of=/dev/rdisk1 bs=4m
    GOTCHA: when I tried, I got a Permission Denied message at first. It seems that for some reason my Mac thought the SD card was write protected. On the SD card adapter is a small slider that sets the card to “locked” or “unlocked”. The SD card reader in the Mac is a mechanical switch that detects which state the slider is in and whether the card is locked. I don’t know if it was a problem with the card adapter or the Mac reader, but I took the card reader out of the Mac, changed the slider setting, put card reader back in, and did the unmount and then sudo dd steps again. It still didn’t work, so I took the card out again, put the slider back the other way, and tired again. This time it did work.
  6. Once the copy is complete, take the SD card adapter out, remove the microSD car and put in in the EV3 brick. Start the EV3 brick in the normal way.

Hopefully, you should the brick boot into the Brickman UI (it may take two or three minutes, include a period when the screen is blank and the orange lights are ticking for a bit…)

Navigate the brick settings to the Networks menu, select Wifi and scan for your home network. Connect to the network (the password settings will be saved, so you shouldn’t have to enter them again).

By default, the IP address of the brick should be set to To make life easier, I set up passwordless ssh access to the brick. Using some guidance in part originally from here, I accessed the brick from my Mac terminal and set up an ssh folder. First, log in to the brick from the Mac terminal:

[Mac] ssh robot@

When prompted, the password for the robot user is maker.


This should log you in to the brick. Run the following command to create an ssh directory into which the shh key will be placed, and then exit the brick commandline to go back to the Mac command line.

[Brick] install -d -m 700 ~/.ssh
[Brick] exit

Create a new key on your Mac, and then copy it over to the brick:

[Mac] ssh-keygen -R
[Mac] cat ~/.ssh/ | ssh robot@ 'cat > .ssh/authorized_keys'
You will be prompted again for the password – maker – but next time you try to log in to the brick, you should be able to do so without having to enter the password. (Instead, the ssh key will be used to get you in.)


If you login to the brick again – from the Mac commandline, run:

[Mac] ssh robot@

you should be able to run a simple Python test program. Attach a large motor to input A on the brick. From the brick commandline, run:

[Brick] python

to open up a Python command prompt, and then enter the following commands to use the preinstalled ev3dev-lang-python Python bindings to run the motor for a few seconds:

[Python] import ev3dev.ev3 as ev3
[Python] m = ev3.LargeMotor('outA')
[Python] m.run_timed(time_sp=3000, duty_cycle_sp=75)


[Python] exit

to exit from the Python interpreter.

Now we’re going to install IPython. Run the following commands on the brick command line (update, but DO NOT upgrade the apt packages). If prompted for a password, it’s still maker:

[Brick] sudo apt-get update
[Brick] sudo apt-get install -y ipython

You should now be able to start an IPython interpreter on the brick:

[Brick] ipython

The Python code to test the motor should still work (hit return it you find you are stuck in a code block). Typing:

[Brick] exit

will take you out of the interpreter and back to the brick command line.

One of the nice things about IPython is that we can connect to it remotely. What this means is that I can start an IPython editor on my Mac, but connect it to an IPython process running on the brick. To do this, we need to install another package:

[Brick] sudo apt-get install -y python-zmq

Now you should be able to start an IPython process on the brick that we can connect to from the Mac:

[Brick] ipython kernel

The process will start running and you should see a message of the form:

To connect another client to this kernel, use:
--existing kernel-2716.json

This file contains connections information about the kernel.

Now open another terminal on the Mac, (cmd-T in the terminal window should open a new Terminal tab) and let’s see if we can find where that file is. Actually, here’s a crib – in the second terminal window, go into the brick:

[Mac] ssh robot@

And then on the brick command line in this second terminal window, show a listing of a the directory that should contain the connection file:

[Brick] sudo ls /home/robot/.ipython/profile_default/security/

You should see the –existing kernel-2716.json file (or whatever it’s exactly called) there. Exit the brick command line:

[Brick] exit

Still in the second terminal window, and now back on the Mac command, copy the connection file from the brick to your current directory on the Mac:

[Mac] scp robot@ ./

If you have IPython installed on you Mac, you should now be able to open an IPython interactive terminal on the Mac that is connected to the IPython kernel running on the brick:

[Mac] ipython console --existing ${PWD}/kernel-2716.json --ssh robot@

You should be able to execute the Python motor test command as before (remember to import the necessary ev3dev.ev3 package first).

Actually, when I ran the ipython console command, the recent version of jupyter on my Mac gave me a depreaction warning which means I would have been better running:

[Mac] jupyter console --existing ${PWD}/kernel-2716.json --ssh robot@

So far so good – can we do any more with this?

Well, yes, a couple of things. When starting the IPython process on the brick, we could force the name and location of the connection file:

[Mac] ipython kernel -f /home/robot/connection-file.json

Then on the Mac we could directly copy over the connection file:

[Mac] scp robot@ ./

Secondly, we can configure a Jupyter notebook server running on the Mac to so that it will create a new IPython process on the brick for each new notebook.

Whilst you can configure this yourself, it’s possibly easy to make use of the remote_ikernel helper:

[Mac] pip3 install remote_ikernel
[Mac] remote_ikernel manage --add --kernel_cmd="ipython kernel -f {connection_file}" --name="Ev3dev" --interface=ssh --host=robot@

Now when you should be able to connect to a notebook run against an IPython kernel on the brick.


Note that it may take a few seconds for the kernel to connect and the first cell to run – but from then on it should be quite responsive.


To show a list of available kernels for a particular jupyter server, run the following in a Jupyter code cell:

import jupyter_client

PS for ad hoc use, I thought it might be useful to try setting up a quick wifi router that I could use to connect the brick to my laptop in the form of an old Samsung Galaxy S3 android phone (without a SIM card). Installing the Hotspot Control app provided a quick way of doing this… and it worked:-)

PPS for a more recent version of IPython, install it from pip.

If you installed IPython using the apt-get route, uninstall it with:

[Brick] sudo apt-get uninstall ipython

Install pip and some handy supporting tools that pip may well require at some point:

[Brick] sudo apt-get install build-essential python-dev


[Brick] sudo apt-get install python-pip

would run an old version of pippip --version shows 1.5.6 – which could be removed using sudo apt-get remove python-setuptools.

To grab a more recent version, use:

[Brick] wget
[Brick] sudo -H python

which seems to take a long time to run with no obvious sign of progress, and then tidy up:

[Brick] rm

Just to be sure, then update it:

[Brick] sudo pip install --upgrade setuptools pip

which also seems to take forever. Then install IPython:

[Brick] sudo pip install -y ipython

I’m also going to see if I can give IPythonwidgets a go, although the install requirements looks as if it’ll bring down a huge chunk of the Jupyter framework too, and I’m not sure that’s all necessary?

[Brick] sudo pip install ipywidgets

For a lighter install, try sudo pip install --no-deps ipywidgets to install ipywidgets without dependencies. The only required dependencies are ipython>=4.0.0, ipykernel>=4.2.2 and traitlets>=4.2.0;.

The widgets didn’t seem to work at first, but it seems that following a recent update to the Jupyter server on host, it needed a config kick before running jupyter notebook:

jupyter nbextension enable --py --sys-prefix widgetsnbextension

PPS It seems to take a bit of time for the connection to the IPython server to be set up:


The Timeout occurs quite quickly, but then I have to wait a few dozen seconds for the tunnels on ports to be set up. Once this is done, you should be able to run things in a code cell. I usually start with print("Hello World!") ;-)

PPPS For plotting charts:

sudo apt-get install -y python-matplotlib

Could maybe even go the whole hog…

sudo apt-get install -y python-pandas
sudo pip install seaborn

PPPPS Here’s my current build file (under testing atm – it takes about an hour or so…) –, so:
[Mac] scp robot@
[Brick] chmod +x
[Brick] sudo ./

sudo apt-get update
sudo apt-get install -y build-essential python-dev
sudo apt-get install -y python-zmq python-matplotlib python-pandas

sudo -H python
sudo pip install --upgrade setuptools pip

sudo pip install ipython ipykernel traitlets seaborn
sudo pip install --no-deps ipywidgets

PPPPPS to clone the SD card on a Mac, insert the SD card and run:

[Mac] diskutil list
[Mac] sudo dd if=/dev/disk1 of=~/Desktop/my_ev3dev_image.dmg

The corresponding restore (in the process described at the start of this post) would use /my_ev3dev_image.dmg rather than the ev3-ev3dev-jessie-2015-12-30.img image.

PPPPPPS Connecting to remote kernel on brick – start IPyhton kernel on brick:

[Brick] ipython kernel -f /home/robot/test.json

Copy the connection file over to the host:
[Mac] scp robot@ ./

Check the path you copied it to
[Mac] pwd

For me, that returned /Users/ajh59.

Start a console on the host using the existing connection file – use a full, explicit path to the file. Also works with things like Spyder:

[Mac] jupyter console --existing /Users/ajh59/test.json --ssh robot@


Steps Towards Some Docker IPython Magic – Draft Magic to Call a Contentmine Container from a Jupyter Notebook Container

I haven’t written any magics for IPython before (and it probably shows!) but I started sketching out some magic for the Contentmine command-line container I described in Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers,

What I’d like to explore is a more general way of calling command line functions accessed from arbitrary containers via a piece of generic magic, but I need to learn a few things along the way, such as handling arguments for a start!

The current approach provides crude magic for calling the contentmine functions included in a public contentmine container from a Jupyter notebook running inside a container. The commandline contentmine container is started from within the notebook contained and uses a volume-from the notebook container to pass files between the containers. The path to the directory mounted from the notebook is identified by a bit of jiggery pokery , as is the method for spotting what container the notebook is actually running in (I’m all ears if you know of a better way of doing either of these things?:-)

The magic has the form:

%getpapers /notebooks rhinocerous

to run the getpapers query (with fixed switch settings for now) and the search term rhinocerous; files are shared back from the contentmine container into the .notebooks folder of the Jupyter container.

Other functions include:

%norma /notebooks rhinocerous
%cmine /notebooks rhinocerous

These functions are applied to files in the same folder as was created by the search term (rhinocerous).

The magic needs updating so that it will also work in a Jupyter notebook that is not running within a container – this should simply be just of case of switching in a different directory path. The magics also need tweaking so we can pass parameters in. I’m not sure if more flexibility should also be allowed on specifying the path (we need to make sure that the paths for the mounted directories are the correct ones!)

What I’d like to work towards is some sort of line magic along the lines of:

%docker psychemedia/contentmine -mountdir /CALLING_CONTAINER_PATH -v ${MOUNTDIR}:/PATH COMMAND -ARGS etc

or cell magic:

%%docker psychemedia/contentmine -mountdir /CALLING_CONTAINER_PATH -v ${MOUNTDIR}:/PATH

Note that these go against the docker command line syntax – should they be closer to it?

The code, and a walked through demo, are included in the notebook available via this gist, which should also be embedded below.

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
#Start the notebook server linked to the contentmins container as follows:
#docker-compose up -d
image: jupyter/notebook
- "8899:8888"
- contentmineshare
- ./notebooks:/notebooks
# - ./contentmine:/cmstore
- /var/run/docker.sock:/var/run/docker.sock
privileged: true
# links:
# - contentmine:contentmine
image: psychemedia/contentmine
- /contentmine
view raw docker-compose.yml hosted with ❤ by GitHub

Thinking Around the Edges – Lego EV3 Robots Running Remote Jupyter Kernels

I’ve been pondering the use of Lego EV3 robots on the OU TXR120 residential school again, and after a couple of chats with OU colleagues Jane Bromley (who just won a sci-fi film pitch competition run by New Scientist) and Jon Rosewell (who leads on our level 1 robotics offerings), here’s where I’m at on “visioning” it…?!;-)

The setting for the residential school is 8 groups of 3 students per group working with their own Lego EV3 robot to complete a set of challenges. Although we were hoping to have laptops from which the robots could be programmed wirelessly, it seems as if we might end up with wifi-less desktop machines (which will require tethering the robots to the desktop machine to programme them) with big screens, though I could be wrong.. and we may be able to work around that anyway (a quick trip to Maplin to buy some wifi dongles, for example, or some cables we can plug into a wifi routing switch, for example…) Having laptops would have been a boon for making the room a bit more flexible to work with (but many students have their own laptops…;-) and having a big screen that laptops could be mirror displayed against would possibly improve student experience.

The Lego bricks come with their own firmware, but as described in Pondering New Ways of Programming Lego EV3 Mindstorms Bricks we can also put Linux on them, and make use of a python wrapper library to actually programme the bricks using Python.

In what follows, I’ll refer to the EV3 brick as the client and a laptop or desktop computer that connects to it as the host.

One of the easiest ways to access the Python on the brick is to connect the brick to the network and then ssh in to it – for example (use the correct IP address for the brick):

ssh root@

Each time you ssh in, you have to provide a password (r00tme is the default on my brick), but passwordless ssh can be set up which makes things quicker (you don’t have to enter the password).

To set up passwordless ssh onto the brick, you need to check that an .ssh folder is available in the home directory (~) on the brick for the ssh key. To do this, ssh in to EV3, cd ~ to ensure you’re at home you home, then ls -a to list the directory contents. If there is no .ssh directory, create one (this post suggests install -d -m 700 ~/.ssh).

If you don’t have an ssh key set up on the host machine you want to get passwordless ssh access from to the brick, follow the above link to find out how to create one. Once you do have ssh keys set up on host, run something along the lines of the following, using the IP address of your connected EV3, to copy the key across to the EV3:

cat ~/.ssh/ | ssh root@ 'cat > .ssh/authorized_keys'

You will need to provide the password to execute this command (r00tme for the version of Ev3dev I’m running).

You should now be able to ssh in without a password: ssh root@

Rather than prompting for the password, ssh will use the key provided to log you in.

In my previous post, I described how it was possible to run an old IPython notebook server on the brick that can expose notebooks over the network, although it was a little slow. It’s also possible to ssh in to the brick and run an IPython terminal on the brick:

ssh root@

A third way I’d have expected to work is to access a remote IPython kernel on the brick from an IPython console on my laptop: ssh into the EV3 brick, launch an IPython kernel, and pick up the location of the connection file. For example, if the command:

ssh root@
ipython kernel

to run an IPython kernel on the EV3 responds with something like:

To connect another client to this kernel, use:
    --existing kernel-1129.json

Back on the laptop I’d expect to be able to run:
scp root@ ./
to grab a local copy of the connection file from the brick and copy it over to my host machine (which works) and then use it as an existing connection for a Jupyter console on the laptop host:
jupyter console --existing ./kernel-1129.json --ssh server

But that doesn’t work for some reason? CRITICAL | Could not find existing kernel connection file ./kernel-1129.json


So far, so much noise – the important thing to take away from the above is to get the passwordless ssh set up, because it makes the following possible…

Recall the basic scenario – we want to run code on the brick from a computer. The bricks will run an IPython notebook, but it’s slow. Or we can ssh in to the brick, start an IPython process up, and run things from an IPython command line on the brick.

But what if we could run a Jupyter server on host, making notebooks available via a browser, but use a remote kernel running on the brick?

The remote-ikernel package makes setting up remote kernels that can be accessed from the Jupyter server relatively straightforward to do – install the package on your host machine (eg my laptop) and then run the remote-ikernel command with the settings required:

pip3 install remote_ikernel
remote_ikernel manage --add --kernel_cmd="ipython kernel -f {connection_file}" --name="Ev3dev" --interface=ssh --host=root@

This creates a kernel.json file and places it in the correct location. (On my Mac this was in the directory /Users/MYUSER/Library/Jupyter/kernels/. )

The kernel.json file created for me (/Users/ajh59/Library/Jupyter/kernels/rik_ssh_root_192_168_1_106_ev3dev/kernel.json) was as follows:

  "argv": [
    "ipython kernel -f {host_connection_file}",
  "display_name": "SSH root@ Ev3dev",
  "remote_ikernel_argv": [
    "--kernel_cmd=ipython kernel -f {connection_file}",

Launching the notebook server using jupyter notebook in the normal way should result in the server picking up the remote kernel from the new kernel file.

To list the kernels available, I launched a Jupyter notebook with the normal (local) Python kernel and ran:

from jupyter_client.kernelspec import KernelSpecManager

My newly created on was there, albeit horribly named (the name was also the directory name the kernel.json file was created in): rik_ssh_root_192_168_1_106_ev3dev.

So here’s where we’re at now:

  • a single desktop or laptop computer with passwordless ssh access to a Lego EV3;
  • the desktop or laptop computer runs a Jupyter server;
  • Jupyter notebooks are available that will start up and run code on a remote IPython kernel on the brick.



Where do we need to be? The most immediate need is to have something that works for residential school. This means we need 8 desktop computers and 8 EV3s.

One thing we’d need to do is find a way of marrying computers and EV3s. Specifically, we need to have known IP addresses associated with specific bricks (can we assign a known IP based on MAC address of the wifi dongle attached to the bricks?) and we need to make sure that passwordless ssh works between the computers and the EV3s. The easiest way of doing the latter would be to use the same ssh keypair in every machine, but this would mean student group A might be able to mistakenly connect to the Student Group B’s machine.

One way round it might be to set up a Jupyterhub server that is responsible for managing connections. The Jupyterhub server is a multiuser hub that will set up a Jupyter server for each logged in user. If we set up the Jupyterhub attached the same wifi network as the EV3 bricks, with 8 users, one per student group, each user/student group could then presumably be assigned to one of the bricks. Students would login to the Jupyterhub, and their Jupyter server would be associated with the remote_ikernel kernel.json file for one of the bricks. Anyone who can connect to the local wifi network can then login to the Jupyterhub as one of the group users, launch a notebook, and run code on one of the bricks. This means that students could write notebooks from their own laptops, connected to the network. Wifi-less desktop machines provided for the residential school could presumably be added into the network via a local ethernet cable?

So we’d have something like this:


Does that make sense?

Another Route to Jupyter Notebooks – Azure Machine Learning

In much the same way that the IBM DataScientist Workbench seeks to provide some level of integration between analysis tools such as Jupyter notebooks and data access and storage, Azure Machine Learning studio also provides a suite of tools for accessing and working with data in one location. Microsoft’s offering is new to me, but it crossed my radar with the announcement that they have added native R kernel support, as well as Python 2 and 3, to their Jupyter notebooks: Jupyter Notebooks with R in Azure ML Studio.

Guest workspaces are available for free (I’m not sure if this is once only, or whether you can keep going back?) but there is also a free workspace if you have a (free) Microsoft account.


Once inside, you are provides with a range of tools – the one I’m interested in to begin with is the notebook (although the piepwork/dataflow experiments environment also looks interesting):


Select a kernel:


give your new notebook a name, and it will launch into a new browser tab:


You can also arrange notebooks within separate project folders. For example, create a project:


and then add notebooks to it:

When creating a new notebook, you may have noted an option to View More in Gallery. The gallery includes examples of a range of project components, including example notebooks:


Thinking about things like the MyBinder app, which lets you launch a notebook in a container from a Github account, it would be nice to see additional buttons being made available to let folk run notebooks in Azure Machine Learning, or the Data Scientist Workbench.

It’s also worth noting how user tools – such as notebooks – seem to be being provided for free with a limited amount of compute and storage resource behind them as a way of recruiting users into platforms where they might then start to pay for more compute power.

From a course delivery perspective, I’m often unclear as to whether we can tell students to sign up for such services as part of a course or whether that breaks the service terms?  (Some providers, such as Wakari, make it clear that “[f]or classes, projects, and long-term use, we strongly encourage a paid plan or Wakari Enterprise. Special academic pricing is available.”) It seems unfair that we should require students to sign up for accounts on a “free” service in their own name as part of our offering for a couple of reasons at least: first, we have no control over what happens on the service; second, it seems that it’s a commercial transaction that should be formalised in some way, even if only to agree that we can (will?) send our students to that particular service exclusively. Another possibility is that we say students should make their own service available, whether by installing software themselves or finding an online provider for themselves.

On the other hand, trying to get online services provided at course scale in a timely fashion within an HEI seems to be all but impossible, at least until such a time as the indie edtech providers such as Reclaim Hosting start to move even more into end-user app provision either at the individual level, or affordable class level (with an SLA)…

See also: Seven Ways of Running IPython / Jupyter Notebooks.