Docker Housekeeping – Removing Old Images and Containers

Some handy commands for tidying up old Docker images and containers…

Remove Dangling Images

docker rmi `docker images --filter 'dangling=true' -q --no-trunc`

Remove Images With a Particular Name Pattern

Ish via here.

For example, removing images with repo2docker default names, which start r2d:

docker images | awk '{ print $1,$2, $3 }' | grep r2d | awk '{print $3 }' | xargs -I {} docker rmi {}

For added aggression, use rmi -f {}.
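A slightly less fragile sketch of the same idea: rather than picking positional columns out of the default table, ask docker for just the repository name and image ID using --format. The function name and the DRY_RUN switch are my own inventions for illustration, not part of docker; by default the helper only prints what it would delete.

```shell
# Hypothetical helper (not from the original recipe): remove images whose
# repository name starts with a given prefix.
# DRY_RUN defaults to 1, so the docker rmi commands are only printed.
remove_images_by_prefix() {
  prefix="$1"
  docker images --format '{{.Repository}} {{.ID}}' |
    awk -v p="$prefix" 'index($1, p) == 1 { print $2 }' |
    sort -u |
    while read -r id; do
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "docker rmi $id"
      else
        docker rmi "$id"
      fi
    done
}
```

Run remove_images_by_prefix r2d to preview, then DRY_RUN=0 remove_images_by_prefix r2d to actually delete.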

Remove Containers With a Particular Name Pattern

Via here.

docker ps -a | awk '{ print $1,$2 }' | grep r2d | awk '{print $1 }' | xargs -I {} docker rm {}

Remove Exited Containers

docker ps -a | grep Exit | cut -d ' ' -f 1 | xargs docker rm
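Grepping for "Exit" relies on the wording of the status column, so here is a hedged alternative that uses docker's own status filter instead. The function name and DRY_RUN switch are assumptions for the sake of the example; the underlying docker ps -a -q --filter status=exited call is standard docker.

```shell
# Hypothetical helper: remove all exited containers using docker's status
# filter rather than grepping the table output.
# DRY_RUN defaults to 1, so the docker rm command is only printed.
remove_exited_containers() {
  ids=$(docker ps -a -q --filter status=exited)
  if [ -z "$ids" ]; then
    echo "No exited containers"
    return 0
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Unquoted on purpose: collapse the IDs onto one line
    echo "docker rm" $ids
  else
    # Word splitting is intentional: one container ID per argument
    docker rm $ids
  fi
}
```

On more recent versions of docker, docker container prune does much the same job in one step, removing all stopped containers after asking for confirmation.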

PS lots of handy tips here:

BYOA (Bring Your Own Application) – Running Containerised Applications on the Desktop

Imagine this: you’re on a Mac, though at times you’re on a Windows box, and at other times you’ve just got a browser to hand. You’re happy using applications that get stuff done, even if it means using different tools for different purposes, but you get fed up having to find applications that you can install across all the different platforms you work on. Not least because sometimes the things don’t seem to want to install without all sorts of other stuff being installed first.

Or maybe this: you want to run some data analysis using Python, or R, in Jupyter notebooks, or RStudio, and the dataset you want to work with is quite large; which means you also need to hook up to a database, or a query engine, and get the data into a form that the database or query engine can cope with, as well as hooking up your analysis environment (RStudio, or Jupyter notebooks) to what is now a queryable datasource.

So what do you do?

Here’s what I do: I use a technology that’s been developed for another purpose, and I make it personal.

The technology is – are – Docker containers. Docker is a brand, but in part it’s the same way that Hoover, and more recently, Dyson, are brands. Say: Hoover, or Dyson, and you quite possibly mean: vacuum cleaner. Say “Docker”, and you quite possibly mean lightweight virtual machine using Linux Containers, or Open Container Initiative (OCI) containers. But Docker is easier to say/remember. And it also refers to a complete technology stack – and toolset – for working with those containers. (Read more….)

(You can skip this paragraph – it’s not how I use Docker at all…) Docker is most widely used by the people who develop web applications, and who keep web applications running at scale. It’s a devops tech (devops, short for development/operations). The devs hack the code, the ops folk get that code running for users. Traditionally, these were separate teams; now, there is some overlap, particularly if you’re running on agile and continuous integration and/or continuous deployment are your thing. Docker, in short, is something your more curious IT folk may know something about. Or not.

But I don’t care about that. Not least, because they probably don’t think about Docker in the way I do…

What I care about are some of the features that Docker has, and how I can use those features to make my own life easier, irrespective of how the folk who develop Docker, or the folk in IT, think it should be used; which is probably more along the lines of helping enterprise IT deliver enterprise across the whole organisation, rather than supporting personal, DIY, BYOA (“bring your own app”) IT that works at an individual level in the form of end-user applications, or personal digital workbenches (I’ll come on to those in a moment…)

Take the first example mentioned at the start – wanting to run the same application pretty much anywhere, without the installation grief. For example, suppose you wanted to run OpenRefine, a rather handy browser based tool for cleaning and filtering datasets. Installing OpenRefine requires that you download and install an appropriate version of Java, as well as downloading and installing OpenRefine. What do you do?

Here’s what I do: I use Docker (which yes, does require me to download and install Docker. But I’ll pick up on that later…) And I use Kitematic (which, admittedly, I have to install from the Docker toolbar menu; note I could also launch containers from the command line…).

But first a bit of jargon…

Docker Images versus Docker Containers

One way of thinking of a docker image is as an installer file. It’s the thing you download that you can then use to install your application. In the case of docker, the image file identifies the operating system requirements of the application, as well as the application code itself. The docker application uses this information to launch a docker container, which is an instance of your application. Note that you can run multiple instances of the same image at the same time (that is, multiple containers launched from the same image). This is different to the model of having multiple documents open in the same application, such as multiple spreadsheets or Word documents open in Microsoft Excel or Word. Instead, it would be akin to running several different instances of Excel or Word, with the documents open in one copy of Word isolated from and invisible to another running copy of Word.

The containers themselves can also be running or stopped; a stopped container is one that is essentially hibernating, and whose operation can be resumed. Any changes made to the contents of a container are retained if the container is stopped and restarted. Containers can also be destroyed, which is like uninstalling the application and removing all its install files. When a new container is launched, it’s a bit like reinstalling and starting a fresh installation of the application from the original installer (the docker image).
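On the command line, that lifecycle looks something like the following sketch. The function is purely illustrative (the name is made up, and the image and container names are whatever you pass in); each line maps onto one of the states described above.

```shell
# Hypothetical walk-through of the container lifecycle described above.
# $1 = image name, $2 = container name (both chosen by the caller).
container_lifecycle_demo() {
  image="$1"; name="$2"
  docker run -d --name "$name" "$image"   # create and start a container from the image
  docker stop "$name"                     # "hibernate": changes inside the container are kept
  docker start "$name"                    # resume the same container
  docker rm -f "$name"                    # destroy the container; the image stays on disk
  docker run -d --name "$name" "$image"   # a fresh container, as if newly installed
}
```

Launching a second container from the same image is just another docker run with a different --name.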


Kitematic is a desktop application (for Mac and Windows) that looks a bit like an app store.

Using Kitematic, I can search for an OpenRefine image – in this case, ou-tm351-openrefine-test, which happens to be an image I created myself – that has been published via the Dockerhub public docker image registry.

I create the container – which downloads a Docker image that contains a self-contained operating system and all the files that OpenRefine needs to run as well as the application itself – and starts it running.

I copy the URL – in this case localhost:32771, and paste it into my browser location bar:

You may notice the Kitematic page also had a folder highlighted – this is a folder shared between the container and my desktop. Project files will be saved to that folder. When I have finished with the application, I can stop it, and then restart it later. Or I can destroy it. This kills the container and removes all its files, but the base image is still on my computer. If I want to run OpenRefine again, I won’t need to download any files. But – and this is important – the application container that is started up will be completely fresh, run as-if for the first time from the image I downloaded. Like a reinstalled application. Which is effectively what it is. But the files stored on my desktop will be shared back with it.

Stopping/starting or restarting the container is like switching it off and on again. Deleting the container in Kitematic, then searching for it again and creating a new container uses the previously downloaded files and essentially reinstalls the application.

If I want to run another application, I can – there are lots of images available. For example, if you want to run a local version of the Dillinger markdown editor, you can find the original of that:

Note, however, that the lack of love Kitematic has received in the last few years shows here (I imagine because Docker are more interested in enterprise use of Docker, not personal use): if you just try to create the container, it won’t work (you’ll get an error message). Instead, you need to specify the version of the editor you want to use explicitly. Click the dots in order to select a tag (that is, a version):

Choose the tag/version, and close the selection dialogues – then click on the Create button:


This time, the homescreen is a little more helpful, providing a link that will launch the app in the browser. You may also get a preview of the screen (sometimes this doesn’t appear; click the Settings tab, then the Home tab, and it may refresh. As I mentioned above, Kitematic doesn’t seem to have had a lot of love since Docker bought it…)

Click the link button, and you should be redirected to the app running in your browser:

So that’s part of the story…

Viewing Application UIs and Launching Applications from Shortcuts

As well as using Kitematic to launch containerised applications, we can also use desktop shortcuts. I mentioned how we can use docker commands on the command line to launch an application – what Kitematic does is simply generate and execute those commands, and then display status and configuration information about the running container – so we can just pop equivalent commands into a shortcut, click on the shortcut and launch the application.
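As a rough sketch of what such a shortcut might look like on a Mac: a file with a .command extension that is marked executable can be double-clicked from the Finder. The helper below writes one. Everything about it is an assumption made for illustration: the function name, the image and port values passed in, the five second wait, and the idea that the application is reachable on localhost (which holds for Docker for Mac, but not when docker runs inside a Virtualbox VM).

```shell
# Hypothetical helper: write a double-clickable macOS launcher that runs a
# container and opens its web UI. All arguments are placeholders.
# $1 = launcher file, $2 = image, $3 = host port, $4 = container port
make_launcher() {
  launcher="$1"; image="$2"; host_port="$3"; container_port="$4"
  cat > "$launcher" <<EOF
#!/bin/sh
# Start the container in the background and remove it when it stops
docker run -d --rm -p ${host_port}:${container_port} ${image}
# Give the application a moment to start, then open it in the browser
sleep 5
open "http://localhost:${host_port}"
EOF
  chmod +x "$launcher"
}
```

For example, make_launcher ~/Desktop/myapp.command some/image 8080 8080 would put a launcher on the desktop, where some/image and the port numbers stand in for whatever the application actually needs.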

This is what Jessie Frazelle does with her Docker Containers on the [Mac] Desktop. The corresponding desktop-container library shelf is here: jessfraz/dockerfiles and associated shortcut definitions here: desktop-container commands. The applications are packaged to use X11 to expose the graphical user interface, so if you want to run RStudio, or Libre Office or Audacity in a desktop-container on a Mac, that could be a great place to start. Bits of guidance on getting up and running with X11 applications on OS/X can be found here: Docker for Mac and GUI applications and here: Bring Linux apps to the Mac Desktop with Docker, although this recipe seems most complete: Containerize GUI Applications on Mac.

Note: in testing, I had to install XQuartz (brew install xquartz), enabling the Security tab preference to Allow connections from network clients, and socat (brew install socat), reboot, and then run socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CLIENT:\"$DISPLAY\" (which is blocking) in a terminal on its own. In another terminal, I found the local IP address as ip=$(ifconfig en0 | grep inet | awk '$1=="inet" {print $2}') and could then run a docker command of the form docker run --name firefox -e DISPLAY=$ip:0 -v /tmp/.X11-unix:/tmp/.X11-unix jess/firefox which starts up XQuartz if it’s not already running. Note also that on occasion I kept getting the error socat[1739] E bind(5, {LEN=0 AF=2}, 16): Address already in use; the netstat command shows port 6000 waiting, but after a few minutes there’s a time out and the port is freed up. Closing the docker app window also leaves the old, named container around, which needs removing with something of the form docker rm firefox before the app can be run again. Note that there is no --device /dev/snd mountpoint for docker in OS/X so no easy way to pass through audio? I’m not sure how well this would work on a Windows machine, but presumably there are X11 clients for Windows that aren’t too painful to install? I’m not sure what the socat equivalent would be if the client was only listening privately and needed a tunnel from the open X11 port 6000 to the socket where the client listens for connections? (Nothing needed with xming e.g. as suggested here?)
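The second-terminal steps from that note can be bundled up along the following lines. This is only a sketch: the function names are made up, it assumes the socat relay is already running in another terminal, and it assumes the network interface is en0 as in the note. The stale-container cleanup is added so the same name can be reused between runs.

```shell
# Pull the first IPv4 address out of ifconfig-style output read from stdin.
# Same awk test as in the note above, without the intermediate grep.
inet_addr() {
  awk '$1 == "inet" { print $2; exit }'
}

# Hypothetical wrapper around the steps in the note.
# $1 = container name, $2 = image (for example: firefox and jess/firefox)
run_x11_app() {
  name="$1"; image="$2"
  ip=$(ifconfig en0 | inet_addr)
  # Remove any stale container of the same name left by a previous run
  docker rm -f "$name" >/dev/null 2>&1 || true
  docker run --name "$name" -e DISPLAY="$ip:0" \
    -v /tmp/.X11-unix:/tmp/.X11-unix "$image"
}
```

With the relay running, run_x11_app firefox jess/firefox should then behave like the docker run command in the note.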

(Hmmm, thinks: a version of something like Kitematic that also bundles an X11 client could be really handy? )

Whereas Jessie Frazelle’s desktop-containers make use of the X11 windowing system to allow her host operating system to display the UI of the applications running inside a container, I prefer to use applications that expose their UI via http using a browser. Where this is not possible, as for example in the case of the Audacity audio editor, it is often possible to run the application within the container and then render the desktop running in the container via a browser window using an HTML remote desktop viewer such as Apache Guacamole. Here’s an old proof of concept: Accessing GUI Apps Via a Browser from a Container Using Guacamole.

Creating a Docker Library Shelf

Another part of the story is that you can build your own images and either share them publicly via the Dockerhub registry, keep them locally on your own computer, post them to a private Dockerhub repository (you get a single private repository as part of the Dockerhub free plan, or can pay for more…), or run your own image registry.

Using this latter option – running your own Docker registry – means you can essentially run your own digital application shelf, although I don’t think you’ll be able to use Kitematic with it without forking your own version of it to use your registry (issue)? Instead, you have to use the command line. Again, the lack of support for “personal users” of Docker is an issue, but that’s another reason to try to blog more recipes and find more workarounds, and maybe get the occasional feature request adopted if we can define a use case well enough…

Alternatively, you can just make the shelf available as a set of Dockerfiles and shortcuts, as Jessie Frazelle has done with her jessfraz/dockerfiles Github repo.

There are a couple more posts to come in this series: one looking at how we can run Docker containers on a remote host (“in the cloud”); another looking at how we can link multiple applications to provide our own digital workbenches, within which linked applications can easily share files and connections between themselves.

Rolling Your Own Jupyter and RStudio Data Analysis Environment Around Apache Drill Using docker-compose

I had a bit of a play last night trying to hook a Jupyter notebook container up to an Apache Drill container using docker-compose. The idea was to have a shared data volume between the two of them, but I couldn’t for the life of me get that to work using the docker-compose version 2 or 3 (services/volumes) syntax – for some reason, any of the Apache Drill containers I tried wouldn’t fire up properly.

So I eventually (3am…:-( went for a simpler approach, synching data through a local directory on host.

The result is something that looks like this:

The Apache Drill container, and an Apache Zookeeper container to keep it in check, I found via Dockerhub. I also reused an official RStudio container. The Jupyter container is one I rolled for TM351.

The Jupyter and RStudio containers can both talk to the Apache Drill container, and both analysis apps have access to their own data folder mounted in an application folder in the current directory on host. The data folders mount into separate directories in the Apache Drill container. Both applications can query into data files contained in either data directory as viewable from Apache Drill.

This is far from ideal, but it works. (The structure is as suggested so that RStudio and Jupyter scripts can both be used to download data into a data directory viewable from the Apache Drill container. Another approach would be to mount a separate ./data directory and provide some means for populating it with data files. Alternatively, if the files already exist on host, mounting the host data directory onto a /data volume in the Apache Drill container would work too.)

Here’s the docker-compose.yaml file I’ve ended up with:

drill:
  image: dialonce/drill
  ports:
    - 8047:8047
  links:
    - zookeeper
  volumes:
    - ./notebooks/data:/nbdata
    - ./R/data:/rdata

zookeeper:
  image: jplock/zookeeper

# service name is an assumption
notebook:
  container_name: notebook-apache-drill-test
  image: psychemedia/ou-tm351-jupyter-custom-pystack-test
  ports:
    - 35200:8888
  volumes:
    - ./notebooks:/notebooks/
  links:
    - drill:drill

# service name is an assumption
rstudio:
  container_name: rstudio-apache-drill-test
  image: rocker/tidyverse
  environment:
    - PASSWORD=letmein
  #default user is: rstudio
  volumes:
    - ./R:/home/rstudio
  ports:
    - 8787:8787
  links:
    - drill:drill

If you have docker installed and running, running docker-compose up -d in the folder containing the docker-compose.yaml file will launch the linked containers: Jupyter notebook on localhost port 35200, RStudio on port 8787, and Apache Drill on port 8047, along with the supporting Zookeeper container. If the ./notebooks, ./notebooks/data, ./R and ./R/data subfolders don’t exist they will be created.

We can use the clients to variously download data files and run Apache Drill queries against them. In Jupyter notebooks, I used the pydrill package to connect. Note the hostname used is the linked container name (in this case, drill).

If we download data to the ./notebooks/data folder which is mounted inside the Apache Drill container as /nbdata, we can query against it.
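As a quick check from the host, independent of either analysis client, Apache Drill also accepts queries over its REST interface on the same port 8047. The sketch below assumes the port mapping described above; the function names, the DRILL_URL variable and the example.csv filename are all made up for illustration, and the query assumes Drill's default dfs storage plugin can see the mounted /nbdata directory.

```shell
# Build the JSON body for Drill's REST query endpoint.
# Note: no escaping is done, so the query must not contain double quotes.
drill_payload() {
  printf '{"queryType": "SQL", "query": "%s"}' "$1"
}

# POST a query to Drill. DRILL_URL is an assumption based on the 8047 port
# mapping; override it if Drill is exposed somewhere else.
drill_query() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(drill_payload "$1")" \
    "${DRILL_URL:-http://localhost:8047}/query.json"
}

# Example (example.csv is a hypothetical file in ./notebooks/data):
#   drill_query 'SELECT * FROM dfs.`/nbdata/example.csv` LIMIT 5'
```

The response comes back as JSON, so piping it through a JSON pretty-printer makes it easier to read.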

(Note – it probably would make sense to use a modified Apache Drill container configured to use CSV headers, as per Querying Large CSV Files With Apache Drill.)

We can also query against that same data file from the RStudio container. In this case I used the DrillR package (I had hoped to use the sergeant package (“drill sergeant”, I assume?! Sigh..;-) but it uses the RJDBC package which expects to find java installed, rather than DBI, and java isn’t installed in the rocker/tidyverse container I used.) UPDATE: sergeant now works without Java dependency... Thanks, Bob:-)

I’m not sure if DrillR is being actively developed, but it would be handy if it could return the data from the query as a dataframe.

So, getting up and running with Apache Drill and a data analysis environment is not that hard at all, if you have docker installed:-)

PS 8-9/18 – Seems like sergeant has moved on somewhat (Updates to the sergeant (Apache Drill connector) Package & a look at Apache Drill 1.14.0 release) and now lets you make calls from R into Apache Drill running in a disposably launched container: Driving Drill Dynamically with Docker and Updating Storage Configurations On-the-fly with sergeant. See also this Using Apache Drill with R cookbook.

Querying Panama Papers Neo4j Database Container From a Linked Jupyter Notebook Container

A few weeks ago I posted some quick doodles showing, on the one hand, how to get the Panama Papers data into a simple SQLite database and, on the other, how to link a neo4j graph database to a Jupyter notebook server using Docker Compose.

As the original Panama Papers investigation used neo4j as its backend database, I thought putting the data into a neo4j container could give me the excuse I needed to start looking at neo4j.

Anyway, it seems as if someone has already pushed a neo4j Docker container image preseeded with the Panama Papers data, so here’s my quickstart.

To use it, you need to have Docker installed, download the docker-compose.yaml file and then run:

docker-compose up

If you do this from a command line launched from Kitematic, Kitematic should provide you with a link to the neo4j database, running on the Docker IP address and port 7474. Log in with the default credentials ( neo4j / neo4j ) and change the password to panamapapers (all lower case).

Download the quickstart notebook into the newly created notebooks directory, and you should be able to see it from the notebooks homepage on Docker IP address port 8890 (or again, just follow the link from Kitematic).

I’m still trying to find my way around both the py2neo Python wrapper and the neo4j Cypher query language, so the demo thus far is not that inspiring!

And I’m not sure when I’ll get a chance to look at it again…:-(

Application Shelves for the Digital Library – Fragments

Earlier today, I came across BioShaDock: a community driven bioinformatics shared Docker-based tools registry (BioShadock registry). This collects together a wide range of containerised applications and tools relevant to the bioinformatics community. Users can take one or more applications “off-the-shelf” and run them, without having to go through any complicated software installation process themselves, even if the installation process is a nightmare confusion of dependency hell: the tools are preinstalled and ready to run…

The container images essentially represent reference images that can be freely used by the community. The application containers come pre-installed and ready to run, exact replicas of their parent reference image. The images can be versioned with different versions or builds of the application, so you can reference the use of a particular version of an application and provide a way of sharing exactly that version with other people.

So could we imagine this as a specialist reference shelf in a Digital Library? A specialist reference application shelf, with off-the-shelf, ready-to-run tools, anywhere, any time?

Another of the nice things about containers is that you can wire them together using things like Docker Compose or Panamax templates to provide a suite of integrated applications that can work with each other. Linked containers can pass information between each other in isolation from the rest of the world. One click can provision and launch all the applications, wired together. And everything can be versioned and archived. Containerised operations can also be sequenced (eg using DRAY docker pipelines or OpenAPI).

Sometimes, you might want to bundle a set of applications together in a single, shareable package as a virtual machine. These can be versioned, and shared, so everyone has access to the same tools installed in the same way within a single virtual machine. Things like the DH Box, “a push-button Digital Humanities laboratory” (DHBox on github); or the Data Science Toolbox. These could go on to another part of the digital library applications shelf – a more “general purpose toolbox” area, perhaps?

As a complement to the “computer area” in the physical library that provides access to software on particular desktops, the digital library could have “execution rooms” that will actually let you run the applications taken off the shelf, and access them through a web browser UI, for example. So runtime environments like mybinder or tmpnb. Go to the digital library execution room (which is just a web page, though you may have to authenticate to gain access to the server that will actually run the code for you..), say which container, container collection, or reference VM you want to run, and click “start”. Or take the images home with you (that is, run them on your own computer, or on a third party host).

Some fragments relating to the above to try to help me situate this idea of runnable, packaged application shelves within the context of the library in general…

  • libraries have been, and still are, one of the places you go to access IT equipment and learn IT skills;
  • libraries used to be, and still are, a place you could go to get advice on, and training in, advanced information skills, particularly discovery, critical reading and presentation;
  • libraries used to be, and still are, a locus for collections of things that are often valuable to the community or communities associated with a particular library;
  • libraries provide access to reference works or reference materials that provide a common “axiomatic” basis for particular activities;
  • libraries are places that provide access to commercial databases;
  • libraries provide archival and preservation services;
  • libraries may be organisational units that support data and information management needs of their host organisation.

Some more fragments:

  • the creation of a particular piece of work may involve many different steps;
  • one or more specific tools may be involved in the execution of each step;
  • general purpose tools may support the actions required to perform a range of tasks to a “good enough” level of quality;
  • specialist tools may provide a more powerful environment for performing a particular task to a higher level of quality.

Some questions:

  • what tools are available for performing a particular information related task or set of tasks?
  • what are the best tools for performing a particular information related task or set of tasks?
  • where can I get access to the tools required for a particular task without having to install them myself?
  • how can I effectively organise a workflow that requires the use of several different tools?
  • how can I preserve, document or reference the workflow so that I can reuse it or share it with others?

Some observations:

  • Docker containers provide a way of packaging an application or tool so that it can be “run anywhere”;
  • Docker containers may be linked together in particular compositions so that they can interoperate with each other;
  • Docker container images may be grouped together in collections within a subject specific registry: for example, BioShaDock.

OpenRobertaLab – Simple Robot Programming Simulator and UI for Lego EV3 Bricks

Rather regretting not having done a deep dive into programming environments for the Lego EV3 somewhat earlier, I came across the inspired OpenRobertaLab (code, docs) only a couple of days ago.


(Way back when, in the first incarnation of the OU Robotics Outreach Group, we were part of the original Roberta project which was developing a European educational robotics pack, so it’s nice to see it’s continued.)

OpenRobertaLab is a browser accessible environment that allows users to use blocks to program a simulated robot.


I’m not sure how easy it is to change the test track used in the simulator? That said, the default does have some nice features – a line to follow, colour bars to detect, a square to drive round.

The OU Robotlab simulator supported a pen down option that meant you could trace the path taken by the robot – I’m not sure if RobertaLab has a similar feature?


It also looks as if user accounts are available, presumably so you can save your programmes and return to them at a later date:


Account creation looks to be self-service:


OpenRobertaLab also allows you to program a connected EV3 robot running leJOS, the community developed Java programming environment for the EV3s. It seems that it’s also possible to connect to a brick running ev3dev to OpenRobertaLab using the robertalab-ev3dev connector. This package is preinstalled in ev3dev, although it needs enabling (and the brick rebooting) to run. ssh into the brick and then from the brick commandline, run:

sudo systemctl unmask openrobertalab.service
sudo systemctl start openrobertalab.service

Following a reboot, the Open Robertalab client should now automatically run and be available from the OpenRobertaLab menu on the brick. To stop the service / cancel it from running automatically, run:

sudo systemctl stop openrobertalab.service
sudo systemctl mask openrobertalab.service

If the brick has access to the internet, you should now be able to simply connect to the OpenRobertalab server (

Requesting a connection from the brick gives you an access code you need to enter on the OpenRobertaLab server. From the robots menu, select connect...:


and enter the provided connection code (use the connection code displayed on your EV3):


On connecting, you should hear a celebratory beep!

Note that this was as far as I got – Open Robertalab told me a more recent version of the brick firmware was available and suggested I installed it. Whilst claiming it may still be possible to run commands using old firmware, that didn’t seem to be the case?

As well as accessing the public Open Robertalab environment on the web, you can also run your own server. There are a few dependencies required for this, so I put together a Docker container psychemedia/robertalab (Dockerfile) containing the server, which means you should be able to run it using Kitematic:


(For persisting things like user accounts, and saved programmes, there should probably be a shared data container to persist that info?)

A random port will be assigned, though you can change this to the original default (1999):


The simulator should run fine using the IP address assigned to the docker machine, but in order to connect a robot on the same local WiFi network to the Open RobertaLab server, or connect to the programming environment from another computer on the local network, you will need to set up port forwarding from the Docker VM:


See Exposing Services Running in a Docker Container Running in Virtualbox to Other Computers on a Local Network for more information on exposing the containerised Open Robertalab server to a local network.

On the EV3, you will need to connect to a custom Open Robertalab server. The settings will be the IP address of the computer on which the server is running, which you can find on a Mac from the Mac Network settings, along with the port number the server is running on:

So for example, if Kitematic has assigned the port number 32567, and you didn’t otherwise change it, and your host computer IP address is, you should connect to: from the Open Robertalab connection settings on the brick. On connecting, you will be presented with a pass code as above, which you should enter on your local OpenRobertaLab webpage.

Note that when trying to run programmes on a connected brick, I suffered the firmware mismatch problem again.

Exposing Services Running in a Docker Container Running in Virtualbox to Other Computers on a Local Network

Most of my experiments with Docker on my desktop machine to date have been focused on reducing installation pain and side-effects by running applications and services that I can access from a browser on the same desktop.

The services are exposed against the IP address of the virtual machine running docker, rather than localhost of the host machine, which also means that the containerised services can’t be accessed by other machines connected to the same local network.

So how do we get the docker container ports exposed on the host’s localhost network IP address?

If docker is running the containers via Virtualbox in the virtual machine named default, it seems all we need to do is tweak a couple of port forwarding rules in Virtualbox. So if I’m trying to get port 32769 on the docker IP address relayed to the same port on the host localhost, I can issue the following terminal command if the Docker Virtualbox is currently running:

VBoxManage controlvm "default" natpf1 "tcp-port32769,tcp,,32769,,32769"

which has syntax:

natpf<1-N> [<rulename>],tcp|udp,[<hostip>],<hostport>,[<guestip>],<guestport>

Alternatively, the rule can be created from the Network – Port Forwarding Virtualbox  settings for the default box:


To clear the rule, use:

VBoxManage controlvm "default" natpf1 delete "tcp-port32769"

or delete from the Virtualbox box settings Network – Port Forwarding rule dialogue.

If the box is not currently running, use:

VBoxManage modifyvm "default" --natpf1 "tcp-port32769,tcp,,32769,,32769"
VBoxManage modifyvm "default" --natpf1 delete "tcp-port32769"

The port should now be visible at localhost:32769 and by extension may be exposed to machines on the same network as the host machine by calling the IP address of the host machine with the value of the forwarded port on host.
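To save remembering which form of the command applies, the two cases can be wrapped up as in the sketch below. The function names are my own; the rule name follows the tcp-port<N> pattern used above, and the check for whether the box is running relies on VBoxManage list runningvms listing each running VM by its quoted name.

```shell
# Build a NAT port-forwarding rule that maps a host port to the same guest port.
natpf_rule() {
  port="$1"
  echo "tcp-port${port},tcp,,${port},,${port}"
}

# Hypothetical wrapper: pick controlvm or modifyvm depending on whether the
# VM (default name: default) is currently running.
forward_port() {
  port="$1"; vm="${2:-default}"
  if VBoxManage list runningvms | grep -q "\"$vm\""; then
    VBoxManage controlvm "$vm" natpf1 "$(natpf_rule "$port")"
  else
    VBoxManage modifyvm "$vm" --natpf1 "$(natpf_rule "$port")"
  fi
}
```

So forward_port 32769 should be equivalent to whichever of the two commands above suits the current state of the default box.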

On a Mac, you can find the local IP address of the machine from the Mac’s Network settings: