From Linked Data to Linked Applications?

Pondering how to put together some Docker IPython magic for running arbitrary command line functions in arbitrary docker containers (this is as far as I’ve got so far), I think the commands must include a couple of things:

  1. the name of the container (perhaps rooted in a particular repository): psychemedia/contentmine or dockerhub::psychemedia/contentmine, for example;
  2. the actual command to be called: for example, one of the contentmine commands: getpapers -q {QUERY} -o {OUTPUTDIR} -x

We might also optionally specify mount directories for the calling and called containers, falling back to a conventional default otherwise.

This got me thinking that the called functions might be viewed as operating in a namespace (psychemedia/contentmine or dockerhub::psychemedia/contentmine, for example). And this in turn got me thinking about “big-L, big-A” Linked Applications.

According to Tim Berners-Lee’s four rules of Linked Data, the web of data should:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  4. Include links to other URIs, so that they can discover more things.

So how about a web of containerised applications, that would:

  1. Use URIs as names for container images
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information (in the minimal case, this corresponds to a Dockerhub page, for example; in a user-centric world, this could just return a help file identifying the commands available in the container, along with help for individual commands)
  4. Include a Dockerfile, so that they can discover what the application is built from (which may also link to other Dockerfiles).

Compared with Linked Data, where the idea is about relating data items one to another, the identifying HTTP URI actually represents the ability to make a call into a functional execution space. Linkage into the world of linked web resources might be provided through Linked Data relations that specify that a particular resource was generated from an instance of a Linked Application, or that the resource can be manipulated by an instance of a particular application.

So for example, files linked to on the web might have a relation that identifies the filetype, and the filetype is linked by another relation that says it can be opened in a particular linked application. Another file might link to a description of the workflow that created it, and the individual steps in the workflow might link to function/command identifiers that are linked to linked applications through relations that associate particular functions with a particular linked application.
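To make that concrete, here is a sketch of what such relations might look like in Turtle – every prefix and predicate here is invented, a placeholder for whatever vocabulary a “Linked Applications” world would actually settle on:

```turtle
@prefix ex:  <http://example.org/vocab#> .   # invented vocabulary
@prefix app: <http://example.org/apps/> .    # invented application namespace

<http://example.org/data/results.csv>
    ex:fileType    ex:CSV ;
    ex:generatedBy app:contentmine .         # produced by an instance of this linked application

# the filetype links to an application that can open it
ex:CSV ex:openableWith app:openrefine .

# and the application advertises the commands it provides
app:contentmine ex:providesCommand app:getpapers .
```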

Workflows may be defined generically, and then instantiated within a particular experiment. So for example: “load file with particular properties, run FFT on particular columns, save output file” becomes instantiated within a particular run of an experiment as “load file with this URI, run the FFT command from this linked application on particular columns, save output file with this URI”.
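As a toy example, that instantiation step might amount to nothing more than binding concrete URIs into a templated step list – all the names and URIs below are invented:

```python
# A generic workflow: a list of steps with placeholders for the per-experiment bits.
workflow_template = [
    "load file {input_uri}",
    "run {app_uri}#fft on columns {columns}",
    "save output file {output_uri}",
]

def instantiate(template, **bindings):
    """Bind concrete URIs/values into a generic workflow template."""
    return [step.format(**bindings) for step in template]

# Instantiate the template for a particular experimental run.
run = instantiate(
    workflow_template,
    input_uri="http://example.org/data/raw.csv",     # invented URIs
    app_uri="http://example.org/apps/signaltools",
    columns="2,3",
    output_uri="http://example.org/data/fft.csv",
)
```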

Hmm… thinks.. there is a huge amount of work already done in the area of automated workflows and workflow execution frameworks/environments for scientific computing. So this is presumably already largely solved? For example, Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker, C. Zheng & D. Thain, 2015 [PDF]?

A handful of other quick points:

  • the model I’m exploring in the Docker magic context is essentially a stateless/serverless computing approach, where a commandline container is created on demand and treated as disposable, just running a particular function before being destroyed (see also the OpenAPI approach).
  • The Linked Application notion extends to other containerised applications, such as ones that expose an HTML user interface over http that can be accessed via a browser. In such cases, things like WSDL (or WADL; remember WADL?) provided a machine-readable, formalised way of describing functional resource availability.
  • In the sense that commandline containerised Linked Applications are actually services, we can also think about web services publishing an http API in a similar way?
  • services such as Sandstorm, which have the notion of self-running containerised documents, have the potential to bind a specific document within an interactive execution environment for that document.

Hmmm… so how much nonsense is all of the above, then?

Steps Towards Some Docker IPython Magic – Draft Magic to Call a Contentmine Container from a Jupyter Notebook Container

I haven’t written any magics for IPython before (and it probably shows!) but I started sketching out some magic for the Contentmine command-line container I described in Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers.

What I’d like to explore is a more general way of calling command line functions accessed from arbitrary containers via a piece of generic magic, but I need to learn a few things along the way, such as handling arguments for a start!

The current approach provides crude magic for calling the contentmine functions included in a public contentmine container from a Jupyter notebook running inside a container. The commandline contentmine container is started from within the notebook container and uses volumes-from against the notebook container to pass files between the containers. The path to the directory mounted from the notebook is identified by a bit of jiggery pokery, as is the method for spotting what container the notebook is actually running in (I’m all ears if you know of a better way of doing either of these things? :-)

The magic has the form:

%getpapers /notebooks rhinocerous

to run the getpapers query (with fixed switch settings for now) and the search term rhinocerous; files are shared back from the contentmine container into the /notebooks folder of the Jupyter container.

Other functions include:

%norma /notebooks rhinocerous
%cmine /notebooks rhinocerous

These functions are applied to files in the same folder as was created by the search term (rhinocerous).

The magic needs updating so that it will also work in a Jupyter notebook that is not running within a container – this should simply be a case of switching in a different directory path. The magics also need tweaking so we can pass parameters in. I’m not sure if more flexibility should also be allowed on specifying the path (we need to make sure that the paths for the mounted directories are the correct ones!)
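For what it’s worth, the bones of such a crude magic might look something like the following sketch – the function and variable names are illustrative rather than the actual code, and it assumes the docker CLI is available from inside the notebook container, with the notebook container’s name found by the jiggery pokery mentioned above:

```python
import subprocess

def build_getpapers_cmd(notebook_container, path, query):
    """Construct (but don't run) the docker invocation for a getpapers query."""
    return [
        "docker", "run", "--rm",
        "--volumes-from", notebook_container,   # share the notebook container's volumes
        "psychemedia/contentmine",
        "getpapers", "-q", query, "-o", "{0}/{1}".format(path, query), "-x",
    ]

def getpapers_magic(line, notebook_container="notebook"):
    """Crude line magic body, invoked as: %getpapers /notebooks rhinocerous"""
    path, query = line.split()
    return subprocess.call(build_getpapers_cmd(notebook_container, path, query))

# In a running notebook you'd register it with something like:
# from IPython.core.magic import register_line_magic
# register_line_magic(getpapers_magic)
```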

What I’d like to work towards is some sort of line magic along the lines of:

%docker psychemedia/contentmine -mountdir /CALLING_CONTAINER_PATH -v ${MOUNTDIR}:/PATH COMMAND -ARGS etc

or cell magic:

%%docker psychemedia/contentmine -mountdir /CALLING_CONTAINER_PATH -v ${MOUNTDIR}:/PATH

Note that these go against the docker command line syntax – should they be closer to it?
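Argument handling for that proposed syntax might be sketched along these lines – the -mountdir and -v switches are just the ones mooted above, and the ${MOUNTDIR} substitution is an assumption about how the calling path would be passed through:

```python
import shlex

def parse_docker_magic(line, default_mountdir="/notebooks"):
    """Turn '%docker IMAGE -mountdir PATH -v HOST:CONT COMMAND ARGS...'
    into a docker run invocation, returned as an argument list."""
    tokens = shlex.split(line)
    image = tokens.pop(0)
    mountdir, volume = default_mountdir, None
    # Peel off the optional switches before the command proper.
    while tokens and tokens[0] in ("-mountdir", "-v"):
        flag = tokens.pop(0)
        if flag == "-mountdir":
            mountdir = tokens.pop(0)
        else:
            volume = tokens.pop(0)
    cmd = ["docker", "run", "--rm"]
    if volume:
        cmd += ["-v", volume.replace("${MOUNTDIR}", mountdir)]
    return cmd + [image] + tokens   # whatever is left is the command and its args
```

Calling parse_docker_magic('psychemedia/contentmine -mountdir /notebooks -v ${MOUNTDIR}:/contentmine getpapers -q aardvark') then gives ['docker', 'run', '--rm', '-v', '/notebooks:/contentmine', 'psychemedia/contentmine', 'getpapers', '-q', 'aardvark'].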

The code, and a walked through demo, are included in the notebook available via this gist, which should also be embedded below.

docker-compose.yml:

# Start the notebook server linked to the contentmine container as follows:
# docker-compose up -d

notebook:
  image: jupyter/notebook
  ports:
    - "8899:8888"
  volumes_from:
    - contentmineshare
  volumes:
    - ./notebooks:/notebooks
    # - ./contentmine:/cmstore
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true
  # links:
  #   - contentmine:contentmine

contentmineshare:
  image: psychemedia/contentmine
  volumes:
    - /contentmine

Docker as a Personal Application Runner

Consider, for a moment, the following scenarios associated with installing and running a desktop based application on your own computer:

  • a learner installing software for a distance education course: course materials are produced in advance of the course and may be written with a particular version of the software in mind, distributed as part of the course materials. Learners may be running an arbitrary O/S (various versions of Windows and OS/X), may be working on work computers with aggressive IT-enforced security policies, or may be working on shared/public computers. Some courses may require links between different applications (for example, a data analysis package and a database system); in addition, some students may not be able to install any software on their own computer – how can we support them?
  • academic research environment: much academic software is difficult to install and may require an element of sysadmin skills, as well as a particular o/s and particular versions of supporting libraries. Why should a digital humanities researcher who wants to work with the text analysis tools provided in a particular text analysis package also have to learn sysadmin skills to install the software before they can use the functions that actually matter to them? Or consider a research group environment, where it’s important that research group members have access to the same software configuration, but on their own machines.
  • data journalism environment: another twist on the research environment, data journalists may want to compartmentalise and preserve a particular analysis of a particular dataset, along with the tools associated with running those analyses, as “evidence”, in case the story they write on it is challenged in court. Or maybe they need to fire up a particular suite of interlinked tools for producing a particular story in quick time (from accessing the raw data for the first time to publishing the story within a few hours), making sure they work from a clean set up each time.

What we have here is a packaging problem. We also have a situation where the responsibility for installing a single copy of the application, or of several linked applications, falls to an individual user or small team working on an arbitrary platform with few, if any, sysadmin skills.

So can Docker help?

A couple of recent posts on the Docker blog set out to explore what Docker is not.

The first – Containers are not VMs – argues that Docker “is not a virtualization technology, it’s an application delivery technology”. The post goes on:

In a VM-centered world, the unit of abstraction is a monolithic VM that stores not only application code, but often its stateful data. A VM takes everything that used to sit on a physical server and just packs it into a single binary so it can be moved around. But it is still the same thing. With containers the abstraction is the application; or more accurately a service that helps to make up the application.

With containers, typically many services (each represented as a single container) comprise an application. Applications are now able to be deconstructed into much smaller components which fundamentally changes the way they are managed in production.

So, how do you backup your container, you don’t. Your data doesn’t live in the container, it lives in a named volume that is shared between 1-N containers that you define. You backup the data volume, and forget about the container. Optimally your containers are completely stateless and immutable.

The key idea here is that with Docker we have a “something” (in the form of a self-contained container) that implements an application’s logic and publishes the application as a service, but isn’t really all that interested in preserving the state of, or any data associated with, the application. If you want to preserve data or state, you need to store it in a separate persistent data container, or alternative data storage service, that is linked to application containers that want to call on it.
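In docker-compose terms (following the compose style used elsewhere in these posts), the pattern looks something like the sketch below – the image and service names are invented – with anything worth keeping living in a separate data container rather than in the throwaway application container:

```yaml
appdata:
  image: busybox
  command: "true"
  volumes:
    - /data              # anything worth keeping lives here

app:
  image: example/someapp # invented image name: the stateless application container
  volumes_from:
    - appdata            # app can be destroyed and recreated; appdata persists
```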

The second post – There’s Application Virtualization and There’s Docker – suggests that “Docker is not application virtualization” in the sense of “put[ting] the application inside of a sandbox that includes the app and all its necessary DLLs. Or, … hosting the application on a server, and serving it up remotely…”, but I think I take issue with this in the way it can be misinterpreted as a generality.

The post explicitly considers such application virtualisation in the context of applications that are “monolithic in that they contain their own GUI (vs. a web app that is accessed via a browser)”, things like Microsoft Office or other “traditional” desktop based applications, for example.

But many of the applications I am interested in are ones that publish their user interface as a service, of sorts, over HTTP in the form of a browser based HTML API, or that are accessed via the commandline. For these sorts of applications, I believe that Docker represents a powerful environment for personal, disposable, application virtualisation. For example, dedicated readers of this blog may already be aware of my demonstrations of how to run a range of such applications – Jupyter notebooks, OpenRefine, RStudio, and so on – this way.

Via Paul Murrell, I also note this approach for defining a pipeline approach for running docker containers: An OpenAPI Pipeline for NZ Crime Data. Pipeline steps are defined in separate XML modules, and the whole pipeline defined in another XML file. For example, the module step OpenAPI/NZcrime/region-Glendowie.xml runs a specified R command in a Docker container fired up to execute just that command. The pipeline definition file identifies the component modules as nodes in some sort of execution graph, along with the edges connecting them as steps in the pipeline. The pipeline manager handles the execution of the steps in order and passes state between steps in one of several ways (for example, via a shared file or a passed variable). (Further work on the OpenAPI pipeline approach is described in An Improved Pipeline for CPI Data.)

What these examples show is that as well as providing devops provisioning support for scaleable applications on the one hand, and an environment for effective testing and rapid development of applications on the other, Docker containers may also have a role to play in “user-pulled” applications.

This is not so much thinking of Docker from an enterprise perspective, in which it acts as an environment that supports development and auto-scaled deployment of containerised applications and services; nor is it the view a web hosting service might take of Docker images as providing an appropriate packaging format for the self-service deployment of long-lived services, such as blogs or wiki applications (a Docker hub and deployment system to rival cPanel, for example).

Instead, it’s a user-centric, rather than devops-centric, view: seeing containers from a single-user, desktop perspective, with Docker and its ilk providing an environment that can support off-the-shelf, ready-to-run tools and applications, run locally or in the cloud, individually or in concert with each other.

Next up in this series: a reflection on the possibilities of a “Digital Library Application Shelf”.

More Docker Doodlings – Accessing GUI Apps Via a Browser from a Container Using Guacamole

In a PS to Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers, I linked to a demonstration by Jessie Frazelle on how to connect to GUI based apps running in a container via X11. This is all very well if you have an X client, but it would be neater if we could find a way of treating the docker container as a virtual desktop container, and then accessing the app running inside it via the desktop presented through a browser.

Digging around, Guacamole looks like it provides a handy package for exposing a Linux desktop via a browser based user interface [video demo].

Very nice… Which got me wondering: can we run guacamole inside a container, alongside an X11-producing app, to expose that app?

Via the Dockerfile referenced in Digikam on Mac OS/X or how to use docker to run a graphical app on Mac OS/X I tracked down linuxserver/dockergui, a Docker image that “makes it possible to use any X application on a headless server through a modern web browser such as chrome”.

Exciting:-) [UPDATE: note that that image uses an old version of the guacamole packages; I tried updating to the latest versions of the packages but it doesn’t just work so I rolled back. Support for Docker was introduced after the version used in the linuxserver/dockergui, but I don’t fully understand what that support does! Ideally, it’d be nice to run a guacamole container and then use docker-compose to link in the applications you want to expose to it? Is that possible? Anyone got an example of how to do it?]

So I gave it a go with Audacity. The files I used are contained in this gist that should also be embedded at the bottom of this post.

(Because the original linuxserver/dockergui was quite old, I downloaded their source files and built a current one of my own to seed my Audacity container.)

Building the Audacity container with:

docker build -t psychemedia/audacitygui .

and then running it with:

docker run -d -p 8080:8080 -p 3389:3389 -e "TZ=Europe/London" --name AudacityGui psychemedia/audacitygui

this is what pops up in the browser:


If we click through on the app, we’re presented with a launcher:


Select the app, and Hey, Presto!, Audacity appears…


It seems to work, too… Create a chirp, and then analyse it:


We seem to be able to load and save files in the nobody directory:


I tried exposing the /nobody folder by adding VOLUME /nobody to the Dockerfile and running with a -v "${PWD}/files":/nobody switch, but it seemed to break things, which is presumably a permissions issue? There are various user role settings in the linuxserver/dockergui build files, so maybe poking around with those would fix things? Otherwise, we might have to seed the container directly with any files we want in it?:-(

UPDATE: adding RUN mkdir -p /audacityfiles && adduser nobody root to the Dockerfile along with VOLUME /audacityfiles, and then adding -v "${PWD}/files":/audacityfiles when I create the container, allows me to share files in to the container, but I can’t seem to save to the folder? Nor do variations on the theme work, such as creating a subfolder in the nobody folder and giving it the same ownership and permissions as the nobody folder. I can save into the nobody folder though. (Just not share it?)

WORKAROUND: in Kitematic, the Exec button on the container view toolbar takes you into the container shell. From there, you can copy files into the shared directory. For example: cp /nobody/test.aup /nobody/share/test.aup. Moving the share to a folder outside /nobody, eg to /audacityfiles, means we can simply copy everything from /nobody to /audacityfiles.
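Without Kitematic, the same workaround should presumably be doable via docker exec from the Docker CLI; a sketch in the same vein as the magics above, with the container name and paths taken from the example:

```python
import subprocess

def build_container_cp(container, src, dst):
    """Construct a docker exec call that copies a file inside a running container."""
    return ["docker", "exec", container, "cp", src, dst]

def copy_into_share(container="AudacityGui",
                    src="/nobody/test.aup", dst="/nobody/share/test.aup"):
    """Run the copy (assumes the docker CLI is on the path and the container is running)."""
    return subprocess.call(build_container_cp(container, src, dst))
```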

Another niggle is with the sound – one of the reasons I tried the Audacity app… (If we can have the visual of the desktop, we want to try to push for sound too, right?!)

Unfortunately, when I tried to play the audio file I’d created, it wasn’t having any of it:


Looking at the log file of the container launch in Kitematic, it seems that ALSA (the Advanced Linux Sound Architecture project) wasn’t happy?


I suspect trying to fix this is a bit beyond my ken, as too is sorting out the shared folder permissions… (I don’t really do sysadmin – which is why I like the idea of ready-to-run application containers.)

UPDATE 2: using a different build of the image – hurricane/dockergui:x11rdp1.3, from the linuxserver/dockergui x11rdp1.3 branch – audio does work, though at times it seemed to struggle a bit. I still can’t save files to the shared folder though:-(

UPDATE 3: I pushed an image as psychemedia/audacity2. It works from the command line as:
docker run -d -p 8080:8080 -p 3389:3389 -e "TZ=Europe/London" --name AudacityGui -v "${PWD}":/nobody/share psychemedia/audacity2

Anyway – half-way there. If nothing else, we could create and analyse audio files visually in the browser using Audacity, even if we can’t get hold of those audio files or play them!

I’d hope there was a simple permissions fix to get the file sharing to work (anyone? anyone?! ;-) but I suspect the audio bit might be a little bit harder? But if you know of a fix, please let me know:-)

PS I just tried launching psychemedia/audacity2 public Dockerhub image via Docker Cloud, and it seemed to work…

Attempt to expose Audacity in a web browser using a docker container enhanced with guacamole...

To build the container, run something like:

docker build -t psychemedia/audacitygui .
docker run -d -p 8080:8080 -p 3389:3389 -e "TZ=Europe/London" --name AudacityGui   psychemedia/audacitygui
Dockerfile:

# Builds a docker gui image
#FROM hurricane/dockergui:xvnc
FROM hurricane/dockergui:x11rdp1.3
#Use an updated build
#FROM psychemedia/dockergui

# Set environment variables
# User/Group Id gui app will be executed as; defaults are 99 and 100
# Gui App Name default is "GUI_APPLICATION"
ENV APP_NAME="Audacity"
# Default resolution, change if you like

# Use baseimage-docker's init system
CMD ["/sbin/my_init"]

#echo 'deb trusty main universe restricted' > /etc/apt/sources.list
#echo 'deb trusty-updates main universe restricted' >> /etc/apt/sources.list

# Install packages needed for app
# Install steps for X app
RUN apt-get update && apt-get install -y \

# Copy X app start script to right location

# Place whatever volumes and ports you want exposed here:
#Trying to expose /nobody and running with -v "${PWD}/files":/nobody doesn't seem to work?
RUN mkdir -p /share
VOLUME /share

EXPOSE 3389 8080

Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers

With its focus on enterprise use, it’s probably with good reason that the Docker folk aren’t that interested in exploring the role that Docker may have to play as a technology that supports the execution of desktop applications, or at least, applications for desktop users. (The lack of significant love for Kitematic seems to be representative of that.)

But I think that’s a shame; because for educational and scientific/research applications, docker can be quite handy as a way of packaging software that ultimately presents itself using a browser based user interface delivered over http, as I’ve demonstrated previously in the context of Jupyter notebooks, OpenRefine, RStudio, R Shiny apps, linked applications and so on.

I’ve also shown how we can use Docker containers to package applications that offer machine services via an http endpoint, such as Apache Tika.

I think this latter use case shows how we can start to imagine things like a “digital humanities application shelf” in a digital library (fragmentary thoughts on this), that allows users to take either an image of the application off the shelf (where an image is a thing that lets you fire up a pristine instance of the application), or a running instance of the application off the shelf. (Furthermore, the application can be run locally, on your own desktop computer, or in the cloud, for example using something like a mybinder-like service.) The user can then use the application directly (if it has a browser based UI), or call on it from elsewhere (eg in the case of Apache Tika). Once they’re done, they can keep a copy of whatever files they were working with and destroy their running version of the application. If they need the application again, they can just pull a new copy (of the latest version of the app, or the version they used previously) and fire up a new instance of it.

Another way of using Docker came to mind over the weekend when I saw a video demonstrating the use of the contentmine scientific literature analysis toolset. The contentmine installation instructions are a bit of a fiddle for the uninitiated, so I thought I’d try to pop them into a container. That was easy enough (for a certain definition of easy – it was a faff getting node to work and npm to be found, the Java requirements took a couple of goes, and I’m sure the image is way bigger than it really needs to be…), as the Dockerfile below/in the gist shows.

But the question then was how to access the tools? The tools themselves are commandline apps, so the first thing we want to do is to be able to call into the container to run the command. A handy post by Mike English entitled Distributing Command Line Tools with Docker shows how to do this, so that’s all good then…

The next step is to consider how to retain copies of the files created by the command line apps, or pass files to the apps for processing. If we have a target host directory and mount it into the container as a shared volume, we can keep the files on our desktop or allow the container to create files into the host directory. Then they’ll be accessible to us all the time, even if we destroy the container.

The gist that should be embedded below shows the Dockerfile and a simple batch file that passes the Contentmine tool commands into the container, which then executes them. The batch file idea could be further extended to produce a set of command shortcuts that essentially alias the Contentmine commands (eg a ./getpapers command rather than a ./contentmine getpapers command), or that combine the various steps associated with a particular pipeline or workflow – getpapers/norma/cmine, for example – into a single command.

UPDATE: the CenturyLinkLabs DRAY docker pipeline looks interesting in this respect for sequencing a set of docker containers and passing the output of one as the input to the next.

If there are other folk out there looking at using Docker specifically for self-managed “pull your own container” individual desktop/user applications, rather than as a devops solution for deploying services at scale, I’d love to chat…:-)

PS for several other examples of using Docker for desktop apps, including accessing GUI based apps using X Windows / X11, see Jessie Frazelle’s post Docker Containers on the Desktop.

PPS See also More Docker Doodlings – Accessing GUI Apps Via a Browser from a Container Using Guacamole for an attempt at exposing a GUI based app, such as Audacity, running in a container via a browser. Note that I couldn’t get a shared folder or the audio to work, although the GUI bit did…

PPPS I wondered how easy it would be to run command-line containers from within a Jupyter notebook itself running inside a container, but got stuck. Related question on Stack Overflow here.

The way the rest of this post is published is something of an experiment – everything below the line is pulled in from a gist using the WordPress embed-gists shortcode…

Contentmine Docker CLI App

An attempt to make it easier to use Contentmine tools, by simplifying the install...

Based on some cribs from Distributing Command Line Tools with Docker

  • create a contentmine directory: eg mkdir -p contentmine
  • download the Dockerfile into it
  • from a Docker CLI, cd in to the directory; create an image using eg docker build -t psychemedia/contentmine .
  • download the contentmine script and make it executable: chmod u+x contentmine
  • create a folder on host in the contentmine folder to act as a shared folder with the contentmine container: eg mkdir -p cm
  • run commands
    • ./contentmine getpapers -q aardvark -o /contentmine/aardvark -x
    • ./contentmine norma --project /contentmine/aardvark -i fulltext.xml -o scholarly.html --transform nlm2html
    • ./contentmine cmine /contentmine/aardvark

Contentmine Home: Contentmine

contentmine (wrapper script):

## contentmine - a wrapper script for running contentmine packages
# via Distributing Command Line Tools with Docker
# Make this file executable: chmod u+x contentmine
docker run --rm --volume "${PWD}/cm":/contentmine --tty --interactive psychemedia/contentmine "$@"
Dockerfile:

FROM node:4.3.2
## Based on
RUN apt-get clean -y && apt-get -y update && apt-get -y upgrade && \
    apt-get -y update && apt-get install -y wget ant unzip openjdk-7-jdk && \
    apt-get clean -y
RUN wget --no-check-certificate
RUN wget --no-check-certificate
RUN dpkg -i norma_0.1.SNAPSHOT_all.deb
RUN dpkg -i ami2_0.1.SNAPSHOT_all.deb
RUN npm install --global getpapers
RUN mkdir /contentmine
VOLUME /contentmine
RUN cd /contentmine

#The intuition behind this is: 'can we avoid setup hassles by distributing the contentmine commandline app via a container?'
#docker build -t psychemedia/contentmine .
#Then maybe something like:
#mkdir -p cm
#docker run --volume "${PWD}/cm":/contentmine --interactive psychemedia/contentmine getpapers -q aardvark -o /contentmine/aardvark -x
#docker run --volume "${PWD}/cm":/contentmine --interactive psychemedia/contentmine norma --project /contentmine/aardvark -i fulltext.xml -o scholarly.html --transform nlm2html
#docker run --volume "${PWD}/cm":/contentmine --interactive psychemedia/contentmine cmine /contentmine/aardvark
#Following that, how about:
#Create the following script as contentmine; chmod u+x contentmine
## contentmine - a wrapper script for running contentmine packages
#docker run --volume "${PWD}/cm":/contentmine --tty --interactive psychemedia/contentmine "$@"
#Then run eg: ./contentmine getpapers -q aardvark -o /contentmine/aardvark2 -x

Accessing a Neo4j Graph Database Server from RStudio and Jupyter R Notebooks Using Docker Containers

In Getting Started With the Neo4j Graph Database – Linking Neo4j and Jupyter SciPy Docker Containers Using Docker Compose I posted a recipe demonstrating how to link a Jupyter notebook container with a neo4j container to provide a quick way to get up and running with neo4j from a Python environment.

It struck me that it should be just as easy to launch an R environment, so here’s a docker-compose.yml file that will do just that:

neo4j:
  image: kbastani/docker-neo4j:latest
  ports:
    - "7474:7474"
    - "1337:1337"
  volumes:
    - /opt/data

rstudio:
  image: rocker/rstudio
  ports:
    - "8787:8787"
  links:
    - neo4j:neo4j
  volumes:
    - ./rstudio:/home/rstudio

jupyterIR:
  image: jupyter/r-notebook
  ports:
    - "8889:8888"
  links:
    - neo4j:neo4j
  volumes:
    - ./notebooks:/home/jovyan/work

If you’re using Kitematic (available via the Docker Toolbox), launch the docker command line interface (Docker CLI), cd into the directory containing the docker-compose.yml file, and run the docker-compose up -d command. This will download the necessary images and fire up the linked containers: one running neo4j, one running RStudio, and one running a Jupyter notebook with an R kernel.

You should then be able to find the URLs/links for RStudio and the notebooks in Kitematic:


Once again, Nicole White has some quickstart examples for using R with neo4j, this time using the RNeo4j R package. One thing I noticed with the Jupyter R kernel was that I needed to specify the CRAN mirror when installing the package: install.packages('RNeo4j', repos="")

To connect to the neo4j database, use the domain mapping specified in the Docker Compose file: graph = startGraph("http://neo4j:7474/db/data/")

Here’s an example in RStudio running from the container:


And the Jupyter notebook:


Notebooks and RStudio project files are shared into subdirectories of the current directory (from which the docker compose command was run) on host.

Tutum Cloud Becomes Docker Cloud

In several previous posts, I’ve shown how to launch docker containers in the cloud using Tutum Cloud, for example to run OpenRefine, RStudio, or Shiny apps in the cloud. Docker bought Tutum out several months ago, and now it seems that they’re running the tutum environment as Docker Cloud (docs, announcement).


Tutum – as was – looks like it’s probably on the way out…


The onboarding screen confirms that the familiar Tutum features are available in much the same form as before:


The announcement post suggests that you can get a free node and one free repository, but whilst my Docker hub images seemed to have appeared, I couldn’t see how to get a free node. (Maybe I was too quick linking my Digital Ocean account?)

As with Tutum (because it is – or was – Tutum!), Docker Cloud provides you with the opportunity to spin up one or more servers on a linked cloud host and run containers on those servers, either individually or linked together as part of a “stack” (essentially, a Docker Compose container composition). You also get an image repository within Docker Cloud – mine looks as if it’s linked to my DockerHub repository:


A nice feature of this is that you can 1-click start a container from your image repository if you have a server node running.

The Docker Cloud service currently provides a rebranding of the Tutum offering, so it’ll be interesting to see if product features continue to be developed. One thing I keep thinking might be interesting is a personal MyBinder style service that simplifies the deploy to Tutum (as was) service and allows users to use linked hosts and persistent logins to 1-click launch container services with persistent state (for example, Launch Docker Container Compositions via Tutum and – But What About Container Stashing?). I guess this would mean linking in some cheap storage somehow, rather than having to keep server nodes up to persist container state? By the by, C. Titus Brown has some interesting reflections on MyBinder here: Is mybinder 95% of the way to next-gen computational science publishing, or only 90%?

If nothing else, signing up to Docker Cloud does give you a $20 Digital Ocean credit voucher that can also be applied to pre-existing Digital Ocean accounts:-) (If you haven’t already signed up for Digital Ocean but want to give it a spin, this affiliate link should also get you $10 free credit.)

PS as is the way of these things, I currently run docker containers in the cloud on my own tab (or credit vouchers) rather than institutional servers, because – you know – corporate IT. So I was interested to see that Docker have also recently launched a Docker DataCenter service (docs), and the associated promise of “containers-as-a-service” (CaaS), that makes it easy to offer cloud-based container deployment infrastructure. Just sayin’…;-)

PPS So when are we going to get Docker Cloud integration in Kitematic, or a Kitematic client for Docker Cloud?

Running Docker Container Compositions in the Cloud on Digital Ocean

With TM351 about to start, I thought I’d revisit the docker container approach to delivering the services required by the course to see if I could get something akin to the course software running in the cloud.

Copies of the dockerfiles used to create the images can be found on Github, with prebuilt images available on dockerhub (*).

A couple of handy ones out of the can are:

That said, at the current time, the images are not intended for use as part of the course.

The following docker-compose.yml file will create a set of linked containers that resemble (ish!) the monolithic VM we distributed to students as a Virtualbox box.

dockerui:
    container_name: tm351-dockerui
    image: dockerui/dockerui
    ports:
        - "35183:9000"
    volumes:
        - /var/run/docker.sock:/var/run/docker.sock
    privileged: true

devmongodata:
    container_name: tm351-devmongodata
    command: echo mongodb_created
    #Share same layers as the container we want to link to?
    image: mongo:3.0.7
    volumes:
        - /data/db

mongodb:
    container_name: tm351-mongodb
    image: mongo:3.0.7
    ports:
        - "27017:27017"
    volumes_from:
        - devmongodata
    command: --smallfiles

mongodbseed:
    container_name: tm351-mongodb-seed
    image: psychemedia/ou-tm351-mongodb-simple-seed
    links:
        - mongodb

devpostgresdata:
    container_name: tm351-devpostgresdata
    command: echo created
    image: busybox
    volumes:
        - /var/lib/postgresql/data

postgres:
    container_name: tm351-postgres
    image: psychemedia/ou-tm351-postgres
    ports:
        - "5432:5432"
    volumes_from:
        - devpostgresdata

openrefine:
    container_name: tm351-openrefine
    image: psychemedia/tm351-openrefine
    ports:
        - "35181:3333"
    privileged: true

notebook:
    container_name: tm351-notebook
    #build: ./tm351_scipystacknserver
    image: psychemedia/ou-tm351-pystack
    ports:
        - "35180:8888"
    links:
        - postgres:postgres
        - mongodb:mongodb
        - openrefine:openrefine
    privileged: true

Place a copy of the docker-compose.yml YAML file somewhere, then from Kitematic open the command line, cd into the directory containing the YAML file, and enter the command docker-compose up -d – the images are on dockerhub and should be downloaded automatically.

Refer back to Kitematic and you should see running containers – the settings panel for the notebooks container shows the address you can find the notebook server at.


The notebooks and OpenRefine containers should also be linked to shared folders in the directory you ran the Docker Compose script from.

Running the Containers in the Cloud – Docker-Machine and Digital Ocean

As well as running the linked containers on my own machine, my real intention was to see how easy it would be to get them running in the cloud and using just the browser on my own computer to access them.

And it turns out to be really easy. The following example uses cloud host Digital Ocean.

To start with, you’ll need a Digital Ocean account with some credit in it and a Digital Ocean API token:


(You may be able to get some Digital Ocean credit for free as part of the Github Education Student Developer Pack.)

Then it’s just a case of a few command line instructions to get things running using Docker Machine:

docker-machine ls
#Kitematic uses: default

#Create a droplet on Digital Ocean
docker-machine create -d digitalocean --digitalocean-access-token YOUR_ACCESS_TOKEN --digitalocean-region lon1 --digitalocean-size 4gb ou-tm351-test 

#Check the IP address of the machine
docker-machine ip ou-tm351-test

#Display information about the machine
docker-machine env ou-tm351-test
#This returns necessary config details
#For example:
##export DOCKER_TLS_VERIFY="1"
##export DOCKER_HOST="tcp://IP_ADDRESS:2376"
##export DOCKER_CERT_PATH="/Users/YOUR-USER/.docker/machine/machines/ou-tm351-test"
##export DOCKER_MACHINE_NAME="ou-tm351-test"
# Run this command to configure your shell: 
# eval $(docker-machine env ou-tm351-test)

#Set the environment variables as recommended
export DOCKER_HOST="tcp://IP_ADDRESS:2376"
export DOCKER_CERT_PATH="/Users/YOUR-USER/.docker/machine/machines/ou-tm351-test"

#Run command to set current docker-machine
eval "$(docker-machine env ou-tm351-test)"

#If the docker-compose.yml file is in .
docker-compose up -d
#This will launch the linked containers on Digital Ocean

#The notebooks should now be viewable at:
#http://IP_ADDRESS:35180

#OpenRefine should now be viewable at:
#http://IP_ADDRESS:35181
#To stop the machine
docker-machine stop ou-tm351-test
#To remove the Digital Ocean droplet (so you stop paying for it...)
docker-machine rm ou-tm351-test

#Reset the current docker machine to the Kitematic machine
eval "$(docker-machine env default)"
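As an aside, if you’d rather configure a Docker client programmatically than eval those export lines, the docker-machine env output is easy to parse; a minimal Python sketch (the sample output below is abridged from the listing above):

```python
import re

# Parse the `export VAR="value"` lines emitted by `docker-machine env`
# into a dict, e.g. for passing to subprocess or a Docker API client,
# as an alternative to eval-ing them in the shell.
def parse_machine_env(env_output):
    env = {}
    for line in env_output.splitlines():
        m = re.match(r'export (\w+)="([^"]*)"', line.strip())
        if m:
            env[m.group(1)] = m.group(2)
    return env

sample = '''export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://IP_ADDRESS:2376"
export DOCKER_MACHINE_NAME="ou-tm351-test"
# Run this command to configure your shell:
# eval $(docker-machine env ou-tm351-test)'''

print(parse_machine_env(sample)["DOCKER_HOST"])  # tcp://IP_ADDRESS:2376
```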

So that’s a start. Issues still arise in terms of persisting state, such as the database contents, notebook files* and OpenRefine projects: if you leave the containers running on Digital Ocean to persist the state, the meter will keep running.

(* I should probably also build a container that demos how to bake a few example notebooks into a container running the notebook server and TM351 python distribution.)

How to Run A Shiny App in the Cloud Using Tutum Docker Cloud, Digital Ocean and Docker Containers

Via RBloggers, I spotted this post on Deploying Your Very Own Shiny Server (here’s another on a similar theme). I’ve been toying with the idea of running some of my own Shiny apps, so that post provided a useful prompt, though way too involved for me;-)

So here’s what seems to me to be an easier, rather more pointy-clicky, wiring stuff together way using Docker containers (though it might not seem that much easier to you the first time through!). The recipe includes: github, Dockerhub, Tutum and Digital Ocean.

To begin with, I created a minimal shiny app that allows the user to select a CSV file, upload it to the app and display it. The ui.R and server.R files, along with whatever else you need, should be placed into an application directory – for example, shiny_demo – within a project directory, which I’m confusingly also calling shiny_demo. (I should have called it something else to make it a bit clearer – for example, shiny_demo_project.)

The shiny server comes from a prebuilt docker container on dockerhub – rocker/shiny.

This shiny server can run several Shiny applications, though I only want to run one: shiny_demo.

I’m going to put my application into its own container. This container will use the rocker/shiny container as a base, and simply copy my application folder into the shiny server folder from which applications are served. My Dockerfile is really simple and contains just two lines – it looks like this and goes into a file called Dockerfile in the project directory:

FROM rocker/shiny
ADD shiny_demo /srv/shiny-server/shiny_demo

The ADD command simply copies the contents of the child directory into a similarly named directory in the container’s /srv/shiny-server/ directory. You could add as many applications to the server as you want, as long as each is in its own directory. For example, if I have several applications:


I can add the second application to my container using:

ADD shiny_other_demo /srv/shiny-server/shiny_other_demo

The next thing I need to do is check my shiny_demo project into Github. (I don’t have a how to on this, unfortunately…) In fact, I’ve checked my project in as part of another repository (docker-containers).


The next step is to build a container image on DockerHub. If I create an account and log in to DockerHub, I can link my Github account to it.

(I can also build the image locally, if Docker is installed, by cding into the folder containing the Dockerfile and running docker build -t shinydemo . Then I can run it using the command docker run -p 8881:3838 shinydemo (the server runs on port 3838 in the container, which I expose as localhost:8881); the application is then at localhost:8881/shiny_demo.)

I can then create an Automated Build that will build a container image from my Github repository. First, identify the repository on my linked Github account and name the image:


Then add the path to the project directory that contains the Dockerfile for the image you’re interested in:


Click on Trigger to build the image the first time. In the future, every time I update that folder in the repository, the container image will be rebuilt to include the updates.

So now I have a Docker container image on Dockerhub that contains the Shiny server from the rocker/shiny image and a copy of my shiny application files.

Now I need to go to Tutum (also part of the Docker empire), which is an application for launching containers on a range of cloud services. If you link your Digital Ocean account to Tutum, you can use Tutum to launch Docker containers from Dockerhub on a Digital Ocean droplet.

Within Tutum, you’ll need to create a new node cluster on Digital Ocean:

(Notwithstanding the below, I generally go for a single 4GB node…)

Now we need to create a service from a container image:

I can find the container image I want to deploy on the cluster that I previously built on Dockerhub:


Select the image and then configure it – you may want to rename it, for example. One thing you definitely need to do though is tick to publish the port – this will make the shiny server port visible on the web.


Create and deploy the service. When the container is built, and has started running, you’ll be told where you can find it.


Note that if you click on the link to the running container, the default URL starts with tcp:// which you’ll need to change to http://. The port will be dynamically allocated unless you specified a particular port mapping on the service creation page.
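The rewrite is purely mechanical; a small Python helper to illustrate (the hostname and port in the example are invented):

```python
# Rewrite the tcp:// endpoint Tutum reports as a browsable http:// URL,
# optionally appending the Shiny application folder name.
def service_url(tcp_endpoint, app_folder=None):
    url = tcp_endpoint.replace("tcp://", "http://", 1)
    if app_folder:
        url = url.rstrip("/") + "/" + app_folder
    return url

print(service_url("tcp://shinydemo-1.srv.tutum.io:49154", "shiny_demo"))
# http://shinydemo-1.srv.tutum.io:49154/shiny_demo
```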

To view your shiny app, simply add the name of the folder the application is in to the URL.

When you’ve finished running the app, you may want to shut the container down – and more importantly perhaps, switch the Digital Ocean droplet off so you don’t continue paying for it!


As I said at the start, the first time round seems quite complicated. After all, you need to:

(Actually, you can miss out the dockerhub steps, and instead link your github account to your tutum account and do the automated build from the github files within tutum: Tutum automatic image builds from GitHub repositories. The service can then be launched by finding the container image in your tutum repository)

However, once you do have your project files in github, you can then easily update them and easily launch them on Digital Ocean. In fact, you can make it even easier by adding a deploy to tutum button to a project file in Github.

See also: How to run RStudio on Digital Ocean via Tutum and How to run OpenRefine on Digital Ocean via Tutum.

PS to test the container locally, I launch a docker terminal from Kitematic, cd into the project folder, and run something like:

docker build -t psychemedia/shinydemo .
docker run --name shinydemo -i -t -p 8881:3838 psychemedia/shinydemo

In the above, I set the port map (the service runs in the container on 3838, which I expose on the host as localhost:8881). Alternatively, I could find a link to the server from within Kitematic.

Launch Docker Container Compositions via Tutum and – But What About Container Stashing?

Via a couple of tweets, it seems that 1-click launching of runnable docker container compositions to the cloud is almost possible with Tutum – deploy to Tutum button [h/t @borja_burgos] – with collections of ready-to-go compositions (or in Tutum parlance, stacks) available on [h/t @tutumcloud].

The deploy to Tutum button is very much like the binder setup, with URLs taking the form:

The repository – such as a github repository – will look for tutum.yml, docker-compose.yml and fig.yml files (in that order) and pre-configure a Tutum stack dialogue with the information described in the file.
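That lookup order can be expressed as a simple precedence check; a Python sketch, assuming all we have is the repository’s file listing:

```python
# The deploy-to-Tutum lookup order, expressed as a precedence check
# over a repository's file listing: tutum.yml beats docker-compose.yml,
# which beats fig.yml.
PRECEDENCE = ["tutum.yml", "docker-compose.yml", "fig.yml"]

def config_file(repo_files):
    for candidate in PRECEDENCE:
        if candidate in repo_files:
            return candidate
    return None

print(config_file(["README.md", "docker-compose.yml", "fig.yml"]))
# docker-compose.yml
```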

The stack can then be deployed to one or more already-running nodes.

The site hosts a range of pre-defined configuration files that can be used with the deploy button, so in certain respects it acts in much the same way as the panamax directory (Panamax marketplace?).

One of the other things I learned about Tutum is that they have a container defined that can cope with load balancing: if you launch multiple container instances of the same docker image, you can load balance across them (tutum: load balancing a web service). At least one of the configurations on (Load balancing a Web Service) seems to show how to script this.
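Conceptually, the load balancer just shares requests across the linked app containers; a toy Python round-robin sketch of the idea (the backend endpoints are invented):

```python
from itertools import cycle

# Toy round-robin dispatch over several identical app containers,
# mimicking what the Tutum load-balancer container does for you.
backends = cycle(["http://shiny-1:3838", "http://shiny-2:3838"])

print(next(backends))  # http://shiny-1:3838
print(next(backends))  # http://shiny-2:3838
print(next(backends))  # back to http://shiny-1:3838
```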

One of the downsides of the load balancing, and indeed the deploy to Tutum recipe generally is that there doesn’t seem to be a way to ensure that server nodes on which to run the containers are available: presumably, you have to start these yourself?

What would be nice would be the ability to also specify an autoscaling rule that could be used to fire up at least one node on which to run a deployed stack. Autoscaling rules would also let you power server nodes up or down depending on load, which could presumably keep the cost of running servers down to the minimum needed to service whatever load is actually being experienced. (I’m thinking of occasional, relatively low-usage models, which are perhaps slightly different from the normal web scaling model: for example, the ability to fire up a configuration of several instances of OpenRefine for a workshop, and have autoscaling cope with deploying additional containers – and, if required, additional server nodes – depending on how many people turn up to the workshop or want to participate.)
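As a toy illustration of the kind of sizing rule I have in mind (all the numbers are made up):

```python
import math

# Naive autoscaling arithmetic: one OpenRefine container per workshop
# participant, a (made-up) capacity of 8 containers per server node,
# and at least one node kept running.
def nodes_needed(participants, containers_per_node=8, min_nodes=1):
    return max(min_nodes, math.ceil(participants / containers_per_node))

print(nodes_needed(20))  # 3 nodes for a 20-person workshop
```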

There seems to be a discussion thread about autoscaling on the Tutum site, but I’m not sure there is actually a corresponding service offering? (Via @tutumcloud, there is a webhook triggered version: Tutum triggers.)

One final thing that niggles at me particularly in respect of personal application hosting is the ability to “stash” a copy of a container somewhere so that it can be reused later, rather than destroying containers after each use. ( appears to have this sorted…) A major reason for doing this would be to preserve user files. I guess one way round it is to use a linked data container and then keep the server node containing that linked data container alive, in between rounds of destroying and starting up new application containers (application containers that link to the data container to store user files). The downside of this is that you need to keep a server alive – and keep paying for it.

What would be really handy would be the ability to “stash” a container in some cheap storage somewhere, and then retrieve that container each time someone wanted to run their application (this could be a linked data container, or it could be the application container itself, with files preserved locally inside it?) (Related: some of my earlier notes on how to share docker data containers.)

I’m not sure whether there are any string’n’glue models that might support this? (If you know of any, particularly if they work in a Tutum context, please let me know via the comments…)