From Linked Data to Linked Applications?

Pondering how to put together some Docker IPython magic for running arbitrary command line functions in arbitrary docker containers (this is as far as I’ve got so far), I think the commands must include a couple of things:

  1. the name of the container (perhaps rooted in a particular repository): psychemedia/contentmine or dockerhub::psychemedia/contentmine, for example;
  2. the actual command to be called: for example, one of the contentine commands: getpapers -q {QUERY} -o {OUTPUTDIR} -x

We might also optionally specify mount directories with the calling and called containers, using a conventional default otherwise.

This got me thinking that the called functions might be viewed as operating in a namespace (psychemedia/contentmine or dockerhub::psychemedia/contentmine, for example). And this in turn got me thinking about “big-L, big-A” Linked Applications.

According to Tim Berners Lee’s four rules of Linked Data, the web of data should:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  4. Include links to other URIs. so that they can discover more things.

So how about a web of containerised applications, that would:

  1. Use URIs as names for container images
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information (in the minimal case, this corresponds to a Dockerhub page for example; in a user-centric world, this could just return a help file identifying the commands available in the container, along with help for individual commands; )
  4. Include a Dockerfile. so that they can discover what the application is built from (also may link to other Dockerfiles).

Compared with Linked Data, where the idea is about relating data items one to another, the identifying HTTP URI actually represents the ability to make a call into a functional, execution space. Linkage into the world of linked web resources might be provided through Linked Data relations that specify that a particular resource was generated from an instance of a Linked Application or that the resource can be manipulated by an instance of a particular application.

So for example, files linked to on the web might have a relation that identifies the filetype, and the filetype is linked by another relation that says it can be opened in a particular linked application. Another file might link to a description of the workflow that created it, and the individual steps in the workflow might link to function/command identifiers that are linked to linked applications through relations that associate particular functions with a particular linked application.

Workflows may be defined generically, and then instantiated within a particular experiment. So for example: load file with particular properties, run FFT on particular columns, save output file becomes instantiated within a particular run of an experiment as load file with this URI, run the FFT command from this linked application on particular columns, save output file with this URI.

Hmm… thinks.. there is a huge amount of work already done in the area of automated workflows and workflow execution frameworks/environments for scientific computing. So this is presumably already largely solved? For example, Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker, C. Zheng & D. Thain, 2015 [PDF]?

A handful of other quick points:

  • the model I’m exploring in the Docker magic context is essentially stateless/serverless computing approach, where a commandline container is created on demand and treated in a disposable way to just run a particular function before being destroyed; (see also the OpenAPI approach).
  • The Linked Application notion extends to other containerised applications, such as ones that expose an HTML user interface over http that can be accessed via a browser. In such cases, things like WSDL (or WADL; remember WADL?) provided a machine readable formalised way of describing functional resource availability.
  • In the sense that commandline containerised Linked Applications are actually services, we can also think about web services publishing an http API in a similar way?
  • services such as Sandstorm, which have the notion of self-running containerised documents, have the potentially to actually bind a specific document within an interactive execution environment for that document.

Hmmm… so how much nonsense is all of the above, then?

Steps Towards Some Docker IPython Magic – Draft Magic to Call a Contentmine Container from a Jupyter Notebook Container

I haven’t written any magics for IPython before (and it probably shows!) but I started sketching out some magic for the Contentmine command-line container I described in Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers,

What I’d like to explore is a more general way of calling command line functions accessed from arbitrary containers via a piece of generic magic, but I need to learn a few things along the way, such as handling arguments for a start!

The current approach provides crude magic for calling the contentmine functions included in a public contentmine container from a Jupyter notebook running inside a container. The commandline contentmine container is started from within the notebook contained and uses a volume-from the notebook container to pass files between the containers. The path to the directory mounted from the notebook is identified by a bit of jiggery pokery , as is the method for spotting what container the notebook is actually running in (I’m all ears if you know of a better way of doing either of these things?:-)

The magic has the form:

%getpapers /notebooks rhinocerous

to run the getpapers query (with fixed switch settings for now) and the search term rhinocerous; files are shared back from the contentmine container into the .notebooks folder of the Jupyter container.

Other functions include:

%norma /notebooks rhinocerous
%cmine /notebooks rhinocerous

These functions are applied to files in the same folder as was created by the search term (rhinocerous).

The magic needs updating so that it will also work in a Jupyter notebook that is not running within a container – this should simply be just of case of switching in a different directory path. The magics also need tweaking so we can pass parameters in. I’m not sure if more flexibility should also be allowed on specifying the path (we need to make sure that the paths for the mounted directories are the correct ones!)

What I’d like to work towards is some sort of line magic along the lines of:

%docker psychemedia/contentmine -mountdir /CALLING_CONTAINER_PATH -v ${MOUNTDIR}:/PATH COMMAND -ARGS etc

or cell magic:

%%docker psychemedia/contentmine -mountdir /CALLING_CONTAINER_PATH -v ${MOUNTDIR}:/PATH
COMMAND -ARGS etc
...
COMMAND -ARGS etc

Note that these go against the docker command line syntax – should they be closer to it?

The code, and a walked through demo, are included in the notebook available via this gist, which should also be embedded below.


Docker as a Personal Application Runner

Consider, for a moment, the following scenarios associated with installing and running a desktop based application on your own computer:

  • a learner installing software for a distance education course: course materials are produced in advance of the course and may be written with a particular version of the software in mind, distributed as part of the course materials. Learners may have arbitrary O/S (various versions of Windows and OS/X), may be be working on work computers with aggressive IT enforced security policies, or may be working on shared/public computers. Some courses may require links between different applications (for example, a data analysis packages and a database system); in addition, some students may not be able to install any software on their own computer – how can we support them?
  • academic research environment: much academic software is difficult to install and may require an element of sysadmin skills, as well as a particular o/s and particular version so supporting libraries. Why should a digital humanities researcher who want to work with text analysis tools provided in a particular text analysis package also have to learn sys admin skills to install the software before they can use the functions that actually matter to them? Or consider a research group environment, where it’s important that research group members have access to the same software configuration but on their own machines.
  • data journalism environment: another twist on the research environment, data journalists may want to compartmentalise and preserve a particular analysis of a particular dataset, along with the tools associated with running those analyses, as “evidence”, in case the story they write on it is challenged in court. Or maybe they need to fire up a particular suite of interlinked tools for producing a particular story in quick time (from accessing the raw data for the first time to publishing the story within a few hours), making sure they work from a clean set up each time.

What we have here is a packaging problem. We also have a situation where the responsibility for installing a single copy of the application or linked applications is an individual user or small team working on an arbitrary platform with few, if any, sys admin skills.

So can Docker help?

A couple of recent posts on the Docker blog set out to explore what Docker is not.

The first – Containers are not VMs – argues that Docker “is not a virtualization technology, it’s an application delivery technology”. The post goes on:

In a VM-centered world, the unit of abstraction is a monolithic VM that stores not only application code, but often its stateful data. A VM takes everything that used to sit on a physical server and just packs it into a single binary so it can be moved around. But it is still the same thing. With containers the abstraction is the application; or more accurately a service that helps to make up the application.

With containers, typically many services (each represented as a single container) comprise an application. Applications are now able to be deconstructed into much smaller components which fundamentally changes the way they are managed in production.

So, how do you backup your container, you don’t. Your data doesn’t live in the container, it lives in a named volume that is shared between 1-N containers that you define. You backup the data volume, and forget about the container. Optimally your containers are completely stateless and immutable.

The key idea here is that with Docker we have a “something” (in the form of a self-contained container) that implements an application’s logic and publishes the application as a service, but isn’t really all that interested in preserving the state of, or any data associated with, the application. If you want to preserve data or state, you need to store it in a separate persistent data container, or alternative data storage service, that is linked to application containers that want to call on it.

The second post – There’s Application Virtualization and There’s Docker – suggests that “Docker is not application virtualization” in the sense of “put[ting] the application inside of a sandbox that includes the app and all its necessary DLLs. Or, … hosting the application on a server, and serving it up remotely…”, but I think I take issue with this in the way it can be misinterpreted as a generality.

The post explicitly considers such application virtualisation in the context of applications that are “monolithic in that they contain their own GUI (vs. a web app that is accessed via a browser)”, things like Microsoft Office or other “traditional” desktop based applications, for example.

But many of the applications I am interest in are ones that publish their user interface as a service, of sorts, over HTTP in the form of a browser based HTML API, or that are accessed via a the commandline. For these sorts of applications, I believe that Docker represents a powerful environment for personal, disposable, application virtualisation. For example, dedicated readers of this blog may already be aware of my demonstrations of how to:

Via Paul Murrell, I also note this approach for defining a pipeline approach for running docker containers: An OpenAPI Pipeline for NZ Crime Data. Pipeline steps are defined in separate XML modules, and the whole pipeline defined in another XML file. For example, the module step OpenAPI/NZcrime/region-Glendowie.xml runs a specified R command in a Docker container fired up to execute just that command. The pipeline definition file identifies the component modules as nodes in some sort of execution graph, along with the edges connecting them as steps in the pipeline. The pipeline manager handles the execution of the steps in order and passes state between the step in one of several ways (for example, via a shared file or a passed variable). (Further work on the OpenAPI pipeline approach is described in An Improved Pipeline for CPI Data.)

What these examples show is that as well as providing devops provisioning support for scaleable applications on the one hand, and an environment for effective testing and rapid development of applications on the other, Docker containers may also have a role to play in “user-pulled” applications.

This is not so much thinking of Docker from an enterprise perspective in which it acts as an environment that supports development and auto-scaled deployment of containerised applications and services, nor is it a view that a web hosting service might take of Docker images providing an appropriate packaging format for the self-service deployment of long-lived services, such as blogs or wiki applications, (a Docker hub and deployment system to rival cPanel, for example).

Instead, it’s a user-centric, rather than devops-centric view, seeing containers from a single user, desktop perspective, seeing Docker and its ilk as providing an environment that can support off-the-shelf, ready to run tools and applications, that can be run locally or in the cloud, individually or in concert with each other.

Next up in this series: a reflection on the possibilities of a “Digital Library Application Shelf”.

More Docker Doodlings – Accessing GUI Apps Via a Browser from a Container Using Guacamole

In a PS to Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers, I linked to a demonstration by Jessie Frazelle on how to connect to GUI based apps running in a container via X11. This is all very well if you have an X client, but it would be neater if we could find a way of treating the docker container as a virtual desktop container, and then accessing the app running inside it via the desktop presented through a browser.

Digging around, Guacamole looks like it provides a handy package for exposing a Linux desktop via a browser based user interface [video demo].

Very nice… Which got me wondering: can we run guacamole inside a container, alongside an X.11 producing app, to expose that app?

Via the Dockerfile referenced in Digikam on Mac OS/X or how to use docker to run a graphical app on Mac OS/X I tracked down linuxserver/dockergui, a Docker image that “makes it possible to use any X application on a headless server through a modern web browser such as chrome”.

Exciting:-) [UPDATE: note that that image uses an old version of the guacamole packages; I tried updating to the latest versions of the packages but it doesn’t just work so I rolled back. Support for Docker was introduced after the version used in the linuxserver/dockergui, but I don’t fully understand what that support does! Ideally, it’d be nice to run a guacamole container and then use docker-compose to link in the applications you want to expose to it? Is that possible? Anyone got an example of how to do it?]

So I gave it a go with Audacity. The files I used are contained in this gist that should also be embedded at the bottom of this post.

(Because the original linuxserver/dockergui was quite old, I downloaded their source files and built a current one of my own to seed my Audacity container.)

Building the Audacity container with:

docker build -t psychemedia/audacitygui .

and then running it with:

docker run -d -p 8080:8080 -p 3389:3389 -e "TZ=Europe/London" --name AudacityGui psychemedia/audacitygui

this is what pops up in the browser:

Guacamole_0_9_6

If we click through on the app, we’re presented with a launcher:

Audacity

Select the app, and Hey, Presto!, Audacity appears…

Audacity1

It seems to work, too… Create a chirp, and then analyse it:

Audacity2

We seem to be able to load and save files in the nobody directory:

Audacity3

I tried exposing the /nobody folder by adding VOLUME /nobody to the Dockerfile and running a -v "${PWD}/files":/nobody switch, but it seemed to break things which is presumably a permissions thing? There are various user roles settings in the linuxserver/dockergui build files, so making poking around with those would fix things? Otherwise, we might have to see the container directly with any files we want in it?:-(

UPDATE: adding RUN mkdir -p /audacityfiles && adduser nobody root to the Dockerfile along with ./VOLUME /audacityfiles and then adding -v "${PWD}/files":/audacityfiles when I create the container allows me to share files in to the container, but I can’t seem to save to the folder? Nor do variations on the theme, such as s creating a subfolder in the nobody folder and giving the same ownership and permissions as the nobody folder. I can save into the nobody folder though. (Just not share it?)

WORKAROUND: in Kitematic, the Exec button on the container view toolbar takes you into the container shell. From there, you can copy files into the shared direcory. For example: cp /nobody/test.aup /nobody/share/test.aup Moving the share to a folder outside /nobody, eg to /audacityfiles means we can simply compy everything from /nobody to /audacityfiles.

Another niggle is with the sound – one of the reasons I tried the Audacity app… (If we can have the visual of the desktop, we want to try to push for sound too, right?!)

Unfortunately, when I tried to play the audio file I’d created, it wasn’t having any of it:

Audacity4

Looking at the log file of the container launch in Kitematic, it seems that ALSA (the Advanced Linux Sound Architecture project) wasn’t happy?

alsa_no

I suspect trying to fix this is a bit beyond my ken, as too are the sorting out the shared folder permissions, I suspect… (I don’t really do sysadmin – which is why I like the idea of ready-to-run application containers).

UPDATE 2: using a different build of the image – hurricane/dockergui:x11rdp1.3, from the linuxserver/dockergui x11rdp1.3 branch, audio does work, though at times it seemed to struggle a bit. I still can’t save files to shared folder though:-(

UPDATE 3: I pushed an image as psychemedia/audacity2. It works from the command line as:
docker run -d -p 8080:8080 -p 3389:3389 -e "TZ=Europe/London" --name AudacityGui -v "${PWD}":/nobody/share psychemedia/audacity2

Anyway – half-way there. If nothing else, we could create and analyse audio files visually in the browser using Audacity, even if we can’t get hold of those audio files or play them!

I’d hope there was a simple permissions fix to get the file sharing to work (anyone? anyone?! ;-) but I suspect the audio bit might be a little bit harder? But if you know of a fix, please let me know:-)

PS I just tried launching psychemedia/audacity2 public Dockerhub image via Docker Cloud, and it seemed to work…


Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers

With its focus on enterprise use, it’s probably with good reason that the Docker folk aren’t that interested in exploring the role that Docker may have to play as a technology that supports the execution of desktop applications, or at least, applications for desktop users. (The lack of significant love for Kitematic seems to be representative of that.)

But I think that’s a shame; because for educational and scientific/research applications, docker can be quite handy as a way of packaging software that ultimately presents itself using a browser based user interface delivered over http, as I’ve demonstrated previously in the context of Jupyter notebooks, OpenRefine, RStudio, R Shiny apps, linked applications and so on.

I’ve also shown how we can use Docker containers to package applications that offer machine services via an http endpoint, such as Apache Tika.

I think this latter use case shows how we can start to imagine things like a “digital humanities application shelf” in a digital library (fragmentary thoughts on this), that allows users to take either an image of the application off the shelf (where an image is a thing that lets you fire up a pristine instance of the application), or a running instance of the application of the shelf. (Furthermore, the application can be run locally, on your own desktop computer, or in the cloud, for example, using something like a mybinder like service). The user can then use the application directly (if it has a browser based UI), or call on it from elsewhere (eg in the case of Apache Tika). Once they’re done, they can keep a copy of whatever files they were working with and destroy their running version of the application. If they need the application again, they can just pull a new copy (of the latest version of the app, or the version they used previously) and fire up a new instance of it.

Another way of using Docker came to mind over the weekend when I saw a video demonstrating the use of the contentmine scientific literature analysis toolset. The contentmine installation instructions are a bit of a fiddle for the uninitiated, so I thought I’d try to pop them into a container. That was easy enough (for a certain definition of easy – it was a faff getting node to work and npm to be found, the Java requirements took a couple of goes, and I;m sure the image is way bigger than it really needs to be…), as the Dockerfile below/in the gist shows.

But the question then was how to access the tools? The tools themselves are commandline apps, so the first thing we want to do is to be able to call into the container to run the command. A handy post by Mike English entitled Distributing Command Line Tools with Docker shows how to do this, so that’s all good then…

The next step is to consider how to retain copies of the files created by the command line apps, or pass files to the apps for processing. If we have a target host directory and mount it into the container as as a shared volume, we can keep the files on our desktop or allow the container to create files into the host directory. Then they’ll be accessible to us all the time, even if we destroy the container.

The gist that should be embedded below shows the Dockerfile and a simple batch file passes the Contentmine tool commands into the container which then executes them. The batch file idea could be further extended to produce a set of command shortcuts that essentially alias the Contentmine commands (eg a ./getpapers command rather than a ./contentmine getpapers command, or that combine the various steps associated with a particular pipeline or workflow – getpapers/norma/cmine, for example – into a single command.

UPDATE: the CenturyLinkLabs DRAY docker pipeline looks interesting in this respect for sequencing a set of docker containers and passing the output of one as the input to the next.

If there are other folk out there looking at using Docker specifically for self-managed “pull your own container” individual desktop/user applications, rather than as a devops solution for deploying services at scale, I’d love to chat…:-)

PS for several other examples of using Docker for desktop apps, including accessing GUI based apps using X WIndows / X11, see Jessie Frazelle’s post Docker Containers on the Desktop.

PPS See also More Docker Doodlings – Accessing GUI Apps Via a Browser from a Container Using Guacamole for an attempt at exposing a GUI based app, such as Audacity, running in a container via a browser. Note that I couldn’t get a shared folder or the audio to work, although the GUI bit did…

PPPS I wondered how easy it would be to run command-line containers from within Jupyter notebook itself running in inside a container, but got stuck. Related question on Stack Overflow here.

The rest of the way this post is published is something of an experiment – everything below the line is pulled in from a gist using the WordPress – embedding gists shortcode…


Accessing a Neo4j Graph Database Server from RStudio and Jupyter R Notebooks Using Docker Containers

In Getting Started With the Neo4j Graph Database – Linking Neo4j and Jupyter SciPy Docker Containers Using Docker Compose I posted a recipe demonstrating how to link a Jupyter notebook container with a neo4j container to provide a quick way to get up an running with neo4j from a Python environment.

It struck me that it should be just as easy to launch an R environment, so here’s a docker-compose.yml file that will do just that:

neo4j:
  image: kbastani/docker-neo4j:latest
  ports:
    - "7474:7474"
    - "1337:1337"
  volumes:
    - /opt/data

rstudio:
  image: rocker/rstudio
  ports:
    - "8787:8787"
  links:
    - neo4j:neo4j
  volumes:
    - ./rstudio:/home/rstudio

jupyterIR:
  image: jupyter/r-notebook
  ports:
    - "8889:8888"
  links:
    - neo4j:neo4j
  volumes:
    - ./notebooks:/home/jovyan/work

If you’re using Kitematic (available via the Docker Toolbox), launch the docker command line interface (Docker CLI), cd into the directory containing the docker-compose.yml file, and run the docker-compose up -d command. This will download the necessary images and fire up the linked containers: one running neo4j, one running RStudio, and one running a Jupyter notebook with an R kernel.

You should then be able to find the URLs/links for RStudio and the notebooks in Kitematic:

Screenshot_12_04_2016_08_59

Once again, Nicole White has some quickstart examples for using R with neo4j, this time using the Rneo4j R package. One thing I noticed with the Jupyter R kernel was that I needed to specify the CRAN mirror when installing the package: install.packages('RNeo4j', repos="http://cran.rstudio.com/")

To connect to the neo4j database, use the domain mapping specified in the Docker Compose file: graph = startGraph("http://neo4j:7474/db/data/")

Here’s an example in RStudio running from the container:

RStudio-neo4j

And the Jupyter notebook:

neo4j_R

Notebooks and RStudio project files are shared into subdirectories of the current directory (from which the docker compose command was run) on host.

Tutum Cloud Becomes Docker Cloud

In several previous posts, I’ve shown how to launch docker containers in the cloud using Tutum Cloud, for example to run OpenRefine, RStudio, or Shiny apps in the cloud. Docker bought Tutum out several months ago, and now it seems that they’re running the tutum environment as Docker Cloud (docs, announcement).

Release_Notes

Tutum – as was – looks like it’s probably on the way out…

Tutum

The onboarding screen confirms that the familiar Tutum features are available in much the same form as before:

Welcome_to_Docker_Cloud

The announcement post suggests that you can get a free node and one free repository, but whilst my Docker hub images seemed to have appeared, I couldn’t see how to get a free node. (Maybe I was too quick linking my Digital Ocean account?)

As with Tutum, (because it is, was, tutum!), Docker Cloud provides you with the opportunity to spin up one or more servers on a linked cloud host and run containers on those servers, either individually or linked together as part of a “stack” (essentially, a Docker Compose container composition). You also get an image repository within Docker Cloud – mine looks as if it’s linked to my DockerHub repository:

Repository_dashboard___Docker_Cloud_and_various

A nice feature of this is that you can 1-click start a container from your image repository if you have a server node running.

The Docker Cloud service currently provides a rebranding of the Tutum offering, so it’ll be interesting to see if product features continue to be developed. One thing I keep thinking might be interesting is a personal MyBinder style service that simplifies the deploy to Tutum (as was) service and allows users to use linked hosts and persistent logins to 1-click launch container services with persistent state (for example, Launch Docker Container Compositions via Tutum and Stackfiles.io – But What About Container Stashing?). I guess this would mean linking in some cheap storage somehow, rather than having to keep server nodes up to persist container state? By the by, C. Titus Brown has some interesting reflections on MyBinder here: Is mybinder 95% of the way to next-gen computational science publishing, or only 90%?

If nothing else, signing up to Docker Cloud does give you a $20 Digital Ocean credit voucher that can also be applied to pre-exisiting Digital Ocean accounts:-) (If you haven’t already signed up for Digital Ocean but want to give it a spin, this affiliate link should also get you $10 free credit.)

PS as is the way of these things, I currently run docker containers in the cloud on my own tab (or credit vouchers) rather than institutional servers, because – you know – corporate IT. So I was interested to see that Docker have also recently launched a Docker DataCenter service (docs), and the associated promise of “containers-as-a-service” (CaaS), that makes it easy to offer cloud-based container deployment infrastructure. Just sayin’…;-)

PPS So when are we going to get Docker Cloud integration in Kitematic, or a Kitematic client for Docker Cloud?