Accessing a Legacy Windows Application Running Under Wine On A Containerised, RDP Enabled Desktop In a Browser Via A Guacamole Server Running in a Docker Container

I finally got round to finding, and fiddling with, an Apache Guacamole container that I could actually make sense of and it seems to work, with audio, when connecting to my demo RobotLab/Wine RDP desktop.

The container I tried is based on the Github repo oznu/docker-guacamole.

The container is started with:

mkdir guac_config
docker run -p 8080:8080 -v guac_config:/config oznu/guacamole

Login with user name and password guacadmin.

I then launched a RobotLab container that is running an RDP server:

docker run --name tm129 --hostname tm129demo --shm-size 1g -p 3391:3389 -d ousefulcoursecontainers/tm129rdp

Inside Guacamole, we need to create a new connection profile. From the admin drop down menu, select Settings, click on the Connections tab and create a New Connection:

Given the connection a name and specify the protocol as RDP:

The connection settings require the IP address and port noumber that the connection is to be made on. The port mapping was specified when we started the RobotLab container (3391) but what’s the network address? If we try to claim “localhost” in the Guacamole container, that refers the container’s localhost, not localhost on host. On a Mac, we can pick up the host IP address from the Network panel in the System Preferences:

Enter the appropriate connection parameters and save them:

From the admin menu, select Home. From the home screen you should be able to select the new connection…

When the connection is opened, I was presented with a warning dialogue:

but clicking OK cleared it okay…

Then I could enter the RobotLab RDP connection details (username and password are both ubuntu):

and I was in to the desktop.

The application files can be found within the File System in the /opt directory.

As mentioned previously, the base container needs some fettling… When you first run the RobotLab or Neural applications, Wine wants to do some updates (which requires a network connection). If I could figure out how to create users in the base image, rather than user creation occurring as part of the entrypoint, following the recipe here.

Although it’s a little bit ropey, the Guacamole desktop does play out audio.

RobotLab has three instructions for playing audio: sound, send and tone. The sound and send commands play an audio file, and this works, sort of (the spoken works played using the send command are, erm, very robotic!). The tone command doesn’t work, but I’ve seen in docs that this was an outstanding issue for some versions of Windows, so maybe it doesn’t work properly under Wine anyway…

Finally, I note that if you leave the remote desktop running, a screensaver kicks in…

Although the audio support isn’t brilliant (maybe there are “settings” in the container config that can improve it?) the support is more or less good enough, as is, for audio feedback / beeps etc. And just about good enough for the RobotLab activities.

What this means is that now I do have a route for running RobotLab, via a browser, with sort of desktop support.

One other thing to note relates to the network addressing. If I start the Guacamole and RobotLab containers together via a docker-compose.yml file, I’m guessing I should be able to define a Docker Compose network to connect them and use that as the network address/alias name in the Guacamole connection setting?

But I’m creatively drained atm and can’t face trying to get anything else working today…

PS another remote desktop protocol, SPICE, which I found via mentions in OpenStack docs…: [t]he SPICE project aims to provide a complete open source solution for remote access to virtual machines in a seamless way so you can play videos, record audio, share usb devices and share folders without complications [docs and a simple looking howto]. Not sure how deprecated / live this is?

Viewing Dockerised Desktops via an X11 Bridge, novnc and RDP, Sort of…

So… the story so far…

As regular readers of this blog will know, I happen to be of the opinion that we should package more of OUr software using Docker containers, for a couple of reasons (at least):

  • we control the software environment, including all required dependencies, and avoiding any conflicts with preinstalled software;
  • the same image can be used to launch containers locally or remotely.

I also happen to believe that we should deliver all UIs through a browser. Taken together with the containerised services, this means that students just need a browser to run course related software. Which could be on their phone, for all I care.

I keep umming and ahhing about electron apps. If the apps that are electron wrapped are also packaged so that they can also be run as a service and accessed via a browser too, that’s fine, I guess…

There are some cases in which this won’t work. For example, not all applications we may want to distribute come with an HTML UI, but instead may be native applications (which is an issue becuase we are supposed to be platform independent), or cross platform applications that use native widgets (for example, Java apps, or electron apps).

One way round this is to run a desktop application in container and then expose its UI using X11, (aka the X Window System), although this looks like it may be on the way out in favour of other windowing alternatives, such as Wayland… See also Chrome OS Is Working To Remove The Last Of Its X11 Dependencies. (I am so out of my depth here!)

Although X11 does provide a way of rendering windows created on a remote (or containerised guest) system using native windows on your own desktop, a downside is that it requires X11 support on your own machine; and I haven’t found a cross-platform one that looks to be a popular de facto standard.

Another approach is to use VNC, in which the remote (or guest) system sends a compressed rendered version of the desktop back to your machine, which then renders it. (See X11 on Raspberry Pi – remote login from your laptop for a discussion of some of the similarities and differences between X11 and VNC.)

Note to self – one of the issues I’ve had with VNC is the low screen resolution of the rendered desktop… but is that just because I used a default low resolution in the remote VNC server? Another issue I’ve had in the past with novnc, a VNC client that renders desktops using HTML via a browser window, relates to video and audio support… Video is okay, but VNC doesn’t do audio?

Earlier today, I came across x11docker, that claims to run GUI applications and desktops in docker (though on Windows and Linux desktops only. The idea is that you “just type x11docker IMAGENAME [COMMAND]” to launch a container and an X11 connection is made that allows the application to be rendered in a native X11 window. You can find a recipe for doing something similar on a Mac here: Running GUI’s with Docker on Mac OS X.

But that all seems a little fiddly, not least because of a dependency on an X11 client which might need to be separately installed. However, it seems that we can use another Docker container — JAremko/docker-x11-bridge — running xpra (“an open-source multi-platform persistent remote display server and client for forwarding applications and desktop screens”) as bridge that can connect to an X11 serving docker container and render the desktop in a browser.

For example, Jess Frazelle’s collection of Dockerfiles containerise all manner of desktop applications (though I couldn’t get them all to work over X11; maybe I wasn’t starting the containers correctly?). I can get them running, in my browser, by starting the bridge:

docker run -d \
 --name x11-bridge \
 -e MODE="tcp" \
 -e XPRA_HTML="yes" \
 -e DISPLAY=:14 \
 -p 10000:10000 \
 jare/x11-bridge

and then firing up a couple of applications:

docker run -d --rm  \
  --name firefox \
  --volumes-from x11-bridge \
  -e DISPLAY=:14 \
  jess/firefox

docker run -d --rm  \
  --name gimp \
  --volumes-from x11-bridge \
  -e DISPLAY=:14 \
  jess/gimp

#Housekeeping
#docker kill gimp firefox
#docker rm gimp firefox
#docker rmi jess/gimp jess/firefox

Another approach is to use VNC within a container, an approach I’ve used with this DIT4C Inspired RobotLab Container/ (The DIT4C container is quite old now; perhaps there’s something more recent I should use? In particular, audio support was lacking.)

It’s been a while since I had a look around for good examples of novnc containers, but this Collection of Docker images with headless VNC environments could be a useful start:

Desktop environment Xfce4 or IceWM
VNC-Server (default VNC port 5901)
noVNC – HTML5 VNC client (default http port 6901)

The containers also allow screen resolution and colour depth to be set via environment variables. The demo seems to work (without audio) using novnc in a browser, and I can connect using TigerVNC to the VNC port, though again, without audio support.

Audio is a pain. On a Linux machine, you can mount an audio device when you start a novnc container (eg fcwu/docker-ubuntu-vnc-desktop) but I’m not sure if that works on a Mac? Or how it’d work on Windows?) That said, a few years ago I did find a recipe for getting audio out of a remote container that did seem to work — More Docker Doodlings – Accessing GUI Apps Via a Browser from a Container Using Guacamole — although it seems to be broken now (did the container format change in that period I wonder?). Is there a more recent (and robust) variant of this out there somewhere, I wonder?

Hmm… here’s another approach: using a remote desktop client. Microsoft produce RDP (Remote Desktop Protocol) clients for different platforms so that might provide a useful starting point.

This repo — danielguerra69/firefox-rdp — builds on danielguerra69/dockergui (fork of this) and shows how to create a container running Firefox that can be accessed via RDP. If I run it:

docker run --rm -d --shm-size 1g -p 3389:3389 --name firefox danielguerra/firefox-rdp

I can create a connection using the Microsoft remote desktop client at the address localhost:3389, login with my Mac credentials, and then use the application. Testing on Youtube shows that video and audio work too. So that’s promising…

(Docker housekeeping: docker kill firefox; docker rm firefox; docker rmi danielguerra/firefox-rdp.)

Hmmm… so maybe now we’re getting somewhere more recent. Eg danielguerra69/ubuntu-xrdp although this doesn’t render the desktop properly for me, and danielguerra69/alpine-xfce4-xrdp doesn’t play out the audio? Ah, well… I’ve wasted enough time on this for today…

On Not Faffing Around With Jupyter Notebook Docker Container Auth Tokens

Mark this post as deprecated… There already exists an easy way of setting the token when starting one of the Jupyter notebook Docker containers: -e JUPYTER_TOKEN="easy; it's already there". In fact, things are even easier if you export JUPYTER_TOKEN='easy' in the local environment, and then start the container with docker run --rm -d --name democontainer -p 9999:8888 -e JUPYTER_TOKEN jupyter/base-notebook (which is equivalent to -e JUPYTER_TOKEN=$JUPYTER_TOKEN). You can then autolaunch into the notebook with open "http://localhost:9999?token=${JUPYTER_TOKEN}". H/t @minrk for that…

[UPDATE: an exercise in reinventing the wheel… This is why I should really do something else with my life…]

I know they’re there for good reason, but starting the official Jupyter containers requires that you enter a token created when you launch the container, which means you need to check the docker logs…

In terms of usability, this is a bit of a faff. For example, the example URL is not necessarily the correct one (it specifies the port the notebook is running on inside the container rather than the exposed port you have mapped it to.

If you start the container with a -d flag, you don’t see the token (something that looks like the token is printed out but it’s not the token, it’s docker created…). However, you can see the log stream containing the token using Kitematic.

If you go directly to the notebook page without the token argument, you’ll need to login with it, or with a default password (which is not set in the official Jupyter Docker images).

To provide continued authenticated access, you also have the opportunity at the bottom of that screen to swap the token for a new password (this is via the c.NotebookApp.allow_password_change setting which by default is set to True):

I think the difference between default token and password is that in the config file, if you specify a token via the c.NotebookApp.token argument, you do so in plain text, whereas the c.NotebookApp.password  setting takes an MD5 hashed value. If you set c.NotebookApp.token='', you can get in without a token. For a full set of config settings, see the Jupyter notebook config file and command line options.

So, can we balance the need for a small amount security without going to the extreme of disabling auth altogether?

Here’s a Dockerfile I’ve just popped together that allows you to build a variant of the official containers with support for tokenless or predefined token access:

#Dockerfile
FROM jupyter/minimal-notebook

#Configure container to support easier access
ARG TOKEN=-1
RUN mkdir -p $HOME/.jupyter/
RUN if [ $TOKEN!=-1 ]; then echo "c.NotebookApp.token='$TOKEN'" >> $HOME/.jupyter/jupyter_notebook_config.py; fi

We can then build variations on a theme as follows by running the following build commands in the same directory as the Dockerfile:

# Automatically generated token (default behaviour)
docker build -t psychemedia/quicknotebook .

# Tokenless access (no auth)
docker build -t psychemedia/quicknotebook --build-arg TOKEN='' .

# Specified one time token (set your own plain text one time token)
docker build -t psychemedia/quicknotebook --build-arg TOKEN='letmein' .

And some more handy administrative commands, just for the record:

#Run the container
docker run --rm -d -p 8899:8888 --name quicknotebook psychemedia/quicknotebook
##Or:
docker run --rm -d --expose 8888 --name quicknotebook psychemedia/quicknotebook

#Stop the container
docker kill quicknotebook

#Tidy up after running if you didn't --rm
docker rm quicknotebook

#Push container to Docker hub (must be logged in)
docker push psychemedia/quicknotebook

I’m also starting to wonder whether there’s an easy way of using Docker ENV vars (passed in the docker run command via a -e MYVAR='myval' pattern) to allow containers to be started up with a particular token, not just created with specified tokens at build time? That would take some messing around with the container start command though…

There’s a handy guide to Dockerfile ARG and ENV vars here: Docker ARG vs ENV.

Hmm… looking at the start.sh script that runs as part of the base notebook start CMD, it looks like there’s a /usr/local/bin/start-notebook.d/ directory that can contain files that are executed prior to the notebook server starting…

So we can presumably just hack that to take an environment variable?

So let’s extend the Dockerfile:

ENV TOKEN=$TOKEN
USER root
RUN mkdir -p /usr/local/bin/start-notebook.d/
RUN echo  "if [ \$TOKEN!=-1 ]; then echo \"c.NotebookApp.token='\$TOKEN'\" >> $HOME/.jupyter/jupyter_notebook_config.py; fi" >> /usr/local/bin/start-notebook.d/tokeneffort.sh
RUN chmod +x /usr/local/bin/start-notebook.d/tokeneffort.sh
USER $NB_USER

Now we should also be able to set a one time token when we run the container:

docker run -d -p 8899:8888 --name quicknotebook -e TOKEN='letmeout' psychemedia/quicknotebook

Useful? [Not really, completely pointless; passing the token as an environment variable is already supported (which raises the question; how come I’ve kept missing this trick?!) At best, it was a refresher in the use of Dockerfile ARG and ENV vars.]

Docker Housekeeping _ Removing Old Images and Containers

Some handy commands for tidying up old Docker images and containers…

Remove Named Images


docker rmi `docker images --filter 'dangling=true' -q --no-trunc`

Remove Images With a Particular Name Pattern

Ish via here.

For example, removing repo2docker default named containers which start r2d:


docker images | awk '{ print $1,$2, $3 }' | grep r2d | awk '{print $3 }' | xargs -I {} docker rmi {}

For added aggression, use rmi -f {}.

Remove Containers With a Particular Name Pattern

Via here.


docker ps -a | awk '{ print $1,$2 }' | grep r2d | awk '{print $1 }' | xargs -I {} docker rm {}

Remove Exited Containers


docker ps -a | grep Exit | cut -d ' ' -f 1 | xargs docker rm

PS lots of handy tips here: https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes

BYOA (Bring Your Own Application) – Running Containerised Applications on the Desktop

Imagine this: you’re on a Mac, though at times you’re on a Windows box, and at other times you’ve just got a browser to hand. You’re happy using applications that get stuff done, even if it means using different tools for different purposes, but you get fed up having to find applications that you can install across all the different platforms you work on. Not least because sometimes the things don’t seem to want to install with all sorts of other stuff being installed first.

Or maybe this: you want to run some data analysis using Python, or R, in Jupyter notebooks, or RStudio, and the dataset you want to work with is quite large; which means you also need to hook up to a database, or a query engine, and get the data into a form that the database or query engine can cope with, as well as hooking up your analysis environment (RStudio, or Jupyter notebooks) to what is now a queryable datasource.

So what do you do?

Here’s what I do: I use a technology that’s been developed for another purpose, and I make it personal.

The technology is – are – Docker containers. Docker is a brand, but in part it’s the same way that Hoover, and more recently, Dyson, are brands. Say: Hoover, or Dyson, and you quite possibly mean: vacuum cleaner. Say “Docker”, and you quite possibly mean lightweight virtual machine using Linux Containers, or Open Container Initiative (OCI) containers. But Docker is easier to say/remember. And it also refers to a complete technology stack – and toolset – for working with those containers. (Read more….)

(You can skip this paragraph – it’s not how I use Docker at all…) Docker is most widely used by the people who develop web applications, and who keep web applications running at scale. It’s a devops tech (devops, short for development/operations). The devs hack the code, the ops folk get that code running for users. Traditionally, these were separate teams; now, there is some overlap, particularly if you’re running on agile and continuous integration and/or continuous deployment are your thing. Docker, in short, is something your more curious IT folk may know something about. Or not.

But I don’t care about that. Not least, because they probably don’t think about Docker in the way I do…

What I care about are some of the features that Docker has, and how I can use those features to make my own life easier, irrespective of how the folk who develop Docker, or the folk in IT, think it should be used; which is probably more along the lines of helping enterprise IT deliver enterprise across the whole organisation, rather than supporting personal, DIY, BYOA (“bring your own app”) IT that works at an individual level in the form of end-user applications, or personal digital workbenches (I’ll come on to those in a moment…)

Take the first example mentioned at the start – wanting to run the same application pretty much anywhere, without the installation grief. For example, suppose you wanted to run the OpenRefine, a rather handy browser based tool for cleaning and filtering datasets. Installing OpenRefine requires that you download and install an appropriate version of Java, as well as downloading and installing OpenRefine. What do you do?

Here’s what I do: I use Docker (which yes, does require me to download and install Docker. But I’ll pick up on that later…) And I use Kitematic (which, admittedly, I have to install from the Docker toolbar menu; note I could also launch containers from the command line…).

But first a bit of jargon…

Docker Images versus Docker Containers

One way off thinking of a docker image is as an installer file. It’s the thing you download that you can then use to install your application. In the case of docker, the image file identifies the operating system requirements of the application, as well as the application code itself. The docker application uses this information to launch a docker container, which is an instance of your application. Note that you can run multiple instances of the same image at the same time (that is, multiple containers launched from the same image). This is different to the model of having multiple documents open in the same application, such as multiple spreadsheets or word documents open in Microsoft Excel or Word. Instead, it would be akin to running several different instances of Excel or Word, with the documents open in one copy of Word isolated from and invisible to another running copy of Word.

The containers themselves can also be running or stopped; a stopped container is one that is essentially hibernating, and whose operation can be resumed. Any changes made to the contents of a container are retained if the container is stopped and restarted. Containers can also be destroyed, which is like uninstalling the application and removing all its install files. When a new container is launched, it’s a bit like reinstalling and starting a fresh installation of the application from the original installer (the docker image).

Kitematic

Kitematic is a desktop application (for Mac and Windows) that looks a bit like an ap store.

Using Kitematic, I can search for an OpenRefine image – in this case, ou-tm351-openrefine-test, which happens to be an image I created myself – that has been published via the Dockerhub public docker image registry.

I create the container – which downloads a Docker image that contains a self-contained operating system and all the files that OpenRefine needs to run as well as the application itself – and starts it running.

I copy the URL – in this case localhost:32771, and paste it into my browser location bar:

You may notice the Kitematic page also had a folder highlighted – this is a folder shared between the container and my desktop. Project files will be save to that folder. When I have finished with the application, I can stop it, and then restart it later. Or I can destroy it. This kills the container and removes all its files, but the base image is still on my computer. If I want to run OpenRefine again, I won’t need to download any files. But – and this is important – the application container that is started up will be completely fresh, run as-if for the the first time from the image I downloaded. Like a reinstalled application. Which is effectively what it is. But the files stored on my desktop will be shared back with it.

Stopping/starting or restarting the container is like switching it off and on again. Deleting the container in Kitematic, then searching for it again and creating a new container uses the previously downloaded files and essentially reinstalls the application.

If I want to run another application, I can  – there are lots of images available. For example, if you want to run a local version of the Dillinger markdown editor, you can find the original of that:

Note, however, that here the lack of love Kitematic has received in the last few years – I imagine because Docker are more interested in enterprise use of Docker, not personal use – because if you just try to create the container, it won’t work (you’ll get an error message). Instead, you need to specify the version of the editor you want to use explicitly. Click the dots and in order to select a tag (that is, a version):

Choose the tag/version, and close the selection dialogues – then click on the Create button:

 

This time, the homescreen is a little more helpful, providing a link that will launch the app in the browser. You may also get a preview of the screen (sometimes this doesn’t appear; click the Settings tab, then the Home tab, and it may refresh. As I mentioned above, Kitematic doesn’t seem to have had a lot of love since Docker bought it…)

Click the link button, and you should be redirected to the app running in your browser:

So that’s part of the story…

Viewing Application UIs and Launching Applications from Shortcuts

As well as using Kitematic to launch containerised applications, we can also use desktop shortcuts. I mentioned how we can use docker commands on the command like to launch an application – what Kitematic does is simply generate and execute those commands, and then display status and configuration information about the running container – so we can just pop equivalent commands into a shortcut, click on the shortcut and launch the application.

This is what Jessie Frazelle does with her Docker Containers on the [Mac] Desktop. The corresponding desktop-container library shelf is here: jessfraz/dockerfiles and associated shortcut definitions here: desktop-container commands. The applications are packaged to use X11 to expose the graphical user interface, so if you want to run RStudio, or Libre Office or Audacity in a desktop-container on a Mac, that could be a great place to start. Bits of guidance on getting up and running with X11 applications on OS/X can be found here: Docker for Mac and GUI applications and here: Bring Linux apps to the Mac Desktop with Docker, although this recipe seems most complete: Containerize GUI Applications on Mac.

Note: in testing, I had to install XQuartz (brew install xquartz), enabling the Security tab preference to Allow connections from network client, and socat (brew install socat), reboot, and then run socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CLIENT:\"$DISPLAY\" (which is blocking) in a terminal on its own; in another terminal, I found the local IP address as ip=$(ifconfig en0 | grep inet | awk '$1=="inet" {print $2}') and could then run a docker command of the form docker run --name firefox -e DISPLAY=$ip:0 -v /tmp/.X11-unix:/tmp/.X11-unix jess/firefox which starts up XQuartz if it’s not already running; note also that on occasion I kept getting situations with the error socat[1739] E bind(5, {LEN=0 AF=2 0.0.0.0:6000}, 16): Address already in use; the netstat command shows port 60000 waiting, but after a few minutes there’s a time out and the port is freed up; closing the docker app window also leaves the old, named container around, which needs removing with something of the form docker rm firefox before the app can be run again. Note that there is no --device /dev/snd mountpoint for docker in OS/X so no easy way to pass through audio? I’m not sure how well this would work on a Windows machine, but presumably there are X11 clients for Windows that aren’t too painful to install? I’m not sure what socat equivalent would be if the client was only listening privately and needed a tunnel from the open X11 port 6000 to the socket where the client listens for connections? (Nothing needed with xming e.g. as suggested here?

(Hmmm, thinks: a version of something like Kitematic that also bundles an X11 client could be really handy? )

Whereas Jessie Frazelle’s desktop-containers make use of the X11 windowing system to all her host operating system to display the UI of the applications running inside a container, I prefer to use applications that expose their UI via http using a browser. Where this is not possible – as for example in the case of the Audacity audio editor, it is often possible to run the application within the container and then render the desktop running in the container via a browser window using an HTML remote desktop viewer such as Apache Guacamole. Here’s an old proof of concept: Accessing GUI Apps Via a Browser from a Container Using Guacamole.

Creating a Docker Library Shelf

Another part of the story is that you can build your own images and either share them publicly via the Dockerhub registry, keep them locally on your own computer, post them to a private Dockerhub repository (you  get a single private repository as part of the Dockerhub free plan, or can pay for more…), or run your own image registry.

Using this latter option – running your own Docker registry – means you can essentially run your own digital application shelf, although I don’t think you’ll be able to use Kitematic with it without forking your own version of it to use your registry (issue)? Instead, you have to use the command line. Again, the lack of support for “personal users” of Docker is an issue, but that’s another reason to try to blog more recipes and find more workarounds, and maybe get the occasional feature request adopted if we can define a use case well enough…

Alternatively, you can just make the shelf available as a set of Dockerfiles and shortcuts, as Jessie Frazelle has done with her jessfraz/dockerfiles Github repo.

There are a couple more posts to come in this series: one looking at how we can run Docker containers on a remote host (“in the cloud”); another looking at how we can link multiple applications to provide our own linked application, digital workbenches, within which linked applications can easily share files and connections between themselves.

Rolling Your Own Jupyter and RStudio Data Analysis Environment Around Apache Drill Using docker-compose

I had a bit of a play last night trying to hook a Jupyter notebook container up to an Apache Drill container using docker-compose. The idea was to have a shared data volume between the two of them, but I couldn’t for the life of me get that to work using the the docker-compose version 2 or 3 (services/volumes) syntax – for some reason, any of the Apache Drill containers I tried wouldn’t fire up properly.

So I eventually (3am…:-( went for a simpler approach, synching data through a local directory on host.

The result is something that looks like this:

The Apache Drill container, and an Apache Zookeeper container to keep it in check, I found via Dockerhub. I also reused an official RStudio container. The Jupyter container is one I rolled for TM351.

The Jupyter and RStudio containers can both talk to the Apache Drill container, and both analysis apps have access to their own data folder mounted in an application folder in the current directory on host.The data folders mount into separate directories in the Apache Drill container. Both applications can query into data files contained in either data directory as viewable from Apache Drill.

This is far from ideal, but it works. (The structure is as suggested so that RStudio and Jupyter scripts can both be used to download data into a data directory viewable from the Apache Drill container. Another approach would be to mount a separate ./data directory and provide some means for populating it with data files. Alternatively, if the files already exist on host,  mounting the host data directory onto a /data volume in the Apache Drill container would work too.

Here’s the docker-compose.yaml file I’ve ended up with:

drill:
  image: dialonce/drill
  ports:
    - 8047:8047
  links:
    - zookeeper
  volumes:
    -  ./notebooks/data:/nbdata
    -  ./R/data:/rdata

zookeeper:
  image: jplock/zookeeper

notebook:
  container_name: notebook-apache-drill-test
  image: psychemedia/ou-tm351-jupyter-custom-pystack-test
  ports:
    - 35200:8888
  volumes:
    - ./notebooks:/notebooks/
  links:
    - drill:drill

rstudio:
  container_name: rstudio-apache-drill-test
  image: rocker/tidyverse
  environment:
    - PASSWORD=letmein
  #default user is: rstudio
  volumes:
    - ./R:/home/rstudio
  ports:
    - 8787:8787
  links:
    - drill:drill

If you have docker installed and running, running docker-compose up -d in the folder containing the docker-compose.yaml file will launch three linked containers: Jupyter notebook on localhost port 35200, RStudio on port 8787, and Apache Drill on port 8047. If the ./notebooks, ./notebooks/data, ./R and ./R/data subfolders don’t exist they will be created.

We can use the clients to variously download data files and run Apache Drill queries against them. In Jupyter notebooks, I used the pydrill package to connect. Note the hostname used is the linked container name (in this case, drill).

If we download data to the ./notebooks/data folder which is mounted inside the Apache Drill container as /nbdata, we can query against it.

(Note – it probably would make sense to used a modified Apache Drill container configured to use CSV headers, as per Querying Large CSV Files With Apache Drill.)

We can also query against that same data file from the RStudio container. In this case I used the DrillR package (I had hoped to use the sergeant package (“drill sergeant”, I assume?! Sigh..;-) but it uses the RJDBC package which expects to find java installed, rather than DBI, and java isn’t installed in the rocker/tidyverse container I used.) UPDATE: sergeant now works without Java dependency... Thanks, Bob:-)

I’m not sure if DrillR is being actively developed, but it would be handy if it could return the data from the query as a dataframe.

So , getting up and running with Apache Drill and a data analysis environment is not that hard at all, if you have docker installed:-)

PS 8-9/18 – Seems like sergeant has moved on somewhat (Updates to the sergeant (Apache Drill connector) Package & a look at Apache Drill 1.14.0 release) and now lets you make calls from R into Apache Drill running in a disposably launched container: Driving Drill Dynamically with Docker and Updating Storage Configurations On-the-fly with sergeant. See also this Using Apache Drill with R cookbook.

PPS Jupyter drill magic: https://github.com/JohnOmernik/jupyter_drill Seems like I tried to Binderise it here: https://github.com/ouseful-PR/jupyter_drill/tree/binderise

Querying Panama Papers Neo4j Database Container From a Linked Jupyter Notebook Container

A few weeks ago I posted some quick doodles showing, on the one hand, how to get the Panama Papers data into a simple SQLite database and in another how to link a neo4j graph database to a Jupyter notebook server using Docker Compose.

As the original Panama Papers investigation used neo4j as its backend database, I thought putting the data into a neo4j container could give me the excuse I needed to start looking at neo4j.

Anyway, it seems as if someone has already pushed a neo4j Docker container image preseeded with the Panama Papers data, so here’s my quickstart.

To use it, you need to have Docker installed, download the docker-compose.yaml file and then run:

docker-compose up

If you do this from a command line launched from Kitematic, Kitematic should provide you with a link to the neo4j database, running on the Docker IP address and port 7474. Log in with the default credentials ( neo4j / neo4j ) and change the password to panamapapers (all lower case).

Download the quickstart notebook into the newly created notebooks directory, and you should be able to see it from the notebooks homepage on Docker IP address port 8890 (or again, just follow the link from Kitematic).

I’m still trying to find my way around both the py2neo Python wrapper and the neo4j Cypher query language, so the demo thus far is not that inspiring!

And I’m not sure when I’ll get a chance to look at it again…:-(