Quandary… To Pandoc or Not? (Yet…)

Whilst listening in, via Skype, on the School meeting yesterday, I treated it as radio and also started tinkering with an XSLT converter for transforming OU-XML to something I can get into a Jupyter notebook form. (If anyone can point me to official OU XSLT transformers for OU-XML, that’d be really useful…)

I’m 15 years out of XSLT, so I’ve started with an easy converter into HTML that is most of the way there now for common OU-XML elements, as well as one that will convert into a markdown format supported by Jupytext, which would allow me to go OU-XML-md-ipynb. I also wonder if an OU-XML-ipynb (JSON) route might be a useful exercise.

But then I started wondering… would it make more sense to try to get it into Pandoc? Pandoc recently announced Jupyter notebook/ipynb support as a native converter, so what are the routes in and out of pandoc?

Poking around, it seems that pandoc represents things internally using its own AST (abstract syntax tree). Pandoc filters let you write your own output filters for converting documents represented using the AST in whatever format you want. There are a couple of Python packages that support writing pandoc output filters: pandocfilters, which includes examples, and panflute (docs), which has a separate examples repo; there’s also this handy overview of Technical Writing with Pandoc and Panflute.

So that’s the first question: can I write a filter to generate a valid OU-XML document? OU-XML probably has some structural elements that are not matched by pandoc AST elements, but can these be encoded somehow as extensions to the AST, or represented as text elements in documents produced by pandoc that could be post-processed into OU-XML elements?
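As a rough feel for what an output filter of this sort has to do, here is a minimal sketch that walks a hand-written fragment of pandoc's JSON AST and emits XML. It uses only the Python standard library rather than pandocfilters or panflute, and the element names (Section, Title, Paragraph) are assumptions made for illustration, not a checked reading of the OU-XML schema; the API version number is also just illustrative.

```python
import xml.etree.ElementTree as ET

# A tiny document in pandoc's JSON AST serialisation, written by hand.
# The API version number here is illustrative only.
doc = {
    "pandoc-api-version": [1, 17, 5, 4],
    "meta": {},
    "blocks": [
        {"t": "Header",
         "c": [1, ["intro", [], []], [{"t": "Str", "c": "Introduction"}]]},
        {"t": "Para",
         "c": [{"t": "Str", "c": "Hello"}, {"t": "Space"},
               {"t": "Str", "c": "world."}]},
    ],
}


def inlines_to_text(inlines):
    """Flatten Str and Space inlines to plain text; other inlines are skipped."""
    parts = []
    for el in inlines:
        if el["t"] == "Str":
            parts.append(el["c"])
        elif el["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def ast_to_xml(doc):
    """Map Header and Para blocks onto assumed (hypothetical) XML element names."""
    root = ET.Element("Section")
    for block in doc["blocks"]:
        if block["t"] == "Header":
            _level, _attr, inlines = block["c"]
            ET.SubElement(root, "Title").text = inlines_to_text(inlines)
        elif block["t"] == "Para":
            ET.SubElement(root, "Paragraph").text = inlines_to_text(block["c"])
    return root


xml_text = ET.tostring(ast_to_xml(doc), encoding="unicode")
print(xml_text)
```

Anything in the AST that has no handler here is silently dropped, which is exactly where the question of unmatched structural elements would start to bite.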

Going the other way, it seems that pandoc can ingest a JSON format that serialises the Pandoc AST structure, so if I can convert OU-XML into that then it would make life a lot easier for generating a wide range of output document formats from OU-XML.

The AST is documented here; we can also output documents in the serialised AST/json by using the json output format which could provide a useful crib…
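Going in the other direction, a sketch of generating the serialised AST from XML might look something like the following. Again, the XML element names are hypothetical stand-ins rather than real OU-XML, only two block types are handled, and the pandoc-api-version value is an assumption: it would need to match whatever the installed pandoc reports in its own JSON output.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; the element names are illustrative only.
SAMPLE = ("<Section><Title>Intro</Title>"
          "<Paragraph>Hello pandoc world</Paragraph></Section>")


def text_to_inlines(text):
    """Split text on whitespace into pandoc Str inlines separated by Space."""
    inlines = []
    for i, word in enumerate(text.split()):
        if i:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return inlines


def xml_to_ast(xml_text, api_version=(1, 17, 5, 4)):
    """Build a pandoc-style JSON AST dict from the assumed XML structure."""
    root = ET.fromstring(xml_text)
    blocks = []
    for child in root:
        inlines = text_to_inlines(child.text or "")
        if child.tag == "Title":
            # Header content: [level, [identifier, classes, key-values], inlines]
            blocks.append({"t": "Header", "c": [1, ["", [], []], inlines]})
        elif child.tag == "Paragraph":
            blocks.append({"t": "Para", "c": inlines})
    return {"pandoc-api-version": list(api_version), "meta": {}, "blocks": blocks}


ast_doc = xml_to_ast(SAMPLE)
print(json.dumps(ast_doc))
```

If the version number lines up, the printed JSON should be something that can be piped into pandoc using the json input format and written back out in any of the supported output formats.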

So here’s the quandary… do I spend the rest of the morning finishing off my hack XSLT converter, or do I switch track and try to go down the pandoc route? Hmmm… maybe I should finish what I started: it’ll give me a bit more XSLT practice and should result in enough of an approximation of OU-XML content in notebooks that we can start to see whether that sort of conversion even makes sense.

MyBinder Launches From Any git Repository: Github, Gists, GitLab, Bitbucket etc

By default, MyBinder looks to repositories on Github for its builds, but it can also build from Github gists, GitLab.com repositories, and, well, any git repository with a networked endpoint, it seems:

What prompted me to this was looking for a way to launch a MyBinder container from Bitbucket. (For the archaeologists, there are various issues and PRs, such as here, and here, as well as this recent forum post, How to use bitbucket repositories on mybinder.org, that trace some of the history…)

So what’s the trick?

For now, you need to get hold of the URL to a particular Bitbucket repo commit. For example, to try running this repo you need to go to the Commits page and grab the URL for the most recent master commit (or whichever one you want), which will contain the commit hash:

For example, something like https://bitbucket.org/ueacomputervision/image-labelling-tool/commits/f3ddb33e4839f8a0fe73c168993b405adc13daf0 gives the commit hash f3ddb33e4839f8a0fe73c168993b405adc13daf0.

For the repo base URL https://bitbucket.org/ueacomputervision/image-labelling-tool, the MyBinder launch link then takes on the form:


which is to say:


But it does look like things may get easier in the near future…
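Pulling the repo URL and commit hash out of the commit page URL and assembling a launch link can be sketched as follows. The /v2/git/ path with a URL-encoded repository address followed by the commit hash is my understanding of how BinderHub addresses arbitrary git repositories, and should be treated as an assumption here; the function name is made up for the example.

```python
from urllib.parse import quote


def mybinder_git_url(repo_url, commit, hub="https://mybinder.org"):
    """Build a launch link for an arbitrary git repo (assumed /v2/git/ scheme)."""
    # The repo URL is percent-encoded in full, including the slashes.
    return f"{hub}/v2/git/{quote(repo_url, safe='')}/{commit}"


commit_page = (
    "https://bitbucket.org/ueacomputervision/image-labelling-tool"
    "/commits/f3ddb33e4839f8a0fe73c168993b405adc13daf0"
)

# Split the Bitbucket commit page URL into repo base URL and commit hash.
repo_url, commit = commit_page.split("/commits/")
launch_url = mybinder_git_url(repo_url, commit)
print(launch_url)
```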

Feeding a MyBinder Container Built From One Github Repository With the Contents of Another

Long time readers should be more than well aware by now of MyBinder, the Jupyter project service that will build a Docker image from the contents of a git repository and then launch a container based on that image so you can work with a live, running, albeit temporary, instance of it.

But that’s not all it can do…

Via Chris Holdgraf on the Jupyter discourse community site (Tip: embed custom github content in a Binder link with nbgitpuller), comes a magical trick whereby you can launch a MyBinder instance built from one repository and populate it with files from another.

Why’s this useful? Well, if you’ve had a play with your own repos using MyBinder, you’ll know that each time you make a change to a repository, MyBinder will want to rebuild the Docker image next time you try to launch the repo there.

So if your repo defines a complex build that takes some time to install all of its dependencies, you have to wait for that build even if all you did was correct a typo in the markdown of a notebook file.

So here’s the trick…

nbgitpuller is a Jupyter server extension that supports the “one-way synchronization of a remote git repository to a local git repository”.

There are other approaches to git syncing too. See the next edition of Tracking Jupyter to find out what they are…

Originally developed as a tool to help distribute notebooks to students, it can be called via a Jupyter server URL. For example, if you have nbgitpuller installed in a local Jupyter server running on the default port 8888, a URL of the following form will pull data from the specified repo into the base directory the notebook server points to:


One of the neat things about Binderhub / MyBinder is that you can pass a git-pull? argument through as part of a MyBinder launch URL, so if the repo you want to build from installs and enables nbgitpuller, you can then pull notebooks into the launched container from a second, nbgitpulled repository.
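For the local server case, the git-pull URL can be put together along the following lines. This is a sketch rather than anything official: the repo and branch query parameters are the ones I understand nbgitpuller to accept, and the content repository address is a made-up placeholder.

```python
from urllib.parse import urlencode


def nbgitpuller_url(server, repo, branch=None):
    """Build a git-pull URL for a Jupyter server with nbgitpuller installed."""
    params = {"repo": repo}
    if branch:
        params["branch"] = branch
    return f"{server.rstrip('/')}/git-pull?{urlencode(params)}"


# Hypothetical content repo; substitute a real one.
pull_url = nbgitpuller_url(
    "http://localhost:8888",
    "https://github.com/example-user/example-notebooks",
)
print(pull_url)
```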

For example, yesterday I came across the Python show_ast package, and its incorporated IPython magic, that will render the abstract syntax tree of a Python command:
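For a text-only flavour of the same idea, the ast module in the Python standard library can parse a statement and dump the resulting tree. To be clear, this is not show_ast, which renders the tree graphically; it is just a sketch of the underlying structure that such a rendering would be drawing.

```python
import ast

# Parse a single statement into an abstract syntax tree.
tree = ast.parse("total = 1 + 2 * 3")

# A textual dump of the whole tree (the exact format varies by Python version).
dumped = ast.dump(tree)
print(dumped)

# The node types encountered when walking the tree, starting from the root.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```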

Such a thing may be useful in an introductory programming course (TBH, I’m never really sure what people try to teach in introductory programming courses, what the useful mental models are, how best to help folk learn them, and how to figure out how to teach them…).

As with most Python based repos, particularly ones that contain Jupyter notebooks (that is, .ipynb files [….thinks… ooh… I has a plan to try s/thing else too….]) I generally try to “run” them via MyBinder. In this case, the repo didn’t work because there is a dependency on the Linux graphviz apt package and the Python graphviz package.

At this point, I’d generally fork the repo, create a binderise branch containing the dependencies, then try that out on MyBinder, sometimes adding an issue and/or making a pull request to the original repository suggesting they Binderise it…

…but nbgitpuller provides a different opportunity. Suppose I create a base container that contains the Graphviz Linux application and the graphviz Python package. Something like this: ouseful-testing/binder-graphviz.
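The environment definition for a base repo of that sort can be very small. As a sketch, assuming the usual repo2docker conventions of an apt.txt file for Linux packages and a requirements.txt file for Python packages, the following writes out the two files; the helper name is made up, and this is not a copy of what is actually in the binder-graphviz repo.

```python
import tempfile
from pathlib import Path


def write_binder_env(target_dir, apt_packages, pip_packages):
    """Write apt.txt and requirements.txt files into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / "apt.txt").write_text("\n".join(apt_packages) + "\n")
    (target / "requirements.txt").write_text("\n".join(pip_packages) + "\n")
    return target


# Graphviz at the Linux level, plus the Python wrapper and nbgitpuller via pip.
env_dir = write_binder_env(
    tempfile.mkdtemp(),
    apt_packages=["graphviz"],
    pip_packages=["graphviz", "nbgitpuller"],
)
print(sorted(p.name for p in env_dir.iterdir()))
```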

Then I can create a MyBinder session from that repo and pull in the show_ast package from its repo and run the notebook directly:
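A sketch of how such a combined link might be assembled is shown below. It assumes that the git-pull request is passed to MyBinder as a urlpath argument, which means the content repo address ends up percent-encoded twice; the content repo used here is a hypothetical placeholder rather than the actual show_ast repository address, and the master branch is assumed for the base repo.

```python
from urllib.parse import quote, urlencode


def binder_pull_url(base_repo, content_repo, base_ref="master",
                    hub="https://mybinder.org"):
    """MyBinder link that builds base_repo, then git-pulls content_repo into it."""
    # Inner request, handled by nbgitpuller inside the launched container.
    git_pull = "git-pull?" + urlencode({"repo": content_repo})
    # Outer request: the whole inner request is encoded as the urlpath value.
    return f"{hub}/v2/gh/{base_repo}/{base_ref}?urlpath={quote(git_pull, safe='')}"


link = binder_pull_url(
    "ouseful-testing/binder-graphviz",
    "https://github.com/example-user/example-content",  # hypothetical repo
)
print(link)
```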


Fortuitously, things work seamlessly in this case because the example notebook lives in a directory where we can import show_ast without the need to install it (otherwise we’d have needed to run pip install . at the top level of the repo). In general, where notebooks are kept in a notebooks or docs directory, for example, the path to import the package would break. (Hmmm… I need to think about protocols for handling that… It’s better practice to put the notebooks somewhere separate, but that means we need to install the package or change the import path to it, which is one more step for folk to stumble over…)
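One way of changing the import path from inside a notebook, without installing the package, is to push the repository root onto sys.path. The following is a minimal sketch that assumes the notebook sits a known number of directories below the repo root; the helper name and the example path are made up.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(notebook_dir, levels_up=1):
    """Put the directory levels_up above notebook_dir at the front of sys.path."""
    root = Path(notebook_dir).resolve()
    for _ in range(levels_up):
        root = root.parent
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# For a notebook in <repo>/notebooks, packages at the top of <repo> become importable.
repo_root = add_repo_root_to_path("/tmp/example-repo/notebooks")
print(repo_root)
```

It is still one more cell for folk to run, of course, but at least it is a cell that can be shipped at the top of the notebook.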

Thinking about my old show’n’tell repo, the branches of which scruffily define various Binder environments suited to particular topic areas (environments for working on chemistry notebooks, for example, or astronomy notebooks, or classical language or music notebooks) and also contain demo notebooks, I could instead just define a set of base Binder environment containers, slow to build but built infrequently, and then lighter weight notebook repos containing just demo notebooks for a particular topic area. These could then be quickly and easily updated, and run on MyBinder having been nbgitpulled by a base container, without having to rebuild the base container each time I update a notebook in a notebook repo.

A couple of other things to note here. First, nbgitpuller has its own helper for creating nbgitpuller URLs, the nbgitpuller link generator:

It’s not hard to imagine a similar UI, or another tab to that UI, that can build a MyBinder link from a “standard” base container selected from a dropdown menu (or an optional link to a git repo) and then a provided git repo link for the target content repo.

Second, this has got me thinking about how we (don’t) handle notebook distribution very well in the OU.

For our TM351 internal course, we control the student’s computing environment via a VM we provide them with, so we could install nbgitpuller in it, but the notebooks are stored in a private Github repo and we don’t want to give students any keys to it at all. (For some reason, I seem to be the only person who doesn’t have a problem with the notebooks being in a public repo!;-)

For our public notebook utilising courses on FutureLearn or OpenLearn, the notebooks are in a public repo, but we don’t have control of the learners’ computing environments (which is to say, we can’t preinstall nbgitpuller and can’t guarantee that learners will have permissions or network access to install it themselves).

It’s almost as if various pieces keep appearing, but the jigsaw never quite seems to fit together…

Accessing a Legacy Windows Application Running Under Wine On A Containerised, RDP Enabled Desktop In a Browser Via A Guacamole Server Running in a Docker Container

I finally got round to finding, and fiddling with, an Apache Guacamole container that I could actually make sense of and it seems to work, with audio, when connecting to my demo RobotLab/Wine RDP desktop.

The container I tried is based on the Github repo oznu/docker-guacamole.

The container is started with:

mkdir guac_config
docker run -p 8080:8080 -v "$PWD/guac_config":/config oznu/guacamole

Login with user name and password guacadmin.

I then launched a RobotLab container that is running an RDP server:

docker run --name tm129 --hostname tm129demo --shm-size 1g -p 3391:3389 -d ousefulcoursecontainers/tm129rdp

Inside Guacamole, we need to create a new connection profile. From the admin drop down menu, select Settings, click on the Connections tab and create a New Connection:

Give the connection a name and specify the protocol as RDP:

The connection settings require the IP address and port number that the connection is to be made on. The port mapping was specified when we started the RobotLab container (3391) but what’s the network address? If we try to claim “localhost” in the Guacamole container, that refers to the container’s localhost, not localhost on the host. On a Mac, we can pick up the host IP address from the Network panel in the System Preferences:

Enter the appropriate connection parameters and save them:

From the admin menu, select Home. From the home screen you should be able to select the new connection…

When the connection opened, I was presented with a warning dialogue:

but clicking OK cleared it okay…

Then I could enter the RobotLab RDP connection details (username and password are both ubuntu):

and I was in to the desktop.

The application files can be found within the File System in the /opt directory.

As mentioned previously, the base container needs some fettling… When you first run the RobotLab or Neural applications, Wine wants to do some updates (which requires a network connection). This might be avoidable if I could figure out how to create users in the base image, rather than having user creation occur as part of the entrypoint, following the recipe here.

Although it’s a little bit ropey, the Guacamole desktop does play out audio.

RobotLab has three instructions for playing audio: sound, send and tone. The sound and send commands play an audio file, and this works, sort of (the spoken words played using the send command are, erm, very robotic!). The tone command doesn’t work, but I’ve seen in docs that this was an outstanding issue for some versions of Windows, so maybe it doesn’t work properly under Wine anyway…

Finally, I note that if you leave the remote desktop running, a screensaver kicks in…

Although the audio support isn’t brilliant (maybe there are “settings” in the container config that can improve it?) the support is more or less good enough, as is, for audio feedback / beeps etc. And just about good enough for the RobotLab activities.

What this means is that now I do have a route for running RobotLab, via a browser, with sort of desktop support.

One other thing to note relates to the network addressing. If I start the Guacamole and RobotLab containers together via a docker-compose.yml file, I’m guessing I should be able to define a Docker Compose network to connect them and use that as the network address/alias name in the Guacamole connection setting?

But I’m creatively drained atm and can’t face trying to get anything else working today…

PS another remote desktop protocol is SPICE, which I found via mentions in the OpenStack docs: “[t]he SPICE project aims to provide a complete open source solution for remote access to virtual machines in a seamless way so you can play videos, record audio, share usb devices and share folders without complications” [docs and a simple looking howto]. Not sure how deprecated / live this is?

Drawing and Writing Diagrams With draw.io

A skim back through this blog will turn up several posts over the years on the topic of “writing diagrams”, using text based scripts along with diagram generating applications to create diagrams from textual descriptions.

There are several reasons I think such things useful, particularly in an online, distance education context in an institution with a factory production model:

  1. diagram generation can be automated, with standardised style / rendering of the diagram separated from its logical description;
  2. maintaining diagrams is simplified: change the underlying text to change the logical content of the diagram, or change the rendering pipeline to change the output design or format;
  3. search: if you can search the text used to generate a diagram as well as text that appears within a diagram, it supports discovery;
  4. accessibility: if we generate diagrams from text, there is a good chance we could also generate equivalent textual descriptions of diagrams from the same text.

Sometimes, though, it can be handy to be able to actually draw a diagram by actually drawing it, rather than generating it from a textual source.

Recently, I came across draw.io, via the jupyterlab-drawio extension. draw.io is a web-based [code] or electron app wrapped [code] SVG editor. draw.io is actually a front end application built on top of the mxGraph JavaScript diagramming library and based on the example  GraphEditor.

When you create a new diagram, you are prompted for a save location:

Local (device) storage is the default, but it looks like you can also link Google Drive or OneDrive online storage, though I haven’t tried this (yet!):

If you have a previously saved diagram, you can select it from a file browser. If you opt to create a new diagram, you can create a blank diagram with a default set of drawing tools, a particular diagram type or a diagram imported from a template URL:

If you go for the template URL option, you are prompted for the URL (I don’t know if there’s a catalogue / awesome list of template URLs anywhere?):

If you select one of the canned diagram type options, you are provided with a preview of the sorts of diagram you can create within that view:

If you click to select one of the example diagrams then click Create, the diagram editor opens with the example diagram and a set of custom diagram element options in the scratchpad sidebar:

If you don’t select a preview diagram, or you select the Blank diagram, you just get a default tool set.

Usefully for the course I’m looking at, one of the scratchpad collections provides diagram components that can be used to draw crow’s foot entity relationship diagrams, as described here: Entity Relationship Diagrams with draw.io (see also: Entity Relationship Diagram (ERD)).

Clicking on an item in the toolbar previews the component and adds it to the canvas; you can also click and drag items from the sidebar and then drop them on the canvas.

(Partial) ERD diagrams can also be generated by importing database table definitions using the SQL import plugin.

Writing Diagrams

One of the nice features of draw.io is that you can also generate certain diagrams from data files. In the Arrange menu, the Insert option provides several options for importing different sorts of data or textual elements from which diagrams can be automatically generated.

Plugins extend the range of import options, as for example in the case of the sql-plugin. (The SQL plugin seems to add tables based on CREATE TABLE elements in the SQL; whilst it correctly identifies and highlights primary keys, it doesn’t identify relationships between them, so you have to add the crow’s foot lines yourself…)

See the full list of official plugins.

Data can be imported from CSV files, as described here: Automatically create draw.io diagrams from CSV files. Not all columns need to be displayed; some columns may even be used to store metadata or styling information using reserved columns (imagefill and stroke). The first column in each row represents a node and may be styled according to details given in the styling columns.

Other columns contain values that can be included in the node or that specify which other nodes that node is connected to. Rules are used to define the styling and labelling of each edge, as well as identifying columns used to identify edge connections between nodes.

Not all columns need to be referenced / used in the diagram that is generated.

I haven’t fully explored all the possible CSV import settings yet; I’m also thinking it’d be nice if there were some Python tooling to help simplify the creation of the CSV import definition file.
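As a first stab at what some Python tooling might look like, here is a minimal sketch that writes out an import definition from a list of row dicts. It assumes the definition format is a set of # prefixed configuration lines (label, style, connect, ignore) followed by the CSV data, which is my reading of the draw.io examples rather than a checked specification; the function name, column names and styles are all made up for illustration.

```python
import csv
import io
import json


def drawio_csv(rows, label, connect=None, style=None, ignore=()):
    """Return draw.io style CSV import text: config comment lines, then the data."""
    lines = [f"# label: {label}"]
    if style:
        lines.append(f"# style: {style}")
    for rule in connect or []:
        # Each connect rule is written as a JSON object on its own line.
        lines.append(f"# connect: {json.dumps(rule)}")
    if ignore:
        lines.append(f"# ignore: {','.join(ignore)}")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return "\n".join(lines) + "\n" + buf.getvalue()


# Hypothetical data: each row is a node; refs names the node it links to.
rows = [
    {"name": "Course", "refs": "", "fill": "#dae8fc"},
    {"name": "Module", "refs": "Course", "fill": "#d5e8d4"},
]
definition = drawio_csv(
    rows,
    label="%name%",
    style="rounded=1;fillColor=%fill%;",
    connect=[{"from": "refs", "to": "name", "style": "endArrow=classic;"}],
    ignore=["fill"],
)
print(definition)
```

The resulting text is the sort of thing that would then be pasted into the CSV import dialogue.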

(By the by, there is also a handy online CSV viewer webform available.)

As well as CSV import, UML diagrams can be generated using PlantUML, a tool for creating a wide variety of diagram types from imported UML and other diagram specifications: Use PlantUML in draw.io. (That said, when I tried with the online draw.io editor, the PlantUML import didn’t work. It looks like it uses Graphviz underneath, so it may be something to do with that? I need to try on a local install really, or ideally in a container with JupyterLab using the jupyterlab-drawio extension.)

Taken together, I wonder if these importers could be used with other Python tools for generating diagrams from code? e.g. could something like this approach to electrical circuit diagram generation with lcapy be used to generate diagrams that draw.io can render??

Another handy looking tool comes in the form of drawio-batch, a “command line converter for draw.io diagrams” based on puppeteer (“a Node library which provides a high-level API to control Chrome”, operating it by default in headless mode) that wraps the online draw.io conversion code into an offline tool. (I’ve not had a chance to try this yet; from the tests, it looks like you call it with a draw.io XML diagram file and an output file and it gives you an output diagram back in a format corresponding to the filetype specified by the output file suffix (pdf and svg in the tests)? Puppeteer is also new to me; a bit like Selenium, methinks, and a Javascript follower on from the now deprecated phantom.js?)

Of the plugins, replay looks interesting: it lets you render an animated version of a diagram, for example as you build up a complex flow diagram a piece at a time. There is also an anim plugin for what looks like creating more general animations.

All in all, it looks to be really handy, and something I could ship in our VM. The jupyterlab-drawio extension shows it works in JupyterLab, and I think it should also work with nbserverproxy?

By the by, the Google Drive / OneDrive integration was interesting (if it works; I haven’t had a chance to try it yet)… In particular, it makes me wonder: could the code that did that be reused to provide a similar storage workflow in JupyterHub?

PS in passing, there may be other useful tools in here —  10 JavaScript libraries to draw your own diagrams. (It’s been some time since I last did a round-up…)

Browser Based Virtualised Environments for Cybersecurity Education – Labtainers and noVNC

Whilst my virtualisation ramblings may seem to be taking a scattergun approach, I’m actually trying to explore the space in a way that generalises meaningfully in the context of open and distance education.

The motivating ideas essentially boil down to these two questions / constraints:

  • can we package a software application once so that we can then run it cross-platform, anywhere, both locally and remotely?
  • can we package the same software application so that it is available via a universal client? I tend to favour the browser as a universal client, but until I can figure out how to do audio from remote desktops via a browser, I also appreciate there may be a need for something like an RDP client too.

I’m also motivated by “open” on the one hand (can we share the means of production, as well as the result?) and by factory working on the other: will the approach used to deliver one application scale to other applications in different subject areas, or to the same application over time, as it goes through various versions?

My main focus has been on environments for running our TM351 applications (Jupyter notebooks, various databases, OpenRefine) as well as keeping legacy applications running (RobotLab, Genie, Daisyworld) and exploring other virtualised desktops (eg for the VREP simulator), but there is also quite a lot of discussion internally around using virtualised environments to support our cybersecurity courses.

I suspect this is both a mature and an evolving space:

  • mature, in that folk have been using virtual machines to support this sort of course for some time; for example, this Offline Capture The Flag-Style Virtual Machine for Cybersecurity Education from University of Birmingham that dates back to 2015, or this SEED Labs — Hands-on Labs for Security Education from Syracuse University that looks like it dates back to 2002. There is also the well-known Kali Linux distribution that is widely used for digital forensics, penetration testing, ethical hacking training, and so on. (The OU also has a long standing Masters level course that has been using a VM for years…)
  • evolving, in that the technology for packaging (eg Docker) and running (eg the growth in cloud services) is changing quickly, as are the opportunities for creating things like structured notebook scripts around cybersecurity activities.

Recently, I also came across Labtainers, a set of virtual machines produced by the US Naval Postgraduate School’s Center for Cybersecurity and Cyber Operations billed as “fully packaged Linux-based computer science lab exercises with an initial emphasis on cybersecurity. Labtainers include more than 40 cyber lab exercises and tools to build your own.”

Individual activities are packaged in individual Docker containers, and a complete distribution is available bundled into a VirtualBox virtual machine (there’s also a Labtainer design guide). There’s also a paper here: Individualizing Cybersecurity Lab Exercises with Labtainers, Michael F. Thompson & Cynthia E. Irvine, IEEE Security & Privacy, Vol 16(2), March/April 2018, pp. 91-95, DOI: 10.1109/MSP.2018.1870862.

I actually spotted Labtainers from a demo by Olivier Berger / @olberger that was in part demonstrating a noVNC bridge container he’s been working on. I first posted about an X11 / XPRA bridge container I’d come across here; that post describes the JAremko/docker-x11-bridge container, which I can run to provide a noVNC desktop through my browser; we can then run separate application containers and mount the bridge container as a device, exposing the container application on the noVNC desktop. Olivier’s patched noVNC desktop container (fcwu/docker-ubuntu-vnc-desktop) offers access to “an Ubuntu LXDE and LXQT desktop environment” so that it can be used in a similar way.

You can see it in action with the labtainers here:

A supporting blog post can be found here: Labtainers in a Web desktop through noVNC X11 proxy, full docker containers; there’s also an associated repo.

From the looks of it, Olivier has been on a similar journey to myself. Another post, this time from last year, describes a Demo of displaying labtainers labs in a Web browser through Guacamole (repo). Guacamole is an Apache project that provides a browser based remote desktop that can act as a noVNC or RDP client (I think…?!).

One thing I’m wondering now is can this sort of thing be packaged using the “new”, (to my recollection, third(?) time of launching?!), Docker Application CNAB packaging format?

(For all their attempts to appeal to a wider audience, I think Docker keep missing a trick by not putting the Kitematic crew back together…)