Category: OU2.0

Kiteflying Around Containers – A Better Alternative to Course VMs?

Eighteen months or so ago, I started looking at ways in which we might use a virtual machine to bundle up a variety of interoperating software applications for a distance education course on databases and data management. (This VM would run IPython notebooks as the programming surface, and PostgreSQL and MongoDB as the databases. I was also keen that OpenRefine should be made available, and as everything in the VM was being accessed via a browser, I added a browser based terminal app (tty.js) to the mix as well.) The approach I started to follow was to use vagrant as a provisioner and VM manager, with puppet scripts to build the various applications. One reason for this approach is that the OU is an industrial scale educator, and (to my mind) it made sense to explore a model that would support the factory line production model we have in a way that would scale vertically (maintaining VMs for a course that runs over several years) as well as horizontally (across other courses with other software application requirements). You can see how my thinking evolved across the following posts: posts tagged “VM” on OUseful.info.

Since then, a lot has changed. IPython notebooks have forked into the Jupyter notebook server and IPython, and Jupyter has added a browser based terminal app to the base offerings of the notebook server. (It’s not as flexible as tty.js, which allowed for multiple terminals in the same browser window, but I guess there’s nothing to stop you loading multiple terminals into separate browser tabs.) docker has also become a thing…

To recap on some of my thinking about how we might provide software to students, I was preoccupied at various times with the following (not necessarily exhaustive) list of considerations:

  • how could we manage the installation and configuration of different software applications on students’ self-managed, remote computers, running arbitrary versions of arbitrary operating systems on arbitrarily specced machines over networks with unknown and perhaps low bandwidth internet connections;
  • how could we make sure those applications interoperated correctly on the students’ own machines;
  • how could we make sure the students retained access to local copies of all the files they had created as part of their studies, and that those local copies would be the ones they actually worked on in the provided software applications; (so for example, IPython notebook files, and perhaps even database data directories);
  • how could we manage the build of each application in the OU production context, with OU course teams requiring access to a possibly evolving version of the machine 18 months in advance of student first use date and an anticipated ‘gold master’ freeze date on elements of the software build ~9 months prior to students’ first use;
  • how could we manage the maintenance of VMs within a single presentation of a 9 month long course and across several presentations of the course spanning 1 presentation a year over a 5 year period;
  • how could the process support the build and configuration of the same software application for several courses (for example, an OU-standard PostgreSQL build);
  • how could the same process/workflow support the development, packaging, release to students, maintenance workflow for other software applications for other courses;
  • could the same process be used to manage the deployment of application sets to students on a cloud served basis, either through a managed OU cloud, or on a self-served basis, perhaps using an arbitrary cloud service provider.

All this bearing in mind that I know nothing about managing software packaging, maintenance and deployment in any sort of environment, let alone a production one…;-) And all this bearing in mind that I don’t think anybody else really cares about any of the above…;-)

Having spent a few weeks away from the VM, I’m now thinking that we would be better served by using a more piecemeal approach based around docker containers. These still require the use of something like Virtualbox, but rather than using vagrant to provision the necessary environment, we could use more of an appstore approach to starting and stopping services. So for example, today I had a quick play with Kitematic, a recent docker acquisition, and an app that doesn’t run on Windows yet, but for which Windows support is slated for June 2015 in the Kitematic roadmap on github.

So what’s involved? Install Kitematic (if Virtualbox isn’t already installed, I think it’ll grab it down for you?) and fire it up…

Kitematic_1

It starts up a dockerised virtual machine into which you can install various containers. Next up, you’re presented with an “app dashboard”, as well as the ability to search dockerhub for additional “apps”:

Kitematic_2

Find a container you want, and select it – this will download the required components and fire up the container.

Kitematic_3

The port tells you where you can find any services exposed by the container. In this case, for scipyserver, it’s an IPython notebook (HTML app) running on top of a scipy stack.

Kitematic_4

By default the service runs over https with a default password; we can go into the Settings for the container, reset the Jupyter server password, force it to use http rather than https, and save to force the container to use the new settings:

Kitematic_5

So for example…

kitematic_ipynb

In the Kitematic container homepage, if I click on the notebooks folder icon in the Edit Files panel, I can share the notebook folder across to my host machine:

scipyserver_share

I can also choose what directory on host to use as the shared folder:

Kitematic_7

I can also discover and fire up some other containers – a PostgreSQL database, for example, as well as a MongoDB database server:

Kitematic_6

From within my notebook, I can install additional packages and libraries and then connect to the databases. So for example, I can connect to the PostgreSQL database:

kitematic_ipynb_postgres

or to mongo:

kitematic_ipynb_mongodb
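
For the record, the sort of connection code I’m running in the notebook looks something like the following sketch. Note that the IP address is the one Kitematic reports for the docker VM and the port numbers are whatever Kitematic shows as the mapped ports for each container, so the values below are examples only, not defaults you can rely on; the credentials are likewise whatever your containers were configured with:

#Example values only - use the IP address and mapped ports Kitematic reports for your own containers
DOCKER_VM_IP = '192.168.99.100'
POSTGRES_PORT = 32768
MONGO_PORT = 32769

#PostgreSQL connection (may need a !pip3 install psycopg2 first);
#user/password as configured for your postgres container
import psycopg2
pg_con = psycopg2.connect(host=DOCKER_VM_IP, port=POSTGRES_PORT, user='postgres', password='postgres')

#MongoDB connection (may need a !pip3 install pymongo first)
from pymongo import MongoClient
mongo_client = MongoClient(DOCKER_VM_IP, MONGO_PORT)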

Looking at the container Edit Files settings, it looks like I may also be able to share across the database datafiles – though I’m not sure how this would work if I had a default database configuration to begin with? (Working out how to pre-configure and then share database contents from containerised DBMSs is something that’s puzzled me for a bit and something I haven’t got my head round yet.)

So – how does this fit into the OU model (that doesn’t really exist yet?) for using VMs to make interoperating software collections available to students on their own machines?

First up, no Windows support at the moment, though that looks like it’s coming; secondly, the ability to mount shares with the host seems to work, though I haven’t tested what happens if you shut down and start up containers, or delete a scipyserver container and then fire up a clean replacement, for example. Nor do I know (yet?!) how to manage shares and pre-seeding for the database containers. One original argument for the VM was that interoperability between the various software applications could be hardwired and tested. Kitematic doesn’t support fig/Docker Compose (yet?) but it’s not too hard to look up the addresses and paste them into a notebook. I think it does mean we can’t provide hard coded notebooks with ‘guaranteed to work’ configurations (i.e. ones prewritten with service addresses and port numbers) baked in, but it’s not too hard to do this manually. In the docker container Dockerfiles, I’m not sure if we could fix the port number mappings to initial default values?

One thing we’d originally envisioned for the VM was shipping it on a USB stick. It would be handy to be able to point Kitematic to a local dockerhub, for example, a set of prebuilt containers on a USB stick with the necessary JSON metadata file to announce what containers were available there, so that containers could be installed from the USB stick. (Kitematic currently grabs the container elements down from dockerhub and pops the layers into the VM (I assume?), so it could do the same to grab them from the USB stick?) In the longer term, I could imagine an OU branded version of Kitematic that allows containers to be installed from a USB stick or pulled down from an OU hosted dockerhub.

But then again, I also imagined an OU USB study stick and an OU desktop software updater 9 years or so ago and they never went anywhere either..;-)

Using Docker to Build Linked Container Course VMs

Having spent bits of last year tinkering with vagrant and puppet as part of a workflow for building and deploying course related VMs in a scaleable way for a distance education context (trying to be OUseful here…) I’ve more recently started pondering whether it makes more sense to create virtual machines from linked data containers.

Some advantages of the “all in one flat VM” approach seem to be that we can construct puppet files to build particular components and then compose the final machine configuration from a single Vagrant script pulling in those separate components. Whilst this works when developing a VM for use by students on their own machines, it perhaps makes less sense if we were to provide remote hosted access to student VMs. There is an overhead associated with running a VM which needs to be taken into account if you need to scale. In terms of help desk support, the all-in-one VM approach offers a couple of switch-it-off-and-switch-it-on-again opportunities: a machine can be shut down and restarted/reprovisioned, or if necessary can be deleted and reinstalled (though this latter loses any state that was saved internally in the VM by the student). If a particular application in the VM needs shutting down and restarting, then a specific stop/start instruction is required for each application.

On the other hand, a docker route in which each virtual application is launched inside its own container, and those containers are then linked together to provide the desired student VM configuration, means that if an application needs to be restarted, we can just destroy the container and fire up a replacement (though we’d probably need to find ways of preserving – or deleting – state associated with a particular application container too). If applications run as services, and for example I have a notebook server connected to a database server, if I destroy the database server container, I should be able to point the notebook server to the new database server – if I know the address of the new database server…

After a bit of searching around, I came across an example of creating a configuration not too dissimilar from the TM351 virtual machine configuration, but built from linked containers: Using Docker for data science, part 2 [Calvin Giles]. The machine is constructed from several containers, wired together using this fig script:

notebooks:
    command: echo created
    image: busybox
    volumes:
        - "~/Google Drive/notebooks:/notebooks"
data:
    command: echo created
    image: busybox
    volumes:
        - "~/Google Drive/data:/data"

devpostgresdata:
    command: echo created
    image: busybox
    volumes: 
        - /var/lib/postgresql/data

devpostgres:
    environment:
        - POSTGRES_PASSWORD
    image: postgres
    ports:
        - "5433:5432"
    volumes_from:
        - devpostgresdata

notebook:
    environment:
        - PASSWORD
    image: calvingiles/data-science-environment
    links:
        - devpostgres:postgres
    ports:
        - "443:8888"
    volumes_from:
        - notebooks
        - data

(WordPress code plugin & editor sucking atm wrt the way it keeps trying to escape stuff…)

(Fig is a tool for building multiple docker containers and wiring them together, a scripted version of something like Panamax. The main analysis application – calvingiles/data-science-environment – is a slight extension of ipython/scipyserver.)

With fig and boot2docker installed, and the fig script downloaded into the current working directory:

curl -L https://gist.githubusercontent.com/calvingiles/b6123c301954fe68e29a/raw/data-science-environment-fig.yml > fig.yml

the following two lines of code make sure that any previous copies of the containers are purged, and a new set of containers fired up with the desired password settings:

fig rm
PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d

(The script also creates Google Drive folders in which copies of the notebooks will be located and shared between the VM containers and the host.)

The notebooks can then be accessed via a browser (you need to log in with the specified password – MyPass from the example above); the location of the notebooks is https://IP.ADDRESS:443 (note the https, which may require you saying “yes, really load the page” to Google Chrome – though it is possible to configure the server to use just http), where IP.ADDRESS can be found by running boot2docker ip.

One thing I had trouble with at first was connecting the IPython notebook to the PostgreSQL database server (I couldn’t see it on localhost). I found I needed to connect to the actual IP address within the VM of the database container.

I found this address (IPADDRESS) from the docker commandline using: fig run devpostgres env (where devpostgres is the name of the database server container). The port is the actual server port number rather than the forwarded port number:

import psycopg2
con = psycopg2.connect(host=IPADDRESS,port=5432,user='postgres',password='PGPass')

I also came up with a workaround (as described in this issue I raised) but this seems messy to me – there must be a better way? Note how we connect to the forwarded port:

#Via http://blog.michaelhamrah.com/2014/06/accessing-the-docker-host-server-within-a-container/
#Get the IP address of the docker host server inside the VM
# I assume this is like a sort of 'localhost' for the space in which the containers float around?
IPADDRESS=!netstat -nr | grep '^0\.0\.0\.0' | awk '{print $2}'

#Let's see if we can connect to the db using the forwarded port address
import psycopg2
con = psycopg2.connect(host=IPADDRESS[0],port='5433',user='postgres', password='PGPass')

#Alternatively, connect via SQL magic
!pip3 install ipython-sql
%load_ext sql
postgrescon = 'postgresql://postgres:PGPass@'+IPADDRESS[0]+':5433'

#Then cell magic via:
%%sql $postgrescon

This addressing fiddliness also raises an issue about how we would run container bundles for several students in the same VM under a hosted offering – how would any particular student know how to connect to “their” personal database server(s)? [UPDATE: doh! Container linking passes name information into a container as an environment variable: Linking Containers Together.] Would we also need to put firewall rules in place to partition the internal VM network so that a student could only see other containers from their bundle? And in the event of switch-it-off/destroy-it/start-it-up-again actions, how would any new firewall rules and communication of where to find things be managed? Or am I overcomplicating?!

Anyway – this is a candidate way for constructing a VM out of containers in an automated way. So what are the relative merits, pros, cons etc of using the vagrant/puppet/all-in-one-VM approach as opposed to the containerised approach in an educational context? Or indeed, for different education contexts (eg a trad-uni computer lab with VMs running on student desktops (if that’s allowed!); a distance education student working on their home machine with a locally hosted VM; multiple students connecting to VM configurations hosted on a local cluster, or on AWS/Google Cloud etc)?

Any comments – please add them below… I am sooooooo out of my depth in all this!

PS Seems I can connect with con = psycopg2.connect(host='POSTGRES',port='5432',user='postgres', password="PGPass")

Check the other environment variables that are available with:

import os
os.environ
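
Putting those together, a minimal sketch of a connection that doesn’t need any manual address lookup might look like the following (the environment variable names assume Docker’s standard link naming for a postgres alias; check the os.environ listing above if yours differ):

#Sketch only: address/port of the linked database container, as injected by the container link
#(names assume an alias of 'postgres'; inspect os.environ if they differ in your set-up)
import os
import psycopg2

pg_host = os.environ.get('POSTGRES_PORT_5432_TCP_ADDR', 'postgres')
pg_port = os.environ.get('POSTGRES_PORT_5432_TCP_PORT', '5432')

con = psycopg2.connect(host=pg_host, port=pg_port, user='postgres', password='PGPass')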

Book Extras – Data Files, Code Files and a Dockerised Application

Idling through the LeanPub documentation last night, I noticed that they support the ability to sell digital extras, such as bundled code files or datafiles. Along with the base book sold at one price, additional extras can be bundled into packages alongside the original book and sold at another (higher) price. As with the book sales, two price points are supported: the minimum price and a recommended price.

It was easy enough to create a bundle of sample code and data files to support the Wrangling F1 Data With R book and add them as an extras package bundled with the book for an extra dollar or so.

leanpub_extras

This approach makes it slightly easier to distribute file bundles to support a book, but it still requires a reader to do some work in configuring their own development environment.

In an attempt to make this slightly easier, I also started exploring ways of packaging and distributing a preconfigured virtual machine that contains the tools – as well as code and data examples – that are required in order to try out the data wrangling approaches described in the book. (I’m starting to see more and more technical books supported by virtual machines, and can imagine this approach becoming a standard way of augmenting practical texts.) In particular, I needed a virtual machine that could run RStudio and that would be preloaded with libraries that would support working with SQLite data files and generating ggplot2 charts.

The route I opted for was to try out a dockerised application. The rocker/hadleyverse Docker image bundles a wide variety of useful R packages into a container along with RStudio and a base R installation. Building on top of this image, I created a small Dockerfile that additionally loaded in the example code and data files from the book extras package – psychemedia/wranglingf1data.

# Wrangling F1 Data Dockerfile
#
# https://github.com/psychemedia/wranglingf1data-docker
#

# Pull RStudio base image with handy packages...
FROM rocker/hadleyverse

#Create a directory to create a new project from
RUN mkdir -p /home/rstudio/wranglingf1data
RUN chmod a+rw /home/rstudio/wranglingf1data

#Populate the project-directory-to-be with ergast API supporting code and data
ADD ergastR-core.R /home/rstudio/wranglingf1data/ergastR-core.R
ADD ergastdb13.sqlite /home/rstudio/wranglingf1data/ergastdb13.sqlite

#Populate the project-directory-to-be with an additional data source
ADD scraperwiki.sqlite /home/rstudio/wranglingf1data/scraperwiki.sqlite

Running this Dockerfile (for example, using boot2docker) downloads and builds a containerised application preconfigured to support the book and available via a web browser. Instructions for downloading, and running the container can be found here: psychemedia/wranglingf1data-docker repository.

I also added instructions for using the Dockerised application to the book extras package as part of its README file.

Seven Ways of Running IPython Notebooks

We’re looking at using IPython notebooks for a MOOC on something or other, so here’s a quick review of the different ways I think we can provide access to them. Please let me know via the comments if there are other models…

User Desktop – Native App

Download a version of a scientific python distribution such as Anaconda (python 2.7 & 3) or Enthought Canopy (python 2.7) and run the notebook from within that. Runs cross platform, requires user admin privileges to install application.

In the context of a MOOC, this approach would require participants to download and install the python distribution on a desktop or laptop computer. This approach will not work on a tablet.

In terms of supporting the ability to open a notebook directly by double clicking on a notebook file, this nbopen looks like it may do the trick?

Delivery/use costs: none.
Publisher demands: notebook development.
Maintenance: dependence on distribution provider.
Support issues: installation of 3rd party software.
Custom branding/styling/extensions: can be installed after running a config script.

Browser Extension

The CoLaboratory (about) Chrome extension allows you to run IPython notebooks within the Chrome browser without the need to install any other software. Notebooks are saved to/opened from Google Drive. Python 2.7(?). Requires Google Chrome (cross-platform), Google Account (for Google Drive integration). This approach does not support the installation of arbitrary third party python libraries – only libraries compiled into the extension will work.

In the context of a MOOC, this approach would require participants to download and install Chrome and the CoLaboratory extension. This approach will not work on a tablet.
Delivery/use costs: none.
Publisher demands: notebook development.
Maintenance: dependence on extension publisher; (code available to fork).
Support issues: installation of 3rd party software.
Custom branding/styling/extensions: not supported(?)

iOS App – NO LONGER AVAILABLE

As of March 2015 – no longer available from appstore
An iOS app, Computable (blog) makes notebooks available on an iPad. The app is free to preview notebooks bundled with the app, but requires a $10 in-app purchase to run your own notebooks. Integrated with Dropbox. Source code not available. Other reports of demos – but again, no code – available, such as: IPython notebook on iPhone. I don’t know if this approach supports the installation of arbitrary third party python libraries; that is, I don’t know if only libraries compiled into the application will work.

In the context of a MOOC, this approach would require participants to download and install the app and then pay the $10 fee. A Dropbox account is also required(?). This approach will only work on an iOS device.
Delivery/use costs: $10 to student.
Publisher demands: notebook development.
Maintenance: dependence on app publisher.
Support issues: installation of 3rd party software.
Custom branding/styling/extensions: not supported(?)

User Desktop – Virtual Machine

Run an IPython server within a virtual machine on the user’s desktop, exposing the notebook via a browser. Requires a virtual machine runner (eg VirtualBox, VMWare), port forwarding; some mechanism for starting up the VM and auto-running the notebook server. Devops tools may be used to deploy VMs either locally or in the cloud (eg JiffyLab (about Jiffylab); the OU course TM351 (in production) is currently exploring the use of VMs managed using vagrant to deliver IPython notebooks along with a range of other services).

Pre-defined IPython notebook VM examples: docker: ipython/notebook; Chef: IPython notebook cookbook.

In the context of a MOOC, this approach would require participants to download and install a virtual machine runner and the virtual machine image. This approach will not work on a tablet.
Delivery/use costs: none.
Publisher demands: development of VM image.
Maintenance: tracking VM runner compatibility.
Support issues: installation of 3rd party software; installation of VM; port forwarding conflicts, locating the notebook server address.
Custom branding/styling/extensions: yes.

In the Cloud – Managed Services

Open a notebook running on a managed IPython notebook server such as Wakari or Authorea. Free plans available, requires personal account on managed service, internet connection.

In the context of a MOOC, this approach would require participants to have access to a network connection and create an account on a provider service. This approach will work on any device.
Delivery/use costs: Notebooks created under free plans will be public.
Custom branding/styling/extensions: no – branding may be associated with provider; provider identified extensions may be offered as part of hosted service.

In the Cloud – Ad Hoc tmp Notebooks

Nature recently did a splash on IPython notebooks (Interactive notebooks: Sharing the code) that included a demo notebook for readers to experiment with: IPython interactive demo.

The system uses a ‘temporary notebook’ server described here: Instant Temporary IPython Notebooks (code: Jupyter tmpnb). The server spawns temporary notebooks that can be used to run particular activities. Other example use cases include the provision of notebooks at conferences/workshops for demo purposes, hour of code demos etc.

In the context of a MOOC, this approach would require participants to have access to a network connection. Notebooks will be temporary and cannot be persisted. This approach would be ideal for one-off activities. This approach will work on any device. My assumption is that we would not be able to direct participants to the current 3rd party provider of this service without prior agreement.
Delivery/use costs: server provision for running notebooks.
Publisher demands: management of cloud services if self-hosted.
Maintenance: tracking VM runner compatibility.
Support issues:
Custom branding/styling/extensions: no.

In the Cloud – Hosted VMs

Another virtual machine approach, but participants fire up a prebuilt virtual machine in the cloud. Examples include: Yhat Sciencebox. Another example is the notebookcloud, a semi-managed service built around Google authentication and users’ personal AWS credentials; ((forked) legacy code). (From the docs: NotebookCloud is a service that allows you to launch and control IPython Notebook servers on Amazon EC2 from your browser. This enables you to host your own Python programming environment, on your own Amazon virtual machine, and access it from any modern web browser.)

In the context of a MOOC, this approach would require participants to have access to a network connection and create an account on a provider service. This approach will work on any device.
Delivery/use costs: participant pays machine, storage and bandwidth costs according to usage.
Publisher demands: management of cloud services if self-hosted. Alternatively, we could host a version of NotebookCloud.
Maintenance: potential reliance on 3rd party VM configuration.
Support issues: account creation, VM management.
Custom branding/styling/extensions: no – branding may be associated with provider; provider identified extensions may be offered as part of hosted service.

Hosted MultiUser Systems

IPython and IPython Notebooks are under active development and it looks as if future releases will support multi-user services, for example JupyterHub: A multi-user server for Jupyter notebooks.

Docker Containers

Another take on virtualisation, IPython notebooks can be launched within docker containers within a lightweight virtual machine either on the desktop or in the cloud. For example, see Kiteflying Around Containers – A Better Alternative to Course VMs? for an example of launching a notebook on the desktop (currently (June, 2015) Mac/Linux only) or in the cloud, see this for a related example: Getting Started With Personal App Containers in the Cloud.

Delivery/use costs: participant pays machine, storage and bandwidth costs on the cloud host according to usage; desktop use is free.
Publisher demands: management of cloud services if self-hosted. Application code is self-contained and can represent a managed distribution.
Maintenance: potential reliance on 3rd party defined containers.
Support issues: account creation, docker management.
Custom branding/styling/extensions: yes, within a custom container.

Summary

In order to make interactive use of IPython notebooks within a course, we need to make notebooks available, and provide a way of running them. A wide variety of models that support the running of notebooks exists. We can distinguish: where does the notebook server run; where does it load pre-existing notebooks from; where does it save notebooks to.

Running a notebook server incurs a resource cost in terms of installation, maintenance, (remote) access (eg managing multiple instances, port availability etc when offering a hosted service), and any financial costs associated with running the service. Who covers the costs and meets any load issues depends on the solution adopted.

As far as usage goes, notebooks are accessed via a web browser and as such are accessible on any device. However, if a server runs locally rather than on a remote host, there are device dependencies to contend with that rule out some solutions on some platforms (a VM can’t run on iOS; the iOS app won’t run on a Windows machine, etc).

In an online course environment, we may be able to separate concerns and suggest a variety of ways of running notebooks to participants, leaving it up to each participant to find a way of running the notebooks that works for them. Duty of care might then extend only insofar as making notebooks available that will run on all the platforms we have recommended.

In terms of pedagogy, we might distinguish between notebooks as used:

  1. to run essentially standalone exercises, and which we might treat as disposable (a model which the tmpnb solution fits beautifully because no persistence of state is required); these exercises might be pre-prepared notebooks made available to participants, or notebooks that users create as a scratch pad to run through a particular set of activities described elsewhere;
  2. for activities where the participant may want to save a work in progress and return to it at a later date (which requires persistence of state on a per user basis); this might be in the context of a notebook we provide to the students, or one they have created and are working on themselves;
  3. for activities where participants create their own notebooks and wish to preserve them.

From the OU perspective, we should probably take a view on the extent to which we develop solutions that work across one or more contexts, such as MOOCs delivered ex- of OU provisioned services (eg FutureLearn); MOOCs delivered via an OU context (eg OpenLearn); courses delivered to OU fee paying students via OU systems.

There may be opportunities to develop solutions that work to support the delivery of OU courses, as well as OU MOOCs, and that are further licensable to other institutions, eg to support their own course delivery or MOOC delivery, or appropriate for release as open software in support of the wider community.

Thoroughly Confused About Student VMs & Docker

The story so far… We’re looking at using a virtual machine (VM) preconfigured with all sorts of software and services for a distance education course. The VM runs in headless mode (no graphical desktop) and exposes the applications we want students to be able to run as services accessed through a web browser. The VM is built from a vagrant script using puppet, in order to support maintenance and as a demonstration of a potentially generic production model. It also means that we should be able to build VMs for different VM runners (Virtualbox, VMWare etc), as well as being able to generate machine images for cloud hosted VMs and get them up and running there. The activities to be run inside the VM include a demonstration of a distributed MongoDB network into which network partitions are introduced. The separate database instances run inside individual docker containers and firewall rules are used to network partition them.

The set-up looks something like this – green blocks are services running in the VM, orange blocks are containers:

ouvm

The containers are started up from within the IPython notebook using a python wrapper to docker.io.
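
To give a flavour of what that wrapper code looks like, here’s a minimal sketch using the docker-py library; the image name, container name and daemon URL are illustrative assumptions rather than the actual course code:

#Illustrative sketch using docker-py ("pip install docker-py");
#image/container names and the docker daemon URL are assumptions
from docker import Client

dockerclient = Client(base_url='unix://var/run/docker.sock')

#Create and start a MongoDB container from within a notebook code cell
container = dockerclient.create_container(image='mongo', name='mongo_a', command='mongod --smallfiles')
dockerclient.start(container=container.get('Id'))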

One problem with this approach is that we have two MongoDB downloads – one in the VM and one for use in containers. This makes me think that it might make more sense to run all the applications in containers of their own. For example, the notebook server in a container of its own, the original MongoDB instance in a container of its own, PostgreSQL in a container of its own, and finally a data container or other area of the VM that can be used to persist data within the VM such that it can be accessed by one or more of the other services as and when required.

I’m not sure what the best strategy would be for persisting state, for example as used by the database services? If we mount a database’s datastore in a volume within the VM, we can destroy the container that is used to run a database service and our database datastore will be preserved. If the mounted volume is located in the VM/host share area, and the VM itself is destroyed, the data will persist in a volume on the host. This is perhaps a bit scrappy because it means that we might nominally take up a large amount of space “on the host”, compared with the situation of providing students with a pre-populated database in which the data volume was mounted “inside” the VM proper (i.e. away from the share area). In such a case, it would be nice if any data tables that students built were mounted into the host shared area (so the students can clearly “retain ownership” of those tables), but I suspect that DBMSs don’t like putting different data tables or databases into different volumes..?

This approach to using docker is perhaps at odds with the typical way of thinking about how we might make use of it. It is very much in the style of a VM acting as an app runner for a host, and docker containers being used to run the individual apps.

One problem with this approach is that we need a control panel that:

  • is always running;
  • is exposed via an http/HTML service that can be accessed on the host machine;
  • allows students to bring up and shut down containers/services/”apps” as required.

The container approach is nice because if something breaks with one of the databases, for example, it should be easy for a student to switch it off and switch it back on again… That said, if we are popping up and ripping down containers, we need to work out a connection manager so that eg IPython knows where to find a particular database (I think that docker starts services up on essentially arbitrary internal IP addresses?). This could possibly be done by naming docker processes sensibly and then creating a little python library to look up the docker processes by name so that we know where to find them?
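
Something like the following sketch (again using docker-py; the container name is whatever we choose to assign when the container is started) is the sort of thing I have in mind for such a lookup library:

#Sketch of a name-based lookup helper using docker-py;
#container names are whatever we assign at start-up
from docker import Client

def service_address(container_name, base_url='unix://var/run/docker.sock'):
    """Return the internal IP address of a named, running container."""
    dockerclient = Client(base_url=base_url)
    info = dockerclient.inspect_container(container=container_name)
    return info['NetworkSettings']['IPAddress']

#e.g. point the notebook at the named MongoDB container:
#mongo_ip = service_address('mongo_a')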

As ever, there are tradeoffs in terms of making this approach easy for us (as production engineers), making it easy for students, and making it easy for the helpdesk to support…

I don’t know enough about any of this stuff to know whether it makes sense to rebuild the VM we currently have, in which services run loose in the base VM, as a completely containerised version. The core requirement from the user perspective is that a student should be able to download a base box, fire it up as easily as possible in Virtualbox, and then access a control panel (ideally in a browser) that allows them to start up and shut down applications-in-containers, as well as seeing a clear dashboard view of what services are up and running and what (localhost) ports they can be accessed on through their browser.

If you can help talk me through what the issues are or might be, and whether any of the above makes sense or is complete nonsense, I would be most grateful…

Exporting Markdown and XML From Google Docs

Just over a year ago, we started production of a new OU course using Google docs as the medium within which we’d share draft course materials. This was something of an experiment to see whether the online social document medium encouraged sharing and discussion of ideas, resources, ongoing feedback and comment on the work in progress amongst the course team, rather than, or at least in addition to, the traditional handover of significant chunks of the course at set handover dates. (In case you’re wondering, it didn’t…)

The question we’re now faced with is: how do we get the content that’s in Google docs into the next stage of the OU document workflow?

The actual HTML based course materials that appear in the VLE (or the ebook versions of them, etc) are generated automatically from an “OU Structured Content” XML document. The XML documents are prepared using the <oXygen/> XML editor, extended with an OU structured content framework that includes the requisite DTDs/schema files, hooks for rendering, previewing and publishing into the OU environment and so on. (Details about the available tags can be found here: tag guide [OU internal link].)

Whilst the preferred authoring route is presumably that authors use the <oXygen/> editor from the start, the guidance also suggests that many authors use Word (and an appropriate OU style sheet) and then copy and paste content over into the XML editor, at which point some amount of tidying and retagging may be required.

As Google docs doesn’t seem to support the addition of custom style elements or tags (users are limited to customising the visual style of provided style elements), we need to find another way of getting content from Google docs and into <oXygen/>. One approach would be to grab a copy of the Google doc into Microsoft Word, apply an OU template to mark up the content using appropriate custom style elements, and then copy the content over to the XML editor, upon which point it will probably need further tidying.

Another approach is to try to export the data as an XML document. Looking around, I found a Google Apps script script (sic) that allows you to export the content in a Google doc as markdown. Whilst markdown documents don’t have the same tree-like document structure as XML, it did provide an example of how to parse a Google docs document. My first attempt at a script to export a Google doc in the OU XML format can be found here: export Google doc as OU SC-XML. (Note: the script is subject to change; now I have a basic operational/functional spec, I can start to try to tidy up the code and try to parse out more structure…)

Having got a minimal exporter working, the question now arises as to where effort needs to be spent next. The exporter produces a minimal form of OU-XML that sometimes validates and (in early testing) sometimes doesn’t. (If the script is working properly it should produce output that always validates as XML; then it should produce output that always validates as OU-XML.) Should time be spent improving the script to produce better XML, or can we live with the fact that the exporter gets the document some way into <oXygen/>, but further work will be required to fix a few validation breaks?
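
As a crude first check, it’s easy enough to test whether an exported document is at least well formed XML before worrying about whether it validates as OU-XML; here’s a minimal sketch (the filename is an illustrative assumption):

#Quick well-formedness check on an exported document; filename is illustrative
#Validating against the OU-XML DTD/schema proper is a separate step
from lxml import etree

try:
    etree.parse('export_test.xml')
    print('Well formed XML')
except etree.XMLSyntaxError as e:
    print('Not well formed:', e)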

Another issue that arises is how rich a form of OU-XML we try to export. When working in the Microsoft Word environment, a document style can be defined using elements that map onto the elements in OU-XML. When working in Google docs, we need to define a convention that the parser can respond to.

At the moment, the parser is sensitive to:

  • HEADING1: treated as Session;
  • HEADING2: treated as Section;
  • HEADING3: treated as SubSection;
  • LIST_ITEM: numbered and unnumbered lists are treated as BulletedList; sublists to depth 1 are supported as BulletedSubsidiaryList
  • coloured text: treated as AuthorComment; at the moment this may incorrectly grab title elements as such;
  • INLINE_IMAGE: images are rendered as relatively referenced Figure elements with an empty Description element. A copy of the image is locally stored. (Note: INLINE_DRAWING elements are unsupported – there’s no way of exporting them; maybe I should export an empty Figure with a Description saying there’s a missing INLINE_DRAWING?)
  • TABLE: rendered as Table with empty TableHead;
  • bold, italic, LinkUrl: rendered as b, i and a tags respectively;
  • font.COURIER_NEW: rendered as ComputerCode.
  • By convention, we should be able to detect and parse things like activities, exercises and SAQs. Some mechanism needs to be supported for identifying the block elements in such cases. For example, one convention might take the form:

    Exercise N

    Discussion

    End

    The ^Exercise and ^End$ elements denote the block; heading style (eg HEADING4 for the ^Exercise, and perhaps HEADING5 for the ^End$?) could further aid detection.

    Another approach would be to use horizontal lines to denote the start and stop of a block. For example:

    [horizontal line]
    SAQ

    My Answer
    [horizontal line]

    where the horizontal lines denote the start and end of a block. Again, heading styles within the block could either identify or reinforce a particular block element type.

Rendering a preview of the OU-XML as it would appear in the OU VLE is possible by uploading the OU-XML file, or a zip file containing it and related assets, to an OU URL that sits behind OU authentication. The <oXygen/> editor handles previews by using the default web client – your default browser – to post a selected XML document to the appropriate OU upload/preview URL. Unfortunately, the functions that allow http POST operations from Google Apps Script run on Google servers, which means that we can’t just create a button in Google docs that would post an XML export of the current Google doc to the authentication-required OU URL. (I don’t know if this would be possible in an OU/Google apps domain?)

I’m not sure if a workaround would be to launch a preview window in the browser from Google docs containing a copy of the OU XML version of a document, highlighting the XML, then using a bookmarklet to post the highlighted XML to the OU preview service URL within the browser context, using a browser where the user has logged in to the OU web domain? Alternatively, could a Chrome application access content from Google Drive and then post to the authenticated OU preview URL using browser permissions? (That is, can a Chrome app access Google Drive using a user’s Google permissions, or machine access permissions, and then post content grabbed from that source to the OU URL using permissions granted to the browser?) As ever, my ignorance about browser security policies on the one hand, and the Google Apps/Chrome apps security model on the other, makes it hard to know what workarounds might be possible.

If any members of OU staff would like to try out the exporter, please get in touch for hints, or let me know how you get on :-) In addition, if any members of OU staff are using Google docs for course production, I’d love to know how you’re using them and how you’re getting on :-)

PS via @mahawksey, I see that we can associate metadata with a doc using PropertiesService.getDocumentProperties(). Could be handy for adding things like course code, author metadata, publishing route, template etc to a doc. I’m not sure if we can also associate metadata with a folder, though I guess we could also include a file in a folder that contains metadata relating to files held more generally within the same folder?

Edtech and IPython Notebooks – Activities and Answer Reveals

A few months ago I posted about an interaction style that I’d been using – and that stopped working – in IPython notebooks: answer reveals.

An issue I raised on the relevant git account turned up a solution that I’ve finally gotten round to trying out – and extending with a little bit of styling. I’ve also reused chunks from another extension (read only code cells) to help style other sorts of cell.

Before showing where I’m at with the notebooks, here’s where OU online course materials are at the moment.

Teaching text is delivered via the VLE (have a look on OpenLearn for lots of examples). Activities are distinguished from “reading” text by use of a coloured background.

m269_closedAns

The activity requires a student to do something, and then a hidden discussion or answer can be revealed so that the student can check their answer.

m269_ansReveal

(It’s easy to be lazy, of course, and just click the button without really participating in the activity. In print materials, a frictional overhead was added by having answers in the back of the study guide that you would have to turn to. I often wonder whether we need a bit more friction in the browser based material, perhaps a time based one where the button can’t be clicked for x seconds after the button is first seen in the viewport (eg triggered using a test like this jQuery isOnScreen plugin)?!)

I drew on this design style to support the following UI in an IPython notebook:

Here’s a markdown cell in activity style and activity answer style.

M269_-_Python_-_Blue1a

The lighter blue background set into the activity is an invitation for students to type something into those cells. The code cell is identified as such by the code line In [ ] label. Whatever students type into those cells can be executed.

The heading is styled from a div element:

M269_-_Python_-_Blue

If we wanted a slightly different header background style, as in the browser materials, we could perhaps select a notebook heading style and then colour the background differently right across the width of the page. (Bah.. should have thought of that earlier!;-)

Markdown cells can also be styled to prompt students to make a text response (i.e. a response written in markdown, though I guess we could also use raw text cells). I don’t yet have a feeling for how much ownership students will take of notebooks and start to treat them as workbooks?

Answer reveal buttons can also be added in:

M269_-_Python_-_Blue2

Clicking on the Answer button displays the answer.

M269_-_Python_-_Blue3

At the moment, the answer background is the same colour as the invitation for students to type something, although I guess we could also interpret it as part of the tutor-alongside dialogue, where the lighter colour signifies expected dialogic responses, whether from the student or the “tutor” (i.e. the notebook author).

We might also want to make use of answer buttons after a code completion activity. I haven’t figured out the best way to do this as of yet.

M269_-_Python_-_Blue4

At the moment, the answer button only reveals text – and even then the text needs to be styled as HTML (the markdown parsing doesn’t work:-(

M269_-_Python_-_Blue5

I guess one approach might be to spawn a new code cell containing some code written in to the answer button div. Another might be to populate a code cell following the answer button with the answer code, hiding the cell and disabling it (so it can’t be executed/run), then revealing it when the answer button is clicked? I’m also not sure whether answer code should be runnable or not?

The mechanic for setting the cell state is currently a little clunky. There are two extensions, one for the answer button, one for setting the state of other cells, that use different techniques for addressing the cells (and that really need to be rationalised). The extensions run styling automatically when the notebook is started, or when triggered. At the moment, I’m using icons from the original code I “borrowed” – which aren’t ideal!

M269_-_Python_-_Blue0

The cell state setter button toggles selected code cells from activity to not-activity states, and markdown cells from activity-description to activity-student-answer to not-activity. The answer button extension adds an answer button at every answer div (even if there’s already an answer button rendered). Both extensions style/annotate restarted notebooks correctly.

The current, hacky, user model I have in mind is that authors have an extended notebook with buttons to set the cell styles, and students have an extended notebook without buttons that just styles the notebook when it’s opened.

FWIW, here’s the gist containing extensions code.

Comments/thoughts appreciated…