Sharing Folders into VMs on Different Machines Using Dropbox, Google Drive, Microsoft OneDrive etc

Ever since I joined the OU, I’ve believed in trying to deliver distance education courses in an agile and responsive way, which is to say: making stuff up for students whilst the course is in presentation.

This is generally not done (by course/module teams at least) because the aim of most course/module teams is to prepare the course so thoroughly that it can “just” be presented to students.

Whatever.

I personally think we should try to improve the student experience of the course as it presents, where we can, by being responsive and reactive to student questions and issues.

So… TM351, the data management course that uses a VM, has started again, and issues / questions are already starting to hit the forums.

One of the questions – which I’d half noted but never really thought through in previous presentations (my not iterating/improving the course experience in, or between, previous presentations)  – related to sharing Jupyter notebooks across different machines using Google Drive (equally, Dropbox or Microsoft OneDrive).

The VirtualBox VM we use is fired up using the vagrant provisioner. A Vagrantfile defines various configuration settings – which ports are exposed by the VM, for example. By default, the contents of the folder in which vagrant is started up are shared into the VM. At the same time, vagrant creates a hidden .vagrant folder that contains state relating to the instance of that VM.

The setup on a single machine is something like this:

If a student wants to work across several machines, they need to share their working course files (Jupyter notebooks, and so on) but not the VM machine state. Which is to say, they need a setup more like the following:

For students working across several machines, it thus makes sense to have all project files in one folder and a separate .vagrant settings folder on each separate machine.

Checking the vagrant docs, it seems as if this is quite manageable using the synced folder configuration settings.

The default shares the current project folder (the one containing the Vagrantfile, from which vagrant is run), which I’m guessing corresponds to a setting something like:

config.vm.synced_folder "./", "/vagrant"

By explicitly setting this parameter, we can decide how we want the mapping to occur. For example:

config.vm.synced_folder "/PATH/ON/HOST", "/vagrant"

allows you to specify the folder you want to share into the VM. Note that the /PATH/ON/HOST folder needs to be created before trying to share it.

To put the new shared directory into effect, reload and reprovision the VM. For example:

vagrant reload --provision

Student notebooks located in the notebooks folder of that shared directory should now be available in the VM. Furthermore, if the shared folder is itself inside a web-synced folder (for example, a synced Dropbox, Google Drive or Microsoft OneDrive folder), it should be available wherever that folder is synced to.

For example, on a Mac (where ~ is an alias to my home directory), I can create a directory in my Dropbox folder ~/Dropbox/TM351VMshare and then map this into the VM by adding the following line to the Vagrantfile:

config.vm.synced_folder "~/Dropbox/TM351VMshare", "/vagrant"

Note the possibility of slight confusion – the shared folder will no longer be the folder from which vagrant is run (unless the folder you are running from is /PATH/ON/HOST).

Furthermore, the only thing that needs to be in the folder from which vagrant is run is the Vagrantfile and the hidden .vagrant folder that vagrant creates.

Fingers crossed this recipe works…;-)

Sharing Pre-Built VMs Via Vagrant Cloud

In passing, I noticed yesterday that Vagrant Cloud (docs) can be used to host and distribute public Vagrant base boxes. So I exported a box file from my V-REP’n’Jupyter VM:

vagrant package

uploaded it to Vagrant Cloud – ouseful/ou-robotics-test – and tweaked my Vagrantfile to use that copy as the base box:

config.vm.box = "ouseful/ou-robotics-test"

Now I’m thinking I should probably do the same for the TM351 VM, given the hassle it seems to take trying to get the .box file hosted for download on an OU URL…

Distributing Virtual Machines That Include a Virtual Desktop To Students – V-REP + Jupyter Notebooks

When we put together the virtual machine for TM351, the data management and analysis course, we built a headless virtual machine that did not contain a graphical desktop, but instead ran a set of services that could be accessed within the machine at a machine level, and via a browser based UI at the user level.

Some applications, however, don’t expose an HTML based graphical user interface over HTTP; instead, they require access to a native windowing system.

One way round this is to run a system that can generate an HTML based UI within the VM and then expose that via a browser. For an example, see Accessing GUI Apps Via a Browser from a Container Using Guacamole.

Another approach is to expose an X11 window connection from the VM and connect to that on the host, displaying the windows natively on host as a result. See for example the Viewing Application UIs and Launching Applications from Shortcuts section of BYOA (Bring Your Own Application) – Running Containerised Applications on the Desktop.

The problem with the X11 approach is that it requires gubbins (technical term!) on the host to make it work. (I’d love to see a version of Kitematic extended not only to support docker-compose but also pre-packaged with something that could handle X11 connections…)

So another alternative is to create a virtual machine that does expose a desktop, and run the applications on that.

Here’s how I think the different approaches look:

[Figure: vm_styles]

As an example of the desktop VM idea, I’ve put together a build script for a virtual machine containing a Linux graphical desktop that runs the V-REP robot simulator. You can find it here: ou-robotics-vrep.

There’s one Vagrant script to build the VM and another to launch it.

Along with the simulator, I packaged a Jupyter notebook server that can be used to create Python notebooks that connect to the simulator and control the simulated robots running within it. These notebooks could be viewed via a browser running on the virtual machine desktop, but instead I expose the notebook server so notebooks can be viewed in a browser on the host.
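By way of illustration – and this is just a sketch, not the actual activity code – a notebook talks to the simulator over V-REP’s remote API. Assuming the standard vrep Python bindings are importable and the simulator’s remote API server is listening on its default port (19997), a minimal connection check looks something like this:

# Minimal connection check using the V-REP remote API Python bindings
# (assumes vrep.py and the accompanying remote API library are on the Python path)
import vrep

vrep.simxFinish(-1)  # close any stale connections
client_id = vrep.simxStart('127.0.0.1', 19997, True, True, 5000, 5)

if client_id != -1:
    print('Connected to the V-REP remote API server')
    vrep.simxFinish(client_id)
else:
    print('Failed to connect - is the simulator running?')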

The architecture thus looks something like this:

I’d never used Vagrant to build a Linux desktop box before, so here are a few things I learned about and observed along the way:

  • installing ubuntu-desktop naively installs a whole range of applications as well. I wanted a minimal desktop that contained just the simulator application (though I also added in a terminal). For the minimal desktop, apt-get install -y ubuntu-desktop --no-install-recommends;
  • by default, Ubuntu requires a user to log in (user: vagrant; password: vagrant). I wanted to have as simple an experience as possible so I wanted to log the user in automatically. This could be achieved by adding the following to /etc/lightdm/lightdm.conf:
[SeatDefaults]
autologin-user=vagrant
autologin-user-timeout=0
user-session=ubuntu
greeter-session=unity-greeter
  • a screensaver kept kicking in and kicking back to the login screen. I got round this by creating a desktop settings script (/opt/set-gnome-settings.sh):
#dock location
gsettings set com.canonical.Unity.Launcher launcher-position Bottom

#screensaver disable
gsettings set org.gnome.desktop.screensaver lock-enabled false

and then pointing to that from a desktop_settings.desktop file in the /home/vagrant/.config/autostart/ directory (I set execute permissions on the script and the .desktop file):

[Desktop Entry]
Name=Apply Gnome Settings
Exec=/opt/set-gnome-settings.sh
Hidden=false
NoDisplay=false
X-GNOME-Autostart-enabled=true
Type=Application
  • because the point of the VM is largely to run the simulator, I thought I should autostart the simulator. This can be done with another .desktop file in the autostart directory:
[Desktop Entry]
Name=V-REP Simulator
Exec=/opt/V-REP_PRO_EDU_V3_4_0_Linux/vrep.sh
Type=Application
X-GNOME-Autostart-enabled=true
  • the Jupyter notebook server is started as a service and reuses the installation I used for the TM351 VM;
  • I thought I should also add a desktop shortcut to run the simulator, though I couldn’t find an icon to link to. To add the shortcut, create an executable run_vrep.desktop file and place it on the desktop:
[Desktop Entry]
Name=V-REP Simulator
Comment=Run V-REP Simulator
Exec=/opt/V-REP_PRO_EDU_V3_4_0_Linux/vrep.sh
Icon=
Terminal=false
Type=Application

Here’s how it looks:

If you want to give it a try, comments on the build/install process would be much appreciated: ou-robotics-vrep.

I will also be posting a set of activities based on the RobotLab activities used in TM129, in case we start using V-REP on TM129. The activity notebooks will be posted in the repo and via the associated uncourse blog if you want to play along.

One issue I have noticed is that if I resize the VM window, V-REP crashes… I also can’t figure out how to open a V-REP scene file from script (issue) or how to connect using a VM hostname alias rather than IP address (issue).

Distributing Software to Distance Education Students – Checksums

One of the issues with distributing software to distance education students is ensuring that the software package they are trying to install hasn’t been corrupted in some way during transport. For example, one of the ways we ship software to students is via USB memory stick. But in one course last year, it seems that some of the sticks were a bit dodgy, and the files wouldn’t install from them.

Which is where checksums come in.

If a student is having issues installing a piece of software, we can check the checksum of the distributed installer package to see if it matches the checksum of a pristine package. If it doesn’t, we know the problem is a corrupted installer package (rather than a problem downstream of that, for example).

So what is a checksum? Essentially, it’s a single number derived from all the bits in the file you’re generating the checksum for. If any bit in the file is changed, the checksum should change too.
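Here’s a quick Python illustration of that idea – two inputs differing by a single character produce completely different digests:

import hashlib

# Two inputs that differ by a single trailing character...
print(hashlib.md5(b'example installer contents').hexdigest())
print(hashlib.md5(b'example installer contents!').hexdigest())
# ...give completely different checksums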

So here are a couple of ways of generating checksums…


WINDOWS:

Download the Windows fciv (Windows File Checksum Integrity Verifier) utility: https://support.microsoft.com/en-gb/kb/841290

Run a command of the form:
fciv.exe c:\Users\MY_USER\PATH\FILE_CHECK.SUFFIX

This will produce checksums generated using different hashing algorithms:

MD5(FILE_CHECK.SUFFIX)= SOME_LONG_HEXADECIMAL_STRING
SHA1(FILE_CHECK.SUFFIX)= A_DIFFERENT_LONG_ALPHANUMERIC_STRING


MAC / LINUX:

On a Mac, you can find the checksums using the following commands:

openssl sha1 PATH/FILE_CHECK.SUFFIX
openssl md5 PATH/FILE_CHECK.SUFFIX

For example:

openssl md5 ~/USERRELATIVEPATH/FILE_CHECK.SUFFIX
MD5(PATH/FILE_CHECK.SUFFIX)= SOME_LONG_HEXADECIMAL_STRING


If we distribute a copy of the checksum for each installer package along with the package itself then, assuming the checksum itself is not corrupted, a student can check whether their installer package is corrupted by generating the checksum of their copy and comparing it to the distributed one.

When it comes to support in the event of a problem, and a call to the help desk, then the first question from support can be: “Have you checked the checksum of the original package?” (Or we can prompt students to do this themselves as part of self-service support…)

Even better if we shipped a simple one-click file integrity checking utility that:

1) runs a checksum test on itself to check that it’s working;
2) runs a checksum test on the distributed package(s) to check that they are not corrupted (the checking step is sketched below).
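By way of illustration, here’s a minimal Python sketch of the checking step – compute the checksum of the local copy of the package and compare it with the published value (the file name and expected digest are placeholders):

import hashlib

def sha1_of(path, chunksize=65536):
    # Read the file in chunks so large installer images don't need to fit in memory
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunksize), b''):
            h.update(chunk)
    return h.hexdigest()

# Placeholders - the installer package and the checksum published alongside it
package = 'FILE_CHECK.SUFFIX'
published_checksum = 'SOME_LONG_HEXADECIMAL_STRING'

if sha1_of(package) == published_checksum:
    print('Checksum matches - the package looks intact.')
else:
    print('Checksum mismatch - the package may be corrupted; try copying it across again.')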

Building a JSON API Using Jupyter Notebooks in Under 5 Minutes

In the post Making a Simple Database to Act as a Lookup for the ONS Register of Geographic Codes, I idly pondered creating a simple API for looking up ONS geographic codes.

Popping out to the shop for a pint of milk, I recalled one of the many things on my to do list was to look at the Jupyter Kernel Gateway, as described in the IBM Emerging Technologies blogpost on Jupyter Notebooks as RESTful Microservices.

The kernel_gateway project looks to be active – install it using something like pip3 install jupyter_kernel_gateway – and set up a notebook, such as SimpleAPI.ipynb (the following code blocks represent separate code cells).

Import some helper packages and connect to the geographic codes db I created in the previous post.

import json
import pandas as pd
import sqlite3
con = sqlite3.connect("onsgeocodes.sqlite")

Create a placeholder for the global REQUEST object that will be created when the API is invoked:

REQUEST = json.dumps({
'path' : {},
'args' : {}
})

Now let’s define the API.

For example, suppose I want to return some information about a particular code:

# GET /ons/:code
request = json.loads(REQUEST)
code = request['path'].get('code')

q='SELECT * FROM codelist WHERE "GEOGCD"="{code}"'.format(code=code)
print('{"codes":%s}' % pd.read_sql_query(q, con).to_json(orient='records'))

Or suppose I want to look up what current codes might exist for a particular place name:

# GET /ons/current/:name
request = json.loads(REQUEST)
name = request['path'].get('name')

q='''
SELECT * FROM codelist JOIN metadata
WHERE "GEOGNM"="{name}" AND codeAbbrv=sheet AND codelist.STATUS="live"
'''.format(name=name)

print('{"codes":%s}' % pd.read_sql_query(q, con).to_json(orient='records'))

On the command line in the same directory as the notebook (for example, SimpleAPI.ipynb), I can now run the command:

jupyter kernelgateway --KernelGatewayApp.api='kernel_gateway.notebook_http' --KernelGatewayApp.seed_uri='./SimpleAPI.ipynb'

to start the server.

And as if by magic, an API appears:
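For example, assuming the gateway is running on its default port (8888), the endpoints can be called like any other web API. A quick smoke test from Python might look something like this (the code and place name are borrowed from the ONS register post below):

import requests

# Look up a single geography code
print(requests.get('http://localhost:8888/ons/W40000004').json())

# Look up live codes associated with a place name
print(requests.get('http://localhost:8888/ons/current/Isle%20of%20Wight').json())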

The original blogpost also describes a simple docker container for running the API. Which makes me think: this is so easy… And using something like LaunchBot, it’d be easy enough to have a simple manager for local API servers running on a personal desktop?

Here’s the complete notebook.

Making a Simple Database to Act as a Lookup for the ONS Register of Geographic Codes

Via a GSS (Government Statistical Service) blog post yesterday – Why Do We Need Another Register? – announcing the Statistical Geography Register, which contains “the codes for each type of statistical geography within the UK”, I came across mention of the ONS Register of Geographic Codes.

This register, maintained by the Office of National Statistics, is released as a zip file containing an Excel spreadsheet. Separate worksheets in the spreadsheet list codes for various geographies in England and Wales (but not Scotland; that data is available elsewhere).

To make a rather more reproducible component for accessing the data, I hacked together a simple notebook to pull the data out of the spreadsheet and pop it into a simple SQLite3 database as a set of separate tables, one per sheet.
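The xl object used in the snippets below is a dict of dataframes, one per worksheet, which can be loaded from the downloaded spreadsheet with something like the following (the filename is illustrative):

import pandas as pd

# Load every worksheet into a dict of dataframes keyed by sheet name
xl = pd.read_excel('Register_of_Geographic_Codes.xlsx', sheet_name=None)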

One thing we need to do is reconcile items in the metadata sheet with the sheet names, by joining a couple of the columns together with an underscore:

xl['RGC']["codeAbbrv"] = xl['RGC']["Entity code"].map(str) + '_' + xl['RGC']["Entity abbreviation"].map(str)

The bulk of the script is quite simple (see the full notebook here):

import sqlite3
import pandas as pd

con = sqlite3.connect("onsgeocodes.sqlite")

cols=['GEOGCD','GEOGNM','GEOGNMW','OPER_DATE','TERM_DATE','STATUS']

#Create an empty master codelist table that each sheet will be appended to
bigcodes=pd.DataFrame(columns=['sheet']+cols)
bigcodes.to_sql(con=con, name='codelist', index=False, if_exists='replace')

sheets= list(xl.keys())
sheets.remove('For_Scotland')
for sheet in sheets[2:]:
  #Add each worksheet to the database as a table in its own right...
  xl[sheet].to_sql(con=con, name=sheet, index=False, if_exists='replace')
  xl[sheet]['sheet']=sheet
  #Reorder the columns and append the sheet's rows to the master codelist table
  xl[sheet][['sheet']+cols].to_sql(con=con, name='codelist', index=False, if_exists='append')

You may also notice that it creates a “big” table (codelist) that contains all the codes – which means we can look up the provenance of a particular code:

q='''
SELECT sheet, GEOGCD, GEOGNM, GEOGNMW, codelist.STATUS, "Entity name"
FROM codelist JOIN metadata WHERE "GEOGCD"="{code}" AND codeAbbrv=sheet
'''.format(code='W40000004')
pd.read_sql_query(q, con)
   sheet      GEOGCD     GEOGNM        GEOGNMW       STATUS  Entity name
0  W40_CMLAD  W40000004  Denbighshire  Sir Ddinbych  live    Census Merged Local Authority Districts

We can also look to see what (current) geographies might be associated with a particular name:

q='''
SELECT DISTINCT "Entity name", sheet FROM codelist JOIN metadata
WHERE "GEOGNM" LIKE "%{name}%" AND codeAbbrv=sheet AND codelist.STATUS="live"
'''.format(name='Isle of Wight')
pd.read_sql_query(q, con)
    Entity name                                      sheet
0   Super Output Areas, Lower Layer                  E01_LSOA
1   Super Output Areas, Middle Layer                 E02_MSOA
2   Unitary Authorities                              E06_UA
3   Westminster Parliamentary Constituencies         E14_WPC
4   Community Safety Partnerships                    E22_CSP
5   Registration Districts                           E28_REGD
6   Registration Sub-district                        E29_REGSD
7   Travel to Work Areas                             E30_TTWA
8   Fire and Rescue Authorities                      E31_FRA
9   Built-up Areas                                   E34_BUA
10  Clinical Commissioning Groups                    E38_CCG
11  Census Merged Local Authority Districts          E41_CMLAD
12  Local Resilience Forums                          E48_LRF
13  Sustainability and Transformation Partnerships   E54_STP

What I’m wondering now is – can I crib from the way the ergast API is put together to create a simple API that takes a code, or a name, and returns geography register information related to it?

The same approach could also be applied to the registers I pull down from NHS Digital (here) – which makes me think I should generate a big codelist table for those codes too…

PS this in part reminds me of a conversation years ago with Richard et al from @cottagelabs who were mooting, at the time, a service that would take an arbitrary code and try to pattern match the coding scheme it was part of and then look it up in that coding scheme.

PPS hmm, also thinks: maybe names associated with coding schemes could be added to a simple document tagger.

Setting up a Containerised Desktop API server (MySQL + Apache / PHP 5) for the ergast Motor Racing Data API

A couple of days ago, I noticed that Chris Newell had shared the code for his ergast Formula 1 motor racing results database that I use heavily for my F1DataJunkie doodles. The code is a PHP application that pulls data from the ergast database, which is regularly shared as a MySQL database dump. So I raised an issue asking if a docker containerised version of the application was available, and Chris replied that there wasn’t, but if I wanted to look at creating one…?

…which I did. After a few false starts, I came up with the solution Chris has since pulled into his ergast-f1-api repo.

The pattern is quite a handy one, and I think reusable – I’ll give it a go for spinning up my own API to look up ONS Geography codes when I get a chance – so what’s involved?

The dockerised application is built from two components launched using docker-compose, a MySQL container and an Apache server configured with PHP5:

ergastdb:
  container_name: ergastdb
  build: ergastdb/
  environment:
    MYSQL_ROOT_PASSWORD: f1
    MYSQL_DATABASE: ergastdb
  expose:
    - "3306"

web:
  build: ./lamp
  #image: nimmis/apache-php5
  ports:
    - '8000:80'
  volumes:
    - ./webroot:/var/www/html
    - ./php/api:/php/api
    - ./logs:/var/log/apache2
  links:
    - ergastdb:ergastdb

The server runs the application, makes requests from the linked database container, and returns the result.

As part of my f1datajunkie tinkering, I’d put together a Dockerfile for populating a MySQL database with Chris’ database dump some time ago, so I could reuse that directly.

Which meant all I had to do was get the application up and running… Chris’ original instructions around the API server application were to place the application files into “the root directory” and also add in an Apache .htaccess URL rewrite, which he provided.

Simple… or maybe not..?! Not being an Apache user, it took me a bit of time to actually get things up and running, so here are some of the gotchas that caught me out:

  • where’s the “root directory”?
  • where should the .htaccess file go?

Running a couple of simple tests identified the root directory as the root for the files served by the webserver, and a quick search revealed the .htaccess file should go in the same location.

But the redirects didn’t work…

As a test, I wrote a simple rewrite rule that should have redirected any url to the same test file – but no joy…

A bit more testing suggested I needed to enable the mod_rewrite plugin, which I did in the appropriate Dockerfile. But still no joy, because using per-directory overrides like the .htaccess file also requires allowing “Overrides” in the base Apache settings – which I did (via a crib) by rewriting the Apache config file, again via the Dockerfile that builds the server container image.

RUN sed -i '/<Directory \/var\/www\/>/,/<\/Directory>/ s/AllowOverride None/AllowOverride All/' /etc/apache2/apache2.conf

RUN a2enmod rewrite

Then the database connection didn’t work… But that was easily fixed: I’d forgotten to change the permissions in the appropriate application file.

Running the docker-compose file with the command docker-compose up --build -d fires up the two linked containers. The API is then available via a browser on http://localhost:8000/api/f1 using calls of the form http://localhost:8000/api/f1/2015.json.
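The responses follow the same structure as the public ergast.com API, so a quick check from Python (assuming the containers are up) might look something like this:

import requests

# Query the local ergast API server for the 2015 season race list
races = requests.get('http://localhost:8000/api/f1/2015.json').json()
print(races['MRData']['RaceTable']['Races'][0]['raceName'])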

So now I have a minimal working example of a PHP application powered by a MySQL database driving a simple web API. Which I can crib from for my own simple APIs…

As to how it could be improved, there are a couple of obvious things:

  • the containers are intended for personal, local use: the database is accessed as root and should be properly secured;
  • I haven’t got a pattern for updating the database when a new database image is released. The current workaround would be to destroy the containers, and the database volume, then rebuild the images and the containers.

PS to use the local ergast API server from R with my ergastR package, I needed to tweak the package to allow me to specify the path to the API server. I’ll post details of how I did that later…