Rethinking: Distance Education === Bring Your Own Device?

In passing, an observation…

Many OU modules require students to provide their own computer subject to a university wide “minimum computer specificiation” policy. This policy is a cross-platform one (students can run Windows, Mac or Linux machines) but does allow students to run quite old versions of operating systems. Because some courses require students to install desktop software applications, this also means that tablets and netbooks (eg Chromebooks) do not pass muster.

On the module I work primarily on, we supply students with a virtual machine preconfigured to meet the needs of the course. The virtual machine runs on a cross-platform application (Virtualbox) and will run on a min spec machine, although there is a hefty disk space requirement: 15GB of free space required to install and run the VM (plus another 15-20GB you should always have free anyway if you want your computer to keep running properly, be able to install updates etc.)

Part of the disk overhead comes from another application we require students to use called vagrant. This is a “provisioner” application that manages the operation of the VirtualBox virtual machine from a script we provide to students. The vagrant application caches the raw image of the VM we distribute so that fresh new instances of it can be created. (This means students can throw away the wokring copy of their VM and create a fresh one if they break things; trust me, in distance edu, this is often the best fix.)

One of the reasons why we (that is, I…) took the vagrant route for managing the VM was that it provided a route to ship VM updates to students, if required: just provide them with a new Vagrantfile (a simple text file) that is used to manage the VM and add in an update routine to it. (In four years of running the course, we havenlt actually done this…)

Another reason for using Vagrant was that it provides an abstraction layer between starting and stopping the virtual machine (via a simple commandline command such as vagrant up, or desktop shortcut that runs a similar command) and the virtual machine application that runs the virtual machine. In our case, vagrant instructs Virtualbox running on the student’s own computer, but we can also create Vagrantfiles that allow students to launch the VM on a remote host if they have credentials (and credit…) for that remote host. For example, the VM could be run on Amazon Web Services/AWS, Microsoft Azure, Google Cloud, Linode, or Digital Ocean. Or on an OU host, if we had one.

For the next presentation of the module, I am looking to move away from the Virtualbox VM and move the VM into a Docker container†. Docker offers an abstraction layer in much the same way that vagrant does, but using a different virtualisation model. Specifically, a simple Docker command can be used to launch a Dockerised VM on a student’s own computer, or on a remote host (AWS, Azure, Google Cloud, Digital Ocean, etc.)

We could use separate linked Docker containers for each service used in the course — Jupyter notebooks, PostgreSQL, MongoDB, OpenRefine — or we could use a monolithic container that includes all the services. There are advantages and disadvantages to each that I really do need to set down on paper/in a blog post at some point…

So how does this help in distance education?

I’ve already mentioned that we require students to provide a certain minimum specification computer, but for some courses, this hampers the activities we can engage students in. For example, in our databases course, giving students access to a large database running on their own computer may not be possible; for an upcoming machine learning course, access to a GPU is highly desirable for anything other than really simple training examples; in an updated introductory robotics module, using realistic 3D robot simulators for even simple demos requires access to a gamer level (GPU supported) computer.

In a traditional university, physical access to computers and computer labs running pre-installed, university licensed software packages on machines capable of providing them for students who can’t run the same on their own machines may be available.

In my (distance learning) institution, access to university hosted software is not the norm: students are expected to provide their own computer hardware (at least to minimum spec level) and install the software on it themselves (albeit software we provide, and software that we often build installers for, at least for users of Windows machines).

What we don’t do, however, is either train students in how to provision their own remote servers, or provide software to them that can easily be provisioned on remote servers. (I noted above that our vagrant manager could be used to deploy VMs to remote servers, and I did produce demo Vagrantfiles to support this, but it went no further than that.)

This has made me realise that we make OUr distance learning students pretty much wholly responsible for meeting any computational needs we require of them, whilst at the same time not helping them develop skills that allow that them to avail themselves of self-service, affordable, metered remote computation-on-tap (albeit with the constraint of requiring a netwrok connection to access the remote service).

So what I’m thinking now is that now really is the time to start upskilling OUr distance learners, at least in disciplines that are computationally related, early on and in the following ways:

  1. a nice to have — provide some academic background: teach students about what virtualisation is;

  2. an essential skill, but with a really low floor — technical skills training: show students how to launch virtual servers of their own.

We should also make software available that is packaged in a way that the same environment can be run locally or remotely.

Another nice to have might be helping students reason about personal economic consequences, such as the affordability of different approaches in their local situation, which is to say: buying a computer and running things locally vs. buying something that can run a browser and run things remotely over a network connection.

As much as anything, this is about real platform independence, being open as to, and agnostic of, what physical compute device a student has available at home (whether it’s a gamer spec desktop computer or a bottom of the range Chromebook) and providing them with both software packages that really can run anywhere and the tools and skills to help students run them anywhere.

In many respects, using abstraction layer provisioning tools like vagrant and Docker, the skills to run software remotely are the same as running them locally, with the additional overhead that students have a once only requirement to sign up to a remote host and set up credentials that allow them to access the remote service from the provisioner service that runs on their local machine.

Dockerising / Binderising the TM351 Virtual Machine

Just before the Chirstmas break, I had a go recasting the TM351 VM as a Docker container built from a Github repository using MyBinder (which is to say: I had a go at binderising the VM…). Long time readers will know that this virtual machine has been used to deliver a computing environment to students on the OU TM351 Data managament and Analysis course since 2016. The VM itself is built using Virtualbox provisioned using vagrant and then distributed originally via a mailed out USB stick or alternatively (which is to say, unofficially; though my preferrred route) as a download from VagrantCloud.

The original motivation for using Vagrant was a hope that we’d be able to use a range of provisioners to construct VM images for a range of virtualisation platforms, but that’s never happened. We still ship a Virtualbox image that causes problems to a small number of Windows users each year, rather than a native HyperV image, because: a) I refuse to buy a Windows machine so I can build the HyperV image myself; b) no-one else sees benefit from offering multiple images (perhaps because they don’t provide the tech support…).

For all our supposed industrial scale at delivering technology backed “solutions”, the VM is built, maintained and supported on a cottage industry basis from within the course team.

For a scaleable solution that would work:

a) within a module presentation;
b) across module presentations;
c) across modules

I think we should be looking at some sort of multi-user hosted service, with personal accounts and persistent user directories. There are various ways I can imagine delivering this, each that creates its own issues as well solving particular problems.

As a quick example, here are two possible extremes:

1) one JupyterHub server to rule them all: every presentation, every module, one server. JupyterHub can be configured to use the DockerSpawner to present different kernel container options to the user, (although I’m not sure if this can be personalised on a per user basis? If not, that feature would make for a useful contribution back…), so a student could be presented with a list of containers for each of their modules.

2) one JupyterHub server per module per presentation: this requires more admin and means servers everywhere, but it separates concerns…

The experimental work on a “persistent Binderhub deployment” also looks interesting, offering the possibility of launching arbitrary environments (as per Binderhub) against personally mounted file area (as per JupyterHub).

Providing a “takeaway” service is also one of my red lines: a student should be free to take away any computing environment we provide them with. One in-testing hosted version of the TM351 VM comes, I believe, with centralised Postgres and MongoDB servers that students have accounts on and must log in to. Providing a mutli-user service, rather than a self-contained personal server, raises certain issues regarding support, but also denies the student the ability to take away the database service and use it for their own academic, personal or even work purposes. A fundamentally wrong approach, in my opinion. It’s just not open.

So… binderising the VM…

When Docker containers first made their appearance, best practice seemed to be to have one service per container, and then wire containers together using docker-compose to provide a more elaborate environment. I have experimented in the past with decoupling the TM351 services into separate containers and then launching them using docker-compose, but it;s never really gone anywhere…

In the communities of practice that I come across, more emphasis now seems to be on putting everything into a single container. Binderhub is also limited to launching a single container (I don’t think there is a Jupyter docker-compose provisioner yet?) so that pretty much seals it… All in one…

A proof-of-concept Binderised version of the TM351 VM can be found here: innovationOUtside/tm351vm-binder.

It currently includes:

  • an OU branded Jupyter notebook server running jupyter-server-proxy;
  • the TM351 Python environment;
  • an OpenRefine server proxied using jupyter-server-proxy;
  • a Postgres server seeded (I think? Did I do that yet?!) with the TM351 test db (if I haven’t set it up as per the VM, the code is there that shows how to do it…);
  • a MongoDB server serving the small accidents dataset that appears in the TM351 VM.

What is not included:

  • the sharded Mongo DB activity; (the activity it relates to as presented at the moment is largely pointless, IMHO; we could deminstrate the sharding behaviour with small datasets, and if we did want to provided queries over the large dataset, that might make sense as something we host centrally and et students log in to query. Which would also give us another teachng point.)

The Binder configuration is provided in the binder/ directory. An Anaconda binder/environment.yml file is used to install packages that are complicated to build or install otherwise, such as Postgres.

The binder/postBuild file is run as a shell script responsible for:

  • configuring the Postgres server and seeding its test database;
  • installing and seeding the MongoDB database;
  • installing OpenRefine;
  • installing Python packages from binder/requirements.txt (the requirements.txt is not otherwise automatically handled by Binderhub — it is trumped by the environment.yml file);
  • enabling required Jupyter extensions.

If any files handled via postBuild need to be persisted, they can be written into $CONDA_DIR.

(As a reference, I have also created some simple standalone template repos showing how to configure Postgres and MongoDB in Binderhub/repo2docker environments. There’s also a neo4j demo template too.)

The binder/start file is responsible for:

  • defining environment variables and paths required at runtime;
  • starting the PostgreSQL and MongoDB database services.

(OpenRefine is started by the user from the notebook server homepage or JupyterLab. There’s a standalone OpenRefine demo repo too…)

Launching the repo using MyBinder will build the TM351 environment (if a Binder image does not already exist) and start the required services. The repo can also be used to build an environment locally using repo2docker.

As well as building a docker image within the Binderhub context, the repo is also automated with a Github Action that is used to build release commits using repo2docker and then push the resulting container to Docker Hub. The action can be found in the .github/workflows directory. The container can be found as ousefuldemos/tm351-binderised:latest. When running a container derived from this image, the Jupyter notebook server runs on the default port 8888 inside the container, and the OpenRefine application proxied through it; the database services should autostart. The notebook server is started with a token required, so you need to spot the token from the start up logs of the container – which means you shouldn’t run it with the -d flag. A variant of the following command should work (I’m not sure how you reliably specify the correct $PWD (present working directory) mount directory from a Windows command prompt):

docker run --name tm351test --rm -p 8895:8888 -v $PWD/notebooks:/notebooks -v $PWD/openrefine_projects:/openrefine_projects ousefuldemos/tm351-binderised:latest

Probably easier is to use the Kitematic inspired containds “personal Binderhub” app which can capture and handle the token auomatically and let you click straight through into the running notebook server. Either use containds to build the image locally by providing the repo URL, or select a new image and search for tm351: the ousefuldemos/tm351-binderised image is the one you want. When prompted, select the “standard” launch route, NOT the ‘Try to start Jupyter notebook’ route.

Although I’ve yet to try it (I ran out of time before the break), I’m hopeful that the prebuilt container should work okay with JupyterHub. If it does, this means the innovationOUtside/tm351vm-binder repo can serve as a template for building images that can be used to deploy authenticated OU computing environments via an OU authenticated and OU hosted JupyterHub server (one can but remain hopeful!).

If you try out the environment, either using MyBinder, via repo2docker, or from the pre-built Docker image, please let me know either here, via the repo issues, or howsoever: a) whether it worked; b) whether it didn’t; c) whether there were any (other) issues. Any and all feedback would be much appreciated…

vagrant share – sharing a vagrant launched headless VM service on the public interwebz

Lest I forget (which I had…):

vagrant share lets you launch a VM using vagrant and share the environment using ngrok in three ways:

  • via public URLs (expose your http ports to the web, rather than locally);
  • via ssh;
  • via vagrant connect (connect to any exposed VM port from a remote location).

So this could be handy for remote support with students… If we tell them to install the vagrant share plugin, then we can offer remote support…

Getting the TM351 VM Running on OU OpenStack

One of the original motivations for delivering the TM351 software and services via a virtual machine, with user interfaces provided via a browser, was that we should be able to use the same VM  as a locally run machine on a student’s own computer, or as a hosted machine (accessible via the web) running on an OU server.

A complementary third year equivalent course, TM352 Web, Mobile and Cloud Technologies, uses a Faculty managed OpenStack instance as a dogfooding teaching environment on that course (students learn about cloud stuff in the course, get to deploy some canned machines and develop their own services using OpenStack, and the department develops skills in in deploying and managing such environments with hundreds of users).

I think part of the pitch for the OpenStack cluster was that it would be available to other courses, but a certain level of twitchiness in keeping it stable for the original course use case has meant that getting access to the machine has not been as easy as it might have been.

(There is no dev server that I can access, at least not from a connection outside the OU network. So the only server I can play on is the live server, as used by students. If you’re confident managing OpenStack, this is probably fine (it should be able to cope with lots of tenants with different requirements, right?), but if you’re not, making a dev server, open to all who want to try it out, and available sooner rather than later, probably makes morse sense: more people solving problems, more use cases being explored and ruled out, more issues being debugged; more learning going on generally…)

Whatever.

I’ve finally got an account, and a copy of the TM351 VM image, originally built for VirtualBox, uploaded to it.

You’d think that part at least would have been be easy, but it took the best part of four months or so at least… First, getting an account on the OpenStack server. Second, getting a copy of the TM351 VM image that could be loaded onto it. I got stuck going nowhere trying to convert the original Virtualbox image until it was pointed out to me that there was a VirtualBoxManage tool for doing it (Converting between VM Formats). Faculty advice suggests the clonehd command:

vboxmanage clonehd box-disk001.vmdk /Users/USER/Desktop/tm351.img --format raw

but that looks deprecated in recent versions of VirtualBox to me… The following seems more contemporary:

VBoxManage clonemedium ~/VirtualBox\ VMs/tm351_18J-student/box-disk001.vmdk tm351_18J-student.raw --format RAW

Third, loading the image onto OpenStack. A raw box format image I thought I had managed to create myself came in at 64GB (the original box was ~8GB), but it seems this is because that’s the size of the virtual disk. Presumably vagrant is setting this in my original build (or VirtualBox is defaulting to it?), so one thing I need to figure out is how to reduce it without compromising anything. Looking at Resizing Vagrant box disk space  I wonder if we could move along steps from vmdk to vid to resize and then raw?

Uploading a 64 GB from home to OpenStack using an http file uploader on  the OpenStack user admin page is just asking for trouble, but even copying the image from OU networked machines is not just-do-it-able: it requires copying  the file from one machine to another and then onto the OpenStack server by someone-not-me with the appropriate logins and scp permissions.

(Building the machine on OpenStack myself using an OpenStack vagrant provisioner is not an option on the live server at least: API access addresses seem to only be provided for a private network that I don’t have access to. If we manage to get a development server that I am allowed to access using VPN, or even better, without VPN, and I can get permissions to use the API, and we can connect to things like the apt-get and Pypi/pip repos, using a build provisioner makes sense to me.)

So I there is now an image visible on the OpenStack server.

You’ll note we haven’t tried to brand the OpenStack user’s admin panel at all  (I would have…;-).

What next? Trying to spin up an instance from the image kept giving me errors (I started trying with a small machine instance, then tried creating an instance with ever larger machine flavours — the issue was indeed the 64GB default disk size associated with the image. Faculty IT changed a setting that meant the larger disk sizes would spin up and reported that it worked for them with the VM on a large flavour machine.

But it didn’t for me…  I kept getting the message [Error: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance XXXXXX. I think the issue must have been a permissions thing manifested as a network thing. Faculty IT restarted the image as private to me, (and with my own private network?) and I tried again… (For this reason, I’m not convinced that anyone else just given an account will be able to get their own version of the TM351 VM up and running? I need to understand better what requirements, if any, are placed on the creation of the OpenStack user account for it to work. And I need a second test user account (at least) to test it..)

Anyway – success for me – a running instance of the TM351 VM. And now I could use the OpenStack web console to log in to the machine using the default vagrant credentials. Which I need to change… (and find a sensible method for students to use to change the defaults).

So now I can poke around inside the VM. But I can’t see any of the services it’s running for a couple of reasons: firstly, the VM has no public IP address; secondly, the only port I think I’m allowed to expose publicly is port 80, and there are no services running on port 80. And unlike vagrant and docker, which make it easy to map and expose an arbitrary port inside the VM onto a specified port outside the VM, such as port 80, I haven’t found a way to do that in OpenStack. (The documentation sucks. Really badly. And there is no internal FAQ to give me even the slightest crib as to what to do next.)

The TM352 course materials come to my rescue here, sort of. As OU central academic staff, I can log in to course VLEs and see the published teaching material, although not the student forums. Looking in the current presentation, the materials that show TM352 students how to make their VM visible to the world haven’t been released yet so I can’t see them.. Bah… But I can look at the materials provided to students on the previous presentation… Which are out of date compared to the current version of OpenStack. But never mind, because the materials are enough of a crib to figure out what to do where-ish: Block 2 Part 2: Designing a cloud, 8 Getting started with OpenStack. The essential steps boil down to the following (apols for the vagueness; I don’t want to actually restep through everything to check it works in case I break my current instance; next time I run through from scratch, I’ll tidy up the instructions. Ideally, I’d do a fresh run through in a new, virgin test user account):

  1. Create a new private network for the VM to run on: I seemed to have a network already created, but here’s a howto: under Network, select the Networks option, and then Create Network with the  Admin State as UP (i.e. running and usable) and the Create Subnet box ticked. Use IP/v4 and set an IP address range in CIDR format (e.g. 192.168.0.0/24);
  2. Create a router that interconnects the public network and the private network: from the Network menu select Routers . Set Admin State to UP and External Network to public then Create Router. In the Network Topology view, select the router and then Add Interface, using Subnet set to the private created network and the IP Address left blank.
  3. Configure the network security rules: from Network select Security Groups ; if there’s no default group create one; once there is, select Manage Rules. We need to add three rules:
Rule ALL TCP
Direction Ingress
Remote CIDR
CIDR 0.0.0.0/0
Rule ALL ICMP
Direction Ingress
Remote CIDR
CIDR 0.0.0.0/0
Rule HTTP
Remote CIDR
CIDR 0.0.0.0/0
  1. Create a VM instance from the TM351 image: bearing in mind the previous set-up, choose appropriately!
  2. Attach a public IP address to the VM: in `Network` select Floating IPs and then Allocate IP to Project. With the new floating IP address, select Associate and choose appropriately.

Hopefully now there should be a public IP address associated with the VM and ports 80 and 22 (ssh) exposed. Using the public IP address, from a terminal on my own local machine:

ssh vagrant@VM.IP.ADDR.ESS

followed by the password, and I should be in…

(I can’t help thinking that typing vagrant up is a much easier way to launch a VM. And then vagrant ssh to SSH in…)

Next step – try to see the public services running inside the VM, bearing in mind that we can only access services through port 80.

To test things, we can just try a simple http server on port 80:

python3 -m http.server 80

That works, so port 80 is live on my VM and I can see it from the public internet. So kill the test http server…

Running Everything Through Port 80

Running services inside VM against port 80 requires them to run as root (ports <1024 are privileged), but in the last rebuild of the VM we tried to move away from running everything as root and instead run them under a user account. Which means that the Jupyter server is defined to run under a user account on a non-privileged port.

I went round in circles on this one for getting on for an hour, trying to run Jupyter notebooks on port 80, but running into permissions errors accessing port 80 unless I ran the service as root.  (Things like tail /var/log/syslog helped in the debugging…)

I also had to manually fix the missing notebook directory that the notebook service is supposed to start in. (I think this is another permissions snafu – the service runs as a user but the mkdir guard run via ExecStartPre needs permissions tweaking to run as root using PermissionsStartOnly=true (issue.)

The simplest thing to do is run a proxy like nginx. Which isn’t installed in the VM. No problem, the vagrant user I ssh into the VM with can run via sudo so I should be able to just do a sudo apt-get update && sudo apt-get install -y nginx. Only I can’t because the security rules upstream of the OpenStack server won’t let me. F**k. It’s a Saturday afternoon, and there are zero, no, zilch, none, Faculty IT help files or FAQs that have been shared with me, or that I’m even aware of the existence of, with possible workarounds. But there is Twitter, and various other Saturday working friends, which gives me a result: set up an ssh tunnel and do it via my home machine ( https://stackoverflow.com/questions/36353955/apt-get-install-via-tunnel-proxy-but-ssh-only-from-client-side ):

sudo ssh -R 8899:us.archive.ubuntu.com:80 vagrant@IP.ADDR

With that tunnel set up, inside the VM I can run sudo nano /etc/apt/apt.conf and edit in the following lines:

Acquire::http::Proxy "http://localhost:8899";
Acquire::https::Proxy "https://localhost:8899";

Then I can apt-get update, apt-get install etc inside the VM

sudo apt-get update
sudo apt-get install -y nginx

To try and pre-empt any other issues, it’s worth checking that the required folders (again) are in place (/vagrant/notebooks and /vagrant/openrefine-projects) and with the appropriate owner and group (oustudent:users) permissions:

sudo chown -R oustudent:users /vagrant

As mentioned, the current ExecPreStart in the Jupyter notebook and OpenRefine service definition files were supposed to check folders exist but I think they need changing to incorporate things like following:

PermissionsStartOnly=true
ExecStartPre=/bin/mkdir -p /vagrant/notebooks
ExecStartPre=/bin/chown oustudent:users /vagrant/notebooks

Right, so permissions should be sorted, and the Jupyter notebook server should be runnable against port 80 via the nginx proxy; but I need an nginx config file… If we were running notebooks as a service in the OU this is the sort of thing I’d hope would be in an an examples FAQ, battle tested in an OU context; but we don’t so it isn’t so I rely on other people having solved the problem and being willing to share their answer in public: https://nathan.vertile.com/blog/2017/12/07/run-jupyter-notebook-behind-a-nginx-reverse-proxy-subpath/

Unfortunately, it didn’t work for me out of the can… the post supposedly describes how to proxy the server down a path, but (jumping ahead) the login page URL didn’t rewrite down the path for me; tweaking the proxy definition so that the Jupyter notebook server runs at the top level (/) on port 80 did work though – so here’s the nginx definition file I ended up using:

sudo nano /etc/nginx/sites-available/default

and then:

location / {
  error_page 403 = @proxy_groot;

  deny 127.0.0.1;
  allow all;

  # set a webroot, if there is one
  #root /web_root;
  try_files $uri @proxy_groot;
}

location @proxy_groot {
  #rewrite /notebooks(.*) $1 break;
  proxy_read_timeout 300s;
  proxy_pass http://upstream_groot;

  # pass some extra stuff to the backend
  proxy_set_header Host $host;
  proxy_set_header X-Real-Ip $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

location ~ /api/kernels/ {
  proxy_pass http://upstream_groot;
  proxy_set_header Host $host;

  # websocket support
  proxy_http_version 1.1;
  proxy_set_header Upgrade "websocket";
  proxy_set_header Connection "Upgrade";
  proxy_read_timeout 86400;
}

location ~ /terminals/ {
  proxy_pass http://upstream_groot;
  proxy_set_header Host $host;

  # websocket support
  proxy_http_version 1.1;
  proxy_set_header Upgrade "websocket";
  proxy_set_header Connection "Upgrade";
  proxy_read_timeout 86400;
}

followed by:

sudo  nginx -s reload

To try to make the notebook server slightly more secure than wide open — it will be running on a public IP address after all — I need to add a password (the original TM351 VM runs everything wide open).

First, create a password hash:

echo -n "my cool password" | sha1sum

then edit the system service file:

sudo nano /lib/systemd/system/jupyter.service

We need to tweak the startup along the lines of:

ExecStart=/usr/local/bin/jupyter notebook --port=8888 --ip=0.0.0.0 --y --log-level=WARN --no-browser --notebook-dir=/vagrant/notebooks --allow-root --NotebookApp.token='' --NotebookApp.password='sha1:WHATEVER' --allow_origin='*'

We can probably drop the --allow-root ? (Although the default notebook user can sudo some commands…)

Reload the daemon to acknowledge the service definition changes and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart jupyter.service

So this seems to work: I can see Jupyter notebook and login via port 80 on the floating public IP address I assigned to the TM351VM instance. I can open a notebook, run cells, call the PostgreSQL and basic Mongo databases at least, open a terminal. What I can’t do is curl or wget or run Python requests to load data files from the internet using a notebook because of the upstream IT network security rules. This is a bit of a blocker for the course. We may be able to finesse a way round with an ssh tunnel in testing, but I don’t think we should be expecting that of our students. (Thinks: how do IT security rules / policies apply when we define activities for students that we expect them to run on their own computers?! File as: whatever… We’ll just have to do something really crappy instead for students. Or set up a best-not-tell-IT proxy on the OU network somewhere…)

The next step is – can I expose the other core teaching application in the VM: OpenRefine?

A possible blocker is that we only have one port exposed on the public internet (port 80) so we need to find a way to expose OpenRefine. Fortunately, the nbserverproxy package allows the Jupyter server to proxy services running on localhost in the VM. So I should be able to run that. But first things first:  pip installs are borked even with an ssh tunnel (open questions on Stack Overflow confirm that this is not just me…).

Okay… pip packages can be downloaded and installed from a local file, so I can download the nbserverproxy pip package on my own machine and then scp it into the running OpenStack hosted VM at /vagrant/notebooks . Then from a notebook inside the VM I can run !pip install --user ./nbserverproxy-master.zip (just to show the notebook is working properly! ;-)  and enable it: ! jupyter serverextension enable --py nbserverproxy.

Restart the notebook server from VM command line and I should be able to see OpenRefine at http://MY.FLOATING.IP.ADDR/proxy/3334/ (the trailing slash is required of the styling fails as the path to the style files is incorrectly resolved). I think that this should also be down the password protected path? i.e. if I hadn’t logged in to the notebook server, I don’t think I should be able to get this far? (NEED TO CHECK.)

One of the VM Easter Eggs, nbdime, is also visible on http://MY.FLOATING.IP.ADDR/proxy/8899/. Go team me… :-)

Grab a snapshot of the working VM in the idle hope that maybe if someone else tries to launch from that image, it will just work. Although things like the network and security rules will presumably need setting up?

For student use, I’d need a simple way / recipe to set up different/personalised ssh credentials into the VM, otherwise anyone with the public IP address could ssh in. This must be a common issue, so it’d be good to see a Faculty OpenStack FAQ suggesting what the possible options are. I guess a simple one is on starting the instance? Can we force keys into the VM when it launches? Another issue is (re)setting the password for the Jupyer notebook server so each student is assigned, or can easily set (and recover….) their own password.

Other next steps: is there something in OpenStack where I can define network settings, security rules, etc, and provide students with an easier way of deploying an TM351 instance on the Faculty OpenStack and making its public services available on the public internet? Can I do this with an OpenStack stack? If so, that would be a handy thing to have an OU OpenStack tutorial for…

This is obvs the sort of support that should be available in Faculty IT tutorials, FAQs, and God Forbid, in person if we’re running the OpenStack server as a Faculty service and trying to encourage people to use it, so that’s what I’ll probably spend my next day of miserable OpenStack hacking doing when I can motivate myself to do it: trying to figure out if and how to make things closer to one click simpler for students to launch their own TM351 VM. (In the first instance for TM351, we want students to be able to run course VMs on an OU server because they’re struggling with getting things running on their own computer; this is often highly correlated with them having poor computer skills, poor problem solving skills, and poor instruction following skills, so we’re on a hiding to nothing if we expect them to launch instances, choose flavours, create routers, create and assign floating IP addresses and set up security rules. On their own. Because I’m not going to do that tech support for them. (I am ranty typing; my keyboard is suddenly VERY LOUD. [REDACTED])

Seeding Shared Folders With Files Distributed via a VM

For the first few presentations of our Data Management and Analysis course, the course VM has been distributed to students via a USB mailing. This year, I’m trying to move to a model whereby the primary distribution is via a download from VagrantCloud (students manage the VM using Vagrant), though we’re also hoping to be able to offer access to an OU OpenStack hosted VM to any student’s who really need it.

For students on Microsoft Windows computers, an installer installs Virtualbox and vagrant from installers distributed via the USB memory stick. This in part derives from the policy of fixing versions of as much as we can so that it can be tested in advance. The installer also creates a working directory for the course that will be shared by the VM, and copies required files, again from the memory stick, into the shared folder. On Macs and Linux, students have to do this setup themselves.

One of the things I have consciouslystarted trying to do is move the responsibility for satisficing of some of the installation requirements into the Vagrantfile. (I’m also starting to think they should be pushed even deeper into the VM itself.)

For example, as some of the VM services expect particular directories to exist in the shared directory, we have a couple of defensive measures in place:

  • the Vagrantfile creates any required, yet missing, subdirectories in the shared directory;
            #Make sure that any required directories are created
            config.vm.provision :shell, :inline => <<-SH
                mkdir -p /vagrant/notebooks
                mkdir -p /vagrant/openrefine_projects
                mkdir -p /vagrant/logs
                mkdir -p /vagrant/data
                mkdir -p /vagrant/utilities
                mkdir -p /vagrant/backups
                mkdir -p /vagrant/backups/postgres-backup/
                mkdir -p /vagrant/backups/mongo-backup/	
            SH
    

  • start up scripts for services that require particular directories check they exist before they are started and create them if they are missing. For example, in the service file, go defensive with something like ExecStartPre=mkdir -p /vagrant/notebooks.

The teaching material associated with the (contents of) the VM is distributed using a set of notebooks downloaded from the VLE. Part of the reason for this is that it delays the point at which the course notebooks must be frozen: the USB is mastered late July/early August for a mailing in September and course start in October.

As well as the course notebooks are a couple of informal installation test notebooks. This can be frozen along with the VM and distributed inside it, but the question then arises. So this year I am trying out a simple pattern that bakes test files into the VM and then uses the Vagranfile to copy the files into the shared directory on its first run with a particular shared folder:

config.vm.provision :shell, :inline => <<-SH
    if [ ! -f /vagrant/.firstrun_nbcopy.done ]; then
        # Trust notebooks in immediate child directories of notebook directory
        files=(`find /opt/notebooks/* -maxdepth 2 -name "*.ipynb"`)
        if [ ${#files[@]} -gt 0 ]; then
            jupyter trust /opt/notebooks/*.ipynb;
            jupyter trust /opt/notebooks/*/*.ipynb;
        fi
        #Copy notebooks into shared directory
        cp -r /opt/notebooks/. /vagrant/notebooks
        touch /vagrant/.firstrun_nbcopy.done
    fi
   SH

This pattern allows files shipped inside the VM to be copied into the shared folder once it is mounted into the VM from host. The files will then persist inside the shared directory, along with a hidden flag file to say the files have been copied. I’m not sure about the benefits of auto-running something inside the VM to manage this copying? Or whether to check that a more recent copy of the files to be copied doesn’t already exist in the shared folder before copying on the first run in the folder?

Fragment – TM351 Services Architected for Online Access

When we put together the original  TM351 VM, we wanted a single, self-contained installable environment capable of running all the services required to complete the practical activities defined for the course. We also had a vision that the services should be capable of being accessed remotely.

With a bit of luck, we’ll have access to an OU OpenStack environment any time soon that will let us start experimenting with a remote / online VM provision, at least for a controlled number of students. But if we knew that a particular cohort of students were only ever going to access the services remotely, would a VM be the best solution?

For example, the services we run are:

  • Jupyter notebooks
  • OpenRefine
  • PostgreSQL
  • MongoDB

Jupyter notebooks could be served via a single Jupyter Hub instance, albeit with persistence enable on individual accounts so students could save their own notebooks.

Access to PostgreSQL could be provided via a single Postgres DB with students logging in under their own accounts and accessing their own schema.

Similarly – presumably? – for MongoDB (individual user accounts accessing individual databases). We might need to think about something different for the sharded Mongo activity, such as a containerised solution (which could also provide an opportunity to bring the network partitioning activity I started to sketch out way back when).

OpenRefine would require some sort of machinery to fire up an OpenRefine container on demand, perhaps with a linked persistent data volume. It would be nice if we could use Binderhub for that, or perhaps DIT4C style infrastructure…

Sharing Folders into VMs on Different Machines Using Dropbox, Google Drive, Microsoft OneDrive etc

Ever since I joined the OU, I’ve believed in trying to deliver distance education courses in an agile and responsive way, which is to say: making stuff up for students whilst the course is in presentation.

This is generally not done (by course/module teams at least) because the aim of most course/module teams is to prepare the course so thoroughly that it can “just” be presented to students.

Whatever.

I personally think we should try to improve the student experience of the course as it presents if we can by being responsive and reactive to student questions and issues.

So… TM351, the data management course that uses a VM, has started again, and issues / questions are already starting to hit the forums.

One of the questions – which I’d half noted but never really thought through in previous presentations (my not iterating/improving the course experience in, or between, previous presentations)  – related to sharing Jupyter notebooks across different machines using Google Drive (equally, Dropbox or Microsoft OneDrive).

The VirtualBox VM we use is fired up using the vagrant provisioner. A Vagrantfile defines various configuration settings – which ports are exposed by the VM, for example. By default, the contents of the folder in which vagrant is started up in are shared into the VM. At the same time, vagrant creates a hidden .vagrant folder that contains state relating to the instance of that VM.

The set up on a single machine is something like this:

If a student wants to work across several machines, they need to share their working course files (Jupyter notebooks, and so on) but not the VM machine state. Which is to say, they need a set up more like the following:

For students working across several machines, it thus makes sense to have all project files in one folder and a separate .vagrant settings folder on each separate machine.

Checking the vagrant docs, it seems as if this is quite manageable using the synced folder configuration settings.

The default copies the current project folder (containing the vagrantfile and from which vagrant is rum), which I’m guessing is a setting something like:

config.vm.synced_folder "./", "/vagrant"

By explicitly setting this parameter, we can decide how we want the mapping to occur. For example:

config.vm.synced_folder "/PATH/ON/HOST", "/vagrant"

allows you to to specify the folder you want to share into the VM. Note that the /PATH/ON/HOST folder needs to be created before trying to share it.

To put the new shared directory into effect, reload and reprovision the VM. For example:

vagrant reload --provision

Student notebooks located in the notebooks folder of that shared directory should now be available in the VM. Furthermore, if the shared folder is itself in a webshared folder (for example, a synced Dropbox, Google Drive or Microsoft OneDrive folder) it should be available wherever that folder is synched to.

For example, on a Mac (where ~ is an alias to my home directory), I can create a directory in my dropbox folder ~/Dropbox/TM351VMshare and then map this into the VM using by adding the following line to the Vagrantfile:

config.vm.synced_folder "~/Dropbox/TM351VMshare", "/vagrant"

Note the possibility of slight confusion – the shared folder will not now be the folder from which vagrant is run (unless the folder are running from is /PATH/ON/HOST ).

Furthermore, the only thing that needs to be in the folder from which vagrant is run is the Vagrantfile and the hidden .vagrant folder that vagrant creates.

Fingers crossed this recipe works…;-)