Getting the TM351 VM Running on OU OpenStack

One of the original motivations for delivering the TM351 software and services via a virtual machine, with user interfaces provided via a browser, was that we should be able to use the same VM  as a locally run machine on a student’s own computer, or as a hosted machine (accessible via the web) running on an OU server.

A complementary third year equivalent course, TM352 Web, Mobile and Cloud Technologies, uses a Faculty managed OpenStack instance as a dogfooding teaching environment on that course (students learn about cloud stuff in the course, get to deploy some canned machines and develop their own services using OpenStack, and the department develops skills in in deploying and managing such environments with hundreds of users).

I think part of the pitch for the OpenStack cluster was that it would be available to other courses, but a certain level of twitchiness in keeping it stable for the original course use case has meant that getting access to the machine has not been as easy as it might have been.

(There is no dev server that I can access, at least not from a connection outside the OU network. So the only server I can play on is the live server, as used by students. If you’re confident managing OpenStack, this is probably fine (it should be able to cope with lots of tenants with different requirements, right?), but if you’re not, making a dev server, open to all who want to try it out, and available sooner rather than later, probably makes morse sense: more people solving problems, more use cases being explored and ruled out, more issues being debugged; more learning going on generally…)

Whatever.

I’ve finally got an account, and a copy of the TM351 VM image, originally built for VirtualBox, uploaded to it.

You’d think that part at least would have been be easy, but it took the best part of four months or so at least… First, getting an account on the OpenStack server. Second, getting a copy of the TM351 VM image that could be loaded onto it. I got stuck going nowhere trying to convert the original Virtualbox image until it was pointed out to me that there was a VirtualBoxManage tool for doing it (Converting between VM Formats). Faculty advice suggests the clonehd command:

vboxmanage clonehd box-disk001.vmdk /Users/USER/Desktop/tm351.img --format raw

but that looks deprecated in recent versions of VirtualBox to me… The following seems more contemporary:

VBoxManage clonemedium ~/VirtualBox\ VMs/tm351_18J-student/box-disk001.vmdk tm351_18J-student.raw --format RAW

Third, loading the image onto OpenStack. A raw box format image I thought I had managed to create myself came in at 64GB (the original box was ~8GB), but it seems this is because that’s the size of the virtual disk. Presumably vagrant is setting this in my original build (or VirtualBox is defaulting to it?), so one thing I need to figure out is how to reduce it without compromising anything. Looking at Resizing Vagrant box disk space  I wonder if we could move along steps from vmdk to vid to resize and then raw?

Uploading a 64 GB from home to OpenStack using an http file uploader on  the OpenStack user admin page is just asking for trouble, but even copying the image from OU networked machines is not just-do-it-able: it requires copying  the file from one machine to another and then onto the OpenStack server by someone-not-me with the appropriate logins and scp permissions.

(Building the machine on OpenStack myself using an OpenStack vagrant provisioner is not an option on the live server at least: API access addresses seem to only be provided for a private network that I don’t have access to. If we manage to get a development server that I am allowed to access using VPN, or even better, without VPN, and I can get permissions to use the API, and we can connect to things like the apt-get and Pypi/pip repos, using a build provisioner makes sense to me.)

So I there is now an image visible on the OpenStack server.

You’ll note we haven’t tried to brand the OpenStack user’s admin panel at all  (I would have…;-).

What next? Trying to spin up an instance from the image kept giving me errors (I started trying with a small machine instance, then tried creating an instance with ever larger machine flavours — the issue was indeed the 64GB default disk size associated with the image. Faculty IT changed a setting that meant the larger disk sizes would spin up and reported that it worked for them with the VM on a large flavour machine.

But it didn’t for me…  I kept getting the message [Error: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance XXXXXX. I think the issue must have been a permissions thing manifested as a network thing. Faculty IT restarted the image as private to me, (and with my own private network?) and I tried again… (For this reason, I’m not convinced that anyone else just given an account will be able to get their own version of the TM351 VM up and running? I need to understand better what requirements, if any, are placed on the creation of the OpenStack user account for it to work. And I need a second test user account (at least) to test it..)

Anyway – success for me – a running instance of the TM351 VM. And now I could use the OpenStack web console to log in to the machine using the default vagrant credentials. Which I need to change… (and find a sensible method for students to use to change the defaults).

So now I can poke around inside the VM. But I can’t see any of the services it’s running for a couple of reasons: firstly, the VM has no public IP address; secondly, the only port I think I’m allowed to expose publicly is port 80, and there are no services running on port 80. And unlike vagrant and docker, which make it easy to map and expose an arbitrary port inside the VM onto a specified port outside the VM, such as port 80, I haven’t found a way to do that in OpenStack. (The documentation sucks. Really badly. And there is no internal FAQ to give me even the slightest crib as to what to do next.)

The TM352 course materials come to my rescue here, sort of. As OU central academic staff, I can log in to course VLEs and see the published teaching material, although not the student forums. Looking in the current presentation, the materials that show TM352 students how to make their VM visible to the world haven’t been released yet so I can’t see them.. Bah… But I can look at the materials provided to students on the previous presentation… Which are out of date compared to the current version of OpenStack. But never mind, because the materials are enough of a crib to figure out what to do where-ish: Block 2 Part 2: Designing a cloud, 8 Getting started with OpenStack. The essential steps boil down to the following (apols for the vagueness; I don’t want to actually restep through everything to check it works in case I break my current instance; next time I run through from scratch, I’ll tidy up the instructions. Ideally, I’d do a fresh run through in a new, virgin test user account):

  1. Create a new private network for the VM to run on: I seemed to have a network already created, but here’s a howto: under Network, select the Networks option, and then Create Network with the  Admin State as UP (i.e. running and usable) and the Create Subnet box ticked. Use IP/v4 and set an IP address range in CIDR format (e.g. 192.168.0.0/24);
  2. Create a router that interconnects the public network and the private network: from the Network menu select Routers . Set Admin State to UP and External Network to public then Create Router. In the Network Topology view, select the router and then Add Interface, using Subnet set to the private created network and the IP Address left blank.
  3. Configure the network security rules: from Network select Security Groups ; if there’s no default group create one; once there is, select Manage Rules. We need to add three rules:
Rule ALL TCP
Direction Ingress
Remote CIDR
CIDR 0.0.0.0/0
Rule ALL ICMP
Direction Ingress
Remote CIDR
CIDR 0.0.0.0/0
Rule HTTP
Remote CIDR
CIDR 0.0.0.0/0
  1. Create a VM instance from the TM351 image: bearing in mind the previous set-up, choose appropriately!
  2. Attach a public IP address to the VM: in `Network` select Floating IPs and then Allocate IP to Project. With the new floating IP address, select Associate and choose appropriately.

Hopefully now there should be a public IP address associated with the VM and ports 80 and 22 (ssh) exposed. Using the public IP address, from a terminal on my own local machine:

ssh vagrant@VM.IP.ADDR.ESS

followed by the password, and I should be in…

(I can’t help thinking that typing vagrant up is a much easier way to launch a VM. And then vagrant ssh to SSH in…)

Next step – try to see the public services running inside the VM, bearing in mind that we can only access services through port 80.

To test things, we can just try a simple http server on port 80:

python3 -m http.server 80

That works, so port 80 is live on my VM and I can see it from the public internet. So kill the test http server…

Running services inside VM against port 80 requires them to run as root (ports <1024 are privileged), but in the last rebuild of the VM we tried to move away from running everything as root and instead run them under a user account. Which means that the Jupyter server is defined to run under a user account on a non-privileged port.

I went round in circles on this one for getting on for an hour, trying to run Jupyter notebooks on port 80, but running into permissions errors accessing port 80 unless I ran the service as root.  (Things like tail /var/log/syslog helped in the debugging…)

I also had to manually fix the missing notebook directory that the notebook service is supposed to start in. (I think this is another permissions snafu – the service runs as a user but the mkdir guard run via ExecStartPre needs permissions tweaking to run as root using PermissionsStartOnly=true (issue.)

The simplest thing to do is run a proxy like nginx. Which isn’t installed in the VM. No problem, the vagrant user I ssh into the VM with can run via sudo so I should be able to just do a sudo apt-get update && sudo apt-get install -y nginx. Only I can’t because the security rules upstream of the OpenStack server won’t let me. F**k. It’s a Saturday afternoon, and there are zero, no, zilch, none, Faculty IT help files or FAQs that have been shared with me, or that I’m even aware of the existence of, with possible workarounds. But there is Twitter, and various other Saturday working friends, which gives me a result: set up an ssh tunnel and do it via my home machine ( https://stackoverflow.com/questions/36353955/apt-get-install-via-tunnel-proxy-but-ssh-only-from-client-side ):

sudo ssh -R 8899:us.archive.ubuntu.com:80 vagrant@IP.ADDR

With that tunnel set up, inside the VM I can run sudo nano /etc/apt/apt.conf and edit in the following lines:

Acquire::http::Proxy "http://localhost:8899";
Acquire::https::Proxy "https://localhost:8899";

Then I can apt-get update, apt-get install etc inside the VM

sudo apt-get update
sudo apt-get install -y nginx

To try and pre-empt any other issues, it’s worth checking that the required folders (again) are in place (/vagrant/notebooks and /vagrant/openrefine-projects) and with the appropriate owner and group (oustudent:users) permissions:

sudo chown -R oustudent:users /vagrant

As mentioned, the current ExecPreStart in the Jupyter notebook and OpenRefine service definition files were supposed to check folders exist but I think they need changing to incorporate things like following:

PermissionsStartOnly=true
ExecStartPre=/bin/mkdir -p /vagrant/notebooks
ExecStartPre=/bin/chown oustudent:users /vagrant/notebooks

Right, so permissions should be sorted, and the Jupyter notebook server should be runnable against port 80 via the nginx proxy; but I need an nginx config file… If we were running notebooks as a service in the OU this is the sort of thing I’d hope would be in an an examples FAQ, battle tested in an OU context; but we don’t so it isn’t so I rely on other people having solved the problem and being willing to share their answer in public: https://nathan.vertile.com/blog/2017/12/07/run-jupyter-notebook-behind-a-nginx-reverse-proxy-subpath/

Unfortunately, it didn’t work for me out of the can… the post supposedly describes how to proxy the server down a path, but (jumping ahead) the login page URL didn’t rewrite down the path for me; tweaking the proxy definition so that the Jupyter notebook server runs at the top level (/) on port 80 did work though – so here’s the nginx definition file I ended up using:

sudo nano /etc/nginx/sites-available/default

and then:

location / {
  error_page 403 = @proxy_groot;

  deny 127.0.0.1;
  allow all;

  # set a webroot, if there is one
  #root /web_root;
  try_files $uri @proxy_groot;
}

location @proxy_groot {
  #rewrite /notebooks(.*) $1 break;
  proxy_read_timeout 300s;
  proxy_pass http://upstream_groot;

  # pass some extra stuff to the backend
  proxy_set_header Host $host;
  proxy_set_header X-Real-Ip $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

location ~ /api/kernels/ {
  proxy_pass http://upstream_groot;
  proxy_set_header Host $host;

  # websocket support
  proxy_http_version 1.1;
  proxy_set_header Upgrade "websocket";
  proxy_set_header Connection "Upgrade";
  proxy_read_timeout 86400;
}

location ~ /terminals/ {
  proxy_pass http://upstream_groot;
  proxy_set_header Host $host;

  # websocket support
  proxy_http_version 1.1;
  proxy_set_header Upgrade "websocket";
  proxy_set_header Connection "Upgrade";
  proxy_read_timeout 86400;
}

followed by:

sudo  nginx -s reload

To try to make the notebook server slightly more secure than wide open — it will be running on a public IP address after all — I need to add a password (the original TM351 VM runs everything wide open).

First, create a password hash:

echo -n "my cool password" | sha1sum

then edit the system service file:

sudo nano /lib/systemd/system/jupyter.service

We need to tweak the startup along the lines of:

ExecStart=/usr/local/bin/jupyter notebook --port=8888 --ip=0.0.0.0 --y --log-level=WARN --no-browser --notebook-dir=/vagrant/notebooks --allow-root --NotebookApp.token='' --NotebookApp.password='sha1:WHATEVER' --allow_origin='*'

We can probably drop the --allow-root ? (Although the default notebook user can sudo some commands…)

Reload the daemon to acknowledge the service definition changes and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart jupyter.service

So this seems to work: I can see Jupyter notebook and login via port 80 on the floating public IP address I assigned to the TM351VM instance. I can open a notebook, run cells, call the PostgreSQL and basic Mongo databases at least, open a terminal. What I can’t do is curl or wget or run Python requests to load data files from the internet using a notebook because of the upstream IT network security rules. This is a bit of a blocker for the course. We may be able to finesse a way round with an ssh tunnel in testing, but I don’t think we should be expecting that of our students. (Thinks: how do IT security rules / policies apply when we define activities for students that we expect them to run on their own computers?! File as: whatever… We’ll just have to do something really crappy instead for students. Or set up a best-not-tell-IT proxy on the OU network somewhere…)

The next step is – can I expose the other core teaching application in the VM: OpenRefine?

A possible blocker is that we only have one port exposed on the public internet (port 80) so we need to find a way to expose OpenRefine. Fortunately, the nbserverproxy package allows the Jupyter server to proxy services running on localhost in the VM. So I should be able to run that. But first things first:  pip installs are borked even with an ssh tunnel (open questions on Stack Overflow confirm that this is not just me…).

Okay… pip packages can be downloaded and installed from a local file, so I can download the nbserverproxy pip package on my own machine and then scp it into the running OpenStack hosted VM at /vagrant/notebooks . Then from a notebook inside the VM I can run !pip install --user ./nbserverproxy-master.zip (just to show the notebook is working properly! ;-)  and enable it: ! jupyter serverextension enable --py nbserverproxy.

Restart the notebook server from VM command line and I should be able to see OpenRefine at http://MY.FLOATING.IP.ADDR/proxy/3334/ (the trailing slash is required of the styling fails as the path to the style files is incorrectly resolved). I think that this should also be down the password protected path? i.e. if I hadn’t logged in to the notebook server, I don’t think I should be able to get this far? (NEED TO CHECK.)

One of the VM Easter Eggs, nbdime, is also visible on http://MY.FLOATING.IP.ADDR/proxy/8899/. Go team me… :-)

Grab a snapshot of the working VM in the idle hope that maybe if someone else tries to launch from that image, it will just work. Although things like the network and security rules will presumably need setting up?

For student use, I’d need a simple way / recipe to set up different/personalised ssh credentials into the VM, otherwise anyone with the public IP address could ssh in. This must be a common issue, so it’d be good to see a Faculty OpenStack FAQ suggesting what the possible options are. I guess a simple one is on starting the instance? Can we force keys into the VM when it launches? Another issue is (re)setting the password for the Jupyer notebook server so each student is assigned, or can easily set (and recover….) their own password.

Other next steps: is there something in OpenStack where I can define network settings, security rules, etc, and provide students with an easier way of deploying an TM351 instance on the Faculty OpenStack and making its public services available on the public internet? Can I do this with an OpenStack stack? If so, that would be a handy thing to have an OU OpenStack tutorial for…

This is obvs the sort of support that should be available in Faculty IT tutorials, FAQs, and God Forbid, in person if we’re running the OpenStack server as a Faculty service and trying to encourage people to use it, so that’s what I’ll probably spend my next day of miserable OpenStack hacking doing when I can motivate myself to do it: trying to figure out if and how to make things closer to one click simpler for students to launch their own TM351 VM. (In the first instance for TM351, we want students to be able to run course VMs on an OU server because they’re struggling with getting things running on their own computer; this is often highly correlated with them having poor computer skills, poor problem solving skills, and poor instruction following skills, so we’re on a hiding to nothing if we expect them to launch instances, choose flavours, create routers, create and assign floating IP addresses and set up security rules. On their own. Because I’m not going to do that tech support for them. (I am ranty typing; my keyboard is suddenly VERY LOUD. [REDACTED])