After a fun chat with Jim Groom this morning – even after all these years, we’ve still never met in person – I thought I should get round to finishing off this post, modified slightly in light of today’s chat…
A couple of months ago, I signed up for some online webhosting from Reclaim Hosting (which I can heartily recommend:-), in part because I wanted to spend a bit of time hacking some #opendata-related WordPress plugins (first attempt described here); and my hosting on WordPress-dot-com doesn’t allow much in the way of tech customisation…
Reclaim offers web hosting, which is to say: a place to park several domains of my own, host blogs of my own customisation, manage email associated with my domains, handle analytics and logging, and publish a variety of other web-style applications of my own choosing.
The web applications on offer are 1-click installable (ish – there may be various custom settings associated with any particular application) using cPanel and Installatron.
This is great for web hosting BUT the applications on offer are, in the main, applications associated with web stuff, as compared to applications associated with scientific, engineering, or digital humanities coursework (“scholarly apps”, perhaps?!). So for example, for OUr Data Management and Analysis (TM351) course, students will be running Jupyter notebooks, OpenRefine, MongoDB and PostgreSQL (I had hoped early on that RStudio might make it in there too, but that was over ambitious!;-) It’s not surprising that some of these apps also appear on the ResBaz Cloud.
Jupyter, OpenRefine and RStudio share the common feature of presenting graphical user interfaces via a browser. MongoDB and PostgreSQL, on the other hand, along with services like the Apache Tika Document Text Extraction Service, provide “headless” services over a network port (Tika’s server speaks HTTP; the two databases use their own wire protocols). Which is to say – they work over the network, and, where they speak HTTP, they can be accessed via a browser.
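To make the “headless service on a port” idea a bit more concrete, here’s a minimal Python sketch for talking to a Tika server. It assumes a Tika server is running locally on its default port (9998) and that, as its REST interface allows, you can PUT a document to the /tika endpoint and ask for plain text back via the Accept header. Only the standard library is used, and nothing is sent until extract_text() is called.

```python
import urllib.request

# Assumption: a Tika server is listening on its default port on localhost.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(document: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but don't send) an HTTP PUT asking Tika for plain-text output."""
    return urllib.request.Request(
        url,
        data=document,                       # raw bytes of the PDF/DOCX/etc.
        method="PUT",
        headers={"Accept": "text/plain"},    # ask for extracted text, not XHTML
    )


def extract_text(document: bytes, url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send the document to the Tika server and return the extracted text."""
    req = build_tika_request(document, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage would be something like extract_text(open("report.pdf", "rb").read()), where report.pdf is just a placeholder file name; the point is that there’s no GUI involved, just an HTTP call to a port.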
So here’s what I want, what I think I really, really want: an online application hosting provider. Reclaim lets me do the web social and web publishing stuff, but at the moment I can’t 1-click install my web-runnable course software there. Nor can I easily share my own “scholarly app” creations: for example, I could pay $9 a month to host Shiny apps I’ve built in RStudio on ShinyApps.io, but if I just wanted to share a little something I’d built on a course with my friends for a day or two, that would probably be overkill compared to hosting it briefly on my own site. If I’d built a cool Jupyter notebook and wanted to let you have a play with it, I could share the notebook file with you and you could then download it and run it on your own notebook server, assuming you know how to do that, but it might also be nice if you could 1-click launch an interactive version of it on my site. (Actually, there is a string’n’glue solution to this: I could pop the notebook onto GitHub and then run it via binder.)
So looking around for bits of stick’n’string’n’glue that could perhaps be glued together to let me do this, what I’d quite like is to have my own online, course-app running StrinGLE (remember StringLE…? A learning environment, made from string’n’glue, where you could actually do stuff as well as be given stuff to read…).
On the one hand, for the social webhosting side, I’d have my webhosting apps in cPanel; on the other, to meet my course-related scientific computing needs, I’d have something like Kitematic.
Note there may be some overlap in the applications. What I’m thinking about more are use cases where the applications operate on different timescales. The web hosting apps I start once and they run for ever. I want my blog to be there all the time, and I want my email to work all the time. The personal apps are more like applications that only run when I’m using them: RStudio, or a Jupyter notebook. That is, I start them/launch them when I want to use them, then shut them down when I’m done, ideally persisting any files I’ve been working on somewhere until the next time I want to use the application. Containers are ideal for this because you can start them when you need them, then throw them away when your study session is done.
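As a rough illustration of the “throw the container away, keep the files” pattern, here’s a small Python sketch that builds (but doesn’t run) a docker run command. The --rm, -p and -v flags are standard Docker CLI options; the jupyter/scipy-notebook image, port 8888 and the /home/jovyan/work mount point are assumptions borrowed from the Jupyter Docker Stacks conventions, and the local notebooks directory is just a placeholder.

```python
import shlex
from pathlib import Path


def ephemeral_run_args(image: str, port: int, host_dir: Path, container_dir: str) -> list[str]:
    """Build a `docker run` command for a throwaway container with persistent files.

    --rm : delete the container when it stops (the container is disposable)
    -p   : publish the application's port on the host
    -v   : bind-mount a host directory so work survives the container
    """
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:{port}",
        "-v", f"{host_dir.resolve()}:{container_dir}",
        image,
    ]


# Assumed example: a Jupyter image whose notebooks live in /home/jovyan/work.
args = ephemeral_run_args(
    "jupyter/scipy-notebook", 8888, Path("notebooks"), "/home/jovyan/work"
)
print(shlex.join(args))
```

Stopping the container deletes it, but anything saved under /home/jovyan/work lands in the local notebooks folder, ready for the next session.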
So that’s one take – a Kitematic complement to cPanel that lets me fire up applications for short-term use, whilst persisting my files in a storage area that is part of my online hosting, perhaps even synched to something like Dropbox.
Here’s another take – imagine what this might mean…:
In this case, imagine that the binder button takes an image on Docker Hub and launches it via my web host. So the binder button takes me to a central clearing house where I have an authenticated account that I’ve configured with details of my web host. Clicking the binder button says to the binder server: “authenticated person X wants to run a container based on image Y”, and the binder server looks up my host details and fires up a container there, with a URL as a subdomain of my domain.
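Here’s a toy Python sketch of the clearing-house lookup described above, just to pin down the flow: a user registers their host details once, and a launch request for an image resolves to a subdomain URL on that user’s own domain. Everything here is hypothetical (the ClearingHouse and HostConfig names, the api_endpoint field, the subdomain naming rule), authentication is assumed to have happened upstream, and no container is actually started.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostConfig:
    """Hypothetical record of where one user's containers should run."""
    domain: str        # the user's own domain
    api_endpoint: str  # hypothetical endpoint on that host able to start containers


class ClearingHouse:
    """Hypothetical broker that maps authenticated users to their own web hosts."""

    def __init__(self) -> None:
        self._hosts: dict[str, HostConfig] = {}

    def register(self, user: str, host: HostConfig) -> None:
        """Store the host details a user configured on their account."""
        self._hosts[user] = host

    def launch(self, user: str, image: str) -> str:
        """Handle 'authenticated person X wants to run a container based on image Y'."""
        host = self._hosts.get(user)
        if host is None:
            raise PermissionError(f"no host configured for {user!r}")
        # Turn e.g. 'jupyter/scipy-notebook:latest' into a DNS-friendly label.
        label = image.split(":")[0].replace("/", "-").replace("_", "-").lower()
        # A real broker would now ask host.api_endpoint to start the container;
        # this sketch only returns the subdomain URL it would be served from.
        return f"https://{label}.{host.domain}"


broker = ClearingHouse()
broker.register("alice", HostConfig("example.com", "https://example.com/containers"))
print(broker.launch("alice", "jupyter/scipy-notebook:latest"))
```

Running this prints https://jupyter-scipy-notebook.example.com; the interesting (and unsketched) part is what a host would need to expose so that a broker could call it.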
I could imagine something like Tutum – recently acquired by Docker – being able to support something like this: from Tutum, I can currently start up servers (droplets) on a third-party host (I use DigitalOcean for this), and then deploy containers from Docker Hub on those servers. At the moment it takes a few clicks in Tutum to set up the various settings and start the servers, but it could perhaps all be streamlined into a few setup screens the first time I launch a container application, with the parameters saved to a config file that could be used by default on future starts of the same application? So perhaps a Tutum button, rather than a binder button, on Docker Hub?
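The “set it up once, reuse it thereafter” idea could be as simple as a per-application JSON file. Here’s a minimal Python sketch, assuming hypothetical setting names (provider, region, size) whose default values are purely illustrative placeholders, not anything Tutum itself defines.

```python
import json
from pathlib import Path
from typing import Optional

# Illustrative placeholder defaults; a real service would prompt for these.
DEFAULTS = {"provider": "digitalocean", "region": "lon1", "size": "small"}


def load_or_create_config(image: str, config_dir: Path,
                          first_run_answers: Optional[dict] = None) -> dict:
    """Return saved launch parameters for `image`, creating them on first launch.

    On the first call the defaults (overridden by any answers from the setup
    screens) are written to a JSON file; later calls just read that file back,
    so `first_run_answers` is ignored once a config exists.
    """
    safe_name = image.replace("/", "_").replace(":", "_")
    path = config_dir / f"{safe_name}.json"
    if path.exists():
        return json.loads(path.read_text())
    params = {**DEFAULTS, **(first_run_answers or {}), "image": image}
    config_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(params, indent=2))
    return params
```

The first launch of, say, jupyter/scipy-notebook writes jupyter_scipy-notebook.json; every later launch of the same image just picks those settings up again.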
As to security, I think that running arbitrary containers fills IT folk with security dread, so it may make more sense to only support containers based on images held in a trusted repository, such as an institutional repository. This does put something of an application gatekeeper role back on the institution, but the “institution” could equally be a trusted commercial or community partner. (I wonder: is there support for trusted/signed Docker images?)
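A trusted-repository policy could start out as little more than an allowlist check on the image reference before anything is launched. This Python sketch uses a made-up institutional registry name (registry.example.ac.uk) plus Docker Hub’s official library namespace as the trusted sources, and follows Docker’s usual convention for expanding short image names. Note that it only checks where an image claims to come from; it doesn’t verify any signatures.

```python
# Hypothetical allowlist: only images from these registry/namespace prefixes may run.
TRUSTED_PREFIXES = ("registry.example.ac.uk/", "docker.io/library/")


def normalise(image: str) -> str:
    """Expand a short image reference to registry/namespace/name form.

    Follows Docker's convention that a first component containing '.' or ':'
    (or equal to 'localhost') names a registry; otherwise Docker Hub is
    assumed, and single-component names live in the 'library' namespace.
    """
    first, _, rest = image.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        return image
    if not rest:
        return f"docker.io/library/{image}"
    return f"docker.io/{image}"


def is_trusted(image: str, prefixes: tuple[str, ...] = TRUSTED_PREFIXES) -> bool:
    """True only if the fully qualified image name starts with a trusted prefix."""
    return normalise(image).startswith(prefixes)
```

Keeping the trailing slash on each prefix matters: it stops a lookalike such as registry.example.ac.uk.evil.com/... from slipping through.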
As to how achievable this is – I wish I had time to explore and play with the Tutum API a little! In the meantime, Jim mentioned the rather intriguing-sounding sandstorm.io:
What this seems to be is an app store where the apps are packaged Linux application images, built using Vagrant (see Sandstorm application packaging).
From a quick peek, it seems that a Sandstorm application is a Linux image built up from a Sandstorm base image and a set of user-defined shell scripts. (UPDATE: for a description of how the Sandstorm.io approach differs from Docker, see Why doesn’t Sandstorm just run Docker apps?) Rather than running a single application within a single container, and then linking containers to make application compositions, it looks as if Sandstorm containers may run several applications that talk to each other within the container? State can also be persisted, so whilst application-running containers are destroyed if you close a browser session running against the container, the state is recoverable if you launch another container from the same image. Which means that the Sandstorm folk have got the user-authentication thing sussed? (Sandstorm know I’m me. When I fire up a Jupyter container, they can link it to my stash of notebook files.) Hmm…
My TM351 VM build files are based on Puppet – with a few shell scripts – orchestrated by Vagrant. I wonder how hard it would be to create a version of the TM351 virtual machine that could be deployed via Sandstorm? Hmm…
PS Hmm… it seems that a “Deploy to Tutum” button already exists (h/t @borja_burgos), though I’ve not had time to look at this properly yet… Exciting:-)
PPS and via @tutumcloud, stackfiles.io – a bit like Panamax compositions, deployable via Tutum… Thinks: so I should be able to do a stack for TM351 linked containers…:-)
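As a rough sketch of what a TM351 stack of linked containers might need to capture, here’s some Python that describes the services and their links, then works out a start order so that linked services come up before the ones that depend on them. This is not the stackfiles.io format; the service names are illustrative, the OpenRefine image name is a hypothetical placeholder, and graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical TM351-style stack: each service lists the services it links to.
STACK = {
    "notebook": {"image": "jupyter/scipy-notebook", "links": ["mongodb", "postgres"]},
    "openrefine": {"image": "example/openrefine", "links": []},  # hypothetical image
    "mongodb": {"image": "mongo", "links": []},
    "postgres": {"image": "postgres", "links": []},
}


def start_order(stack: dict) -> list[str]:
    """Order services so every linked service starts before the one linking to it.

    Raises graphlib.CycleError if the links form a loop.
    """
    graph = {name: set(spec["links"]) for name, spec in stack.items()}
    return list(TopologicalSorter(graph).static_order())


print(start_order(STACK))
```

The exact order of unrelated services (OpenRefine versus the databases, say) isn’t fixed; all that’s guaranteed is that MongoDB and PostgreSQL come before the notebook server that links to them.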