When we put together the original TM351 VM, we wanted a single, self-contained installable environment capable of running all the services required to complete the practical activities defined for the course. We also had a vision that the services should be capable of being accessed remotely.
With a bit of luck, we’ll have access to an OU OpenStack environment any time soon that will let us start experimenting with a remote / online VM provision, at least for a controlled number of students. But if we knew that a particular cohort of students were only ever going to access the services remotely, would a VM be the best solution?
For example, the services we run are:
- Jupyter notebooks
- OpenRefine
- PostgreSQL
- MongoDB
Jupyter notebooks could be served via a single Jupyter Hub instance, albeit with persistence enable on individual accounts so students could save their own notebooks.
Access to PostgreSQL could be provided via a single Postgres DB with students logging in under their own accounts and accessing their own schema.
Similarly – presumably? – for MongoDB (individual user accounts accessing individual databases). We might need to think about something different for the sharded Mongo activity, such as a containerised solution (which could also provide an opportunity to bring the network partitioning activity I started to sketch out way back when).
OpenRefine would require some sort of machinery to fire up an OpenRefine container on demand, perhaps with a linked persistent data volume. It would be nice if we could use Binderhub for that, or perhaps DIT4C style infrastructure…