The Long Road From Proof of Concept / Quick Demo Through Reference Architecture to Production System

I tinker at the level of proof of concept, playful demo and half hour hack (when I try things out, it’s my intention that I should be able to make some good progress and get something running in half an hour. It may end up taking an hour, a couple of hours, half a day, even a couple of days if I get really obsessed/frustrated and think it’s worth spending that extra time (?!) on, but the initial intention typically is: could I get something working to proof of concept level quickly?

As I’ve written before, one reason is funnels: if it takes 3 weeks to try something out, not many people will get to try it out. If you see something new in a tweet and it takes ten minutes to try, you might. And from that, whatever the thing is might get traction more widely if within that 10 minutes you see enough promise to want to spend more time on the thing. Or it might just help you on a temporary problem, and you can use it, move on, drop it, perhaps remembering it as yet another of those weirdly shaped screwdrivers that only fits very peculiarly headed screws, but it useful for them nonetheless.

Through trying lots of things of things out you also get a feel for what’s new, what’s interesting, what’s more of the same, what’s actually different. Downes knows this too…

So, playful demos. I spent a chunk of time last night trying to launch an OpenRefine container directly from Jupyterhub using a Dockerspawner. (It didn’t work.) My thinking is that being able to launch arbitrary containers from behind Jupyterhub means that have-a-go educators could co-opt Jupyterhub as a multi-user front end to launch anything in a container that returns something on port 8888. (I’m still not sure what Jupyterhub Dockerspawner requires of a container it launches (is it just an http response on port 8888?) or what it sends to the container when it tries to launch it (does it send a command to append to a ENTRYPOINT? does it send environment variables in? If you can point me to docs, transparent debug examples/logs, that’d be much appreciated).

I’ve not really used Jupyterhub and didn’t want to use The Littlest Jupyter Hub (although I guess you can change that to use Dockerspawner? Hmmm… Bah…), so it also provided an opportunity for me to find a (quick) way of firing up Jupyterhub servers.

I ended up following the recipe for the simple example in the jupyterhub/dockerspawner repo. It comes with these caveats:

This is a simple example of running jupyterhub in a docker container.

This shows the very basics of running the Hub in a docker container (mainly setting up the network). To run for real, you will want to:

– …

jupyterhub-deploy-docker does all of these things.

So: enough to get up and running, no more than that… That’s the level I tend work at.

One of the nice things about Juptyer ecosystem is that I can get started at this quick level, and produce containers that can be launched by production systems that work easily way with my quick local demo. I might even be able to tinker around with Jupyterhub customisation, tweaking style templates and so on to explore different ways of customising the presentation which might also be relevant to the final production system.

The jupyterhub/jupyterhub-deploy-docker setup, which provides a [r]eference deployment of JupyterHub with docker goes a bit further than I need for simple personal testing / proof of concept and requires more investment in setup time.

As a reference deployment, the README suggests use cases include (but are not necessarily limited to):

  • creating a JupyterHub demo environment that you can spin up relatively quickly.
  • providing a multi-user Jupyter Notebook environment for small classes, teams, or departments.

The reference deployment is useful for me because it provides a logical diagram /  architectural example showing what other things need to be considered for a production system rather than a plaything, even if the reference deployment does not demonstrate them at production strength.

(Note to self: it would be useful to annotate the reference deployment with commentary about why each piece is there and what sorts of criteria you might bring to bear when deciding one way of implementing it versus another.)

It also comes with a disclaimer:

This deployment is NOT intended for a production environment. It is a reference implementation that does not meet traditional requirements in terms of availability nor scalability.

If you are looking for a more robust solution to host JupyterHub, or you require scaling beyond a single host, please check out the excellent zero-to-jupyterhub-k8s project.

(It might also be worth noting that for a small scale production use-case, The Littlest JupyterHub (TLJH) [jupyterhub/the-littlest-jupyterhub], a “[s]imple JupyterHub distribution for 1-100 users on a single server” might also be appropriate?)

The Zero to JupyterHub with Kubernetes [jupyterhub/zero-to-jupyterhub-k8s] deployment adds complexity further, providing a comprehensive set of “[r]esources for deploying JupyterHub to a Kubernetes Cluster” (the docs are actually targeted to Google Kubernetes Engine, but we (well, not me, obvs..;-) managed to use them to bootstrap an Azure install). This is moving into production territory now (we use this for our TM112 disposable notebook optional activity), although by following the instructions, if you have a couple of hours, or perhaps half a day, to start with (rather than half an hour to start with…) plus access to a Kubernetes cluster, you can still give it a spin. (I tried last year to get it running with a local k8s cluster running via Docker on my local machine, but couldn’t get it to work at the time. It may be worth trying this again now, and finding, or posting a recipe, for doing this…)

A large part of my frustration in working at the OU arises from not being able to explore technology ideas more rapidly. It’s easy to be quick at the proof of concept level, harder to get things into production. I know that. But things like the Jupyter ecosystem provide an opportunity for end-user-development in one part of the ecosystem (eg within a container launched by Dockerspawner, or within a notebook via notebook extensions) whilst another part gets the production side right. Or even just facilitates the playfulness.

For example, yesterday I spotted this spacy-course [repo]. If you havenlt come across it, spacy is a really powerful, easy to use, natural language processing library.

The course is split into chapters, with sections in chapters and pages in sections.

Some of the sections are slide displays, with central teaching points and commentary on the side. (Methinks it should be easy enough to add an audio player to read the script on the side, which could be quite interesting?)

Other sections, containing practical activities, are arranged as collapsible elements.

The course supports code execution using MyBinder (it looks like it uses juniper.js to manage this; I wonder how easy it would be to use Voila instead?):

From looking at the repo, the course seems to have been around some time, so now I’m wondering why it took me so long to find it?!

[Prompted by @betatim, it seems there’s a backstory: the course was on DataCamp, but the course developer, @_inesmontani, got frustrated with that provider and instead “wanted to make a free version of my spaCy course so you don’t have to sign up for their service – and ended up building my own interactive app. Powered by the awesome @mybinderteam & @gatsbyjs” What’s more, “[t]he app and framework are 100% open-source and based on Markdown + custom elements. I built it for my content, but if you want to use it to publish your own DIY online course…” By the by, for a course revision, we’re looking at ways we can take all the course content out of the VLE and deliver it via our Jupyter fronted VM… There are three main reasons for this: 1) students should be allowed to take away a copy of the course materials, not just be given access to them for the duration of the course and a couple of years after; 2) getting errata addressed is a nightmare with the current document workflow — the version controlled, issue tracked, workflow we’re trying to work to improves this; 3) we’re interested in exploring how to present the course material in a more structured, searchable and interactive / interesting way. I really take heart from this spacy course example…]

I’m not sure how the content was created. If there’s a transform from Jupyter notebooks into this course format (perhaps using Jupytext, or a Jupyter Book style production route?), that could be really interesting… (At least, to me…. [REDACTED SNARK].)

If you want to try it yourself, Ines has put together this forkable [s]tarter repo for building interactive Python courses.

When it comes to production systems, end user development like this is perhaps part of the problem, though? Production systems folk don’t want end users producing things…?

PS Yes and no to that…paraphrasing something else I saw yesterday, I tend to assume excellence, and tend to only provide negative feedback. A lot of my commentary tends to be more neutral — X does this; I had to do Y then Z to get that to work; etc. As a rule of thumb, I only comment on public activities that I come across and I don’t comment on things that are only discoverable behind authentication.

On the other hand, Tracking Jupyter is a personal experiment into finding a way of providing synoptic feedback about an open system. That that community is open, and that a large number of the activities carried out within it are transparent and discoverable, makes such feedback possible.

Sometimes, my commentary comes with added snark in my personal comms channels (social media, this blog). Which is part of the point. That, and the f****g swearing, are deliberately used to limit the readership, and the willingness of people to link to the content (it’s inappropriate; not properAcademic). And they’re channels where I vent frustration.

I know how to maintain Chinese Walls. Contrary to what folk may think, I don’t blog everything. A lot of stuff that appears in this blog is only here because I can’t find anyone to engage in discussion about it internally, despite trying… And a lot of stuff doesn’t appear. (Not as much as didn’t used to appear, though, back when folk did used to talk to me…)

PPS This sort of personal comment is also, in part, a device to limit linking. Plus the blog is my personal notebook, and as such, is what it is…

;-)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.