One of the challenges facing the distance educator (indeed, any educator) in delivering computing-related courses is how to provide students with an environment in which they can complete practical teaching and learning activities.
Simply getting the student to a place where they can work on, and run, the code you want them to use is far from trivial.
In a recent post on Creating gentle introductions to coding for journalists… (which for history of ideas folk, and my own narrative development timeline, appeared sometime after most of this post was drafted, but contextualises it nicely), journalism educator Andy (@digidickinson) Dickinson describes how in teaching MA students a little bit of Python he wanted to:
– Avoid where possible, the debates – Should journalists learn to code? Anyone?
– Avoid where possible too much jargon – Is this actually coding or programming or just html
– Avoid the issue of installing development environments – “We’ll do an easy intro but, first, let’s install R/python/homebrew/jupyter/anaconda…etc.etc.”
– Not put people off – fingers crossed
The post describes how he tried to show a natural equivalence between, and progression from, Excel formulas to Python code (see my post from yesterday on Diagrams as Graphs, and an Aside on Reading Equations which was in part inspired by that sentiment).
But that’s not what I want to draw on here.
What I do want to draw on is this:
The equation `Tech + Journalists=` is one you don’t need any coding experience to solve. The answer is stress.
Experience has taught me that as soon as you add tech to the mix, you can guarantee that one person will have a screen that looks different or an app that doesn’t work. Things get more complicated when you want people to play and experiment beyond the classroom. Apps that don’t install; or draconian security permissions are only the start. Some of this stuff is quite hardcore for a user who’s never used notepad before let alone fired up the command prompt. All of this can be the hurdle that most people fall at. It can sap your motivation.
Andy declares a preference for Anaconda, but I think that is… well, let’s just say I prefer alternatives, like Docker. This is my latest attempt at explaining why: This is What I Keep Trying to Say….
Docker also offers a friendly way in to the idea of infinite interns.
I first came across this idea — of infinite interns — from @datamineruk (aka Nicola Hughes), developed, I think, in association with Daithí Ó Crualaoich (@twtrdaithi and, by the looks of his Twitter stream, a fellow Malta fan :-)
As an idea, I can’t think of anything that has had a deeper or more profound effect on my thinking as regards virtual computing than infinite interns.
Here’s how the concept was originally described, in a blog post that I think is now only viewable via the Internet Archive Wayback Machine — DataMinerUK: What I Do And How:
I specialise in the backend of data journalism: investigations. I work to be the primary source of a story, having found it in data. As such my skills lean less towards design and JavaScript and more towards scraping, databases and statistics.
…
I work in a virtual world. Literally. The only software I have installed on my machine are VirtualBox and Vagrant. I create a virtual machine inside my machine. I have blueprints for many virtual machines. Each machine has a different function i.e. a different piece of software installed. So to perform a function such as fetching the data or cleaning it or analysing it, I have a brand new environment which can be recreated on any computer.
I call these environments “Infinite Interns”. In order to help journalists see the possibilities of what I do, I tell them to think about what they could accomplish if they had an infinite amount of interns. Because that’s what code is. Here are a couple of slides about my Infinite Interns system:
And here are the slides, used without permission…
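If you’ve not come across Vagrant before, the day-to-day workflow for one of those blueprint-defined virtual machines is just a handful of commands. Here’s a minimal sketch (the box name is illustrative):

# create a blueprint (Vagrantfile) based on a pre-built Ubuntu image
vagrant init ubuntu/xenial64
# build and boot the virtual machine in VirtualBox
vagrant up
# log in to the running machine
vagrant ssh
# throw this particular intern away when you’re done
vagrant destroy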
Let’s go back to Andy…
There are always going to be snags and, by the time we get to importing libs like pandas [a Python package for working with tabular data], things are going to get complicated – it’s unavoidable. But if the students come away knowing that code isn’t tricky at least in principle, that at a low level the basic structures and ideas are pretty simple and there’s plenty of support out there. Well, that’ll be a win. Fingers crossed.
What you really need is an infinite intern…
Which is to say, what you really need is an easy way to tell students how to set up their computing environment.
Which is to say, you really need an easy way for students to tell their computers what sort of environment they’d like to work in.
Want a minimal Jupyter notebook?
docker run --rm -p 8877:8888 -e JUPYTER_TOKEN=letmein jupyter/minimal-notebook
and point your browser to http://localhost:8877, then log in with the token letmein.
Need a scipy stack in there? Use a different intern…
docker run --rm -p 8877:8888 -e JUPYTER_TOKEN=letmein jupyter/scipy-notebook
And so on…
And if you can’t install Docker on your machine, you can still run notebook-serving containers in the cloud: for example, Running a Minimal OU Customised Personal Jupyter Notebook Server on Digital Ocean.
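For a flavour of what that can involve, here’s a sketch using docker-machine (this isn’t necessarily the exact route the linked post takes; it assumes docker-machine is installed, you have a Digital Ocean API token to hand, and the machine name is made up):

# create a cloud server with Docker preinstalled
docker-machine create --driver digitalocean --digitalocean-access-token $DO_TOKEN notebook-server
# point the local Docker client at the new server
eval $(docker-machine env notebook-server)
# run the notebook container there rather than locally
docker run -d -p 8877:8888 -e JUPYTER_TOKEN=letmein jupyter/minimal-notebook

The notebook server should then be reachable on port 8877 of the server’s IP address (docker-machine ip notebook-server).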
There’s also tooling, such as repo2docker, to build containers from build specs in Github repos; this tool can automatically add in a notebook server for you. The same application is used to build and run containers in the cloud from a Github repo, at a single click: MyBinder (docs).
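repo2docker is also pip-installable if you want to try it locally (the repo URL here is just a placeholder):

pip install jupyter-repo2docker
repo2docker https://github.com/some-user/some-repo

It looks in the repo for build specs it recognises (a requirements.txt, environment.yml, Dockerfile and so on), builds an image from them, and launches a notebook server on top.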
What this shows, though, is that installing software actually masks a series of issues.
If a student, or a data journalist, is on a low spec computer, or a computer that doesn’t let you install desktop software applications, or a computer that has a different operating system than the one required by the application you want to run, what are you to do?
What is the problem we are actually trying to solve?
I see the computing environment as made up of three components (PLC):
- a physical component;
- a logical component;
- a cultural component.
The Physical Component
The physical component (physical environment, or physical layer) corresponds to the physical (hardware) resource(s) required to run an activity. This might be a student’s own computer or it might be a remote server. It might include the requirement for a network connection with minimum bandwidth or latency properties. The physical resource maps onto the “compute, storage and network” requirements that must be satisfied in order to complete any given activity.
In some respects, we might be able to abstract completely away from the physical. If I am happy running a “disposable” application where I don’t need to save any files for use later, I can fire up a server, run some code, kill the server.
But if I want to save the files for use an arbitrary amount of time later, I need some persistent physical storage somewhere where I can put those files, and from where I can retrieve them when I need them. Persistence of files is one of the big issues we face when trying to think of how best to support our distance education students. Storage can be problematic.
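In Docker terms, this is the difference between the throwaway containers launched earlier and mounting a local directory into the container so that files outlive it. A sketch, assuming the default working directory used by the Jupyter stack images:

docker run --rm -p 8877:8888 -e JUPYTER_TOKEN=letmein -v $PWD/notebooks:/home/jovyan/work jupyter/minimal-notebook

Notebooks saved into the container’s work directory now persist in the local notebooks folder after the container is destroyed.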
How individuals connect to resources is another issue: the network component. If a student has a low-powered computer (poor compute resource) we may need to offer them access to a more powerful remote service. But that requires a network connection. Depending on where files are stored, there are two network considerations: how does a student access files in order to edit them, and how do files get to the compute resource so they can be processed?
The Logical Component
The logical component (logical layer; logical environment) might also be referred to as the computational environment. It includes:
- operating system dependencies (for example, the requirement for a particular operating system);
- application and operating system package dependencies (for example, we might require a particular application, such as Scratch, to be available, or a particular operating system package that a programming language package depends on);
- programming language dependencies (for example, in a Python environment we might require a particular version of pandas to be installed, or a particular version of Java).
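To make that concrete, a logical environment is typically pinned down in a build script or spec. Here’s a minimal sketch of the sort of thing such a script might contain; the package names and versions are purely illustrative:

# operating system package dependency
apt-get install -y libpq-dev
# application dependency
apt-get install -y scratch
# pinned programming language package dependency
pip install pandas==0.23.4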
The Cultural Component
The cultural component (cultural layer; cultural environment) incorporates elements of the user environment and workflow. At one extreme, the adoption of a particular programming editor is an example of a cultural component (the choice of editor may actually be irrelevant as far as the teaching goes, except insofar as a student needs access to a code editor, not any particular code editor). The workflow element is more complex, covering workflows both in abstract terms (e.g. using a test-driven approach, or a code differencing and check-in management process) and in practical terms (for example, using git and Github, or a particular testing framework).
For example, you could imagine a software design project activity in a computing course that instructs students to use a test driven approach and code versioning, but not specify the test framework, version control environment, or even programming language / computational environment.
This cultural element is one that we often ignore in HE, expecting students to just “pick up” tools and workflows, and its neglect leaves graduates less than useful when it comes to actually doing some work once they do graduate. It’s also an element that is hard to change, both within an organisation and at a personal level.
If you’ve tried getting a new technology into a course created by a course team, and / or into your organisation, you’ll know that one of the biggest blockers is the current culture. Adopting a new technology is really hard because if it really is new, it will lead to, may even require, new workflows — new cultures — for many, indeed any, of the benefits to reveal themselves.
Platform-Independent Software Distribution – Physical Layer Agnosticism
Reflecting on the various ways in which we provide computing environments for distance education students on computing courses, one of my motivations is to package computational support for our educational materials in a way that is agnostic to the physical component. Ideally, we should be able to define a single logical environment that can be used across a wide range of physical environments.
Virtualisation has a role to play here: if we package software in virtualised environments, we have great flexibility when it comes to where the virtual machine physically runs. It could be on the student’s own computer, it could be on an OU server, or it could run on a “bring your own server” basis.
Virtualisation essentially allows us to abstract away from much of the physical layer considerations because we can always look to provide alternative physical environments on which to run the same logical environment.
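To make the point concrete, the same one-line environment specification runs unchanged wherever the Docker client happens to be pointed (my-remote-server is a made-up machine name):

# on the student’s own computer
docker run --rm -p 8877:8888 -e JUPYTER_TOKEN=letmein jupyter/scipy-notebook
# or on a remote server, after repointing the Docker client at it
# (for example via docker-machine, as in the Digital Ocean sketch above)
eval $(docker-machine env my-remote-server)
docker run --rm -p 8877:8888 -e JUPYTER_TOKEN=letmein jupyter/scipy-notebook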
However, in looking for alternatives, we need to be mindful that the (compute, storage, network) triple provides a set of multi-objective constraints that need to be satisfied, and that satisfying them may require certain trade-offs between them.
This is particularly true when we think of extrema, such as large data files (requiring a large amount of storage and/or a large amount of bandwidth/network connectivity) and/or processes that require large amounts of computation (these may be associated with large amounts of data, or they may not; an example of the latter might be running a set of complex equations over multiple iterations).
My preference is also that we should distribute software environments and services that allow students to explore, and even bring to bear, their own cultural components (for example, their favourite editor). I’ll have more to say about that in a future post…
Related: Fragment – Programming Privilege. See also This is What I Keep Trying to Say… where a few very short lines of configuration code let me combine / assemble pre-existing packages in new and powerful ways, without really having to understand anything about how the pieces themselves actually work.
This is a really thoughtful bit of contextualisation. I particularly like the physical, logical and cultural component idea – the way those overlap and interact is something to ponder, and chimes in an abstract kind of way with systems models (someone introduced me to Csikszentmihalyi’s systems model of creativity. Blame them!) I’ve recently been getting acquainted with Docker. I’ve relied on VMs for a while but recently played with Workbench and Datashare – both of which use Docker in one way or another. It got me back into the mix of it. I like the virtual interns idea as well, but I wonder how we get people to the step where “docker run --rm -p 8877:8888 -e JUPYTER_TOKEN=letmein jupyter/minimal-notebook” is something they are comfortable with – the bit where we build the HR department to hire the interns :-) I’m not saying it’s not valid to start there though. Workbench and Datashare, amongst others, show that if you want to be part of data journalism, you need to move fast; the tech is moving on apace. That’s where it comes back to the cultural: we are in a really vibrant and innovative part of sharing cultures (code and journalism) and the results are fascinating and seductive to tech and journo alike. Anyhoo. I’m not sure there is a point here, but there’s plenty for me to ponder on. Ta!
Hi Andy,
Thanks for that comment…
Re: the ‘docker …’ command, what I’d quite like to see is a ‘digital application shelf’ in the library, or part of the institutional repo, where I can 1-click a ‘launch in institutional binder’ button that will build an image against a repo, launch a container instance from it, and let the user run the contents of it.
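(For reference, MyBinder already demonstrates the URL pattern such a button could wrap; for a public Github repo the launch link is just

https://mybinder.org/v2/gh/<github-user>/<repo>/<branch>

so a ‘launch in institutional binder’ button is essentially that link, pointed at an institutional Binderhub rather than mybinder.org.)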
Libraries were the part of the institution that took ownership of open access repositories, as well as advising on data management policies and practices, and I would like to see them also taking the lead in the next step of research openness: providing environments whereby patrons can execute the code appendices to papers. (Many researchers promote their papers with associated notebooks that contain the analysis, but there are only one or two institutions that also provide computational support in the form of Binderhubs to actually run them, which is not fair on the core Binderhub project, which runs the MyBinder service.)
As well as temporary, run-only environments, I think institutions should also, as a matter of course, be offering infrastructure that allows users to launch containers on an institutional cloud, or negotiating credits with cloud providers (I have found Digital Ocean to provide the easiest self-service server route). Services such as Zeit Now show how easy this can be ( https://blog.ouseful.info/2018/08/06/publish-static-websites-docker-containers-or-node-js-apps-just-by-typing-now/ ).
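(The now workflow really is minimal: from a directory containing a Dockerfile, typing

now

builds the container and deploys it to a public URL; at least, that was the case for the Docker deployment mode in Zeit Now v1, which is the one the linked post describes.)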
Such a service would provide access to general purpose computing. For access to a more limited range of computing environments, services such as Jupyterhub allow an administrator to provide a scalable service that offers persistent (through linked data volumes), authenticated access to a user-selectable, predefined container build (Jupyterhub can present users with a list of containers from which they can choose which environment they want to launch).
I realise a lot of my posts are based around Jupyter ecosystem tech, and this may not be the best, or only, route. But it’s one I’m familiar with and one whereby I can see how various pieces fit together (Jupyter is not just notebooks; it’s a family of protocols that support a compelling range of standalone services that can also be combined in effective ways). It also has traction, for now at least, and, seemingly, buy-in from various corporates, although others, such as Google, seem to be going their own way through things like Colab.
“Re: the ‘docker …’ command, what I’d quite like to see is a ‘digital application shelf’ in the library, or part of the institutional repo, where I can 1-click a ‘launch in institutional binder’ button that will build an image against a repo, launch a container instance from it, and let the user run the contents of it.” – YES! That please.
I’m really enjoying the Jupyter thing – as a trainer it’s spot on. I can also see the value of Docker, though: that clean slate to start everyone off from, and also build on. I think the Colab vs. Azure etc. space (which you helpfully blogged on) is one to watch, but positive in the sense of accessing things, as are platforms like repl.it