Fragment: Bring Your Own Infrastructure (BYOI)

Over the years I've posted various fragmentary thoughts on delivering software to students in BYOD (bring your own device) environments (eg Distributing Software to Students in a BYOD Environment from 5 years ago, BYOA (Bring Your Own Application) – Running Containerised Applications on the Desktop from 3 years ago, or Rethinking: Distance Education === Bring Your Own Device? yesterday).

Adding a couple more pieces to the jigsaw, today I notice this Coding Environment Landing Page at the University of Colorado:

The environment appears to be a JupyterHub deployment with VSCode bundled inside via the jupyter_codeserver_proxy extension, and the draw.io drawing editor bundled as a JupyterLab extension.

Advice is also given on running arbitrary, proxied web apps within a user session using Jupyter server proxy (Proxying Web Applications). This is a great example of one of the points of contention I have with Jim Groom, “Domain of Your Own” evangelist, one that I’ve tried to articulate (not necessarily very successfully) several times previously (eg in Cloudron – Self-Hosted Docker / Containerised Apps (But Still Not a Personal Application Server?) or Publish Static Websites, Docker Containers or Node.js Apps Just by Typing: now): in particular, the desire to (create,) launch and run applications on a temporary, per-session basis (during a study session, for the specific purposes of launching and reading an interactive paper in a “serverless” way, etc).
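(For anyone wondering what that looks like in practice, the jupyter-server-proxy configuration for that kind of proxied app is pleasingly simple. The following is just a sketch, assuming an OpenRefine install that can be started from the command line as refine; the command and launcher title are illustrative rather than taken from the Colorado setup:)

# jupyter_notebook_config.py (sketch)
c.ServerProxy.servers = {
    "openrefine": {
        # Command used to start the proxied app; {port} is filled in by the proxy
        "command": ["refine", "-p", "{port}"],
        "timeout": 120,
        "launcher_entry": {"title": "OpenRefine"},
    }
}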

The Colorado example is a really nice example of a simple multi-user environment that can be used to support student computing with an intelligent selection of tools bundled inside the container. (I’m guessing increasing numbers of universities offer similar services. Anyone got additional examples?)

Another jigsaw piece comes in the form of eduID, a federated Swedish identity service that students can use to sign in to their university services, whichever university they attend. One advantage of this is that you can create an identity when you start a university application process, and retain that identity throughout an HE career, even if you switch institution (for example, attending one as an undergrad, another as a postgrad). The eduID can also be linked to an Orcid ID, an international identifier scheme used to identify academic researchers.

What eduID does, then, is provide you with an identity that can be registered with an HE provider and used to access that HEI’s services. Your identity is granted, and grants you, access to their services.

So. Domain of Your Own. Hmmm… (I’ve been here before…) Distance education students, and even students in traditional universities, often study on a “bring your own device” basis. But what if that was an “Infrastructure of Your Own” basis? What would that look like?

I can imagine infrastructure being provided in various ways. For example:

  1. identity: a bring-your-own-identity service such as eduID;

  2. storage: I give the institution access to my Dropbox account or Google Drive account or Microsoft OneDrive, or something like a personal SparkleShare or Nextcloud server; when I load a personal context on an institutional service, if there is a personal user file area linked to it, it syncs to my remote linked storage;

  3. compute: if I need to install and run software as part of my course, I might normally be expected to install it on my own computer. But what if my computer is a spun-up-on-demand server in the cloud?

(It may also be worth trying to compare those to the levels I sketched out in a fragment from a year ago in Some Rambling Thoughts on Computing Environments in Education.)

I’m absolutely convinced that all the pieces are out there to support a simple web UI that would let me log in to and launch temporary services on on-demand servers (remote or local), and link and persist files I was working on using those services to a personal storage server somewhere. And that all it would take is some UI string’n’glue to pull them together.

PS Security wise, something like Tailscale (via @simonw) looks interesting for trying to establish personal private networks between personally hosted services.

PPS anyone else remember PLEs (personal learning environments) and distributed, DIY networked service oriented architecture ideas that were floating around a decade or so ago?

On Stats, and “The Thing”…

Although I work from home pretty much all of the time anyway, I’m rubbish in online meetings and tend to get distracted, often by going on a quick web trawl for things related to whatever happens to be in mind just as the meeting starts…

So for example, yesterday, I started off wondering about mortality stats relating to “the thing”. Public Health England publish a dashboard with some numbers and charts on it but it’s really hard to know what to make of the numbers. You can also get COVID-19 Daily Deaths from NHS England reported against the following constraints:

All deaths are recorded against the date of death rather than the date the deaths were announced. Interpretation of the figures should take into account the fact that totals by date of death, particularly for most recent days, are likely to be updated in future releases. For example as deaths are confirmed as testing positive for COVID-19, as more post-mortem tests are processed and data from them are validated. Any changes are made clear in the daily files.

These figures will be updated at 2pm each day and include confirmed cases reported at 5pm the previous day. Confirmation of COVID-19 diagnosis, death notification and reporting in central figures can take up to several days and the hospitals providing the data are under significant operational pressure. This means that the totals reported at 5pm on each day may not include all deaths that occurred on that day or on recent prior days.

These figures do not include deaths outside hospital, such as those in care homes. This approach makes it possible to compile deaths data on a daily basis using up to date figures.

If the condition for adding a death to the covid tally is a positive post-mortem test, then the asymptomatic boy racer who kills themself in a motorbike accident whilst making use of open roads will count, but that’s not really the sort of number we’re interested in.

The ONS (Office for National Statistics) look like they’re trying to capture the number of deaths where “the thing” is the likely cause of death (Counting deaths involving the coronavirus (COVID-19)), adding a Covid19 tab to the weekly provisional mortality stats release along with a related bulletin and covid19 deaths breakout collection:

I don’t think these stats are even labelled as “experimental”, so from my humble position I think the ONS should be commended on the way they’ve managed to pull this new release together so quickly, albeit with a lag in the numbers that results from due process around death certificate registration etc.

One thing that is perhaps unfortunate is that the NHS weekly winter sitreps stopped a few weeks ago; these stats track several hospital and critical care related measures at the hospital level, but they’re only released for a few months of the year. Whilst continuing the release would have added a burden, I think a lot of planners may have found them useful. (I hope the planners who really need them have access to them anyway.) By the by, some daily data collections relating to managing “the thing” were described in a letter from the PHE Incident Director at Public Health England’s National Infection Service on March 11th.

I also note the suspension of various primary care and secondary care data collections, which means that spotting various side effects of the emergency response activity may not be obvious for some time.

As far as the ONS go, they have published their own statement on ensuring the best possible information during COVID-19 through safe data collection as well as a special ONS — Coronavirus (COVID-19) landing page for all related datasets they are able to release.

[April 2nd, 2020: the ONS have also started publishing several more “faster” society and economic indicators.]

Pondering the ONS response, I started wondering whether there’s a history anywhere of the genesis and evolution of each ONS statistical measure. In my “listen to the online meeting as radio while doing an unrelated web trawl”, I turned up a few potential starting points relating to the history of official stats, including this presented paper on the evolution of the United Kingdom statistical system, which looks like it might have appeared in published form in this (special issue?) of the Statistical Journal of the IAOS – Volume 24, issues 1-2.

The Royal Statistical Society’s StatsLife online magazine also has a History of Statistics Section tag which pulls up more possibly useful starting points…

Broadening my search a little, I also found this briefing on sources of historical statistics from one of my favourite sources ever, the House of Commons Library; in turn this led to a more general web search on "sources of statistics" site:commonslibrary.parliament.uk/research-briefings which turns up a wealth of briefings on sourcing UK stats in particular subject areas.

And finally, for anyone out there who does have proper skills in the area and the ability to commit resource, there are various initiatives out there looking for volunteers. In particular:

Others with less specific or specialist skills might consider the many other opportunities for technical sprints in communities that typically lack access to developer cognitive surplus. For example, this initiative exploring and developing Digital Tools for churches during the Coronavirus. (Don’t let “churches” put you off: for “church” read “body of people” or “community” and go from there…)

Rethinking: Distance Education === Bring Your Own Device?

In passing, an observation…

Many OU modules require students to provide their own computer subject to a university wide “minimum computer specification” policy. This policy is a cross-platform one (students can run Windows, Mac or Linux machines) but does allow students to run quite old versions of operating systems. Because some courses require students to install desktop software applications, this also means that tablets and netbooks (eg Chromebooks) do not pass muster.

On the module I work primarily on, we supply students with a virtual machine preconfigured to meet the needs of the course. The virtual machine runs on a cross-platform application (Virtualbox) and will run on a min spec machine, although there is a hefty disk space requirement: 15GB of free space required to install and run the VM (plus another 15-20GB you should always have free anyway if you want your computer to keep running properly, be able to install updates etc.)

Part of the disk overhead comes from another application we require students to use called vagrant. This is a “provisioner” application that manages the operation of the VirtualBox virtual machine from a script we provide to students. The vagrant application caches the raw image of the VM we distribute so that fresh new instances of it can be created. (This means students can throw away the working copy of their VM and create a fresh one if they break things; trust me, in distance edu, this is often the best fix.)

One of the reasons why we (that is, I…) took the vagrant route for managing the VM was that it provided a route to ship VM updates to students, if required: just provide them with a new Vagrantfile (a simple text file) that is used to manage the VM and add in an update routine to it. (In four years of running the course, we haven’t actually done this…)

Another reason for using Vagrant was that it provides an abstraction layer between starting and stopping the virtual machine (via a simple commandline command such as vagrant up, or desktop shortcut that runs a similar command) and the virtual machine application that runs the virtual machine. In our case, vagrant instructs Virtualbox running on the student’s own computer, but we can also create Vagrantfiles that allow students to launch the VM on a remote host if they have credentials (and credit…) for that remote host. For example, the VM could be run on Amazon Web Services/AWS, Microsoft Azure, Google Cloud, Linode, or Digital Ocean. Or on an OU host, if we had one.

For the next presentation of the module, I am looking to move away from the VirtualBox VM and into a Docker container†. Docker offers an abstraction layer in much the same way that vagrant does, but using a different virtualisation model. Specifically, a simple Docker command can be used to launch a Dockerised VM on a student’s own computer, or on a remote host (AWS, Azure, Google Cloud, Digital Ocean, etc.)

We could use separate linked Docker containers for each service used in the course — Jupyter notebooks, PostgreSQL, MongoDB, OpenRefine — or we could use a monolithic container that includes all the services. There are advantages and disadvantages to each that I really do need to set down on paper/in a blog post at some point…

So how does this help in distance education?

I’ve already mentioned that we require students to provide a certain minimum specification computer, but for some courses, this hampers the activities we can engage students in. For example, in our databases course, giving students access to a large database running on their own computer may not be possible; for an upcoming machine learning course, access to a GPU is highly desirable for anything other than really simple training examples; in an updated introductory robotics module, using realistic 3D robot simulators for even simple demos requires access to a gamer level (GPU supported) computer.

In a traditional university, students who can’t run particular software on their own machines may have physical access to computers and computer labs running pre-installed, university licensed software packages on machines capable of supporting them.

In my (distance learning) institution, access to university hosted software is not the norm: students are expected to provide their own computer hardware (at least to minimum spec level) and install the software on it themselves (albeit software we provide, and software that we often build installers for, at least for users of Windows machines).

What we don’t do, however, is either train students in how to provision their own remote servers, or provide software to them that can easily be provisioned on remote servers. (I noted above that our vagrant manager could be used to deploy VMs to remote servers, and I did produce demo Vagrantfiles to support this, but it went no further than that.)

This has made me realise that we make OUr distance learning students pretty much wholly responsible for meeting any computational needs we require of them, whilst at the same time not helping them develop skills that allow them to avail themselves of self-service, affordable, metered remote computation-on-tap (albeit with the constraint of requiring a network connection to access the remote service).

So what I’m thinking is that now really is the time to start upskilling OUr distance learners, at least in disciplines that are computationally related, early on and in the following ways:

  1. a nice to have — provide some academic background: teach students about what virtualisation is;

  2. an essential skill, but with a really low floor — technical skills training: show students how to launch virtual servers of their own.

We should also make software available that is packaged in a way that the same environment can be run locally or remotely.

Another nice to have might be helping students reason about personal economic consequences, such as the affordability of different approaches in their local situation, which is to say: buying a computer and running things locally vs. buying something that can run a browser and run things remotely over a network connection.

As much as anything, this is about real platform independence, being open as to, and agnostic of, what physical compute device a student has available at home (whether it’s a gamer spec desktop computer or a bottom of the range Chromebook) and providing them with both software packages that really can run anywhere and the tools and skills to help students run them anywhere.

In many respects, using abstraction layer provisioning tools like vagrant and Docker, the skills needed to run software remotely are the same as those needed to run it locally, with the one-off additional overhead of signing up to a remote host and setting up credentials that allow the provisioner service running on their local machine to access the remote service.

Simple 2D ev3devsim Javascript Simulator Running as an ipywidget in Jupyter Notebooks

So…

…for a course revision upcoming, I’ve been tweaking a thing.

The thing is ev3devsim [repo], a Javascript powered 2D robot simulator that allows you to execute Python code, via Skulpt, in the browser to control a simple simulated robot.

The Python package used to control the robot is a skulpt port of ev3dev-lang-python, a Python wrapper for the ev3dev Linux distribution for Lego Ev3 robots. (Long time readers may recall I explored ev3dev for use in an OU residential school way back when and posted a few related notebooks.)

Anyway… we want to use Python in the module revision, the legacy activities we want to update look similar, ish, sort of, almost, we may be able to use some of them, and I want to do the activities via Jupyter notebooks.

So I’ve had a poke around and think I’ve managed to make the fumblings of a start on an ipywidget wrapper for the simulator that will allow us to embed it in a notebook.

Because I don’t understand ipywidgets at all, I’m using the jp_proxy_widget, which I first played with in the context of wrapping wavesurfer.js (Rapid ipywidgets Prototyping Using Third Party Javascript Packages in Jupyter Notebooks With jp_proxy_widget).
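For anyone unfamiliar with it, the jp_proxy_widget pattern is essentially: create a proxy widget, load some Javascript into it, then initialise that Javascript against the widget’s DOM element. A minimal sketch follows; the file path and element setup are purely illustrative, not the actual nbev3devsim wiring:

import jp_proxy_widget

sim = jp_proxy_widget.JSProxyWidget()

# Load the simulator Javascript into the widget context (illustrative path)
sim.load_js_files(["js/ev3devsim.js"])

# Initialise the simulator against the widget's DOM element
sim.js_init("""
    element.empty();
    element.html('<div class="simulator"></div>');
    // ...wire the simulator up to the div from here...
""")

sim  # display the widget in the notebook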

Here’s where I’m at [nbev3devsim; Binder demo available, if the code I checked in works!]:

The first thing to notice is that the terminal has gone. The idea is that you write the code in a code cell and inject it into the simulator. My model for doing this is via cell block magic or by passing code in a variable into the simulator (for generality, I should probably also allow a link to a .py file).

The cell block magic still needs some work, I think; eg a temporary alert with a coloured background to say “code posted to simulator” that disappears on its own after a couple of seconds. I probably also need an easy way to preview the code currently assigned to the simulated robot.
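For what it’s worth, the shape of the cell magic is something like the following sketch; sim stands in for the widget instance and set_code is a placeholder for whatever method actually pushes code into the simulator, rather than the real nbev3devsim API:

from IPython.core.magic import register_cell_magic

@register_cell_magic
def sim_magic(line, cell):
    """Send the contents of a code cell to the simulated robot."""
    # `sim` is assumed to be the simulator widget already in scope;
    # `set_code` is a placeholder for the method that injects the code.
    sim.set_code(cell)

Running %%sim_magic at the top of a code cell would then post the cell contents to the simulator.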

You might also notice a chart display. This is actually a plotly streaming line chart that updates with sensor values (at the moment, just the ultrasound sensor; other sensors have different ranges, so do I scale those, or use different charts perhaps?)

There is also an output window your code can print messages to, as the following hello-world magic shows:

We can read state out of the simulator, though the way the callbacks work, this seems to require running code across two cells to get the result into the Python environment?

I’ve also experimented with another approach where the widget’s parent object grabs (or could be regularly updated to mirror, maybe?) logged sensor readings from inside the simulator, which means I can interrogate that object, even as the simulator runs. (I also started to explore using a streaming dataframe for this data, but I’m not convinced that’s the best approach; certainly trying to stream logged data from the simulator into a streaming chart in the notebook context is laggy compared to the chart embedded in the simulator context.)

With the data in the Python context, we can do what we like with it, like chart it, etc.

There are a lot of tweaks that need to be made and things to be added to run the full complement of activities we ran in the original presentation of the course.

I’d already started to explore what’s required to add Python functions to Skulpt (eg Simple Text to Speech With Skulpt), although I’m not sure if that call is blocking (could it be handled asynchronously if so?). Today I also managed to learn enough from this SO answer on making objects draggable to make the robot draggable on the canvas (I think; a PR is here but not tested in a fresh/isolated environment yet; I only made the PR to give me s/thing to link to here!). The biggest issue I had was converting mouse co-ordinates to robot world canvas co-ordinates. There are still issues there, eg in getting the size of the robot right, but the co-ordinate management in the simulator looks a bit involved to me, and I want to get my head round it enough that if I do start trying to simplify things, I don’t break other things!

Other things that really need adding:

  • ability to reset canvas in one go;
  • ability to rotate robot using mouse;
  • ability to add noise to motors and sensors;
  • configure robot from notebook code cell rather than simulator UI? (This could also be seen as an issue about whether to strip as much out of the widget as possible.)
  • predefine sensible robot configurations; (can we also have a single, centreline front mounted light sensor?)
  • add pen-up / pen-down support (perhaps have a drawing layer in the simulator for this?)
  • explore supporting multiple simulators embedded in one notebook (currently it’s at most one, I suspect in large part because of specific id values assigned to DOM elements?)

The layout is also really clunky, the main issue being how to see the code against the simulator (if you need to). Two columns might be better — notebook text and code cells in one, the simulator windows stacked in the other? — but then again, a wide simulator window is really useful. A floating / draggable simulator window might be another option? I did think the simulator window might be tear-offable in JupyterLab, but I have never managed to successfully tear off any jp_proxy_widget in JupyterLab (my experiences using JupyterLab for anything are generally really miserable ones).

The original module simulator allowed you to step through the code, but: a) I don’t know if that would be possible; b) I suspect my coding knowledge / skills aren’t up to it; c) I really should be trying to write the activities, not sinking yet more time into the simulator. (One thing I do need to do is see if any of the code I wrote years ago when scoping things for the residential school is reusable, which could save some time…)

I also need to see if the simulator is actually any good for the practical activities we used in the original version of the course, or whether I need to write a whole new set of activities that do work in this simulator… Erm…

Notes in Advance of a Meeting About The Possibility of Getting an Institutional Jupyter Server Up and Running

Notes on Jupyter Deployments in an OU Context.

Notes made in advance of an internal workshop to discuss supporting “Jupyter notebooks” in the OU.

My intention is to split the content over several documents.

Needless to say, I think “notebooks”: a) are not really the point; and b) offer way more potential for doing all sorts of things than folk might think.

(The markdown source for these notes can be converted to a Word document using pandoc: pandoc -o output.docx -f markdown -t docx filename.md)

The content is split into three main sections:

  • Current Jupyter Deployments in the OU
  • Architectural Models
  • Use Cases for Jupyter Services

Current Jupyter Deployments in the OU

Jupyter notebooks are already used in several OU courses. The following summarises what I’ve managed to learn, or imagine may be the case. Folk involved with the respective modules may well disagree.

TM351

TM351 is a 30 point, third level module on data management and analysis. Approximately 50% of study is spent working on practical activities delivered via Jupyter notebooks.
Essential requirements:

  • notebook server that allows students to complete notebook based activities in a Python environment;
  • ability to create and run new Python backed notebooks;
  • pre-configured Python computational environment with preinstalled packages (Python and Linux package dependencies);
  • access to PostgreSQL server with permissions to add and delete users, roles, databases, tables; read and write to tables from notebooks;
  • access to MongoDB server with read permissions on a seeded database; permissions to create / read / write / delete databases and collections from notebooks;
  • access to OpenRefine application;
  • access to public internet addresses in order to download data files from arbitrary URLs;
  • ability to persist notebooks;
  • ability to access all service GUIs through a web browser;
  • ability to work on a student’s own computer in an offline mode;
  • ability to work cross-platform (Windows, Mac, Linux).
Desirable Requirements:

  • ability to backup and restore databases;

  • ability to take away the complete computing environment so it can be run ex- of the OU;
  • ability to access the environment on a remote host (eg an OU hosted solution); (this solution would in turn create requirements based on scaleability, affordability, peak load, resource (processor, memory, storage, bandwidth), uptime etc.)
  • ability to access the environment from a terminal / command line;
  • side effect free on user desktop (i.e. the environment should not clash with any services already running on the student’s computer; the environment should not require changes to the student’s computer; any applications installed should be capable of being uninstalled cleanly).

Optional Requirements:

  • headless operation (access to desktop applications inside the provided environment is not required);
Solution (16B, 16J-19J):

A VirtualBox virtual machine (VM) managed using vagrant provides a self-contained, preconfigured environment running all required applications. User files are mounted into the VM from the user desktop and saved out of the VM back onto the desktop.

Proposed Solution (20J+):

A Docker container (rather than VirtualBox VM) built using repo2docker, runnable via Docker Desktop / ContainDS on a student computer or on a remote host either as a standalone service or via JupyterHub.

Requirements not met:

  • side effects (VirtualBox and vagrant must both be installed; on Windows, virtualisation may need enabling, HyperV needs disabling. A fix to the latter would be to ship a VM built natively for HyperV);
  • hosted solution: an OU hosted solution is not available. DIY solutions for students to self-host on Azure, AWS, Digital Ocean are provided.

TM112

TM112 is a 30 point, first level module that in part provides an introduction to Python programming. Whilst the Python environment used for most of the activities is a simple, user-installed Python environment without embellishment, an optional “notebook experience” activity is also provided to deliver enrichment activities.
Essential requirements:

  • ability to run provided Jupyter notebooks within a single study session (no requirement to persist changed notebooks);
  • install-free / hosted solution;
  • scaleable (capable of coping with peak demand: 1k concurrent users);
  • affordable.

Architectural Models

> Use case example: a Docker container running a Jupyter server provides authenticated access to a GUI based Java application via a web browser. The Java application runs on the virtualised desktop and is proxied by the notebook server using the jupyter-desktop-server extension.

> Use case example: a Jupyter server provides browser based access to a Windows desktop application running under Wine on an XFCE desktop. The application is launched from a Jupyter notebook UI and proxied via the jupyter-desktop-server extension.

JupyterHub

JupyterHub is a multi-user service that can provide authenticated access to personalised computational environments for multiple users. Each user may also be provided with their own persistent account managed by the server.

A range of authentication schemes are supported, including OAuth, LTI, login from Github etc. (TM112 uses LTI to give students single sign-on access, from a Moodle VLE web page, to an authenticating JupyterHub server that launches temporary notebook servers. Contact Rod Norfor for technical details.)

The JupyterHub server provides access to a range of computational environments through environment spawners. For example, the DockerSpawner will launch a Docker container in response to a user login that runs a personalised computational environment in a Docker container.
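By way of illustration, a minimal jupyterhub_config.py along the following lines would give LTI-authenticated users a throwaway container spun up from a prebuilt module image; the LTI consumer key/secret are placeholders and the image tag is illustrative rather than an actual OU configuration:

# jupyterhub_config.py (sketch)
c.JupyterHub.authenticator_class = "ltiauthenticator.LTIAuthenticator"
c.LTIAuthenticator.consumers = {"VLE_CONSUMER_KEY": "VLE_SHARED_SECRET"}  # placeholders

c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "ousefuldemos/tm351-binderised:latest"  # illustrative image tag
c.DockerSpawner.remove = True  # throw the container away when the user session ends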

JupyterHub can be configured to provide users with a range of alternative possible environments, so a student could log in and be presented with options to launch different environments relating to different modules, for example.

JupyterHub can scale a service offering with increasing numbers of users using a well supported and well proven Kubernetes deployment model. (The TM112 JupyterHub server uses Kubernetes on Microsoft Azure to service the required number of students in a scaleable way.)

Whilst JupyterHub nominally expects to manage launched environments via a Jupyter notebook server running in the environment, a recent community contribution (jhsingle-native-proxy) allows arbitrary containerised web applications to be launched and managed from a JupyterHub server.

Architecturally, JupyterHub can launch individual Jupyter notebook servers, notebook servers then launch notebooks, and notebooks launch kernels.

Jupyter Enterprise Gateway Server

The Jupyter Enterprise Gateway Server is a middleware service, originally developed by IBM, that provides the ability to launch kernels on behalf of remote notebooks in a scaleable way (eg scaling for large numbers of users; allowing kernels to run with different amounts of computational resource (CPUs, GPUs, memory etc)).

One possible architectural model would be for a JupyterHub server to provide multi-user access to the Jupyter environment, and JupyterHub to launch kernels via the Jupyter Enterprise Gateway Server.

Alternatively, a student running their own personal Jupyter notebook server at home on a computer with limited computational resource could use an institutional Jupyter Enterprise Gateway Server to launch a kernel on a well provisioned server, eg one with a large amount of memory and a GPU.

(At the moment, I don’t think the same personal Jupyter server can launch kernels locally as well as via a Jupyter Enterprise Gateway server; I think the provisioner is one or the other.)
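In configuration terms, pointing a local notebook server at a remote gateway is little more than a couple of settings (or the equivalent --gateway-url switch on the command line); a sketch, with an illustrative gateway URL:

# jupyter_notebook_config.py (sketch)
c.GatewayClient.url = "http://enterprise-gateway.example.ac.uk:8888"  # illustrative host
c.GatewayClient.request_timeout = 120.0  # allow time for remote kernels to start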

BinderHub

BinderHub is a variant of JupyterHub that allows an unauthenticated user to launch a containerised Jupyter notebook server, on demand, built according to a linked-to specification in a remote repository.

The most common way of using BinderHub is to create a Github repository containing environment definition files as well as user files (eg Jupyter notebooks) and then use MyBinder (a free and open federated Binderhub service) to build a Docker image based on the contents of the repository. Once built, or if a cached version already exists, MyBinder spawns a Docker container from the image and serves it to the user.

Currently, BinderHub provides only a “temporary” service – the container is built, deployed, served to the user, and then destroyed at the end of the session. However, one of the Binder Federation nodes does have an experimental persistent BinderHub deployment that provides authenticated user access and a persistent user file area. Users can launch Binder containers from their account, save files to their account, and share their own files into their launched Binder containers.

Binderhub / MyBinder is also used as an ad hoc provider of computational environments for a variety of online “interactive textbooks” and online courses. For example, Jupyter Book and “the spacy course”, as well as the LibreTexts interactive book platform. Published as HTML websites, the contents of code cells embedded (and editable) within the HTML page can be executed against a remotely launched MyBinder kernel, with the result of the computation returned to the page and displayed within it.

Several Javascript packages (thebelab.js, juniper.js) exist to “enable” the code cells in an HTML page and manage the MyBinder connection.

nbgallery

nbgallery is not an official Jupyter project but it does provide a range of interesting features that are worth exploring if we want to be open-minded about what sort of user environment we want to use to provide people with access to Jupyter notebooks.

The nbgallery application (TH review and video review) was developed by the US Department of Defense and the NSA to provide multi-user access to a wide range of notebooks. The gallery provides search tools over a wide collection of notebooks and allows users to rate and review notebooks. Users can launch notebooks in a connected Jupyter environment. A healthcheck facility checks that code cells execute as expected (and if not may flag maintenance or student difficulty issues).

Exploring the use of nbgallery either as a social application, visited by all students, or as a personal application, used by a student to access notebooks in a personal environment, may turn up an approach to providing access to notebooks that is useful not just for linear courses, but also for resource based / problem based learning courses.

Institutional Vs Local Provision

Whilst it is possible to consider Jupyter mediated environments in either the local user context (eg TM351 students using their own VMs) or the institutional context (for example, TM112 students launching temporary notebook servers), I think there is most to be gained from considering them as two different ways of exposing students to the same computational environments.

For example, consider the following three situations:

1) TM351 students run a Virtualbox virtual machine containing multiple “personal” servers: a Jupyter notebook server, a Postgres database server; the student “owns” all services and all services are integrated.

2) TM351 students accessing a virtualised Jupyter environment for notebook access, and logging in to a shared Postgres database server. The student does not own the computational environment provided by the notebook server, though they may be able to export their files; nor do they own their database server: they are one of many users accessing the same service, though they may be able to export a dump of their database contents.

3) a TM351 Docker environment is defined in a public repository such as innovationOUtside/tm351vm-binder. The definition is public and can be shared (owned, edited) by anyone. The repository can be launched using a BinderHub instance and used to provide temporary access to an integrated, personal TM351 environment running a personal Jupyter notebook server as well as a Postgres server. Continuous integration tools build a Docker container image from the repo and push it to Docker Hub (ousefuldemos/tm351-binderised). An institutional JupyterHub server allows students to seamlessly log in from the VLE and launch the TM351 environment pulled from Docker Hub, either directly or via an institutional Jupyter Enterprise Gateway Server. A student with a powerful computer at home installs Docker and launches their own local container instance of the TM351 environment pulled from Docker Hub. Perhaps more conveniently, they use the ContainDS desktop application to launch the container locally, again either pulling the prebuilt image from Docker Hub, or building a version themselves either directly from the original repository or from a local clone of it. (ContainDS greatly simplifies the practicalities of running Dockerised notebook servers on the desktop, providing a useful graphical user interface for managing containers, managing server authentication tokens, mounting files from the desktop into the container, etc.)

In the third case, the same environment definition is used to:

  • build and deploy temporary environments on MyBinder;
  • build a public image deposited on Docker Hub;

The public image on Docker Hub is then used to:

  • deploy environments from an institutional JupyterHub service;
  • deploy local environments on the students’ own desktop.

In each case, students gain personal access to a commonly defined environment and have “ownership” of all the services running inside the environment. Students are free to take away the environment and use it in other contexts, or access it solely via hosted solutions. (There is an issue of synchronising user files and environment updates across multiple services if a student works in that way, eg sometimes at home on their computer, sometimes from their desk on an OU remote host, etc.)

Use Cases for Jupyter Services

Jupyter environments can be used to support a range of activities across the institution, most notably:

  1. Delivering interactive teaching materials to students (module delivery);
  2. Authoring teaching materials (module production);
  3. Supporting computational academic research (research);
  4. Disseminating academic research (reproducible research publications) (research publishing);
  5. Supporting institutional data analysis and reporting (business analytics)

A sensible question to ask is: “what benefits or differences does a ‘Jupyter solution’ bring to each of these activities?” So let’s quickly review them:

Delivering Interactive Teaching Materials to Students

We have been using Jupyter notebooks, since before they were even called Jupyter notebooks, to deliver 4 hours of teaching per week to students on TM351. Over that time, the pedagogy has delivered but is still largely unexplored.

The notebooks we wrote then are not the notebooks we would write now. The notebooks we might write now are not the notebooks we could write if we spent some time exploring them properly as a medium for both teaching and learning, eg considering how they might be used to support formative assessment, summative assessment (touched on in TM351), automated testing / grading, personal note taking and portfolio development.

Computation supports interactivity in two ways:

  • it supports the execution of provided code and as such can be used to create what are effectively end user applications;
  • it allows students to create and execute their own code, for whatever purpose.

The Jupyter environment can support both use cases.

Authoring Teaching Materials

Irrespective of whether notebooks are used to deliver teaching to students, they can be used to develop teaching materials, interactive and otherwise, in a direct authoring way.

For example, document conversion tools allow authored notebooks to be rendered in a variety of formats: as .docx word processor documents, as .pdf files, as simple .md markdown/text files, as HTML pages.
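For example, a few lines of nbconvert code (or the equivalent jupyter nbconvert command) will render an authored notebook as a standalone HTML page; the filenames here are illustrative:

import nbformat
from nbconvert import HTMLExporter

# Load an authored notebook and render it as a standalone HTML page
nb = nbformat.read("week1_activity.ipynb", as_version=4)  # illustrative filename
body, resources = HTMLExporter().from_notebook_node(nb)

with open("week1_activity.html", "w") as f:
    f.write(body)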

The notebook user interface is web based and, via notebook extensions, supports WYSIWYG editing as well as direct editing of markdown and HTML text. Mathematical and chemical equations written in LaTeX are rendered natively by the notebook.

A wide range of display methods allow rich media assets to be embedded in the text by simply wrapping a file reference (a local file reference or a web URL) that points to the object: images (Image(URL)), videos (Video(LOCAL_FILE)) and audio clips (Audio(URL)) are all readily embedded in the document, for example, as are more complex objects such as interactive maps.
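In code terms, that’s as simple as something like the following (the URLs and file paths are invented for the purposes of illustration):

from IPython.display import Image, Video, Audio, display

display(Image("https://example.org/diagrams/circuit.png"))  # image from a web URL
display(Video("media/activity_walkthrough.mp4"))            # video from a local file
display(Audio("media/interview_clip.mp3"))                  # audio clip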

Code, often little more than a single simple line, can also be used to generate media outputs, from rich interactive embedded javascript applications to simple charts and tables.

More ambitious authors may choose to create their own asset generating code, or even go so far as to create de facto end user applications within the notebook context.

Using code to generate charts and tables has benefits for module maintenance, because charts are generated from source datasets or equations. If they need to be updated, a simple change to the code or data file is all that’s required for the rendered asset to be updated.
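As a trivial sketch of what that means in practice (the dataset path and column names are invented), a maintained chart might be no more than:

import pandas as pd

# Rebuild the chart from the source dataset: updating the CSV automatically
# updates the rendered asset the next time the notebook is run
df = pd.read_csv("data/registrations.csv")  # illustrative dataset
ax = df.plot(x="year", y="registrations", title="Registrations by year")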

The executed notebook can then be exported as a final document (as interactive HTML or ePub, as a flat PDF or docx, etc). (Of course, not all renderings will be as rich as the original interactive notebook form or its HTML converted form.)

I was hoping to make more progress on openlearn-publish-test to demonstrate how we could use Jupytext to support direct authoring and rich / interactive updating of converted OpenLearn OU-XML content in a notebook UI, but I’ve run out of time and only got as far as how to get the OpenLearn content into a notebook enabled environment.

Supporting Computational Academic Research

Lots of academics write ad hoc code in their research; lots of academics make notes around their ad hoc code; lots of academics use code for exploration; notebooks are a powerful environment that lets you do each of those in the context of all the others.

Jupyter notebooks are arguably not the best environment for developing research software packages, although the provision of computational environments may support that activity. However, workflows are emerging that do better support traditional software engineering / code development practices. For example, tools such as Jupytext provide support for working with simple text document formats (.py and .R files, for example) directly within the Jupyter notebook environment.

Disseminating Academic Research

An increasing number of academic journals require researchers to deposit reproducible code scripts with their submissions.

Several journals are exploring the use of Jupyter notebooks as a first-class document format for submitting papers, as well as developing review and comment tools around them.

Supporting Institutional Analysis and Reporting

Lots of financial companies use notebooks as an analysis environment. The Ministry of Justice moved to a Jupyter fronted platform (MoJ Analytical Platform) for their analysts.

A recent OU job ad for Head of Data Analytics identified skills in things like Python and R. Such folk might reasonably expect to use Jupyter notebooks for their analysis and reporting. One of the things not covered in this review are the rich interactive dashboarding tools that Jupyter ecosystem supports (eg Voilà).

Appropriating OpenLearn Content and Republishing Edited Versions Of It Via a “Simple” Automated Text Blogging Workflow

I had intended to use my (unpaid) strike days to catch up with some books and harp practice, and maybe even the garden, and keep away from the keyboard; or failing that, to have a push on my rally data tinkering and get another LeanPub book started to try to reboot the £50 a quarter or so my previous publication (Wrangling F1 Data With R) generated, which keeps things like recurring Dropbox and Flickr etc etc charges covered (no-one has ever bought me a KoFi, as far as I can tell…).

And I was determined not to do any of the mounting workload associated with the day job, no matter how much fun some of it is likely to be (like getting Ev3devSim working as an ipywidget in Jupyter notebooks).

Whilst I did manage to stick to the “determined not to” path, I never even really started down the intended one, instead spending hours and hours in front of the keyboard trying to hack something together around my OpenLearn publishing workflow.

So here’s what I’ve come up with…

An OpenLearn Unit Text Publishing Thing

Firstly, it’s a thing that lets you grab the “source” content of an OpenLearn unit (at least, some of it; I still haven’t got round to grabbing things like video files or audio files, or scraping PDFs etc.) and churn it into a simple text format, markdown, which looks like this:

Headers are prefixed with a #, you can emphasise things by wrapping them in * characters, eg *italics* -> italics, or **double them up** for strong emphasis. Embedding links — [link text](link/path/file.html) — and images — ![Alt text](path/to/image.file) — is also pretty easy when you get the hang of it.

So how do you get started? First, you need a Github account (sign up here; just get one: you’re not going to have to do any hard Github stuff, you’re just making use of their free hosting). Get one, and sign in.

Second, visit my demo repo — psychemedia/openlearn-publish-test — (the URL will change at some point, but I’ll archive the original and link the new address from it…) and grab a copy of your own repo from mine by clicking the big green Use this template button:

You’ll be presented with a form:

Give your repo a name (no spaces). Optionally add a description. Keep the repo public. And click the big green Create repository from template button.

Things will churn for a moment or two:

And then you’ll have your own repo, containing a copy of the files in mine:

Behind the scenes, there is work going on…

Click the Actions tab on your copy of the repo to see what…

At first, it may look like nothing… but wait a moment or two and refresh the page:

A couple of actions will start running to initialise, and customise, your repo for you.

When the actions are done, you’ll be informed… (you shouldn’t have to refresh the page, the status indicators should update when things are done…):

If you go back to your repo homepage, you’ll see it’s been updated with a new README that’s slightly different to the original copy from my repo, and that has been personalised to yours:

So… now you can grab some OpenLearn content into your repo.

Click on the SET_UP.md file link in your repo:

You will be presented with a list of units on OpenLearn.

Find one you like the look of and click the Grab Unit into this repo link:

This will open a new issue for you in the Issues tab of your repo, and prepopulate it with a title that will tell a Github Action you want to grab some OpenLearn content, and an issue body that tells the action where the unit can be found.

Click the Submit new issue button to get things started.

Back in the Actions tab, you can see the helper elves have started doing their thing again…

If you click on a running Action, you can check its progress in more detail:

Click through on the actual job name to see what’s happening inside:

You can expand a step by clicking the arrow to see what each step is doing or has already done…

If you read through the steps, you’ll see several things are done: for example, we grab some OUXML (the OpenLearn content), convert to markdown, build some HTML files (these are what gets published), and deploy them, then build a LaTeX version of the material (which is used to generate a PDF), and an ePub ebook. (The LaTeX step takes some time; I should perhaps simplify things so that only the HTML build is done by default.)

When the Actions are green circled / green ticked and done (which may take a few minutes…), or at least, when the Deploy HTML to gh-pages step has run, go back to the repo home page, where you should see a new commit has been made to your repo:

If you click into the content folder you’ll see one or more session folders:

If you click into a session folder, you see some markdown files:

If you click on one of those, you’ll see some scraped and converted OpenLearn content:

So the content has been grabbed from OpenLearn and saved to your repo.

But that’s not all.

If you scroll down on your README page (I really should make this link more prominent in the README…) you’ll see a link to a github.io site published from your repo:

Click it…

If you see a “404” page not found error, don’t panic…

On the repo home page, select the Settings tab:

and scroll down to the Github Pages area:

Change the Source from gh-pages branch to master branch:

And then, select the master branch:

And set the Source back to the gh-pages branch:

When you see something like this, you knows all good to go:

Note that cacheing of a previous build of the site may last for up to 10 minutes, so grab yourself a cup of tea, or perhaps look through the markdown files in the content directory, or even go back to the Actions tab to see whether the actions have completed.

If the Actions have completed, select the OpenLearnXML2 action (or a completed nbsphinx publisher action if you have committed your own changes to the markdown files) and you should see an Artifacts download available.

Download and unzip the artifacts file. If the build process has been able to build a PDF file and/or an ePub file from the content, it will be found in the unzipped download directory.

Right… time to try your github.io site link again:

An OpenLearn Editing Thing

This will have to be in a part two to this post… I’ve run out of time for now and need to get back to the day job…

If you are itching to get started, this may work, if I’ve got my autopublishing things fixed…

In the content folder (on the default master branch of the repo),  find the markdown file you want to edit, and click on the pencil icon to open the editor:


Edit the file / make the changes you want, and commit it (you may want to set a meaningful commit message title summarising the changes, and perhaps even a longer description about the motivation for the changes, but both are optional…):


Click the big green Commit changes button to commit the changes. If you look in the Actions tab, you should see that an nbsphinx publisher action has started that should publish your changes to your site.


Note that even when the publishing action has generated and pushed updated site pages to where they need to be, the site may take a few minutes to update because of page cacheing on the Github site.

The Future

One of the spinoffs of this for me was the realisation that I could use Github Actions to run arbitrary code in response to particular events, such as… commits or issue postings. The current machinery uses a Sphinx / nbSphinx publishing route, but I’ve also started exploring a recipe for Jekyll based Jupyter Book publishing. (Next on the to do list will be an [Executable Book project / MyST workflow](https://ebp.jupyterbook.org/en/latest/); I also need to split out the workflows into actions of their own, but I haven’t figured out how to do that for myself yet.) It strikes me that I could bundle all these in the same repo with some way of flagging which build process I want to use. This would allow the user to then republish their material using the publishing tool, and its various peculiarities, customisations and affordances, of their choice.

Immediate to dos, that may not happen because I’m the only user, I know it’s possible, and I’m not that interested, are to: make the Github Pages / published site link more prominent in the README; get movie and audio downloads and embeds working. Also a way of handling PDFs linked from the OpenLearn materials, and perhaps extracting text from those, even, to support republishing…

On the publish side, it would be useful to be able to publish to HTML only by default, with some optional way of invoking the PDF and ePub builds. The ePub build also needs things like title and author setting. The PDF build sometimes breaks, eg due to the inability to detect a bounding box size round a gif image. I maybe need to use another PDF generator, eg some hints here.

I also need to refactor the code, two ways: firstly, a simplification that uses the bare minimum of packages and just churns the markdown direct from XML in one simple step. Secondly, fixing the current workflow, which stages the XML in a SQLite database, so that the database can properly handle content from multiple units and I can reliably churn the md from the database for any single unit. At the moment, I think things pretty much assume there’s content from just a single unit in the database. Putting the md into the db might be useful too… Then I could imagine a datasette powered publishing route too…

As a recent tweet from Martin Hawksey reveals, he’s been blogging about how we can turn Google’s App Script to our own purposes as a hosted code runner, and I think Github now provides a similar opportunity for anyone who wants to appropriate it to that end…

OpenLearn OER (Re)Publishing the Text Way

In response to a provocation, I built a thing that will let you grab an OpenLearn unit, convert it to a simple text format, and publish it on your own website.

[For the next step in this journey, see: Appropriating OpenLearn Content and Republishing Edited Versions Of It Via a “Simple” Automated Text Blogging Workflow.]

It doesn’t require much:

  • if you haven’t got one already, create a Github account (just don’t “ooh, Github, that’s really hard, so I won’t be able to do it…”; just f***ing get an account);
  • visit my repo and read down the page to see what to do…

And what to do essentially boils down to:

As for changing the content – it’s not that hard once you’ve done it a few times; just go with the flow of writing what feels natural… “Easy” to edit text files are in the content directory and you can edit them via the Github website.

Merging Several Binder Configurations

As more and more repositories start to incorporate MyBinder / repo2docker build specifications, more and more building blocks start to appear for how to get particular things running in MyBinder. For example, I have several ouseful-template-repos with various building blocks for getting different databases running in MyBinder, and occasionally require an environment that also loads in a Jupyter-server-proxied application, such as OpenRefine. Other times, I might want to pull in the config for a particularly fast install, or merge configs someone else has developed to run different sets of notebooks in the same Binderised repo.

But: a problem arises if you want to combine multiple Binder specifications from various repos into a single Binder setup in a single repo – how do you do it?

One way might be for repo2docker to iterate through multiple build steps, one for each Binder specification. There may be clashes, of course, such as conflicting package versions from different specifications, but it would then fall to the user to try to resolve the issue. Which is fine, if Binder is making a best attempt rather than guaranteeing to work.

Assuming that such a facility does not exist, it would require updates to repo2docker, so that’s not something we can easily hack around with ourselves. So how about something where we try to combine the contents of multiple binder/ setup directories ourselves? This is something we can start to do easily enough, and as a personal tool it doesn’t necessarily have to work “properly” or “for everything”: for starters, it only has to work with what we want it to work with. And if it only gets 80% of the way to a working combined configuration, that’s fine too.

So what would we need to do?

Simple list files like apt.txt and requirements.txt could simply be concatenated together, leaving it up to apt or pip to do whatever they do with any clashes in pinned package versions, for example (though we may want to report possible clashes, perhaps via a comment in the file, to help the user debug things).

In a shell script, something like the following would concatenate files in directories binder_1, binder_2, etc.:

for i in $(ls -d binder_*)
do
   # Append each directory's apt.txt, labelled with a comment, to the merged file
   echo >> binder/apt.txt
   echo "# $i" >> binder/apt.txt
   [ -f "$i/apt.txt" ] && cat "$i/apt.txt" >> binder/apt.txt
done

In Python, something like:

import os

with open('binder/requirements.txt', 'w') as outfile:
    for d in [d for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)]:
        # Skip directories that don't provide a requirements.txt
        if 'requirements.txt' not in os.listdir(d):
            continue
        with open(os.path.join(d, 'requirements.txt')) as infile:
            # Label each appended block with the directory it came from
            outfile.write(f'\n#{d}\n')
            outfile.write(infile.read())

Merging environment.yml files is a little trickier — the structure within the file is hierarchical — but a package like hiyapyco can help us with that:

import fnmatch
import os

import hiyapyco

# Find the environment.yml / environment.yaml file (if any) in each binder_* directory
_envs = [os.path.join(d, e)
         for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)
         for e in os.listdir(d) if fnmatch.fnmatch(e, 'environment.y*ml')]

merged = hiyapyco.load(_envs,
                       method=hiyapyco.METHOD_MERGE,
                       interpolate=True)

with open('binder/environment.yml', 'w') as f:
    f.write(hiyapyco.dump(merged))

There is an issue with environments where we have both environment.yml and requirements.txt files, because the environment.yml trumps the requirements.txt: the former will be run but the latter won’t. A workaround I have used in the past for installing from both is to call the requirements.txt installation from a directive in the postBuild file.

I’ve also had to use a related trick of installing a Python package that other packages depend on explicitly via postBuild, and then installing from a renamed requirements.txt, also via postBuild: the pip installer installs packages in whatever order it wants, and doesn’t necessarily follow any order “specified” in the requirements.txt file. This means that on certain occasions a build can fail because one Python package relies, at install time, on another package that is specified in the requirements.txt file but hasn’t been installed yet.
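
A sketch of that pattern (the package name and the renamed requirements file are purely illustrative) might look something like this in postBuild:

#!/bin/bash
# First install the package that others rely on at install time...
pip install numpy
# ...then install everything else from the renamed requirements file;
# renaming it stops repo2docker trying to install it itself, in its own order
pip install -r binder/requirements_rest.txt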

Another approach might be to grab any requirements from a (merged) requirements.txt file into an environment.yml file. For example, we can create a “dummy” _environment.yml file that will install elements from our requirements file, and then merge that into an existing environment.yml file. (We’d probably guard this with a check that both environment.y*ml and requirements.txt are in binder/):

_yaml = '''dependencies:
  - pip
  - pip:
'''

# Guard: only do this if both files are actually present in binder/
# if 'requirements.txt' in os.listdir('binder') and 'environment.yml' in os.listdir('binder'):

with open('binder/requirements.txt') as f:
    for item in f.readlines():
        item = item.strip()
        # Skip blank lines and comments
        if item and not item.startswith('#'):
            _yaml = f'{_yaml}    - {item}\n'

with open('binder/_environment.yml', 'w') as f:
    f.write(_yaml)

merged = hiyapyco.load('binder/environment.yml', 'binder/_environment.yml',
                       method=hiyapyco.METHOD_MERGE,
                       interpolate=True)

with open('binder/environment.yml', 'w') as f:
    f.write(hiyapyco.dump(merged))

# Maybe also now delete requirements.txt?

For postBuild elements, different postBuild files may well be written for different interpreters (for example, we may have one that executes bash code, another that contains Python code). Perhaps the simplest way of “merging” them is just to copy over the separate postBuild files and generate a new one that calls each of them in turn.

import os
import shutil

postBuild = ''

# Copy each per-repo postBuild into binder/ and build a combined script that calls each in turn
for d in [d for d in os.listdir() if d.startswith('binder_') and os.path.isdir(d)]:
    _from = os.path.join(d, 'postBuild')
    if os.path.isfile(_from):
        _to = os.path.join('binder', f'postBuild_{d}')
        shutil.copyfile(_from, _to)
        # Make sure the copied script is executable so the combined postBuild can call it
        os.chmod(_to, 0o755)
        postBuild = f'{postBuild}\n./{_to}\n'

with open('binder/postBuild', 'w') as outfile:
    outfile.write(postBuild)

I’m guessing we could do the same for start?
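
Possibly, though start is a little different: repo2docker uses the start script to wrap the command that launches the user session, so it needs to end by handing control back with exec "$@", and simply chaining complete start scripts wouldn't work (the first exec would stop the rest from running). As a hedged sketch, assuming each source start script only sets up the environment (and that we strip any exec "$@" lines from the copied fragments, whose names below are illustrative), the generated binder/start might look like:

#!/bin/bash
# Combined start script: source each copied fragment in turn
# (each fragment is assumed only to set environment variables / do setup,
#  not to exec anything itself)
source ./binder/start_binder_1
source ./binder/start_binder_2
# Finally, hand over to the command repo2docker wants to run
exec "$@"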

If you want to have a play, the beginnings of a test file can be found here. (For some reason, WordPress craps all over it and deletes half of it if I try to embed it in a sourcecode block; I really should move to a blogging platform that does what I need…)

PoC: Using Git Commit Messages As a Github Actions CLI

Following an idle wonder last week on Using Git Commit Messages as a Command Line?, I had a play and came up with a demo of sorts: ouseful-testing/action-steps.

The idea is that by creating a Github Action that performs actions based, in part at least, on the contents of a Github commit message, we can start to use commit messages as a CLI to invoke particular Github Action mediated activities.

My first couple of proofs of concept were:

  • a simple script that replaces one file (the README) with the contents of another. At the moment, both files need to be in the same branch (ideally, the replacement files would be pulled in from another branch, but I couldn’t figure out how to do that offhand). If you just make a “dummy” commit to any old file with the commit message Update Readme, the README will be updated with the contents of one file. If you use the commit message Reset Readme, it will be replaced with the contents of another. My thinking, in part, is that you could “commit” progress messages as you work through a thing, and the README keeps getting updated with the next thing you have to do as you commit to say you’ve done the previous thing.
name: Updates
on: push

jobs:
  update_readme:
    if: (github.event.commits[0].message == 'Update Readme')
    runs-on: ubuntu-latest
    steps:

    - name: Copy Repository Contents
      uses: actions/checkout@v2
        
    - name: commit changes
      run: |
        git config --global user.email "${GH_EMAIL}"
        git config --global user.name "${GH_USERNAME}"
        # git checkout -B fastpages-automated-setup
        mv README2.md README.md
        git add README.md
        git commit -m'Update README'
        git push
      env: 
        GH_EMAIL: ${{ github.event.commits[0].author.email }}
        GH_USERNAME: ${{ github.event.commits[0].author.username }}
        
  reset_readme:
    if: (github.event.commits[0].message == 'Reset Readme')
    runs-on: ubuntu-latest
    steps:

    - name: Copy Repository Contents
      uses: actions/checkout@v2
        
    - name: commit changes
      run: |
        git config --global user.email "${GH_EMAIL}"
        git config --global user.name "${GH_USERNAME}"
        # git checkout -B fastpages-automated-setup
        mv README1.md README.md
        git add README.md
        git commit -m'Reset README'
        git push
      env: 
        GH_EMAIL: ${{ github.event.commits[0].author.email }}
        GH_USERNAME: ${{ github.event.commits[0].author.username }}
  • a simple script that lets you upload one or more zip files as part of a push; if the commit message starts with Unzip, the action unzips the committed zip files, deletes the .zip archive files you had just committed, and pushes the unzipped files in their place.
name: Unzip
on:
  push:
    paths:
    - '**.zip'

jobs:
  unzip-files:
    if: startsWith(github.event.commits[0].message, 'Unzip')
    runs-on: ubuntu-latest
    steps:

    - name: Copy Repository Contents
      uses: actions/checkout@v2
      with:
        fetch-depth: 2

    - name: handle zip
      run: |
        git config --global user.email "${GH_EMAIL}"
        git config --global user.name "${GH_USERNAME}"
        # List the zip files added or changed in this commit
        for f in $(git diff --name-only HEAD^ HEAD | grep -E '\.zip$')
          do
              echo $f
              # Unzip the archive and grab the name of the top-level directory it creates
              fn=`unzip $f | grep -m1 'creating:' | cut -d' ' -f5-`
              echo $fn
              # Swap the zip file for its unzipped contents
              git rm $f
              git add $fn
              git commit -m"unzip $f"
          done
        git push
      env: 
        GH_EMAIL: ${{ github.event.commits[0].author.email }}
        GH_USERNAME: ${{ github.event.commits[0].author.username }}

I did also wonder about whether it would be possible to implement something like Adventure, played by issuing instructions through Git commit messages and maybe updating the README with the game response to each step… Stepping through the history of READMEs would be your game transcript…

Are fastpages Really an EASY Way to Publish a Blog From Jupyter Notebooks?

I tried to submit this to the fast.ai discourse forum, having been invited to do so, but after handing over credentials to get an account so I could log in, then having to go to my email client to click the confirmation link, then not being able to create a new topic (new user policy, maybe?), then having my post quarantined and my account largely suspended, I thought I’d post the text of the post here (glad I took a copy…).

I appreciate that my behaviour / attitude around this may be seen as both childish and churlish, but I was originally riled by the “easy” hype around fastpages (because I don’t think it necessarily is easy for anyone other than a particularly select population…) and since then, things have just gone downhill in terms of ease of use / communication! ;-) “Just do X” has (just) so much baggage associated with it…

Original post and replies thread

Picking up on a Twitter thread, some comments around the “fastpages supports really easy Jupyter blogging” effusiveness on Twitter.

(Note this isn’t meant to be hostile, it’s meant to be usefully critical ;-)

For any seasoned Github user and developer who’s also been responsible for maintaining documentation sites using Jekyll, fastpages “just” requires folk to use Github and Jekyll style publishing to publish a blog site from notebook files and markdown docs.

For anyone familiar with Github, git, and Jekyll publishing, the fastpages automation simplifies some of the faff required in getting that stuff working. (Other approaches, such as Jupyter Book, ipypublish and nbsphinx offer related publishing routes but less hype. A proper comparison of all the approaches might be useful…)

So if you’re familiar with Github and Jekyll, the benefits are quite possibly both clear and enticing. But if you aren’t a Github user or a Jekyll user, things are pretty much as opaque as ever they were.

The fastpages mechanic of generating a PR on the first commit made when cloning the template repo is really neat, and an idea I’ll likely steal. But for a novice, without a mental model of how Github works, this doesn’t in and of itself make things that much easier. The naive user is faced with a complex UI, using complex jargon, and probably doesn’t know where to go looking for the PR, how to handle it, what it means when they do handle it, etc.

The file listing on the master home page you’re faced with when cloning the repo is also intimidating. There are a lot of files, lots of directory names starting with scary underscores, lots of `.whatever` hidden files. That’s fine if you’re creating a workflow that’s “easy” for folk who are happy with all this stuff, but if the claim is that this is an “easy route into blogging with Jupyter” in general, it isn’t.

One of the attractive features of the Jupyter notebook UI and infrastructure is that someone with little technical knowledge of the command line can quickly start using magics and high level commands, a line at a time, to get stuff done. Just because someone can plot a chart from a pandas dataframe populated from a loaded-in CSV file doesn’t necessarily mean they know how to set up the JupyterHub server they’re actually a user of, nor even how to install pandas into the environment they’re using. As a *user*, why should they? The same goes for their familiarity, or otherwise, with Github and Jekyll. (By the by, it’s probably best to leave the “but they ought to…” arguments aside…)

I’m all for folk developing skills, but onboarding is really hard. And oftentimes, when trying to persuade people to adopt new tech in conservative institutions, you only get infrequent opportunities to entice them in. If you claim something is easy, that you “just” do this and that, then watch their face as confusion and terror reign, and you’ve lost your conversion opportunity. They won’t try again.

To make things *really* easy means taking things much slower. Cloning the repo and showing a clean page with a very simple set of instructions, and all the scary stuff hidden in branches, provides an opportunity for generating an easy way in. The initial readme could provide a set of very clear instructions about setting up tokens etc, along with why they’re necessary (eg Stephen Downes had a go at simplifying them [here, part 1](https://halfanhour.blogspot.com/2020/02/how-to-use-fastpages.html) and [here, part 2](https://halfanhour.blogspot.com/2020/02/how-to-use-fastpages-2.html)).

Things would also be simpler if all the Jekyll scaffolding were hidden away somewhere, and the user could just slowly introduce things into the top level directory, the homepage for their blog source files, with the scaffolding kept out of sight and built on via branches.

This level of simplicity may or may not be desirable for a (semi-)professional, if ad hoc, tool, but if the desire is to find a way to make it easier for novices (to Github, to Jekyll) to publish in what is still quite a low level way, I think more scaffolding is required. (A limiting case of easy is probably to just click a button on your Jupyter notebook and have the file posted somewhere, from where it magically then appears on a public URL.)

Inspired by the initial commit handling Github Action, I started some baby steps explorations of a way of making “performative” Github commit actions ([action-steps](https://github.com/ouseful-testing/action-steps)) that might (or might not!) make things simpler for a novice user (they also run the risk of them developing bad mental models, but I’m just exploring ideas).

For example, you might encourage someone via the readme to create a new file from the Github web UI with a particular filename or particular commit message, and then handle that in a particular way, perhaps updating the README with the next step; this might include some description of how you could then compare the original readme with the updated one. (I did start wondering whether I could code Adventure to be played via commit messages! Has that been done before I wonder?)

You might have additional commit messages that introduce new files into the top level repo, a file at a time. (Where to put simple documentation describing the performative commit commands would be another issue!)

I appreciate this is probably *not* how Github is traditionally used, where a principle of least surprise about what appears in the repo compared to the files you actually commit is a sound one (that said, a lot of workflows do make use of commit hooks that change files…). But I would argue that using Github primarily to make use of its Github Pages publishing mechanism is not using Github in a traditional version control application way either. Version control is NOT the aim. So what I’m thinking of here is a way for the user to instruct Git to add in very particular new files at particular times, in response to particular commands issued via a particular commit message, for a particular reason: to allow them to incrementally develop the complexity of their environment from within the environment as they grow familiar with it. Along the way, the mechanism could coach and introduce the user to features of Github that may be useful in a blogging context, such as the ability to “track changes” and maintain different versions of content as you draft it. This would then introduce them to version control as a side effect of them developing particular blogging workflow practices in an environment that can coach them as they use it.

This may all just be nonsense, of course!

For some definition of “just”…