A Personal Take on Customising Jupyter Via End-User Innovation

Following a Github discussion on The future of the classic notebook interface and the Jupyter Notebook version 7 JEP (Jupyter Enhancement proposal), a Pre-release plan for Notebook v7 is now in play that will see RetroLab, as was, the notebook style JupyerLab powered single document UI form the basis of future notebooks UIs.

I’ve been asked for comments on my take on now the original notebook supported originally supported end-user development which I’ll try to describe here. But first I should add some caveats:

  • I am not a developer;
  • I am not interested in environments that developers use for doing things developers do;
  • I am not a data scientist;
  • I am not interested in environments that data scientists use for doing things that data scientists do;
  • I am a note taker; I am a tinkerer and explorer of the potential for using newly available technologies in combination with other technologies; I am doodler, creating code exploiting sketches to perform particular tasks, often a single line of code at a time; I am an author of texts that exploit interactivity in a wide variety of subject areas using third party, off-the-shelf packages that exploit IPython (and hence, Jupyter notebook) display machinery.
A line of code can be used, largely in ignorance of how it works and simply by copying, pasting, and changing a single recognisable value, to embed a rich interactive into a document. In this case, a 3D molecular model is embdedded based on a standardised compound code passed in via a variable defined elsewhere in the document.
  • I am interested in environments that help me take notes, that help me tinker and explore the potential for using newly available technologies in combination with other technologies, that help me author the sort of texts that I want to explore.
  • I am interested in media that can be used to support the open and distance education, both teaching (in sense of making materials available to learners) and learning (which might be done either in a teaching context, or independently). My preference for teaching materials is that they support learning.
  • I am interested in end-user innovation where an end-user with enthusiasm and only modicum skills can extend an/or co-opt environment or the features or serviecs it offers, for their own personal use, without having to ask for permission or modify the environment’s core offering or code base (i.e. end-user innovation that allows a user to lock themselves in through extras they have added; in certain situations, this may be characterised as developing on top of undocumented features (it certainly shares many similar features));
  • In my organisation, the lead times are ridiculous. The following is only a slight caricature: a module takes 2+ years to create then is expected to last for 5-10 years largely unchanged. A technology freeze might essentially be put in place a year before student first use date. Technology selection is often based on adopting a mature technology at the start of the produciton process (two years prior to first use date).
  • When we adopted Jupyter notebooks for the first time for a module due for first release in 2016, it was a huge punt. The notebooks (IPython notebooks) were immature and unstable at the time; we also adopted pandas which was still in early days. There were a lot of reasons why I felt comfortable recommending both those solutions based on observation of the developer communities and practical experience of using the technologies. One practical observation was that I could get started very quickly, over a coffee, without docs, and do something useful. That meant other people would be able to too. Which would mean low barrier to first use. Which meant easy adoption. Which meant few blockers to trying to scale use. (Note that getting the environment you wanted set up as you wanted could be a faff, but we could mitigate that by using virtual machines to deliver software. It was also likely that installation would get easier.)
  • One of the attractive things about the classic Jupyter notebook UI was that I could easily hack the branding to loosely match organisational branding (a simple CSS tweak, based on inspection by someone who didn’t really know CSS (i.e. me), to point to our logo). As a distance learning organisation, I felt it was important to frame the environment in a particular way, so that students should feel as if they working in what felt like an institutional context. When you’re working in that (branded) space, you are expected to behave, and work, in a particular way:
  • there were also third party extensions, written in simple JS and HTML. These could be created and installed by an end-user, taking inspiration and code from pre-existing extensions. As an end-user, I was interested in customising the appearance of content in the notebook. For teaching / publishing purposes, I was interested in being able to replicate the look of materials in our VLE (virtual learning environment). The materials use colour theming to cue different sorts of behaviour. For example, activities and SAQs (self-assessment questions) use colour highlighted blocks to identify particular sorts of content:
  • the open source nature of the Jupyter code base meant that we could run things as local users or as an on-prem service, or as a rented hosted service from a third party provider; in my opinion all three are important. I think students need to be able to run code locally so that they can: work offline, as well as share or use the provided environment in another context, eg a work context; I think being able to offer an institutionally hosted service provides equitable access to students who may be limited in terms of personal access to computers; I think the external provider route demonstrates a more widespread adoption of a particular approach, which means longer term viability and support as well as a demonstration that we are using “authentic” tools that are used elsewhere.

One of our original extensions sought to colour theme activities in Jupyter notebooks in a similar way. (This could be improved, probably, but the following was a quick solution based on inspection of the HTML to try to find attributes that could be easily styled.)

The cells are highlighted by selecting a cell and then clicking a button. How to add the button to a toolbar was cribbed from a simple, pre-existing extension.

If anything, I’m a “View source” powered tinkerer, copying fragrments of code from things I’ve found that do more or less the sort of thing I want to do. This strategy is most effective when the code to achieve a particular effect appears in a single place. It’s also helpful if its obvious how to load in an required packages, and what those packages might be.

At the time I created the colour theming extension I ran into an issue identifying how to address cells in Javascript and queried it somewhere (I though in a Github issue, but I can’t find it); IIRC, @minrk promptly responded with a very quick and simple idea for how to address my issue. Not only could I hack most of what I wanted, I could articulute enough of a quetion to be abele to ask for help, and help could also be quickly and relatvely easily given: if it’s easy to answer a query, or offer a fix, eg Stack Overflow style, you might; if it’s complex and hard, and takes a lot of effort to answer a query, you are less likely to; and as a result, less likely to help other people then continue past a blocker and continue to help themselves.

The ability to add toolbar buttons to the user interface meant that it was easy enough to customise the tools offered to the end-user via the toolbar. How to add buttons was cribbed from inspection of the Javascript used by the simplest pre-existing extension I could find that added buttons to the toolbar.

Another thing the classic notebook extensions offered was a simple extensions configurator. This is based on a simple YAML file. The extensions configurator means the end-user developer can trivially specify an extensions panel. Here’s an example of what our current cell colour highlighter extension configurator supports, specificlly, which buttons are displayed and what colours to use:

And the corresponding fragment of the YAML file that creates it:

How to create the YAML file was cribbed from the content of a YAML config script from the simplest extensions I could find that offered the configurator controls I required.

How to access state set by the extension configurator from the extension Javascript was based on inspection of very simple Javascript of the simplest pre-existing extension I could find that made use of a similar configurator setting.

The state of the extension settings can be persisted, and is easily shared as part of a distribution via a config file. (This means we can easily configure an environment with pre-enabled extensions with particular settings to give the end user a pre-configured, customised environment that they can then, if they so choose, personalise / customise further).

This is important: we can customise the environment we give to students, and those users can then personalise it.

What this means is that there is a way for folk to easily lock themselves in to their own customised environment.

In the wider world, there are a great many JupyterHub powered environments out there serving Jupyter end-user interfaces as the default UI (JupyterLab, classic notebook, perhaps RetroLab). In order to support differentiation, these different environments may brand themselves, may offer particular back-end resources (compute/GPU, storage, access to particular datasets etc.), may offer particular single-sign on and data access authentication controls, may offer particular computational environments (certain packages preinstalled etc), may offer particular pre-instaled extensions, including in-house developed extensions which may or may not be open sourced / generally available, may wire those extensions together or or “co-/cross-configure” them in such a way as to offer a “whole-is-more-than-sum-of-parts” user experience, and so on.

For the personal user,running locally, or running on a third party server that enables extension installation and persistence, they can configure their own environment from available third party extensions.

And for the have-a-go tinkerer, they can develop and share there own extensions, and perhaps make them available to others.

In each case, the service operator or designer can lock themselves in to a partcular set-up. In our case, we have locked ourselves into a classic Jupyter notebook environment through extensions and configurations we have developed locally. And we are not in a position to migrate, in part because we have accreted workflows and presentation styles through our own not-fork, in part because of the technical barriers to entry to creating extensions in the JupyterLab environment. Because as I see it, that requires: a knowledge of particular frameworks and “modern” ways of using Javascript.

The current version of my own extensions has started to use, by cribbing others,rather than created from a position of knowledge or understanding, things like promises; but I’ve only got to that by iterating on simpler approaches and by cribbing diffs from other, pre-existing extensions that have migrated from the original way of working to more contemporary methods (all hail developers for helpful commit messages!); a knowledge of the JupyterLab frameworks (in the classic notebook, I could, over a half-hour coffee break, crib some simple HTML and CSS from the classic UI, crib some simple JS from an pre-exsiting extension that had a feature I wanted to use, or appeared to use a method for acheiving something similar to the effect I wanted to achieve).

There has been work in the JupyterLab extensions repo to try to provide simple examples, and I have to admit, I don’t check there very often to see if they have added the sorts of examples that I tend to crib on because from experience they tend to be targeted at developers doing developery things.

I. Am. Not. A. Developer. And the development I want to do is often end user interface customisations. (I also note from the Jupyter notebook futures discussions comments along the lines of “the core devs aren’t fron end developers, so keeping the old’n’hacky noteb’ook UI going is unsustainable” which I both accept and appreciate (appreciate in sense of understand). Bt it raises the question: who is there looking for ways to offer “open” and “casual” (i.e. informal, end-user) UI developments.

It is also worth noting that the original notebook UI was developed by folk who were perhaps not web developers and so got by on simple HTML/CSS/JS techniques, because that was their skill level in that domain at the time. And they were also new to Jupyter frameworks in the sense that those frameworks were still new and still relatively small in feature and scope. But the core devs are now seasoned pros in working in those frameworks. Whereas have-a-go end-user developers wanting to scrath that one itch, are always brand new to it. And they may have zero requirement to ever do another bit of development again. On anything. Ever.

The “professionalisation” of examples and extensions in the JupyterLab ecosystem is also hostile to not-developers. For example, here’s a repo for a collapsible headings extension I happened to have in an open tab:

I have no idea what many (most) of those files or, or how neccessary they are to build a working extension. I’m not sure I could figure out how to actually build the extension either (becuase I think they do need building before they can be installed?) I. Am. Not. A . Developer. Just as I don’t think users should have to be sys admins to be able to install and run a notebook (which is one reason we increasingly offer hosted solutions), I don’t think end user developers who want to hack a tiny bit of code should have to be developers with the knowledge, skills and toolchains available to be able to build package before it can be used. (I think there are tools in the JupyerLab UI context that are starting to explore making things a bit more “build free”.)

To help set the scene, imagine the user is a music teacher who wants to use music21 to in a class. They get by using what is essentially a DSL (domain specific language) in the form of music21 functions in a notebook environment. Their use case for Jupyter is write what are essentially interactive handouts relating to music teaching. They also note that they can publish the materials on the web using Jupyer Book They see thay Jupyter Book has a particular UI feature, such as a highlighted noe, and the think “how hard could it be to add that tho the notebook UI”.

One approach I have taken previously with regard to such UI tweaks is to make use of cell tag attributes to identify cells that should be styled in a particular way. (Early on, I’d hoped tags would be exposed appropriately in cell HTML as class attibutes, e.g. expose cell tags as HTML classes. This opens up end user development in a really simple way (hacking CSS, essentially, or iterating HTML based on class attributes; though ideally you’d work with the notebook JSON data structure and index cells based on metadata tag values).

As an example of hacky workflow from a “not a developer” perspective to acheive a styling effect similar to the Jupyter Book style effect above, I use a “tags2style” extension to style things in the notebook, and a tag processor churn notebook .ipynb content into appropriately marked up markdown for Jupyter Book. (Contributing to Jupyter Book extensions is also a little beyond me. I can proof-of-concept, but all the “proper” developer stuff of lint’n’tests and sensible commit messages, as well as how to use git properly, etc., are beyond me…! Not a team player, I guess… Just trying to get stuff done for my own purposes.)

So… in terms of things I’d find useful for end user development, and allowing using to help themselves, a lot of it boils down to not a lot (?!;-):

  • I want to be able to access a notebook datastructure and iterate over cells (either all cells, or just cells of a particular type);
  • I want to be able to read and write tag state;
  • I want to be able to render style based on cell tag; failing that, I want to be able to control cell UI class attributes so I can modify them based on cell tags.
  • I want to be able to add buttons to the toolbar UI;
  • I want to be able to trigger operations from tool bar button clicks that apply to a the current in focus cell, a set of selected cells, all cells / all cells of a particular type;
  • I want to be able to configure extension state in a simply defined configuration panel;
  • I want to be able to easily access extension configuration state ans use it within my extension;
  • I want to be able to easily persist and distrbute extension configuration state;
  • It would be nice to be able to group cells in a nested tree; eg identify a set of cells as an exercise block and within that as exercise-question and exercise-answer cells, and style the (sub-)grouped cells together and potentially the first, rest, and last cells in each (sub-)group differently to the rest.

In passing, several more areas of interest.

First, JupyterLab workspaces. When these were originally announced, really early on, they seemed really appealing to me. A workspace can be used to preconfigure / persist / share a particular arrangement of panels in the JupyterLab UI. This means you can define a “workbench” with a particular arrangement of panels, much as you might set up a physical lab with a particular arrangement of equipment. (Imagine a school chemistry lab; the lab assistant sets up each bench with the apparatus needed for that day’s experiment.) In certain senses, the resulting workspace might be also be thought of as an “app” or a “lab”.

I would have aggressively explored workspaces, but I was locked into using the custom styling extensions of classic notebook, and this blocked me from exploring JupyterLab further.

I firmly believe value can be added to an environment by providing preconfigured workspaces, where the computational environment is set up as required (necessary packages installed, maybe some configuration of particular package settings, appropriate panels opened and arranged on screen), particularly in an educational setting. But I haven’t really seen people talking about using workspaces in such ways, or even many eamples of workspaces being used and customised at all.

Example of Dask Examples repo launched on MyBinder – a custom workspace opens predefined panels on launch.

A lot of work has been placed around dashboards in a Jupyter context, which is perhaps Jupyter used in a business reporting context, but not JupyterLab workspaces, which are really powerful for education.

I note that various discussion relating to classic notebook versus JupyterLab relate to the perceived complexity of the JupyterLab UI. My own take on the JupyterLab UI is that is can be very cluttered and have a lot of menu options or elements available that are irrelevant to a particular user in a particular use case. For different classes of user, we might want to add lightness to the UI, to simplify it to just what is required for a particular activity, and strip out the unnecessary. Workspaces offer that possibility. Dashboard and app style views, if used creatively, can also be used that way BUT they don’t provide access to the JupyterLab controls.

On the question of what to put into workspaces, jupyter-sidecar could be extremely useful in that respect. It was a long time coming, but side cast now lets to display a widget directly into a panel, rather than first having to display it as cell output.

This means I could demo for myself using my nbev3devsim simulator in a JupyterLab context.

Note that the ONLY reason I can access that JS app as a Jupyter widget is via the jp_proxy_widget, which to my mind should be mainitained in core along with things like Jupyter server proxy, JupyterLite, and maybe jhsingle-native-proxy. All these things allow not-developers and not-sysadmins to customise a distributed environment without core developer skills.

A final area of concern for me relates to environments that are appropriate for authoring new forms of document, particularly those that:

  • embed standalone interactive elements generated from a single single line of magic code (for example, embed an interactive map centred on a location give a location);
Using parameterised magic to embed a customised interactive.
  • generate and embed non-textual media elements from a text description;

Note that to innovate in the production of tools to create such outputs does not require “Jupyter developer skills”. The magics can often be simple Pyhton cribbed, largely based on code cribbed from other, pre-existing magics, applied to new off-the-shelf packages that support rich IPython object output displays.

In terms of document rendering, Jupyter Book currently offers one of the richest and most powerful HTML output pathways, but I think support for PDF generation may still lag behind R/bookdown workflows. I’m not sure about e-book generation. For support end-user innovation around the publication path, there are several considerations: the document format (eg MyST) and how to get content into that format (eg generating it from magics); the parsed representation (how easy it is to manipulate a document model and render content from it); the templates, that provide examples for how to go from the parsed object representation to output format content (HTML, LaTeX, etc); and the stylesheets, that allow you to customised content rendered by a particular template.

In terms of document authoring, I think there are several issues: first, rich editors, that allow you to preview or edit directly in a styled display view. For example, WYSIWYG editors. Jupyter notebook has had a WYSWYG markdown cell extension for a long time and it sucks: as soon as you use it you’re original markdown is converted to crappy HTML. The WYSIWYG editor needs to preserve markdown, insofar as that is possible, which means it needs to work with and generate a rich enough flavour of markdown (such as MyST) to provide the author with the freedom to author the content they want to author. It would be nice of such an editor could be extended to allow you to embed to embed high level object genrators, for example, IPython line or block magics.

Ideally, I’d be able to have a rich display in a full notebook editor that resembles the look in terms of broad styling features the look of Jupyter Book HTML output, perhaps provided via a Jupyter Book default theme styling extension for JupyterLab / RetroLab.

I’m not sure what the RStudio / Quarto gameplan is for the Quarto editor, which currently seems to take the form of a rich editor inside RStudio. The docs appear to suggest it is likely to be spun out as its own editor at some point. How various flavours of output document are generated and published will be a good indicator of how much traction Quarto will gain. RStudio allow “Shiny app publishing” from RStudio, so integration with e.g. Github Pages for “static” or WASM powered live code docs, or server backed environments for live code documents executing against a hosted code server would demonstrate a similar keen-ness to support online published outputs.

Personally, I’d like to see a Leanpub style option somewhere for Jupyter Book style ebooks which would open up a commercialisation route that could help drive a certain amount of adoption; but currently, the Leanpub flavour of markdown is not easily generated from eg MyST via tools such as pandoc or Jupytext, which means there is not easy / direct conversion workflow. In terms of supporting end user innovation, having support in Jupytext for “easy” converters, eg where you can specify rules for how Jupytext/MyST object model elements map onto output text, both in terms of linear documents (eg Markdown) or nested (eg HTML, XML).

Internally, we use a custom XML format based on DocBook (I think?). I proposed an internal project to develop pandoc converters for it to allow conversion to/from that format into other pandoc supported formats, which would have enabled notebook mediated authoring and display of such content. After hours of meetings quibbling over what validation process for the output should be (I’d have been happy with – is it good enough to get started converting docs that already exist) I gave up. In the time I spent on the proposal, its revisions, and in meetings, I could probably have learned enough Haskell to hack something together myself. But that wasn’t the point!

At the moment, Curvenote looks to be offering a rich, WYSIWYG authoring, along with various other value adds. For me to feel confident in exploring this further, I would like to be able to run the editor locally and load and save files to both disk and browser storage. Purely as a MyST / markdown editor, a minimum viable Github Pages demo would demonstrate to me that that is possible. In terms of code execution, my initial preference would be to be able to execute code using something like Thebelab connected invisibly to a JupyterLite powered kernel. More generally, the ability to connect to a local or remote Binder Jupyter notebook server and launch kernels against it, and then the ability to connect to a local or remote Jupyter notebook server, both requiring and not requiring authentication.

In terms of what else WASM powered code execution might support, and noting that you can call on things like the pillow image editing package directly in the browser, I wonder about whether it is possible to do the following purely wihtin the browser (and if not, why not / what are the blockers?):

It is also interesting to consider the possibility of extensions to the Jupyter Book UI that allow it to be used as an editor bith in terms of low hanging fruit and also in terms of more ridiculous What if? wonderings. Currently, Thebelab enabled books allow readers to edit code cells as well as executing code against an externally launched kernel. However, there is no ability to save edited code to browser storage (or local or remote storage), or load modified pages from browser storage. (In a shared computer setting, there is also the question of how borwser local storage is managed. In Chrome, or when using Chromebooks, for example, can a user sign in to the browser and they have their local storage synched to other browsers, and have it cleared from the actual browser they were using a session when they sign out?) There is also no option to edit the markdown cell source, bit this would presumably not be markdown anyway, but rather rendered HTML. (Unless the browser was rendering the HTML from the fly from source markdown?!) This perhaps limits their use in education, where we might want to treat the notebooks as interactive worksheets users can modify notebooks and retain their edits. But the ability to edit, save and reload code cell content at least, and may even add and delete code cells, would be a start. One approach might be a simple recipe for running Jupyter Book via Jupyer server proxy (e.g. simple hacky POC), or for Jupyter Book serving JupyterLite. In the first case, if there was a watcher on a file directory, a Jupyter Book user could perhaps open the file in the local /server Jupyter notebook environment, save the file, and then have the directory watcher trigger jupyer book build to update the book. In the second case, could JupyterLite let the user to edit the source HTML of locally stored Jupyter Book HTML content and then re-serve it? Or could we run Jupyter Book build process in the browser and make changes to the source notebook or markdown document?! Which leads to the following more general set of questions about WASM powered code execution in the browser. For example, can we / could we:

  • run Jupyter-proxied flask apps, API publishing services such as datasette, or browser-based code executing Voila dashboards?
  • run Jupyter server extensions, eg jupytext?
  • run Sphinx / Jupyter Book build processes?
  • run pandoc to support pandoc powered document conversion and export?
  • connect to remote storage and/or mount the local file system into the JupyterLite environment (also: what would the security implications of the that be?)?

Are any of these currently or near-future possible? Actually impossible? Are there demos? If they are not possible, why not? What are the blockers?

One of the major issues I had, and continue to have, with Jupyter notebook server handling from the notebook UI is in connecting to kernels. Ideally, a user would be trivially be able to connect to kernels running on either a local server or listed by one or more remote servers, all from the same notebook UI. This would mean a student could work locally much of the time, but connect to a remote server (from the same UI) if they need to access a server with a particular resource availability, such as a GPU or proximity to a large dataset. VS Code makes it relatively easy to connect, from the VS Code client, to new Jupyter servers, but at the cost of breaking other connections. Using a Jupyter notebook server, remote kernel approaches typically appear to require the use of ssh tunneling to establish a connection to launch and connect to a remote server.

One way round the problem of server connections for code execution is to have in-browser local code execution. Extending Thebelab to support in-browser JupyterLite / WASM powered kernel connections will enable users of tools such as Jupyter Book to publish books capable of executing code from just a simple webserver, eg using a site such as Github Pages. Trivially, JupyterLite kernels incorporate a useful range of packages, although to support the easy creation of end-user distributions, a very simple mechanism for adding additional packages that are “pre-installed” in the WASM environment is not available. (The result is the end-end-user needs to install packages before they can be used.) JupyterLite also currently lacks an easy / natural way of loading files from browser storage into code.

Here ends the core dump!

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: