Notes on the JupyterLab Notebook HTML DOM Model, Part 1: Rendered Markdown Cells

I finally relented. After going through the apparent overkill of building a JupyterLab extension simply to sneak a custom CSS file into the distribution, I posted a query on Stack Overflow to see if I could crib enough to have a go at a JupyterLab extension that replicates my classic notebook empinken extension, which uses toolbar buttons to toggle persistent state on a notebook, state that can then be used to colour highlight cells in particular ways.

The query was comprehensively answered and gave me all the clues I needed to make a stab at it. I’ll post a note about the extension when I finish a first draft of it, along with some reflections about the process…

The empinken extension I’m interested in building essentially requires four components:

  • buttons to toggle empinken style state on a cell;
  • a means of persisting empinken style state, eg via cell metadata or cell tags;
  • a means of styling cells appropriately (a combination of HTML tag classes and CSS style rules);
  • a means of adding class attributes to the DOM based on the cell empinken style state.

There is an optional fifth consideration in the original, which was a simple YAML file defined control panel exposed by the classic notebook jupyter_nbextensions_configurator. In the case of the empinken extension, this let you set cell colours, with the whole configurator defined by a simple YAML file:

I don’t think JupyterLab supports any similar extension configurator tool: you have to write your own and find somewhere to display it.

I haven’t yet figured out the styling, so the rest of this post will be a transcript of how notebook cells seem to be represented in the DOM. (I haven’t spotted any docs on this? If you know of any, please post a link in the comments.)

So let’s get started… (I’m only going to focus on markdown and code cells for now, because they are generally the only ones I tend to use…)

A Note on Tags

Tags are used by certain extensions and document processing tools (Jupytext, Jupyter Book, etc.) to modify how documents are processed and rendered.

In some cases, it may be that JupyterLab extensions can be used to modify how cells are rendered on the basis of cell tags.

In other cases, it might be that document processors support conventions within a cell for extended rendering. For example, in Jupyter Book, directive blocks can be rendered using styled HTML blocks. (The JupyterLab MyST extension can use an extended markdown parser to render these elements in a notebook, though I have found it can be a bit slow to render the elements…) However, it is also possible to construct document processing pipelines that use cell tags to denote, eg a certain directive block, and then generate an intermediate MyST document with MyST style directives that is parsed by the Jupyter Book processor. In such a case, the notebook tag would be available to a JupyterLab extension to tune the rendering of the cell. (For a crude example, see innovationOUtside/ou-jupyter-book-tools.)
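As a rough illustration of that sort of pipeline step, here is a minimal Python sketch that wraps the source of a tagged markdown cell in a MyST directive fence. The tag names (`style-note`, `style-warning`), the mapping and the function name are all hypothetical, made up for the example; this is not how ou-jupyter-book-tools does it.

```python
# Hypothetical mapping from a cell tag to a MyST admonition directive name.
TAG_TO_DIRECTIVE = {"style-note": "note", "style-warning": "warning"}

# Four backticks, so the wrapper can contain ordinary three-backtick code fences.
FENCE = "`" * 4


def cell_to_myst(cell):
    """Return MyST markdown for a markdown cell, wrapping tagged cells."""
    source = "".join(cell.get("source", []))
    tags = cell.get("metadata", {}).get("tags", [])
    for tag in tags:
        directive = TAG_TO_DIRECTIVE.get(tag)
        if directive:
            # The first recognised tag wins; other tags are ignored.
            return f"{FENCE}{{{directive}}}\n{source}\n{FENCE}"
    # Untagged cells pass through unchanged.
    return source


cell = {
    "cell_type": "markdown",
    "metadata": {"tags": ["style-note"]},
    "source": ["Remember ", "this."],
}
print(cell_to_myst(cell))
```

The cell itself stays clean markdown in the notebook; the directive markup only appears in the intermediate document handed to the book processor.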

Markdown Cell Structure

The .ipynb JSON format for a markdown cell is defined as follows:

{
  "cell_type" : "markdown",
  "metadata" : {
       "tags": ["mytag", "my-other-tag"]
  },
  "source" : ["some *markdown*"]
}
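To make the link between tag metadata and styling a bit more concrete, here is a small Python sketch that reads a cell in that JSON form and derives candidate class names from its tags. The `tag-` prefix and the `tag_classes` function are my own assumptions for illustration, not a JupyterLab convention.

```python
import json

# A markdown cell in .ipynb JSON form, as in the example above.
CELL_JSON = """
{
  "cell_type": "markdown",
  "metadata": {"tags": ["mytag", "my-other-tag"]},
  "source": ["some *markdown*"]
}
"""


def tag_classes(cell, prefix="tag-"):
    """Return hypothetical CSS class names derived from a cell's tags.

    The "tag-" prefix is an illustrative convention only.
    """
    tags = cell.get("metadata", {}).get("tags", [])
    return [f"{prefix}{tag}" for tag in tags]


cell = json.loads(CELL_JSON)
classes = tag_classes(cell)
print(cell["cell_type"], classes)
```

An extension would need to do the equivalent on the browser side and add the resulting classes to the cell's DOM element.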

When rendered as a JupyterLab cell, the following structure is evident:

<div class="lm-Widget p-Widget jp-Cell jp-MarkdownCell jp-Notebook-cell jp-mod-rendered">
    <div class="lm-Widget p-Widget jp-CellHeader jp-Cell-header"></div>
    <div class="lm-Widget p-Widget lm-Panel p-Panel jp-Cell-inputWrapper">
        <div class="lm-Widget p-Widget jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
            <div class="jp-Collapser-child"></div>
        </div>
        <div class="lm-Widget p-Widget jp-InputArea jp-Cell-inputArea">
            <div class="lm-Widget p-Widget jp-InputPrompt jp-InputArea-prompt"></div>
            <div class="lm-Widget p-Widget jp-CodeMirrorEditor jp-Editor jp-InputArea-editor lm-mod-hidden p-mod-hidden flash-effect" data-type="inline">
                <div class="CodeMirror cm-s-jupyter CodeMirror-wrap">
                    <div style="overflow: hidden; position: relative; width: 3px; height: 0px; top: 5px; left: 74.4375px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"></textarea></div>
                    <div class="CodeMirror-vscrollbar" tabindex="-1" cm-not-content="true">
                        <div style="min-width: 1px; height: 0px;"></div>
                    </div>
                    <div class="CodeMirror-hscrollbar" tabindex="-1" cm-not-content="true">
                        <div style="height: 100%; min-height: 1px; width: 0px;"></div>
                    </div>
                    <div class="CodeMirror-scrollbar-filler" cm-not-content="true"></div>
                    <div class="CodeMirror-gutter-filler" cm-not-content="true"></div>
                    <div class="CodeMirror-scroll" tabindex="-1">
                        <div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: -15px; border-right-width: 35px; min-height: 61px; padding-right: 0px; padding-bottom: 0px;">
                            <div style="position: relative; top: 0px;">
                                <div class="CodeMirror-lines" role="presentation">
                                    <div role="presentation" style="position: relative; outline: none;">
                                        <div class="CodeMirror-measure"><pre class="CodeMirror-line-like"><span>xxxxxxxxxx</span></pre></div>
                                        <div class="CodeMirror-measure"></div>
                                        <div style="position: relative; z-index: 1;"></div>
                                        <div class="CodeMirror-cursors" style="visibility: hidden;">
                                            <div class="CodeMirror-cursor" style="left: 74.4375px; top: 0px; height: 17px;">&nbsp;</div>
                                        </div>
                                        <div class="CodeMirror-code" role="presentation"><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">A markdown cell.</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="">​</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">Over several lines.</span></pre></div>
                                    </div>
                                </div>
                            </div>
                        </div>
                        <div style="position: absolute; height: 35px; width: 1px; border-bottom: 0px solid transparent; top: 61px;"></div>
                        <div class="CodeMirror-gutters" style="display: none; height: 96px;"></div>
                    </div>
                </div>
            </div>
            <div class="lm-Widget p-Widget jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
                <p>A markdown cell.</p>
                <p>Over several lines.</p>
            </div>
        </div>
    </div>
    <div class="lm-Widget p-Widget jp-CellFooter jp-Cell-footer"></div>
</div>
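That is a lot of markup to read by eye. One way of boiling it down is to keep just the jp-* classes and the nesting depth. The following Python sketch does that using the standard library `html.parser` module; the fragment is a hand-trimmed copy of the structure above (the CodeMirror editor subtree is omitted), and the `JpOutline` class is something I made up for the purpose.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# Hand-trimmed copy of the rendered markdown cell (editor subtree omitted).
FRAGMENT = """
<div class="lm-Widget p-Widget jp-Cell jp-MarkdownCell jp-Notebook-cell jp-mod-rendered">
  <div class="lm-Widget p-Widget jp-CellHeader jp-Cell-header"></div>
  <div class="lm-Widget p-Widget lm-Panel p-Panel jp-Cell-inputWrapper">
    <div class="lm-Widget p-Widget jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
      <div class="jp-Collapser-child"></div>
    </div>
    <div class="lm-Widget p-Widget jp-InputArea jp-Cell-inputArea">
      <div class="lm-Widget p-Widget jp-InputPrompt jp-InputArea-prompt"></div>
      <div class="lm-Widget p-Widget jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput" data-mime-type="text/markdown">
        <p>A markdown cell.</p>
      </div>
    </div>
  </div>
  <div class="lm-Widget p-Widget jp-CellFooter jp-Cell-footer"></div>
</div>
"""


class JpOutline(HTMLParser):
    """Collect (depth, tag, jp-classes) for elements carrying jp-* classes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.rows = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        jp = [c for c in classes if c.startswith("jp-")]
        if jp:
            self.rows.append((self.depth, tag, jp))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


parser = JpOutline()
parser.feed(FRAGMENT)
# One line per jp-* element, indented by nesting depth.
outline = ["  " * depth + " ".join(jp) for depth, _tag, jp in parser.rows]
print("\n".join(outline))
```

The same parser could be pointed at a fragment copied from the browser inspector for a cell in edit mode, to compare the two outlines.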

In what follows, the colouring of the block elements corresponds to a particular CSS element scope:

Let’s start at the top, the <div class="lm-Widget p-Widget jp-Cell jp-MarkdownCell jp-Notebook-cell jp-mod-rendered"> element that contains everything:

Note the padding that extends around the element.

This element contains three child elements: a header (which appears to be empty – what is this for?), the body (or panel), and a footer (which also appears to be empty; again, what is it for and is there any content that can easily be rendered into it?):

If the cell is selected, the top level block adds a couple of extra classes: jp-mod-active jp-mod-selected. The selected cell is highlighted:

And what does the panel element cover?

Let’s look inside the panel element:

This element has two child elements: a collapser, and an input area.

Let’s look at the collapser first, which is quite a simple element:

The collapser elements cover the gutter area that is highlighted when a markdown cell is selected (the child element appears to cover the same extent):

The next element is the input area into which the notebook source is eventually parsed and rendered. This appears to cover everything to the right of the collapser:

The element itself contains three elements, an InputPrompt, an Editor and a MarkdownOutput/RenderedMarkdown element:

In a rendered markdown cell, the editor element (which has quite a complex internal structure) does not appear to have a rendered extent. When the markdown cell is put into edit mode, the MarkdownOutput element seems to disappear from the DOM.

This might be a hassle if we are tag styling cells: we would need to ensure that when the MarkdownOutput element is added back to the DOM the element gets classed appropriately. Which means we would have to set a watcher on the cell.

The extent of the MarkdownOutput element (which is the same as the extent of the editor element when in edit mode) is everything to the right of the input area. Note the padding extends to the left and further to the right.

The MarkdownOutput element simply contains the rendered markdown HTML:

Each paragraph line then has its own lower margin:

In the next post in this series, we’ll consider the code cell DOM elements.

Compiling Full Text Search (FTS5) Into sql.js (SQLite WASM Build)

As part of another side project, scraping traditional stories into various searchable databases, I figured I should probably start making the simple SQLite databases searchable over the web. I don’t really want to have to run a db server, but there is an in-browser WASM build of SQLite available, sql.js, that can be used to provide SQLite support directly within a web page and hence served from a simple web server.

Using the off-the-shelf build with my database tables fails because I make use of the FTS-5 extension (the sql.js release only bundles FTS3). So can I find a build with FTS-5 support, or build one myself?

Poking around the sql.js repo turns up a note in the CONTRIBUTING.md guide that you can add extensions, such as FTS5, by making a tweak to the Makefile and adding a -DSQLITE_ENABLE_FTS5 switch to the CFLAGS declaration.

Checking the repo, there is indeed a Makefile and a place to add the switch:

To set up the development environment, a VS Code containerised environment is provided. This can be activated simply by opening the repo folder inside VS Code, which then detects the .devcontainer/devcontainer.json and associated Dockerfile (the build itself took quite some time…). To avail yourself of this route, you need to have VS Code and Docker installed in advance.

One of the handy things I noted was that the file mount from the directory on my desktop into the container was handled automatically. I also noted in passing (I forget where) the ability to forward ports from inside the container. For the official docs on this sort of development, see for example the VS Code docs Developing inside a Container. I’m also wondering now whether this would be a useful way of distributing code environments to students…

The sql.js docs then suggest all you need to do is run npm run rebuild. This didn’t actually run the build properly for me at all; instead, I had to manually invoke make. But when I did, everything I needed seemed to build okay, the distribution packages appeared in the dist directory, and now I can run my full text FTS5 searches solely within the browser.
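For a quick sanity check of whether any given SQLite build has FTS5 compiled in, you can simply try to create an FTS5 virtual table and see if it fails. The sketch below uses Python's built-in sqlite3 module rather than sql.js, purely so that the example is self-contained; the `stories` table, its columns and the sample rows are hypothetical stand-ins for my actual database.

```python
import sqlite3


def fts5_available(conn):
    """Return True if this SQLite build can create an FTS5 virtual table."""
    try:
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(x)")
    except sqlite3.OperationalError:
        # Typically "no such module: fts5" on builds without the extension.
        return False
    conn.execute("DROP TABLE fts5_probe")
    return True


def search_stories(conn, query):
    """Run a full text query against a hypothetical `stories` FTS5 table."""
    rows = conn.execute(
        "SELECT title FROM stories WHERE stories MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
if fts5_available(conn):
    conn.execute("CREATE VIRTUAL TABLE stories USING fts5(title, body)")
    conn.executemany(
        "INSERT INTO stories (title, body) VALUES (?, ?)",
        [
            ("The Fox and the Crow", "A crow drops her cheese to a flattering fox."),
            ("The Tortoise and the Hare", "A slow tortoise beats a sleeping hare."),
        ],
    )
    print(search_stories(conn, "tortoise"))
else:
    print("This SQLite build has no FTS5 support")
```

The same CREATE VIRTUAL TABLE probe, run through sql.js in the browser, should distinguish the off-the-shelf release from a custom build made with the -DSQLITE_ENABLE_FTS5 switch.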

As a PS, having managed to create my own custom build so easily, I guess there’s no reason now not to compile in other extensions or perform custom builds… such as the sql.js-httpvfs variant which lets you make requests from a web page to remotely hosted (and potentially very large) sqlite database files (about: Hosting SQLite databases on Github Pages (or any static file hoster)). Various bits of third party guidance about how best to do that in a simple web page context are also starting to appear, as for example here and here.

Playing with Hybrid Cell Exercise Blocks in Jupyter Book via sphinx-exercise

A few weeks ago, I started having a look again at Styled Exercises in Jupyter Book (but still not JupyterLab…). I had intended to include a description of sphinx-exercise in that post, but hit “publish” too quickly. Since that post, things have also moved on a bit in the sphinx-exercise package; but they seem settled now, so here’s a quick review of some of the things you can get up to with sphinx-exercise rendered elements in Jupyter Book HTML publications.

The first thing to note is that you can have “question” and “answer” style exercise blocks.

The “question” style exercise blocks are used to define an activity or exercise.

The exercise block is defined using an {exercise} directive. This should include a unique :label: attribute that provides an identifier for the exercise, and might optionally include a :class: attribute to allow particular styling of the element. The class attribute can also call on “off-the-shelf” classes such as "dropdown". The directive can also include title text in the same way that an {admonition} block does.

Exercises are numbered by default, with numbers automatically generated from a simple exercise count throughout the whole book; the :nonumber: flag attribute can be set to disable auto-numbering. Currently, I don’t think “chapterised” numbering is available, eg with the exercise numbers enumerated within a chapter and preceded by the chapter number.

At the current time, Jupyter Book admonitions do not support the nesting of code cells within an admonition block. This inability to embed executable code cells in the exercise admonition block rather limits its utility if you need to create an activity with some executable code as part of the setup. (That said, you might also argue that learners should manually copy and paste, or rekey, any code provided as part of an exercise into a code cell themselves.)

A solution to this is to use a new gated syntax. Currently, this only works with sphinx in the production of Jupyter Book content. It is not (yet?) an official part of the MyST syntax and it is not supported by JupyterLab-MyST.

The gated exercise admonition requires two admonitions to be used, one at the start of the gated area and one at the end:

```{exercise-start}
:label: example1
```
Add markdown, code cells, etc., as required here...

```{exercise-end}
```

Here’s how it looks in the rendered Jupyter Book:

Expanding the block shows the “nested” markdown and code cells, along with code cell output:

As with other referenceable types, exercises can be referenced and the reference text will be automatically generated. For example, we can generate references using {ref} or {numref} roles. Here’s some example source MyST markdown:

A simple reference to the exercise using a `{ref}` role, {ref}`ex1g`, or a more elaborate one using a `{numref}` role, {numref}`My custom {number} title and {name}  <ex1g>`

This renders as follows:

As well as defining exercises, we can also define (linked) solutions using a {solution} directive. This directive requires an exercise’s :label: identifier as an argument, rather than an optional title. Non-executable code can be embedded in a {solution} directive block in the normal way (the {solution} directive needs to be fenced by more backticks than any code fence blocks it contains), and the block may be collapsed by default by setting the :class: to dropdown, in the normal way.

An exercise solution can also be defined using gated directives, specifically {solution-start} EXERCISE_LABEL_ID and {solution-end}:

```{solution-start} exercise-test
:label: solution-gated-test
:class: dropdown
```

Example Solutions

Example code cell without tags and no space after fence:

```{code-cell} python3
# Code cell defined using:
# {code-cell} python3

print("another hello")
```

```{solution-end}
```

For the solution, the :label: unique identifier is optional.

The title to the solution block is derived from the title of the exercise it relates to:

The style of the solution block is rather simpler than that of the exercise block. However, an optional :class: element defined in the {solution-start} admonition (and which can be set to dropdown to collapse the solution by default), can be passed and thenceforth used to style the solution cell(s).

The gated admonitions are very powerful, and essentially allow you to wrap a sequence of markdown and code cell blocks within a referenceable div element in the output book HTML.

At the current time, there is no mechanism for rendering the gated admonitions in JupyterLab or RetroLab. In addition, the jupyterlab-myst extension does not support the {exercise} directive.

Something that would be really useful in the short term would be a jupyterlab-myst-exercise JupyterLab extension that demonstrated a minimal solution to extending jupyterlab-myst to support the exercise and solution directives, but not the gated directives. Not only would this improve compatibility of exercise rendering across Jupyter Book and JupyterLab/RetroLab, it would also demonstrate how to extend jupyterlab-myst.

Something else I’m looking forward to is a separation of the “gated directive” support into a more abstract form (for example, a gated directive class that was then extended to provide the gated exercise and gated solution directives). A MyST enhancement proposal may be on the way regarding this, and is perhaps more likely now that there is an official MyST-specification available [repo].

Elsewhere, there is extension support for collapsible headings in JupyterLab/RetroLab, (assuming it hasn’t rotted?!), so that could perhaps also be cribbed as a way of providing grouped styling and collapsible display between hidden styled exercise or solution start and end blocks (though personally, I’d prefer exercise-start and exercise-end etc. tagged cells to identify gated fences. These could either apply to the first/last cell in the exercise (or solution for gated solutions) or could be otherwise empty markdown cells containing just the appropriate tags).
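To sketch what the tagged cell approach might look like in a preprocessing step, the following Python walks a list of notebook cells and emits MyST text, adding {exercise-start} and {exercise-end} fences around cells tagged `exercise-start` and `exercise-end`. The tag names, the function and the decision to omit the :label: option are all assumptions of mine for illustration; sphinx-exercise itself knows nothing about these tags.

```python
# Three backticks, built programmatically to keep this listing readable.
FENCE = "`" * 3


def cells_to_gated_myst(cells):
    """Render notebook cells as MyST text, adding gated exercise fences.

    Assumes (hypothetically) that cells are tagged `exercise-start` and
    `exercise-end` to mark the first and last cell of an exercise.
    """
    blocks = []
    for cell in cells:
        tags = cell.get("metadata", {}).get("tags", [])
        source = "".join(cell.get("source", []))
        if "exercise-start" in tags:
            # Opening gate goes before the content of the tagged cell.
            blocks.append(f"{FENCE}{{exercise-start}}\n{FENCE}")
        if cell.get("cell_type") == "code":
            blocks.append(f"{FENCE}{{code-cell}} python3\n{source}\n{FENCE}")
        else:
            blocks.append(source)
        if "exercise-end" in tags:
            # Closing gate goes after the content of the tagged cell.
            blocks.append(f"{FENCE}{{exercise-end}}\n{FENCE}")
    return "\n\n".join(blocks)


cells = [
    {"cell_type": "markdown", "metadata": {"tags": ["exercise-start"]},
     "source": ["Try this:"]},
    {"cell_type": "code", "metadata": {}, "source": ["print('hi')"]},
    {"cell_type": "markdown", "metadata": {"tags": ["exercise-end"]},
     "source": ["Done."]},
]
text = cells_to_gated_myst(cells)
print(text)
```

Here the tags apply to the first and last cells of the exercise; the alternative of otherwise empty marker cells would work with the same function, since an empty source just produces an empty block.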

Run Wine Apps in Your Browser via the daedalOS WASM Desktop

Via @simonw, I note WebAssembly in my Browser Desktop Environment [repo], a desktop that runs in your browser. It also supports Wine apps, so open the demo, drag a Windows exe file onto the desktop that should have loaded in your browser, and run the app…

For example, here’s the old OU RobotLab app running, as a Wine app, in my browser, having dragged it into the browser from my own desktop.

As more and more stuff runs in the browser, the blocker becomes file persistence. I’m guessing that this browser desktop saves into browser storage; but to be properly useful, things like this need to be able to synch either with remote storage, or with your own physical desktop using something like the browser File System Access API (browser availability)?

I’m guessing that I may be coming across as all negative on this. I’m just pre-empting one of the two most obvious reasons why I think “colleagues” will say this is a non-starter for use with OUr students; the other being browser-spec requirements. The next most obvious “but we can’t use this with students because…” argument will probably be “but you can’t use it offline”. Having to install something to the desktop to serve it locally cancels the “install free” benefits of running things in the server, so the next desirable feature would be the ability to “install” it as a browser app / progressive web app. (For an example PWA installed as a Chrome app from a website/URL, see eg SQL Databases in the Browser, via WASM: SQLite and DuckDB.)

Finally… Simple Custom Styled Markdown Cells in JupyterLab

So, I have a recipe for custom styling Markdown cells in JupyterLab / RetroLab, sort of. It actually co-opts jupyterlab-myst, which can parse MyST special content / admonition blocks contained inside a markdown cell and add class attributes associated with the special content block type to the corresponding cell DOM elements:

We can then tune in to that <aside/> block using a CSS selector of the form .admonition.note:

.admonition.note {
    outline: 0.0625rem solid #f08080;
    background-color: lightblue;
    border-left-color: green !important;
}

To get the CSS in to the appropriate place, you have to download the internet and a developer toolchain and build a JupyterLab extension. Me neither. But the jupyterlab/extension-cookiecutter-js seems to provide an “easy” way of doing this (I chose the javascript package in the hope it was a bit more compact and easier to build than the .ts/TypeScript one, though I’m not sure it is):

  • install the cookiecutter package: pip install cookiecutter
  • run it against the JupyterLab extension cookiecutter: cookiecutter https://github.com/jupyterlab/extension-cookiecutter-js
  • you’ll be prompted for various things: give it your name, and then use the same project name (for convenience) for the Python package and the extension name;
  • edit the style/base.css file and save it; <- this is the customisation bit;
  • build the package in editable mode: python -m pip install -e . if you are in the top level directory of the extension directory you created, or python -m pip install -e ./MY_EXTENSION_NAME if you’re in the parent directory;
  • install your development version of the extension with JupyterLab: jupyter labextension develop . --overwrite;
  • if you make changes, run jlpm run build and restart the JupyterLab server; and that’s it, I think (at least, after a while, as the build process downloads the internet and maybe rebuilds JupyterLab and does who knows what else?!);
  • if you aren’t developing, you should be able to just pip install .; be wary though: I did a simple pip install . and, no matter how I tried, I couldn’t seem to upgrade the package version that JupyterLab ran away from the first version I managed to build (it’s languishing there still for all I know…);
  • you can check the extension is installed and available from the command line using jupyter labextension list; if the extension isn’t enabled, try enabling it with jupyter labextension enable MY_EXTENSION_NAME; if it still doesn’t work, or you can’t see it in the listing, try jupyter labextension install MY_EXTENSION_NAME and then perhaps the enable command. As this is JupyterLab, shouting at it a bit as well might help you feel better.
  • I don’t really understand how the build works, because after installing the package, if you update it you need to run jlpm run build ?

Things were so much easier when you could just pop a CSS file into a classic notebook config directory…

Usual caveats apply to the below: this is not meant to cause offence, isn’t meant to be disrespectful, isn’t intended as an ad hominem attack; it does involve a certain amount of caricature and parody, etc etc, and may well be wrong, misguided or even complete and utter nonsense.

This is still not very useful if you want to custom style code cells. Various suggestions for being able to add class attributes based on (code cell) tag metadata have been in the issues queue for ages (I sincerely hope, if any of the PRs ever get merged, that support for propagating markdown cell tags over to class attributes is also provided, and not just copying over tags for code cells); several of the issues and PRs I’ve been aware of over the years include the following, but there may be more:

Over in the RetroLab repo — where I suspect that all sorts of stuff you can’t do in the new UI that you could do, or relatively straightforwardly hack into, the classic notebook UI, will soon start raising its head — there’s an open issue on Should custom.css equivalent be supported? https://github.com/jupyterlab/retrolab/issues/308 .

What I’ve felt right from the start is that UI / notebook presentation level “innovation outside” is really difficult in JupyterLab even at the user experience level, particularly around the boundary of notebook structure and notebook content. The notebook cell structure provides some really useful levels of structured content separation (markdown, code and code output) as well as structural metadata (cell tags). If you can exploit the structural elements in the way you present the content, then there is a lot you can do to customise the presentation of the content in a way that is sympathetic to the content and is sensitive to the metadata (cell type, cell tags or other metadata, etc.).

I think we’re still at the early stages of finding out how to make most effective use of notebooks in education, and this means finding playful ways of creating really simple extensions that help explore that edge space where notebook structure can be mapped onto, which is to say, used to transform, presentational elements, for example, that space where tags and other metadata elements can be used to control style.

But the architecture really gets in the way of that.

Currently, the content author has control over the content of a cell, and, to a limited extent, by virtue of the cell type, the presentation of the cell. But if they had additional control over the presentation of the content, for example, tag-sensitive styling, they could author even richer documents, particularly if the styling was also user customisable.

Whilst things like jupyterlab-myst make additional styling features available to the author, they do so in a way that forces an element of structured authoring inside the content field. To create an admonition block, I don’t select a markdown block with an admonition style (as I might do with tag based styling), but instead I select a markdown block and then put structural information inside it (the labelled, triple backticked code fence markers). (Cf. being able to put HTML content into markdown cells: this is really messy and can clutter things badly. Markdown is much cleaner and uses “natural” text to indicate structure; but even better if you can put block level metadata/structure at the level of the block, but not inside the block.)

Presumably because of the way the jupyterlab-myst plugin works (a simplification that perhaps allows the contents of a markdown cell to be treated as “code” that is then parsed and rendered subject to the markdown-as-code parsing extension without having to mess with core JupyterLab code), the contents of the markdown cell, and its parser, have no sight of the structural metadata associated with the cell. So we can’t just tag the code cell and expect it to be rendered as a special content block because that would require hacking JupyterLab core.

Right from the start, it seems as if a decision was taken in the JupyterLab development that the users could do what they want inside code scope and inside a notebook cell source element, and that code execution could return things into the output element, but that the cell metadata was essentially out of bounds unless you were coding at the JupyterLab core level (“it’s our editor and it’s for looking at notebooks how we want to look at ’em.”; there’s also a second, later take, which is that the geeky dev user should be able to choose their own theme. In education, it’s often useful for the publisher to control the styling, because the styling is the classroom and style can be an important signposter, reinforcer, framer and contextualiser. Which is not to say that users shouldn’t also be able to select eg light, or dark, or accessible-yellow tinted themes which the publisher should provide and support). This is a real blocker.

If the above mentioned, languishing PRs had been (capable of being?) published as simple extensions, they’d have been useable by end-users for months and years already; as extensions, with less code to make sense of, it’s possibly more likely that other people would have been able to develop them further and/or use them as cribs for other extensions; (although there is still the question of dev tools and what to type where to even get your dev environment up and running…) Extensions also mean there are no side effects on the core code-base if the extension code is substandard; if bad things happen from installing the extension, uninstall it. As it is, if you want the functionality that apparently resides inside the PRs, then you have to install jupyterlab from a personal fork / PR branch of the JupyterLab repo, and build it yourself from source in order to even try it out.

(Providing good automation in the repo can help here because it means that people can rely on the automation process to manage the development environment and build the distribution, rather than the user necessarily having to figure out a development environment of their own and what commands to run when in order to manage the build process.)

There may well be good reasons why you don’t want “portable” documents such as notebooks to have too much influence over the UI (for example, malicious Python code that spoofs UI elements to steal credentials); but with tools such as ipywidgets, you do allow Python code to have sight of various elements of the DOM, and control over them. Related to this, I note things like davidbrochart/ipyurl, a “Jupyter Widget Library for accessing the server’s URL” that uses the ipywidgets machinery as a hack to get hold of a Jupyter server’s URL because it’s not directly accessible as jupyter scoped state in a code execution environment. I also note jtpio/ipylab, the aim of which is to “provide access to most of the JupyterLab environment from Python notebooks”. But whilst this means that code in notebooks can tinker with the UI, this is not really relevant if I want to expose users to subject matter content in a modified UI. Unless, perhaps, I can create a JupyterLab workspace that is capable of auto-running (in the background, on launch) a configuration notebook that uses ipylab to set up the environment/workspace when I open a workspace?

The jupyterlab/jupyterlab-plugin-playground is another extension that seems to provide a shortcut way of testing plugin code without the need for a complex build environment, but I’m not really sure how it helps anyone other than folk who already know what they’re doing when it comes to developing plugins, if not setting up build and release environments for them. Looking at the code for something like jupyterlab-contrib/jupyterlab-cell-flash (could that be run from the jupyterlab-plugin-playground ? If so, what role does all the other stuff in the cell-flash repo play?!) I note a handy looking NotebookActions.executed method for handling an event, or some other sort of signal. But what other events / signals are available, and how might they be used? (The onModelDBMetadataChange method looks like it might also be handy, but how do I use it to monitor all cells?) And where do I find a convenient list of hooks for things that a handler might influence in turn, other than perhaps poring through jupyterlab/extension-examples looking for cribs?

Enough… What I meant to do when I started this post was publish an example of some simple CSS to style a simple example by co-opting an attention admonition class. And I’ve still not done that!

Simple Custom Styling Notebooks in JupyterLab & RetroLab… Or Not… Again…

Pondering an earlier fragment on Previewing Richly Formatted Jupyter Book Style Content Authored Using MyST-md, a JupyterLab/RetroLab extension that allows you to render rich MyST markdown content blocks, it struck me that I should be able to extend it to provide a way of displaying customised single cell markdown exercise blocks, if nothing else.

After all, how hard could it be?

The JupyterLab extension that adds the functionality is the executablebooks/jupyterlab-myst extension, which seems in turn to rely on executablebooks/markdown-it-docutils, a plugin for markdown-it (whatever that is…). You can find a demo of what the docutils plugin supports here.

The repo is full of developer voodoo files, although the Getting Started section does tell you what you need to be able to type in order to build things (node required (good luck…); running the provided commands will download the internet and install whatever other node packages appear to be required):

Looking at the TypeScript source in admonitions.ts, in the src/directives directory, adding a new admonition type seems simple enough:

// ...
export class Tip extends BaseAdmonition {
  public title = "Tip"
  public kind = "tip"
}

export class Warning extends BaseAdmonition {
  public title = "Warning"
  public kind = "warning"
}

export const admonitions = {
  admonition: Admonition,
  attention: Attention,
  caution: Caution,
  danger: Danger,
  error: Error,
  important: Important,
  hint: Hint,
  note: Note,
  seealso: SeeAlso,
  tip: Tip,
  warning: Warning
}

The colour theme and the icon for the styled admonition block are set by a simple bit of (S)CSS, in the form of a Sass map:

$admonitions: (
  // Each of these has a reST directives for it.
  "caution": #ff9100 "spark",
  "warning": #ff9100 "warning",
  "danger": #ff5252 "spark",
  "attention": #ff5252 "warning",
  "error": #ff5252 "failure",
  "hint": #00c852 "question",
  "important": #00bfa5 "flame",
  "note": #00b0ff "pencil",
  "seealso": #448aff "info",
  "tip": #00c852 "info",
  "admonition-todo": #808080 "pencil"
);

So adding a new admonition type looks relatively straightforward, albeit constraining you to just custom styling the top bar and icon for the special content block. I reckon I should be able to at least build the markdown-it plugin. But what then? How can I use it?

This plugin is mentioned by name in several places in the jupyterlab-myst extension, so if I can find a way of distributing the plugin, for example, by publishing my own distribution via npm using a new, unique package name, then I should be able to reference this, instead of the original, in a fork of the jupyterlab-myst extension installed into my own Jupyter environment? At the cost of having to fork and install my own version of jupyterlab-myst, of course. And compare that to the trivial way in which we can create a really simple Python package to extend Sphinx to provide us with similar, custom admonition blocks.

A simpler way might be to use the executablebooks/markdown-it-plugin-template (from which the markdown-it-docutils plugin was created) and create a separate plugin that I can import directly into jupyterlab-myst; but there is a ton of stuff in the executablebooks/markdown-it-docutils extension that the parser seems to require, and it seems much easier to just add 5 lines of code to something that already works. Were it not for the issue of actually getting those tweaks into my running Jupyter environment, of course.

And, of course, this still isn’t ideal, because I don’t want to just tweak the colour scheme of the header of the custom block: I want to make changes to the background of all of it so that it looks like the exercise block provided by executablebooks/sphinx-exercise, for example:

The styling for the “erroneous” directive is a bit closer to what we want, but this is hard coded for “erroneous” classed cells.

And in the way that the styles are defined for the custom admonitions, they are templated to only tweak the header colour and icon in an easily customisable way.

I wonder if the simplest way is to just appropriate one of the custom admonition classes I am not likely to use (eg the attention class) and manually hack some overrides for that class into the CSS file?

But of course, to add a custom CSS file to JupyterLab, you have to do what? Create a dummy extension that just includes a CSS file? Because of course, JupyterLab doesn’t support custom.css.

UPDATE: here’s a trick for getting a custom CSS file into the JupyterLab environment via a dummy, otherwise empty, extension: https://blog.ouseful.info/2022/03/28/finally-simple-custom-styled-markdown-cells-in-jupyterlab/

Is there anything about JupyterLab that is not really hostile to simple end-user customisation by end-user developers?! I really do HATE IT!

Fragment: On the Value of Traditional Indexes in Full Text Search Environments

Over the last few weeks, I’ve been tinkering with various recipes for pulling searchable text content out of the Internet Archive and popping it into a full text searchable database.

One of my first sketches has used 19th century editions of Notes & Queries. As well as the weekly “content” issues, N & Q also published two index volumes a year detailing the entries of the preceding volume.

Having started trying to compile sensible index entries for my sin-eater unbook (still a work in progress, particularly the index) using the Sphinx/Jupyter Book indexing features, I have a newfound respect for the compilers of indexes: there’s a real craft to it.

At first glance, you might think there is limited utility in having an index as well as full text search support, but there are at least two reasons why that’s not correct.

The first is navigational: the index provides a way of identifying search terms as well as helping you understand the pattern of occurrences of a particular term.

The second is because full-text search using text extracted from a large number of scans using OCR really sucks. Even with good stemming etc on full text search terms, even with fuzzy search tools, getting a match on a search term can, at times, be tricky.

So to supplement my full text search over N&Q, I am topping it up with a search into the index that also tries to identify pages directly from related index entries. (The use of the index is also a handy cross-check that the free text search has turned up at least the results included in the originally compiled index.)
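As a rough sketch of the idea, the following combines a naive full text match with a fuzzy lookup into an index, and reports any pages the index knows about that the full text search missed. The page texts, index entries, function names and the 0.8 similarity cutoff are all made up for illustration; this is not the actual search code I'm using over N&Q.

```python
import difflib

# Hypothetical OCR'd page text keyed by page number
# (note the OCR error, "sin-eatcr", on page 12)
pages = {
    10: "a note on the custom of the sin-eater in herefordshire",
    12: "further remarks on the sin-eatcr and funeral customs",
    15: "queries on parish registers",
}

# Hypothetical index entries, as compiled by a human indexer
index = {"sin-eater": [10, 12], "parish registers": [15]}


def full_text_pages(term, pages):
    """Pages whose OCR'd text contains the search term exactly."""
    return {n for n, text in pages.items() if term in text}


def index_pages(term, index, cutoff=0.8):
    """Pages listed under index headings that closely match the term."""
    headings = difflib.get_close_matches(term, list(index), n=3, cutoff=cutoff)
    return {page for heading in headings for page in index[heading]}


def combined_search(term, pages, index):
    """Union of both searches, plus the pages only the index found."""
    ft = full_text_pages(term, pages)
    ix = index_pages(term, index)
    return {"pages": sorted(ft | ix), "missed_by_full_text": sorted(ix - ft)}


result = combined_search("sin-eater", pages, index)
print(result)
```

Here the full text search misses page 12 because of the OCR error, but the index entry recovers it, and the `missed_by_full_text` list provides the cross-check.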

In passing, I also note the power of the internal cross-referencing scheme used across items appearing in N&Q…

My Personal Blockers to Adopting JupyterLite for Distance and Open Educational Use

When I first came across JupyterLite nine months or so ago (jupyterlite — “serverless” Jupyter In the Browser Using Pyodide and WASM), one of my first thoughts was whether I could use it as the programming environment for an open online course / OER that makes use of Jupyter notebooks.

Working with novices, at scale, at a distance, online, and ideally without support raises various support challenges. Trying to create materials that can run anywhere – via an open notebook server (eg Binderhub), via a local install, or even via JupyterLite raises other issues: ideally, you want exactly the same notebook to work in exactly the same way wherever it’s being run. Other issues come from learners losing their work, working from different machines and browsers at different times, working offline (no network access), etc. etc.

At the time, there were several blockers for me when it comes to adopting JupyterLite as just another environment, blockers that are still present today. So as JupyterLite is brought to wider attention via a recent post on the official Jupyter blog – Jupyter Everywhere – and associated social media sharing, this is just a note-to-self as to why I still haven’t got round to updating the OpenLearn Learn to Code for Data Analysis course to use JupyterLite.

Note that this isn’t intended as a criticism of the JupyterLite devs or dev process. There may well be solutions or workarounds that I haven’t come across. I’m just making observations as an everyman who thinks “ooh, I could use it for this” and then realises I can’t, quite, which means I can’t, at all (blocker, innit!;-). I should also state that my default use case is an extreme one: large populations of naive learners & novice programmers working largely unsupported, online and offline, potentially across several different BYOD or public access machines that may be unpatched & years old, on courses that are expected to remain largely unmaintained for several years following publication.

The mechanics of JupyterLite are beyond me – not just the JupyterLab-ness but also the WASM / pyodide / pyolite-ness. And then there’s things like browser local storage, local forage(?) and potential links to local file system via a browser file system API. So some of the following may be hard, some may be impossible (at the moment, or dependent on upstream things…).

To set the scene, when you open the JupyterLab or RetroLab homepage, you see a list of files and notebooks that are part of the JupyterLite distribution. You can use UI controls to upload additional files and see them in the file listing. Notebooks and files can be opened by clicking on them in the normal way. If you edit a file, the changes are saved to browser storage. (I’m not sure if there’s an “official” way to reset a notebook back to the original version as represented by the version served as part of the original distribution, rather than the edited version in browser storage? That probably should be my first “is this a blocker?”)

So what are my (other) blockers, presented here as questions just in case they already have solutions (please feel free to post answers via the comments…):

  • how do I reset a modified notebook saved in browser storage to the original version served as part of the jupyterlite distribution [A: deleting a file in the JupyterLab file browser deletes it from browser storage; if the file was part of the original distribution, the file remains in the file browser and is reset to the originally distributed version];
  • how do I add additional Python packages to a JupyterLite distribution (ideally, I’d just specify a requirements.txt file); [A: install the files into the environment that is used to generate the release; example]
  • how do I open and read a file programmatically (eg how do I open a data file, or connect to a sqlite database file)? There is an unofficial solution in a discussion thread, but this seems brittle to me and on occasion appears to break. It would be useful if there were an official, min. viable function that also forms part of the release test suite. I wrapped the unofficial solution in a simple utils package but if it is subject to breaks, then it’s not generally usable in published teaching materials unless the jupyterlite version can be guaranteed to be one in which the tricks work; [A: this looks like it will be sorted as far as file read/writes go via jupyterlite/pull/655]
  • how do I write a file that then appears in the file view (eg saving a data file, or writing to a browser storage persisted sqlite database file; or reading an ipynb file from a remote URL, saving it as a file, then generating a URL that will open that notebook from local storage in eg RetroLite via a path= URL parameter); [A: this looks like it will be sorted as far as file read/writes go via jupyterlite/pull/655]
  • how do I retrieve data from a remote URL in a platform independent way (there are tricks / pyolite functions for reading files from URLs but these require pyodide or js package calls; ideally, I’d just use requests and it would figure out how to handle the transport; in the short term, see eg Making the Python requests module work in Pyodide / bartbroere/pyodide-requests);
  • how do I avoid async await requirements on function calls (some pyolite function calls that can be used to mock non-WASM executed Python functions are asynchronous and require an await prefix; this makes it tricky to write code that runs anywhere; is there a way to mask the await requirement and wrap asynchronous calls in a non-async function?) [tracking?: pyodide/pyodide/issues/1503] There is also a related issue around things like time.sleep() [tracked here: pyodide/pyodide/issues/2354]
  • how do I synch with my desktop filesystem (eg synch browser storage and local storage, or run jupyterlite against the desktop filesystem rather than browser storage; at the moment, this requires file upload / download; presumably I can access the browser storage db from my desktop commandline?); [A: this is supported by jupyterlab-contrib/jupyterlab-filesystem-access]
  • how do I synch with remote synching drives (eg Dropbox, OneDrive, GoogleDrive etc. etc.); [tracking: jupyterlite/jupyterlite/issues/315]
  • how do I download a file programmatically (eg by creating a blob that can be downloaded from an auto-clicked link);
  • how do I open a remote notebook, e.g. in RetroLite (for example: https://jupyter.org/try-jupyter/retro/notebooks/?path=https://raw.githubusercontent.com/jupyterlite/jupyterlite/main/examples/python.ipynb, which does not currently work);
  • how do I install Python packages programmatically in a cross-platform way (currently, packages can be installed via notebooks using micropip; it would be more convenient to mask this via some %pip magic; see related issue).
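On the programmatic download question, one possible cross-platform workaround is to build a link whose href is a base64 encoded data URI, and then display that link as HTML cell output. The sketch below only builds the HTML string; the function name and the example data are hypothetical, and I haven't checked how well this behaves for large files, or whether every front end will render the link unsanitised.

```python
import base64
import html


def download_link_html(data: bytes, filename: str,
                       mime: str = "application/octet-stream") -> str:
    """Build an HTML anchor that offers `data` as a download called `filename`."""
    # Encode the raw bytes so they can be embedded directly in the link
    payload = base64.b64encode(data).decode("ascii")
    # Escape the filename so it is safe to use inside an HTML attribute
    safe_name = html.escape(filename, quote=True)
    return (
        f'<a download="{safe_name}" '
        f'href="data:{mime};base64,{payload}">Download {safe_name}</a>'
    )


# Hypothetical example: offer a small CSV file for download
link = download_link_html(b"a,b\n1,2\n", "example.csv", "text/csv")
print(link)
```

In a notebook, passing the string to IPython.display.HTML should render a clickable link; note that this requires a click from the user rather than auto-clicking the link.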

To create platform agnostic notebooks, it might be that notebooks need to have a guarded cell that makes decisions about what package or workaround to load if the wasm platform is detected (eg via import platform as p; p.platform() etc.; test the platform and import packages as required, either via an if or via a try).
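A minimal sketch of such a guard might look like the following. It assumes that a pyodide-backed kernel reports sys.platform as "emscripten" and/or has a pyodide module loaded; the function name and its optional arguments are my own, included only so the check can be tried out away from a browser.

```python
import sys


def running_in_wasm(platform_name=None, loaded_modules=None):
    """Guess whether the code is running under pyodide / WASM.

    The arguments are only there to allow the check to be exercised
    outside a browser; by default the live interpreter is inspected.
    """
    name = sys.platform if platform_name is None else platform_name
    modules = sys.modules if loaded_modules is None else loaded_modules
    # Assumption: pyodide builds report the platform as "emscripten"
    # and have a `pyodide` module loaded
    return name == "emscripten" or "pyodide" in modules


IN_WASM = running_in_wasm()

if IN_WASM:
    # Load pyodide specific packages or workarounds here
    print("Looks like a WASM / pyodide environment")
else:
    # Load the "normal" packages here
    print("Looks like a regular Python environment")
```

Putting this in a cell at the top of a notebook means later cells can branch on a single IN_WASM flag rather than repeating the test.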

In passing, there are also various other things that would open up new opportunities; perhaps greatest amongst these are support for single executable cells, and for running code via pyolite kernels in Jupyter Book using thebe (tracking issue). But I also wonder: would it be possible to use pyolite to run as a part of a kernel gateway (eg Building a JSON API Using Jupyter Notebooks in Under 5 Minutes) to support serverless functions?

POC: Open Jupyter Book Page in JupyterLite (“View executable source”, ish)

As JupyterLite starts to finagle its way into Jupyter Book, such as via jupyterlite-sphinx, in which a simple admonition can be used to embed an in-browser executable notebook or live JupyterLab environment:

or via thebe, to provide in-browser execution support for Jupyter Book executable cells (hopefully, soon… [issue, PR]), I thought I’d riff on the jupyterlite-sphinx approach, which drops a jupyterlite distribution into a Jupyter Book distribution, and see if I could open a Jupyter Book page rendered from a notebook in a RetroLite (JupyterLite notebook) editor. And it seems that a crude proof-of-concept at least wasn’t that hard, cribbing from an earlier example of how to launch a notebook in Deepnote via a PR in executablebooks/sphinx-book-theme. (Things will be so much easier when this is pluggable…)

My attempt is currently at ouseful-PR/sphinx-book-theme/tree/launch-in-jupyterlite and works by adding a new menu option to the launch menu (which can be used to open the source version of the page in Binderhub, or via a specified JupyterHub) that will open the notebook in a retrolite notebook using the retrolite environment that has been added as part of the book distribution:

What this means is you can “View Executable Source” on the Jupyter book page and tinker with it in a JupyterLite notebook editor. At least at first. Because any edits you make to the notebook are:

  • saved to browser storage, so you can keep your changes;
  • not rendered back to the Jupyter book page.

What this means is that the first time you launch the book page into the notebook editor, it does represent the source of the original page, but thereafter any edits mean the two versions differ. If you edit the notebook and then at a later date launch from the book page again, you will see the latest, edited version of the notebook as saved to browser storage.

What this POC throws up, then, is some user issues:

  • it would be handy to know from the notebook page that the version being edited is different to the original version;
  • it would be handy to revert the edited notebook back to the original version;
  • it would be handy to know from the book page that an edited version of the notebook is available;
  • it would be handy to be able to reflow the book page based on the edited notebook version.
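As a sketch of how the "has this notebook been edited?" check might work in principle, the following compares a fingerprint of the cell sources in two notebook documents, ignoring outputs and metadata so that simply re-running a notebook doesn't count as an edit. The function names are hypothetical, and this says nothing about how to get hold of the original and browser storage versions in the first place, which is the hard part.

```python
import hashlib
import json


def notebook_fingerprint(nb: dict) -> str:
    """Hash the cell types and sources only, ignoring outputs and metadata."""
    cells = [
        # A cell source may be a string or a list of strings in the ipynb JSON
        (cell.get("cell_type"), "".join(cell.get("source", "")))
        for cell in nb.get("cells", [])
    ]
    blob = json.dumps(cells, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def is_edited(original: dict, current: dict) -> bool:
    """True if the cell sources differ between the two notebook documents."""
    return notebook_fingerprint(original) != notebook_fingerprint(current)


# Hypothetical minimal notebook documents
original = {"cells": [{"cell_type": "code", "source": ["print(1)"], "outputs": []}]}
rerun = {"cells": [{"cell_type": "code", "source": "print(1)",
                    "outputs": [{"output_type": "stream"}]}]}
edited = {"cells": [{"cell_type": "code", "source": ["print(2)"], "outputs": []}]}

print(is_edited(original, rerun), is_edited(original, edited))
```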

Fragment: Grabbing Screenshots of Jupyter Notebook Code Cell Outputs, Ish…

Or not completely, as the case may be…

A quick hack packaging code I was using for grabbing screenshots of styled pandas dataframes so I could share them as images, iframe-shot uses browser automation to render HTML returned via _repr_html_() or embedded in an IFrame when executing a Python code cell, or otherwise, and return an image file from it, either as a data URI or saved to a file.

from iframe_shot import IFrameShot

# Generate an object with access to
# preloaded selenium powered headless browser
grabber = IFrameShot(True)

# HTML string
html = "<html><body><h1>hello there</h1></body></html>"

# Render HTML in browser and grab screenshot
grabber.getHTMLPNG(html)

# Returns rendered data-uri PNG of screenshotted html
# To save as png and return filename, use:
# grabber.getHTMLPNG(html, embedded=False)

# Set html_out=FILEPATH to save the HTML to a file
# Set png_out=FILEPATH to save the image to a file with a specific filename

There are various issues with this:

  • if the style is not part of the HTML, but eg references style set elsewhere in the notebook, or from a style file, the style won’t be rendered;
  • the approach uses browser automation, which adds several large dependencies.

It would be interesting to explore the extent to which something like html2canvas could be used to render cell output HTML onto a canvas element from which an image could be saved. (Hmm… could IPython do that?!)

By chance, another screenshot tool appeared in the last week or so (from which I stole the -shot bit of the name): Simon Willison’s shot-scraper. The tool uses Playwright and is handy for four main reasons:

  • it provides an easy way to grab a screenshot of a page;
  • it can provide a screenshot of part of a page, selected using CSS selectors;
  • it can be used to style and add simple overlays to the captured scene using Javascript;
  • it can be used to scrape webpages using Javascript and provide the response via a JSON object.

I did wonder if I could use it to grab a screenshot of an executed Jupyter notebook output cell, or an output cell in an HTML rendered notebook, but I couldn’t offhand find a way to wrangle a cell ID or unique path to a desired cell output using just CSS selectors. If Javascript were available as a way of selecting DOM elements, and not just CSS selectors, then I think it should be possible to use shot-scraper to grab screen captures of notebook code cell outputs from run notebooks viewed either as rendered notebooks from a served URL, or from exported HTML.