Running R and Debian Linux in the Browser Via WASM

A few weeks ago, I noted the appearance of georgestagg/webR, an R distribution compiled to WASM that runs completely in the browser. The environment is served from a simple webserver, and requires no installation of R onto a back-end server or onto your own desktop. Instead, the environment runs inside a virtual machine inside your browser (try it here):

WebR – R in the browser

For an iframe embeddable Python REPL running in the browser, see for example jtpio/replite.

replite in-browser JupyterLite powered REPL

At the moment, in WebR, I can’t see a direct way of displaying graphical (chart) output (there appear to be various blockers with the handling of bitmap images?), although I note that the svglite package is installed and seems to work…

svglite package running in webR

WebR appears to be using https://terminal.jcubic.pl/ for the terminal UI, which in turn appears to support the definition of custom terminal functions; it also looks from jcubic/jquery.terminal#278  that the terminal can output and render images via HTML <img /> tags, so presumably it could also add <svg /> tags to the DOM. Which makes me wonder: could a custom function be defined that will read the contents of the SVG file and display it via an HTML <svg> element?

UPDATE (23/2/22): webR now has a chart display area:

I also wonder if the rendering machinery of JupyterLite could be used to support the rendering of graphical output (though some magic may need to be involved?). There are several demos of JupyterLite kernels built using the Xeus Jupyter kernel framework (which includes an early demo of an R kernel, Juniper Kernel), which makes me think: what would be involved in co-opting the WebR build process for use in a Xeus context to create a Xeus JupyterLite R kernel?

In passing, I also note there is a full Debian installation available running via WASM in the browser: WebVM: server-less x86 virtual machines in the browser (try it here: webvm.io).

See also: jupyterlite — “serverless” Jupyter In the Browser Using Pyodide and WASM and SQL Databases in the Browser, via WASM: SQLite and DuckDB.

Fragment – Searching the Internet Archive for Out-of-Copyright and Openly Licensed Works

This is a living page – more tricks will be added as I find them.

A lot of content on archive.org is in the public domain or is out of copyright. We can search for it using filters such as rights:"Public Domain" or rights:"PublicDomain" or rights:"No Copyright".
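As a rough sketch, filters like these can be dropped into a query URL for the Internet Archive advanced search endpoint. The endpoint path and the `q`, `fl[]`, `rows` and `output` parameters are quoted from memory of the public API, so treat them as assumptions to check against the current archive.org documentation; the helper name is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Internet Archive advanced search API
IA_SEARCH_URL = "https://archive.org/advancedsearch.php"

# The rights filters mentioned above, OR-ed together into a single clause
RIGHTS_FILTER = (
    'rights:"Public Domain" OR rights:"PublicDomain" OR rights:"No Copyright"'
)

def build_ia_search_url(terms, rows=50):
    """Build a search URL combining free-text terms with the rights filters."""
    query = f"({terms}) AND ({RIGHTS_FILTER})"
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each hit
        ("fl[]", "title"),
        ("rows", rows),
        ("output", "json"),
    ]
    return f"{IA_SEARCH_URL}?{urlencode(params)}"

print(build_ia_search_url("fairy tales"))
```

The function only builds the URL; fetching it and checking the rights statement of each hit by hand is still advisable, given the caveats about how rights metadata is applied.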

But I’m still not convinced. The copyright of a scan presumably resides with whoever did the scan, but how can we find what copyright / license terms are applied by the scanner to a work uploaded to the Internet Archive? Does it rely on the person who uploaded the scan to ensure the license terms are embedded within the work? For example, many scans from the Google Book Search digitisation project have a Google copyright notice at the start of the work (and watermarks on the scanned pages).

Downloading and Querying a SQLite3 Database From a Remote URL Using JupyterLite

Ish via Simon Willison, font of a thousand useful and practical hacks, a recipe for downloading a SQLite database from a URL into the in-memory file system of an in-browser JupyterLite environment and querying it, Using the sqlite3 Python module in Pyodide:

# Ish via https://til.simonwillison.net/python/sqlite-in-pyodide
from js import fetch

async def load_file_into_in_mem_filesystem(url, fn=None):
    """Load a file from a URL into an in-memory filesystem."""
    
    # Create a filename if required
    fn = fn if fn is not None else url.split("/")[-1]
    
    # Fetch file from URL
    res = await fetch(url)
    
    # Buffer it
    buffer = await res.arrayBuffer()
    
    # Write file to in-memory file system
    with open(fn, "wb") as f:
        f.write(bytes(buffer.valueOf().to_py()))

    return fn

Then call as:

url="https://raw.githubusercontent.com/psychemedia/lang-fairy-books/main/data.db"

db_file = await load_file_into_in_mem_filesystem(url)
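Outside Pyodide the `js` module is not importable, so the function above will not run in a regular Python kernel. A minimal synchronous stand-in using only the standard library might look like the following; the function name is hypothetical, and as far as I know `urllib` cannot make network requests from inside Pyodide itself, so this complements the `fetch` version rather than replacing it.

```python
import urllib.request
from pathlib import Path

def load_file_from_url(url, fn=None):
    """Download a URL to a local file and return the filename (non-Pyodide sketch)."""
    # Create a filename if required, mirroring the async version
    fn = fn if fn is not None else url.split("/")[-1]

    # Fetch the bytes and write them out in one go
    with urllib.request.urlopen(url) as res:
        Path(fn).write_bytes(res.read())

    return fn
```

In a regular kernel this would be called without `await`, as `db_file = load_file_from_url(url)`.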

To then query the database:

import sqlite3

# Open database connection
c = sqlite3.connect(db_file)

# Show database tables
c.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()

"""
[('books',),
 ('books_metadata',),
 ('books_fts',),
 ('books_fts_data',),
 ('books_fts_idx',),
 ('books_fts_docsize',),
 ('books_fts_config',)]
"""

Next step: can we also save the downloaded file into browser storage, and then reload it? Possibly related discussion issue: Uploading files that can be read by python code in a notebook.

Also, is there a way of reading a sqlite database that forms part of a JupyterLite distribution, such as the demo.db database in this distribution: https://ouseful-pr.github.io/jupyterlite-demo/lab/index.html
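Whichever way the database file arrives, once a connection is open it can help to see which columns each table offers before writing queries. The helper below is a small sketch using `PRAGMA table_info`; it is demonstrated against a throwaway in-memory database with a made-up table, not the fairy books database, whose column names are not shown here.

```python
import sqlite3

def describe_tables(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; the name is field 1
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in cols]
    return schema

# Stand-in database with a made-up table, purely for illustration
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE example (id INTEGER, title TEXT)")
print(describe_tables(demo))  # {'example': ['id', 'title']}
```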

Styled Exercises in Jupyter Book (but still not JupyterLab…)

Pondering yet again, yet again, how to port our old notebooks+styling extensions from classic notebook to JupyterLab/RetroLab using off-the-shelf tools and hacks, and not having to become a hardcore Jupyter-core Typescript-core complex-development-environment download-an-internet’s-worth of devtools wtf-are-all-those-setup-and-config-files developer, I had a poke around what’s currently available (I’ve really slacked off Tracking Jupyter over the last seems-like-forever).

One of the possible routes I’d explored before involved a couple of JupyterLab extensions, one for adding class attributes to rendered notebooks based on cell tags (rafelyall/celltag2dom), the other for easily adding custom CSS (wallneradam/jupyterlab-custom-css). Both these extensions fail to install in current JupyterLab, but that doesn’t really surprise me. FWIW, I still consider JupyterLab to be a really hostile environment to casual, have-a-go end-user developers… ;-)

If notebook tags mapped onto rendered notebook HTML classes it would open up so many lightweight end-user development routes for notebook sensitive custom styling / rendering, even at just the “tinkering with CSS” level. There have been various PRs attempting this in the past, initially https://github.com/jupyterlab/jupyterlab/pull/8410, which was then deprecated in favour of https://github.com/jupyterlab/jupyterlab/pull/8627, but from what I can tell, that is still languishing in the PR queue. I’ve no idea if anything else has replaced it. From a quick skim of a notebook with tagged cells in JupyterLab (which took so long to load in my browser that JupyterLab popped up a message asking if I wanted to continue waiting, which suggests the slowness with which JupyterLab loads is a not-just-me pain point; give me VS Code or RStudio any day for a Jupyter IDE…), I couldn’t spot any likely class attributes related to tags propagating through.

With things like the JupyterLab-MyST extension supporting some custom rendering via admonition blocks, we can get some custom block styling into the JupyterLab/Retrolab context, but still not to the extent we can with our current classic notebook extensions.

JupyterLab-MyST lets you preview rich MyST content in JupyterLab and RetroLab environments

So using the old trick of “if you can’t solve the problem, change the problem”, I’ve started wondering again about how far we might get making instructional materials available via a Jupyter Book UI, which is much easier to work with. There are still blockers to this: whilst code can be edited and executed within a Jupyter Book context using Thebe, there is still no way to save edits to browser storage (this should be achievable: JupyterLite does it, and JupyterLite notebooks can be embedded in a Jupyter Book page using jupyterlite-sphinx). I’m guessing saving to and loading from the host file system may be deemed a little riskier, although I think Chrome does have file system integration?

Another reason for working in the Jupyter Book / Sphinx environment is that you can quickly get started developing your own custom UI features.

For example, a recipe described by @choldgraf on the Executable Books discussion forum reveals that you can create custom styled admonition blocks with just a handful of files and a few lines of relatively simple Python:

  • in a new directory example, create a setup.py containing at least:
from setuptools import setup, find_packages

setup(
    name="custom-directive",
    packages=find_packages()
    )
  • in example/custom-directive create __init__.py containing the code given in the forum example:
# Via: https://github.com/executablebooks/meta/discussions/655#discussion-3855822

from docutils.parsers.rst.directives.admonitions import Admonition

class Example(Admonition):
    def run(self):
        # Manually add a "tip" class to style it
        if "class" not in self.options:
            self.options["class"] = ["tip"]
        else:
            self.options["class"].append("tip")
        # Add `Example` to the title so we don't have to type it
        self.arguments[0] = f"Example: {self.arguments[0]}"
        # Now run the Admonition logic so it behaves the same way
        nodes = super().run()
        return nodes

def setup(app):
    app.add_directive("example", Example)
  • build and install your package: run pip install ./example
  • add the extension to your Jupyter Book _config.yml file, for example:
sphinx:
  extra_extensions:
    - custom-directive
  • build your Jupyter Book in the normal way: jupyter book build .

I also note that you can easily add your own custom CSS to Jupyter Book environments to provide custom styling for class attributes, the class attributes themselves being trivially set in admonition blocks via a :class: element, for example, or via a custom classed div element. As the docs describe, you can create a custom CSS file (eg my-custom-css-file.css) and then just place it in a _static directory; the file(s) will then be automatically copied into an appropriate location in the output book when the book is built:

├── _config.yml
├── _toc.yml
├── page1.md
└── _static
    └── my-custom-css-file.css

Take that, JupyterLab..! ;-)

Another approach is to consider embedding JupyterLite notebooks into an HTML text. One advantage of this approach is that the embedded notebook executes against an in-browser Python environment rather than requiring a connection to a remote server or Binder environment; another is that changes to the notebook are saved to browser storage and will be available if you view the notebook again from the same browser (see, for example, Embedding JupyterLite In-Browser Notebooks in Documentation and Online Educational Materials). A downside is that the JupyterLite environment is a large download, which just adds to the long start up time.

Perhaps the best solution, however, is the executablebooks/sphinx-exercise extension; but that’ll have to be the subject for another post, not least because I hit publish on this post rather too quickly!

Embedding JupyterLite In-Browser Notebooks in Documentation and Online Educational Materials

JupyterLite, if you haven’t already come across it, is an in-browser Jupyter environment that can execute Python (scipy stack) code purely within the browser. The code is executed via a WASM powered pyodide environment (essentially, a virtual machine that runs within your browser to provide a Python environment you can access from a web page). The only downside is it can take what feels like forever to download and open (I’m not sure what’s cacheable and what isn’t?).

Several Jupyter environments are available:

  • the full JupyterLab environment;
  • a RetroLab jupyter notebook environment;
  • a REPL console (I really dislike the layout of this – the “command line” is waaaaaaay down the display. I think I’d rather it were at the top of the screen, and the output displayed and scrolled under it…).

All that’s missing is a “single executable cell” mode, although there is an open issue on that. (More generally, better support for opening and saving data files to browser storage, and perhaps even disk/local file system, would also be helpful…)

If you prefer to run classic notebook purely in the browser against a pyodide kernel, use Basthon (this always seems to load much faster…). I’m not sure if English language packs are available for it…

In the JupyterLab and Retrolab environments, notebooks can be saved to and loaded from browser storage (Basthon notebook edits are also persisted in browser storage, I think?).

Interestingly, the environments can be embedded within other web pages. There are already examples of documentation sites starting to explore embedding notebook demos in the documentation website, as, for example, ipycanvas.

On the numpy docs site, a jupyterlite REPL console is embedded alongside code you can copy and paste, and then try out:

A sphinx extension, jupyterlite-sphinx, provides several directives that allow you to embed JupyterLab, RetroLab or the REPL console in Sphinx generated documentation or Jupyter Books. For the RetroLab/RetroLite environment, you can specify which notebook you want to embed:

One of the main issues I have with Jupyter Book in an educational use context is the inability to persist any changes the user may make to the code cell.

Embedding a RetroLite notebook gets round this to a certain extent if the user is always working from the same browser (i.e. they can access the same browser storage), because edits will be persisted to browser storage.

There are risks with persisting changes, eg if a student edits and breaks some provided code and can’t fix it, so it’d be handy if there were also a way to cache as read-only the original document and allow it to be restored.

Consequently, it might be useful if there were a way to disable the save-to-browser storage both as a notebook toolbar button, and as an environment setting, and expose that setting as a jupyterlite-sphinx parameter. This would then provide the option of making a notebook essentially editable and executable, but not saveable.

When using the REPL environment, I suspect that each REPL session will be in its own environment (*not tested*) which means that state will not persist across REPLs in the same page or across pages in the same book. If a single cell execution environment is ever supported, it would be useful for this to have three modes: stateless (always run in a new environment); page stateful (maintain state across all cells within a particular page but not across pages); and book state (persist state across all pages in a book). In the latter two cases, a big button to “restart kernel” and reset the state would also be useful.
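To make the three modes a little more concrete, here is a toy model in plain Python. It is only a sketch of the idea: the class and method names are invented, and nothing here reflects how JupyterLite, thebe or any kernel actually manages state. Each "cell" is a string of code run with `exec` in a namespace chosen according to the mode.

```python
class CellRunner:
    """Toy model of state sharing for single executable cells."""

    MODES = {"stateless", "page", "book"}

    def __init__(self, mode="stateless"):
        if mode not in self.MODES:
            raise ValueError(f"Unknown mode: {mode}")
        self.mode = mode
        self._namespaces = {}

    def _namespace_for(self, page):
        if self.mode == "stateless":
            return {}  # a fresh environment for every run
        # One namespace per page, or a single one shared by the whole book
        key = page if self.mode == "page" else "__book__"
        return self._namespaces.setdefault(key, {})

    def run(self, page, source):
        """Execute source in the namespace implied by the mode."""
        namespace = self._namespace_for(page)
        exec(source, namespace)
        return namespace

    def restart(self):
        """The big 'restart kernel' button: throw away all persisted state."""
        self._namespaces.clear()


runner = CellRunner(mode="page")
runner.run("page1", "x = 1")
print(runner.run("page1", "y = x + 1")["y"])  # 2: state is shared within page1
```

In "page" mode, running `z = x` on a different page raises a `NameError`, whereas in "book" mode it would succeed.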

Fragment: Grabbing Screenshots of Jupyter Notebook Code Cell Outputs, Ish…

Or not completely, as the case may be…

A quick hack packaging code I was using for grabbing screenshots of styled pandas dataframes so I could share them as images, iframe-shot uses browser automation to render HTML returned via _repr_html_() or embedded in an IFrame when executing a Python code cell, or otherwise, and return an image file from it, either as a data URI or saved to a file.

from iframe_shot import IFrameShot

# Generate an object with access to
# preloaded selenium powered headless browser
grabber = IFrameShot(True)

# HTML string
html = "<html><body><h1>hello there</h1></body></html>"

# Render HTML in browser and grab screenshot
grabber.getHTMLPNG(html)

# Returns rendered data-uri PNG of screenshotted html
# To save as png and return filename, use:
# grabber.getHTMLPNG(html, embedded=False)

# Set html_out=FILEPATH to save the HTML to a file
# Set png_out=FILEPATH to save the image to a file with a specific filename

There are various issues with this:

  • if the style is not part of the HTML, but eg references style set elsewhere in the notebook, or from a style file, the style won’t be rendered;
  • the approach uses browser automation, which adds several large dependencies.

It would be interesting to explore the extent to which something like html2canvas could be used to render cell output HTML onto a canvas element from which an image could be saved. (Hmm… could IPython do that?!)

By chance, another screenshot tool appeared in the last week or so (from which I stole the -shot bit of the name): Simon Willison’s shot-scraper. The tool uses Playwright and is handy for four main reasons:

  • it provides an easy way to grab a screenshot of a page;
  • it can provide a screenshot of part of a page, selected using CSS selectors;
  • it can be used to style and add simple overlays to the captured scene using Javascript;
  • it can be used to scrape webpages using Javascript and provide the response via a JSON object.

I did wonder if I could use it to grab a screenshot of an executed Jupyter notebook output cell, or an output cell in an HTML rendered notebook, but I couldn’t offhand find a way to wrangle a cell ID or unique path to a desired cell output using just CSS selectors. If Javascript were available as a way of selecting DOM elements, and not just CSS selectors, then I think it should be possible to use shot-scraper to grab screen captures of notebook code cell outputs from run notebooks viewed either as rendered notebooks from a served URL, or from exported HTML.

POC: Open Jupyter Book Page in JupyterLite (“View executable source”, ish)

As JupyterLite starts to finagle its way into Jupyter Book, such as via jupyterlite-sphinx, in which a simple admonition can be used to embed an in-browser executable notebook or live JupyterLab environment:

or via thebe, to provide in-browser execution support for Jupyter Book executable cells (hopefully, soon… [issue, PR]), I thought I’d riff on the jupyterlite-sphinx approach, which drops a jupyterlite distribution in to a Jupyter Book distribution, and see if I could open a Jupyter Book page rendered from a notebook in a RetroLite (JupyterLite notebook) editor. And it seems that a crude proof-of-concept at least wasn’t that hard, cribbing from an earlier example of how to launch a notebook in Deepnote via a PR in executablebooks/sphinx-book-theme. (Things will be so much easier when this is pluggable…)

My attempt is currently at ouseful-PR/sphinx-book-theme/tree/launch-in-jupyterlite and works by adding a new menu option to the launch menu (which can be used to open the source version of the page in Binderhub, or via a specified JupyterHub) that will open the notebook in a retrolite notebook using the retrolite environment that has been added as part of the book distribution:

What this means is you can “View Executable Source” on the Jupyter book page and tinker with it in a JupyterLite notebook editor. At least at first. Because any edits you make to the notebook are:

  • saved to browser storage, so you can keep your changes;
  • not rendered back to the Jupyter book page.

What this means is that the first time you launch the book page into the notebook editor, it does represent the source of the original page, but thereafter any edits mean the two versions differ. If you edit the notebook and then at a later date launch from the book page again, you will see the latest, edited version of the notebook as saved to browser storage.

What this POC throws up, then, is some user issues:

  • it would be handy to know from the notebook page that the version being edited is different to the original version;
  • it would be handy to revert the edited notebook back to the original version;
  • it would be handy to know from the book page that an edited version of the notebook is available;
  • it would be handy to be able to reflow the book page based on the edited notebook version.
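The first and third of these issues come down to detecting whether the copy in browser storage has drifted from the distributed original. One possible sketch, working on notebooks already loaded as Python dicts (how they would be read from browser storage is left open), is to compare a hash of the cell sources. The function names are hypothetical.

```python
import hashlib
import json

def notebook_fingerprint(nb):
    """Hash the cell types and sources of a notebook, ignoring outputs and metadata."""
    cells = [
        # The source may be a string or a list of lines; join normalises both
        (cell.get("cell_type"), "".join(cell.get("source", "")))
        for cell in nb.get("cells", [])
    ]
    payload = json.dumps(cells, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def is_edited(original_nb, stored_nb):
    """True if the stored notebook's cells differ from the original's."""
    return notebook_fingerprint(original_nb) != notebook_fingerprint(stored_nb)
```

Ignoring outputs means that simply running the notebook does not count as an edit, which seems the more useful behaviour for a "this page has been modified" indicator.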

Playing with Hybrid Cell Exercise Blocks in Jupyter Book via sphinx-exercise

A few weeks ago, I started having a look again at Styled Exercises in Jupyter Book (but still not JupyterLab…). I had intended to include a description of sphinx-exercise in that post, but hit “publish” too quickly. Since that post, things have also moved on a bit in the sphinx-exercise package; but they seem settled now, so here’s a quick review of some the things you can get up to with sphinx-exercise rendered elements in Jupyter Book HTML publications.

The first thing to note is that you can have “question” and “answer” style exercise blocks.

The “question” style exercise blocks are used to define an activity or exercise.

The exercise block is defined using an {exercise} directive. This should include a unique :label: attribute that provides an identifier for the exercise, and might optionally include a :class: attribute to allow particular styling of the element. The class attribute can also call on “off-the-shelf” classes such as "dropdown". The directive can also include title text in the same way that an {admonition} block does.

Exercises are numbered by default and automatically generated based on a simple exercise count throughout the whole book; the :nonumber: flag attribute can be set to disable auto-numbering. Currently, I don’t think “chapterised” numbering is available, eg with the exercise numbers enumerated within a chapter and preceded by the chapter number.

At the current time, Jupyter Book admonitions do not support the nesting of code cells within an admonition block. This inability to embed executable code cells in the exercise admonition block rather limits its utility if you need to create an activity with some executable code as part of the setup. (That said, you might also argue that learners should manually copy and paste, or rekey, any code provided as part of an exercise into a code cell themselves.)

A solution to this is to use a new gated syntax. Currently, this only works with sphinx in the production of Jupyter Book content. It is not (yet?) an official part of the MyST syntax and it is not supported by JupyterLab-MyST.

The gated exercise admonition requires two admonitions to be used, one at the start of the gated area and one at the end:

```{exercise-start}
:label: example1
```
Add markdown, code cells, etc., as required here...

```{exercise-end}
```

Here’s how it looks in the rendered Jupyter Book:

Expanding the block shows the “nested” markdown and code cells, along with code cell output:

As with other referenceable types, exercises can be referenced and the reference text will be automatically generated. For example, we can generate references using {ref} or {numref} roles. Here’s some example source MyST markdown:

A simple reference to the exercise using a `{ref}` role, {ref}`ex1g`, or a more elaborate one using a `{numref}` role, {numref}`My custom {number} title and {name}  <ex1g>`

This renders as follows:

As well as defining exercises, we can also define (linked) solutions using a {solution} directive. This directive requires an exercise’s :label: identifier as an argument, rather than an optional title. Non-executable code can be embedded in a {solution} directive block in the normal way (the {solution} directive needs to be fenced by more backticks than any code fence blocks it contains), and may be collapsed by default by setting the :class: to dropdown, in the normal way.

An exercise solution can also be defined using gated directives, specifically {solution-start} EXERCISE_LABEL_ID and {solution-end}:

```{solution-start} exercise-test
:label: solution-gated-test
:class: dropdown
```

Example Solutions

Example code cell without tags and no space after fence:

```{code-cell} python3
# Code cell defined using:
# {code-cell} python3

print("another hello")
```

```{solution-end}
```

For the solution, the :label: unique identifier is optional.

The title to the solution block is derived from the title of the exercise it relates to:

The style of the solution block is rather simpler than that of the exercise block. However, an optional :class: element defined in the {solution-start} admonition (and which can be set to dropdown to collapse the solution by default), can be passed and thenceforth used to style the solution cell(s).

The gated admonitions are very powerful, and essentially allow you to wrap a sequence of markdown and code cell blocks within a referenceable div element in the output book HTML.
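Because a gated block relies on a matching pair of start and end directives, a forgotten end fence is an easy mistake to make. The following is a small, unofficial lint sketch (not part of sphinx-exercise) that scans MyST source text for unbalanced gates. It assumes the directives always open with at least three backticks at the start of a line, and it tolerates nesting, although whether nesting is actually supported is not something confirmed here.

```python
import re

# Matches a fence line opening a gated directive, eg {exercise-start} or {solution-end}
GATE = re.compile(r"^`{3,}\{(exercise|solution)-(start|end)\}")

def check_gates(text):
    """Return a list of problems found with gated exercise/solution directives."""
    problems = []
    open_gates = []  # stack of (kind, line number) for unclosed starts
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = GATE.match(line.strip())
        if not match:
            continue
        kind, edge = match.groups()
        if edge == "start":
            open_gates.append((kind, lineno))
        elif open_gates and open_gates[-1][0] == kind:
            open_gates.pop()  # this end closes the most recent start
        else:
            problems.append(f"line {lineno}: {kind}-end without matching start")
    problems.extend(
        f"line {lineno}: {kind}-start never closed" for kind, lineno in open_gates
    )
    return problems
```

An empty list means every start has a matching end of the same kind.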

At the current time, there is no mechanism for rendering the gated admonitions in JupyterLab or RetroLab. In addition, the jupyterlab-myst extension does not support the {exercise} directive.

Something that would be really useful in the short term would be a jupyterlab-myst-exercise JupyterLab extension that demonstrated a minimal solution to extending jupyterlab-myst to support the exercise and solution directives, but not the gated directives. Not only would this improve compatibility of exercise rendering across Jupyter Book and JupyterLab/RetroLab, it would also demonstrate how to extend jupyterlab-myst.

Something else I’m looking forward to is a separation of the “gated directive” support into a more abstract form (for example, a gated directive class that was then extended to provide the gated exercise and gated solution directives). A MyST enhancement proposal may be on the way regarding this, and is perhaps more likely now that there is an official MyST-specification available [repo].

Elsewhere, there is extension support for collapsible headings in JupyterLab/RetroLab, (assuming it hasn’t rotted?!), so that could perhaps also be cribbed as a way of providing grouped styling and collapsible display between hidden styled exercise or solution start and end blocks (though personally, I’d prefer exercise-start and exercise-end etc. tagged cells to identify gated fences. These could either apply to the first/last cell in the exercise (or solution for gated solutions) or could be otherwise empty markdown cells containing just the appropriate tags).
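As a sketch of how the tagged-cell idea might work, the following groups the cells of a notebook (as plain dicts in the usual ipynb layout, with tags under `metadata.tags`) into gated runs delimited by `exercise-start` and `exercise-end` tags. The tag names follow the suggestion above; the function and the shape of its output are invented for illustration, and a real extension would then need to wrap each gated group in a styled container.

```python
def group_gated_cells(cells, kind="exercise"):
    """Wrap each tagged run of cells in a gated group; other cells stay as singletons."""
    start_tag, end_tag = f"{kind}-start", f"{kind}-end"
    groups, current = [], None
    for cell in cells:
        tags = cell.get("metadata", {}).get("tags", [])
        if start_tag in tags and current is None:
            current = []  # open a new gated group
        if current is None:
            groups.append({"gated": False, "cells": [cell]})
        else:
            current.append(cell)
            if end_tag in tags:
                groups.append({"gated": True, "cells": current})
                current = None
    if current is not None:
        # Unclosed group: keep the cells, but leave them ungated
        groups.extend({"gated": False, "cells": [c]} for c in current)
    return groups
```

A single cell carrying both tags forms a one-cell gated group, which covers the "first/last cell" variant as well as the empty marker cell variant.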

My Personal Blockers to Adopting JupyterLite for Distance and Open Educational Use

When I first came across JupyterLite nine months or so ago (jupyterlite — “serverless” Jupyter In the Browser Using Pyodide and WASM), one of my first thoughts was whether I could use it as the programming environment for an open online course / OER that makes use of Jupyter notebooks.

Working with novices, at scale, at a distance, online, and ideally without support raises various support challenges. Trying to create materials that can run anywhere – via an open notebook server (eg Binderhub), via a local install, or even via JupyterLite raises other issues: ideally, you want exactly the same notebook to work in exactly the same way wherever it’s being run. Other issues come from learners losing their work, working from different machines and browsers at different times, working offline (no network access), etc. etc.

At the time, there were several blockers for me when it comes to adopting JupyterLite as just another environment, blockers that are still present today. So as JupyterLite is brought to wider attention via a recent post on the official Jupyter blog – Jupyter Everywhere – and associated social media sharing, this is just a note-to-self as to why I still haven’t got round to updating the OpenLearn Learn to Code for Data Analysis course to use JupyterLite.

Note that this isn’t intended as a criticism of the JupyterLite devs or dev process. There may well be solutions or workarounds that I haven’t come across. I’m just making observations as an everyman who thinks “ooh, I could use it for this” and then realises I can’t, quite, which means I can’t, at all (blocker, innit!;-). I should also state that my default use case is an extreme one: large populations of naive learners & novice programmers working largely unsupported, online and offline, potentially across several different BYOD or public access machines that may be unpatched & years old, on courses that are expected to remain largely unmaintained for several years following publication.

The mechanics of JupyterLite are beyond me – not just the JupyterLab-ness but also the WASM / pyodide / pyolite-ness. And then there’s things like browser local storage, local forage(?) and potential links to local file system via a browser file system API. So some of the following may be hard, some may be impossible (at the moment, or dependent on upstream things…).

To set the scene, when you open JupyterLab or RetroLab homepage, you see a list of files and notebooks that are part of the JupyterLite distribution. You can use UI controls to upload additional files and see them in the file listing. Notebooks and files can be opened by clicking on them in the normal way. If you edit a file, the changes are saved to browser storage. (I’m not sure if there’s an “official” way to reset a notebook back to the original version as represented by the version served as part of the original distribution, rather than the edited version in browser storage? That probably should be my first “is this a blocker?”)

So what are my (other) blockers, presented here as questions just in case they already have solutions (please feel free to post answers via the comments…):

  • how do I reset a modified notebook saved in browser storage to the original version served as part of the jupyterlite distribution [A: deleting a file in the JupyterLab file browser deletes it from browser storage; if the file was part of the original distribution, the file remains in the file browser and is reset to the originally distributed version];
  • how do I add additional Python packages to a JupyterLite distribution (ideally, I’d just specify a requirements.txt file); [A: install the files into the environment that is used to generate the release; example]
  • how do I open and read a file programmatically (eg how do I open a data file, or connect to a sqlite database file)? There is an unofficial solution in a discussion thread, but this seems brittle to me and on occasion appears to break. It would be useful if there were an official, min. viable function that also forms part of the release test suite. I wrapped the unofficial solution in a simple utils package but if it is subject to breaks, then it’s not generally usable in published teaching materials unless the jupyterlite version can be guaranteed to be one in which the tricks work; [A: this looks like it will be sorted as far as file read/writes go via jupyterlite/pull/655]
  • how do I write a file that then appears in the file view (eg saving a data file, or writing to a browser storage persisted sqlite database file; or reading an ipynb file from a remote URL, saving it as a file, then generating a URL that will open that notebook from local storage in eg RetroLite via a path= URL parameter); [A: this looks like it will be sorted as far as file read/writes go via jupyterlite/pull/655]
  • how do I retrieve data from a remote URL in a platform independent way (there are tricks / pyolite functions for reading files from URLs but these require pyodide or js package calls; ideally, I’d just use requests and it would figure out how to handle the transport; in the short term, see eg Making the Python requests module work in Pyodide / bartbroere/pyodide-requests);
  • how do I avoid async await requirements on function calls (some pyolite function calls that can be used to mock non-WASM executed Python functions are asynchronous and require an await prefix; this makes it tricky to write code that runs anywhere; is there a way to mask the await requirement and wrap asynchronous calls in a non-async function?) [tracking?: pyodide/pyodide/issues/1503] There is also a related issue around things like time.sleep() [tracked here: pyodide/pyodide/issues/2354]
  • how do I synch with my desktop filesystem (eg synch browser storage and local storage, or run jupyterlite against the desktop filesystem rather than browser storage; at the moment, this requires file upload / download; presumably I can access the browser storage db from my desktop commandline?); [A: this is supported by jupyterlab-contrib/jupyterlab-filesystem-access]
  • how do I synch with remote synching drives (eg Dropbox, OneDrive, GoogleDrive etc. etc.); [tracking: jupyterlite/jupyterlite/issues/315]
  • how do I download a file programmatically (eg by creating a blob that can be downloaded from an auto-clicked link);
  • how do I open a remote notebook, e.g. in RetroLite (for example: https://jupyter.org/try-jupyter/retro/notebooks/?path=https://raw.githubusercontent.com/jupyterlite/jupyterlite/main/examples/python.ipynb (which does not currently work);
  • how do I install Python packages programmatically in a cross-platform way (currently, packages can be installed via notebooks using micropip; it would be more convenient to mask this via some %pip magic; see related issue).
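On the async/await question in the list above, the following sketch does not mask the `await`. What it does offer is a way of normalising on it, so the same calling code works whether the underlying function is synchronous (regular Python) or asynchronous (a pyolite style helper). It relies on top-level `await` being available in the notebook, as it is in the download example earlier; all the function names are stand-ins.

```python
import asyncio
import inspect

async def call_anywhere(func, *args, **kwargs):
    """Call func and await the result only if it turns out to be awaitable."""
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result

# Two stand-in loaders: one synchronous, one asynchronous
def load_sync(name):
    return f"loaded {name}"

async def load_async(name):
    await asyncio.sleep(0)  # pretend to wait on the browser
    return f"loaded {name}"

async def demo():
    # The calling code is identical for both kinds of loader
    return [
        await call_anywhere(load_sync, "a"),
        await call_anywhere(load_async, "b"),
    ]
```

In a notebook this would be run as `await demo()`; in a plain script, `asyncio.run(demo())` does the same job. The cost is that every call site becomes async, which is the opposite of what the question asks for, but it does give one notebook that runs in both places.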

To create platform agnostic notebooks, it might be that notebooks need to have a guarded cell that makes decisions about what package or workaround to load if the wasm platform is detected (eg via import platform as p; p.platform() etc.; test the platform and import packages as required, either via an if or via a try).
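A guarded cell along those lines might look something like the sketch below. It assumes that Pyodide reports `sys.platform` as `"emscripten"` and that a `pyodide.http` module is available there; both are worth checking against the Pyodide version in use. The helper names are made up.

```python
import importlib
import sys

def running_in_wasm():
    """Best-effort check: Pyodide is assumed to report sys.platform as 'emscripten'."""
    return sys.platform == "emscripten"

def first_available(*module_names):
    """Return the first module that imports successfully, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

# Guarded cell: pick a transport module to suit the platform
if running_in_wasm():
    http = first_available("pyodide.http")
else:
    http = first_available("requests", "urllib.request")

print(running_in_wasm(), getattr(http, "__name__", None))
```

The `if` test handles the platform decision and the `try` inside `first_available` handles missing packages, so the two approaches mentioned above end up working together rather than as alternatives.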

In passing, there are also various other things that would open up new opportunities; perhaps greatest amongst these are support for single executable cells, and for running code via pyolite kernels in Jupyter Book using thebe (tracking issue). But I also wonder: would it be possible to use pyolite to run as a part of a kernel gateway (eg Building a JSON API Using Jupyter Notebooks in Under 5 Minutes) to support serverless functions?

Fragment: On the Value of Traditional Indexes in Full Text Search Environments

Over the last few weeks, I’ve been tinkering with various recipes for pulling searchable text content out of the Internet Archive and popping it into a full text searchable database.

One of my first sketches has used 19th century editions of Notes & Queries. As well as the weekly “content” issues, N & Q also published two index volumes a year detailing the entries of the preceding volume.

Through starting trying to compile sensible index entries for my sin-eater unbook (still a work in progress, particularly the index) using the sphinx/Jupyter Book indexing features, I have a new found respect for the compilers of indexes: there’s a real craft to it.

At first glance, you might think there is limited utility in having an index as well as full text search support, but there are at least two reasons why that’s not correct.

The first is navigational: the index provides both a way of identifying search terms as well as helping understand the pattern of occurrences of a particular term.

The second is because full-text search using text extracted from a large number of scans using OCR really sucks. Even with good stemming etc on full text search terms, even with fuzzy search tools, getting a match on a search term can, at times, be tricky.

So to supplement my full text search over N&Q, I am topping it up with a search into the index that also tries to identify pages directly from related index entries. (The use of the index is also a handy cross-check that the free text search has turned up at least the results included in the originally compiled index.)
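As a toy illustration of the topping-up idea, the sketch below searches both the page text and a table of index entries, merges the page hits, and reports the pages that only the index found. The table layout and the three rows of data are invented for the example (including a simulated OCR error), and a plain `LIKE` match stands in for a proper full text search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (page INTEGER PRIMARY KEY, ocr_text TEXT);
CREATE TABLE index_entries (term TEXT, page INTEGER);
""")

# Made-up data: page 102 contains a simulated OCR error ("eatcr")
conn.executemany("INSERT INTO pages VALUES (?, ?)", [
    (101, "A note on the sin-eater custom in Wales"),
    (102, "The sin-eatcr was paid sixpence"),
    (103, "Queries on parish registers"),
])
conn.executemany("INSERT INTO index_entries VALUES (?, ?)", [
    ("sin-eater", 101), ("sin-eater", 102), ("parish registers", 103),
])

def search(term):
    """Combine text hits with index hits, and flag pages the text search missed."""
    pattern = f"%{term}%"
    text_hits = {row[0] for row in conn.execute(
        "SELECT page FROM pages WHERE ocr_text LIKE ?", (pattern,))}
    index_hits = {row[0] for row in conn.execute(
        "SELECT page FROM index_entries WHERE term LIKE ?", (pattern,))}
    return {
        "pages": sorted(text_hits | index_hits),
        "missed_by_text_search": sorted(index_hits - text_hits),
    }

print(search("sin-eater"))
# {'pages': [101, 102], 'missed_by_text_search': [102]}
```

Here the index rescues page 102, where the OCR error defeats the text match, and the `missed_by_text_search` list doubles as the cross-check against the compiled index.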

In passing, I also note the power of the internal cross-referencing scheme used across items appearing in N&Q…