jupyter – OUseful.Info, the blog…

Updating the Empinken Extension from JupyterLab 3.x to 4.x

Just over a year ago, I posted a series of posts (starting here) where I finally got round to hacking together a JupyterLab (v3.x) version of OUr empinken notebook extension that lets you colour individual Jupyter notebook cells from a notebook toolbar button. I’m increasingly not a developer, even more so than I never was, so when the extension inevitably failed to work in JupyterLab v4, I was faced with having to repeat the painful experience.

Anyway…

Here are some fragmentary notes on getting a version of the extension working in v4. At the moment, user settings are not supported. Hopefully, that will be a simple migration of code from the v3.x extension. Hopefully…

Repo branch relating to the below is here. Install as: pip install https://github.com/innovationOUtside/jupyterlab_empinken_extension/raw/jlv4/dist/jupyterlab_empinken_extension-0.4.0-py3-none-any.whl

Adding Buttons to Notebook Toolbars

A toolbar-button example in the jupyterlab/extension-examples repo “shows” how to add buttons to the toolbar, for some definition of show. The README tells you where you can find an “example” of how to attach a custom defined command to a button, but to see how to add the simple base button, you need to try to figure out what files to edit, and how…

From what I could tell, there is a two step process.

In the schema/plugin.json file, add JSON definitions for what the buttons are and what commands they call:

{
  "jupyter.lab.toolbars": {
    "Notebook": [
      {
        "name": "empinken-activity",
        "command": "ouseful-empinken:activity",
        "label": "A"
      },
      {
        "name": "empinken-learner",
        "command": "ouseful-empinken:learner",
        "label": "L"
      },
      {
        "name": "empinken-solution",
        "command": "ouseful-empinken:solution",
        "label": "S"
      },
      {
        "name": "empinken-tutor",
        "command": "ouseful-empinken:tutor",
        "label": "T"
      }
    ]
  },
  "title": "jupyterlab_empinken_extension",
  "description": "jupyterlab_empinken_extension settings.",
  "type": "object",
  "properties": {},
  "additionalProperties": false
}

I think there may also be a priority value, or similar, for controlling where in the toolbar the button is placed. I’m sure I saw an example somewhere for how to register and use an SVG image for the button, but I forget where, and I’m saving that actual battle for another day (or not; the letters work ok-ish, and I have better things to do…)

The jupyterlab_code_formatter uses an alternative approach, creating a ToolbarButton explicitly, as I did in the original empinken extension. So maybe there was an easier migration route. Whatever. A ToolbarButton example looks like it may be on the way…

I’m not sure about the other settings in the schema/plugin.json file. Things are working atm, and I don’t have the will to explore what happens if I remove them.

We define the command used by the button in the src/index.ts file. The id is built up from the same root as the title in the schema/plugin.json file; I don’t know if that’s a requirement? (I have so many possibly superstitious, possibly relevant beliefs about trying to wrangle anything to do with JupyterLab… And zero understanding. For example, is the :plugin part of the id a convention or a meaningful requirement?)

import {
  JupyterFrontEnd,
  JupyterFrontEndPlugin
} from '@jupyterlab/application';

/**
 * The plugin registration information.
 */

const plugin: JupyterFrontEndPlugin<void> = {
  id: 'jupyterlab_empinken_extension:plugin',
  description:
    'A JupyterLab extension adding a button to the Notebook toolbar.',
  requires: [],
  autoStart: true,
  activate: (app: JupyterFrontEnd) => {
    const { commands } = app;

    const command = 'ouseful-empinken:activity';

    // Add commands
    commands.addCommand(command, {
      label: 'Execute ouseful-empinken:activity Command',
      caption: 'Execute ouseful-empinken:activity Command',
      execute: () => {
        console.log(`ouseful-empinken:activity has been called.`);
      }
    });
  }
};


/**
 * Export the plugin as default.
 */
export default plugin;

So I can generate buttons. The buttons are there to do two things relative to a selected code cell:

toggle a tag in the cell metadata;
add a class to the notebook cell HTML.

Strictly speaking, the second requirement, adding the class, should probably be triggered from a signal that a cell’s metadata has been updated. That would mean if an appropriate tag is added to a cell by other means, such as the JupyterLab cell tag editor, the custom styling would follow. But I don’t know if such a signal exists. So I’m piling all the concerns together…

Because the same command, essentially, is applied with a few parameters for each button, I create a generic function then customise its application in particular commands:

const empinken_tags: string[] = ["activity", "learner", "solution", "tutor"];

const createEmpinkenCommand = (label: string, type: string) => {
      //this works wrt metadata
      const caption = `Execute empinken ${type} Command`;
      return {
        label,
        caption,
        execute: () => {
          // const, so the null check below still holds inside the forEach callback
          const activeCell = notebookTracker.activeCell;
          //console.log(label, type, caption)
          //console.log(activeCell)
          const nodeclass = 'iou-' + type + "-node";
          if ( activeCell !== null) {
            let tagList = activeCell.model.getMetadata("tags") as string[] ?? [];
            //console.log("cell metadata was", tagList, "; checking for", type);
            if (tagList.includes(type)) {
              // ...then remove it
              const index = tagList.indexOf(type, 0);
              if (index > -1) {
                tagList.splice(index, 1);
              }
              activeCell.model.setMetadata("tags", tagList)
              // Remove class
              activeCell.node.classList.remove(nodeclass)
              // cell.node.classList exists
            } else {
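              // remove other empinken tags
              // (removeListMembers is a helper not shown here; it is assumed to
              // return tagList with any members of empinken_tags filtered out)
              tagList = removeListMembers(empinken_tags, tagList)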
              empinken_tags.forEach((tag:string) => {
                activeCell.node.classList.remove('iou-' + tag + "-node")
              })
              // add required tag
              tagList.push(type)
              activeCell.model.setMetadata("tags", tagList)
              activeCell.node.classList.add(nodeclass)
            }
            //console.log("cell metadata now is", tagList);
          }
        }
      };
    };
    
    // Generate the commands that are registered for each button
    empinken_tags.forEach((tag:string) => {
      commands.addCommand('ouseful-empinken:'+tag,
        createEmpinkenCommand(tag.charAt(0).toUpperCase(),
        tag));
    })

The notebookTracker is requested in the plugin definition and passed in when the plugin is activated:

import { INotebookTracker } from '@jupyterlab/notebook';

const plugin: JupyterFrontEndPlugin<void> = {
  id: 'jupyterlab_empinken_extension:plugin',
  description:
    'A JupyterLab extension adding a button to the Notebook toolbar.',
  requires: [INotebookTracker],
  autoStart: true,
  activate: (app: JupyterFrontEnd, notebookTracker: INotebookTracker) => {

...

As it stands, the app now publishes buttons to the notebook toolbar. Clicking a button toggles notebook metadata tag state and HTML style on a selected cell.

The next thing is to style cells when a notebook is rendered. In JupyterLab 3.5, a fullyrendered signal fired that could be used to detect a rendered notebook and pass the notebook reference. The original extension then added another pass of styling to colour appropriately tagged cells. The fullyrendered signal isn’t in 4.x, perhaps because notebooks are more intelligently rendered now, but I couldn’t find a signal that says when a notebook has been opened.

A call for help on the Jupyter discourse group about how to grab a list of open notebook references and then iterate through the cells brought a suggestion from Luke Anger to use the labShell.currentChanged signal:

import { LabShell } from '@jupyterlab/application';

...

    const labShell = app.shell as LabShell;
    labShell.currentChanged.connect(() => {
      const notebook = app.shell.currentWidget as NotebookPanel;
      const cellList = notebook.content.model.cells;
      ...
    });

I’d previously tried using an event on the notebookTracker, but had given up because I couldn’t seem to get a list of anything other than the first cell:

notebookTracker.currentChanged.connect((tracker, panel) => {
    if (!panel) {
      return;
    }
...

I had the same issue following Luke’s suggestion, but he then suggested it might be a race condition, and to wait a while:

    const labShell = app.shell as LabShell;
    labShell.currentChanged.connect(() => {
      const notebook = app.shell.currentWidget as NotebookPanel;
      if (notebook) {
        notebook.revealed.then(() => {

...

Luke had also suggested grabbing the list of cells from the notebook as notebook.content.model.cells. This does give a list of cells, but no access to the node object on a cell, so instead I went in via the rendered cells (notebook.content.widgets), which still give access to the cell metadata. (I guess at this point I should be thankful that changes to the cell widget metadata also seem to be mirrored on the cell model.)

    const labShell = app.shell as LabShell;
    labShell.currentChanged.connect(() => {
      const notebook = app.shell.currentWidget as NotebookPanel;
      if (notebook) {
        notebook.revealed.then(() => {
          notebook.content.widgets?.forEach(cell=>{
            const tagList = cell.model.getMetadata('tags') ?? [];
            //console.log("cell metadata",tagList)
            tagList.forEach((tag:string) => {
              if (empinken_tags.includes(tag)) {
                //console.log("hit",tag)
                cell.node?.classList.add('iou-'+tag+'-node');
              }
            })
          })
        })
      }
    });

Thus far, it all seems to work okay in JupyterLab 4.x (and breaks in JupyterLab 3.x); but that’s probably okay, and I just have to remember different versions of the extension are required for JupyterLab 3/4.

Still to do: I’d like to add user settings back in, so users can change the colour applied to each tag. Hopefully, this is just a straightforward migration of the 3.x settings code. It’d be nice to be able to colour cells if appropriate tags are added by other means, and it’d be nice to have icons in the buttons. But I found it painful enough getting as far as the above, and I don’t really want to make myself tired-frustrated-and-angry again…

PS for an example of manually creating a button, see https://github.com/kondratyevd/toolbar-button

Search Assist With ChatGPT

Via my feeds, a tweet from @john_lam:

The tools for prototyping ideas are SO GOOD right now. This afternoon, I made a “citations needed” bot for automatically adding citations to the stuff that ChatGPT makes up

https://twitter.com/john_lam/status/1614778632794443776

A corresponding gist is here.

Having spent a few minutes prior to that doing a “traditional” search using good old fashioned search terms and the Google Scholar search engine to try to find out how defendants in English trials of the early 19th century could challenge jurors (Brown, R. Blake. “Challenges for Cause, Stand-Asides, and Peremptory Challenges in the Nineteenth Century.” Osgoode Hall Law Journal 38.3 (2000): 453-494, http://digitalcommons.osgoode.yorku.ca/ohlj/vol38/iss3/3 looks relevant), I wondered whether ChatGPT, and John Lam’s search assist, might have been able to support the process:

Firstly, can ChatGPT help answer the question directly?

Secondly, can ChatGPT provide some search queries to help track down references?

The original rationale for the JSON based response was so that this could be used as part of an automated citation generator.

So this gives us a pattern of: write a prompt, get a response, request search queries relating to key points in response.

Suppose, however, that you have a set of documents on a topic and that you would like to be able to ask questions around them using something like ChatGPT. I note that Simon Willison has just posted a recipe on this topic — How to implement Q&A against your documentation with GPT3, embeddings and Datasette — that independently takes a similar approach to a recipe described in OpenAI’s cookbook: Question Answering using Embeddings.

The recipe begins with a semantic search of a set of papers. This is done by generating an embedding for each of the documents you want to search over using the OpenAI embeddings API, though we could roll our own that runs locally, albeit with a smaller model. (For example, here’s a recipe for a simple doc2vec powered semantic search.) To perform a semantic search, you find the embedding of the search query and then find nearby embeddings generated from your source documents to provide the results. To speed up this part of the process in datasette, Simon created the datasette-faiss plugin to use FAISS.

The content of the discovered documents is then used to seed a ChatGPT prompt with some “context”, and the question is applied to that context. So the recipe is something like: use a query to find some relevant documents, grab the content of those documents as context, then create a ChatGPT prompt of the form “given {context}, and this question: {question}”.

It shouldn’t be too difficult to hack together a thing that runs this pattern against OU-XML materials. In other words:

generate simple text docs from OU-XML (I have scrappy recipes for this already);
build a semantic search engine around those docs (useful anyway, and I can reuse my doc2vec thing);
build a ChatGPT query around a contextualised query, where the context is pulled from the semantic search results. (I wonder, has anyone built a ChatGPT like thing around an open source GPT2 model?)
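The search-then-prompt pattern above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the embed function is a bag-of-words word count standing in for a real embedding model (OpenAI embeddings, doc2vec, or similar), the three documents are made up for illustration, and the function names are hypothetical rather than taken from either recipe.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, k=2):
    """Return the k documents whose embeddings are nearest the query embedding."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(question, docs, k=2):
    """Seed a prompt with the top-k search results as context."""
    context = "\n\n".join(search(question, docs, k))
    return f"Given the following context:\n\n{context}\n\nAnswer this question: {question}"


# Made-up example documents (illustrative only)
docs = [
    "Peremptory challenges let a defendant reject a juror without giving a reason.",
    "A spectrogram plots frequency against time.",
    "A challenge for cause requires the defendant to show a juror is biased.",
]
question = "How could a defendant challenge a juror?"
prompt = build_prompt(question, docs)
print(prompt)
```

Swapping in a proper embedding model should only mean replacing embed (and probably cosine with a vector index such as FAISS); the prompt building step stays the same.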

PS another source of data / facts is data tables. There are various packages out there that claim to provide natural language query support for interrogating tabular data, eg abhijithneilabraham/tableQA, and this review article, or the Hugging Face table-question-answering transformer, but I forget which I’ve played with. Maybe I should write a new RallyDataJunkie unbook that demonstrates those sorts of tool around tabulated rally results data?

Finding the Path to a Jupyter Notebook Server Start Directory… Or maybe not…

For the notebook search engine I’ve been tinkering with, I want to be able to index notebooks rooted on the same directory path as the notebook server that the search engine is added to as a Jupyter server proxy extension. There doesn’t seem to be a reliably set or accessible environment variable containing this path, so how can we create one?

Here’s a recipe that I think may help: it uses the nbclient package to run a minimal notebook that just executes a simple, single %pwd command against the available Jupyter server.

import nbformat
from nbclient import NotebookClient

_nb =  '''{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pwd"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}'''

nb = nbformat.reads(_nb, as_version=nbformat.NO_CONVERT)

client = NotebookClient(nb, timeout=600)
# Available parameters include:
# kernel_name='python3'
# resources={'metadata': {'path': 'notebooks/'}})

client.execute()

path = nb['cells'][0]['outputs'][0]['data']['text/plain'].strip("'").strip('"')

Or maybe it doesn’t? Maybe it actually just runs in the directory you run the script from, in which case it’s just a labyrinthine pwd… Hmmm…
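The suspicion is probably right: nbclient starts a kernel of its own, whose working directory defaults to wherever the script is run from, rather than attaching to an already running server. One possible alternative, sketched below, is to ask jupyter_server for the servers it knows are running and read the root_dir each one reports. The find_root_dir helper and its "deepest root containing this path" rule are my own assumptions, and I haven’t checked how this behaves from inside a server proxy process.

```python
import os


def running_servers():
    """Info dicts for running Jupyter servers ([] if jupyter_server is not installed)."""
    try:
        from jupyter_server.serverapp import list_running_servers
    except ImportError:
        return []
    return list(list_running_servers())


def find_root_dir(servers, path):
    """Return the root_dir of the server whose root directory contains `path`.

    If several roots contain the path, the deepest one wins (an assumption).
    """
    path = os.path.abspath(path)
    matches = []
    for server in servers:
        root = server.get("root_dir")
        if not root:
            continue
        root = os.path.abspath(root)
        # A root contains the path if it is the common ancestor of both
        if os.path.commonpath([root, path]) == root:
            matches.append(root)
    return max(matches, key=len) if matches else None
```

Calling find_root_dir(running_servers(), os.getcwd()) then gives a candidate start directory, or None if no running server claims the current directory.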

Sketching a datasette powered Jupyter Notebook Search Engine: nbsearch

Every so often, I’ve pondered the question of "notebook search": how can we easily support searches over Jupyter notebooks? I don’t really understand why this area seems so underserved, especially given the explosion in the number of notebooks and the way in which notebooks are increasingly used as a document for writing technical documentation, tutorial and instructional material.

One approach I have seen as a workaround is to produce an HTML site from a set of notebooks using something like nbsphinx or Jupyter Book simply to generate access to an inbuilt search engine. But that somehow feels redundant to me. The HTML Jupyter book form is not a collection of notebooks, nor does it provide a satisfying search environment. To access runnable notebooks you need to click through to open the notebook in another environment (for example, a MyBinder environment built from a repository of notebooks that created the HTML pages), or return to the HTML environment and run code cells inline using something like Thebelab.

So I finally got round to considering this whole question again in the form of a quick sketch to see what an integrated Jupyter notebook server search engine might feel like. It’s still early days: the nbsearch tool is provided as a Jupyter server proxy application, rather than integrated as a Jupyter server extension available via an integrated tab, but that does mean it also works in a standalone mode.

The search engine is built on top of a SQLite database, served using datasette. The base UI was stolen wholesale from Simon Willison’s Fast Autocomplete Search for Your Website demo.

The repo is currently here.

The search index is currently based on a full text search index of notebook code and markdown cells. (At the moment, you have to manually generate the index from a command line command. On the to do list for another sketch is an indexer that monitors the file system.) Cells are returned in a cell-type sensitive way:

Code cells are syntax highlighted using Prism.js, and feature a Copy button for copying the (unstyled) code (clipboard.js). Markdown cells are styled using a simple Javascript markdown parser (marked.js).
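As a rough illustration of the sort of full text index described above, here is a minimal sketch using Python’s sqlite3 module and the SQLite FTS5 extension. It is not the nbsearch code: the table name, column names and helper functions are hypothetical, and it assumes your SQLite build includes FTS5.

```python
import json
import sqlite3


def cell_rows(nb_path, nb_json):
    """Yield (notebook, cell_index, cell_type, source) for code and markdown cells."""
    nb = json.loads(nb_json)
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") not in ("code", "markdown"):
            continue
        src = cell.get("source", "")
        # nbformat allows source to be a string or a list of lines
        text = "".join(src) if isinstance(src, list) else src
        yield nb_path, i, cell["cell_type"], text


def build_index(db, notebooks):
    """Index a {path: notebook JSON string} mapping; only `source` is searchable."""
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS cells USING fts5("
        "notebook UNINDEXED, cell_index UNINDEXED, cell_type UNINDEXED, source)"
    )
    for path, nb_json in notebooks.items():
        db.executemany("INSERT INTO cells VALUES (?, ?, ?, ?)", cell_rows(path, nb_json))
    db.commit()


def search(db, query, limit=10):
    """Return matching cells, best match first."""
    sql = (
        "SELECT notebook, cell_index, cell_type, source FROM cells "
        "WHERE cells MATCH ? ORDER BY rank LIMIT ?"
    )
    return db.execute(sql, (query, limit)).fetchall()


demo = {"demo.ipynb": json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Plotting\n", "Draw a chart"]},
    {"cell_type": "code", "source": "import pandas as pd"},
]})}
db = sqlite3.connect(":memory:")
build_index(db, demo)
rows = search(db, "pandas")
print(rows)
```

Because the cell type is stored alongside the source, the front end can decide per result whether to hand the text to a syntax highlighter or a markdown renderer, and datasette can serve the same table directly.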

The code cells should also have line numbers but this seems a little erratic at the moment; I can’t get local static js and css files to load properly under the Jupyter server proxy at the moment, so I’m using a CDN. The prism.js line number extension is a separate CDN delivered script to the main Prism script, and it seems that the line number extension doesn’t necessarily load correctly? A race condition maybe?

Each result item displays a link to the original notebook (although this doesn’t necessarily resolve correctly at the moment), and a description of which cell in the notebook the result corresponds to. An inline graphic depicts the structure of the notebook (markdown cells are blue, and code cells pink). Clicking the graphic toggles the display (show / hide) of that results cell group.

The contents of a cell are limited in terms of number of characters displayed. Clicking the Show all cell button displays the full range of content. Two other buttons, Show previous cell and Show next cell, allow you to repeatedly grab additional cells that surround the originally retrieved results cell.

I’ve also started experimenting with Thebelab code execution support. At the moment this is hardwired to use a MyBinder backend, but the intention is that if a local Jupyter server is available (eg as in the case when running nbsearch as a Jupyter server proxy application), it will use the local Jupyter server. (Ideally, it would also ensure the correct kernel is selected for any given notebook result.)

nbsearch UI with ThebeLab code execution example.

At the moment, things don’t work completely properly with Thebelab. If you run a query, and "activate" Thebelab in the normal way, things work fine. But when I dynamically add new cells, they aren’t activated.

If I try to manually activate them via a cell-centric button:

then the run/restart buttons appear, but trying to run the cell just hangs on the "Waiting for kernel…" message.

At the moment, the code cell is non-editable, but making it editable should just be a case of tweaking the code cell attributes.

There are lots of other issues to consider regarding cell execution, such as when a cell requires other cells to have run previously. This could be managed by running another query to grab all the previous code cells associated with a particular code cell, and running those cells on a restarted kernel using Thebelab before running the current cell.
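Grabbing the precursor cells is simple enough if we go back to the notebook file rather than the search index. A minimal sketch, assuming the notebook is available as a JSON string and that "previous" just means every code cell earlier in the notebook (the function name is hypothetical, and nothing smart is done about which cells are actually dependencies):

```python
import json


def precursor_code(nb_json, cell_index):
    """Return sources of all code cells up to and including cell_index, in order."""
    cells = json.loads(nb_json).get("cells", [])
    sources = []
    for cell in cells[: cell_index + 1]:
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # source may be a single string or a list of lines
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


nb_json = json.dumps({"cells": [
    {"cell_type": "code", "source": "import math"},
    {"cell_type": "markdown", "source": "Some notes"},
    {"cell_type": "code", "source": ["r = 2\n", "area = math.pi * r ** 2"]},
    {"cell_type": "code", "source": "print(area)"},
]})
print("\n".join(precursor_code(nb_json, 2)))
```

The returned list could then be replayed, in order, against a fresh kernel before the cell of interest is run.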

Providing an option to grab and display (and even copy) all the previous code in a notebook, or perhaps explore the gather package for finding precursor cells, might be a useful facility anyway, even without the ability to execute the code directly.

At the moment, results are limited to the first ten. This needs tweaking, perhaps with a slider ranged to the total number of results for a particular query and then letting you slide to select how many of them you want to display.

A switch to limit results to just code or just markdown cells might also be useful, as would an indicator somewhere that shows the grouped number of hits per notebook, perhaps with selection of this group acting as a facet: selecting a particular notebook would then limit cell results to just that notebook, perhaps grouping and ordering cells within a notebook by cell order.

The ranking algorithm is something else that may be worth exploring more generally. One simple ranking tweak that may be useful in an educational setting could be to order results by notebook and cell order (for example, if notebooks are named according to some numbering convention: 01.1 Introduction to X, 01.2 X in more detail, 02.1 etc). Again, Simon Willison has led the way in some of the practicalities associated with exploring custom ranking schemes in his post Exploring search relevance algorithms with SQLite.
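The "order by notebook and cell order" tweak could be applied after the search results come back. Here is a small sketch, assuming each hit is a (notebook name, cell index) pair and that notebook names start with a dotted number; the order_key name and the decision to sort unnumbered notebooks last are my own choices.

```python
import re


def order_key(hit):
    """Sort key: leading dotted number of the notebook name, then name, then cell index."""
    notebook, cell_index = hit[0], hit[1]
    m = re.match(r"\s*(\d+(?:\.\d+)*)", notebook)
    # "01.2 ..." becomes (1, 2); names without a leading number sort last
    nums = tuple(int(n) for n in m.group(1).split(".")) if m else (float("inf"),)
    return (nums, notebook, cell_index)


hits = [
    ("02.1 Next steps.ipynb", 0),
    ("01.2 X in more detail.ipynb", 3),
    ("01.1 Introduction to X.ipynb", 5),
    ("01.2 X in more detail.ipynb", 1),
]
ordered = sorted(hits, key=order_key)
print(ordered)
```

Comparing the numbers as integers rather than strings means that 2.1 sorts before 10.1 even if the leading zeros are missing.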

Way back when, when I originally started blogging, search was one of my favourite topics. I’ve neglected it over the years, but still think it has a lot to offer as a teaching and learning tool (eg things like Search Engine Powered Courses… and search hubs / discovered custom search engines). Many educators disagree with this approach because they like to think they are in control of the narrative, whereas I think that search, with a properly tuned ranking algorithm, can help support a student demand led, query result constructed, personalised structured narrative. Maybe it’s time for me to start playing with these ideas again…

First Foray into the Reclaim Cloud (Beta) – Running a Personal Jupyter Notebook Server

For years and years I’ve been hassling my evil twin brother (it’s a long story) Jim Groom about getting Docker hosting up and running as part of Reclaim, so when an invite to the Reclaim Cloud beta arrived today (thanks, Jim:-), I had a quick play (with more to come in following days and weeks, hopefully… or at least until he switches my credit off;-)

For an early example of how to get JupyterHub up and running on Reclaim Cloud, see https://github.com/ousefulReclaimed/jupyterhub-docker/ . Best practice around this currently (July ’21) seems to be Tim Sherratt’s (@wragge) GLAM Workbench on Reclaim Cloud recipes.

The environment is provided by Jelastic (I’m not sure how the business model will work, eg in terms of what’s being licensed and what’s being resold…?).

Whilst there are probably docs, the test of a good environment is how far you can get by just clicking buttons, so here’s a quick recap of my first foray…

Let’s be having a new environment then..

Docker looks like a good choice:

Seems like we can search for public DockerHub containers (and maybe also private ones if we provide credentials?).

I’ll use one of my own containers, that is built on top of an official Jupyter stack container:

Select one and next, and a block is highlighted to show we’ve configured it…

When you click apply, you see loads of stuff available…

I’m going to cheat now… the first time round I forgot a step, and that step was setting a token to get into the Jupyter notebook.

If you look at my repo docs for the container I selected, you see that I recommend setting the Jupyter login token via an environment variable…

In the confusing screen, there’s a {...} Variables option that I guessed might help with that:

Just in passing, if your network connection breaks in a session, we get a warning and it tries to reconnect after a short period:

Apply the env var and hit the create button on the bewildering page:

And after a couple of minutes, it looks like we have a container running on a public IP address:

Which doesn’t work:

And it doesn’t work because the notebook isn’t listening on port 80, it autostarts on port 8888. So we need to look for a port map:

A bit of guessing now – we probably want an http port, which nominally maps, or at least defaults, to port 80? And then map that to the port the notebook server is listening on?

Add that and things now look like this as far as the endpoints go:

Try the public URL again, on the insecure http address:

Does Jim Rock?

Yes he does, and we’re in…

So what else is there? Does it work over https?

Hmmm… Let’s go poking around again and see if we can change the setup:

So, in the architecture diagram on the left, if we click the top Balancing block, we can get a load balancer and reverse proxy, which are the sorts of thing that can often handle certificates for us:

I’ll go for Nginx, cos I’ve heard of that…

It’s like a board game, isn’t it, where you get to put tokens on your personal board as you build your engine?! :-)

It takes a couple of mins to fire up the load balancer container (which is surely what it is?):

If we now have a look in the marketplace (I have to admit, I’d skimmed through this at the start, and noticed there was something handy there…) we can see a Let’s Encrypt free SSL certificate:

Let’s have one of those then…

I’ll let you into another revisionist secret… I’d tried to install the SSL cert without the load balancer, but it refused to apply it to my container… and it really looked like it wanted to apply to something else. Which is what made me think of the nginx server…

Again we need to wait for it to be applied:

When it is, I don’t spot anything obvious to show the Let’s Encrypt cert is there, but I did get a confirmation (not shown in screenshots).

So can we log in via https?

Bah.. that’s a sort of yes, isn’t it? The cert’s there:

but there’s http traffic passing through, presumably?

I guess I maybe need another endpoint? https onto port 8888?

I didn’t try at the time (that’s for next time) because what I actually did was to save Jim’s pennies…

And confirm…

So… no more than half an hour from a zero start (I was actually tinkering whilst on a call, so only half paying attention too…).

As for the container I used, that was built and pushed to DockerHub by other tools.

The container was originally defined in a Github repo to run on MyBinder using not a Dockerfile, but requirements.txt and apt.txt text files in a binder/ directory.

The Dockerhub image was built using a Github Action:

And for that to be able to push from Github to DockerHub, I had to share my DockerHub username and password as a secret with the Github repo:

But with that done, when I make a release of the repo, having tested it on MyBinder, an image is automatically built and pushed to Dockerhub. And when it’s there, I can pull it into Reclaim Cloud and run it as my own personal service.

Thanks, Jim..

PS It’s too late to play more today now, and this blog post has taken twice as long to write as it took me to get a Jupyter notebook server up and running from scratch, but things on my to do list next are:

1) see if I can get the https access working;

2) crib from this recipe and this repo to see if I can get a multi-user JupyterHub with a Dockerspawner up and running from a simple Docker Compose script. (I can probably drop the Traefik proxy and Let’s Encrypt steps and just focus on the JupyterHub config; the Nginx reverse proxy can then fill the gap, presumably…)

Rapid ipywidgets Prototyping Using Third Party Javascript Packages in Jupyter Notebooks With jp_proxy_widget

Just before the break, I came across a rather entrancing visualisation of Jean Michel Jarre’s Oxygene album in the form of an animated spectrogram [video].

Time is along the horizontal x-axis, and frequency along the vertical y-axis. The bright colours show the presence, and volume, of each frequency as the track plays out.

Such visualisations can help you hear-by-seeing the structure of the sound as the music plays. So I wondered… could I get something like that working in a Jupyter notebook….?

And it seems I can, using the rather handy jp_proxy_widget, which provides a way of easily loading jQueryUI components, as well as using the require.js module to load and run Javascript widgets.

Via this StackOverflow answer, which shows how to embed a simple audio visualisation into a Jupyter notebook using the Wavesurfer.js package, I note that Wavesurfer.js also supports spectrograms. The example page docs are a bit ropey, but a look at the source code and the plugin docs revealed what I needed to know…

#%pip install --upgrade ipywidgets
#!jupyter nbextension enable --py widgetsnbextension

#%pip install jp_proxy_widget

import jp_proxy_widget

widget = jp_proxy_widget.JSProxyWidget()

js = "https://unpkg.com/wavesurfer.js"
js2="https://unpkg.com/wavesurfer.js/dist/plugin/wavesurfer.spectrogram.min.js"
url = "https://ia902606.us.archive.org/35/items/shortpoetry_047_librivox/song_cjrg_teasdale_64kb.mp3"

widget.load_js_files([js, js2])

widget.js_init("""
element.empty();

element.wavesurfer = WaveSurfer.create({
    container: element[0],
    waveColor: 'violet',
    progressColor: 'purple',
    loaderColor: 'purple',
    cursorColor: 'navy',
    minPxPerSec: 100,
    scrollParent: true,
    plugins: [
        WaveSurfer.spectrogram.create({
            wavesurfer: element.wavesurfer,
            container: element[0],
            fftSamples: 512,
            labels: true
        })
    ]
});

element.wavesurfer.load(url);

element.wavesurfer.on('ready', function () {
    element.wavesurfer.play();
});
""", url=url)

widget

#It would probably make sense to wire up these commands to ipywidgets buttons...
#widget.element.wavesurfer.pause()
#widget.element.wavesurfer.play(0)

The code is also saved as a gist here and can be run on MyBinder (the dependencies should be automatically installed):

Here’s what it looks like (It may take a moment or two to load when you run the code cell…)

It doesn’t seem to work in JupyterLab though… [UPDATE: following recent patches to jp_proxy_widget, it may well work now…]

It looks like the full ipywidgets machinery is supported, so we can issue start and stop commands from the Python notebook environment that control the widget Javascript.

So now I’m wondering what other Javascript apps are out there that might be interesting in a Jupyter notebook context, and how easy it’d be to get them running…?

It might also be interesting to try to construct an audio file within the notebook and then visualise it using the widget.

PS ipywidget slider cross-referencing wavesurfer.js playhead: https://gist.github.com/scottire/654019e88e6225c15a68006ab4a3ba98 h/t @_ScottCondron

PPS and more from Scott Condron, showing how to wire up holoviews sliders, a spectrogram and an audio player: interactive audio plots in jupyter notebooks

Accessing MyBinder Kernels Remotely from IPython Magic and from VS Code

One of the issues facing us as a distance learning organisation is how to support the computing needs of distance learning students, on a cross-platform basis, and over a wide range of computer specifications.

The approach we have taken for our TM351 Data Management and Analysis course is to ship students a Virtualbox virtual machine. This mostly works. But in some cases it doesn’t. So in the absence of an institutional online hosted notebook, I started wondering about whether we could freeload on MyBinder as a way of helping students run the course software.

I’ve started working on an image here though it’s still divergent from the shipped VM (I need to sort out things like database seeding, and maybe fix some of the package versions…), but that leaves open the question of how students would then access the environment.

One solution would be to let students work on MyBinder directly, but this raises the question of how to get the course notebooks into the Binder environment (the notebooks are in a repo, but it’s a private repo) and out again at the end of a session. One solution might be to use a Jupyter Github extension, but this would require students to set up a Github repository, install and configure the extension, remember to sync (unless auto-save-and-commit is available, or could be added to the extension) and so on…

An alternative solution would be to find a way of treating MyBinder like an Enterprise Gateway server, launching a kernel via MyBinder from a local notebook server extension. But I don’t know how to do that.

Some fragments I have had lying around for a bit were the first fumblings towards a Python MyBinder client API, based on the Sage Cell client for running a chunk of code on a remote server… So I wondered whether I could do another pass over that code to create some IPython magic that lets you create a MyBinder environment from a repo and then execute code against it from a magicked code cell. Proof of concept code for that is here: innovationOUtside/ipython_binder_magic.
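For a flavour of the first step (asking MyBinder for a server), here is a rough sketch. It is not the code from the magic; the function names are made up, and it assumes, as I understand the BinderHub docs, that the build endpoint streams `data: {...}` lines and that the final "ready" event carries `url` and `token` fields. The default `ref` of `master` is also an assumption.

```python
import json

BINDER_URL = "https://mybinder.org"


def binder_build_url(repo, ref="master", binder_url=BINDER_URL):
    """Build-endpoint URL for a GitHub repo given as 'user/repo'."""
    return f"{binder_url}/build/gh/{repo}/{ref}"


def parse_binder_events(lines):
    """Yield decoded JSON payloads from the 'data: ...' lines of an event stream."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])


def wait_for_server(lines):
    """Return (url, token) from the first 'ready' event, or None if none arrives."""
    for event in parse_binder_events(lines):
        if event.get("phase") == "ready":
            return event["url"], event["token"]
    return None
```

To try it against the real service, something like `urllib.request.urlopen(binder_build_url("innovationOUtside/ipython_binder_magic"))` should give a response object that can be passed straight to `wait_for_server()`, since iterating over the response yields lines. Starting a kernel on the resulting server and talking to it over a websocket is the harder part, and isn't shown.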

One problem is that the connection seems to time out quite quickly. The code is really hacky and could probably be rebuilt from functions in the Jupyter client package, but making sense of that code is beyond my limited cut-and-paste abilities. But: it does offer a minimal working demo of what such a thing could be like. At a push, a student could install a minimal Jupyter server on their machine, install the magic, and then write notebooks using magic to run the code against a Binder kernel, albeit one that keeps dying. Whilst this would be inconvenient, it’s not a complete catastrophe because the notebook would be being saved to the student’s local machine.

Another alternative struck me today when I saw that Yuvi Panda had posted to the official Jupyter blog a recipe on how to connect to a remote JupyterHub from Visual Studio Code. The mechanics are quite simple (I posted a demo here about how to connect from VS Code to a remote Jupyter server running on Digital Ocean, and the same approach works for connecting to our VM notebook server, if you tweak the VM notebook server’s access permissions) but it requires you to have a token. Yuvi’s post says how to find that from a remote JupyterHub server, but can we find the token for a MyBinder server?

If you open your browser’s developer tools and watch the network traffic as you launch a MyBinder server, then you can indeed see the URL used to launch the environment, along with the necessary token:

But that’s a bit of a faff if we want students to launch a Binder environment, watch the network traffic, grab the token and then use that to create a connection to the Binder environment from VS Code.

Searching the contents of pages from a running Binder environment, it seems that the token is hidden in the page:

And it’s not that hard to find… it’s in the link from the Jupyter log. The URL needs a tiny bit of editing (cut the /tree path element) but then the URL is good to go as the kernel connection URL in VS Code:
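The URL edit is easy enough to script. This is a minimal sketch using the standard library; the function name and the example URL (host, user path and token) are made up for illustration, and I'm assuming the only change needed is dropping the trailing /tree element.

```python
from urllib.parse import urlsplit, urlunsplit


def kernel_connection_url(notebook_url):
    """Drop a trailing /tree path element but keep the ?token=... query."""
    parts = urlsplit(notebook_url)
    path = parts.path
    if path.endswith("/tree"):
        # Keep the slash before "tree" so the path still ends in "/"
        path = path[: -len("tree")]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


# Hypothetical URL of the kind copied from a running Binder session
example = "https://hub.mybinder.org/user/example-abc123/tree?token=SECRET"
print(kernel_connection_url(example))
```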

Then you can start working on your notebook in VS Code (open a new notebook from the settings menu), executing the code against the MyBinder environment.

You can also see the notebooks listed in the remote MyBinder environment.

So that’s another way… and now it’s got me thinking… how hard would it be to write a VS Code extension to launch a MyBinder container and then connect to it?

PS by the by, I notice that the developer tools in Firefox become increasingly useful with the Firefox 71 release in the form of a websocket inspector.

This lets you inspect traffic sent across a websocket connection. For example, if we force a page reload on a running Jupyter notebook, we can see a websocket connection:

We can then click on that connection and monitor the messages being passed over it…

I thought this might help me debug / improve my Binder magic, but it hasn’t. The notebook looks like it sends an empty ping as a heartbeat (as per the docs), but if I try to send an empty message from the magic it closes the connection? Instead, I send a message to the heartbeat channel…

PS sort of related, binderbot, “A simple CLI to interact with binder, eg run local notebooks on a remote binder”.

OER Text Publishing Workflows Rooted on OpenLearn OU-XML Via Github, CircleCI and Github Pages Using Jupytext and nbSphinx

Slowly, slowly, my recipes are coming together for generating markdown from OU-XML sourced, variously, from modules on the OU VLE and units on OpenLearn.

The code needs a couple more passes through but at some point I should be able to pull a simple CLI together (hopefully!). I’m still manually running some handcranked steps spread across a couple of notebooks at the moment :-(

So… where am I currently at?

First up, I have chunks of code that can generate markdown from OU-XML, sort of. The XSLT is still a bit ropey (~~lists are occasionally broken~~[FIXED], for example, repeating the text) and the image link reconciliation for OpenLearn images doesn’t work, although I may have a way of accessing the images directly from the OU-XML image paths. (There could still be image rights issues if I was archiving the images in my own repo, which perhaps counts as a redistribution step…?)

The markdown can be handled in various ways.

Firstly, it can be edited/viewed as markdown. Chatting to colleague Jon Rosewell the other day, I realised that JupyterLab provides one way of editing and previewing markdown: in the JupyterLab file browser, right click on an .md file and you should be able to preview it:

There is also a WYSIWYG editor extension for JupyterLab (which looks like it may enter core at some point): Jupyter Scribe / jupyterlab-richtext-mode.

If you have Jupytext installed, then clicking on an .md file in the notebook tree browser opens the document into a Jupyter notebook editor, where markdown and code cells can be edited separately. An .ipynb file can then be downloaded from the notebook editor, and/or Jupytext can be used to pair markdown and .ipynb docs from the notebook file menu if you install the Jupytext notebook extension. Jupytext can also be called on the command line to convert .md to .ipynb files. If the markdown file is prefaced with Jupytext YAML metadata (i.e. if the markdown file is a “Jupytext markdown” file), then notebook metadata (which includes cell tags, for example) is preserved in the markdown and can be used for round-tripping between markdown and notebook document formats. (This is handy for RISE slideshows, for example; the slide tags are preserved in the markdown so you can edit a RISE slideshow as a markdown document and then present it via Jupytext and a notebook server.)

In a couple of simple tests I tried, the .ipynb generated from markdown using Jupytext seemed to open okay in the new Netflix Polynote notebook application (early review). This is handy, because Polynote has a WYSIWYG markdown editor… So for anyone who gripes that notebooks are too hard because writing markdown is too hard, this provides an alternative.

I also note that the wrong code language has been selected (presumably the default in the absence of any specified language? So I need to make sure I do tag code cells with a default language somehow… I wonder if Jupytext can do that?).
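One possible fix, sketched here without Jupytext, is to post-process the generated .ipynb and write the language into the notebook metadata. The `kernelspec` and `language_info` keys are the standard notebook metadata fields; whether Polynote actually uses them to pick the cell language is an assumption I haven't tested, and the function name and defaults are made up.

```python
import json


def set_default_language(notebook, language="python", kernel_name="python3",
                         display_name="Python 3"):
    """Return a copy of a notebook dict with kernelspec/language_info set."""
    nb = json.loads(json.dumps(notebook))  # cheap deep copy of JSON-like data
    metadata = nb.setdefault("metadata", {})
    metadata["kernelspec"] = {
        "name": kernel_name,
        "display_name": display_name,
        "language": language,
    }
    # Keep any existing language_info entries, just make sure the name is set
    metadata.setdefault("language_info", {})["name"] = language
    return nb


# A minimal, empty nbformat 4 notebook used as a stand-in for a converted file
empty_nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
print(json.dumps(set_default_language(empty_nb)["metadata"], indent=2))
```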

Having a bunch of markdown documents, or notebooks derived from markdown documents using Jupytext is one thing, providing as it does a set of documents that can be easily edited and interacted with, albeit in the context of a Jupyter notebook server.

However, we can also generate HTML websites based on those documents using tools such as Jupyter Book and nbsphinx. Jupyter Book uses a Jekyll engine to build HTML sites, which is a bit of a pain (I noted a demo here that used CircleCI to build a site from notebooks and md using Jupyter Book) but the nbsphinx Python package that extends the (also pip installable) Sphinx documentation engine is a much easier proposition…

As a proof-of-concept demo, the ouseful-oer/openlearn-learntocode repo contains markdown files generated from the OpenLearn Learn to code for data analysis course.

Whenever the master branch on the repository is updated, CircleCI kicks in and uses nbsphinx to build a documentation site from the markdown docs and pushes them to the repository’s gh-pages branch, which makes the site available via Github Pages: “Learn To Code…” on Github Pages.

What this means is that I should be able to edit the markdown directly via the Github website, or using an online editor such as prose.io connected to my Github account, commit changes and then let CircleCI rebuild the site for me.

(I’m pretty sure I haven’t set things up as efficiently as I could in terms of CI; what I would like is for only things that have changed to be rebuilt, but as it is, everything gets rebuilt (although the installed Python environment should be cached?). Hints / tips / suggestions about improving my CircleCI config.yml file would be much appreciated…)

At the moment, nbsphinx is set up to run .md files through Jupytext to convert them to .ipynb, which nbsphinx then eventually churns back to HTML. I’ve also disabled code cell execution in the current set up (which means the routing through .ipynb in this instance is superfluous – the site could just be generated from the .md files). But the principle is there for a flick of a switch meaning that the code cells could be executed and their outputs immortalised in the published site HTML.
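For reference, the relevant part of a Sphinx conf.py for that sort of setup might look something like the following. This is a sketch based on my reading of the nbsphinx docs rather than a copy of the config in the demo repo, so treat the exact values as assumptions.

```python
# conf.py (fragment): a sketch, not the demo repo's actual configuration
extensions = ["nbsphinx"]

# Route .md files through Jupytext so that nbsphinx sees them as notebooks
nbsphinx_custom_formats = {
    ".md": ["jupytext.reads", {"fmt": "md"}],
}

# "never" skips code cell execution; "always" would run cells at build time
nbsphinx_execute = "never"
```

The switch to flick is then `nbsphinx_execute`.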

So… what next?

I need to automate the production of the root index file (index.rst) so that the table of contents is built from the parsed OU-XML. I think Sphinx handles navigation menu nesting based on header levels, which is a bit of a pain in the demo site. (It would be nice if there were a Sphinx trick that lets me increase the de facto heading level for files in a subdirectory so that in the navigation sidebar menu each week’s content could be given its own heading and then the week’s pages listed as child pages within that. Is there such a trick?)
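One candidate trick might be to generate index.rst with a separate toctree for each week, each with its own `:caption:`. Here is a sketch of what the generator could look like; the function name and the week and page names are hypothetical, and whether the captions appear as sidebar headings depends on the Sphinx theme, which I haven't checked for the demo site.

```python
def build_index_rst(title, weeks):
    """Return index.rst text with one captioned toctree per week.

    `weeks` maps a caption (e.g. "Week 1") to a list of document paths
    given without a file extension.
    """
    lines = [title, "=" * len(title), ""]
    for caption, docs in weeks.items():
        lines += [".. toctree::", "   :maxdepth: 1", f"   :caption: {caption}", ""]
        lines += [f"   {doc}" for doc in docs]
        lines.append("")
    return "\n".join(lines)


# Hypothetical structure of the kind that could be parsed out of the OU-XML
weeks = {
    "Week 1": ["week1/part1", "week1/part2"],
    "Week 2": ["week2/part1"],
}
print(build_index_rst("Learn to code for data analysis", weeks))
```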

Slowly, slowly, I can see the pieces coming together. A tool chain looks possible that will:

download OU-XML;
generate markdown;
optionally, cast markdown as notebook files (via jupytext);
publish markdown / (un)executed notebooks (via nbsphinx).

A couple of next steps I want to tack on to the end as and when I get a chance and top up my creative energy levels: firstly, a routine that will wrap the published pages in an electron app for different platforms (Mac, Windows, Linux); secondly, publishing the content to different formats (for example, PDF, ebook) as well as HTML.

I also need to find a way of adding interaction — as Jupyter Book does — integrating something like ThebeLab or nbinteract buttons to support in-page code execution (ThebeLab) and interactive widgets (nbinteract).

News: Arise All Ye Notebooks

A handful of brief news-y items…

Netflix Polynote Notebooks

Netflix have announced a new notebook candidate, Polynote [code], capable of running polyglot notebooks (Scala, Python, SQL) with fixed cell ordering, variable inspector and WYSIWYG text authoring.

At the moment you need to download and install it yourself (no official Docker container yet?) but from the currently incomplete installation docs, it looks like there may be other routes on the way…

The UI is clean, and whilst perhaps slightly more cluttered than vanilla Jupyter notebooks it’s easier on the eye (to my mind) than JupyterLab.

Cells are code cells or text cells, the text cells offering a WYSIWYG editor view:

One of the things I note is the filetype: .ipynb.

Code cells are sensitive to syntax, with a code completion prompt:

I really struggle with code complete. I can’t write import pandas as pd RETURN because that renders as import pandas as pandas. Instead I have to enter import pandas as pd ESC RETURN.

Running cells are indicated with a green sidebar to the cell (you can get a similar effect in Jupyter notebooks with the multi-outputs extension):

I couldn’t see how to connect to a SQL database, nor did I seem to get an error from running a presumably badly formed SQL query?

The execution model is supposed to enforce linear execution, but I could insert a cell after an unrun cell and get an error from it (so the execution model is not run all cells above, either literally or based on analysis of the programme abstract syntax tree?)

There is a variable inspector, although rather than showing or previewing cell state, you just get a listing of variables and then need to click through to view the value:

I couldn’t see how to render a matplotlib plot:

The IPython magic used in Jupyter notebooks throws an error, for example:

This did make me realise that cell lines are line numbered on one side and there’s a highlight shown on the other side which line errored. I couldn’t seem to click through to raise a more detailed error trace though?

On the topic of charts, if you have a Vega chart spec, you can paste that into a Vega spec type code cell and it will render the chart when you run the cell:

The developers also seem to be engaging with the “open” thing…

Take it for a spin today by heading over to our website or directly to the code and let us know what you think! Take a look at our currently open issues and to see what we’re planning, and, of course, PRs are always welcome!

Streamlit.io

Streamlit.io is another new not-really-a-notebook alternative, pip installable and locally runnable. The model appears to be that you create a Python file and run the streamlit server against that file. Trying to print("Hello World") doesn’t appear to have any effect (so that’s a black mark as far as I’m concerned!) but the display is otherwise very clean.

Hovering top right will raise the context menu (if it has timed out and closed itself), showing whether the source file has recently been saved and not rerun, or allowing you to always rerun the execution each time the file is saved.

I’m not sure if there’s any caching of steps that are slow to run if associated code hasn’t changed up to that point in a newly saved file.

Ah, it looks like there is…

… and the docs go into further detail, with the use of decorators to support caching the output of particular functions.

I need to play with this a bit more, but it looks to me like it’d make for a really interesting VS Code extension. It also has the feel of Scripted Forms, as was, (a range of widgets are available in streamlit as UI components), and R’s Shiny application framework. It also feels like something I guess you could do in Jupyterlab, perhaps with a bit of Jupytext wiring.

In a similar vein, a package called Handout also appeared a few weeks ago, offering the promise of “[t]urn[ing] Python scripts into handouts with Markdown comments and inline figures”. I didn’t spot it in the streamlit UI, but it’d be useful to be able to save or export the rendered streamlit document eg as an HTML file, or even as an ipynb notebook, with run cells, rather than having to save it via the browser save menu?

Wolfram Notebooks

Wolfram have just announced their new, “free” Wolfram Notebooks service, the next step in the evolution of Wolfram Cloud (announcement review), I guess? (I scare-quote “free” because, well, Wolfram; you’d also need to carefully think about the “open” and “portable” aspects…)

Actually, I did try to have a play, but I went to the various sites labelled as “Wolfram Notebooks” and I couldn’t actually find a 1-click get started (at all, let alone, for “free”) link button anywhere obvious?

Ah… here we go:

[W]e’ve set it up so that anyone can make their own copy of a published notebook, and start using it; all they need is a (free) Cloud Basic account. And people with Cloud Basic accounts can even publish their own notebooks in the cloud, though if they want to store them long term they’ll have to upgrade their account.

Fragment: Indexing Local Jupyter Notebooks for Search

It’s been some time since I last explored this (eg here and here), and as far as I know other solutions have appeared since, but a question still remains as to how to effectively search over a set of notebooks.

Partial alternative solutions that may be worth noting include:

nbscan for searching over notebooks from the command-line;
nbgallery, which bakes in Solr/sunspot (it’d be really nice if the nbgallery search tools could be easily decoupled so the search could be added to an arbitrary Jupyter notebook, or JupyterHub, server as an extension…);
this simple search engine with autocomplete by Simon Willison.

There is also the lunr based search of Jupyter Book (related issue). (The more recent elasticlunr Javascript search engine also looks interesting… perhaps even more so than lunr.js…)

[UPDATE: This is new to me, and I’ve not had a chance to try it: Find your Jupyter notebooks with ElasticSearch – elastic search recipe.]

One of the things I often wondered about in respect of building a notebook search engine index would be how to crawl / index freshly updated notebooks.

One way would presumably be to regularly crawl the directory path in which notebooks live looking for notebook files that have a changed timestamp compared to the last time they were indexed; another might be to set up some sort of watcher on the operating system that calls the indexer whenever it spots a file being updated (maybe something like fswatch?).
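The timestamp approach is simple enough to sketch with the standard library. The function name is made up, and `last_indexed` stands in for whatever record the indexer keeps of when it last saw each file.

```python
from pathlib import Path


def changed_notebooks(root, last_indexed):
    """Yield (path, mtime) for .ipynb files modified since they were last indexed.

    `last_indexed` maps path strings to the mtime recorded at the previous crawl.
    """
    for path in Path(root).rglob("*.ipynb"):
        # Skip the autosave copies that Jupyter keeps alongside notebooks
        if ".ipynb_checkpoints" in path.parts:
            continue
        mtime = path.stat().st_mtime
        if mtime > last_indexed.get(str(path), 0.0):
            yield path, mtime


# Example crawl of the current directory with an empty record,
# so every notebook found counts as changed
seen = {}
for nb_path, nb_mtime in changed_notebooks(".", seen):
    seen[str(nb_path)] = nb_mtime
```

Run on a schedule, with `seen` persisted somewhere between runs, only new or updated notebooks would be passed on to the indexer.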

Another way might be to use something like the pgcontents contents manager to save (or process) notebooks into a search engine index database. (For other examples of Jupyter notebook content managers, see this Tracking Jupyter round-up. I wonder, is there a sqlite content manager that can save notebooks directly into SQLite? Would the pgcontents extension handle that with little or no modification, other than to the supplied database connection string?) If notebooks were saved as notebooks to disk, and into a database for indexing as part of the search engine, how would the indexed notebook also be linked back to the notebook on disk so it could be linked to via search results?
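On the linking-back question, one simple possibility is to store the notebook path alongside each indexed cell. The following is a sketch using Python’s built-in sqlite3 module; it is not a contents manager and has nothing to do with pgcontents, the table layout and function names are made up, and it uses a plain LIKE query rather than a proper full-text index.

```python
import json
import sqlite3


def index_notebook(conn, path, notebook):
    """Store each cell's source against the notebook path and cell number."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells "
        "(path TEXT, cell_index INTEGER, cell_type TEXT, source TEXT)"
    )
    # Re-indexing a notebook replaces its old rows
    conn.execute("DELETE FROM cells WHERE path = ?", (path,))
    for i, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):  # the source may be stored as a list of lines
            source = "".join(source)
        conn.execute(
            "INSERT INTO cells VALUES (?, ?, ?, ?)",
            (path, i, cell.get("cell_type", ""), source),
        )
    conn.commit()


def index_file(conn, path):
    """Load an .ipynb file from disk and index it under its path."""
    with open(path, encoding="utf-8") as f:
        index_notebook(conn, str(path), json.load(f))


def search(conn, term):
    """Return (path, cell_index) pairs for cells whose source contains `term`."""
    rows = conn.execute(
        "SELECT path, cell_index FROM cells WHERE source LIKE ? "
        "ORDER BY path, cell_index",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

Each search hit then carries the path of the notebook on disk, which is enough to build a link back to it from a results page.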

Thinks: how is nbgallery architected? Where are notebooks saved to? How is the Solr search engine index managed?

More generally, I wonder: are there any Python based, simple full-text search engines with local filesystem crawlers/monitors/indexers out there?

PS Other search engines to have a look at:

PPS updating lunr.js – thread: https://github.com/olivernn/lunr.js/issues/284, https://www.npmjs.com/package/lunr-mutable-indexes . Maybe also https://github.com/lucaong/minisearch