My Personal Blockers on Getting Started With JupyterLab

Although at times the content of this blog may come across as somewhat technical, as anyone who has looked at any of my code would tell you, I am not a developer (actually, you could interpret that phrase in a lot of ways!). This post represents a stream of consciousness about some of the stumbling blocks that I perceive as preventing me from getting started building my own extensions for JupyterLab.

See also this related JupyterLab issue: Getting Started Docs for Non-Developers.

The code I write is generally written to get things done, not to form part of some production application. It is a means to an end. It’s poorly structured, and eclectically commented. There’s no linting. My repo commits are random collections of files with often vacuous commit messages. You would not want me committing code to your code base.

I typically categorise my code outputs into various classes:

  • code fragments, which are simple tricks or hacks for performing a particular effect, often something I’ve picked up from somewhere else. One fragment I need to record in a post somewhere is how to densify points along a geojson linestring, a trick I picked up here ; a recent fragment of my own shows how we might be able to style a JSON fragement that identifies the location of a typo in a text string: there may be better ways, “approved ways”, of doing this, but I didn’t find one when I looked so I made a really simple thing to try to do it myself;
  • code sketches, which often take the form of notebooks that describe a mini-project, one way of doing things. The notebooks in my Conversations With Data: Unistats repo are sketches as much as anything; this are often free-form, as I explore a particular topic;
  • code recipes start off as sketches, but then I try to tease out and explain some sort of task or process and perhaps tidy up the code a bit; producing a recipe involves a bit of iteration, trying to identify each step and the reason for it, and ensure that everything is complete: things like Visualising WRC Rally Stages With rayshader and R and Visualising WRC Rally Timing and Results Data are packed fill of recipes;
  • code hacks are the closest I get to production code, not in the sense of it being properly linted, commented and test but in the sense of something I install and use. My notebook extensions are all hacks.

You’ll note that I don’t write tests: I write a line of code at a time, and look at its output; if it doesn’t look right, or it breaks, I try to fix it. I rerun all the cells in a notebook with a fresh kernel a lot to check that things keep working; if the’re broken, I check to see why the broken cell has broken, then read back up to check each previous step is doing what I wanted and has fed forward what I intended it to feed forward.

When I first started using Jupyter notebooks, the classic notebook, they were still called IPython notebooks. (We started developing a course using notebooks in 2014 that went live, after a 6 month deleay, in 2016B (which is to say, February 2016).) To try to make the experience a bit more like the VLE, I hacked together an extension to augment the notebook with coloured cells to represent activities and to allow tutors to highlight cells they had commented in assessment feedback to students. That extension continues (though updated) as nb_extension_empinken.

Since then we have added various other extensions, such as a riff on empinken that styles cell based on bootstrap-like tags (nb_extension_tagstyler), the ipython_magic_sqlalchemy_schemadisplay extension to display (sort of!) database schemas for a connected database or an extension I liked but no-one else did to pop out cells into a floating widget so you could easily refer back to them (nb_cell_dialog).

Over the years, the repos have added clutter, bits of of automation, more elaborate approaches to packaging, but in the beginning, they were very simple, essentially just a README file, a setup.py file, a directory containing a simple __init__.py file and a static file containing the actual extension code in the form of an index.js file. The packaging structure was cribbed directly from other extensions (typically, the simplest one I could find, a minimum viable extension in other words), the setup for the extension was cribbed directly from other (minimally viable) extensions and much of the code was cribbed from other extensions. The empinken extension is essentially a toolbar button that added metadata to a cell, and a routine that iterated each notebook cell, checked the metadata and updated the css. There were other extensions that in whole or in part demonstrated how to do each of those tasks, which I then pinched and reassmbled to my own purposes.

The code was limited to py packaging (cribbed) and some simple js (largely cribbed).

The development environment was a text editor.

The testing was to install the package, refresh the notebook page and (with the help of browser developer tools) see where it was breaking until it didn’t.

The on-ramp was achievable.

So now we come to JupyterLab, which appears to me as a hostile architecture.

I’ll try to pick out what I mean by that. Please note that the following is not intended as a personal attack on the JupyterLab teams or the docs, it’s a pardoy as much as anything.

From the docs, the getting started is to install a development environment, and I’m already lost. When ever I try to use node it seems to download the internet and the instructions typically say “build the package” without saying what words of incantation I need to type into the command line to actually build the package (why should I just know how to do that?)

The next step is to install a cookiecutter. This doesn’t necessarily help because I have no idea what all the files are for, whether they or necessary, or what changes can be made to each one to perfrom a particular task. I’d rather be interested to a minimally viable set of files one at a time with an explanation of what each one does and containing nothing uncommented that is not essential. (Some “useful but optional” fragments may also be handy so I can uncomment them and try them out to see what they do, but not too many.)

When it comes to trying out some example code, I need to learn a new language, .ts (which is to say, TypeScript). I have no idea what TypeScript is or how to run it.

I also need to import load of things from @ things, whatever they are. If I’m trying to figure out how to do a thing by cribbing code from seeing what files are are loaded to support a working extension in my browser (which I suspect is way harder to do with JupyterLab than it was from classic notebook) I’m not sure if thre’s an obvious way to re-engineer the TypeScript code from the Javascript in the browser that does something like what I want to do.

I’m not totally sure what “locate the extension means? Is that something I have to do or is it something the code exampe is doing? (I am getting less than rational at this point because I already know that I am at the point of blindly clicking run at things I don’t understand.)

Before we can try out the extension, it needs building. In the classic notebook extensions I could simply install a package, but now I need to build one:

This is a step back to the old, pre-REPL days because there is a level of indirection here: I don’t get to try the code I’ve written, I have to convert it to somethig else. When it doesn’t work, where did I go wrong?

  • with the logic?
  • with the javascript?
  • with the typescript?
  • with the build process?
  • can the TypeScript be “right” and the javascript “wrong”? I have no idea…

I think things have improved with JupyterLab now that installing an extension doesn’t require an length delay as the JupyterLab environment rebuilds itself (which was a block in itself in earlier days).

Okay, so skimming the docs doesn’t give me a sense that I’d be able to do anything other than follow the steps, click the buttons and create the example extension.

How about checking a repo to see if I can make sense of a pre-existing extension that does something close to what I what?

The classic notebook collapsible headings extension allows you to click on a heading and collapse all the cells beneath it to the next heading of the same or higher level. It works by setting a piece of metadata on the cell containing the heading you want to collapse. A community contributed extension does the same thing, but uses a different tag, "heading_collapsed": "true" rather than "heading_collapsed": true (related issue). Either that or the JupyterLab extension is broken for some other reason.

Here’s what the repo looks like (again, I’m not intending to mock or attack the repo creator, this is just a typical, parodied, example):

Based on nothing at all except my own personal prejudices, I reckon every file halves the number of folk who think they can make sense of what’s going on… (prove me wrong ;-)

Here’s the Jupyter classic notebook extension repo:

So what do I see as the “hostile architecture” elements?

  • there is not an immediately obvious direct way to write some javascript, install it into JupyterLab and check the extension works;
  • in many example repos, a lot of the files relate to packaging the project; for a novice, it’s not clear what the extension files are, what the exntension files are, and whether all the project files are necessary
  • using TypeScript introduces a level of indirection: the user is now developing for JupyterLab, not for the end user environment they can view source from in the browser. (I think this is something I hadn’t articulated to myself before: in classic notebook extensions, you hack the final code; in JupyterLab, you write code in application land, and magic voodoo converts it to things that run in the browser.)
  • in developing for JupyterLab, you need to know what bit o f Juyterlab to hook into. There’s a lot of hooks, and it’s not clear how a (non-developer) novice can find the ones they need to hook into, let alone how to use them.

And finally:

  • there isn’t a “constructive” tutorial that builds up a minimally viable extension from a blank sheet an explained step at a time.

As I recall from years and years ago, if you ever see or hear a developer say “just”, or you can add a silent “just” to a statement ((just) build the node thing) you know the explanation is far from complete and is not followable.

Faced with these challenges of having to step up and go and do a developer course, learn about project tools, pick up TypeScript, andd try to familiarise myself with a complex application framework, I would probably opt to learn how to develop a VS Code extension on the grounds that the application is more general, runs by default as an desktop application rather than browser accessed service, has increasingly rich support for Jupyter notebooks, and has a wide range of other extensions to use, and crib from, and that can be easily discovered from the VS Code extensions marketplace.

PS I think followable is missing term in the reproducibility lexicon, in both a weak sense and a strong sense. In the weak sense, if you follow the instructions, does it work? In a strong sense, does the reader come away feeling that they could create their own extension.

PPS In passing, I note this from my Twitter timeline yesterday…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

One thought on “My Personal Blockers on Getting Started With JupyterLab”

  1. wow! you have just written down what most amateur and professional software developers go through on a daily basis :-) testing is hard which is why most people don’t do it especially at the amateur and non professional levels. I don’t do testing :-) it should be built-in to the system and since it’s bolted on to R and Jupyter after the fact it’s harder to start writing tests. my chaotic efforts are on github just like you but my latest is here: https://github.com/rtanglao/rt-flickr-sqlite-csv

Comments are closed.

%d bloggers like this: