Preparing Jupyter Notebooks for Release to Students

Over the years, I’ve sketched various tools to support the release of notebooks to students, but as I’m not the person who prepares and distributes the releases, they never get used (“Tony hacking crap again” etc.;-).

Anyway, on the basis that the tools aren’t completely crap, and may be of use to others (perhaps even folk working on other modules internally that make use of notebooks, and are using them for the first time this presentation), I’ll post a quick summary of some of them here. (And if they are broken, a little use and testing by not-me could well provide the bug reports and motivation I need to fix them to a level of slightly less possible brokenness.)

The package that bundles the tools can be found here: innovationOUtside/nb_workflow_tools.

First up, tm351nbtest is a tool that helps check whether the notebooks run correctly in the latest environment.

The notebooks we save to the private module team repo all have their cells run, in part so that we can review what the expected outputs are. (When checking in notebooks, the tm351nbrun --file-processor runWithErrors . command can be used to ensure all notebooks in the specified path have their cells run.) The nbval package is a handy tool that runs the notebooks in the current environment and checks that the contents of each newly generated output cell match those of the previously saved output cell. (I keep thinking that jupyter-cache might also be handy here?) Cells that are known to generate an error can be ignored by tagging them with the raises-exception tag, and cells whose output you want to ignore can be tagged with the nbval-ignore-output tag. Running the tool generates a report identifying each notebook and each cell where the outputs don’t match.
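
In passing, tags can also be added programmatically using the nbformat package; a minimal sketch (the filename is made up) that tags the first code cell so that nbval ignores its output:

import nbformat

# Load a notebook (filename is just for illustration)
nb = nbformat.read("example.ipynb", as_version=4)

# Tag the first code cell so that nbval ignores its output
for cell in nb.cells:
    if cell.cell_type == "code":
        tags = cell.metadata.get("tags", [])
        if "nbval-ignore-output" not in tags:
            tags.append("nbval-ignore-output")
        cell.metadata["tags"] = tags
        break

nbformat.write(nb, "example.ipynb")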

The next tool, nb_collapse_activities, checks that our activity blocks all have their answers precollapsed. Activities are tagged and coloured using the innovationOUtside/nb_extension_empinken extension; activities with answers use the classic notebook collapsible headings extension to collapse the cells beneath an activity answer heading block (all cells are collapsed up to the next cell with a header at the same level or higher as the collapsed answer cell header). The nb_collapse_activities utility tries to identify answer heading cells and, whenever it finds one, adds heading_collapsed: true metadata.
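
By way of illustration, here’s a crude sketch of the gist of what the utility does, using nbformat; note that the answer-spotting heuristic here is made up (the real tool keys off the activity tags/metadata):

import nbformat

nb = nbformat.read("notebook.ipynb", as_version=4)

for cell in nb.cells:
    # Illustrative heuristic: markdown heading cells that look like answers
    if (cell.cell_type == "markdown"
            and cell.source.lstrip().startswith("#")
            and "answer" in cell.source.lower()):
        # The classic collapsible headings extension checks for this flag
        cell.metadata["heading_collapsed"] = True

nbformat.write(nb, "notebook.ipynb")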

The third tool also processes the notebooks for release: tm351nbrun --file-processor clearOutput clears the outputs of every code cell and essentially resets each notebook to an unrun state.
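
If you wanted to perform that step without the CLI tool, nbconvert’s ClearOutputPreprocessor does much the same job; a minimal sketch for a single notebook:

import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

nb = nbformat.read("notebook.ipynb", as_version=4)

# Strip outputs and reset execution counts on every code cell
ClearOutputPreprocessor().preprocess(nb, {})

nbformat.write(nb, "notebook.ipynb")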

A fourth tool, nbzip, can be used to zip required notebook folders for release to students.

A sort of release process could then work something like this. In the environment you want to test in:

# Install package
pip3 install --upgrade git+https://github.com/innovationOUtside/nb_workflow_tools

# When checking in notebooks, ensure cells are run
# Ensure that all cells are run even in presence of errors
tm351nbrun --file-processor runWithErrors .

# Test notebooks
tm351nbtest .

# Quality reports
## Whatever...

# Clear outputs
tm351nbrun --file-processor clearOutput .

# Collapse activity answers
nb_collapse_activities .

# Spell check
## However... Or run earlier before output cells cleared

# Zip files
# Whichever...

In passing, the nb_workflow_tools package also includes some other utilities not directly relevant to release, but occasionally handy during production: nb_merge to merge two or more notebooks, and nb_split to split a notebook into two or more notebooks.
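
A naive merge is also only a few lines of nbformat if you ever need to roll your own; this sketch simply concatenates cells and keeps the first notebook’s metadata:

import nbformat

first = nbformat.read("part1.ipynb", as_version=4)
second = nbformat.read("part2.ipynb", as_version=4)

# Concatenate the cells; notebook level metadata comes from the first file
first.cells.extend(second.cells)

nbformat.write(first, "merged.ipynb")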

I’ve also been exploring various approaches to spell-checking notebooks. These are currently being collected in innovationOUtside/nb_spellchecker and the various issues attached to that repo. When I have something reliable, I’ll add it to innovationOUtside/nb_workflow_tools. Another set of quality tools, which I had been working on but halted in the face of a universal “why would we want to know anything comparative about the contents of our notebooks?” response, can be found in innovationOUtside/nb_quality_profile. At some point I’ll revisit this and then try to bundle them up into a simple CLI tool I can also add to nb_workflow_tools.

In passing, and for completeness, I’ve also started sketching some innovationOUtside/ou-jupyter-book-tools. The idea of these is that they can provide an intermediate publishing step, where necessary, that maps from cell tags, for example, to complementary Jupyter Book / MyST markdown / reStructuredText.

Fragment: Loading Data into pandas DataFrames in JupyterLite

Just a quick note to self about a couple of tricks for loading data files into pandas in JupyterLite. At the moment, a simple pandas.read_csv() is unlikely to work, but a couple of workarounds have been posted over the last couple of months, so I’ve wrapped them into a simple package until such a time as everything works “properly” – innovationOUtside/ouseful_jupyterlite_utils.

Install the package in JupyterLite as:

import micropip

package_url = "https://raw.githubusercontent.com/innovationOUtside/ouseful_jupyterlite_utils/main/ouseful_jupyterlite_utils-0.0.1-py3-none-any.whl"

await micropip.install(package_url)

And then load data into pandas as per:

from ouseful_jupyterlite_utils import pandas_utils as pdu

# Load CSV from URL
# Via @jtpio
URL = "https://support.staffbase.com/hc/en-us/article_attachments/360009197031/username.csv"
df = await pdu.read_csv_url(URL, "\t")

# Load CSV from local browser storage
# Via @bollwyvl
df = await pdu.read_csv_local("iris.csv", "\t")
df

I’ll add more workarounds to the package as I find them (working with SQLite files is next on my to-do list…) and then remove items as they (hopefully) become natively supported.

My Personal Blockers on Getting Started With JupyterLab

Although at times the content of this blog may come across as somewhat technical, as anyone who has looked at any of my code would tell you, I am not a developer (actually, you could interpret that phrase in a lot of ways!). This post represents a stream of consciousness about some of the stumbling blocks that I perceive as preventing me from getting started building my own extensions for JupyterLab.

See also this related JupyterLab issue: Getting Started Docs for Non-Developers.

The code I write is generally written to get things done, not to form part of some production application. It is a means to an end. It’s poorly structured, and eclectically commented. There’s no linting. My repo commits are random collections of files with often vacuous commit messages. You would not want me committing code to your code base.

I typically categorise my code outputs into various classes:

  • code fragments, which are simple tricks or hacks for performing a particular effect, often something I’ve picked up from somewhere else. One fragment I need to record in a post somewhere is how to densify points along a geojson linestring, a trick I picked up here (see the sketch after this list); a recent fragment of my own shows how we might be able to style a JSON fragment that identifies the location of a typo in a text string: there may be better ways, “approved ways”, of doing this, but I didn’t find one when I looked, so I made a really simple thing to try to do it myself;
  • code sketches, which often take the form of notebooks that describe a mini-project, one way of doing things. The notebooks in my Conversations With Data: Unistats repo are sketches as much as anything; these are often free-form, as I explore a particular topic;
  • code recipes start off as sketches, but then I try to tease out and explain some sort of task or process and perhaps tidy up the code a bit; producing a recipe involves a bit of iteration, trying to identify each step and the reason for it, and ensure that everything is complete: things like Visualising WRC Rally Stages With rayshader and R and Visualising WRC Rally Timing and Results Data are packed full of recipes;
  • code hacks are the closest I get to production code, not in the sense of it being properly linted, commented and tested, but in the sense of something I install and use. My notebook extensions are all hacks.
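
As an example of the sort of thing I mean by a code fragment, here’s a minimal sketch of the linestring densifying trick using shapely (sketched from memory, with an arbitrary step size, rather than copied from the original recipe):

from shapely.geometry import LineString

# A simple two point line we want to densify
line = LineString([(0, 0), (10, 0)])

# Interpolate additional points at regular intervals along the line
step = 1.0
distances = [i * step for i in range(int(line.length // step) + 1)]
points = [line.interpolate(d) for d in distances]

# Rebuild the line from the interpolated points, re-appending the original
# endpoint in case the length is not an exact multiple of the step
dense = LineString([(p.x, p.y) for p in points] + [line.coords[-1]])

print(len(dense.coords))  # rather more than the original two points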

You’ll note that I don’t write tests: I write a line of code at a time, and look at its output; if it doesn’t look right, or it breaks, I try to fix it. I rerun all the cells in a notebook with a fresh kernel a lot to check that things keep working; if they’re broken, I check to see why the broken cell has broken, then read back up to check each previous step is doing what I wanted and has fed forward what I intended it to feed forward.

When I first started using Jupyter notebooks, the classic notebook, they were still called IPython notebooks. (We started developing a course using notebooks in 2014 that went live, after a six month delay, in 2016B (which is to say, February 2016).) To try to make the experience a bit more like the VLE, I hacked together an extension to augment the notebook with coloured cells to represent activities and to allow tutors to highlight cells they had commented on in assessment feedback to students. That extension continues (though updated) as nb_extension_empinken.

Since then we have added various other extensions, such as a riff on empinken that styles cells based on bootstrap-like tags (nb_extension_tagstyler), the ipython_magic_sqlalchemy_schemadisplay extension to display (sort of!) database schemas for a connected database, or an extension I liked but no-one else did that pops out cells into a floating widget so you can easily refer back to them (nb_cell_dialog).

Over the years, the repos have added clutter, bits of automation, and more elaborate approaches to packaging, but in the beginning, they were very simple: essentially just a README file, a setup.py file, a directory containing a simple __init__.py file and a static directory containing the actual extension code in the form of an index.js file. The packaging structure was cribbed directly from other extensions (typically, the simplest one I could find, a minimum viable extension in other words), the setup for the extension was cribbed directly from other (minimally viable) extensions and much of the code was cribbed from other extensions. The empinken extension is essentially a toolbar button that adds metadata to a cell, and a routine that iterates over each notebook cell, checks the metadata and updates the CSS. There were other extensions that in whole or in part demonstrated how to do each of those tasks, which I then pinched and reassembled to my own purposes.

The code was limited to py packaging (cribbed) and some simple js (largely cribbed).

The development environment was a text editor.

The testing was to install the package, refresh the notebook page and (with the help of browser developer tools) see where it was breaking until it didn’t.

The on-ramp was achievable.

So now we come to JupyterLab, which appears to me as a hostile architecture.

I’ll try to pick out what I mean by that. Please note that the following is not intended as a personal attack on the JupyterLab teams or the docs; it’s a parody as much as anything.

From the docs, the getting started route is to install a development environment, and I’m already lost. Whenever I try to use node it seems to download the internet, and the instructions typically say “build the package” without saying what words of incantation I need to type into the command line to actually build the package (why should I just know how to do that?)

The next step is to install a cookiecutter. This doesn’t necessarily help, because I have no idea what all the files are for, whether they are necessary, or what changes can be made to each one to perform a particular task. I’d rather be introduced to a minimally viable set of files one at a time, with an explanation of what each one does and containing nothing uncommented that is not essential. (Some “useful but optional” fragments may also be handy so I can uncomment them and try them out to see what they do, but not too many.)

When it comes to trying out some example code, I need to learn a new language, .ts (which is to say, TypeScript). I have no idea what TypeScript is or how to run it.

I also need to import loads of things from @ things, whatever they are. If I’m trying to figure out how to do a thing by cribbing code from seeing what files are loaded to support a working extension in my browser (which I suspect is way harder to do with JupyterLab than it was from the classic notebook), I’m not sure if there’s an obvious way to re-engineer the TypeScript code from the JavaScript in the browser that does something like what I want to do.

I’m not totally sure what “locate the extension” means? Is that something I have to do, or is it something the code example is doing? (I am getting less than rational at this point, because I already know that I am at the point of blindly clicking run at things I don’t understand.)

Before we can try out the extension, it needs building. In the classic notebook extensions I could simply install a package, but now I need to build one:

This is a step back to the old, pre-REPL days, because there is a level of indirection here: I don’t get to try the code I’ve written, I have to convert it to something else. When it doesn’t work, where did I go wrong?

  • with the logic?
  • with the JavaScript?
  • with the TypeScript?
  • with the build process?
  • can the TypeScript be “right” and the JavaScript “wrong”? I have no idea…

I think things have improved with JupyterLab now that installing an extension doesn’t require a lengthy delay while the JupyterLab environment rebuilds itself (which was a blocker in itself in earlier days).

Okay, so skimming the docs doesn’t give me a sense that I’d be able to do anything other than follow the steps, click the buttons and create the example extension.

How about checking a repo to see if I can make sense of a pre-existing extension that does something close to what I want?

The classic notebook collapsible headings extension allows you to click on a heading and collapse all the cells beneath it to the next heading of the same or higher level. It works by setting a piece of metadata on the cell containing the heading you want to collapse. A community contributed extension does the same thing, but uses a different tag, "heading_collapsed": "true" rather than "heading_collapsed": true (related issue). Either that or the JupyterLab extension is broken for some other reason.
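
If you needed notebooks collapsed under one convention to work with the other, a quick pass with nbformat could rewrite the metadata; a sketch converting the boolean form to the string form:

import nbformat

nb = nbformat.read("notebook.ipynb", as_version=4)

for cell in nb.cells:
    # Rewrite the classic notebook boolean convention to the string convention
    if cell.metadata.get("heading_collapsed") is True:
        cell.metadata["heading_collapsed"] = "true"

nbformat.write(nb, "notebook.ipynb")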

Here’s what the repo looks like (again, I’m not intending to mock or attack the repo creator, this is just a typical, parodied, example):

Based on nothing at all except my own personal prejudices, I reckon every file halves the number of folk who think they can make sense of what’s going on… (prove me wrong ;-)

Here’s the Jupyter classic notebook extension repo:

So what do I see as the “hostile architecture” elements?

  • there is not an immediately obvious direct way to write some JavaScript, install it into JupyterLab and check the extension works;
  • in many example repos, a lot of the files relate to packaging the project; for a novice, it’s not clear which files make up the actual extension, which are packaging scaffolding, and whether all the project files are necessary;
  • using TypeScript introduces a level of indirection: the user is now developing for JupyterLab, not for the end user environment they can view source from in the browser. (I think this is something I hadn’t articulated to myself before: in classic notebook extensions, you hack the final code; in JupyterLab, you write code in application land, and magic voodoo converts it to things that run in the browser.)
  • in developing for JupyterLab, you need to know what bit of JupyterLab to hook into. There are a lot of hooks, and it’s not clear how a (non-developer) novice can find the ones they need to hook into, let alone how to use them.

And finally:

  • there isn’t a “constructive” tutorial that builds up a minimally viable extension from a blank sheet, one explained step at a time.

As I recall from years and years ago, if you ever see or hear a developer say “just”, or you can add a silent “just” to a statement ((just) build the node thing), you know the explanation is far from complete and is not followable.

Faced with these challenges of having to step up and go and do a developer course, learn about project tools, pick up TypeScript, and try to familiarise myself with a complex application framework, I would probably opt to learn how to develop a VS Code extension instead, on the grounds that the application is more general, runs by default as a desktop application rather than a browser accessed service, has increasingly rich support for Jupyter notebooks, and has a wide range of other extensions to use, and crib from, that can be easily discovered from the VS Code extensions marketplace.

PS I think followable is a missing term in the reproducibility lexicon, in both a weak sense and a strong sense. In the weak sense, if you follow the instructions, does it work? In the strong sense, does the reader come away feeling that they could create their own extension?

PPS In passing, I note this from my Twitter timeline yesterday…

Fragment: A Couple of Unofficial and Unofficial Unofficial Jupyter Extensions for VS Code and the Future of Rich Visual Editing of Interactive Generative Texts

To the extent that the Jupyter Extension for VS Code represents the “official” VS Code extension, if not an official Jupyter extension for VS Code (which does not exist, at least not in the jupyter project namespace on Github, nor, I suspect, on the basis of core Jupyter team contributions), it does a pretty good job, and gets more fully featured with every release.

But to a certain extent, it’s still lagging behind what you can do in even the classic notebook UI. For example, a feature I increasingly make use of is the ability to edit cell tags. These can be used by extensions to modify the presentation of cells in the user interface (I have extensions for that in classic notebooks, but not in JupyterLab/RetroLab) or in downstream rendered materials such as Jupyter Book outputs. Where Jupyter Book doesn’t directly exploit cell tags, we can hack string’n’glue tools to transform tagged cells to markup that Jupyter Book / Sphinx can make use of…

Anyway, whilst cell tag editing is not supported in the official (from the VS Code side) unofficial (from the Jupyter side) Jupyter extension, it is supported by an unofficial (from the VS Code side) unofficial (from the Jupyter side) Jupyter Powertools VS Code extension for the VS Code Insiders build of VS Code.

From the blurb, this includes:

  • Shrinking traceback reports to allow you to see just the error (cf. the “Skip traceback” classic notebook extension);
  • Cell magics (& code completion for cell magics);
  • Generate reveal.js slideshows and preview them inside the VS Code environment;
  • Edit cell metadata (tags, slide metadata);
  • Automatic syntax highlighting of cell magics (e.g. using %%html will provide syntax highlighting and language features for HTML in the cell)
  • Toggle a cell from Markdown to Code with a toolbar icon (and vice versa)

In passing, I also note a couple of WYSIWYG markdown editor extensions (ryanmcalister/unotes and zaaack/vscode-markdown-editor for example), so it’s not hard to imagine having a VS Code Jupyter environment with WYSIWYG markdown editing in the markdown cells, although I’m not sure how easy that is to achieve in practice.

PS in passing I note that there is currently a flurry of interest on the official Jupyter discourse site – Inline variable insertion in markdown – in getting inline code into markdown cells in Jupyter notebooks, cf. the inline `r CODE` fragments that are natively available in Rmd, for example. There are certain practical issues associated with this (how are the markdown cells with code references updated, for example) and there are already hacky workarounds (eg my own trick for Displaying Jupyter Notebook Code Cell Content As Formatted Markdown/HTML Code Cell Output using a Python f-string magic) for cases where you aren’t necessarily interested in reactive updates, such as when generating static Jupyter Book outputs. I keep wondering if things like the reactive kernel implemented by davidbrochart/akernel might also provide another solution?

PPS on my long list, one thing I’m hoping to see at some point is an executable document editor running purely in the browser on top of something like a JupyterLite kernel. There are a couple of different directions I think this could come from: one would be getting something like the Curvenote editor or a Stencila editor working in the browser on top of a JupyterLite WASM powered backend, and the other would be some plugins to make a Jupyter Book editable, cf. a wiki, with edits saved to browser local storage. This would tie you to a particular browser on a particular machine unless browser syncing also syncs local browser storage, but for folk who work on a single machine, that would be an acceptable constraint.

PPPS See also The World Moves On – Jupyter Classic Notebook No Longer the MyBinder Default UI.

The World Moves On – Jupyter Classic Notebook No Longer the MyBinder Default UI

See the official announcement here.

I’ve been tinkering with web apps and using third party APIs for long enough now to have a sense for the lifecycle of various projects.

In the early days, things are innocent and open, and traction is often generated because it’s easy for folk to try things out: the frameworks are simpler and easier to use than previous ones, so folks use them; the APIs don’t require API keys, because no-one’s abusing them; the features are limited, because the service or app is still young, and whilst the docs may or may not be comprehensive, they’re still small enough to find your way around; and the relative simplicity of the codebase (because it’s still small) means it’s not too hard to poke around to figure out how to do things.

Then the project gets popular, and bigger, and more complex, and it’s harder to play with. It becomes more “enterprisey”, even if it’s still an open source project: the development environment starts to become ever more complex, the code becomes more compartmentalised and elaborate, and unless you’re working with it regularly, it can be hard to get a sensible overview of it.

And the examples often get more elaborate. The sort of examples where you already need to have a good mental model of the whole code framework to make any sense of the examples.

And as things move on, they become more exclusive. I have limited developer skills and limited time. My personal approach works for identifying early stage projects or apps that might have potential (at least as I see it): a low barrier to entry for folk who want to get stuff done with the application or package, and perhaps customise it.

When spotting new tech, new code packages, new ideas, I see if I can get something simple, but novel, working, or something that perhaps solves one of the dozens of things on my “it would be handy if I had a thing to do X” list, within half an hour (the half an hour is elastic, up to one or two hours!) – a half hour hack. And from a standing start.

And if I can’t, then I figure it’s probably too hard for large numbers of other people to get started with too; which will limit its adoption.

And so it is with the Jupyter UIs. The classic notebook was relatively straightforward to build simple extensions for, but JupyterLab continues to be beyond me. Classic notebook is minimally maintained, but the core Jupyter developer effort in UI terms is based on JupyterLab and UIs based around that framework (such as RetroLab). There are other UIs too that better suit my needs:

  • VS Code is increasingly powerful as a notebook editing and development environment (see for example the recent addition of rich visual differencing of different versions of the same notebook); VS Code can also support the creation of generated materials without the need for a code executing Jupyter back end. See for example VS Code as an Integrated, Extensible Authoring Environment for Rich Media Asset Creation.
  • for authoring simple notebooks and interactive texts, the visual editor in RStudio, which looks like it will soon be split into a simpler, standalone editor, the Quarto editor, looks promising;
  • the Curvenote editor is one to watch although I don’t really have a sense yet as to whether this will gain traction as a self-hosted or locally deployable UI irrespective of the viability of the hosted offering… Stencila is still also a thing, but it just never quite seems to be able to break through, and may now just be too complex to generate any momentum amongst have-a-go early adopters looking for a better way.

But the need to find an environment that works for me and that can be widely shared with others is starting to become a blocker (see OpenJALE for my current environment). It’s pressingly timely, I think, for a simple, ideally extensible, editing environment for line at a time coding that can be used to write linear instructional materials, interactive texts and generative materials (documents that embed source code to create assets such as charts, tables and interactive widgets; see for example Subject Matter Authoring Using Jupyter Notebooks).

The reason I need to start looking for a new approach is that it’s starting to look like the classic Jupyter notebook is coming to the end of general availability. The latest sign of this is the announcement that MyBinder launched environments will now, by default, be launched into the JupyterLab UI rather than the classic notebook UI.

For anyone who has repos with environments defined to use the classic notebook and notebook extensions that don’t work in JupyterLab, tweaks will need to be made to MyBinder launch buttons and URLs to ensure that the URL path is set to /tree (in a launch URL: ?urlpath=/tree/ ).

Increasingly, to use the classic notebook, you’ll need to know where to find it. Because by default the assumption will be that you want to enter a complex developer environment, not a simple notebook authoring UI.

PS The Jupyter project is a fantastic initiative for making access to compute and arbitrary code execution possible. I think the classic notebook UI had a large part to play in providing an on-ramp to getting started with code for a lot of people: a clean, simple design, with a minimum of clutter and relatively easily extended and customised with simple HTML, Javascript and CSS. It has also provided a way for disciplines to bring computational approaches, particularly a line of code at a time approaches, to a wider audience through narrated and contextualised code scripts that perform particular tasks within the context of a human readable document. But the JupyterLab UI is not that. It’s a piece of hostile architecture to pretty much everyone who isn’t a developer. You may be able to get to a notebook within the JupyterLab UI. But you will have already put the fear into folk that it will be too complicated for them to understand. Making JupyterLab the default UI is off-putting to anyone who opens up the UI for the first time after hearing that Jupyter notebooks provide an “easy way” to get started with computing, because it looks just like any other terrifyingly complex IDE. Not a simple file listing and a simple Word Processor app that can execute code and display the results.

A Quick Look at the Quarto Pandoc Publishing System and Visual Markdown Editor

Picking up on Noting: the Quarto Rich Authoring Environment for Generative Texts, here’s a quick review of a few things that jumped out at me about Quarto, “[a] scientific and technical publishing system built on Pandoc”.

This is of particular interest to me at the moment given I’ve recently been lobbying internally for Pandoc support for OU-XML. (I say “recently”, but I’ve been idly pitching it once every few months or so for at least a couple of years;-)

The main objectives of the pitch were:

  • To create a conversion utility to support the conversion between OU-XML and Jupyter notebook (ipynb) formats, to support Computing, Mathematics, and Statistics modules currently in production
  • To create a conversion utility for converting between OU-XML and markdown, to support the production of FutureLearn courses and Microcredentials

The folk over at RStudio have obviously seen value in Pandoc as a tool for generating a wide range of potential output document types, including PDF, MS Word, MS PowerPoint, HTML slides and EPUB ebooks, as well as good ol’ HTML. And they have not unreasonably taken the view that using Markdown as a base document format, albeit with certain extensions, simplifies the tooling on the editor side. That, and the fact that they have been using Pandoc since forever for knitr and bookdown publishing workflows, and were already on the way to developing a rich visual editor as part of RStudio.

For me, the ability to point to a visual editor already hooked into Pandoc is a timely piece of evidence I can appeal to as to why Pandoc makes sense as part of an (authoring and) publishing system: even if we only use it for generating previews, if Pandoc can be used to export the created content into OU-XML, we can continue to use our current publishing processes to render content published to students. (Or we might choose to explore Pandoc as a generator of output format documents too.) And as more modules show an interest in using Jupyter notebooks, we could also use OU-XML, via Pandoc conversion, to gold-master notebooks if anyone really felt the need to. We could also author VLE content using notebooks and convert it to OU-XML, or transform legacy content into notebooks. I also still think we can use things like notebooks to author content across the curriculum for OU-XML mediated publication via the VLE and other “traditional” channels (see various subject specific demos in Subject Matter Authoring Using Jupyter Notebooks, for example, or some notes on generating reusable educational assets particularly in the “generative production” context). And so on.

The TL;DR as far as Quarto goes is that we can consider the Quarto publishing system in three parts:

  • the visual editor: a visual markdown editor that allows toggling between a visual preview and a source markdown view;
  • the publishing tools (the quarto CLI): command line tools for converting between the new .qmd document format and the Jupyter notebook .ipynb format;
  • the .qmd file format.

The visual editor is currently only available bundled into an RStudio preview (there is a visual editor in the current RStudio release, but maybe not as feature rich?). The editor can be used to edit a new project type, a Quarto Project.

This seems to allow you to specify a connection to an automatically discovered Jupyter kernel. I’m pretty sure this is the first time I’ve seen RStudio starting to integrate with a Jupyter server, and it represents another sign that Jupyter mediated remote code execution environments are starting to become part of a wider ecosystem independently of the Jupyter user interfaces such as Jupyter classic notebook, JupyterLab and RetroLab. (See my glossary which explains in part why Jupyter is NOT notebooks...)

As with a lot of emerging tech, the UI is the bit that takes forever to get right, even if the rest of the technology stack is mostly working and completely usable by folk who are prepared to get their hands dirty and put up with the occasional snafu. That said, I didn’t actually get much further with the RStudio preview Quarto editor – I wanted to try out Python projects, but trying to run any Python code just caused an error in trying to find a Python environment, suggesting I install miniconda. Checking other RStudio projects where I had successfully been running Python code cells, they were now all borked too. So the way the new RStudio preview is using reticulate seems to be broken, for me at least (I’ve since rolled back to the latest official RStudio release and my old py code chunks are happily running again…). In retrospect, I should have played with the editor features and tried to run some R code before I reverted. It’s probably also worth waiting for things to become a little more stable, and for the standalone editor to appear, before I play with it again, because that’s probably the sort of tool that would be most likely to be considered as appropriate internally. In the meantime, the current official RStudio release does have a basic visual editor, so visual editing of Rmd files etc is possible, if not full integration with the quarto publishing tools.

Suffice to say that from the docs it looks as if there are quite a few UI and underlying framework tools to support technical writing (equations, figure references, citations, footnotes etc.) and more general content authoring (tables, lists, spell-checking, non-breaking spaces and so on) as well as executable code and the embedding of generated code outputs, as well as the display of non-executable code.

The publishing tools are presented as a set of CLI tools that include conversion tools and previewing tools. Out of the can, R, Python and Observable JavaScript code chunks are supported (I’m not sure about the extent to which arbitrary language code can be executed via an appropriate Jupyter kernel?).

The command line tools include:

  • quarto create-project PROJDIR: create a new project in the current or specified directory;
  • quarto render DIRECTORY: render files in the current or specified directory; the --execute flag will execute code blocks before rendering the output; where possible, knitr cache and jupyter cache may be used to cache cell outputs;
  • quarto preview DIRECTORY: render and preview files in the current or stated directory using a live-reloading development server. The preview will update if any updates to project files are saved;
  • quarto convert FILENAME: convert a Jupyter .ipynb document to .qmd format and vice versa.

Under the hood, there seems to be yet another yet another markdown format: .qmd. This format resembles an earlier yet another markdown format, .Rmd, also managed by RStudio, but tries to abstract away from the original R basis of .Rmd. The minor differences perhaps also make it a little easier to handle Pandoc attributes or bring it more into line with Pandoc Markdown. Some of the notable points about the specifics of the Markdown flavour are summarised here.

I don’t know if there is a formal description of the .Rmd or .qmd formats anywhere (I struggle to call them “standards” because I can’t find a reference standard for them; the best reference for .Rmd appears to be the rstudio/rmarkdown repo, but I didn’t spot even an .Rmd validator there?) so it’s not immediately obvious how easy it might be to map between something like MyST and .qmd. Support for .qmd within Jupytext looks to be well on the way (eg see this issue) as a result of integrating the quarto conversion tools into jupytext. This strikes me as a bit risky: the .qmd “standard” is under the control of RStudio rather than an open community process, and the quarto conversion tool is also controlled by RStudio, albeit from code in a public repo with a GPL licence.

I’ll do a proper comparison of .qmd, Jupytext markdown and MyST in another post somewhen soon… In part, I’m waiting for a PR to appear that offers .qmd support in Jupytext just in case there are minor tweaks that result from getting that PR through.

From a quick play with the quarto conversion tools to convert a notebook tagged with cell tags, cell level metadata seems to be handled by special comment lines (with a #| prefix) at the start of a code chunk.
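
For example, a converted code cell carries its options as comment lines at the top of the chunk, something like the following (echo and fig-cap are documented quarto options, but treat the exact rendering as illustrative):

#| echo: false
#| fig-cap: "A hypothetical figure caption"

# The #| lines above are parsed as cell options; the rest is ordinary code
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])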

In the jupytext issue thread, there’s also a comment regarding things such as “re-writ[ing] the [.qmd] cell option YAML into tags (e.g. #| echo: false needs to become tags: ["remove_cell"])”. This immediately flags to me that we could very quickly start getting a combinatorial explosion in the number of ways that “sort of” equivalent tags and markup need to be transformed across different markup styles. For example, I’ve already started hacking together string’n’glue tools to convert the tags I use to trigger extension powered styling in classic notebooks to MyST markup that works with Jupyter Book, and I also noted in this issue that a JupyterLab extension that supposedly replicates a classic notebook extension I use (collapsible headings) uses a different metadata convention to the original extension.

A screenshot from the quarto docs also shows how the YAML header information needs to be captured in the form of a raw code cell at the start of a notebook. This makes the notebook look a little bit ragged when viewed in a Jupyter notebook editing UI and would perhaps be cleaner if it could be hidden as notebook level metadata. However, the YAML header is also familiar e.g. as part of Jupytext markdown and MyST frontmatter. Hopefully, there won’t be too many differences or inconsistencies or discontinuities between these various elements and any conversion required will be quite obvious.
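
By way of example, the sort of YAML header involved looks something like this (the values are made up), held as a raw cell at the top of the notebook:

---
title: "An example document"
format: html
jupyter: python3
---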

The format is also compliant with the approach used by Papermill to support launching parameterised notebooks, and adopts the same convention for specifying notebook parameters.
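
For reference, the Papermill convention is a code cell tagged parameters containing default assignments, with overrides injected in a new cell when the notebook is executed; a minimal sketch:

# Code cell tagged "parameters": defaults that papermill can override, e.g.
# papermill.execute_notebook("in.ipynb", "out.ipynb", parameters={"alpha": 0.5})
alpha = 0.1
n_iterations = 100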

There is a discussion ongoing about supporting .qmd in Jupytext, although it all feels quite ad hoc, and issues surrounding things like how to handle extensible style concepts (hidden cells and cell outputs, admonitions etc.) seem out of scope of that discussion.

Certainly, there doesn’t yet seem to be a clear leader, let alone winner, in terms of one true standard to rule them all for marking up rich/high level/interactive presentation style features in executable markdown docs. (My personal preference is probably starting to tend towards MyST, and it’ll be interesting to see if a quarto editor gets uptake and drives adoption of .qmd.)

On my to do list is a proper comparison of how the Quarto “callout” markup maps onto MyST/Jupyter Book “admonitions” for example.

That said, as long as lossless mappings between formats are supported using something like Jupytext, I guess it’s not a blocking issue if different tools prefer different native formats? However, it may be that jupytext needs a plugin mechanism so that folk can plug in their own conversion tools to convert particular tag formats or conventions used by one publishing framework, such as Jupyter Book, to work with another, such as nbsphinx, quarto, knitr or stencila.

PS My OU-XML/Pandoc proposal (copies of the proposal in its various incarnations available on request) seems to have got bogged down by loads of meetings and stage gate processing so it may or may not happen. I’m happy to help work through the technical issues, but not waste more time restating arguments about why it’s a good idea and rewriting the proposal again and again to include elaborate plans, goals and objectives that will probably need reviewing at first contact with the detail. We should just try to build the thing and it’ll either work losslessly or it won’t; and if it won’t, it’ll work usefully or it won’t. And in the double won’t case, we’ll have learned more about OU-XML as a format, OU-XML as it’s actually used to mark up modules, and learned something about Pandoc, so we’ll have a net win of learning something anyway (the project stream it was pitched into was called Test and Learn… Unless, of course, any work is put out to external consultants or contractors, in which case we’ll continue to not learn internally about contemporary tooling, whether or not we adopt it. Go figure…).

PPS In the meantime, I keep meaning to revisit my hacky OU-XML XSLT tools to see how much OU-XML structure I can force into MyST or Jupytext-markdown format (and/or maybe even the new Quarto markdown format, .qmd). And given pandoc tools work with those formats, it’ll give a hacky way round the edges to give some sort of crappy as-if/maybe-works-a-bit conversion from OU-XML into notebook format, if not the reverse. The reverse bit (md2OU-XML) is where the institution gets value in terms of being able to use content authored in notebook / RStudio / Quarto like environments in OU workflows. The forward path (OU-XML2md) allows us to get content out of the gold-master format and into useful formats and environments ex- of the official publishing route. I now realise why folk are reluctant to engage: they don’t want to lose content into arbitrary formats and publishing environments (the forward path), and they don’t want folk authoring using arbitrary authoring tools (the reverse path) that might not create valid OU-XML (not that the schema is easy to find to validate files against…). And they don’t realise that even if it only half works, it might still be quicker than, and even just as effective as, the shonky processes we already have in place. And it might also act as a driver towards moving to a simpler document format, which I seem to recall was mooted in the last few years. [For ref, internal folk can see the current OU-XML tag structure here.]

PPPS In passing, I should probably also do a review at some point of the Curvenote editor which builds on Prosemirror and provides a rich browser based editor. (I don’t think there’s a free standing demo, but you can try it out via the Curvenote collaboration website which offers notebook file sharing and commenting via a browser extension.)

The Stencila editor is probably also worth revisiting, though to my mind this is now way too complex for me to try out. A March 2021 blog post suggested that Stencila integration with the Libero XML editor would start to offer the ability to “encode code chunks and expressions as JATS XML, … allow[ing] on-the-fly conversion of executable formats such as Jupyter Notebooks and R Markdown, [so that they could] be opened within Libero Editor”. An electron based desktop app was also promised, but I’ve not seen any sign of it as yet (the original Stencila app that I had high hopes for was deprecated years ago). UPDATE: in progress here.

And finally, whilst on the topic of XML editors, I’ve just started exploring this XML editing extension for VS Code: redhat-developer/vscode-xml.

Noting: the Quarto Rich Authoring Environment for Generative Texts

And so it seems that internal discussions are starting to take place about possible Jupyter flavoured production workflows (I wasn’t part of the working group; my 30 page comment response is available to anyone who wants a copy;-). This is led mainly by a unit that has to date had no real experience or apparent interest in the Jupyterverse, what it signifies or where it and related approaches might be heading at a pace faster than the internal one.

One of the comments I keep expecting to hear (and have heard in the past) is about the lack of rich editing tools in the official Jupyter environments.

This argument may have held some weight a couple of years ago, albeit ignoring the fact that there are some extensions for supporting rich editing in official Jupyter UIs, although they do have some “issues”; the classic notebook rich editor, for example, is an HTML editor rather than a markdown editor, so it craps up your markdown as tagged HTML if you make any changes to the text.

But as more and more tools provide the ability to connect to a Jupyter server (for example, VS Code), or edit alternative formats that can be converted to Jupyter notebook .ipynb documents using converters such as Jupytext, the lack of rich authoring environments can no longer be argued. (That said, I now expect arguments to go the other way, the editing environments are too complicated.)

RStudio is an example of one such environment that can be used to edit one of the Jupytext convertible document formats — Rmd (R Markdown) — in a rich editor (see for example Authoring Notebooks from RStudio). VS Code is another. Python code can be executed against a specified Python environment, although I’m not sure whether you can connect to a particular IPython kernel over a Jupyter server connection?

However, I also note more recently the appearance of Quarto, “[a] scientific and technical publishing system built on Pandoc”. The Quarto visual editor provides “a WYSIWYM editing interface [“What You See Is What You Mean”, apparently] for all of Pandoc markdown, including tables, citations, cross-references, footnotes, divs/spans, definition lists, attributes, raw HTML/TeX, and more”.

Furthermore, “[t]he visual editor also includes support for executing code cells and viewing their output inline”. Currently, the Quarto editor is accessed as an RStudio extension, although it seems as if the editor will be made available as a standalone editor at some point.

For a review of using VS Code for authoring rich content, see for example Authoring Notebooks in VS Code. See also Connect to a remote Jupyter server and VS Code as an Integrated, Extensible Authoring Environment for Rich Media Asset Creation. Note that VS Code can also be used to edit and run Jupyter notebook .ipynb documents.

I’ll try to do a full review over the next week or two…

PS In passing, I note that the same unit leading the “Jupyter production” project also seems to be making a landgrab for content production in the form of “learning developers who will …  take the lead on course content, informing overall course design, and creating effective learning journeys through digital media and learner communications and engagement.”

PPS I do, of course, selectively misquote; the advertised jobs are for “three Senior Learning Developers to join our cross-functional course design and production team working on microcredentials and short courses on FutureLearn”, so in the short term the idea is that they get to try things out producing “non-core” materials, rather than materials intended for full qualifications. I note in passing that my department is apparently behind a DataSkills by The Open University bootcamp, funded by the Department for Education, in which learners will “work towards the internationally recognised Microsoft Azure Data Fundamentals (DP-900) and Microsoft Data Associate (DA-100) certifications” using Microsoft Power BI. Good to know. I wonder if we should be changing our own data management course away from open source technologies and instead use these same technologies to bring our various offerings into alignment? And if not, why not? (There are good arguments why we might not want to sully our full qualification modules with vendor certified training content (as some would have it…), but the move to non-academic units creating our “commercial” training (sorry, bootcamp) content is… interesting…)

Code Fragment: Highlighting Typos

Another of those occasional examples of why I think it can be handy if folk know how to code… a simple code fragment to style the output of a spell checker:

# Spell checking and grammar checking using:
#
# - https://github.com/jxmorris12/language_tool_python/
# - https://languagetool.org/

# %pip install --upgrade language_tool_python

import language_tool_python
tool = language_tool_python.LanguageTool('en-US')

text = 'This sentence is fine. A sentence with a error in the Hitchhiker’s Guide tot he Galaxy'
matches = tool.check(text)
matches


def styler(error):
    # Locate the flagged error within its surrounding context string
    html = error.context
    from_ = error.offsetInContext
    to_ = from_ + error.errorLength
    txt = html[from_:to_]
    # Wrap the offending text in a red span
    html = html[:from_] + '<span style="color:red">' + txt + "</span>" + html[to_:]
    return html


from IPython.display import HTML
display(HTML(styler(matches[0])))

display(HTML(styler(matches[1])))

I did think I’d find a simple Jupyter extension to add languagetool support (using something like language_tool_python) to check notebook markdown cells, but from a quick search, I couldn’t find one offhand…?
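
In lieu of a proper extension, here’s a crude sketch of the sort of thing I have in mind, checking each markdown cell in a notebook (filename made up) using nbformat and language_tool_python:

import nbformat
import language_tool_python

tool = language_tool_python.LanguageTool('en-US')
nb = nbformat.read("notebook.ipynb", as_version=4)

# Report potential typos and grammar slips in each markdown cell
for i, cell in enumerate(nb.cells):
    if cell.cell_type == "markdown":
        for match in tool.check(cell.source):
            print(f"Cell {i}: {match.ruleId}: {match.context}")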

Fragment: Author Support Tools – Markdown in VS Code

In passing, some extensions I’m using to support authoring markdown in VS Code:

See also: VS Code as an Integrated, Extensible Authoring Environment for Rich Media Asset Creation and the (should now be deprecated?) Fragment: More Typo Checking for Jupyter Notebooks — Repeated Words and Grammar Checking.

“Save and Reveal” Discussion Activities in Moodle VLEs and Jupyter Notebooks

Having a quick peek at some of the materials that have been produced for a new OU course that I’ve had nothing to do with, I notice a new-to-me VLE interaction style in the form of a FreeResponse text box…

The activity design provides some set up, encourages the learner to make (and record) a free text response, and (once some text has been entered into the free response area) a sample discussion can be displayed.

In the underlying markup, the structure is defined as an activity embedding a question, an interaction and a discussion. Here’s what the underlying OU-XML looks like:

<Activity>
  <Heading></Heading>
  <Timing></Timing>
  <Question>
     <Paragraph>In your own words, write brief definitions of the following terms that you’ve seen so far.</Paragraph>
     <BulletedList>
       <ListItem>supervised learning</ListItem>
       <ListItem>unsupervised learning</ListItem>
     </BulletedList>
  </Question>
  <Interaction>
    <FreeResponse size="formatted" id="act_x"/>
  </Interaction>
  <Discussion>
    <Paragraph>Your descriptions will, of course, differ from mine. That’s OK, so long as you’ve understood the key idea behind each term.</Paragraph>
     <BulletedList>
       <ListItem>Supervised learning: learning where the training data is labelled with the ‘correct answer’, such as the correct classification or final result.</ListItem>
      <ListItem>Unsupervised learning: learning where the training data has no labels so the learning system has to make its own discoveries about what the data means.</ListItem>
    </BulletedList>
  </Discussion>
</Activity>


In the materials I produced for a module update last year, I’d started making use of a related pattern we’d started exploring in the data management module notebooks, specifically, explicit calls to action that get students to engage in note taking and reflection at certain points in the notebook.

My actual aim is to get students to take full ownership of the notebook materials so that they are more likely to annotate them at will; but for a first year equivalent module at least, or in a module where students are exposed to Jupyter notebooks for the first time, we need to coach learners into feeling that they can and should really make their own inline notes when they feel it is useful to do so…

Here’s an example of a simple call out suggesting a student makes some form of commentary. This is done outside of a “formal” activity and is presented in line, almost as a prompt to make a marginal comment:

Structurally, the coloured section is rendered using the nb_extension_empinken Jupyter notebook extension based on a cell tag:

For other examples of using the nb_extension_empinken extension, see “Try it and See” Interactive Learning Activities in Jupyter Notebooks.

Calls to action for making comments as part of an activity are also used, and are supported by a hidden example discussion in an activity design that matches the design used in the VLE FreeResponse activity:

The design pattern is actually a very old one that’s been used in OU materials for decades. I first saw it explicitly referenced in the OU materials bible, the SOL (“Supported Open Learning”) guide, along with the notion of worked examples as if voiced by a tutor at your side.

The markup for the call to action in my embedded activity is not a million miles away from the markup used in OU-XML, as you can see if the Jupyter notebook is saved in the myst-nb format using Jupytext.
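
For example, a tagged markdown activity cell might round-trip into MyST markdown along these lines (a reconstructed sketch rather than the actual cell; the tag name is made up for illustration):

+++ {"tags": ["style-activity"]}

### Activity

*Use this cell to record your own notes and reflections.*

+++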

Jupyter notebook tags for certain style markup can also be transformed into myst-nb admonitions that can be rendered natively in a Jupyter Book output. See for example the tags2myst utility in ou-jupyter-book-tools.

Generating this sort of markup from the OU-XML should be easy enough (eg using an XSLT transformation), but the reverse may be trickier when it comes to ensuring that all the cells associated with a single activity are hierarchically grouped as part of that activity. (Creating an activity ID applied to each cell as a tag or other metadata would be one way of solving this. But an issue still arises as to how the grouping should be managed within the notebook UI: selecting all cells associated with an activity and then clicking a toolbar button to group them would be one solution, but a method would still be needed to use some optional style to indicate that contiguous cells are all part of the same group. I’m not familiar enough with CSS to know if or how that could be easily done using the notebook HTML structure?)

PS this has got me wondering whether I should put together a document that demonstrates possible mappings for all OU-XML tags, eg as described in the OU-XML Structured Content Tag Guide (OU staff only). It might be useful if I also mined existing OU-XML documents (eg as per Reusing Educational Assets) to see if I could pull out various activity designs (such as the question–FreeResponse–discussion pattern) and demonstrate how those might appear in a Jupyter notebook editor, Jupyter Book HTML output, or myst-md “source” document.