Fragment – TM351 Notebooks as Jupyter Books, In the VM and Via an Electron App

Another fragment because I keep trying to tidy the code so it’s ready to share and then keep getting distracted…

I’ve previously posted crude proof-of-concept demos: transforming OU-XML to markdown and rendering it via Jupyter Book, and wrapping a Jupyter Book in an electron app shell.

I’ve managed to get a Jupyter Book generated from TM351 notebooks working under a local server inside the TM351 VM, executing code against the notebook server in the VM, and I’ve also managed to connect the electron app version to the notebook server inside the VM. (The TM351 VM already runs a webserver, so I could just define the Jupyter Book to live down a particular directory path when I built it, and then pop it into a corresponding subdirectory of the directory served by the webserver.)

The following not very informative demo shows me running a query against the database inside the VM twice: adding a row with a particular primary key (which works), then trying again (which correctly fails because that PK is already in use).
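
By way of illustration, the gist of the demo is something like the following sketch (the connection details and table/column names here are made up; they’re not the ones actually used in the module):

    import psycopg2

    # Connect to the PostgreSQL server running inside the VM
    conn = psycopg2.connect('host=localhost dbname=tm351 user=tm351 password=tm351')
    cur = conn.cursor()

    # The first insert works...
    cur.execute('INSERT INTO demo (pk, label) VALUES (%s, %s)', (1, 'first'))
    conn.commit()

    # ...but running the same insert again raises psycopg2.IntegrityError:
    # duplicate key value violates unique constraint
    cur.execute('INSERT INTO demo (pk, label) VALUES (%s, %s)', (1, 'again'))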

The magic we use inside the VM for previewing a schema diagram of the current db also seems to work.

The VM solution currently requires two sorts of trust for executing code against the VM kernel from the electron app to work correctly: firstly, that the server is on a specified port (we default to mapping port 35180 when distributing the VM, and this port is baked into the electron app book pages); secondly, that no-one attacks the VM — the notebook server is running pretty wide open:

jupyter notebook --port=8888 --ip=0.0.0.0 --no-browser --notebook-dir=/vagrant/notebooks --allow-root --NotebookApp.allow_origin='*' --NotebookApp.allow_remote_access=True  --NotebookApp.token=notverysecret

One of the nice things about the Jupyter Book view is that it bakes in search. I’ve had a couple of aborted looks at notebook search before (eg Initial Sketch – Searching Jupyter Notebooks Using lunr and More Thoughts On Jupyter Notebook Search, as well as via nbgallery), and the Jupyter Book solution uses lunr, which was one of the things I’d previously used to sketch a VM notebook search engine.

At the moment, the search part works but the links are broken. A template needs tweaking somewhere, I think; in the short term, we could work around it with a hacky fix, post-processing the generated pages and repairing the broken URLs with a string replace.

One of the problems with lunr is that the index is baked into a JSON file. datasette is now able to serve feeds off the current state of a SQLite3 database, even if the database is updated by another process, which makes me wonder: would one way of getting a “live” notebook search into the VM be to monitor the notebook directory, update a SQLite database with the current notebook contents whenever a notebook is added or updated (we could archive deleted notebooks in the database, just marking them as deleted without actually deleting them), and then use datasette to generate and serve a current JSON index to lunr whenever a page is refreshed?
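
As a rough sketch of the monitoring part (this assumes the Python watchdog package for filesystem events; the file, table and directory names are all made up):

    import sqlite3
    import time
    from pathlib import Path

    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    # Keep a SQLite table in sync with a notebook directory; datasette
    # could then serve a live JSON index off it for lunr
    db = sqlite3.connect('notebooks.db', check_same_thread=False)
    db.execute('''CREATE TABLE IF NOT EXISTS nbs
                  (path TEXT PRIMARY KEY, content TEXT, deleted INTEGER DEFAULT 0)''')

    class NotebookHandler(FileSystemEventHandler):
        def on_created(self, event):
            self._upsert(event.src_path)

        def on_modified(self, event):
            self._upsert(event.src_path)

        def on_deleted(self, event):
            # Archive rather than delete: just flag the row as deleted
            if event.src_path.endswith('.ipynb'):
                db.execute('UPDATE nbs SET deleted=1 WHERE path=?', (event.src_path,))
                db.commit()

        def _upsert(self, path):
            if path.endswith('.ipynb'):
                db.execute('REPLACE INTO nbs (path, content, deleted) VALUES (?, ?, 0)',
                           (path, Path(path).read_text()))
                db.commit()

    observer = Observer()
    observer.schedule(NotebookHandler(), '/vagrant/notebooks', recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()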

So what does this electron and Jupyter Book stuff mean anyway?

  1. we can wrap TM351 notebooks in a Jupyter Book style UI and serve it inside the VM against the notebook server running in the VM. This provides an alternative UI and some interesting pedagogical differences. For example, in the traditional notebook UI, students are free to run, edit, add and delete cells, whereas in the book UI they can only run and edit them. Also, in the book, if the page is refreshed after any edits, the original code content will be re-presented and the edited version lost;
  2. we can wrap the TM351 notebooks in a Jupyter Book style UI and wrap that in an electron container. A student can then view the book UI on the desktop and if the VM is available on the expected port, execute code against it. If the VM is not available, the students can still read the notebooks, if not actually execute the code. (One thing we could do is publish a version of the book with all the code cells run and showing their outputs…)
  3. the electron app could also be built to tell ThebeLab (which handles the remote code execution) to launch a kernel somewhere else, such as on MyBinder or on an OU managed on-demand Binder kernel server (see the config sketch after this list). (I guess we could also add electron menu options that might allow us to select a ThebeLab execution environment?) This latter approach could be interesting: it means we could provide compute backend support to students *only* when they need it, in quite a natural way. The notebook content will always be available and readable, even offline, although in the offline case code would not be (remotely) executable.
  4. distributing the notebooks *as an app* means we can distribute content we don’t want to share openly and in public, whilst still freeloading on the possibility of executing code in the appified Jupyter Book on a MyBinder compute backend server;
  5. on my to do list (still fettlin’ the code) is to provide a combined Jupyter Book view that includes content from the VLE as well as the notebooks. With both sets of content in the same Jupyter Book, we could then readily run a search over VLE and notebook content at the same time (though forum search would still be missing). At the moment, search is an issue because: 1) VLE search sucks, and it only works over the VLE hosted content, which *does not* include the notebooks; 2) we don’t offer any notebook search tools (other than hints of things students could do themselves to search notebooks).
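
Picking up on item 3 above: rather than pointing ThebeLab at the local notebook server, the page config could ask it to request a kernel from MyBinder; something like the following, I think (untested, and the repo name is made up):

    {
      bootstrap: true,
      binderOptions: {
        repo: "ouseful-demos/tm351-binder-environment"
      },
      kernelOptions: {
        name: "python3"
      }
    }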

I’m not sure I’m going to get everything as “safe” to leave as I’d hoped to before a holiday away from’t compootah, so I just hope it’s stable enough for me to remember how to finish off all the various unfinished bits in a week or two!

Fragment – Jupyter Book Electron App

Noting an experiment by Ines Montani in creating an electron app wrapped version of the spaCy course / Python course creator template, I wondered how easy it would be to wrap a static Jekyll / Jupyter Book created site, such as one generated from an OU-XML feedstock, as an electron app.

The reason for doing this? With the book bundled as an electron app, you could download the app and run it, standalone, to view the book, with no web server required. (The web server used to serve the book is bundled inside the electron app.)

A quick search around turned up an electron appifier script (thmsbfft/electron-wrap), which did the job nicely:

cd to the folder that contains the files, and run:

bash <(curl -s https://raw.githubusercontent.com/thmsbfft/electron-wrap/master/wrap.sh)

If all the js required to handle interactives used in the book is available locally, the book should work, in a fully featured way, even when offline.

(I also noted a script for wrapping a website in an electron app — jiahaog/nativefier — which may be interesting… It can build for various platforms, which would be handy, but it only wraps a website; it doesn’t grab the files and serve them locally inside the app…? See also Google Chrome app mode.)

So what next?

A couple of things I want to try, though probably won’t have a chance to do so for the next couple of weeks at least… (I need a break: holiday, with no computer access… yeah!)

First, getting Jupyter Book python interactions working locally. At the moment, Python code execution is handled by ThebeLab launching a MyBinder container and using a Jupyter server created there to provide execution kernels. But ThebeLab can also run against a local kernel, for example using the connection settings:

 
    {
      bootstrap: true,
      kernelOptions: {
        name: "python3",
        serverSettings: {
          "baseUrl": "http://127.0.0.1:8888",
          "token": "test-secret"
        }
      }
    }
  

and starting the Jupyter server with settings something along the lines of:

jupyter notebook --NotebookApp.token=test-secret --NotebookApp.allow_origin='https://thebelab.readthedocs.io'

(The allow_origin setting is used to allow cross-origin traffic from another domain, such as the domain a ThebeLab page is being served from.)

My primary use case for the local server is so that I can reimagine the delivery of notebooks in the TM351 VM as a Jupyter Book, presenting the notebooks in book form rather than vanilla notebook form and allowing code execution from within the book against the Jupyter server running locally in the VM.

But if Jupyter Book can run against a local kernel, then could we also distribute, and run, a Jupyter server / kernel within the electron app? This post on Building a deployable Python-Electron App suggests a possible way, using pyinstaller to “freeze (package) Python applications into stand-alone executables, under Windows, GNU/Linux, Mac OS X, FreeBSD, Solaris and AIX”. Such an executable could then be added to the electron app to provide the Python environment within the app, which means the app should run in a standalone mode, without the need for the user to have Python, a Jupyter server environment, or the required Python environment installed elsewhere on their machine. Download the electron book app and it should just run; the code should run against the locally bundled Jupyter kernel; and it should all work offline, too, with no other installation requirements or side-effects. And in a cross-platform way. Perfect for OUr students…
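
As a crude sketch of the sort of entry point pyinstaller might freeze (untested; the file name and settings are made up, and this assumes the classic notebook package provides the server):

    # launch_server.py
    from notebook.notebookapp import main

    if __name__ == '__main__':
        # Start a Jupyter notebook server with fixed, app-friendly settings
        main(argv=['--port=8888', '--no-browser',
                   '--NotebookApp.token=test-secret'])

Freezing that with something like pyinstaller --onefile launch_server.py should then produce a single executable that could be dropped into the electron app bundle.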

An alternative, perhaps easier, approach might be to bundle a conda environment in the electron app. In this case, snakestagram could help automate the selection of an appropriate environment for different platforms (h/t @betatim for clarifying what snakestagram does).

(I’m not sure we could go the whole way and put all the TM351 environment into an electron app — we call on Postgres, MongoDB and OpenRefine too — but in the course materials a lot of the Postgres query activities could be replaced with a SQLite3 backend.)

Looking around, a lot of electron apps seem to require the additional installation of Python environments on the host that are then called from the electron app (example). Finding a robust recipe for bundling Python environments within electron apps would be really useful, I think.

There are a couple of Jupyter electron apps out there already, aside from nteract (which runs against an external kernel). For example, jupyterlab/jupyterlab_app, which also runs against an external kernel, and AgrawalAmey/nnfl-app, which looks like it downloads Anaconda on first run (I’m not sure if the installation is “outside” the app, onto the host filesystem, or whether it adds it to a filesystem within the electron app context?). The nnfl-app also has support for assignment handling and assignment uploads to a complementary server (about: Jupyter In Classroom).

Finally, I wonder if Jupyter Book could run Python code within the browser environment that the electron app essentially provides, using Pyodide, “the Python scientific stack, compiled to WebAssembly”, which gives Python access within the browser?

Fragment – OpenLearn Jupyter Books Remix

Following on from my quandary the other day, I stuck with the simple markdown converter route for OU-XML and now have the first glimmerings of a proof of concept conversion route from OU-XML to Jupyter books.

As an example, I grabbed the OU-XML for the Learn to Code for Data Analysis course on OpenLearn, used XSLT to transform it to markdown, and then published the markdown to Github using Jupyter Book.
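
The transformation step itself is straightforward enough with lxml; a minimal sketch (the stylesheet and file names here are made up):

    from lxml import etree

    # Apply an XSLT stylesheet to an OU-XML document to generate markdown
    transform = etree.XSLT(etree.parse('ouxml2md.xslt'))
    doc = etree.parse('learn_to_code_ouxml.xml')

    with open('content.md', 'w') as f:
        f.write(str(transform(doc)))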

There’s an example here (repo).

You’ll notice quite a few issues with it:

  • there are no images (the OU-XML uses file paths to images that don’t resolve; there is a way round this — grab links from the HTML version of the course and swap those in for the OU-XML image links — but it’s fiddly; I have code that does it for scrapes of courses from the VLE, which should work in the OpenLearn setting, but I need to tidy the code so it works nicely for both the VLE and OpenLearn, so it’s still on the to do list…);
  • code is extracted but so are bits of code output, code error messages etc. The way code / code outputs are handled in OU-XML is a bit hit and miss (crap in, crap out etc) but it’s okay to start working from;
  • I haven’t done proper XSLT handlers for all the OU-XML elements yet; if you see something wrong, let me know and I’ll try to fix it.

The markdown produced by the XSLT can be used by Jupytext to generate notebooks (or, if you have things set up, just open an .md file in the Jupyter notebook UI and it will open as a notebook). One thing the Jupytext command line converter doesn’t seem to do is add a (dummy) kernelspec attribute to the generated .ipynb file, so if I try to bulk convert .md files to .ipynb and then use those with Jupyter Book, Jupyter Book throws an error. If I open the notebook against a kernel and save it, it works fine with Jupyter Book.
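
In the meantime, a hacky workaround would be to post-process the generated .ipynb files and patch in a dummy kernelspec by hand; a sketch:

    import json
    import glob

    # Add a dummy kernelspec to Jupytext-generated .ipynb files
    # so that Jupyter Book will accept them (assumes a python3 kernel)
    for path in glob.glob('*.ipynb'):
        with open(path) as f:
            nb = json.load(f)
        nb.setdefault('metadata', {})['kernelspec'] = {
            'name': 'python3',
            'display_name': 'Python 3',
            'language': 'python'
        }
        with open(path, 'w') as f:
            json.dump(nb, f, indent=1)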

The reason for converting things to .ipynb format is that Jupyter Book can make those pages interactive. I’ve popped a couple of valid .ipynb pages, rather than .md ones, into the demo to illustrate how it (sort of!) works:

The notebook generated pages are enriched with a notebook download button and a ThebeLab button. Clicking the ThebeLab button makes the code cells interactive via ThebeLab/MyBinder.

Unfortunately, most of the code is a bit borked (eg missing dependencies, or missing local files) but you can still edit the code in a ThebeLab’d cell and execute it.

One other issue is that Jupyter Book fires up a separate MyBinder kernel for each notebook / page, which means you can’t carry state between pages. It’d be quite handy if all the ThebeLab panels were connected to the same IPython instance. (@mdpacer has an example of a related sort of common state conversation between a console and a notebook here.) That way, you could run examples in one chapter and carry the state through to later chapters. (Yes, I KNOW there are issues doing that. But if I didn’t raise it, folk would complain that you CAN’T carry state across pages…!)

So… what next?

For content in markdown format, I think you can just edit the files in the book’s _build folder and the book page should be updated automatically (erm, I should check this…). If you enable Jupytext to pair notebooks with markdown files, then editing one of the notebook files in a Jupyter notebook environment should create a .md file that could be added to the _build directory and thence update the book page.

In part, the above all speaks to things like Fragment – ROER: Reproducible Open Educational Resources and OERs in Practice: Re-use With Modification where we take openly licensed educational materials and get them into a format that is raw enough for us to be able to edit them relatively easily. We can also get them into an environment where we can start repurposing and modifying them (for example, we could run and edit the notebooks in MyBinder) as well as publishing any modifications we make simply by recommitting any updated files to the Github repo.

The above approach also speaks to Build Your Own Learning Tools (BYOT). For example, now I have a way to relatively easily dump an OpenLearn course into an environment where I can start swapping static bits of content out for interactive bits I may want to develop in their place.

PS For anyone in the OU, I can grab OU course materials that have an OU-XML basis from the VLE, so if there’s a course you’d like me to try it with, let me know…

Quandary… To Pandoc or Not? (Yet…)

Whilst listening in, via Skype, on the School meeting yesterday, I treated it as radio and also started tinkering with an XSLT converter for transforming OU-XML to something I can get into a Jupyter notebook form. (If anyone can point me to official OU XSLT transformers for OU-XML, that’d be really useful…)

I’m 15 years out of XSLT, so I’ve started with an easy converter into HTML that is most of the way there now for common OU-XML elements, as well as one that will convert into a markdown format supported by Jupytext, which would allow me to go OU-XML → md → ipynb. I also wonder if a direct OU-XML → ipynb (JSON) route might be a useful exercise.

But then I started wondering… would it make more sense to try to get it into Pandoc? Pandoc recently announced Jupyter notebook/ipynb support as a native converter, so what are the routes in and out of pandoc?

Poking around, it seems that pandoc represents things internally using its own AST (abstract syntax tree). Pandoc filters let you write your own output filters for converting documents represented using the AST in whatever format you want. There are a couple of Python packages that support writing pandoc output filters: pandocfilters, which includes examples, and panflute (docs), which has a separate examples repo; there’s also this handy overview of Technical Writing with Pandoc and Panflute.
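
For example, a trivial panflute output filter looks something like the following sketch (the class name added here is hypothetical, just to show where OU-XML-ish annotations might be hung):

    import panflute as pf

    # Walk the pandoc AST and tag code blocks so that a post-processor
    # could map them onto OU-XML elements (the class name is made up)
    def action(elem, doc):
        if isinstance(elem, pf.CodeBlock):
            elem.classes.append('ou-programlisting')
        return elem

    if __name__ == '__main__':
        pf.run_filter(action)

A filter like that would then be applied as part of a conversion with something along the lines of pandoc input.md --filter myfilter.py -o output.html.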

So that’s the first question: can I write a filter to generate a valid OU-XML document? OU-XML probably has some structural elements that are not matched by pandoc AST elements, but can these be encoded somehow as extensions to the AST, or represented as text elements in documents produced by pandoc that could be post-processed into OU-XML elements?

Going the other way, it seems that pandoc can ingest a JSON format that serialises the Pandoc AST structure, so if I can convert OU-XML into that then it would make life a lot easier for generating a wide range of output document formats from OU-XML.

The AST is documented here; we can also output documents in the serialised AST/JSON form by using the json output format, which could provide a useful crib…

So here’s the quandary… do I spend the rest of the morning finishing off my hack XSLT converter, or do I switch track and try to go down the pandoc route? Hmmm… maybe I should finish what I started: it’ll give me a bit more XSLT practice, and should result in enough of an approximation of OU-XML content in notebooks that we can start to see whether that sort of conversion even makes sense.

MyBinder Launches From Any git Repository: Github, Gists, GitLab, Bitbucket etc

By default, MyBinder looks to repositories on Github for its builds, but it can also build from Github gists, GitLab.com repositories, and, well, any git repository with a networked endpoint, it seems:

What prompted me to this was looking for a way to launch a MyBinder container from Bitbucket. (For the archaeologists, there are various issues and PRs (such as here, and here), as well as this recent forum post — How to use bitbucket repositories on mybinder.org — that trace some of the history…)

So what’s the trick?

For now, you need to get hold of the URL to a particular Bitbucket repo commit. For example, to try running this repo you need to go to the Commits page and grab the URL for the most recent master commit (or whichever one you want), which will contain the commit hash:

For example, something like https://bitbucket.org/ueacomputervision/image-labelling-tool/commits/f3ddb33e4839f8a0fe73c168993b405adc13daf0 gives the commit hash f3ddb33e4839f8a0fe73c168993b405adc13daf0.

For the repo base URL https://bitbucket.org/ueacomputervision/image-labelling-tool, the MyBinder launch link then takes on the form:

https://mybinder.org/v2/git/https%3A%2F%2Fbitbucket.org%2Fueacomputervision%2Fimage-labelling-tool.git/f3ddb33e4839f8a0fe73c168993b405adc13daf0

which is to say:

https://mybinder.org/v2/git/ESCAPED_REPO_URL.git/COMMIT_HASH
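
The escaping is just standard URL quoting, so generating launch links is easy enough; for example (a sketch using the repo above):

    from urllib.parse import quote

    # Build a MyBinder launch URL for an arbitrary git repo and commit
    repo = 'https://bitbucket.org/ueacomputervision/image-labelling-tool.git'
    commit = 'f3ddb33e4839f8a0fe73c168993b405adc13daf0'

    print('https://mybinder.org/v2/git/{}/{}'.format(quote(repo, safe=''), commit))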

But it does look like things may get easier in the near future…

Feeding a MyBinder Container Built From One Github Repository With the Contents of Another

Long-time readers should be more than well aware by now of MyBinder, the Jupyter project service that will build a Docker image from the contents of a git repository and then launch a container based on that image so you can work with a live, running, albeit temporary, instance of it.

But that’s not all it can do…

Via Chris Holdgraf on the Jupyter discourse community site (Tip: embed custom github content in a Binder link with nbgitpuller), comes a magical trick whereby you can launch a MyBinder instance built from one repository and populate it with files from another.

Why’s this useful? Well, if you’ve had a play with your own repos using MyBinder, you’ll know that each time you make a change to a repository, MyBinder will want to rebuild the Docker image next time you try to launch the repo there.

So if your repo defines a complex build that takes some time to install all of its dependencies, you have to wait for that build even if all you did was correct a typo in the markdown of a notebook file.

So here’s the trick…

nbgitpuller is a Jupyter server extension that supports the “one-way synchronization of a remote git repository to a local git repository”.

There are other approaches to git syncing too. See the next edition of Tracking Jupyter to find out what they are…

Originally developed as a tool to help distribute notebooks to students, it can be called via a Jupyter server URL. For example, if you have nbgitpuller installed in a local Jupyter server running on the default port 8888, a URL of the following form will pull data from the specified repo into the base directory the notebook server points to:

localhost:8888/git-pull?repo=https://github.com/USER/NOTEBOOK_REPO

One of the neat things about Binderhub / MyBinder is that you can pass a git-pull? argument through as part of a MyBinder launch URL, so if the repo you want to build from installs and enables nbgitpuller, you can then pull notebooks into the launched container from a second, nbgitpulled repository.

For example, yesterday I came across the Python show_ast package, and its accompanying IPython magic, which will render the abstract syntax tree of a Python command:

Such a thing may be useful in an introductory programming course (TBH, I’m never really sure what people try to teach in introductory programming courses, what the useful mental models are, how best to help folk learn them, and how to figure out how to teach them…).

As with most Python based repos, particularly ones that contain Jupyter notebooks (that is, .ipynb files [….thinks… ooh… I has a plan to try s/thing else too….]), I generally try to “run” them via MyBinder. In this case, the repo didn’t work because there is a dependency on the Linux graphviz apt package and the Python graphviz package.

At this point, I’d generally fork the repo, create a binderise branch containing the dependencies, then try that out on MyBinder, sometimes adding an issue and/or making a pull request to the original repository suggesting they Binderise it…

…but nbgitpuller provides a different opportunity. Suppose I create a base container that contains the graphviz Linux application and the graphviz Python package. Something like this: ouseful-testing/binder-graphviz.
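
Per the usual repo2docker conventions, such a base repo needs little more than two config files; as a sketch:

    apt.txt:
        graphviz

    requirements.txt:
        graphviz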

Then I can create a MyBinder session from that repo and pull in the show_ast package from its repo and run the notebook directly:

https://mybinder.org/v2/gh/ouseful-testing/binder-graphviz/master/?urlpath=git-pull?repo=https://github.com/hchasestevens/show_ast

Fortuitously, things work seamlessly in this case because the example notebook lives in the directory from which we can import show_ast without needing to install it (otherwise we’d have needed to run pip install . at the top level of the repo). In general, where notebooks are kept in a notebooks or docs directory, for example, the path to import the package would break. (Hmmm… I need to think about protocols for handling that… It’s better practice to put the notebooks in their own directory, but that means we need to install the package or change the import path to it, which is one more step for folk to stumble over…)

Thinking about my old show’n’tell repo, the branches of which scruffily define various Binder environments suited to particular topic areas (environments for working on chemistry notebooks, for example, or astronomy notebooks, or classical language or music notebooks) and also contain demo notebooks, I could instead just define a set of base Binder environment containers (slow to build, but built infrequently) and then lighter weight notebook repos containing just the demo notebooks for a particular topic area. These could then be quickly and easily updated, and run on MyBinder having been nbgitpulled by a base container, without having to rebuild the base container each time I update a notebook in a notebook repo.

A couple of other things to note here. First, nbgitpuller has its own helper for creating nbgitpuller URLs, the nbgitpuller link generator:

It’s not hard to imagine a similar UI, or another tab to that UI, that can build a MyBinder link from a “standard” base container selected from a dropdown menu (or an optional link to a git repo) and then a provided git repo link for the target content repo.

Second, this has got me thinking about how we (don’t) handle notebook distribution very well in the OU.

For our TM351 internal course, we control the students’ computing environment via a VM we provide them with, so we could install nbgitpuller in it; but the notebooks are stored in a private Github repo and we don’t want to give students any keys to it at all. (For some reason, I seem to be the only person who doesn’t have a problem with the notebooks being in a public repo!;-)

For our public notebook utilising courses on FutureLearn or OpenLearn, the notebooks are in a public repo, but we don’t have control of the learners’ computing environments (which is to say, we can’t preinstall nbgitpuller and can’t guarantee that learners will have the permissions or network access to install it themselves).

It’s almost as if various pieces keep appearing, but the jigsaw never quite seems to fit together…