Fragment – Jupyter For Edu

With more and more core components, as well as user contributions, being added to the Jupyter framework, I’m starting to lose track of what’s possible. One of the things that might be useful for the OU, and Institute of Coding, context is to explore various architectural patterns that can be constructed in a Jupyter-mediated environment that are particularly useful for education.

In advance of getting a Github repo / wiki together to start that, here are a few fragments from my feeds, several of which have appeared in just the last couple of days:

Jupyter Enterprise Gateway Now a Top Level Jupyter Project

Via the Jupyter blog, I see the Jupyter Enterprise Gateway is now a top-level Jupyter project.

The Jupyter Enterprise Gateway “enables Jupyter Notebook to launch remote kernels in a distributed cluster”, which provides a handy separation between a notebook server (or Jupyterhub multi-user notebook server) and the kernel that a notebook runs against. For example, Jupyter Enterprise Gateway can be used to create kernels in a scalable way using Kubernetes, or (I’m guessing…?) to do things like launch remote kernels running on a GPU cluster. From the docs it looks like Jupyter Enterprise Gateway should work in a Jupyterhub context, although I can’t offhand find a simple howto / recipe for how to do that. (Presumably, Jupyterhub creates and launches user specific notebook server containers, and these then create and connect to arbitrary kernel-running back-ends via the Jupyter Enterprise Gateway? Here’s a related issue I found.)
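For what it’s worth, here’s a minimal sketch of what I think the wiring looks like from the notebook server side; the GatewayClient traitlet name is my assumption from a skim of the docs (older recipes seem to use the NB2KG extension instead), so treat it as a guess rather than a recipe:

#jupyter_notebook_config.py (sketch - traitlet name is my assumption)
#Point a local notebook server at a remote Enterprise Gateway so that
#kernels are launched via the gateway rather than locally
c.GatewayClient.url = 'http://my-gateway-host:8888'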

Running Notebook Cells One at a Time in a Terminal

The ever productive Doug Blank has a recipe for stepping through notebook cells in a terminal [code: nbplayer]. The player launches an IPython terminal that displays the first cell in the notebook and lets you step through the cells (executing or skipping each one) one at a time. You can also run your own commands in between stepping through the notebook cells.

I can imagine using this to create a fixed set of steps for an activity that I want a student to work through, whilst giving them “free time” to explore the state of the current execution environment, for example, or try out particular “given” functions with different parameters. This approach also provides a workaround for using notebook-authored exercises in the terminal environment, which I know some colleagues favour over the notebook environment.

On my to do list is to recast some of the activities from the new TM112 course to see how they feel using this execution model, and then compare that with the original activity, and with running the same notebook in a notebook environment.

Adding Multiple Student Users to a Jupyterhub Environment

Also via Doug Blank, a recipe for adding multiple users to a Jupyterhub environment using a form that allows you to simply add a list of user names: a more flexible way of adding accounts to Jupyterhub. User account details and random passwords are created automatically and then emailed to students.

To allow users to change passwords, e.g. on first run, I think the NotebookApp.allow_password_change=True notebook server parameter (Jupyter notebook – Config file and command line options) is what’s needed?
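If so, it should just be a one liner in the notebook server config (untested by me as yet):

#jupyter_notebook_config.py
#Let a logged in user change their password from the notebook web UI
c.NotebookApp.allow_password_change = True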

The repo also shows a way of bundling nbviewer to allow users to “publish” HTML versions of their notebooks.

Doug also points to yuvipanda/jupyterhub-firstuseauthenticator, a first use authenticator for Jupyterhub that allows new users to create an account and then set a password on it. This could be really handy for workshops, where you want to allow users to self-serve an environment that persists over a couple of workshop sessions, for example. (One thing we still need to do in the OU is get a Jupyterhub server up and running with persistent user storage; for TM112, we ran a temporary notebook server, which meant students couldn’t save and return to notebooks on the server – they’d have to download notebooks and then re-upload them into a new session if they wanted to return to working on a notebook they had modified. That said, the activity was designed as a “disposable” activity…)
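From the README, wiring the authenticator in looks to be a one liner in the Jupyterhub config (again, untested by me):

#jupyterhub_config.py
#Accounts are created, and passwords set, the first time a user logs in
c.JupyterHub.authenticator_class = 'firstuseauthenticator.FirstUseAuthenticator'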

Zip All Notebooks

This handy extension – nbzip – provides a button to zip and download a Jupyter notebook server folder. If you’re working on a temporary notebook server, this provides an easy way of grabbing all the notebooks in one go. What might be even nicer would be to select a sub-folder, or selected set of files, using checkbox selectors? I’m not sure if there’s a complementary tool that will let you upload a zipped archive and unpack it in one go?
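In the meantime, a notebook code cell can unpack an uploaded archive in one go using just the standard library (the file and folder names below are placeholders):

#Unpack an uploaded zip archive from a notebook code cell
import zipfile

with zipfile.ZipFile('notebooks.zip') as zf:
    zf.extractall('restored_notebooks')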

Fragment – Running Multiple Services, such as Jupyter Notebooks and a Postgres Database, in a Single Docker Container

Over the last couple of days, I’ve been fettling the build scripts for the TM351 VM – a VM that is typically built using vagrant to provision a VirtualBox machine from a set of shell scripts – so that the same scripts can also be used to build a single Docker container that runs all the TM351 services, specifically Jupyter notebooks, OpenRefine, PostgreSQL and MongoDB.

Docker containers are typically constructed to run a single service, with compositions of containers wired together using Docker Compose to create applications that deliver, or rely on, more than one running service. For example, in a previous post (Setting up a Containerised Desktop API server (MySQL + Apache / PHP 5) for the ergast Motor Racing Data API) I showed how to set up a couple of containers to work together, one running a MySQL database server, the other an http service that provided an API to the database.

So how to run multiple services in the same container? Docs on the Docker website suggest using supervisord to run multiple services in a single container, so here’s a fragment on how I’ve done that from my TM351 build.

To begin with, I’ve built the container up as a tiered set of containers, in a similar way to how the stack of opinionated Jupyter notebook Docker containers is constructed:

#Define a stub to identify the images in this image stack
IMAGESTUB=psychemedia/tm361testm

# minimal
## Define a minimal container, eg a basic Linux container
## using whatever flavour of Linux we prefer
docker build --rm -t ${IMAGESTUB}-minimal-test ./minimal

# base
## The base container installs core packages
## The intention is to define a common build environment
## populated with packages likely to be common to many courses
docker build --rm --build-arg BASE=${IMAGESTUB}-minimal-test -t ${IMAGESTUB}-base-test ./base

#...

One of the things I’ve done to try to generalise the build steps is allow the name of a base container to be used to bootstrap a new one by passing the name of the base image in via an optional variable (in the above case, --build-arg BASE=${IMAGESTUB}-minimal-test). Each Dockerfile in a build step directory uses the following construction to work out which image to use as the FROM basis:

#Set ARG values using --build-arg ARG_NAME=VALUE
#Each ARG value can also have a default value
ARG BASE=psychemedia/ou-tm351-base-test
FROM ${BASE}

Using the same approach, I have used separate build tiers for the following components:

  • jupyter base: minimal Jupyter notebook install;
  • jupyter custom: add some customisation onto a pre-existing Jupyter notebook install;
  • openrefine: add the OpenRefine application; (note, we could just use BASE=ubuntu to create this as a simple, standalone OpenRefine container);
  • postgres: create a seeded PostgreSQL database; note, this could be split into two: a base postgres tier and then a customisation that adds users, creates and seeds databases etc;
  • mongodb: add in a seeded mongo database; again, the seeding could be added as an extra tier on a minimal database tier;
  • topup: a tier to add in anything I’ve missed without having to go back to rebuild from an earlier step…

The intention behind splitting out these tiers is that we might want to have a battle hardened OU postgres tier, for example, that could be shared between different courses. Alternatively, we might want to have tiers offering customisations for specific presentations of a course, whilst reusing several other fixed tiers intended to last out the life of the course.

By the by, it can be quite handy to poke inside an image once you’ve created it to check that everything is in the right place:

#Explore inside an image by entering it with a shell command
docker run -it --entrypoint=/bin/bash psychemedia/ou-tm351-jupyter-base-test -i

Once the services are in place, I add a final layer to the container that ensures supervisord is available and set up with an appropriate supervisord.conf configuration file:

##Dockerfile
#Final tier Dockerfile
ARG BASE=psychemedia/testpieces
FROM ${BASE}

USER root
RUN apt-get update && apt-get install -y supervisor

RUN mkdir -p /openrefine_projects  && chown oustudent:100 /openrefine_projects
VOLUME /openrefine_projects

RUN mkdir -p /notebooks  && chown oustudent:100 /notebooks
VOLUME /notebooks

RUN mkdir -p /var/log/supervisor
COPY monolithic_container_supervisord.conf /etc/supervisor/conf.d/supervisord.conf

EXPOSE 3334
EXPOSE 8888

CMD ["/usr/bin/supervisord"]

The supervisord.conf file is defined as follows:

##supervisord.conf
##We can check running processes under supervisord with: supervisorctl

[supervisord]
nodaemon=true
logfile=/dev/stdout
loglevel=trace
logfile_maxbytes=0
#The HOME envt needs setting to the correct USER
#otherwise jupyter throws: [Errno 13] Permission denied: '/root/.local'
#https://github.com/jupyter/notebook/issues/1719
environment=HOME=/home/oustudent

[program:jupyternotebook]
#Note the auth is a bit ropey on this atm!
command=/usr/local/bin/jupyter notebook --port=8888 --ip=0.0.0.0 --y --log-level=WARN --no-browser --allow-root --NotebookApp.password= --NotebookApp.token=
#The directory we want to start in
#(replaces jupyter notebook parameter: --notebook-dir=/notebooks)
directory=/notebooks
autostart=true
autorestart=true
startsecs=5
user=oustudent
stdout_logfile=NONE
stderr_logfile=NONE

[program:postgresql]
command=/usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
user=postgres
autostart=true
autorestart=true
startsecs=5

[program:mongodb]
command=/usr/bin/mongod --dbpath=/var/lib/mongodb --port=27351
user=mongodb
autostart=true
autorestart=true
startsecs=5

[program:openrefine]
command=/opt/openrefine-3.0-beta/refine -p 3334 -i 0.0.0.0 -d /vagrant/openrefine_projects
user=oustudent
autostart=true
autorestart=true
startsecs=5
stdout_logfile=NONE
stderr_logfile=NONE

One thing I need to do better is to find a way to stage the construction of the supervisord.conf file, bearing in mind that multiple tiers may relate to the same service; for example, I have a jupyter-base tier to create a minimal Jupyter notebook server and then a jupyter-base-custom tier that adds in specific customisations, such as branding and course related notebook extensions.

When the final container is built, the supervisord command is run and the multiple services started.

One other thing to note: we’re hoping to run TM351 environments on an internal OpenStack cluster. The current cluster only allows students to expose a single port, and port 80 at that, from the VM (IP addresses are in scant supply, and network security lockdowns are in place all over the place). The current VM exposes at least two http services: Jupyter notebooks and OpenRefine, so we need a proxy in place if we are to expose them both via a single port. Helpfully, the nbserverproxy Jupyter extension (as described in Exposing Multiple Services Via a Single http Port Using Jupyter nbserverproxy) allows us to do just that. One thing to note, though – I had to enable it via the same user that launches the notebook server in the supervisord.conf settings:

##Dockerfile fragment

RUN $PIP install nbserverproxy

USER oustudent
RUN jupyter serverextension enable --py nbserverproxy
USER root

To run the VM, I can call something like:

docker run -p 8899:8888 -d psychemedia/tm351dockermonotest

and then to access the additional services, I can browse to e.g. localhost:8899/proxy/3334/ to see the OpenRefine application.

PS in case you’re wondering why I syndicated this through RBloggers too, the same recipe will work if you’re using Jupyter notebooks with an R kernel, rather than the default IPython one.

Fragment: On Reproducible Open Educational Resources

Via O’Reilly’s daily Four Short Links feed, I notice the Open Logic Project, “a collection of teaching materials on mathematical logic aimed at a non-mathematical audience, intended for use in advanced logic courses as taught in many philosophy departments”. In particular, “it is open-source: you can download the LaTeX code [from Github]”.

However:

the TeX source does mean you need a (La)TeX environment to run it (and the project does bundle some of the custom .sty style files you need in the repo, which is handy).

Compare this with the Simple Maths Equations and Notation notebook I’ve started sketching as part of a self-started, informal “reproducible OERs with Jupyter notebooks” project I’m dabbling with:

Here, a Jupyter notebook contains LaTeX code that can then be rendered (in part?) through the notebook previewer – at least in so far as expressions are written in MathJax-parseable code – and also within a live / running Jupyter notebook. Not only do I share the reproducible source code (as a notebook), I also share a link to at least one environment capable of running it, and that allows it to be reused with modification. (Okay, in this case, not openly so because you have to have an Azure Notebooks account. But the notebook could equally run on Binderhub or a local install, perhaps with one or two additional requirements if you don’t already run a scientific Python environment.)
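As a trivial sketch of the sort of thing I mean, a code cell can render a MathJax-parseable LaTeX expression directly into its output:

#Render a MathJax-parseable LaTeX expression in a notebook output cell
from IPython.display import Math, display

display(Math(r'x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}'))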

In short, for a reproducible OER that supports reuse with modification, sharing the means of production also means sharing the machinery of production.

To simplify the user experience, the notebook environment can be preinstalled with packages needed to render a wider range of TeX code, such as drawings rendered using TikZ. Alternatively, code cells can be populated with package installation commands to customise a more vanilla environment, as I do in several demo notebooks:

What the Open Logic Project highlights is that reproducible OERs not only provide ready access to the “source code” of a resource so that it can be easily reused with modification, but that access to an open environment capable of processing that source code and rendering the output document also needs to be provided. (Open / reproducible science researchers have known this for some time…)

Getting a TeX/LaTeX environment up and running can be a faff – and can also take up a lot of disk space – so the runtime environment requirements are not negligible.

In the case of Jupyter notebooks, LaTeX support is available, and container images capable of running on Binderhub, for example, are relatively easily defined (see for example the Binder LaTeX example). (I’m not sure how rich Stencila’s support for LaTeX is, either, and/or whether it requires an external LaTeX environment when running the Stencila desktop app?)

It also strikes me that another thing we should be doing is to export a copy of the finished work, eg as a PDF or complete, self-standing HTML archive, in case the machinery does break. This is also important where third party services are called. It may actually make sense to use something like requests for all third party URL requests, and save a cached version of all requests (using requests-cache) to provide a local copy of whatever it was that was called when the document was originally run.
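Something like the following, perhaps (the cache name is arbitrary), so that re-running the document works from the locally cached copies even if the original URLs later break:

#Cache all third party HTTP requests locally so the document can be
#re-run against the originally retrieved copies
import requests
import requests_cache

requests_cache.install_cache('oer_third_party_cache')

resp = requests.get('https://example.com/some/data.csv')
print(resp.from_cache)  #False on the first run, True on subsequent re-runs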

See also: OER Methods – Generative Designs for Reuse-With-Modification

An Easier Approach to Electrical Circuit Diagram Generation – lcapy

Whilst I might succumb in my crazier evangelical moments to the idea that academic authors (other than those who speak LaTeX natively) and media developers might engage in the raw circuitikz authoring described in Reproducible Diagram Generators – First Thoughts on Electrical Circuit Diagrams, the reality is that it’s probably just way too clunky, and a little bit too far removed from the everyday diagrams educators are likely to want to create, to get much, if any, take up.

However, understanding something of the capabilities of quite low level drawing packages, and reflecting (as in the last post) on some of the strategies we might adopt for creating reusable, maintainable, revisable with modification and extensible diagram scripts puts us in good stead for looking out for more usable approaches.

One such example is the Python lcapy package, a linear circuit analysis package that supports:

  • the description of simple electrical circuits at a slightly higher level than the raw circuitikz circuit creation model;
  • the rendering of the circuits, with a few layout cues, using circuitikz;
  • numerical analysis of the circuits in terms of response in time and frequency domains, and the charting of the results of the analysis; and
  • various forms of symbolic analysis of circuit descriptions in various domains.

Here are some quick examples to give a taste of what’s possible.

You can run the notebook (albeit subject to significant changes) that contains the original working for examples used in this post on Binderhub: Binder

Here’s a simple circuit:

And here’s how we can create it in lcapy from a netlist, annotated with cues for the underlying circuitikz generator about how to lay out the diagram.

from lcapy import Circuit

cct = Circuit()
cct.add("""
Vi 1 0_1 step 20; down
C 1 2; right, size=1.5
R 2 0; down
W 0_1 0; right
W 0 0_2; right, size=0.5
P1 2_2 0_2; down
W 2 2_2;right, size=0.5""")

cct.draw(style='american')

The things to the right of the semicolon on each line are the optional layout elements – they’re not required when defining the actual circuit itself.

The display of nodes and node numbering is all controllable, and the symbol styles are selectable between american, british and european stylings.

The lcapy/schematic.py module describes the various stylings as composites of circuitikz regionalisations, and could be easily extended to support a named house style, or perhaps accommodate a regionalisation passed in as an explicit argument value.

if style == 'american':
    style_args = 'american currents, american voltages'
elif style == 'british':
    style_args = 'american currents, european voltages'
elif style == 'european':
    style_args = ('european currents, european voltages, european inductors, european resistors')

As well as constructing circuits from netlist descriptions, we can also create them from network style descriptions:

from lcapy import R, C, L

cct2 = (R(1e6) + L(2e-3)) | C(3e-6)
#For some reason, the style= argument is not respected
cct2.draw()

The diagrams generated from networks are open linear circuits rather than loops, which may not be quite what we want. But these circuits are quicker to write, so we can use them to draft netlists for us that we may then want to tidy up a bit further.

print(cct2.netlist())

'''
W 1 2; right=0.5
W 2 4; up=0.4
W 3 5; up=0.4
R 4 6 1000000.0; right
W 6 7; right=0.5
L 7 5 0.002; right
W 2 8; down=0.4
W 3 9; down=0.4
C 8 9 3e-06; right
W 3 0; right=0.5
'''

Circuit descriptions can also be loaded in from a named text file, which is handy for course material maintenance as well as reuse of circuits across materials: it’s easy enough to imagine a library of circuit descriptions.

#Create a file containing a circuit netlist
sch='''
Vi 1 0_1 {sin(t)}; down
R1 1 2 22e3; right, size=1.5
R2 2 0 1e3; down
P1 2_2 0_2; down, v=V_{o}
W 2 2_2; right, size=1.5
W 0_1 0; right
W 0 0_2; right
'''

fn="voltageDivider.sch"
with open(fn, "w") as text_file:
    text_file.write(sch)

#Create a circuit from a netlist file
cct = Circuit(fn)

The ability to create – and share – circuit diagrams in a Python context that plays nicely with Jupyter notebooks is handy, but the lcapy approach becomes really useful if we want to produce other assets around the circuit we’ve just created.

For example, in the case of the above circuit, how do the various voltage levels across the resistors respond when we switch on the sinusoidal source?

import numpy as np
t = np.linspace(0, 5, 1000)
vr = cct.R2.v.evaluate(t)
from matplotlib.pyplot import figure, savefig
fig = figure()
ax = fig.add_subplot(111, title='Resistor R2 voltage')
ax.plot(t, vr, linewidth=2)
ax.plot(t, cct.Vi.v.evaluate(t), linewidth=2, color='red')
ax.plot(t, cct.R1.v.evaluate(t), linewidth=2, color='green')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Resistor voltage (V)');

Not the best example, admittedly, but you get the idea!

Here’s another example, where I’ve created a simple interactive to let me see the effect of changing one of the component values on the response of a circuit to a step input:

(The nice plotting of the diagram gets messed up unfortunately, at least in the way I’ve set things up for this example…)

As the code below shows, the @interact decorator from ipywidgets makes it trivial to create a set of interactive controls based around the arguments passed into a function:

import numpy as np
from matplotlib.pyplot import figure, savefig
#Imports needed for the interact decorator and the circuit definition
from ipywidgets import interact
from lcapy import Circuit

@interact(R=(1,10,1))
def response(R=1):
    cct = Circuit()

    cct.add('V 0_1 0 step 10;down')
    cct.add('L 0_1 0_2 1e-3;right')
    cct.add('C 0_2 1 1e-4;right')
    cct.add('R 1 0_4 {R};down'.format(R=R))
    cct.add('W 0_4 0; left')

    t = np.linspace(0, 0.01, 1000)
    vr = cct.R.v.evaluate(t)

    fig = figure()
    #Note that we can add Greek symbols from LaTex into the figure text
    ax = fig.add_subplot(111, title=r'Resistor voltage (R={}$\Omega$)'.format(R))
    ax.plot(t, vr, linewidth=2)
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Resistor voltage (V)')
    ax.grid(True)
    
    cct.draw()

Using the network description of a circuit, it only takes a couple of lines to define a circuit and then get the transient response to a step function for it:

Again, it doesn’t take much more effort to create an interactive that lets us select component values and explore the effect they have on the damping:

As well as the numerical analysis, lcapy also supports a range of symbolic analysis functions. For example, given a parallel resistor circuit, defined using a network description, we can find the overall resistance in simplest terms:

Or for parallel capacitors:
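Something along the following lines, I think, although the .simplify() call is from my memory of the lcapy network API rather than checked against the current release, so treat it as a sketch:

#Symbolic simplification of parallel networks (sketch)
from lcapy import R, C

#Parallel resistors - expect something like R(R1*R2/(R1 + R2))
print((R('R1') | R('R2')).simplify())

#Parallel capacitors - expect something like C(C1 + C2)
print((C('C1') | C('C2')).simplify())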

Some other elementary transformations we can apply – providing expressions for an input voltage in the time or Laplace/s domain:

We can also create pole-zero plots quite straightforwardly, directly from an expression in the s-domain:

This is just a quick skim through some of what’s possible with lcapy. So how and why might it be useful as part of a reproducible educational resource production process?

One reason is that several of the functions can reduce the “production distance” between different likely components of a set of educational materials.

For example, given a particular circuit description as a netlist, we can annotate it with direction cribs in order to generate a visual circuit diagram, and we can use a circuit created from it directly (or from the direction annotated script) to generate time or frequency response charts. (We can also obtain symbolic transfer functions.)

When trying to plot things like pole zero charts, where it is important that the chart matches a particular s-domain expression, we can guarantee that the chart is correct by deriving it directly from the s-domain expression, and then rendering that expression in pretty LaTeX equation form in the materials.

The ability to simplify expressions – as in the examples above giving the overall capacitance or resistance of a parallel circuit – directly from a circuit description, whilst at the same time using that circuit description to render the circuit diagram, also reduces the separation between those two outputs to zero: they are both generated from the self-same source item.

You can run the notebook (albeit subject to significant changes) that contains the original working for examples used in this post on Binderhub: Binder

Exposing Multiple Services Via a Single http Port Using Jupyter nbserverproxy

Over the last couple of weeks I’ve been circling, but failing to make much actual progress on using, OpenStack as a platform for making self-serve OU hosted VMs available to students. (I’m increasingly starting to think this is not sensible, but I’m struggling to find someone I can chat to about it… OpenStack is too enterprise, like a heavy “Java” thing where I need a just works “Python” thing…).

Anyway.

One of the issues with the OU Faculty OpenStack setup is the way the security model locks everything down. Not only is no API access available, there is also a limit on IP address allocation and open ports are limited to port 80 (and maybe port 22? Or maybe not.)

For the TM351 VM – which is what we’re looking to put onto OU OpenStack – we have been exposing services on at least two http ports, one for the Jupyter notebooks and one for OpenRefine. (The latest build also has a simple VM webserver, and I’m experimenting with a notebook search engine. Optionally, we have also allowed students to open up ports to the PostgreSQL and MongoDB services.)

If I do find a sensible way to get the VM running on OpenStack, finding a way to shove all the http services through port 80 looks like a necessary requirement. Previously, I’d noticed that @betatim’s openrefineder demo made use of a proxy to expose the OpenRefine service via the Jupyter notebook port, and looking at it again today I noticed that the nbopenrefineproxy package it was using is available as a Jupyterhub project package: jupyterhub/nbserverproxy.

In the current TM351 VM set-up, we have the following:

  • Jupyter notebook on guest port 8888, host port 35180
  • OpenRefine on guest port 3334, host port 35181

However, if I install and enable nbserverproxy, and restart the Jupyter notebook server, I can now find OpenRefine proxied as http://localhost:35180/proxy/3334/ as well as on http://localhost:35181.

One gotcha to note is that the OpenRefine page doesn’t render properly from that URL without the trailing slash because the OpenRefine HTML includes relative links to assets:

...
<link type="text/css" rel="stylesheet" href="externals/select2/select2.css" />
 <link type="text/css" rel="stylesheet" href="externals/tablesorter/theme.blue.css" />
...

which resolve as e.g. http://localhost:35180/proxy/externals/select2/select2.css (404).

However, with the trailing slash in place, the links do resolve correctly (e.g. as http://localhost:35180/proxy/3334/externals/select2/select2.css).

Handy… and the way to go if we do get this running on OpenStack.

PS if you know of a baby steps tutorial that shows how I can build a custom VM image on a Mac that I can upload to OpenStack, please let me know via the comments. Or otherwise get in touch if you can talk me through the various approaches.

First Class R Support in Binder / Binderhub – Shiny Apps As Well as R-Kernels and RStudio

I notice from the binder-examples/r repo that Binderhub now appears to offer all sorts of R goodness out of the can, if you specify a particular R build.

From the same repo root, you can get a Jupyter notebook environment with an R kernel, an RStudio session, or a Shiny app, depending on the URL path you launch it with.

And from previously, here’s a workaround for displaying R/HTMLwidgets in a Jupyter notebook.

OpenRefine is also available from a simple URL – https://mybinder.org/v2/gh/betatim/openrefineder/master?urlpath=openrefine – courtesy of betatim/openrefineder:

Perhaps it’s time for me to try to get my head round what the Jupyter notebook proxy handlers are doing…

PS see also Scripted Forms for a simple markdown script way of building interactive Jupyter widget powered UIs.

More Thoughts On Jupyter Notebook Search

Following on from initial sketch of Searching Jupyter Notebooks Using lunr, here’s a quick first pass [gist] at pouring Jupyter notebook cell contents (code and markdown) into a SQLite database, running a query over it and then inspecting the results using a modified NLTK text concordancer to show the search phrase in the context of where it’s located in a document.
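The gist has the full working, but the database side of it is roughly along these lines (the table and column names here are illustrative rather than the ones I actually used):

#Sketch: pour notebook cell contents into a SQLite full text search table
import sqlite3
import nbformat

conn = sqlite3.connect('notebooks.sqlite')
conn.execute('CREATE VIRTUAL TABLE IF NOT EXISTS cells USING fts4(path, cell_type, source)')

def index_notebook(path):
    nb = nbformat.read(path, as_version=4)
    for cell in nb.cells:
        if cell.cell_type in ('code', 'markdown'):
            conn.execute('INSERT INTO cells VALUES (?, ?, ?)',
                         (path, cell.cell_type, cell.source))
    conn.commit()

index_notebook('demo.ipynb')

#Full text query over the indexed cells
for path, cell_type, source in conn.execute(
        'SELECT * FROM cells WHERE source MATCH ?', ('pandas',)):
    print(path, cell_type, source[:80])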

The concordancer means we can offer a results listing more in accordance with a traditional search engine, showing just the text in the immediate vicinity of a search term. (Hmm, I’d need to check what happens if the search term appears multiple times in the search result text.) This means we can offer a tidier display than dumping the contents of a complete cell into the results listing.

The table the notebook data is added to is created so that it supports full text search. However, I imagine that any stemming that we could apply is not best suited to indexing code.

Similarly, the NLTK tokeniser doesn’t handle code very well. For example, splits occur around # and % symbols, which means things like magics, such as %load_ext, aren’t recognised; instead, they’re split into separate tokens: % and load_ext.
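A quick check along these lines shows the problem (assuming the default punkt tokeniser models are installed):

#The default NLTK tokeniser splits the magic symbol off as a separate token
import nltk
#nltk.download('punkt')  #one-off download of the default tokeniser models

print(nltk.word_tokenize('%load_ext sql'))
#e.g. ['%', 'load_ext', 'sql'] rather than ['%load_ext', 'sql']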

A bigger issue for the db approach is that I need to find a way to update / clean the database as and when notebooks are saved, updated, deleted etc.

PS sqlbiter provides a way of ingesting – and unpacking – Jupyter notebooks into a sqlite database.

PPS Handy Python command line tool for searching notebooks: https://github.com/conery/nbscan

Install it into TM351 VM from a Jupyter notebook code cell by running the following command when connected to the internet:

!sudo pip install git+https://github.com/conery/nbscan.git

Search for things in notebooks using commands like:

  • search in code cells in notebooks in current directory (.) and all child directories for a phrase: !nbscan.py --dir . --grep 'import pandas' --code
  • search in all cells for the word ‘pandas’: !nbscan.py --dir . --grep pandas
  • search in markdown cells for the pattern 'data repr\w*' (that is, the phrase starting data repr…): !nbscan.py --dir . --grep 'data repr\w*' --markdown

Would be handy to make a simple magic for this?
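Something like this, perhaps, as a rough and untested sketch (it assumes nbscan.py is installed and on the path):

#Rough sketch of a line magic that passes its arguments through to nbscan.py
from IPython import get_ipython
from IPython.core.magic import register_line_magic

@register_line_magic
def nbscan(line):
    """Run nbscan.py with whatever arguments follow the magic."""
    get_ipython().system('nbscan.py ' + line)

#Usage in a notebook code cell:
#%nbscan --dir . --grep 'import pandas' --code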