More Thoughts On Jupyter Notebook Search

Following on from my initial sketch of Searching Jupyter Notebooks Using lunr, here’s a quick first pass [gist] at pouring Jupyter notebook cell contents (code and markdown) into a SQLite database, running a query over it, and then inspecting the results using a modified NLTK text concordancer that shows the search phrase in the context of where it appears in a document.
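For a flavour of the ingest-and-query step, here is a minimal sketch using only the Python standard library. It is not the code from the gist: the table name `nbcells` and the functions `notebook_cells`, `ingest` and `search` are made up for illustration, it assumes nbformat v4 notebooks (a top-level `cells` list), and it uses a plain `LIKE` match rather than full text search.

```python
import json
import sqlite3
from pathlib import Path


def notebook_cells(path):
    """Yield (cell_index, cell_type, source) for code and markdown cells."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") in ("code", "markdown"):
            source = cell.get("source", "")
            # The notebook format allows source to be a string or a list of lines
            if isinstance(source, list):
                source = "".join(source)
            yield i, cell["cell_type"], source


def ingest(conn, root="."):
    """Add every notebook found under root to the (hypothetical) nbcells table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nbcells "
        "(path TEXT, cell_index INTEGER, cell_type TEXT, source TEXT)"
    )
    for path in Path(root).rglob("*.ipynb"):
        # Skip Jupyter's autosave copies
        if ".ipynb_checkpoints" in path.parts:
            continue
        rows = [(str(path), i, t, s) for i, t, s in notebook_cells(path)]
        conn.executemany("INSERT INTO nbcells VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def search(conn, phrase, cell_type=None):
    """Return (path, cell_index, cell_type, source) rows whose source contains phrase."""
    sql = "SELECT path, cell_index, cell_type, source FROM nbcells WHERE source LIKE ?"
    params = ["%" + phrase + "%"]
    if cell_type:
        sql += " AND cell_type = ?"
        params.append(cell_type)
    return conn.execute(sql, params).fetchall()
```

Typical use would be `conn = sqlite3.connect("notebooks.db")`, then `ingest(conn, ".")` and `search(conn, "import pandas", cell_type="code")`. Note that `%` and `_` inside the phrase act as `LIKE` wildcards, so searching for a magic such as `%load_ext` this way matches more loosely than you might expect.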

The concordancer means we can offer a results listing more in accordance with a traditional search engine, showing just the text in the immediate vicinity of a search term. (Hmm, I’d need to check what happens if the search term appears multiple times in the search result text.) This means we can offer a tidier display than dumping the contents of a complete cell into the results listing.
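As a rough illustration of the keyword-in-context idea, here is a plain-Python stand-in (not the modified NLTK concordancer from the gist; the function name `concordance` and the `width` parameter are assumptions). It sidesteps the multiple-occurrence question by returning one snippet per hit.

```python
import re


def concordance(text, phrase, width=30):
    """Return one context snippet for every occurrence of phrase in text."""
    snippets = []
    # re.escape so the phrase is matched literally, not as a regular expression
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        snippet = text[start:end].replace("\n", " ")
        # Mark snippets that were cut out of a longer text
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + snippet + suffix)
    return snippets
```

Applied to the `source` of each matching cell, this gives a short list of snippets to display instead of the whole cell.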

The table the notebook data is added to is created so that it supports full text search. However, I imagine that any stemming that we could apply is not best suited to indexing code.
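To make the stemming point concrete, here is a small sketch of a full text search table with and without a stemmer. It assumes the SQLite build behind Python's `sqlite3` module has the FTS5 extension compiled in, which may not be what the gist uses; the table name `nbsearch` and both function names are invented for the example.

```python
import sqlite3


def make_index(conn, stem=False):
    """Create a hypothetical FTS5 table, optionally using the porter stemmer."""
    tokenizer = ", tokenize='porter'" if stem else ""
    conn.execute(
        "CREATE VIRTUAL TABLE nbsearch USING fts5(path, cell_type, source"
        + tokenizer + ")"
    )


def fts_search(conn, query):
    """Return (path, cell_type, source) rows matching an FTS5 query, best match first."""
    return conn.execute(
        "SELECT path, cell_type, source FROM nbsearch "
        "WHERE nbsearch MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
make_index(conn, stem=True)
conn.execute(
    "INSERT INTO nbsearch VALUES (?, ?, ?)",
    ("demo.ipynb", "markdown", "Importing data with pandas"),
)
# With the stemmer, "Importing" and "imports" both reduce to "import"
hits = fts_search(conn, "imports")
```

That sort of matching is helpful for prose in markdown cells, but in code cells it blurs distinctions between identifiers that merely share a stem, which is probably not what someone searching code wants.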

Similarly, the NLTK tokeniser doesn’t handle code very well. For example, splits occur around # and % symbols, which means things like magics, such as %load_ext, aren’t recognised; instead, they’re split into separate tokens: % and load_ext.
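One possible workaround is a small regular expression tokeniser that keeps magics together. This is only a sketch, not something from the gist: `TOKEN_RE` and `tokenize_code` are hypothetical names, and the rule that a magic must start a whitespace-delimited chunk is an assumption made so that the modulo operator in something like `5%3` is still split out.

```python
import re

# Order matters: try a magic (%name or %%name) first, then words, then single symbols.
# (?<!\S) means "not preceded by a non-space character", i.e. start of text or after whitespace.
TOKEN_RE = re.compile(r"(?<!\S)%{1,2}\w+|\w+|[^\w\s]")


def tokenize_code(text):
    """Split text into tokens, keeping IPython magics such as %load_ext whole."""
    return TOKEN_RE.findall(text)
```

With this, `tokenize_code("%load_ext sql")` keeps `%load_ext` as a single token, while other punctuation (including `#`) is still emitted one symbol at a time.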

A bigger issue for the db approach is that I need to find a way to update / clean the database as and when notebooks are saved, updated, deleted etc.
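One low-tech way of handling this might be a periodic sync that compares file modification times against what the database last saw. The sketch below assumes that approach; the `nbfiles` and `nbcells` tables and the `read_cells` and `sync` functions are hypothetical, and it makes no attempt to hook into Jupyter's own save events.

```python
import json
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS nbfiles (path TEXT PRIMARY KEY, mtime REAL);
CREATE TABLE IF NOT EXISTS nbcells
    (path TEXT, cell_index INTEGER, cell_type TEXT, source TEXT);
"""


def read_cells(path):
    """Yield (path, cell_index, cell_type, source) rows for one notebook."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    for i, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        yield str(path), i, cell.get("cell_type"), source


def sync(conn, root="."):
    """Bring the index in line with the notebooks on disk.

    Returns (updated, removed): lists of paths that were (re)indexed or dropped.
    """
    conn.executescript(SCHEMA)
    on_disk = {
        str(p): p.stat().st_mtime
        for p in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    }
    indexed = dict(conn.execute("SELECT path, mtime FROM nbfiles"))
    removed = sorted(set(indexed) - set(on_disk))
    # New notebooks have no stored mtime; changed ones have a different mtime
    updated = sorted(p for p, m in on_disk.items() if indexed.get(p) != m)
    # Clear out stale rows for anything deleted or about to be re-read
    for path in removed + updated:
        conn.execute("DELETE FROM nbcells WHERE path = ?", (path,))
        conn.execute("DELETE FROM nbfiles WHERE path = ?", (path,))
    for path in updated:
        conn.executemany(
            "INSERT INTO nbcells VALUES (?, ?, ?, ?)", read_cells(Path(path))
        )
        conn.execute("INSERT INTO nbfiles VALUES (?, ?)", (path, on_disk[path]))
    conn.commit()
    return updated, removed
```

Calling `sync(conn, ".")` before each search would keep results reasonably fresh at the cost of a directory walk; a notebook caught mid-save with invalid JSON would raise an error here and would need handling in anything more than a sketch.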

PS sqlbiter provides a way of ingesting – and unpacking – Jupyter notebooks into a SQLite database.

PPS Handy Python command line tool for searching notebooks: https://github.com/conery/nbscan

Install it into the TM351 VM from a Jupyter notebook code cell by running the following command when connected to the internet:

!sudo pip install git+https://github.com/conery/nbscan.git

Search for things in notebooks using commands like:

  • search in code cells in notebooks in current directory (.) and all child directories for a phrase: !nbscan.py --dir . --grep 'import pandas' --code
  • search in all cells for the word ‘pandas’: !nbscan.py --dir . --grep pandas
  • search in markdown cells for the pattern 'data repr\w*' (that is, the phrase starting data repr…): !nbscan.py --dir . --grep 'data repr\w*' --markdown

Would be handy to make a simple magic for this?
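A magic along these lines could be sketched as follows. Rather than wrapping nbscan (whose internals are not assumed here), this uses a small pure-Python search function; the names `nbgrep` and `nbgrep_magic`, and the `%nbgrep PATTERN [DIR]` calling convention, are all invented for illustration. The magic is only registered when the code runs inside IPython or Jupyter.

```python
import json
import re
import shlex
from pathlib import Path


def nbgrep(pattern, root=".", cell_type=None):
    """Return (path, cell_index, line) for each cell line matching the regex pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue
        nb = json.loads(path.read_text(encoding="utf-8"))
        for i, cell in enumerate(nb.get("cells", [])):
            if cell_type and cell.get("cell_type") != cell_type:
                continue
            source = cell.get("source", "")
            if isinstance(source, list):
                source = "".join(source)
            for line in source.splitlines():
                if regex.search(line):
                    hits.append((str(path), i, line))
    return hits


def nbgrep_magic(line):
    """Body of a hypothetical line magic: %nbgrep PATTERN [DIR]"""
    args = shlex.split(line)
    if not args:
        print("Usage: %nbgrep PATTERN [DIR]")
        return
    root = args[1] if len(args) > 1 else "."
    for path, i, text in nbgrep(args[0], root):
        print(f"{path} [cell {i}]: {text}")


# Only register the magic when running inside IPython / Jupyter
try:
    from IPython import get_ipython
    ip = get_ipython()
except ImportError:
    ip = None
if ip is not None:
    ip.register_magic_function(nbgrep_magic, magic_kind="line", magic_name="nbgrep")
```

After running that cell in a notebook, something like `%nbgrep 'data repr\w*' .` should list matching lines. Quote the pattern, since the argument line is split shell-style and an unquoted backslash would be swallowed.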

Author: Tony Hirst

I'm a lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
