It’s been some time since I last explored this (eg here and here), and as far as I know no other solutions have appeared since, but a question still remains as to how to effectively search over a set of notebooks.
Partial alternative solutions that may be worth noting include:
- nbscan for searching over notebooks from the command-line;
- nbgallery bakes in Solr/sunspot (it’d be really nice if the nbgallery search tools could be easily decoupled so the search could be added to an arbitrary Jupyter notebook, or JupyterHub, server as an extension…);
- this simple search engine with autocomplete by Simon Willison.
[UPDATE: This is new to me, and I’ve not had a chance to try it: Find your Jupyter notebooks with ElasticSearch – elastic search recipe.]
One of the things I’ve often wondered about in respect of building a notebook search engine index is how to crawl / index freshly updated notebooks.
One way would presumably be to regularly crawl the directory path in which notebooks live looking for notebook files that have a changed timestamp compared to the last time they were indexed; another might be to set up some sort of watcher on the operating system that calls the indexer whenever it spots a file being updated (maybe something like
Another way might be to use something like the pgcontents contents manager to save (or process) notebooks into a search engine index database. (For other examples of Jupyter notebook content managers, see this Tracking Jupyter round-up. I wonder, is there a SQLite content manager that can save notebooks directly into SQLite? Would the pgcontents extension handle that with little or no modification, other than to the supplied database connection string?) If notebooks were saved as notebooks to disk, and into a database for indexing as part of the search engine, how would the indexed notebook also be linked back to the notebook on disk so it could be linked to via search results?
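As a rough sketch of the timestamp-based crawl idea, and of one way to keep the link back to the file on disk, something like the following might do. It assumes notebooks are plain `.ipynb` JSON files under a single root directory; the `notebooks` table and the function names are made up for illustration and are not part of pgcontents or any other tool mentioned here. The file path is used as the primary key, so any search hit can be mapped straight back to the notebook on disk.

```python
import json
import sqlite3
from pathlib import Path


def notebook_text(path):
    """Return the concatenated source of every cell in an .ipynb file."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    parts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # A cell's "source" may be a single string or a list of strings
        parts.append(source if isinstance(source, str) else "".join(source))
    return "\n".join(parts)


def index_changed(root, db):
    """Re-index notebooks under root whose mtime differs from the stored one.

    Returns the list of paths that were (re)indexed on this pass.
    """
    # Hypothetical schema: path doubles as the link back to the file on disk
    db.execute(
        "CREATE TABLE IF NOT EXISTS notebooks "
        "(path TEXT PRIMARY KEY, mtime_ns INTEGER, content TEXT)"
    )
    seen = dict(db.execute("SELECT path, mtime_ns FROM notebooks"))
    updated = []
    for nb_path in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies
        if ".ipynb_checkpoints" in nb_path.parts:
            continue
        key = str(nb_path)
        mtime_ns = nb_path.stat().st_mtime_ns
        if seen.get(key) == mtime_ns:
            continue  # unchanged since the last crawl
        db.execute(
            "INSERT OR REPLACE INTO notebooks VALUES (?, ?, ?)",
            (key, mtime_ns, notebook_text(nb_path)),
        )
        updated.append(key)
    db.commit()
    return updated
```

Calling `index_changed("path/to/notebooks", sqlite3.connect("index.db"))` on a schedule (both paths are placeholders) would only touch notebooks whose timestamp has changed since the previous run. It does not handle deleted or renamed notebooks, which a real indexer would need to.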
Thinks: how is nbgallery architected? Where are notebooks saved to? How is the Solr search engine index managed?
More generally, I wonder: are there any Python-based, simple full-text search engines with local filesystem crawlers/monitors/indexers out there?
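One low-tech possibility, offered as a sketch rather than a recommendation, is SQLite's FTS5 full-text extension, used via Python's built-in `sqlite3` module. This assumes the SQLite library that Python is linked against was built with FTS5 enabled, which is common but not guaranteed. The table name `nb_fts`, the `search` helper and the sample rows are all invented for illustration; it provides the search part only, with no crawler or monitor.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# path is stored but not tokenised (UNINDEXED), so each hit can link to a file
db.execute("CREATE VIRTUAL TABLE nb_fts USING fts5(path UNINDEXED, content)")

# Made-up sample rows standing in for text extracted from notebooks
db.executemany(
    "INSERT INTO nb_fts (path, content) VALUES (?, ?)",
    [
        ("notebooks/maps.ipynb", "Plotting choropleth maps with folium"),
        ("notebooks/search.ipynb", "Building a notebook search index"),
    ],
)


def search(db, query, limit=10):
    """Return (path, snippet) pairs for an FTS5 query, best match first."""
    return db.execute(
        "SELECT path, snippet(nb_fts, 1, '[', ']', '...', 8) "
        "FROM nb_fts WHERE nb_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


hits = search(db, "search")
for path, snippet in hits:
    print(path, "|", snippet)
```

Keeping the path as an unindexed column means the search results carry their own link back to the source notebook, without the path text itself polluting matches.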
PS Other search engines to have a look at:
PPS updating lunr.js – thread: https://github.com/olivernn/lunr.js/issues/284, https://www.npmjs.com/package/lunr-mutable-indexes . Maybe also https://github.com/lucaong/minisearch