First Play With nbgallery

Having hacked together a bulk uploader for nbgallery and uploaded the TM351 notebooks to a test environment, I’m now in a position to start having a play with it.

All public notebooks are searchable, so how does the search fare?

The search box at the top right gets a little lost against the search results listing. It would be handy to at least print the search string (“Searching for: …”) at the top of the results list, if not to make the search box larger and move it somewhere more central. The search results themselves take the form of the name, description and tags of each hit (i.e. the notebook metadata), along with a fragment showing how the search terms appear in context within the notebook.

Some of my earlier experiments on notebook search here and here also show context.

A range of options are provided for ordering the results. Trending looks like it could be interesting (this is based on recent views, presumably), for example where students are searching for notebooks relevant to the current week’s study.

That said, we can also display notebooks by tag, so it’s easy enough to display notebooks associated with a particular week’s study if we tag notebooks by study week:

(One thing I noticed zooming out on the page to grab the above screenshot is that the font size of the notebook titles doesn’t seem to respond to the zoom level; it would probably be worth checking to see if there are other accessibility issues.)

If we click through on a result, we see a list of related notebooks followed by a preview of the notebook. (nbgallery strips out all cell outputs on upload, so no cell outputs are displayed.)
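
To get a feel for what goes missing in the preview, here's a minimal sketch of output stripping applied to a notebook loaded as a plain Python dict (nbformat 4 layout). This is not nbgallery's own code (nbgallery is written in Ruby); the `strip_outputs` function is just for illustration, and resetting the execution counts is my assumption about what "stripped" means.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            # Assumption: execution counts are reset along with the outputs
            cell["execution_count"] = None
    return cleaned


example = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

stripped = strip_outputs(example)
print(json.dumps(stripped["cells"][1], indent=2))
```

The markdown cell and the code source survive untouched; only the rendered results disappear, which is why the preview shows code but no charts or tables.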

To search through the preview, we can use a normal browser in-page search (ctrl/cmd-F).

A range of options are provided to support community activity around a notebook for logged in users, including the ability to “star” a notebook, provide feedback or add a comment:

Logged in users can also click on the notebook tags to edit them.

Via the Further options menu, users can view various notebook metrics, email a notebook, or propose a change request:

The metrics available include number of views, runs, stars and the edit history.

If comments have been provided, the number indicator by the comment flag shows how many comments have been received, although this only appears on the notebook page. There doesn’t appear to be an indicator of how many comments are associated with a notebook on the search results page, nor did I spot a general “recent comments” feed anywhere.

When you post a comment, there is no indication that you have done so and the form remains in place. You need to close it manually. (Hitting “Post Comment” again just pops up a “can’t do that” alert on the grounds that you’re trying to post a duplicate comment.)

The comments themselves look as if they are an ordered (rather than threaded) list. It also looks like any signed in user can edit anybody else’s comment?

Users who aren’t signed in can download a notebook, but not star it, comment on it, modify the tags etc.

When I tried to add feedback, I got an error:

I’m not sure if there are settings I need to tweak to address that?

Logged in users can also run a notebook from nbgallery via an associated notebook server. (I’d prefer it if the Run in Jupyter flash wasn’t displayed if there isn’t a linked notebook server available for the logged in user.) For example, running a notebook server on port 443 on the same host as nbgallery using the nbgallery notebook container:

docker run --rm -p 443:443 -e "NBGALLERY_URL=http://localhost:3000" -e "NBGALLERY_CONFIG_TOKEN=letmein" nbgallery/jupyter-alpine

starts a notebook server with the nbgallery extension pre-installed.

We can view the notebook server homepage on https://localhost:443 and log into it using the token-as-password letmein. Running the container in the way described above also gives permission for the nbgallery server running on http://localhost:3000 to open notebooks via the notebook server.

Within nbgallery itself, a logged in user can associate one or more Jupyter environments via the user menu:

Each environment is given a name and the URL of the associated notebook server (in this case, https://localhost:443):

When a notebook server is associated with a user, notebooks can be opened from nbgallery within the notebook server.

If we create a new notebook in the linked notebook server, we can upload it to nbgallery, adding a title, description and optional tags as in a manual notebook upload step:

If we modify a notebook that is linked to one in the gallery (that is, one that has been uploaded to the gallery or launched from the gallery), we can save a change to the gallery or submit a change request:

When uploading a new version, you can add tags but not additional comments such as a commit message:

Viewing the notebook details in nbgallery, we can see a summary of the change history:

We can also click through to a preview of each version of the notebook:

(The revision number doesn’t appear in the change history though, so it can be hard to reconcile a particular version with its appearance in the change history listing.)

A logged in user can make a change request to someone else’s notebooks by uploading a new version of them or by opening the notebook in the linked notebook server and submitting a change request:

When I submitted the change request, I got an error form in response, but it looks like the change request was made, as this listing of Change Requests from the user menu suggests:

An exclamation mark by the user menu also identifies that change requests are pending.

Viewing the change request provides a view over the current version of the notebook and the proposed changes. Notebooks can be viewed alongside each other or the diffs can be viewed:
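
For comparison, a rough local equivalent of the diff view can be knocked up with Python's standard `difflib`. This is only a sketch of the idea, not how nbgallery computes its diffs; the `cell_sources` and `notebook_diff` names are made up for the example, and it only compares cell sources (no metadata or outputs).

```python
import difflib


def cell_sources(notebook):
    """Flatten a notebook dict into a list of lines, with a marker line per cell."""
    lines = []
    for index, cell in enumerate(notebook.get("cells", [])):
        lines.append("## cell {} ({})".format(index, cell.get("cell_type")))
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        lines.extend(source.splitlines())
    return lines


def notebook_diff(current, proposed):
    """Return a unified diff (as a list of lines) between two notebook dicts."""
    return list(
        difflib.unified_diff(
            cell_sources(current),
            cell_sources(proposed),
            fromfile="current",
            tofile="proposed",
            lineterm="",
        )
    )


current = {"cells": [{"cell_type": "code", "source": ["x = 1\n", "print(x)"]}]}
proposed = {"cells": [{"cell_type": "code", "source": ["x = 2\n", "print(x)"]}]}

diff_lines = notebook_diff(current, proposed)
print("\n".join(diff_lines))
```

Diffing at the level of cell source keeps the output readable; diffing the raw .ipynb JSON tends to drown the real changes in metadata noise.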

The thumbs up/down indicators are used to accept or deny a change request, along with a brief comment:

When a change request is accepted, the changed notebook replaces the current version of the notebook and the change is logged in the change history. Denied change requests are recorded as such in the change requests list, with a link to the version of the notebook containing the unsuccessfully proposed changes:

If feedback was provided, a comment icon identifies its presence and pops up the feedback in a tooltip when hovered over.

Health stats for linked and run notebooks are supposed to be available, but I couldn’t get those to work (as far as the health stat reports were concerned, the notebooks were never run no matter how many times I ran them), so maybe I’m missing something there in the setup too? [UPDATE: health stats need a flag to be set (see the notebook instrumentation docs); specifically, -e NBGALLERY_ENABLE_INSTRUMENTATION=1 in the docker command line.]
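
Putting the container launch and the instrumentation flag together, here's a small sketch that assembles the docker command as an argument list, ready to hand to something like `subprocess.run`. The image name, port mapping and environment variables are the ones used in this post; the `jupyter_container_command` helper itself is hypothetical.

```python
import shlex


def jupyter_container_command(gallery_url, token, instrumentation=False):
    """Build the docker run arguments for the nbgallery notebook container."""
    env = {
        "NBGALLERY_URL": gallery_url,
        "NBGALLERY_CONFIG_TOKEN": token,
    }
    if instrumentation:
        # Flag needed for the health stats to be recorded
        env["NBGALLERY_ENABLE_INSTRUMENTATION"] = "1"

    command = ["docker", "run", "--rm", "-p", "443:443"]
    for name, value in env.items():
        command += ["-e", "{}={}".format(name, value)]
    command.append("nbgallery/jupyter-alpine")
    return command


command = jupyter_container_command(
    "http://localhost:3000", "letmein", instrumentation=True
)
# Print a copy-and-paste friendly version of the command
print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the command as a list rather than a single string means it can be passed straight to `subprocess.run(command)` without worrying about shell quoting.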

I’m not sure how well this would work for managing TM351 notebooks compared to our current GitHub workflow (which I should write up somewhere). The error responses (whether they’re valid or not) for change requests and feedback are confusing, and I’m not sure how the feedback is handled if and when it works. Not being able to spot new comments easily (unless I’m missing something) could be a bit of a pain. That said, the proof would be in the testing-through-use, so I’ll maybe give it a week or two’s trial with some of my own notebook workflows.

In terms of use with students, it could be useful to provide a version of nbgallery with notebooks runnable by students without them having to log in to it. It could also be useful if notebooks could be run ‘inline’ from the notebook preview pages, for example using something like ThebeLab or Voila, particularly if a particular Binderhub repo / config could be specified in metadata somewhere.

Bulk Jupyter Notebook Uploads to nbgallery Using Selenium

I’ve recently started looking at nbgallery [repo], “an enterprise Jupyter Notebook sharing and collaboration platform” written in Ruby. The gallery provides a range of tools, including:

  • a Solr powered notebook search engine;
  • a notebook “health check” (I haven’t tried this yet);
  • integration with Jupyter notebooks, so you can run notebooks (I haven’t tried this yet).

One thing that seems to be lacking is the ability to bulk upload files (for example, contained in a zip file). I haven’t spotted an API either, or a Python wrapper to provide a de facto API. This makes a proper test over lots of notebooks tricky…

UPDATE: it looks like a Python API for nbgallery is on the way… nbgallery/nbgallery-api-python

The notebook upload is a two step process.

The first step requires selection of a notebook, and a required acknowledgement of rights:

The second provides an opportunity to submit a required title and non-null description and a (repeated) rights acknowledgement:

The upload process utilises a multi-part form.

To upload a notebook, a user needs to be logged in.

Creating a new user requires an email confirmation step, which means you need to set up email server details in the docker-compose.yml file. I used my OU ones:

EMAIL_USERNAME: $OU_USERNAME
EMAIL_PASSWORD: $OU_PWD
EMAIL_DOMAIN: open.ac.uk
EMAIL_ADDRESS: ${OUCU}@open.ac.uk
EMAIL_DEFAULT_URL_OPTIONS_HOST: localhost:3000
EMAIL_SERVER: smtp.office365.com

My usual approach for automating this sort of thing would be to have a go with MechanicalSoup or mechanize, but on a quick first attempt using both of those, I couldn’t get the scraper to work.

Instead, I took the opportunity to have a play with Selenium With Python, a Python wrapper for the Selenium web testing framework. This provides a set of Python functions for automating the launching of a web-browser (Chrome, Safari, Firefox, etc) and the automated clicking of pages viewed within that automated browser.

The full script I used can be found here.

The initialisation looks like this:

from selenium import webdriver

#Selenium package includes several utilities
# for waiting until things are ready
#https://selenium-python.readthedocs.io/waits.html
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

#Allow the driver to poll the DOM for up to 10s when
# trying to find an element
driver.implicitly_wait(10)

#We might also want to explicitly define wait conditions
# on a particular element
wait = WebDriverWait(driver, 10)

driver.get("http://localhost:3000/")

The login function looks something like this:

def nbgallery_login(driver, wait, user, pwd):
    ''' Login to nbgallery.
        Return once the login dialogue has disappeared.
    '''

    driver.find_element(By.ID, "gearDropdown").click()

    element = driver.find_element(By.ID, "user_email")
    element.click()

    element.clear()
    element.send_keys(user)

    element = driver.find_element(By.ID, "user_password")
    element.clear()
    element.send_keys(pwd)
    element.click()

    driver.find_element(By.XPATH, "//input[@value='Login']").click()

The first form script looks like this:

    #path is full path to file
    if not path.endswith('.ipynb'):
        print('Not a notebook (.ipynb) file? [{}]'.format(path))
        return

    #Part 1

    element = wait.until(EC.element_to_be_clickable((By.ID, 'uploadModalButton')))
    element.click()

    driver.find_element(By.ID, "uploadFile").send_keys(path)
    driver.find_element(By.XPATH, '//*[@id="uploadFileForm"]/div[3]/div/div/label/input').click()
    driver.find_element(By.ID, "uploadFileSubmit").click()

And the script to handle the second part of the form looks like this:

    #Part 2
    element = driver.find_element(By.ID, "stageTitle")
    element.click()

    #Is there notebook metadata we can search for title?
    if not title:
        title = path.split('/')[-1].replace('.ipynb','')
    element.clear()
    element.send_keys(title)

    element = driver.find_element(By.ID, "stageDescription")
    element.click()

    #Is there notebook metadata we can search for description?
    #Any other notebook metadata we could make use of here?
    element.clear()
    #Description needs to be not null
    desc = 'No description.' if not desc else desc
    element.send_keys(desc)

    element = driver.find_element(By.ID, "stageTags-tokenfield")
    element.click()
    #time.sleep(1)

    #Handle various tagging styles
    #Is there notebook metadata we can search for tags?
    tags = '' if not tags else tags
    if isinstance(tags, list):
        tags=','.join(tags)
    tags = tags if tags.endswith(',') else tags+','

    element.clear()
    element.send_keys(tags) #need the final comma to set it?

    if private:
        driver.find_element(By.ID, "stagePrivate").click()

    driver.find_element(By.XPATH, '//*[@id="stageForm"]/div[9]/div/div/label/input').click()

    #Grab a reference to the current page before submitting, so that we
    # wait on the old page going stale rather than on the newly loaded one
    old_page = driver.find_element(By.TAG_NAME, 'html')
    driver.find_element(By.ID, "stageSubmit").click()

    #https://blog.codeship.com/get-selenium-to-wait-for-page-load/
    #Wait for new page to load
    wait.until(EC.staleness_of(old_page))
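
On the question raised in the code comments about finding a title and description in the notebook itself, one possible approach is to treat the first markdown heading as the title and the first non-heading line of that cell as the description. The sketch below works on a notebook loaded as a dict (for example via `json.load`); the `guess_title_and_description` helper and the heading convention are my assumptions, not anything nbgallery prescribes.

```python
def guess_title_and_description(notebook):
    """Guess (title, description) from the first markdown cell of a notebook dict.

    Either value is None if nothing suitable is found.
    """
    title = None
    description = None
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        for line in source.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                if title is None:
                    title = line.lstrip("#").strip()
            elif description is None:
                description = line
        break  # only look at the first markdown cell
    return title, description


example = {
    "cells": [
        {"cell_type": "code", "source": ["import pandas as pd"]},
        {
            "cell_type": "markdown",
            "source": ["# Week 1: Getting started\n", "\n", "A first look at pandas.\n"],
        },
    ]
}

title, description = guess_title_and_description(example)
print(title, "|", description)
```

Where the guess comes back as None, the filename and 'No description.' fallbacks used in the upload script still apply.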

Here’s how it plays out:

There’s still stuff that could be added — error trapping for duplicate notebooks, for example — but I think this is enough to let me upload a complete set of course notebooks and see how useful nbgallery is as a way of presenting notebooks.
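
As a sketch of how the bulk part might be driven, with some basic error trapping, the following walks a directory tree, skips checkpoint copies, and hands each notebook to an upload function, collecting failures rather than stopping at the first one. The `upload` callable is a stand-in for whatever wraps the two form-filling steps above (its signature here is hypothetical), so the demo uses a dummy in its place.

```python
import tempfile
from pathlib import Path


def find_notebooks(root):
    """Return sorted notebook paths under root, skipping checkpoint copies."""
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def bulk_upload(root, upload, tags=None):
    """Call upload(path, tags) for each notebook; return a list of failures."""
    failures = []
    for path in find_notebooks(root):
        try:
            # The file input in the upload form needs a full path
            upload(str(path.resolve()), tags)
        except Exception as err:  # e.g. a timeout when a duplicate is rejected
            failures.append((path, err))
    return failures


# Demo against a throwaway directory, with a dummy upload function
with tempfile.TemporaryDirectory() as tmp:
    week = Path(tmp) / "week1"
    (week / ".ipynb_checkpoints").mkdir(parents=True)
    (week / "a.ipynb").write_text("{}")
    (week / ".ipynb_checkpoints" / "a-checkpoint.ipynb").write_text("{}")
    (Path(tmp) / "notes.txt").write_text("not a notebook")

    uploaded = []
    failures = bulk_upload(tmp, lambda path, tags: uploaded.append(path), tags=["tm351"])

print(len(uploaded), "uploaded;", len(failures), "failed")
```

Returning the failures as (path, error) pairs makes it easy to retry just the notebooks that didn't go through.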

If it is, and I get the Jupyter notebook server integration working, then I wonder: would it be useable as a notebook navigator in the TM351 VM? It’d probably need really tight integration with the notebook server so that when notebooks are saved they are also committed to the gallery?