Fragment: Grabbing Screenshots of Jupyter Notebook Code Cell Outputs, Ish…

Or not completely, as the case may be…

A quick hack that packages up some code I was using to grab screenshots of styled pandas dataframes so that I could share them as images: iframe-shot uses browser automation to render HTML, whether returned via _repr_html_(), embedded in an IFrame when executing a Python code cell, or obtained some other way, and returns an image of it, either as a data URI or saved to a file.

from iframe_shot import IFrameShot

# Create a grabber object with access to a
# preloaded, Selenium-powered headless browser
grabber = IFrameShot(True)

# HTML string
html = "<html><body><h1>hello there</h1></body></html>"

# Render HTML in browser and grab screenshot
grabber.getHTMLPNG(html)

# Returns rendered data-uri PNG of screenshotted html
# To save as png and return filename, use:
# grabber.getHTMLPNG(html, embedded=False)

# Set html_out=FILEPATH to save the HTML to a file
# Set png_out=FILEPATH to save the image to a file with a specific filename
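
To get from an arbitrary notebook object to an HTML string that could be passed to getHTMLPNG(), one option is to call the object's _repr_html_() method directly. This is the hook IPython uses for rich HTML display, and pandas dataframes expose it too. The following is only a minimal sketch: the helper name html_from_object and the Badge class are made up for illustration and are not part of iframe-shot.

```python
from html import escape


def html_from_object(obj):
    """Return the HTML representation of obj, or None if it does not offer one."""
    repr_html = getattr(obj, "_repr_html_", None)
    if not callable(repr_html):
        return None
    return repr_html()


class Badge:
    """Hypothetical example class that knows how to display itself as HTML."""

    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        # The style is inlined so the HTML string is self-contained
        return (
            "<html><body>"
            f"<span style='color: red'>{escape(self.text)}</span>"
            "</body></html>"
        )


badge_html = html_from_object(Badge("hello there"))

# The string could then be handed to the grabber shown above:
# grabber.getHTMLPNG(badge_html)
```

Objects without a _repr_html_() method give None here, so a caller can decide whether to skip them or fall back to a plain text rendering.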

There are various issues with this:

  • if the style is not part of the HTML but instead, e.g., references styles set elsewhere in the notebook, or in a separate style file, the style won’t be rendered;
  • the approach uses browser automation, which adds several large dependencies.
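
One possible workaround for the first issue is to bundle the relevant CSS rules into the HTML before rendering it, so that the page is self-contained. The sketch below assumes you already have the CSS as a string; the function name make_self_contained and the sample table are illustrative only and are not part of iframe-shot.

```python
def make_self_contained(fragment, css):
    """Wrap an HTML fragment and its CSS rules in a standalone HTML page."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<style>\n{css}\n</style>\n"
        "</head>\n"
        f"<body>\n{fragment}\n</body>\n</html>"
    )


# Hypothetical style rules that would otherwise live elsewhere in the notebook
css = "table.demo td { border: 1px solid #999; padding: 4px; }"
fragment = '<table class="demo"><tr><td>1</td><td>2</td></tr></table>'

page = make_self_contained(fragment, css)

# The resulting page could then be rendered as before:
# grabber.getHTMLPNG(page)
```

This only helps where the styles can be identified and collected up front; styles injected by the notebook front end at display time would still be missed.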

It would be interesting to explore the extent to which something like html2canvas could be used to render cell output HTML onto a canvas element from which an image could be saved. (Hmm… could IPython do that?!)

By chance, another screenshot tool appeared in the last week or so (from which I stole the -shot bit of the name): Simon Willison’s shot-scraper. The tool uses Playwright and is handy for four main reasons:

  • it provides an easy way to grab a screenshot of a page;
  • it can provide a screenshot of part of a page, selected using CSS selectors;
  • it can be used to style and add simple overlays to the captured scene using JavaScript;
  • it can be used to scrape webpages using JavaScript and return the response as a JSON object.
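
As a rough illustration of the first three points, the following sketch builds a shot-scraper command line from Python and, optionally, runs it. It assumes shot-scraper is installed and on the path, and it uses the -o (output file), -s (CSS selector) and --javascript options; the function names build_shot_command and take_shot are invented here and are not part of shot-scraper.

```python
import shutil
import subprocess


def build_shot_command(url, output, selector=None, javascript=None):
    """Assemble a shot-scraper command as a list of arguments."""
    cmd = ["shot-scraper", url, "-o", output]
    if selector:
        # Only capture the element matched by this CSS selector
        cmd += ["-s", selector]
    if javascript:
        # Run this JavaScript in the page before the screenshot is taken
        cmd += ["--javascript", javascript]
    return cmd


def take_shot(url, output, **kwargs):
    """Run shot-scraper, failing early if the tool cannot be found."""
    if shutil.which("shot-scraper") is None:
        raise RuntimeError("shot-scraper does not appear to be installed")
    subprocess.run(build_shot_command(url, output, **kwargs), check=True)


# Build (but do not run) a command to capture just the page heading
cmd = build_shot_command("https://example.com/", "heading.png", selector="h1")
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the command before anything is launched.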

I did wonder if I could use it to grab a screenshot of an executed Jupyter notebook output cell, or an output cell in an HTML rendered notebook, but offhand I couldn’t find a way to wrangle a cell ID or unique path to a desired cell output using just CSS selectors. If JavaScript were available as a way of selecting DOM elements, and not just CSS selectors, then I think it should be possible to use shot-scraper to grab screen captures of notebook code cell outputs from run notebooks, viewed either as rendered notebooks from a served URL or as exported HTML.
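
A different route, which sidesteps the selector problem rather than solving it, would be to read the saved .ipynb file directly and pull out the HTML output of a chosen code cell, then pass that HTML to a renderer. This is not the shot-scraper approach described above. The sketch assumes the notebook was saved with its outputs and follows the nbformat version 4 JSON layout, in which rich outputs carry a "data" dictionary keyed by MIME type; the function names are illustrative.

```python
import json


def load_notebook(path):
    """Read a .ipynb file, which is plain JSON, into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def html_outputs(notebook, code_cell_index):
    """Return the text/html outputs of the nth code cell (counting from 0)."""
    code_cells = [c for c in notebook["cells"] if c["cell_type"] == "code"]
    cell = code_cells[code_cell_index]
    found = []
    for output in cell.get("outputs", []):
        # Stream and error outputs have no "data" key, so they are skipped
        html = output.get("data", {}).get("text/html")
        if html is None:
            continue
        # Multi-line strings may be stored as a list of lines
        found.append("".join(html) if isinstance(html, list) else html)
    return found


# A tiny in-memory notebook standing in for a real file
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {
            "cell_type": "code",
            "source": ["df"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]},
                {
                    "output_type": "execute_result",
                    "data": {"text/html": ["<table>\n", "</table>"]},
                },
            ],
        },
    ]
}

fragments = html_outputs(demo, 0)
```

As with the earlier caveat about styling, any HTML recovered this way will only look right when rendered if its styles travel with it.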

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...