Automating the Production of Student Software Guides With Annotated Screenshots Using Playwright and Jupyter Notebooks

With a bit of luck, we’ll be updating software for our databases course starting in Sept/Oct to use JupyterLab. It’s a bit late for settling this, but I find the adrenaline of going live, and the interaction with pathfinder students in particular, to be really invigorating (a bit like the old days of residential schools) so here’s hoping…

[I started this post last night, closed the laptop, opened it this morning and did 75 mins of work. The sleep broke WordPress’ autobackup, so when I accidentally backswiped the page, I seem to have lost over an hour’s work. Which is to say, this post was originally better written, but now it’s rushed. F**k you, WordPress.]

If we are to do the update, one thing I need to do is update the software guide. The software guide looks a bit like this:

Which is to say:

  • process text describing what to do;
  • a (possibly annotated) screenshot showing what to do, or the outcome of doing it;
  • more process text, etc.

Maintaining this sort of documentation can be a faff:

  • if the styling of the target website/application changes, but the structure and process steps remain the same, the screenshots drift from actuality, even if they are still functionally correct;
  • if the application structure or process steps change, then the documentation properly breaks. This can be hard to check for unless you rerun everything and manually test things before each presentation at least.

So is there a better way?

There is, and it’s another example of why it’s useful for folk to learn to code, which is to say, have some sort of understanding about how the web works and how to construct simple computational scripts.

Regular readers will know that I’ve tinkered on and off with the Selenium browser automation toolkit in the past, using it for scraping as well as automating repetitive manual tasks (such as downloading scores of student scripts from one exams system one at a time and grabbing a 2FA code for each of them for use in submitting marks into a second system). But playwright, Microsoft’s browser automation tool (freely available), seems to be what all the cool kids are using now, so I thought I’d try that.

The playwright app itself is a node app, which also makes me twitchy, because node is a pain to install. But the playwright Python package, which is installable from PyPI (along with a pytest-playwright plugin for pytest) and which wraps playwright, bundles its own version of node, which makes things much simpler. (See Simon Willison’s Bundling binary tools in Python wheels for a discussion of this; for any edtechies out there, this is a really useful pattern, because if students have Python installed, you can use it as a route to deploy other things…)

Just as a brief aside, playwright is also wrapped by @simonw’s shot-scraper command line tool, which makes it dead easy to grab screenshots. For example, we can grab a screenshot of the OU home page as simply as typing shot-scraper followed by the home page URL.

Note that because the session runs in a new, incognito, browser, we get the cookie notice.

We can also grab a screenshot of just a particular, located CSS element: shot-scraper -s '#ou-org-footer'. See the shot-scraper docs for many more examples.
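If shot-scraper is to be run as part of a scripted rebuild rather than by hand, one option is to call it from Python via subprocess. The sketch below assumes shot-scraper is installed and on the PATH and uses only its -o (output file), -s (selector) and --padding options; the helper names are hypothetical:

```python
import subprocess
from typing import List, Optional


def shot_scraper_command(url: str, output: str,
                         selector: Optional[str] = None,
                         padding: int = 0) -> List[str]:
    """Build the argument list for a single shot-scraper call."""
    cmd = ["shot-scraper", url, "-o", output]
    if selector:
        # Only capture the area covered by this CSS selector
        cmd += ["-s", selector]
        if padding:
            # Extra pixels of context around the selected element
            cmd += ["--padding", str(padding)]
    return cmd


def run_shot(url: str, output: str, **kwargs) -> None:
    """Run shot-scraper; check=True raises CalledProcessError on failure."""
    subprocess.run(shot_scraper_command(url, output, **kwargs), check=True)
```

For example, run_shot(url, "footer.png", selector="#ou-org-footer", padding=10) would save a padded shot of just the footer, and a list of such calls could regenerate every image in a guide in one go.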

In many cases, screenshots in OU course materials that no longer match reality are identified and reported by students. Tools like shot-scraper, or playwright driven from pytest, can both be used as part of a documentation testing suite where we create gold master images and then, as required, test “current” screenshots to see if they match the distributed originals.
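As a minimal sketch of the gold master idea, assuming the current and gold master screenshots are PNG files with matching names in two hypothetical directories, a byte-for-byte comparison needs nothing beyond the standard library. It is deliberately strict: any rendering difference at all counts as a change, so a real test suite would more likely use a pixel-level comparison with some tolerance.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def matches_gold_master(current, gold) -> bool:
    """True if the current screenshot is byte-for-byte identical to the gold master."""
    return file_digest(current) == file_digest(gold)


def changed_screenshots(current_dir, gold_dir, pattern: str = "*.png") -> List[str]:
    """Names of gold master images whose current version differs or is missing."""
    changed = []
    for gold in sorted(Path(gold_dir).glob(pattern)):
        current = Path(current_dir) / gold.name
        if not current.exists() or not matches_gold_master(current, gold):
            changed.append(gold.name)
    return changed
```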

But back to the creation or reflowing of documentation.

As well as command line control using shot-scraper, we can also drive playwright from Python code, executed synchronously, as in a pytest test, or asynchronously, as in the case of Python running inside a Jupyter notebook. This is what I spent yesterday exploring, in particular, whether we could create reproducible documentation in the sense of something that has the form text, screenshot, text, … and looks like this:

but is actually created by something that has the form text, code, (code output), text, … and looks like this:

And as you’ve probably guessed, we can.
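Which playwright API to use follows from where the code runs. A Jupyter kernel already runs its own asyncio event loop (in current ipykernel versions, at least), and playwright’s sync API refuses to run inside a running loop, so in a notebook it’s the async API, with top-level await, that’s needed. The hypothetical helpers below just make that check explicit:

```python
import asyncio


def inside_running_event_loop() -> bool:
    """True if called from code running inside an asyncio event loop,
    as notebook cells typically are."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running: plain script or pytest test
        return False
    return True


def playwright_api_module() -> str:
    """Name the playwright API module suited to the current context."""
    if inside_running_event_loop():
        return "playwright.async_api"
    return "playwright.sync_api"
```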

For some reason, my local version of nbconvert now seems to default to no-input display settings, and I can’t see why (there are no nbconvert settings files that I can see). Anyone got any ideas how/why this might be happening? The only way I can work around it atm is to explicitly enable the display of the inputs: jupyter nbconvert --to pdf --TemplateExporter.exclude_output_prompt=True --TemplateExporter.exclude_input=False --TemplateExporter.exclude_input_prompt=False notebook.ipynb.
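To make that workaround repeatable when reflowing a guide, the same flags can be passed from a Python build script. A sketch, assuming jupyter is on the PATH; the helper names are hypothetical and the flags are exactly those in the command above:

```python
import subprocess
from typing import List


def nbconvert_command(notebook: str, to: str = "pdf",
                      show_input: bool = True) -> List[str]:
    """Build a jupyter nbconvert call that sets input display explicitly,
    rather than relying on whatever defaults are in force locally."""
    exclude = "False" if show_input else "True"
    return [
        "jupyter", "nbconvert", "--to", to,
        # Drop the Out[n]: prompts in either case
        "--TemplateExporter.exclude_output_prompt=True",
        f"--TemplateExporter.exclude_input={exclude}",
        f"--TemplateExporter.exclude_input_prompt={exclude}",
        notebook,
    ]


def convert(notebook: str, **kwargs) -> None:
    """Run nbconvert; check=True raises CalledProcessError on failure."""
    subprocess.run(nbconvert_command(notebook, **kwargs), check=True)
```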

It’s worth noting a couple of things here:

  • if we reflow the document to generate new output screenshots, they will be a faithful representation of what the screen looks like at the time the document is reflowed. So if the visual styling (and nothing else) has been updated, we can capture the latest version;
  • the code should ideally follow the text description of the process steps, so if the code stops working for some reason, that might suggest the process has changed and so the text might well be broken too.

Generating the automation code requires knowledge of a couple of things:

  • how to write the code itself: the documentation helps here (eg in respect of how to wait for things, how to grab screenshots, how to move to things so they are in focus when you take a screenshot), but a recipe / FAQ / crib sheet would also be really handy;
  • how to find locators.

In terms of finding locators, one way is to do it manually, using browser developer tools to inspect elements and grab locators for the required elements. But another, simpler way is to record a set of playwright steps using the playwright codegen URL command line tool: simply walk through (click through) the process you want to document in the automatically launched interactive browser, and the corresponding playwright steps are recorded.
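The recorder can also be launched through the Python package itself, which uses the node driver bundled with it. A sketch, assuming the playwright Python package is installed: --target python-async asks codegen for async Python (the form needed in a notebook) and -o saves the recorded script to a file, while the helper names and default filename are hypothetical:

```python
import subprocess
import sys
from typing import List


def codegen_command(url: str, output: str = "recorded.py",
                    target: str = "python-async") -> List[str]:
    """Build a `playwright codegen` call run via this Python interpreter."""
    return [sys.executable, "-m", "playwright", "codegen",
            "--target", target, "-o", output, url]


def record(url: str, **kwargs) -> None:
    """Open the recording browser and the Playwright Inspector."""
    subprocess.run(codegen_command(url, **kwargs), check=True)
```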

With the script recorded, you can then use it as the basis for your screenshot-generating reproducible documentation.
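One way to structure that, sketched here with hypothetical names and assuming an async playwright page object, is a list of (process text, action) pairs run in order with a screenshot after each. Numbered filenames mean a reflow overwrites the old images, and a failing step is reported by its description, which is the cue that the written process may have drifted:

```python
import re
from pathlib import Path
from typing import Awaitable, Callable, List, Tuple

# A step pairs the process text with an async action on a playwright page
Step = Tuple[str, Callable[[object], Awaitable[object]]]


def slugify(text: str) -> str:
    """Turn step text into a filename-safe slug."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


async def document_steps(page, steps: List[Step], outdir: str = "images") -> List[str]:
    """Run each step in turn and screenshot the page after it."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (description, action) in enumerate(steps, start=1):
        try:
            await action(page)
        except Exception as err:
            # Name the failing step: the written process may be out of date
            raise RuntimeError(f"Step {i} failed: {description}") from err
        # Numbered names keep order and are overwritten on each reflow
        path = str(Path(outdir) / f"{i:02d}-{slugify(description)}.png")
        await page.screenshot(path=path)
        paths.append(path)
    return paths
```

Recorded codegen lines drop in as lambdas, e.g. ("Accept the cookies", lambda p: p.locator("#cassie_accept_all_pre_banner").click()), with the whole list run in a notebook cell as paths = await document_steps(page, steps).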

For any interested OU internal readers, there is an example software guide generated using playwright in MS Teams > Jupyter notebook working group team. I’m happy to share any scripts etc I come up with, and am interested to see other examples of using browser automation to test and generate documentation etc.

Referring back to the original software guide, we note that some screenshots have annotations. Another nice feature of playwright is that we can inject Javascript into the controlled browser and evaluate it in that context, which means we can inject Javascript that tinkers with the DOM.
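As a minimal sketch of that injection, assuming an async playwright page object: page.evaluate() runs a JavaScript function in the page and passes it an argument (here an array), so an element can be outlined and later restored. The helper names and the data-highlight-id attribute are hypothetical; an outline is used rather than a border because outlines don’t take up space, so the page layout doesn’t shift:

```python
import secrets

# JavaScript run in the page. Arguments from Python arrive as one array.
# The element's previous inline outline is stashed so it can be restored.
ADD_HIGHLIGHT_JS = """([selector, border, id]) => {
    const el = document.querySelector(selector);
    if (!el) return false;
    el.dataset.prevOutline = el.style.outline;
    el.dataset.highlightId = id;
    el.style.outline = border;
    return true;
}"""

REMOVE_HIGHLIGHT_JS = """(id) => {
    const el = document.querySelector(`[data-highlight-id="${id}"]`);
    if (!el) return false;
    el.style.outline = el.dataset.prevOutline || '';
    delete el.dataset.prevOutline;
    delete el.dataset.highlightId;
    return true;
}"""


async def add_highlight(page, selector: str, border: str = "5px solid red") -> str:
    """Outline the first element matching selector; return an id for removal."""
    highlight_id = secrets.token_hex(4)
    found = await page.evaluate(ADD_HIGHLIGHT_JS, [selector, border, highlight_id])
    if not found:
        raise ValueError(f"No element matches {selector!r}")
    return highlight_id


async def remove_highlight(page, highlight_id: str) -> None:
    """Restore the element's original outline, leaving the page as it was."""
    await page.evaluate(REMOVE_HIGHLIGHT_JS, highlight_id)
```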

The shot-scraper tool has a nifty function that will accept a set of locators for a particular page and, using Javascript injection, create a div surrounding them that can then be used to grab a screenshot of the area covering those locators.

It’s trivial to amend that function to add a border round an element:

# Based on @simonw's shot-scraper
import json
import secrets
import textwrap

from IPython.display import Image, display

def _selector_javascript(selectors, selectors_all,
                         padding=0, border='none', margin=0):
    selector_to_shoot = "shot-scraper-{}".format(secrets.token_hex(8))
    selector_javascript = textwrap.dedent(
        """
    new Promise(takeShot => {
        let padding = %s;
        let margin = %s;
        let border = %s;
        let minTop = 100000000;
        let minLeft = 100000000;
        let maxBottom = 0;
        let maxRight = 0;
        let els = %s.map(s => document.querySelector(s));
        // Add the --selector-all elements
        %s.map(s => els.push(...document.querySelectorAll(s)));
        els.forEach(el => {
            let rect = el.getBoundingClientRect();
            if (rect.top < minTop) {
                minTop = rect.top;
            }
            if (rect.left < minLeft) {
                minLeft = rect.left;
            }
            if (rect.bottom > maxBottom) {
                maxBottom = rect.bottom;
            }
            if (rect.right > maxRight) {
                maxRight = rect.right;
            }
        });
        // Adjust them based on scroll position
        let top = minTop + window.scrollY;
        let bottom = maxBottom + window.scrollY;
        let left = minLeft + window.scrollX;
        let right = maxRight + window.scrollX;
        // Apply padding
        // TO DO - apply margin?
        top = top - padding;
        bottom = bottom + padding;
        left = left - padding;
        right = right + padding;
        // Overlay div covering (and optionally bordering) the area
        let div = document.createElement('div');
        div.style.position = 'absolute';
        div.style.top = top + 'px';
        div.style.left = left + 'px';
        div.style.width = (right - left) + 'px';
        div.style.height = (bottom - top) + 'px';
        div.style.border = border;
        div.style.margin = margin + 'px';
        div.setAttribute('id', %s);
        document.body.appendChild(div);
        // Give the page a moment to render before resolving
        setTimeout(() => {
            takeShot();
        }, 300);
    });
    """
        % (
            padding, margin, json.dumps(border),
            json.dumps(selectors or []),
            json.dumps(selectors_all or []),
            json.dumps(selector_to_shoot),
        )
    )
    return selector_javascript, "#" + selector_to_shoot

async def screenshot_bounded_selectors(page, selectors=None,
                                       selectors_all=None, padding=0,
                                       border='none', margin=0,
                                       path="screenshot.png",
                                       full_page=False, show=True):
    selector_to_shoot = None
    if selectors or selectors_all:
        selector_javascript, selector_to_shoot = _selector_javascript(
                selectors, selectors_all, padding, border, margin
            )
        # Evaluate javascript: adds the bounding div to the page
        await page.evaluate(selector_javascript)
    if (selectors or selectors_all) and not full_page:
        # Grab screenshot of just the bounded area
        await page.locator(selector_to_shoot).screenshot(path=path)
    else:
        # Bring the highlighted area into view, then grab the viewport
        if selector_to_shoot:
            await page.locator(selector_to_shoot).hover()
        await page.screenshot(path=path)

    # Should we include a step to
    # then remove the selector_to_shoot and leave the page as it was
    if show:
        display(Image(path))
    return path
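The geometry that the injected JavaScript works out is simple enough to restate in Python, purely as an illustration (in the real thing the calculation happens in the browser): take the union of the elements’ viewport-relative bounding rectangles, add the scroll offset to get page coordinates, then grow the box by the padding. The Box type here is a hypothetical stand-in for a DOMRect:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float


def enclosing_box(rects: Iterable[Box], scroll_x: float = 0,
                  scroll_y: float = 0, padding: float = 0) -> Box:
    """Smallest box covering all rects, in page coordinates, plus padding.

    Rects are viewport-relative (as from getBoundingClientRect()), so the
    scroll offset is added to turn them into page coordinates.
    """
    rects = list(rects)
    if not rects:
        raise ValueError("need at least one rectangle")
    return Box(
        left=min(r.left for r in rects) + scroll_x - padding,
        top=min(r.top for r in rects) + scroll_y - padding,
        right=max(r.right for r in rects) + scroll_x + padding,
        bottom=max(r.bottom for r in rects) + scroll_y + padding,
    )
```

The overlay div’s width and height are then right - left and bottom - top.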

Then we can use a simple script to grab a screenshot with a highlighted area:

from playwright.async_api import async_playwright

# Top-level await like this works in a Jupyter notebook cell,
# which already runs inside an asyncio event loop
playwright = await async_playwright().start()

# If we run with headless=False, playwright will launch a
# visible browser we can track progress in
browser = await playwright.chromium.launch(headless=False)

# Create a reference to a browser page
page = await browser.new_page()

# The page we want to visit
PAGE = ""

# And a locator for the "Accept all cookies" button
cookie_id = "#cassie_accept_all_pre_banner"

# Load the page
await page.goto(PAGE)

# Accept the cookies
await page.locator(cookie_id).click()

# The selectors we want to screenshot
selectors = []
selectors_all = [".int-grid3"]

# Grab the screenshot
await screenshot_bounded_selectors(page, selectors, selectors_all,
                                   border='5px solid red')

# When finished, tidy up:
# await browser.close()
# await playwright.stop()

Or we can just screenshot and highlight the element of interest:

Simon has an open shot-scraper issue on adding things like arrows to the screenshot, so this is obviously something that might repay a bit more exploration.
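Pending anything like that in shot-scraper, one crude stand-in, sketched here and not what the issue proposes, is to inject a text label that starts with an arrow character next to the element of interest. It assumes an async playwright page; the names, offsets and styling are hypothetical:

```python
import secrets

# JavaScript run in the page: place a label beginning with a left-arrow
# character just to the right of the first matching element.
# The Python arguments arrive as a single array and are destructured.
CALLOUT_JS = """([selector, text, id]) => {
    const el = document.querySelector(selector);
    if (!el) return false;
    const r = el.getBoundingClientRect();
    const label = document.createElement('div');
    label.id = id;
    label.textContent = '\u2190 ' + text;
    Object.assign(label.style, {
        position: 'absolute',
        left: (r.right + window.scrollX + 8) + 'px',
        top: (r.top + window.scrollY) + 'px',
        background: 'yellow',
        border: '2px solid red',
        padding: '2px 6px',
        font: 'bold 14px sans-serif',
        zIndex: '2147483647'
    });
    document.body.appendChild(label);
    return true;
}"""


async def add_callout(page, selector: str, text: str) -> str:
    """Add a callout label beside an element; return a CSS selector for it."""
    label_id = "callout-" + secrets.token_hex(4)
    if not await page.evaluate(CALLOUT_JS, [selector, text, label_id]):
        raise ValueError(f"No element matches {selector!r}")
    return "#" + label_id
```

The returned selector could then be listed alongside the target element’s selector when bounding the screenshot area, so the grabbed image covers both.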

I note that the Jupyter accessibility docs have a section on the DOM / locator structure of the JupyterLab UI that include highlighted screenshots annotated, I think, using drawio. It might be interesting to try to replicate / automatically generate those, using playwright?

Finally, it’s worth noting that there is another playwright-based tool, galata, that provides a set of high-level tools for controlling JupyterLab and scripting JupyterLab actions (this will be the subject of another post). However, galata is currently a developer-only tool, in that it expects to run against a wide open Jupyter environment (no authentication), and only a wide open JupyterLab environment. It does this by overriding playwright’s .goto() method to expect a particular JupyterLab locator, which means that if you want to test a deployment that sits behind the Jupyter authenticator (which is the recommended and default way of running a Jupyter server), or you want to go through some browser steps involving arbitrary web pages, you can’t. (I have opened an issue regarding at least getting through Jupyter authentication here and a related Jupyter discourse discussion here.) What would be useful in the general case would be a trick to use a generic playwright script to automate steps up to a JupyterLab page and then hand over browser state to a galata script. But I don’t know how to “just” do that. Which is why I say this is a developer tool, and as such is hostile to non-developer users, who, for example, might only be able to run scripts against a hosted server accessed through multiple sorts of institutional and Jupyter-based authentication. The current setup also means it’s not possible to use galata out of the can for testing a deployment. Devs only, n’est-ce pas?
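One possible building block for that hand-over, untested with galata, is playwright’s storage state: a browser context’s cookies and local storage can be saved to a JSON file after logging in with ordinary playwright steps, then loaded into a new context. The sketch assumes an async playwright browser object and a login coroutine of your own; the function names and the state.json filename are hypothetical:

```python
from typing import Awaitable, Callable


async def save_logged_in_state(browser, url: str,
                               login: Callable[[object], Awaitable[object]],
                               path: str = "state.json") -> str:
    """Log in using ordinary playwright steps, then save the auth state."""
    context = await browser.new_context()
    try:
        page = await context.new_page()
        await page.goto(url)
        # Your own steps, e.g. an institutional login, then the Jupyter login form
        await login(page)
        # Writes the context's cookies and local storage to a JSON file
        await context.storage_state(path=path)
    finally:
        await context.close()
    return path


async def open_with_state(browser, url: str, path: str = "state.json"):
    """Open url in a fresh context that starts from the saved state."""
    context = await browser.new_context(storage_state=path)
    page = await context.new_page()
    await page.goto(url)
    return page
```

The saved file holds live session cookies, so it is best treated like a password.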

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
