Figure Descriptions: Accessibility and Equivalent Experience

One of the things that course teams at the OU work hard at is making materials accessible. This isn’t just because as an educational institution there is a legal obligation to do so: it’s built into the institutional DNA.

In the course of a module production meeting yesterday we had a short workshop on writing figure descriptions – long text descriptions that can provide a student using a screen reader with an equivalent experience of a figure included in the course text, often in the form of a narrated description of the salient points in the image. For readers with a sight impairment, the long description may be read out by a screen reader to provide an alternative to looking at the figure directly.

There is an art to writing text descriptions that I’m not sure I’ve ever mastered – I guess I should read the guidance produced by the UK Association for Accessible Formats (which I think draws on OU expertise).

There are some rules of thumb that I do try to bear in mind though (please feel free to correct me in the comments if you take issue with any of these): you don’t want to duplicate what’s in the text that refers to the figure, nor the figure caption. Where the sighted reader is expected to read something for themselves from the figure, you don’t want the figure description to describe the answer as well as the figure. Where the exercise is critiquing a figure, or learning how to read it or extract salient points from it in order to critique it (for example, in the case of an art history course), the long description shouldn’t give away the reading, highlight the salient point specifically, or turn into critique. Generally, the figure description shouldn’t add interpretation to the figure – that comes from the reading of the figure (or the figure description). You also need to take care about the extent to which the figure description describes the semantics of the figure; for example, identifying a decision symbol in a flow chart as such (a semantic description) compared to describing it as a diamond (which you might want to do when teaching someone how to read a flow chart for the first time).

Sometimes, a figure appears in a document that doesn’t appear to need much of a description at all; for example, an image that appears purely as an illustration, such as a portrait of a historical figure whose brief biographical details appear in the main text. In such a case, it could be argued that a figure description is not really required, or if it is, it should be limited to something along the lines of “A portrait of X”. (A quick way in to generating the description for such an image might also be to refer to any search terms used to discover the image by the original author if it was discovered using a search tool…)

But if the purpose of the image is to break up the flow of the text on the printed page, give the reader a visual break in the text and a brief respite from reading, or help set the atmosphere of the reading, then what should an equivalent experience be for the student accessing the materials via a screen reader? For example, in the workshop I wondered whether the figure description should provide a poetic description to evoke the same sentiment that the author who included the image intended to evoke with it? (A similar trick applied in text is to include a quotation at the start of a section, or as an aside, for example.) A claim could be made that this provides information over and above that contained in the image, but if the aim is to provide an equivalent experience then isn’t this legitimate?

Similarly, if an image is used to lighten the presentation of the text on the page by introducing a break in the text, essentially including an area of white space, how might a light break be introduced into the audio description of the text? By changing the text-to-speech voice, perhaps, or its intonation? On the other hand, an interlude might break a sense of flow if the student is engaged with the academic text and doesn’t want the interruption of an aside?

Another example, again taken from the workshop, concerns the use of photographic imagery that may be intended to evoke a memory of a particular news event, perhaps through the use of an iconic image. In this case, the purpose of the imagery may be emotionally evocative, as well as illustrative; rather than providing a very simple, literal, figure description, could we go further in trying to provide an equivalent experience? For example, could we use a sound effect, perhaps overlaid with a recording of a news headline, either taken from a contemporary radio news source (perhaps headed with a leading audio ident likely to be familiar to the listener to bring to mind a news bulletin) or written as a description and then recorded by a voice actor especially to evoke a memory of the event?

In other words, to provide an equivalent experience, should we consider treating the figure description (which will be read by a screen reader) as a radio programme style fill where a sound effect, rather than just a text description, may be more appropriate? For a “poetic aside” intended to substitute for a visual break, should we use a prerecorded, human voice audio clip, rather than triggering the screen reader, even if with a different voice to break up the (audio) flow?

Just as an aside, I note that long descriptions are required for our electronic materials, but I’m not sure how they are handled when materials are produced for print? The OU used to record human readers reading the course texts, with the recordings delivered to students as audio versions of those texts, presumably with the human reader also inserting the figure descriptions at an appropriate point. I wonder, did the person recording the audio version of the text use a different tone of voice for the different sorts of figures to break up the rest of the recorded text? I also wonder if, rather than human reader voiced recordings, the OU now delivers electronic copies of documents that must be converted to speech by students’ own text-to-speech applications? In which case, how do the text-to-speech versions compare to the human recorded versions in terms of student experience and understanding?

A couple of other things I wondered about related to descriptions of “annotated” diagrams on the one hand, and descriptions of figures that could be “written” (with the figures generated from the written description) on the other.

In the first case, consider the example of an annotation of a piece of Python code, such as the following clumsy annotation of a Python function.


In this case, the figure is annotated (not very clearly!) in such a way as to help a sighted reader parse the visual structure of a piece of code – there are semantics in the visual structure. So what’s the equivalent experience for an unsighted or visually impaired student using a screen reader? Such a student is likely to experience the code through a screen reader which will have its own idiosyncratic way of reading the code statements aloud. (There are also tools that can be used to annotate Python functions to make them clearer.) For an unsighted reader using a screen reader, an equivalent experience is presumably an audio annotated version of the audio description of the code that the student might reasonably expect their screen reader to create from that piece of code?
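As a very rough sketch of what a structural, "spoken-style" annotation of a function might look like, the following uses Python's standard `ast` module to walk a piece of code and list its nested statements. The `mean` function, the `LABELS` wording and the `outline` helper are all made up for illustration; this says nothing about how any actual screen reader reads code.

```python
import ast

# Hypothetical example function (not the one from the annotated figure).
SOURCE = (
    "def mean(values, default=0):\n"
    "    if not values:\n"
    "        return default\n"
    "    return sum(values) / len(values)\n"
)

# Spoken labels for a few statement types; others fall back to the class name.
LABELS = {
    ast.If: "if statement",
    ast.Return: "return statement",
    ast.For: "for loop",
    ast.While: "while loop",
    ast.Assign: "assignment",
    ast.Expr: "expression",
}


def outline(node, depth=0):
    """Return spoken-style lines describing the statement structure under node."""
    lines = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.FunctionDef):
            params = ", ".join(a.arg for a in child.args.args) or "none"
            lines.append(f"level {depth}: function {child.name}, parameters {params}")
            lines.extend(outline(child, depth + 1))
        elif isinstance(child, ast.stmt):
            label = LABELS.get(type(child), type(child).__name__)
            lines.append(f"level {depth}: {label} on line {child.lineno}")
            lines.extend(outline(child, depth + 1))
    return lines


for line in outline(ast.parse(SOURCE)):
    print(line)
```

The nesting level stands in for the indentation that a sighted reader picks up visually. This sketch does not distinguish an `else` branch from the body of the `if`, which a more careful version would need to do.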

When it comes to diagrams that can be generated from a formally written description of them (such as some of the examples I’ve previously described here), where the figure itself can be automatically generated from the formal text description, could we also generate a long text description automatically? A couple of issues arise here relating to our expectations of the sighted reader for whom the figure was originally created (assuming that the materials are originally created with a sighted reader in mind), such as whether we expect them to be able to extract some sort of meaning or insight from the figure, for example.
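To make the idea a little more concrete, here is a minimal sketch that generates a text description from a formal description of a flow chart. The chart itself, the `NODES`/`EDGES` format and the two vocabularies are invented for illustration. The same structure can be read out either semantically ("decision") or literally ("diamond"), depending on what the sighted reader is expected to get from the figure.

```python
# Hypothetical flow chart: node id -> (kind, label), plus (from, to, label) edges.
NODES = {
    "start": ("terminal", "Start"),
    "check": ("decision", "Is the kettle full?"),
    "fill": ("process", "Fill the kettle"),
    "boil": ("process", "Boil the water"),
}
EDGES = [
    ("start", "check", ""),
    ("check", "boil", "yes"),
    ("check", "fill", "no"),
    ("fill", "boil", ""),
]

# Two vocabularies for the same symbols: what they mean vs. what they look like.
SEMANTIC = {"terminal": "start or end point", "decision": "decision", "process": "process step"}
LITERAL = {"terminal": "rounded box", "decision": "diamond", "process": "rectangle"}


def describe(nodes, edges, vocabulary):
    """Build a plain-text description of the chart using the given vocabulary."""
    sentences = []
    for kind, label in nodes.values():
        sentences.append(f'A {vocabulary[kind]} labelled "{label}".')
    for src, dst, edge_label in edges:
        via = f' labelled "{edge_label}"' if edge_label else ""
        sentences.append(
            f'An arrow{via} leads from "{nodes[src][1]}" to "{nodes[dst][1]}".'
        )
    return " ".join(sentences)


print(describe(NODES, EDGES, SEMANTIC))
print(describe(NODES, EDGES, LITERAL))
```

Neither reading interprets the chart; they only differ in whether the symbols are named by role or by shape.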

As an example, consider a figure that represents a statistical chart. The construction of such charts can be written using formulations such as Leland Wilkinson’s Grammar of Graphics, operationalised by Hadley Wickham in the ggplot2 R library (or the Yhat Python clone, ggplot). I started exploring how we could generate a literal reading of a chart constructed using ggplot (or via a comment, in matplotlib) in First Thoughts on Automatically Generating Accessible Text Descriptions of ggplot Charts in R; a more semantic reading would come from generating text about the analysis of the chart, or describing “insight” generated from it, as things like Automated Insights’ Wordsmith try to do (eg as a Tableau plugin).
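A literal reading of this kind might be sketched as follows. This is not ggplot2 or matplotlib code: `ChartSpec` is a hypothetical, much-simplified stand-in for a grammar-of-graphics style chart definition, and the data is made up purely for the example.

```python
from dataclasses import dataclass


@dataclass
class ChartSpec:
    """Hypothetical, simplified stand-in for a grammar-of-graphics chart definition."""
    geom: str         # mark type, e.g. "point", "line", "bar"
    x: str            # column mapped to the x-axis
    y: str            # column mapped to the y-axis
    colour: str = ""  # optional column mapped to colour


GEOM_NAMES = {"point": "scatter plot", "line": "line chart", "bar": "bar chart"}


def literal_description(spec, data):
    """Describe how the chart is constructed, without interpreting what it shows."""
    kind = GEOM_NAMES.get(spec.geom, f"chart drawn with {spec.geom} marks")
    xs = [row[spec.x] for row in data]
    ys = [row[spec.y] for row in data]
    parts = [
        f"A {kind} of {spec.y} against {spec.x}.",
        f"Values of {spec.x} on the x-axis range from {min(xs)} to {max(xs)}.",
        f"Values of {spec.y} on the y-axis range from {min(ys)} to {max(ys)}.",
    ]
    if spec.colour:
        groups = sorted({str(row[spec.colour]) for row in data})
        parts.append(f"Colour distinguishes {spec.colour}: {', '.join(groups)}.")
    parts.append(f"{len(data)} observations are plotted.")
    return " ".join(parts)


# Made-up data, for illustration only.
DATA = [
    {"year": 2014, "students": 120, "module": "A"},
    {"year": 2015, "students": 150, "module": "A"},
    {"year": 2014, "students": 90, "module": "B"},
    {"year": 2015, "students": 110, "module": "B"},
]
spec = ChartSpec(geom="line", x="year", y="students", colour="module")
print(literal_description(spec, DATA))
```

The description deliberately stops at how the chart is built (marks, axes, groupings). Saying that one group is growing faster than another would be the "semantic" reading, and would need analysis of the data rather than just the chart definition.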

Something else I picked up on in passing was that work is ongoing in making maths notation expressed in MathJax accessible via a browser using screen readers (this project maybe? MathJax a11y tool). By the by, it’s perhaps worth noting that MathJax is used to render LaTeX expressions from Jupyter markdown cells, as well as output cells of a Jupyter notebook. In addition, symbolic maths expressions described using sympy are rendered using MathJax. I haven’t tested maths expressions in the notebooks with the simple jupyter-a11y extension though (demo; I suspect it’s just the LaTeX that gets read aloud – I haven’t tested it…) It would be interesting to hear how well maths expressions rendered in Jupyter notebooks are supported by screen reader tools.
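To illustrate the difference between reading out markup and reading out structure, here is a toy sketch that turns a simple arithmetic expression into a "spoken" form by walking its parse tree. It uses Python's own expression syntax and the standard `ast` module rather than LaTeX, MathJax or sympy, and the wording ("open bracket", "to the power of") is my own assumption, not how any real screen reader or MathJax extension speaks maths.

```python
import ast

# Spoken names and binding strength for a handful of binary operators.
WORDS = {ast.Add: "plus", ast.Sub: "minus", ast.Mult: "times",
         ast.Div: "divided by", ast.Pow: "to the power of"}
PREC = {ast.Add: 1, ast.Sub: 1, ast.Mult: 2, ast.Div: 2, ast.Pow: 3}


def operand(child, parent_prec, strict):
    """Speak an operand, adding spoken brackets where grouping would be lost."""
    text = speak(child)
    if isinstance(child, ast.BinOp):
        prec = PREC[type(child.op)]
        if prec < parent_prec or (strict and prec == parent_prec):
            return f"open bracket {text} close bracket"
    return text


def speak(node):
    """Return a spoken-style string for a small subset of arithmetic expressions."""
    if isinstance(node, ast.Expression):
        return speak(node.body)
    if isinstance(node, ast.BinOp):
        prec = PREC[type(node.op)]
        right_assoc = isinstance(node.op, ast.Pow)
        left = operand(node.left, prec, strict=right_assoc)
        right = operand(node.right, prec, strict=not right_assoc)
        return f"{left} {WORDS[type(node.op)]} {right}"
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return f"minus {speak(node.operand)}"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return str(node.value)
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


print(speak(ast.parse("(x + 1) ** 2", mode="eval")))
```

Even with spoken brackets, longer expressions quickly become hard to follow by ear, which is presumably part of what the accessibility work on maths rendering has to address.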

Finally, I realise that I am writing from my own biased perspective and I don’t have a good model in my head for how our unsighted students access our materials – which is my fault. Apologies if any offence is caused – please feel free to correct any misunderstandings or bad assumptions on my part via the comments.

PS one thing I looked for last night but couldn’t find was any pages containing example HTML pages along with audio recordings of how a user using a screen reader might hear the page read out. I know I should really install some screen reader tools and try them out for myself, but it would take me time to learn them. Seeing examples of variously complex pages – including ones containing maths expressions, figure descriptions, and so on – and how they sound when rendered using a screen reader as used by an expert user, would be a useful resource I think?

PPS Of course, when it comes to figure captions for illustrative imagery, we could always give the bots a go; for example, I notice this just appeared on the Google Research blog: Show and Tell: image captioning open sourced in TensorFlow.