Sustainability of the ipynb/nbformat Document Format

[Reposted from Jupyter discourse site just so I have my own copy…]

By chance, I just came across the Library of Congress Sustainability of Digital Formats that has a schema for cataloguing digital document formats as well as a set of criteria against which the sustainability of digital documents formats can be tracked.

Sustainability factors include:

There are also fields associated with Quality and functionality factors which for text documents include: normal rendering, integrity of document structure, integrity of layout and display, support for mathematics/formulae etc., functionality beyond normal rendering.

I note that .ipynb is not currently on the list of mentioned formats. Records for geojson and Rdata provide a steer for the sorts of thing that an ipynb record might initially contain. (I also note that Python / Jupyter kernels don’t have a standardised serialisation format akin to R’s .rdata workspace serialisation (dill goes some way to towards this, maybe also [](, maybe also data-vault). I also appreciate this is complicated by the wide variety of custom objects created by Python packages, but just as IPython supports rich display integration through __repr__ methods (see also the notes at the end of the IPython.display.display docs for a description of what methods are supported), it might also be timely to start thinking about __serialise__ methods (they may already exist; there is so much I don’t know about Python! I do know that things don’t always work though; eg Python’s json package in my py envt breaks when trying to serialise numpy.int64 objects…).)

There is now a significant number of notebooks on eg Github, as well as signs that notebooks are starting to be used as a publishing format (or at least, as a feedstock for publication, whether rendered using nbconvert or more elaborate tools such as Jupyter-book, nbsphinx, ipypublish, or howsoever).

I wonder if it would be timely to review the ipynb document format in terms of its sustainability and whether getting it included on the LoC list (or other appropriate forum) would be an appropriate thing to do for several reasons, including:

  • signals the existence of the document format to the Library / sustainability community in terms the are familiar with and may be able to help with;
  • help identify how nbformat should not develop in future in ways that might affect its sustainability as a format;
  • help identify things that might help improve its sustainability;
  • help inform workflows and behaviours regarding how eg cell metadata / tags feed into sustainability.

If .ipynb is to remain the core data-structure for representing Jupyter executable documents and their outputs, and as other third party applications (such as VSCode, or Google Colab) start to support the format, and if it doesn’t already exist, I also wonder whether a simple RFC style document (cf. the GeoJSON RFC) would be appropriate alongside the slightly less formal nbformat documentation as a formal statement of the document standard?

Interoperability is driven by convention as well as standard, and if we are going to see external services developing around Jupyter from individuals or organisations not previously associated with the Jupyter community, but offering interoperability with it, there needs to be a clear basis for what the standards are. This includes not just the base ipynb format, but also messaging and state protocols.

The nbformat format description docs pages seem to act as the normative reference work for the .ipynb standard, and I assume the Jupyter client – messaging docs are the normative reference for the client-server messaging? For ipywidgets, the widget messaging protocol and widget model state docs in the ipywidgets repo appear to provide the normative reference.

See also: this thing on Managing computational notebooks from a preservation perspective and the FAIR Principles (Findable – Accessible – Interoperable – Re-usable).

And on the question of being able to rebuild eg Dockerised environments, here and here.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: