A Quick Look at the Quarto Pandoc Publishing System and Visual Markdown Editor

Picking up on Noting: the Quarto Rich Authoring Environment for Generative Texts, here’s a quick review of a few things that jumped out at me about Quarto, “[a] scientific and technical publishing system built on Pandoc”.

This is of particular interest to me at the moment given I’ve recently been lobbying internally for Pandoc support for OU-XML. (I say “recently”, but I’ve been idly pitching it once every few months or so for at least a couple of years. ;-)

The main objectives of the pitch were:

  • To create a conversion utility to support the conversion between OU-XML and Jupyter notebook (ipynb) formats, to support Computing, Mathematics, and Statistics modules currently in production
  • To create a conversion utility for converting between OU-XML and markdown, to support the production of FutureLearn courses and Microcredentials

The folk over at RStudio have obviously seen value in Pandoc as a tool for generating a wide range of potential output document types, including PDF, MS Word, MS PowerPoint, HTML slides and EPUB ebooks, as well as good ol’ HTML. And they have not unreasonably taken the view that using Markdown as a base document format, albeit with certain extensions, simplifies the tooling on the editor side. That and the fact that they have been using Pandoc since forever for knitr and bookdown publishing workflows, and were already on the way to developing a rich visual editor as part of RStudio.

For me, the ability to point to a visual editor already hooked into Pandoc is a timely piece of evidence I can appeal to as to why Pandoc makes sense as part of an (authoring and) publishing system: even if we only use it for generating previews, if Pandoc can be used to export the created content into OU-XML, we can continue to use our current publishing processes to render content published to students. (Or we might choose to explore Pandoc as a generator of output format documents too.) And as more modules show an interest in using Jupyter notebooks, we could also use OU-XML, via Pandoc conversion, to gold-master notebooks if anyone really felt the need to. We could also author VLE content using notebooks and convert it to OU-XML, or transform legacy content into notebooks. I also still think we can use things like notebooks to author content across the curriculum for OU-XML mediated publication via the VLE and other “traditional” channels (see various subject specific demos in Subject Matter Authoring Using Jupyter Notebooks, for example, or some notes on generating reusable educational assets particularly in the “generative production” context). And so on.

The TL;DR as far as Quarto goes is that we can consider the Quarto publishing system in three parts:

  • the visual editor: visual markdown editor that allows toggling between visual preview and source markdown view;
  • the publishing tools (the quarto CLI): command line tools for rendering and previewing documents, and for converting between the new .qmd document format and the Jupyter notebook .ipynb format;
  • the .qmd file format.

The visual editor is currently only available bundled into an RStudio preview (there is a visual editor in the current RStudio release, but maybe not as feature rich?). The editor can be used to edit a new project type, a Quarto Project.

This seems to allow you to specify a connection to an automatically discovered Jupyter kernel. I’m pretty sure this is the first time I’ve seen RStudio starting to integrate with a Jupyter server, and it represents another sign that Jupyter mediated remote code execution environments are starting to become part of a wider ecosystem independently of the Jupyter user interfaces such as Jupyter classic notebook, JupyterLab and RetroLab. (See my glossary which explains in part why Jupyter is NOT notebooks...)

As with a lot of emerging tech, the UI is the bit that takes forever to get right even if the rest of the technology stack is mostly working and completely usable by folk who are prepared to get their hands dirty and put up with the occasional snafu. That said, I didn’t actually get much further with the RStudio preview Quarto editor – I wanted to try out Python projects but trying to run any Python code just caused an error in trying to find a Python environment, suggesting I install miniconda. I checked other RStudio projects where I had successfully been running Python code cells, and they were now all borked too. So the way the new RStudio preview is using reticulate seems to be broken, for me at least (I’ve since rolled back to the latest official RStudio release and my old py code chunks are happily running again…). In retrospect, I should have played with the editor features and tried to run some R code before I reverted. It’s probably also worth waiting for things to become a little more stable, and for the standalone editor to appear, before I play with it again, because that’s probably the sort of tool that would be most likely to be considered as appropriate internally. In the meantime, the current official RStudio release does have a basic visual editor, so visual editing of Rmd files etc. is possible, if not full integration with the quarto publishing tools.

Suffice to say that from the docs it looks as if there are quite a few UI and underlying framework tools to support technical writing (equations, figure references, citations, footnotes etc.) and more general content authoring (tables, lists, spell-checking, non-breaking spaces and so on), along with executable code, the embedding of generated code outputs, and the display of non-executable code.

The publishing tools are presented as a set of CLI tools that include conversion tools and previewing tools. Out of the can, R, Python and Observable JavaScript code chunks are supported (I’m not sure about the extent to which arbitrary language code can be executed via an appropriate Jupyter kernel?).

The command line tools include:

  • quarto create-project PROJDIR: create a new project in the current or specified directory;
  • quarto render DIRECTORY: render files in the current or specified directory; the --execute flag will execute code blocks before rendering the output; where possible, knitr cache and jupyter cache may be used to cache cell outputs;
  • quarto preview DIRECTORY: render and preview files in the current or stated directory using a live-reloading development server. The preview view will update if any updates to project files are saved;
  • quarto convert FILENAME: convert a Jupyter .ipynb document to .qmd format and vice versa.
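
If you wanted to drive those commands from a script rather than by hand, a minimal Python sketch might look like the following. The subcommand names and the --execute flag are taken from the list above; the `quarto_command` and `run_quarto` helpers, and the example paths, are made up for illustration and assume the quarto CLI is installed and on the PATH.

```python
import shlex
import subprocess

# Subcommands taken from the list above; anything else is rejected.
QUARTO_SUBCOMMANDS = {"create-project", "render", "preview", "convert"}


def quarto_command(subcommand, target=None, execute=False):
    """Build the argument list for one of the quarto commands listed above."""
    if subcommand not in QUARTO_SUBCOMMANDS:
        raise ValueError(f"unknown quarto subcommand: {subcommand}")
    cmd = ["quarto", subcommand]
    if target is not None:
        cmd.append(str(target))
    if execute:
        # --execute is only described above in the context of `quarto render`
        if subcommand != "render":
            raise ValueError("--execute is only used with render in this sketch")
        cmd.append("--execute")
    return cmd


def run_quarto(subcommand, target=None, execute=False):
    """Run the command; needs the quarto CLI to be installed and on the PATH."""
    return subprocess.run(quarto_command(subcommand, target, execute), check=True)


if __name__ == "__main__":
    # Print the commands rather than running them (hypothetical paths)
    print(shlex.join(quarto_command("render", "my-project", execute=True)))
    print(shlex.join(quarto_command("convert", "notebook.ipynb")))
```

Keeping the command construction separate from the `subprocess` call means the argument lists can be checked without having quarto installed.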

Under the hood, there seems to be yet another yet another markdown format: .qmd. This format resembles an earlier yet another markdown format, .Rmd, also managed by RStudio, but tries to abstract away from the original R basis of .Rmd. The minor differences perhaps also make it a little easier to handle Pandoc attributes or bring it more into line with Pandoc Markdown. Some of the notable points about the specifics of the Markdown flavour are summarised here.

I don’t know if there is a formal description of the .Rmd or .qmd formats anywhere (I struggle to call them “standards” because I can’t find a reference standard for them; the best reference for .Rmd appears to be the rstudio/rmarkdown repo, but I didn’t spot even an .Rmd validator there?) so it’s not immediately obvious how easy it might be to map between something like MyST and .qmd. Support for .qmd within Jupytext looks to be well on the way (eg see this issue) as a result of integrating the quarto conversion tools into jupytext. This strikes me as a bit risky: the .qmd “standard” is under the control of RStudio rather than an open community process, and the quarto conversion tool is also controlled by RStudio, albeit from code in a public repo with a GPL licence.

I’ll do a proper comparison of .qmd, Jupytext markdown and MyST in another post somewhen soon… In part, I’m waiting for a PR to appear that offers .qmd support in Jupytext just in case there are minor tweaks that result from getting that PR through.

From a quick play with the quarto conversion tools to convert a notebook tagged with cell tags, cell level metadata seems to be handled by special comment lines (with a #| prefix) at the start of a code chunk.
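
As a rough illustration of what pulling those comment lines back out might involve, here is a minimal Python sketch. It is not how the quarto tools do it: the `split_cell_options` helper is made up, the option values are kept as raw strings rather than parsed as YAML, and the example chunk is purely illustrative.

```python
def split_cell_options(chunk_source, prefix="#|"):
    """Split a code chunk into ({option: raw value}, remaining code).

    Only leading lines that start with the prefix are treated as options;
    values are kept as raw strings rather than being parsed as YAML.
    """
    options = {}
    lines = chunk_source.splitlines()
    index = 0
    while index < len(lines) and lines[index].lstrip().startswith(prefix):
        body = lines[index].lstrip()[len(prefix):].strip()
        key, sep, value = body.partition(":")
        if sep:  # skip option lines that do not have a "key: value" shape
            options[key.strip()] = value.strip()
        index += 1
    return options, "\n".join(lines[index:])


# Illustrative chunk: two option lines followed by ordinary code
chunk = "#| echo: false\n#| tags: [parameters]\nalpha = 0.1\nprint(alpha)"
options, code = split_cell_options(chunk)
print(options)
print(code)
```

A real converter would presumably hand the option lines to a YAML parser; keeping the values as strings here just avoids a third-party dependency.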

In the jupytext issue thread, there’s also a comment regarding things such as “re-writ[ing] the [.qmd] cell option YAML into tags (e.g. #| echo: false needs to become tags: ["remove_cell"])”. This immediately flags to me that we could very quickly start getting a combinatorial explosion in the number of ways that “sort of” equivalent tags and markup need to be transformed across different markup styles. For example, I’ve already started hacking together string’n’glue tools to convert the tags I use to trigger extension powered styling in classic notebooks to MyST markup that works with Jupyter Book, and I also noted in this issue that a JupyterLab extension that supposedly replicates a classic notebook extension I use (collapsible headings) uses a different metadata convention to the original extension.

A screenshot from the quarto docs also shows how the YAML header information needs to be captured in the form of a raw code cell at the start of a notebook. This makes the notebook look a little bit ragged when viewed in a Jupyter notebook editing UI and would perhaps be cleaner if it could be hidden as notebook level metadata. However, the YAML header is also familiar e.g. as part of Jupytext markdown and MyST frontmatter. Hopefully, there won’t be too many differences or inconsistencies or discontinuities between these various elements and any conversion required will be quite obvious.
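
To make the raw cell idea concrete, here is a minimal Python sketch that builds a notebook structure with the YAML header sitting in a raw cell at the top. It uses plain dictionaries shaped like the nbformat 4 JSON rather than the nbformat library, it has not been validated against the notebook schema, and the `notebook_with_yaml_header` helper and the header fields are invented for the example.

```python
import json


def notebook_with_yaml_header(yaml_lines, code_source):
    """Return a minimal nbformat-4 style dict with a raw YAML header cell first."""
    header = "---\n" + "\n".join(yaml_lines) + "\n---"
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # The YAML header lives in a raw cell, so it shows up in the editing UI
            {"cell_type": "raw", "metadata": {}, "source": header},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": code_source,
            },
        ],
    }


# Hypothetical header fields, for illustration only
nb = notebook_with_yaml_header(["title: Demo notebook", "format: html"], "print('hello')")
print(nb["cells"][0]["source"])

# The structure serialises to JSON, as an .ipynb file would
ipynb_text = json.dumps(nb, indent=1)
```

Because the header is an ordinary cell rather than notebook level metadata, it is visible and editable in any notebook UI, which is exactly what makes the notebook look a bit ragged.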

The format is also compliant with the approach used by Papermill to support launching parameterised notebooks, and adopts the same convention for specifying notebook parameters.
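
As I understand it, the Papermill convention is to mark the code cell that holds default parameter values with a `parameters` cell tag. A minimal Python sketch of spotting such cells in a notebook's JSON structure might look like this; the `find_parameter_cells` helper and the example notebook are invented for illustration, and nothing here calls Papermill itself.

```python
def find_parameter_cells(notebook, tag="parameters"):
    """Return the indexes of code cells that carry the given cell tag."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code"
        and tag in cell.get("metadata", {}).get("tags", [])
    ]


# Illustrative notebook structure (plain dicts shaped like the .ipynb JSON)
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Report"},
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "alpha = 0.1\nratio = 0.1",
        },
        {"cell_type": "code", "metadata": {}, "source": "print(alpha * ratio)"},
    ]
}

print(find_parameter_cells(notebook))
```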

There is a discussion ongoing about supporting .qmd in Jupytext, although it all feels quite ad hoc, and issues surrounding things like how to handle extensible style concepts (hidden cells and cell outputs, admonitions etc.) seem out of scope of that discussion.

Certainly, there doesn’t yet seem to be a clear leader, let alone a winner, in terms of one true standard to rule them all for marking up rich/high level/interactive presentation style features in executable markdown docs. (My personal preference is probably starting to tend towards MyST and it’ll be interesting to see if a quarto editor gets uptake and drives adoption of .qmd.)

On my to do list is a proper comparison of how the Quarto “callout” markup maps onto MyST/Jupyter Book “admonitions” for example.

That said, as long as lossless mappings between formats are supported using something like Jupytext, I guess it’s not a blocking issue if different tools prefer different native formats? However, it may be that jupytext needs a plugin mechanism so that folk can plug in their own conversion tools to convert particular tag formats or conventions used by one publishing framework such as Jupyter Book to work with another such as nbsphinx, quarto, knitr or stencila.
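
To give a feel for what such a plugin mechanism could look like, here is a minimal Python sketch of a converter registry keyed on (source convention, target convention) pairs. None of this is a Jupytext API: the registry, the decorator and the convention names are all hypothetical, and the one registered mapping simply reuses the echo: false to remove_cell example quoted from the issue thread.

```python
# Hypothetical registry: (source convention, target convention) -> converter
TAG_CONVERTERS = {}


def register_converter(source, target):
    """Decorator that registers a converter for one (source, target) pair."""
    def decorator(func):
        TAG_CONVERTERS[(source, target)] = func
        return func
    return decorator


@register_converter("qmd-options", "notebook-tags")
def qmd_options_to_tags(options):
    # Mapping taken from the example quoted from the jupytext issue thread
    tags = []
    if options.get("echo") == "false":
        tags.append("remove_cell")
    return tags


def convert_tags(source, target, payload):
    """Look up and apply a converter, failing loudly if none is registered."""
    try:
        converter = TAG_CONVERTERS[(source, target)]
    except KeyError:
        raise LookupError(f"no converter registered for {source} -> {target}") from None
    return converter(payload)


print(convert_tags("qmd-options", "notebook-tags", {"echo": "false"}))
```

With N conventions there are up to N × (N − 1) directed pairs to cover, which is the combinatorial explosion in miniature; a registry at least makes the gaps explicit rather than hiding them in string’n’glue scripts.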

PS My OU-XML/Pandoc proposal (copies of the proposal in its various incarnations available on request) seems to have got bogged down by loads of meetings and stage gate processing so it may or may not happen. I’m happy to help work through the technical issues, but not waste more time restating arguments about why it’s a good idea and rewriting the proposal again and again to include elaborate plans, goals and objectives that will probably need reviewing at first contact with the detail. We should just try to build the thing and it’ll either work losslessly or it won’t; and if it won’t, it’ll work usefully or it won’t. And in the double won’t case, we’ll have learned more about OU-XML as a format, OU-XML as it’s actually used to mark up modules, and learned something about Pandoc, so we’ll have a net win of learning something anyway (the project stream it was pitched into was called Test and Learn… Unless, of course, any work is put out to external consultants or contractors, in which case we’ll continue to not learn internally about contemporary tooling, whether or not we adopt it. Go figure…).

PPS In the meantime, I keep meaning to revisit my hacky OU-XML XSLT tools to see how much OU-XML structure I can force into MyST or Jupytext-markdown format (and/or maybe even the new Quarto markdown format, .qmd). And given pandoc tools work with those formats, it’ll give a hacky way round the edges to give some sort of crappy as-if/maybe-works-a-bit conversion from OU-XML into notebook format, if not the reverse. The reverse bit (md2OU-XML) is where the institution gets value in terms of being able to use content authored in notebook / RStudio / Quarto like environments in OU workflows. The forward path (OU-XML2md) allows us to get content out of the gold-master format and into useful formats and environments ex- of the official publishing route. I now realise why folk are reluctant to engage: they don’t want to lose content into arbitrary formats and publishing environments (the forward path), and they don’t want folk authoring using arbitrary authoring tools (the reverse path) that might not create valid OU-XML (not that the schema is easy to find to validate files…). And they don’t realise that even if it only half works, it might still be quicker than, and even just as effective as, the shonky processes we already have in place. And it might also act as a driver to moving towards a simpler document format, which I seem to recall was mooted in the last few years. [For ref, internal folk can see the current OU-XML tag structure here.]

PPPS In passing, I should probably also do a review at some point of the Curvenote editor, which builds on ProseMirror and provides a rich browser based editor. (I don’t think there’s a free standing demo, but you can try it out via the Curvenote collaboration website which offers notebook file sharing and commenting via a browser extension.)

The Stencila editor is probably also worth revisiting, though to my mind this is now way too complex for me to try out. A March 2021 blog post suggested that Stencila integration with the Libero XML editor would start to offer the ability to “encode code chunks and expressions as JATS XML, … allow[ing] on-the-fly conversion of executable formats such as Jupyter Notebooks and R Markdown, [so that they could] be opened within Libero Editor”. An Electron based desktop app was also promised, but I’ve not seen any sign of it as yet (the original Stencila app that I had high hopes for was deprecated years ago). UPDATE: in progress here.

And finally, whilst on the topic of XML editors, I’ve just started exploring this XML editing extension for VS Code: redhat-developer/vscode-xml.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...