Quandary… To Pandoc or Not? (Yet…)

Whilst listening in, via Skype, on the School meeting yesterday, I treated it as radio and also started tinkering with an XSLT converter for transforming OU-XML to something I can get into a Jupyter notebook form. (If anyone can point me to official OU XSLT transformers for OU-XML, that’d be really useful…)

I’m 15 years out of XSLT, so I’ve started with an easy converter into HTML that is most of the way there now for common OU-XML elements, as well as one that will convert into a markdown format supported by Jupytext, which would allow me to go OU-XML-md-ipynb. I also wonder if a direct OU-XML-ipynb (JSON) route might be a useful exercise.

But then I started wondering… would it make more sense to try to get it into Pandoc? Pandoc recently announced Jupyter notebook/ipynb support as a native converter, so what are the routes in and out of pandoc?

Poking around, it seems that pandoc represents things internally using its own AST (abstract syntax tree). Pandoc filters let you write your own code for manipulating documents represented using the AST, sitting between pandoc's readers and writers, which looks like one way of converting documents into whatever format you want. There are a couple of Python packages that support writing pandoc filters: pandocfilters, which includes examples, and panflute (docs), which has a separate examples repo; there’s also this handy overview of Technical Writing with Pandoc and Panflute.
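To get a feel for what a filter actually does, here's a minimal sketch that uses only the Python standard library rather than pandocfilters or panflute. It assumes the serialised AST is a JSON object with a "blocks" list whose elements are dicts with a "t" (type) key and an optional "c" (content) key; the function names (walk, caps, run_filter, main) are my own, not part of any pandoc package.

```python
import json
import sys


def walk(node, action):
    """Recursively apply `action` to every element dict in a pandoc JSON AST."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replaced = action(node)
            if replaced is not None:
                # The action supplied a replacement element; use it as-is.
                return replaced
        return {key: walk(value, action) for key, value in node.items()}
    # Strings, numbers and None are returned unchanged.
    return node


def caps(element):
    """Upper-case the text of Str elements; return None to leave others alone."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return None


def run_filter(raw_json, action=caps):
    """Parse a serialised AST, transform its blocks, and re-serialise it."""
    doc = json.loads(raw_json)
    doc["blocks"] = walk(doc["blocks"], action)
    return json.dumps(doc)


def main():
    """Filter entry point: JSON AST in on stdin, JSON AST out on stdout."""
    sys.stdout.write(run_filter(sys.stdin.read()))
```

Saved as a script that calls main() at the end, something like this should be usable via pandoc's --filter option, though I haven't checked it against every AST element type.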

So that’s the first question: can I write a filter to generate a valid OU-XML document? OU-XML probably has some structural elements that are not matched by pandoc AST elements, but can these be encoded somehow as extensions to the AST, or represented as text elements in documents produced by pandoc that could be post-processed into OU-XML elements?
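As a rough sketch of what generating XML from the AST might look like, the following walks a parsed JSON AST and builds an XML tree with the standard library. The element names used here (Item, Title, Paragraph, Unsupported) are placeholders I've assumed for illustration, not a checked rendering of the OU-XML schema; anything the sketch doesn't recognise is emitted as a marker element that could be post-processed later.

```python
import xml.etree.ElementTree as ET


def inline_text(inlines):
    """Flatten a list of pandoc inline elements into plain text."""
    parts = []
    for item in inlines:
        kind = item["t"]
        if kind == "Str":
            parts.append(item["c"])
        elif kind in ("Space", "SoftBreak"):
            parts.append(" ")
        elif kind in ("Emph", "Strong"):
            # Styling is dropped in this sketch; only the text is kept.
            parts.append(inline_text(item["c"]))
    return "".join(parts)


def ast_to_xml(doc):
    """Map a parsed pandoc JSON AST onto a (hypothetical) OU-XML-like tree."""
    root = ET.Element("Item")  # assumed root element name
    for block in doc["blocks"]:
        if block["t"] == "Header":
            _level, _attr, inlines = block["c"]
            ET.SubElement(root, "Title").text = inline_text(inlines)
        elif block["t"] == "Para":
            ET.SubElement(root, "Paragraph").text = inline_text(block["c"])
        else:
            # Leave a marker for block types with no obvious counterpart.
            ET.SubElement(root, "Unsupported", {"type": block["t"]})
    return ET.tostring(root, encoding="unicode")
```

Whether the output would validate as OU-XML is a separate question; the point is just that the mapping is a straightforward tree walk, with the awkward cases flagged rather than silently dropped.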

Going the other way, it seems that pandoc can ingest a JSON format that serialises the Pandoc AST structure, so if I can convert OU-XML into that then it would make life a lot easier for generating a wide range of output document formats from OU-XML.
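Going in this direction, a first pass might look something like the sketch below, which turns a toy XML fragment into the JSON shape pandoc reads. The tag names (Title, Paragraph) are assumed stand-ins for real OU-XML elements, every title is treated as a level 1 header, and the api_version default is a guess that would need to match whatever the installed pandoc expects.

```python
import json
import xml.etree.ElementTree as ET


def text_to_inlines(text):
    """Split plain text into pandoc Str elements separated by Space elements."""
    inlines = []
    for i, word in enumerate(text.split()):
        if i:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return inlines


def xml_to_pandoc_json(xml_string, api_version=(1, 22)):
    """Convert a simple XML fragment into a serialised pandoc AST (JSON)."""
    root = ET.fromstring(xml_string)
    blocks = []
    for el in root.iter():  # document order, including the root itself
        text = (el.text or "").strip()
        if el.tag == "Title":
            # Header content is [level, [id, classes, key-value pairs], inlines].
            blocks.append({"t": "Header", "c": [1, ["", [], []], text_to_inlines(text)]})
        elif el.tag == "Paragraph":
            blocks.append({"t": "Para", "c": text_to_inlines(text)})
    return json.dumps({
        "pandoc-api-version": list(api_version),  # assumed; check your pandoc
        "meta": {},
        "blocks": blocks,
    })
```

The resulting JSON could then be handed to pandoc with the json input format (-f json) and whatever output format is wanted, ipynb included. Nested and mixed content is ignored here, which is where most of the real work would be.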

The AST is documented here; we can also output documents as serialised AST/JSON by using the json output format (pandoc -t json), which could provide a useful crib…

So here’s the quandary… do I spend the rest of the morning finishing off my hack XSLT converter, or do I switch track and try to go down the pandoc route? Hmmm… maybe I should finish what I started: it’ll give me a bit more XSLT practice and should result in enough of an approximation of OU-XML content in notebooks that we can start to see whether that sort of conversion even makes sense.

PS For other possible routes to OU-XML, eg from a Jupyter notebook via an nbconvert template, see Initial Notes… Jupyter Notebook To OU-XML.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...