Jupyter Notebooks as Part of a Publishing System – “Executable” Inline Maths and Music Notations

One of the books I’m reading at the moment is Michael Hiltzik’s Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age (my copy is second hand, ex-library stock…), a history of the lab that was birthplace to Ethernet and the laser printer, as well as many of the computer user interactions we take for granted today. One thing I hadn’t fully appreciated was Xerox’s interest in publishing systems, which is in part what put it in mind for this post. The chapter I just finished reading tells of their invention of a modeless, WYSIWYG word processor, something that would be less hostile than the mode based editors of the time (I like the joke about accidentally entering command mode and typing edit – e: select entire document, d: delete selection, i: insert, t: the letter inserted. Oops – you just replaced your document with the letter t).

It must have been a tremendously exciting time there, having to invent the tools you wanted to use because they didn’t exist yet (some may say that’s still the case, but in a different way now, I think: we have many more building blocks at our disposal). But it’s still an exciting time, because while a lot of stuff has been invented, there are still ways of making it easier to work with, still ways of working the technology into our workflows more sensibly, and still many, many ways of using different bits of tech in combination with each other to get what feels like much more than we might reasonably expect from considering them as a set of separate parts, piled together.

One of the places this exploration could – should – take place is in education. Whilst in HE we often talk down tools in favour of concepts, introducing new tools to students provides one way of exporting ideas, embodied as tools, into wider society. Tools like Jupyter notebooks, for example.

The more I use Jupyter notebooks, the more I see their potential not just as a tool for reproducible research, but also as a general purpose computational workbench and a powerful authoring medium.

Enlightened publishers such as O’Reilly seem to have got on board with using interactive notebooks in a publishing context (for example, Embracing Jupyter Notebooks at O’Reilly), and colleges such as Bryn Mawr in the US keep coming up with all manner of interesting ways of using notebooks in a course context. (If you know of other great – or even not so great – use case examples in publishing or education, please let me know via the comments to this post.) But I still get the feeling that many other people don’t get it.

“Initially the reaction to the concept [of Gypsy, the GUI powered word processor that was to become part of the Ginn publishing system] was ‘You’re going to have to drag me kicking and screaming,'” Mott recalled. “But everyone who sat in front of that system and used it, to a person, was a convert within an hour.”
Michael Hiltzik, Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age, p210

For example, when writing computing related documents, the ability to show a line of code alongside its output – generated automatically by executing the code and inserted automatically into the document – means that “helpful corrections” to code examples by an over-zealous editor go out of the window. The human hand should go nowhere near the output text.
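A minimal sketch of the idea (my example, not one from the original screenshots): the number printed below a notebook code cell comes from running the cell, so it can never drift out of step with the code above it.

```python
# The output beneath a notebook code cell is produced by executing the
# cell itself, not by typing the result in by hand.
total = sum(range(1, 11))
print(total)  # running the cell prints: 55
```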


Similarly when creating charts from data, or plotting equations: the charts should be created from the data or the equation by running a script over a source dataset, or plotting an equation directly.
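As a minimal sketch in Python (the library here is my choice, not necessarily the one in the original screenshots), plotting an equation directly:

```python
import numpy as np
import matplotlib.pyplot as plt

# The chart is generated from the equation itself; to change the figure,
# change the equation (or the data) and re-run.
x = np.linspace(-5, 5, 200)
plt.plot(x, x**2 - 3*x + 2)
plt.title('$y = x^2 - 3x + 2$')
plt.show()
```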


Again, the editor, or artist, should have no hand in “tweaking” the output to make it look better.

If the chart needs restyling, the artist needs to learn how to use a theme (like this?!) or theme generator rather than messing around with a graphics package (wrong sort of graphic). To add annotations, again, use code, because it makes the graphic more maintainable.
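In matplotlib terms, for example, restyling and annotating might look something like this (a sketch; the theme is just one of the built-in styles):

```python
import matplotlib.pyplot as plt

# Restyle via a predefined theme rather than retouching the image by hand...
plt.style.use('ggplot')

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [1, 3, 2, 5])

# ...and add annotations in code, so they survive the chart being regenerated.
ax.annotate('local dip', xy=(2, 2), xytext=(2.3, 3.5),
            arrowprops=dict(arrowstyle='->'))
plt.show()
```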


We can also use various off-the-shelf libraries to generate HTML/Javascript fragments for creating inline interactives that can be embedded within the notebook, or saved and then reused elsewhere.
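For example (a trivial sketch), the IPython display machinery will render an arbitrary HTML fragment inline:

```python
from IPython.display import HTML

# Embed an HTML/Javascript fragment inline in the notebook; the same
# fragment string could equally be saved out and reused elsewhere.
fragment = '<svg width="120" height="60"><rect width="100" height="40" fill="steelblue"/></svg>'
HTML(fragment)
```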


There are also several toolkits around for creating other sorts of diagram from code, as I’ve written about previously, such as the tools provided on blockdiag.com:
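By way of illustration, blockdiag turns a short text description into a rendered diagram. This sketch uses what I understand to be the documented blockdiag Python API (in a notebook you’d more likely use a cell magic than this long-hand form):

```python
from blockdiag import parser, builder, drawer

# The diagram is written as text...
source = 'blockdiag { A -> B -> C; B -> D; }'

# ...and rendered programmatically.
tree = parser.parse_string(source)
diagram = builder.ScreenNodeBuilder.build(tree)
draw = drawer.DiagramDraw('PNG', diagram, filename='diagram.png')
draw.draw()
draw.save()
```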


Aside from making diagrams more easily maintainable, rendering them inline within a Jupyter notebook that also contains the diagram’s programmatic “source code” provides a way in to the automatic generation of figure long descriptions (longdesc text).

Electrical circuit schematics can also be written and embedded in a Jupyter notebook, as this Schemdraw example shows:
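A sketch of the idea using the current schemdraw API (the original example used the older SchemDraw package, whose API differs slightly):

```python
import schemdraw
import schemdraw.elements as elm

# A small RC loop, written entirely as code.
with schemdraw.Drawing() as d:
    d += elm.SourceV().up().label('5V')
    d += elm.Resistor().right().label('10K')
    d += elm.Capacitor().down().label('100nF')
    d += elm.Line().left()
```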


So far, I haven’t found an example of a schematic plotting library that also allows you to simulate the behaviour of the circuit from the same definition though (eg I can’t simulate(d, …) in the above example, though I could presumably parameterise a circuit definition for a simulation package and use the same parameter values to label a corresponding Schemdraw circuit).
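A sketch of that workaround: define the component values once, then use them both to label the drawing and as inputs to a separate calculation or simulation (the values here are illustrative):

```python
# Single source of truth for component values...
R = 10e3    # resistance, ohms
C = 100e-9  # capacitance, farads

# ...used to label the schematic (cf. the schemdraw sketch above):
#   d += elm.Resistor().label(f'{R/1e3:.0f}K')

# ...and to compute behaviour from the same values, e.g. the RC time constant:
tau = R * C
print(f'tau = {tau*1e3:.1f} ms')  # tau = 1.0 ms
```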

There are some notations that are “executable”, though. For example, the sympy (symbolic Python) package lets you write texts using Python variables that can be rendered either as symbols, using mathematical notation, or by their value.
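For example (a minimal sketch):

```python
import sympy as sym

x, y = sym.symbols('x y')
expr = (x + y)**2

sym.expand(expr)         # renders symbolically (as MathJax in a notebook)
expr.subs({x: 2, y: 3})  # or evaluates by value: 25
```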


(There’s a rendering bug in the generated Mathjax in the notebook I was using – I think this has been corrected in more recent versions.)

We can also use interactive widgets to help us identify and set parameter values to generate the sort of example we want:
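For example, using ipywidgets sliders to set the parameters of a sympy expression (a sketch, not the widget set-up from the original notebook):

```python
import sympy as sym
from IPython.display import display
from ipywidgets import interact

x = sym.symbols('x')

# Moving the sliders re-renders the expression with the new parameter values.
@interact(a=(1, 5), b=(0, 10))
def show_expr(a=1, b=2):
    display(sym.expand((a*x + b)**2))
```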


Sympy also provides support for a wide range of calculations. For example, we can “write” a formula, render it using mathematical notation, and then evaluate it. A Jupyter notebook plugin (not shown) allows Python statements to be included and executed inline, which means that expressions and calculations can be included – and evaluated – inline. Changing the parameters in an example is then easy to achieve, with the added benefit that the guaranteed correct result of automatically evaluating the modified expression can also be inlined.
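A minimal sketch of the write/render/evaluate pattern:

```python
import sympy as sym

r = sym.symbols('r')
area = sym.pi * r**2      # "write" the formula; it renders in mathematical notation

area.subs(r, 3)           # evaluate it symbolically: 9*pi
sym.N(area.subs(r, 3))    # ...or numerically: 28.274...
```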


(For interactive examples, see the notebooks in the sympy folder here; the notebooks are also runnable by launching a mybinder container – click on the launch:binder button to fire one up.) 

As well as writing mathematical expressions that can be both expressed using mathematical notation and evaluated as mathematical expressions, we can also write music, expressing a score in notational form or creating an admittedly beepy audio file corresponding to it.
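As an illustration of the same idea (an assumption on my part: the linked notebook uses a different MIDI workflow), the music21 package lets you do both from one textual score:

```python
from music21 import converter

# Write a snippet of score as text...
tune = converter.parse('tinynotation: 4/4 c4 d e f g2 g2')

tune.show()        # ...render it as notation (needs MuseScore or Lilypond configured)
tune.show('midi')  # ...or play it as (admittedly beepy) audio in the notebook
```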


(For an interactive example, run the midiMusic.ipynb notebook by clicking through on the launch:binder button from here.)

We can also generate audio files from formulae (I haven’t tried this in a sympy context yet, though) and then visualise them as data.
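For example (a sketch), a pure tone generated from its formula, plotted as data, and played back inline:

```python
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Audio

# One second of A440, generated from the sine formula.
sr = 44100
t = np.linspace(0, 1, sr, endpoint=False)
wave = np.sin(2 * np.pi * 440 * t)

# Visualise the waveform as data...
plt.plot(t[:300], wave[:300])
plt.xlabel('time (s)')
plt.show()

# ...and play it inline in the notebook.
Audio(wave, rate=sr)
```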


Packages such as librosa also seem to provide all sorts of tools for analysing and visualising audio files.

When we put together the Learn to Code MOOC for FutureLearn, which uses Jupyter notebooks as an interactive exercise environment for learners, we started writing the materials (web pages for the FutureLearn teaching text, notebooks for the interactive exercises) in Jupyter notebooks. Notebooks can export as markdown, and the FutureLearn publishing system is based around content entered as markdown, so we should have been able to publish direct from the notebooks to FutureLearn, right? Wrong. The workflow doesn’t support it: an editor takes content in Microsoft Word, passes it back to authors for correction, then someone does something to turn it into markdown for FutureLearn. Or at least, that’s the OU’s publishing route (which has plenty of other quirks too…).

Or perhaps that should soon read “was the OU’s publishing route”, because there’s a project on internally (the workshops around which I haven’t, unfortunately, been able to make) to look at new authoring environments for producing OU content, though I’m not sure if this is intended to feed into the backend of the current route – Microsoft Word, Oxygen XML editor, OU-XML, HTML/PDF etc. output – or envisages a different pathway to final output. I started to explore using Google Docs as an OU-XML exporter, but that raised little interest – it’ll be interesting to see what sort of authoring environment(s) the current project delivers.

(By the by, I remember being really excited about the OU-XML publishing system route when it was being developed, not least because I could imagine its potential for feeding other use cases, some of which I started to explore a few years later; I was less enthused by its actual execution and the lack of imagination around putting it to work, though… I also thought we might be able to use FutureLearn as a route to exploring how we might not just experiment with workflows and publishing systems, but also the tech – and business models around the same – for supporting stateful and stateless interactive, online student activities. Like hosting a mybinder style service, for example, or embedded interactions like the O’Reilly Thebe demo, or even delivering a course as a set of linked Jupyter notebooks. You can probably guess how successful that’s been…)

So could Jupyter notebooks have a role to play in producing semi-automated content (automated, for example, in the production of graphical objects and the embedding of automatically evaluated expressions)? Markdown export is already supported, and it shouldn’t take someone too long (should it?!) to put together an nbconvert exporter that could generate OU-XML (if that is still the route we’re going?). It’d be interesting to hear how O’Reilly are getting on…
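A minimal sketch of what such an exporter might look like (the class, template and file names are all hypothetical, and nbconvert’s template handling varies between versions):

```python
from traitlets import default
from nbconvert.exporters.templateexporter import TemplateExporter

class OUXMLExporter(TemplateExporter):
    """Hypothetical exporter mapping notebook cells to OU-XML via a template."""
    export_from_notebook = 'OU-XML'

    @default('file_extension')
    def _file_extension_default(self):
        return '.xml'

    @default('template_file')
    def _template_file_default(self):
        return 'ou_xml.tpl'  # hypothetical Jinja template
```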

Whatever, again…

Pondering A Remote Robot Lab

Several years ago, I used to run a joint EPSRC & AHRB funded research network, the Creative Robotics Research Network (CRRN). The idea behind the network was to provide a forum for academics and practitioners with an interest in creative applications of robotics to share ideas, experience and knowledge.

We had a lot of fun with the network – the mailing list was active, we hosted several events, and on one network visit to a special effects company, I have a hazy memory of flamethrowers being involved… Erm…

Anyway, last weekend I went to a Raspberry Pi hackday organised by ex-IW resident Dr Lucy Rogers at Robin Hill, site of the Bestival for any festival goers out there, and currently taking the form of the electric woods, an atmospheric woodland sound and light show with a great curry along the way. If you can get on to the Island for half term, make an evening of it…

The event was sponsored by Alec Dabell, owner of Vectis Ventures, who also run the Island’s theme park – Blackgang Chine. (If you’ve ever holidayed on the Island as a child or with kids of your own, you’ll know it..:-) The idea? To play with some tech that can be worked up for controlling Blackgang’s animatronic dinosaurs or the light shows at Robin Hill and Blackgang Chine, as well as learning something along the way. (IBM’s Andy Stanford-Clark, another Island resident, pitched in with a talk on LoRa, a low power, long range radio protocol for the Internet of Things, as well as being on hand to help out with those of us getting to grips with Node-RED and MQTT for the first time ;-)

Here’s a clip from a previous event…

Also at the event was another ex-CRRN member, Mat Walker, with his latest creation: Ohbot.

Designed as a desktop “talking head” robot for educational use, the Arduino controlled Ohbot has seven servos to control the motion of its head, lips, eyes and eyelids, as well as colour LEDs in the eyes themselves.


Text-to-speech support also provides a good motivation for trying to get the lip synching to work properly. The Ohbot has a surprisingly expressive face, more so even than the remarkably similar one rendered in the simulator that comes as part of the programming environment. With an extra web cam, Ohbot can be programmed to move its head – and eyes – to follow you around the room…

Needless to say, Ohbot got me thinking… And here’s how…

One of the things being developed in the OU at the moment is a remote engineering lab, part of the wider OpenSTEM lab. The engineering lab, which is being put together by uberhacker Tim Drysdale, should go live to second year equivalent OU engineering students in October next year (I think?) and third year equivalent students the year after.

The lab itself has multiple bays for different physical experiments, with several instances of each experiment to allow several students individual access to the same experiment at the same time.

One of the first experiments to be put together is a mechanical pendulum – students can log in to the apparatus, control the motion of the pendulum, and observe its behaviour in real time via a live video feed, as well as data traces from instrumentation applied to the apparatus. One of the things Tim has been working on is getting the latency of the control signals and the video feed right down – and it seems to be looking good.


Another couple of courses in production at the OU at the moment are two first year equivalent computing courses. The first of these teaches students basic programming using Scratch (I have issues with this, but anyway…); Ohbot also uses a Blockly style user interface, although it’s currently built just for Windows machines, I think?

Hmmm… as part of the Open Engineering Lab, the OU has bought three (?) Baxter robots, with the intention that students will be able to log in and programmatically control them in real time. I seem to recall there was also some discussion about whether we could run some Lego EV3 robots, perhaps even mobile ones. The problem with mobile robots, of course, is the “activity reset” problem. The remote experimentation lab activities need to run without technician support, which means they need to clear down in a safe way at the end of each student’s activity and reset themselves for the next student to log in to them. With mobile robots, this is an issue. But with Ohbot, it should be a doddle? (We’d probably have to rework the software, but that in turn may be something that could be done in collaboration with the Ohbot guys…)

Keenly priced at under a couple of hundred squids, with sensors, I can easily imagine a shelf with eight or so Ohbot bays providing an interactive remote robot programming activity for our first year computing, as well as engineering, students. The question is: can I persuade anyone else that this might be worth exploring..?

Computers May Structure the World But We Don’t Make Use of That

An email:


Erm… a Word document with some images and captions – styled as such:


Some basic IT knowledge – at least, it should be basic in what amounts to a publishing house:


The .docx file is just a zip file – that is, a compressed folder and its contents – so give it a .zip suffix…

So here’s the unzipped folder listing – can you spot the images?


The XML content of the doc – viewed in Firefox (drag and drop the file into a Firefox browser window). Does anything jump out at you?


Computers can navigate to the tags that contain the caption text by looking for the Caption style. It can be a faff associating the image captions with the images, though (you need to keep tallies…), because the Word XML for a figure doesn’t seem to include the filename of the image… (I think you need to count your way through the images, then relate that image index number to the following caption block?)
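As a sketch of how little code the extraction itself needs (the filename is hypothetical), using nothing but the Python standard library:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in document.xml
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

with zipfile.ZipFile('report.docx') as z:
    # The images all live in the word/media/ folder of the zip...
    print([n for n in z.namelist() if n.startswith('word/media/')])
    doc = ET.fromstring(z.read('word/document.xml'))

# ...and the captions are just paragraphs tagged with the Caption style.
for p in doc.iter(W + 'p'):
    style = p.find(f'{W}pPr/{W}pStyle')
    if style is not None and style.get(W + 'val') == 'Caption':
        print(''.join(p.itertext()))
```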

So re: the email – if authors tag the captions and put captions immediately below an image, THE MACHINE CAN DO IT – if we give someone an hour or two to knock up the script, and then probably months and months and months arguing about the workflow.

PS I’d originally screencaptured and directly pasted the images shown above into a Powerpoint presentation:


I could have recaptured the screenshots, but it was much easier to save the Powerpoint file, change the .pptx suffix to .zip, unzip the folder, browse the unzipped Powerpoint media folder to see which image files I wanted:


and then just upload them directly to WordPress…

See also: Authoring Multiple Docs from a Single IPython Notebook, for another process that could be automated, but which lack of imagination and understanding just blanks out.

And the Library Said: “Thou Shalt Learn to DO Full References But We Will Not Allow You to Search By Them”

OU Library guidance to students on citations for journal articles reads as follows:


Using this reference I should be able to run a pretty good known item search – or not, as the case may be?


So where does the full reference – Journal, for example – help exactly? On Google, maybe… (Actually, the single search box may match terms across different fields – the article title as well as the journal title – and generate retrieval/ranking factors based on that?)

References and search contexts are complementary – for a reference to be effective, it needs to work with your search context, which typically means the user interface of your search system; for a specific known item reference, this typically means the (hidden away) advanced search interface.

So I wonder: whilst we penalise students for not using full, formal references (even though they often provide enough of a reference to find the item on Google), the officially provided search tools don’t let you use the information in the formal reference in a structured way to retrieve – and hopefully access (rather than discover: the reference is the discovery component) – the desired item?

Or am I reading the above search UI incorrectly…?

PS in terms of teaching material design, and referencing the above citation example, erm….?


Because of course I’m not searching for a journal that has something to do with Frodo Baggins – I’m searching for an article.

PPS I’m also finding more and more that the subscription journal content I want to access is from journals that the OU Library doesn’t subscribe to. I’m not sure how many of the bundled journals it does subscribe to are never accessed (the data should reveal that)? So I wonder – as academics (and maybe students), should we instead be given a budget code we could use to buy the articles we want? And for articles used by students in courses, get a “site license” for those articles?

Now what was the URL of that pirated academic content site someone from the library told me about again…?

PS from the Library – don’t use the reference – just bung the title in the search onebox like you would do on a web search engine…


Hmm… but if I have a full reference, I should be able to run a search that returns just a single result, for exactly the item I want? Or maybe returns links to a few different instances (from different suppliers) of just that resource? But then – which is the preferred one? (The Library search ranks different suppliers of the same work according to what algorithm?)

Or perhaps the library isn’t really about supporting known item retrieval – it’s about supporting serendipity and the serendipitous discovery of related items? (Though that raises the question of how the related item list is algorithmically generated?)

Or maybe ease of use has won out – and running a scruffy search then filtering down by facet gives a good chance of effective retrieval with an element of serendipity around similar resources?

(Desperately tries to remember all the arguments libraries used to make against one box searching…)

Google Asks the Question

Via my feeds, a post on the Google Operating System blog that notes Google Converts Queries Into Questions:

When searching for [alcohol with the highest boiling], Google converted my query into a question: “Which alcohol has the highest boiling point?”

Me too:


I ran a related query – alcohol with highest boiling point – which offered a range of related questions, albeit further down the results list:


Google results trying to draw you into a conversation – and hence running more queries (or questions…)?

Libraries Are Where You Go to Help Make Sense of The World

After a break of a couple of years, I’ll be doing a couple of sessions at ILI 2016 next week; for readers with a long memory, the Internet Librarian International conference is where I used to go to berate academic librarians every year about how they weren’t keeping up with the internet, and this year will perhaps be a return to those stomping grounds for me ;-)

One of the questions I used to ask – and probably still could – was where in the university I should go to get help with visualising a data set, either to help me make sense of it, or as part of a communications exercise. Back when IT was new, libraries used to be a place you could go to get help with certain sorts of information skills and study skills (such as essay writing skills), which included bits of training and advice on how to use appropriate software applications. As the base level of digital information skills increases, though, many people are able to figure out how to open a spreadsheet on their own.


But has the Overton window for librarians offering IT support moved with the times? Where should students wanting to develop more refined skills – how to start cleaning a dataset, for example, or visualising one sensibly, or even just learning how to read a chart properly on the one hand, or tell a story with data on the other – actually go? And what about patrons who want to be able to make better use of automation to help them in their information related tasks (screenscraping, for example, or extracting text, images or tables from hundreds of pages of PDFs); or who want help accessing commodity “AI” services via APIs? Or who need support in writing scientific communications that sensibly embed code and its outputs, or mathematical or musical notation, particularly for a web based (i.e. potentially interactive) journal or publication? Or who just need to style a chart in a particular way?

Now it’s quite likely that, having been playing with tech for more years than I care to remember, I’m afflicted by “the curse of knowledge”, recently contextualised for libraries by Lorcan Dempsey, quoting Steven Pinker. In the above paragraph, I half-assume readers know what screenscraping is, for example, as well as why it’s blindingly obvious (?!) why you might want to be able to do it, even if you don’t know how to do it. (For librarians, there are a couple of things to note there: firstly, what it is and why you might want to do it; secondly – which might be a referral to tools, if not training – what sorts of tool might be able to help you with it.)

But the question remains – there are a lot of tech power tools out there that can help you retrieve, sort, search, analyse, organise and present information, but where do I go for help?

If not the library, where?

If not the library, why not?

The end result is often: the internet. For which, for many in the UK, read: Google.

Anyway, via the twitterz (I think…) a couple of weeks ago, I spotted this interesting looking job ad from Harvard:

Visualization Specialist
School/Unit Harvard College Library
Location USA – MA – Cambridge
Job Function Library
Time Status Full-time
Department Harvard College Library – Services for Maps, Media, Data, and Government Information

Duties & Responsibilities – Summary
Reporting to the Head, Social Sciences and Visualization in the unit for Maps, Media, Data and Government Information, the Visualization Specialist works with staff and faculty to identify hardware and software needs, and to develop scalable, sustainable practices related to data visualization services. This position designs and delivers workshops and training sessions on data visualization tools and methods, and develops a range of instructional materials to support library users with data visualization needs in the Social Sciences and Humanities.

The Visualization Specialist will coordinate responsibilities with other unit staff and may supervise student employees.

Duties and Responsibilities
– Advises, consults, instructs, and serves as technical lead with data visualization projects with library, faculty teaching, and courses where students are using data.
– Identifies, evaluates and recommends new and emerging digital research tools for the Libraries and Harvard research community.
– Develops and supports visualization services in response to current trends, teaching and learning – especially as it intersects with Library collections and programs.
– Collaborates in developing ideas and concepts effectively across diverse interdisciplinary audiences and serves as a point person for data visualization and analysis efforts in the Libraries and is attuned to both the quantitative and qualitative uses with datasets. Understands user needs for disseminating their visualizations as either static objects for print publications or interactive online objects to engage with.
– Develops relationships with campus units supporting digital research, including the Center for Government and International Studies, Institute for Quantitative Social Sciences, and Harvard Library Central Services, and academic departments engaged in data analysis and visualization.
– Develops, collects, and curates exemplar data sets from multiple fields to be used in visualization workshops and training materials.

Basic Qualifications
– ALA-accredited master’s degree in Library or Information Science OR advanced degree in Social Sciences, Psychology, Design, Informatics, Statistics, or Humanities.
– Minimum of 3 years experience in working with data analysis and visualization in an academic setting.
– Demonstrated experience with data visualization tools and programming libraries.
– Proficiency with at least one programming language (such as Python or R).
– Ability to use a variety of tools to extract and manipulate data from various sources (such as relational databases, XML, web services and APIs).

Additional Qualifications

– Experience supporting data analysis and visualization in a research setting.
– Proficiency using tools and programming libraries to support text analysis.
– Familiarity with geospatial technology.
– Experience identifying and recommending new tools, technologies, and online delivery of visualizations.
– Graphic design skills and proficiency using relevant software.

Many of the requisite skills resonate with the calls (taunts?) I used to make, asking library folk where I should go for support with data related questions. At which point you may be thinking – “okay, techie geeky stuff… scary… not our gig…”.

But if you focus on the data visualisation elements, many of them actually relate to communication issues – representation and presentation – rather than technical calls for help. For example, what sort of chart should I use to communicate this sort of thing? How can I change the look of a chart? How can I redesign a chart to help me communicate with it better?

And it’s not just the presentation of graphical information. Part of the reason I put together the F1 Data Junkie book was that I wanted to explore the RStudio/RMarkdown (Rmd) workflow for creating (stylish) technical documents. Just the other day I noticed that in the same way charts can be themed, themes for Rmd documents are now starting to appear – such as tint (Tint Is Not Tufte); in fact, it seems there’s a whole range of output themes already defined (see also several other HTML themes for Rmd output).
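For example, a tint-themed document just declares the theme in its YAML header (a sketch, using tint’s documented output format):

```yaml
---
title: "An Rmd document"
output: tint::tintHtml
---
```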


What’s nice about these templates is that they are defined separately from the actual source document. If you want to change from one format to another, things like the R rticles package make it easy. But how many librarians even know such workflows exist? How many have even heard of markdown?

It seems to me that tools around document creation are in a really exciting place at the moment, made more exciting once you start to think about how they fit into wider workflows (which actually makes them harder to promote, because folk are wedded to their current crappy workflows).

So are the librarians on board with that, at least, given their earlier history as word-processor evangelists?

See also: A New Role for the Library – Gonzo Librarian Informationista, including the comment Notes on: Exploring New Roles for Librarians. This also touches on the notion of an embedded librarian.

And this, from Martin Bean, previously VC of the OU, several years ago…

Fragment – Using Raspberry Pi as a Course Software Runner?

For many years now, the OU has required students to have access to a computer in order to access online course materials and run course related software. A minimum specification is laid down (2GB of RAM) and is supposedly platform neutral.

Putting together the headless TM351VM, which uses a virtual machine running on a host O/S, we needed to conform to this minimum spec; the VM we ended up with requires 1GB of RAM and takes up about 12-15GB of space (with gubbins), though it only needs at most about 8 GB of that.


In the previous post, I described how I recently got a Raspberry Pi Up and Running for the first time. And it got me thinking (again) about how we deliver software applications to students with the minimum of setup requirements.

1GB RAM…. 8-16 GB free space…

About the spec for a Raspberry Pi 3 with a cheap memory card?

So imagine this – students joining the university are given a Raspberry Pi 3 in a nicely branded box, along with an ethernet cable to attach it to their computer directly or to a wifi router; course software is designed to run as a service accessed via a browser, and to meet the Raspberry Pi 3 spec (which is on a par with what’s left over from a current min spec machine once its own O/S and background services have been taken into account).

The software for a particular course is issued on a course specific micro-SD card, supplied in a larger, OU and course branded SD card holder.

The micro-SD card contains a “course image” containing headless services that autorun on startup; the device is named with a suitably discoverable name – OU.local; a simple web server is found on the default http port and lists local URLs to locally running services on the Pi, and perhaps also links to the VLE and other online course resources. (This reminds me of the first browser based course materials I had a hand in, in 1999 or so – an eSG, or electronic study guide, that delivered locally installed HTML interactive content and linked applications, as well as links to online materials and resources, for a mainly for-print course (T396).)
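A minimal sketch of that “home page” service (the service names and ports are illustrative, and binding port 80 needs elevated privileges):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical course services running elsewhere on the Pi.
SERVICES = {'Jupyter notebooks': 8888, 'Course database UI': 8080}

class Index(BaseHTTPRequestHandler):
    def do_GET(self):
        # Build a simple list of links to the locally running services.
        links = ''.join(
            f'<li><a href="http://ou.local:{port}/">{name}</a></li>'
            for name, port in SERVICES.items())
        body = f'<h1>OU course services</h1><ul>{links}</ul>'.encode()
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.end_headers()
        self.wfile.write(body)

HTTPServer(('', 80), Index).serve_forever()
```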

The student plugs the course micro-SD card into the Pi, connects the Pi to their computer or router via ethernet, switches the Pi on (i.e. plugs the power cable in) and goes to OU.local in their browser. Job done? [UPDATE: on a Mac, this is easy; in Windows… I’m not so sure? Bah…:-( An alternative is to plug the Pi into a wifi router and then get the student to try to find its IP address, eg https://www.raspberrypi.org/documentation/remote-access/ip-address.md Or can a particular name alias (ou.local?) be requested from a wifi router (though that doesn’t feel very secure to me!)? Or we ship a tiny display, such as the Display-O-Tron HAT, with the Raspberry Pi, that displays the IP address it’s allocated? That adds to the expense, but if it’s part of the packaging, maybe that offsets part of the case cost?]

To improve robustness, the micro-SD card image could also run a process monitor to check necessary services were always running, and perhaps a control panel to allow students to monitor/start/stop/restart services if and as required.

To persist student created files, a named course USB stick plugged into the Pi and mounted at a known location would allow portability of files.

For each new course, with its own software, we just mail out a new micro-SD card with a course Pi image on it.

For the research student who possibly needs to run some slightly heavier weight applications that the Pi still has the computational oomph to run, we ship them cards that just run the application or application suites they need on starting up the Pi.

I know this idea has been mooted several times by various folk in the OU before (my recent tinkering was prompted by a suggestion from TM351 course colleagues Neil Smith and Alistair Willis that we could try a Pi as an alternative offering for TM351 students struggling to get the course software installed), but having had a bit of a play recently, it feels pretty tractable…

See also: What Happens When “Computers” Are Replaced by Tablets and Phones? and Wondering if Life Would be Easier With an OU – or FutureLearn – Compute Stick…?.