Jupyter Notebooks, Cognitive Tools and Philosophical Instruments

A placeholder post, as much as anything, to mark the AJET Call for Papers for a Special Issue on Re-Examining Cognitive Tools: New Developments, New Perspectives, and New Opportunities for Educational Technology Research as a foil for thinking about what Jupyter notebooks might be good for.

According to the EduTech Wiki, “[c]ognitive tools refer to learning with technology (as opposed to learning through technology)”, which doesn’t really make sense as a sentence and puts me off the idea of ed-tech academe straight away.

In the sense that cognitive tools support a learning process, I think they can do so in several ways. For example, in Programming in Jupyter Notebooks, via the Heavy Metal Umlaut I remarked on several different ways in which the same programme could be constructed within a notebook, each offering a different history and each representing a differently active approach to code creation and programming problem solving.

One of the things I try to do is reflect on my own practice, as I have been doing recently whilst trying to rework fragments of some OpenLearn materials as reproducible educational resources (which is to say, materials that generate their own resources and as such support reuse with modification more generally than many educational resources).

For example, consider the notebook at https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted/html/3.6.0%20Electronics.ipynb

You can also run the notebook interactively; sign in to Azure notebooks (if you’re OU staff, you can use your staff OUCU/OU password credentials) and clone my Getting Started library into your workspace. If notebooks are new to you, check out the 1.0 Using Jupyter Notebooks in Teaching and Learning - READ ME FIRST.ipynb notebook.

In creating the electronics notebook, I had to learn a chunk of stuff (the lcapy package is new to me and I had to get my head round circuitikz) but I found that trying to figure out how to make examples related to the course materials provided a really useful context for giving me things to try to do with the package. In that sense, the notebook was a cognitive tool (I guess) that supported my learning about lcapy.

For the https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted/html/1.05%20Simple%20Maths%20Equations%20and%20Notation.ipynb notebook, I had to start getting my head round sympy and on the way cobble together bits and pieces of code that might be useful when trying to produce maths related materials in a reproducible way. (For example, creating equations in sympy that can then be rendered, manipulated and solved throughout the materials in a way that’s appropriate for a set of educational (that is, teaching and/or learning) resources.)
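As a rough sketch of the sort of thing I mean (not code from the notebook itself), here is how an equation might be created once in sympy and then rendered, rearranged and solved. The choice of Ohm's law and the symbol names are my own assumptions for illustration.

```python
import sympy

# Hypothetical symbols for an Ohm's law example (not taken from the notebook).
voltage, current, resistance = sympy.symbols("V I R", positive=True)

# Create the equation once...
ohms_law = sympy.Eq(voltage, current * resistance)

# ...render it as LaTeX for use in the text...
print(sympy.latex(ohms_law))

# ...rearrange it symbolically...
resistance_solutions = sympy.solve(ohms_law, resistance)
print(resistance_solutions)

# ...and solve it for particular values.
numeric_solutions = sympy.solve(ohms_law.subs({voltage: 12, current: 2}), resistance)
print(numeric_solutions)
```

Because the same equation object drives the rendered maths and the worked answer, the two can't drift apart in the way that cut and pasted fragments can.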

Something else that came to mind is that the notebook medium as both an authoring medium and a delivery medium (we can use it just to create assets; or we can also use it to deliver content to students) changes the sorts of things you might want to do in the teaching. For example, I had the opportunity to create self test functions, and there is the potential for interactives that let students explore the effect of changing component values in a circuit. (We could also plot responses over a range of variable values, but I haven’t demoed that yet.) In a sense, the interactive affordances of the medium encouraged me to think of opportunities to create philosophical instruments that allow authors – as well as students – to explore the phenomena being described by the materials. Although I’m not a chemistry educator, putting together a reworking of some OpenLearn chemistry materials – https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted/html/3.1.2%20OpenLearn%20Chemistry%20Demos.ipynb – gave me some ideas about the different ways in which the materials could be worked up to support interactive / self-checking / constructive learning use. (That is, ways in which we could present the notebooks as interactive cognitive tools to support the learning process on the one hand, or as philosophical instruments that would allow the learner to explore the subject matter in an investigative and experimental way.)
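To give a flavour of what a self test function and a "change the component values" exploration might look like, here is a minimal sketch. The voltage divider example, the function names and the 1% tolerance are all assumptions of mine rather than anything lifted from the notebooks.

```python
import math

def divider_output(v_in, r1, r2):
    """Output voltage across r2 for an unloaded two-resistor voltage divider."""
    return v_in * r2 / (r1 + r2)

def self_test(answer, expected, rel_tol=0.01):
    """Give simple feedback on a student's numerical answer (hypothetical helper)."""
    if math.isclose(answer, expected, rel_tol=rel_tol):
        return "Correct"
    return f"Not quite: expected about {expected:.3g}"

# Explore the effect of changing one component value.
for r2 in (1_000, 2_200, 4_700):
    print(r2, round(divider_output(9.0, 1_000, r2), 3))

# A student checks their own working against the computed value.
print(self_test(4.5, divider_output(9.0, 1_000, 1_000)))
```

The same function serves the author (generating worked values for the text) and the student (checking an answer), which is part of the appeal.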

I like to think the way I’m using the Jupyter notebooks as part of an informal “reproducible-OER” exploration is in keeping with some of the promise of live authoring using the OU’s much vaunted, though still to be released, OpenCreate authoring environment (at least, as I understand the sorts of thing it is supposed to be able to support) with the advantage of being available now.

It’s important to recognise that Jupyter notebooks can be thought of as a medium that behaves in several ways. In the first case, it’s a rich authoring medium to work with – you can create things in it and for it, for example in the form of interactive widgets or reusable components such as IPython magics (for example, this interactive mapping magic: https://github.com/psychemedia/ipython_magic_folium ). Secondly, it’s a medium qua environment that can itself be extended and customised through the enabling and disabling of notebook extensions, such as ones that support WYSIWYG markdown editing, or hidden, frozen and read-only executable cells, which can be used to constrain the ways in which learners use some of the materials, perhaps as a counterpoint to getting them to engage more actively in editing other cells. Thirdly, it acts as a delivery medium, presenting content to readers who can engage with the content in an interactive way.

I’m not sure if there are any good checklists of what makes a “cognitive tool” or a “philosophical instrument”, but if there are it’d be interesting to try to check Jupyter notebooks off against them…

Programming in Jupyter Notebooks, via the Heavy Metal Umlaut

Way back when, I used to take delight in following the creative tech output of Jon Udell, then at InfoWorld. One of the things I fondly remember is his Heavy Metal Umlaut screencast:

You can read about how he put it together via the archived link available from Heavy metal umlaut: the making of the movie.

At some point, I seem to remember a tool became available for replaying the history of a Wikipedia page that let you replay its edits in time (perhaps a Jon Udell production?) Or maybe that’s a false memory?

A bit later, the Memento project started providing tools to allow you to revisit the history of the web using archived pages from the Wayback Machine. You can find the latest incarnation here: Memento Demos.

(Around the time it first appeared, I think Chris Gutteridge built something related? As Time Goes By, It Makes a World of Diff?)

Anyway – the Heavy Metal Umlaut video came to mind this morning as I was pondering different ways of using Jupyter notebooks to write programmes.

Some of my notebooks have things of this form in them, with “finished” functions appearing in the code cells:

Other notebooks trace the history of the development of a function, from base elements, taking an extreme REPL approach to test each line of code, a line at a time, as I try to work out how to do something. Something a bit more like this:

This is a “very learning diary” approach, and one way of using a notebook that keeps the history of all the steps – and possibly trial and error within a single line of code or across several lines of code – as you work out what you want to do. The output of each state change is checked to make sure that the state is evolving as you expect it to.

I think this approach can be very powerful when you’re learning because you can check back on previous steps.

Another approach to using the notebooks is to work within a cell and build up a function there. Here’s an animated view of that approach:

This approach loses the history – loses your working – but gets to the same place, largely through the same process.

That said, in the notebook environment used in CoCalc, there is an option to replay a notebook’s history in much the same way as Memento lets you replay the history of a web page.

In practice, I tend to use both approaches: keeping a history of some of the working, whilst REPLing in particular cells to get things working.

I also dip out into other cells to try things out / check things, to incorporate in a cell, and then delete the scratchpad / working out cell.

I keep coming back to the idea that Jupyter notebooks are a really powerful environment for learning in, and think  there’s still a lot we can do to explore the different ways we might be able to use them to support teaching as well as learning…:-)

PS via Simon Willison, who also recalled a way of replaying Wikipedia pages, this old Greasemonkey script.

PPS Sort of related, and also linking this post with [A] Note On Web References and Broken URLs, an inkdroid post by Ed Summers on Web Histories that reviews a method by Prof Richard Rogers for Doing Web history with the Internet Archive: screencast documentaries.

Programming, meh… Let’s Teach How to Write Computational Essays Instead

From Stephen Wolfram, a nice phrase to describe the sorts of thing you can create using tools like Jupyter notebooks, Rmd and Mathematica notebooks: computational essays, which complements the “computational narrative” phrase that is also used to describe such documents.

Wolfram’s recent blog post What Is a Computational Essay?, part essay, part computational essay, is primarily a pitch for using Mathematica notebooks and the Wolfram Language. (The Wolfram Language provides computational support plus access to a “fact engine” database that can be used to pull factual information into the coding environment.)

But it also describes nicely some of the generic features of other “generative document” media (Jupyter notebooks, Rmd/knitr) and how to start using them.

There are basically three kinds of things [in a computational essay]. First, ordinary text (here in English). Second, computer input. And third, computer output. And the crucial point is that these three kinds of things all work together to express what’s being communicated.

In Mathematica, the view is something like this:


In Jupyter notebooks:

In its raw form, an RStudio Rmd document source looks something like this:

A computational essay is in effect an intellectual story told through a collaboration between a human author and a computer. …

The ordinary text gives context and motivation. The computer input gives a precise specification of what’s being talked about. And then the computer output delivers facts and results, often in graphical form. It’s a powerful form of exposition that combines computational thinking on the part of the human author with computational knowledge and computational processing from the computer.

When we originally drafted the OU/FutureLearn course Learn to Code for Data Analysis (also available on OpenLearn), we wrote the explanatory text – delivered as HTML but including static code fragments and code outputs – as a notebook, and then “ran” the notebook to generate static HTML (or markdown) that provided the static course content. These notebooks were complemented by actual notebooks that students could work with interactively themselves.

(Actually, we prototyped authoring both the static text, and the elements to be used in the student notebooks, in a single document, from which the static HTML and “live” notebook documents could be generated: Authoring Multiple Docs from a Single IPython Notebook. )

Whilst the notion of the computational essay as a form is really powerful, I think the added distinction between generative and generated documents is also useful. For example, a raw Rmd document or Jupyter notebook is a generative document that can be used to create a document containing text, code, and the output generated from executing the code. A generated document is an HTML, Word, or PDF export from an executed generative document.

Note that the generating code can be omitted from the generated output document, leaving just the text and code generated outputs. Code cells can also be collapsed so the code itself is hidden from view but still available for inspection at any time:

Notebooks also allow “reverse closing” of cells—allowing an output cell to be immediately visible, even though the input cell that generated it is initially closed. This kind of hiding of code should generally be avoided in the body of a computational essay, but it’s sometimes useful at the beginning or end of an essay, either to give an indication of what’s coming, or to include something more advanced where you don’t want to go through in detail how it’s made.

Even if notebooks are not used interactively, they can be used to create correct static texts where outputs that are supposed to relate to some fragment of code in the main text actually do so because they are created by the code, rather than being cut and pasted from some other environment.
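The generative/generated distinction can be sketched in a few lines of Python. This is a toy of my own devising, not how Jupyter or knitr actually work: the document structure, the `generate` function and the example marks are all hypothetical. The point is just that the outputs in the generated text come from running the code, and that the code itself can be left out of the generated document.

```python
import contextlib
import io

# A toy "generative document": text and code, but no stored outputs.
generative_doc = [
    ("text", "We start with a short list of exam marks."),
    ("code", "marks = [52, 67, 71, 48]"),
    ("text", "The mean mark is:"),
    ("code", "print(sum(marks) / len(marks))"),
]

def generate(doc, include_code=True):
    """Run the code items in order and return the generated document as text."""
    namespace = {}  # shared state, like a notebook kernel
    lines = []
    for kind, content in doc:
        if kind == "text":
            lines.append(content)
            continue
        if include_code:
            lines.append(">>> " + content)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(content, namespace)
        output = buffer.getvalue().rstrip("\n")
        if output:
            lines.append(output)
    return "\n".join(lines)

print(generate(generative_doc))
print("---")
print(generate(generative_doc, include_code=False))
```

Change a mark in the generative document and the mean in the generated document changes with it; there is nothing to remember to paste back in.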

However, making the generative – as well as generated – documents available means readers can learn by doing, as well as reading:

One feature of the Wolfram Language is that—like with human languages—it’s typically easier to read than to write. And that means that a good way for people to learn what they need to be able to write computational essays is for them first to read a bunch of essays. Perhaps then they can start to modify those essays. Or they can start creating “notes essays”, based on code generated in livecoding or other classroom sessions.

In terms of our own learnings to date about how to use notebooks most effectively as part of a teaching communication (i.e. as learning materials), Wolfram seems to have come to many similar conclusions. For example, try to limit the amount of code in any particular code cell:

In a typical computational essay, each piece of input will usually be quite short (often not more than a line or two). But the point is that such input can communicate a high-level computational thought, in a form that can readily be understood both by the computer and by a human reading the essay.

...

So what can go wrong? Well, like English prose, [code] can be unnecessarily complicated, and hard to understand. In a good computational essay, both the ordinary text, and the code, should be as simple and clean as possible. I try to enforce this for myself by saying that each piece of input should be at most one or perhaps two lines long—and that the caption for the input should always be just one line long. If I’m trying to do something where the core of it (perhaps excluding things like display options) takes more than a line of code, then I break it up, explaining each line separately.

It can also be useful to "preview" the output of a particular operation that populates a variable for use in the following expression to help the reader understand what sort of thing that expression is evaluating:

Another important principle as far as I’m concerned is: be explicit. Don’t have some variable that, say, implicitly stores a list of words. Actually show at least part of the list, so people can explicitly see what it’s like.
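In Python terms, being explicit might look something like the following sketch, where the word list is previewed before it is used. The sample sentence and variable names are made up for the purpose.

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"

# Populate the variable...
words = text.split()

# ...and show part of it, so the reader can see what sort of thing it is.
preview = words[:5]
print(preview)

# Now the next expression has something concrete to be read against.
most_common = Counter(words).most_common(1)
print(most_common)
```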

In many respects, the computational narrative format forces you to construct an argument in a particular way: if a piece of code operates on a particular thing, you need to access, or create, the thing before you can operate on it.

[A]nother thing that helps is that the nature of a computational essay is that it must have a “computational narrative”—a sequence of pieces of code that the computer can execute to do what’s being discussed in the essay. And while one might be able to write an ordinary essay that doesn’t make much sense but still sounds good, one can’t ultimately do something like that in a computational essay. Because in the end the code is the code, and actually has to run and do things.

One of the arguments I've been trying to develop in an attempt to persuade some of my colleagues to consider the use of notebooks to support teaching is the notebook nature of them. Several years ago, one of the en vogue ideas being pushed in our learning design discussions was to try to find ways of supporting and encouraging the use of "learning diaries", where students could reflect on their learning, recording not only things they'd learned but also ways they'd come to learn them. Slightly later, portfolio style assessment became "a thing" to consider.

Wolfram notes something similar from way back when...

The idea of students producing computational essays is something new for modern times, made possible by a whole stack of current technology. But there’s a curious resonance with something from the distant past. You see, if you’d learned a subject like math in the US a couple of hundred years ago, a big thing you’d have done is to create a so-called ciphering book—in which over the course of several years you carefully wrote out the solutions to a range of problems, mixing explanations with calculations. And the idea then was that you kept your ciphering book for the rest of your life, referring to it whenever you needed to solve problems like the ones it included.

Well, now, with computational essays you can do very much the same thing. The problems you can address are vastly more sophisticated and wide-ranging than you could reach with hand calculation. But like with ciphering books, you can write computational essays so they’ll be useful to you in the future—though now you won’t have to imitate calculations by hand; instead you’ll just edit your computational essay notebook and immediately rerun the Wolfram Language inputs in it.

One of the advantages that notebooks have over some other environments in which students learn to code is that the structure of the notebook can encourage you to develop a solution to a problem whilst retaining your earlier working.

The earlier working is where you can engage in the minutiae of trying to figure out how to apply particular programming concepts, creating small, playful, test examples of the sort of thing you need to use in the task you have actually been set. (I think of this as a "trial driven" software approach rather than a "test driven" one; in a trial, you play with a bit of code in the margins to check that it does the sort of thing you want, or expect, it to do before using it in the main flow of a coding task.)

One of the advantages for students using notebooks is that they can doodle with code fragments to try things out, and keep a record of the history of their own learning, as well as producing working bits of code that might be used for formative or summative assessment, for example.

Another advantage of creating notebooks, which may include recorded fragments of dead ends when trying to solve a particular problem, is that you can refer back to them, and reuse what you learned, or discovered how to do, in them.

And this is one of the great general features of computational essays. When students write them, they’re in effect creating a custom library of computational tools for themselves—that they’ll be in a position to immediately use at any time in the future. It’s far too common for students to write notes in a class, then never refer to them again. Yes, they might run across some situation where the notes would be helpful. But it’s often hard to motivate going back and reading the notes—not least because that’s only the beginning; there’s still the matter of implementing whatever’s in the notes.

Looking at many of the notebooks students have created from scratch to support assessment activities in TM351, it's evident that many of them are not using them other than as an interactive code editor with history. The documents contain code cells and outputs, with little if any commentary (what comments there are are often just simple inline code comments in a code cell). They are barely computational narratives, let alone computational essays; they're more of a computational scratchpad containing small code fragments, without context.

This possibly reflects the prior history in terms of code education that students have received, working "out of context" in an interactive Python command line editor, or a traditional IDE, where the idea is to produce standalone files containing complete programmes or applications. Not pieces of code, written a line at a time, in a narrative form, with example output to show the development of a computational argument.

(One argument I've heard made against notebooks is that they aren't appropriate as an environment for writing "real programmes" or "applications". But that's not strictly true: Jupyter notebooks can be used to define and run microservices/APIs as well as GUI driven applications.)

However, if you start to see computational narratives as a form of narrative documentation that can be used to support a form of literate programming, then once again the notebook format can come into its own, and draw on styling more common in a text document editor than a programming environment.

(By default, Jupyter notebooks expect you to write text content in markdown or markdown+HTML, but WYSIWYG editors can be added as an extension.)

Use the structured nature of notebooks. Break up computational essays with section headings, again helping to make them easy to skim. I follow the style of having a “caption line” before each input. Don’t worry if this somewhat repeats what a paragraph of text has said; consider the caption something that someone who’s just “looking at the pictures” might read to understand what a picture is of, before they actually dive into the full textual narrative.

As well as allowing you to create documents in which the content is generated interactively - code cells can be changed and re-run, for example - it is also possible to embed interactive components in both generative and generated documents.

On the one hand, it's quite possible to generate and embed an interactive map or interactive chart that supports popups or zooming in a generated HTML output document.

On the other, Mathematica and Jupyter both support the dynamic creation of interactive widget controls in generative documents that give you control over code elements in the document, such as sliders to change numerical parameters or list boxes to select categorical text items. (In the R world, there is support for embedded shiny apps in Rmd documents.)

These can be useful when creating narratives that encourage exploration (for example, in the sense of explorable explanations), though I seem to recall Michael Blastland expressing concern several years ago about how ineffective interactives could be in data journalism stories.

The technology of Wolfram Notebooks makes it straightforward to put in interactive elements, like Manipulate, [interact/interactive in Jupyter notebooks] into computational essays. And sometimes this is very helpful, and perhaps even essential. But interactive elements shouldn’t be overused. Because whenever there’s an element that requires interaction, this reduces the ability to skim the essay.

I've also thought previously that interactive functions are a useful way of motivating the use of functions in general when teaching introductory programming. For example, An Alternative Way of Motivating the Use of Functions?.
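By way of a sketch: once something has been wrapped up as a function, the ipywidgets `interact` function can build slider controls for its arguments. The parallel resistor example and the slider ranges here are my own assumptions, and the widget part only does anything useful in a live notebook with ipywidgets installed.

```python
def parallel_resistance(r1, r2):
    """Combined resistance of two resistors in parallel, in ohms."""
    return r1 * r2 / (r1 + r2)

# Used as an ordinary function, one call at a time...
print(parallel_resistance(1_000, 1_000))

def make_interactive():
    """Wrap the function in slider controls (call this from a notebook cell)."""
    # Imported here so the rest of the example runs without ipywidgets installed.
    from ipywidgets import interact
    # Each (min, max, step) tuple is turned into a slider for that argument.
    return interact(parallel_resistance, r1=(100, 10_000, 100), r2=(100, 10_000, 100))
```

The motivating trick is that the student has to write the function before they get the sliders, so the function earns its keep straight away.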

One of the issues in trying to set up student notebooks is how to handle boilerplate code that is required before the student can create, or run, the code you actually want them to explore. In TM351, we preload notebooks with various packages and bits of magic; in my own tinkerings, I'm starting to try to package stuff up so that it can be imported into a notebook in a single line.

Sometimes there’s a fair amount of data—or code—that’s needed to set up a particular computational essay. The cloud is very useful for handling this. Just deploy the data (or code) to the Wolfram Cloud, and set appropriate permissions so it can automatically be read whenever the code in your essay is executed.

As far as opportunities for making increasing use of notebooks as a kind of technology goes, I came to a similar conclusion some time ago to Stephen Wolfram when he writes:

[I]t’s only very recently that I’ve realized just how central computational essays can be to both the way people learn, and the way they communicate facts and ideas. Professionals of the future will routinely deliver results and reports as computational essays. Educators will routinely explain concepts using computational essays. Students will routinely produce computational essays as homework for their classes.

Regarding his final conclusion, I'm a little bit more circumspect:

The modern world of the web has brought us a few new formats for communication—like blogs, and social media, and things like Wikipedia. But all of these still follow the basic concept of text + pictures that’s existed since the beginning of the age of literacy. With computational essays we finally have something new.

In many respects, HTML+Javascript pages have been capable of delivering, and actually delivering, computationally generated documents for some time. Do computational notebooks offer some sort of step-change away from that, or do they actually represent a return to the original read/write imaginings of the web with portable and computed facts accessed using Linked Data?

I pitched the best ideas I had, garbled by lone working, incoherent hysterical naked.

On Tuesday, I listened to myself start to pitch ideas around “reproducible reports” incoherently until the situation was saved by convener @fantasticlife who reset the proceedings by starting at the beginning and setting out the problem the ideas were supposed to address.

Yesterday, I listened to myself ranting incoherent demands for getting Jupyter notebooks installed NOW on OU servers.

In each case I went in naked, unprepared in making sure the scene was set for what the problem I perceived was, and what benefits might arise if we solved it in a way I would then advocate.

In each case, I got hysterical, fumbling words and ideas, because to see how it makes sense (as it does in my head) you need to understand the whole process or system the problem is situated in and how the solution addresses it.

Lone working doesn’t help much in this respect – hours spent typing meaningless words through a keyboard trying to get jumbled ideas out. No time spent in human conversation, rehearsing, trying out storylines, addressing questions and quizzical looks.

Lack of respect doesn’t help much either – “everyone else is doing it WRONG”!;-)

Whatever…

(See what I did there?!;-)

Anyway – I need to start working on better advocacy skills that try to crystallise out the problems I think the tools and approaches I want to advocate might help to address.

First step, try to match message to audience. The following, for example, could be a starting point to trying to advocate the use of Jupyter notebooks as an environment to support the teaching of programming to folk who are interested in teaching programming and the selection of environments for teaching programming…

Second step: pithy identification of a problem. For example, in the Atlantic article The Coming Software Apocalypse, James Somers quotes John Resig, who in observing student programmers realised that “the students who did well—in fact the only ones who survived at all—were those who could step through that text one instruction at a time in their head, thinking the way a computer would, trying to keep track of every intermediate calculation”.

Third step: ways of addressing the problem. One of the reasons I like Jupyter notebooks so much is that a natural way of using them is to use them to develop (and implicitly test) code by writing it a line at a time.

Writing a line of code at a time, and displaying the output after each step, means you don’t have to keep the complete state of the programme in your head.

In writing many programs, the aim is often to get from one state to another state that allows you to do something more easily. For example, it may be possible to generate a complex chart from a data set directly if the data is correctly organised.

The Jupyter notebook allows you to lay out the multiple steps required to get from your original state to the desired state and check your progress as you go along. (Implicitly, this provides a form of testing each step.)

This is good for exposition in teaching, but also in learning, as the student:

  • constructs the program one line of code at a time;
  • checks the output resulting from that line;
  • compares the new state with the previous state to check the correct sort of operation or transformation has been applied.

Having worked out the programme, which is a series of steps, and checked its implementation in code by visualising the intermediate state at each step, the student may then start to package the code contained in several cells in a single cell that contains many steps – and check that works correctly, essentially treating the cell containing multiple lines of code as a single line of more complex code. The next step may be to package those multiple lines of code that are bundled into one cell into a single function that allows them to be executed directly as a single line of code.

All the while, the original line-at-a-time code cells from further up the notebook act as a reference, and stepwise documentation supporting visual testing, of how each line of code works and the state changes it produces (that is, the input state it expects and the output state it produces).
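Here is a small sketch of that progression, written as a plain script with comments marking where the notebook cells would fall. The readings data and the function name are invented for the example.

```python
readings = [("north", 12), ("south", 7), ("north", 5), ("east", 9), ("south", 4)]

# Cell 1: total the readings per region, then look at the new state.
totals = {}
for region, value in readings:
    totals[region] = totals.get(region, 0) + value
print(totals)

# Cell 2: rank the regions by total, and check the transformation did what we expected.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(ranked)

# Cell 3: turn each ranked pair into a label ready for a chart or report.
labels = [f"{region}: {total}" for region, total in ranked]
print(labels)

# Later cell: the same steps packaged as a single function...
def ranked_labels(rows):
    """Total values per region and return labels in descending order of total."""
    region_totals = {}
    for region, value in rows:
        region_totals[region] = region_totals.get(region, 0) + value
    ordered = sorted(region_totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{region}: {total}" for region, total in ordered]

# ...with the stepwise cells above acting as the reference to check it against.
print(ranked_labels(readings) == labels)
```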

Sigh… I still don’t see why folk don’t grok that?:-(

PS From the same Atlantic article: Bret Victor’s frustration that “when someone wanted to do something interesting with a computer, they had to write code”. This is one area where I know I see the world very differently from computing colleagues: they want to teach programming; I want to help people use computers to get stuff done. But where I differ from someone like Bret Victor is that I see value in people having access to the single lines of code that do things and the intermediate states that arise because it facilitates a “scripting” approach to programming.

Note that I don’t mean scripting vs programming in the sense of compiled versus interpreted programmes; I mean it more in the sense of scripts of linear sequences of instructions that can be used to perform a particular task.

In the notebook context, I see each line of code, or at least each code cell, as the line of code, plus its output (or at least, an output that depicts any resulting change in state, such as the new value of a parameter if the line of code updates the value of that parameter). There is also an expectation of what the input to that cell block might look like, in the form of outputs from previous code blocks that initiate the state expected as input to the code cell. The result is that an executed notebook can be read as a narrative trace that shows:

  • an arrangement of various lines of code,
  • the effect of applying each line of code in terms of how it transforms the output or outputs from previous cells into the output of the current cell.

In the Scratch inspired visual programming environments beloved of many of my colleagues, differently shaped and coloured blocks limit how programming blocks can be joined together to ensure syntactic correctness. The following screenshot from another Block.ly inspired programming environment, Open RobertaLab, further groups different sorts of functional blocks in a colour coded command palette:

Whilst these environments do often provide a code view, it’s often not possible to keep track of the intermediate state or the state transformations applied by each block:

(I’m not sure if OpenBuild(?), which I’m guessing is the OU’s fork of MIT Scratch, that’s about to be used in our new level 1 course, offers a code view?)

That is, whilst the block style interfaces help maintain syntactic correctness, they don’t allow you to monitor changes in state from applying a particular block, whereas the notebook style does let you inspect the effect of applying a particular operation. In many ways, the notebook view is like an exploded step tracer that lets you keep track of the state of a linear programme as it works its way through a series of sequential steps.
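The step tracer idea can be mimicked in a few lines: apply a sequence of labelled steps to some starting state and keep hold of the state after each one. The `trace` helper and the text-tidying steps are hypothetical, just to show the shape of the thing.

```python
def trace(initial, steps):
    """Apply each (label, function) step in turn, recording the state after each."""
    history = [("start", initial)]
    state = initial
    for label, func in steps:
        state = func(state)
        history.append((label, state))
    return history

steps = [
    ("strip whitespace", lambda s: s.strip()),
    ("lower case", lambda s: s.lower()),
    ("split into words", lambda s: s.split()),
    ("count words", len),
]

history = trace("  Heavy Metal Umlaut  ", steps)
for label, state in history:
    print(f"{label!r:>20} -> {state!r}")
```

In a notebook you get this view for free: each cell is a step, and the output underneath it is the recorded state.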

That’s another feature of what I’m calling ‘scripting’ – information processing recipes that get a thing done by transforming stuff into other stuff.

(I can hear my colleagues now – “Ah yes, all notebooks are good for is one-shot linear programmes”. Whatever. If a line of code calls a function repeatedly, you don’t necessarily need to see the output of the function at each step (though you could display that if you wanted to); what is important is that you can see how the function works in a one shot mode (and maybe test it with various parameters) and also see what the overall effect is by applying it however many times in the particular cell that calls it repeatedly.)

Part of the reason I’m in favour of having lines-of-code-that-do-things to hand, as in a Jupyter notebook, is that building graphical user interfaces is hard. Even if you can build a nice graphical programming environment to support the development of a particular sort of application or support the user in performing a particular sort of task (tidying up a messy data set and generating a chart from it), the GUI elements are themselves going to trigger, and perhaps insert particular parameter values into, lines of code.

For example, by going to a menu and selecting an item, you are triggering the execution of a particular block of code a line at a time. By ticking a checkbox, or checking a radio button, particularly in a responsive interface, you are setting the value of a parameter that is passed to a line of code for it to do something with.

What the notebook does is let you arrange cell blocks of code that can perform similar actions in terms of manipulating state to the actions triggered by invoking those interactive user elements. Rather than “select that menu option, check that box” in a graphical interface, you “use the block of code that would be triggered by that menu” and “use a block of code to set the value that would be updated by the checkbox and apply it”.
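As a sketch of that correspondence, with an invented "ignore missing values" checkbox and "sort descending" menu item standing in for whatever a real GUI might offer:

```python
raw_values = [3, None, 8, 1, None, 5]

# GUI: tick the "ignore missing values" checkbox.
# Notebook: set the value the checkbox would have updated...
ignore_missing = True

# ...and apply it.
values = [v for v in raw_values if v is not None] if ignore_missing else list(raw_values)
print(values)

# GUI: select "Sort descending" from a menu.
# Notebook: use the block of code that menu item would have triggered.
def sort_descending(items):
    """Return the items sorted from largest to smallest."""
    return sorted(items, reverse=True)

result = sort_descending(values)
print(result)
```

The state after each "click" is there on the page, which is exactly what the GUI tends to hide.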