Fragment — Figure-Ground: Opposites Jupyter and Excel

Via a Twitter trawl sourcing potential items for Tracking Jupyter, I came across several folk picking up on a recent Growing the Internet Economy podcast interview — Invest Like the Best, EP.178 — with John Collison, co-founder of digital payments company, Stripe, picking up on a couple of comments in particular.

Firstly, on Excel in the context of “no code” environments:

[I]f you look at Excel, no one calls as a no-code tool, but Excel, I think is one of the most underappreciated programming environments in the world. And the number of Excel programmers versus people using how we think of as more traditional languages is really something to behold.”

One of the issues I have with using things like Scratch to teach adults to code is that it does not provide an environment that resonates with the idea of using code to do useful work. To the extent that programming is taught in computing departments as an academic discipline on the one hand, and a softare engineering, large project codebase activity on the other has zero relevance to the way I use code every day, as a tool for building tools, exploring stateful things in a state transformative way, and getting things done through automation.

I would far rather we taught folk a line of code at a time principles using something like Excel. (There’s an added advantage to this that you also teach students in a natural way about concepts relating to vector based / columnar computation, as well as reactivity, neither of which are typically taught in introductory, maybe even advanced, academic programming classes. Certainly, after several years of teaching the pandas domain specific language in a data management and analysis course, we have only recently really articulated to ourselves how we really do need to develop the idea of vectorised computation more explictly.)

Secondly, on Excel as an environment:

“[L]ots of features of Excel … make it a really nice programming environment and really nice to learn in, where the fact that it’s continuously executed means that unlike you running your code and it doesn’t work, and you’ve got some error, that’s hard to comprehend. Instead, you have a code that just continuously executed in the form of the sheets you see in front of you. And similarly the fact that its individual cells, and you kind of lay out the data spatial… Or the program spatially, where the code and the data is interspersed together, and no one part of it can get too big and diffuse.”

The continuous execution is typically a responsive evaluation of all cells based on an input change to one of them. In this sense, Excel has many similarites with the piecewise “REPL” (read-evaluate-print-loop) execution model used by Jupyter (notebook) kernels, where a change to code in an input cell is evaluated when the cell is run and often an output data state is rendered, such as a chart or a table.

One the replies to one of the shares, from @andrewparker — makes this explicit: “[w]hen writing code, the functions are always visible and the variables’ contents are hidden. Excel is programming where the opposite is true.”

In the spreadsheet, explicit input data is presented to hidden code (that is, formulas) and the result of code execution is then rendered in the form of transformed data. In many “working” spreadsheets, partial steps (“a line of code at a time”) are calculated across parallel columns, with the spreadsheet giving a macroscopic view over partial transformations of the data a step at a time, befire returning the final calculations in the final column, or in the form of an interpreted display such as a graphical chart.

One of the oft-quoted criticisms against Jupyter notebooks is that “the state is hidden” (although if you treat Jupyter notebooks as a linear narrative and read and execute them as such, this claim is just so-much nonsense…) but suggests viewing notebooks in a complementary way: rather than having the parallel columnar cells of the Excel case, where a function at the top of the column may be applied to data values from previous columns, you have top-down linear exposition of the calculation where code cell at a time is used to transform the state generated by the previous cell. (One of the ways I construct notebooks is to take an input dataset and display it at the top of the notebook, apply a line of code to trasnform it and display the result of that transformation, apply another line of code in another cell and view the result of that, and so on.) You can now see not only the state of the data after each transformative step, but also the formula (the line of code) that generated it from data rendered from an earlier step.

Again picking up on the criticism of notebooks that at any given time you may read notebook as a cacophony of incoherent partially executed, a situation that may occur if you run a notebook to completion, then maybe change the input data at the top and run it half way, and then change the input data at the top and run just the first few cells, leaving a notebook with rendered data everywhere execution from different stes of data. This approach corresponds to the model of a spreadsheet worksheet where perhaps you have to click on each column in turn and hit return before the cells are updated, and that cell updates are only responsive to your action that triggers an update on selected cells. But if you get into the habit of only executing notebook cells using a restart-kernel-then-run-all execution model (which an extension could enforce) then this nonsense does not occur, and all the linear cells would be updated, in linear order.

And again, here there is a point of contrast: in the spreadsheet setting, any column or selection of cells may be created by the applciation of a formula to any other collection of cells in the workbook. In a jupyter notebook, if you use the restart-kernel-then-run-all execution model, then the rendering of data as the output to each code cell is a linear sequence. (There are other notebook extensions that let you define dependent cells which could transform the execution order to a non-linear one, but why would you do that..?)

Things can still get messy, though. For example, from another, less recent (March, 2020) tweet I found in the wild: I just realized that making plots in Excel has the same “bad code” smell for me that doing research in a Jupyter notebook does: you’re mixing analysis, data, and results in a way that doesn’t let you easily reuse bits for another later analysis (@rharang) Biut then, reuse is another issue altogether.

Thinks: one of the things I think I need to think about is how the spatial layout of a spreadsheet could may onto the spatial layout of a notebook. It might be interesting to find some “realistic” spreadsheets containing plausible business related calculations and give them a notebook treatment…

Anyway, here’s a fuller exceprted transcript from the podcast:

Patrick: I’m curious how you think about the transition to what’s now being called the no-code movement. The first part of the question is, how under supplied is the world in terms of just talented software developers? But may that potentially not be as big a problem if we do get no-code tools that would allow someone like me that has dabbled in but is certainly not terribly technical on software, more so in data science to build things for myself and not need engineers. What do you think that glide path looks like over the next say 10 years?

John Collison: The answer to how short staffed we are on engineers is still clearly loads, …, [W]e’re still really short of software engineers.

And no-code, I don’t think no-code is fully a panacea, because I think the set of at even when you’re doing no-code, you’re still reasoning about the relations between different objects and data flows and things like that. And so I think when you’re doing, when you’re building an app with Zapier or something like that, you’re still doing a form of engineering, you’re just not necessarily writing codes. And so hopefully that’s something that can give leverage to people without necessarily needing to have to spend quite as much time in it. And this is not new by the way, if you look at Excel, no one calls as a no-code tool, but Excel, I think is one of the most underappreciated programming environments in the world. And the number of Excel programmers versus people using how we think of as more traditional languages is really something to behold.

And I actually think lots of features of Excel that make it a really nice programming environment and really nice to learn in, where the fact that it’s continuously executed means that unlike you running your code and it doesn’t work, and you’ve got some error, that’s hard to comprehend. Instead, you have a code that just continuously executed in the form of the sheets you see in front of you. And similarly the fact that its individual cells, and you kind of lay out the data spatial… Or the program spatially, where the code and the data is interspersed together, and no one part of it can get too big and diffuse. Anyway, I think there are all of these ways in which, anyone who’s developing a no-code or new software paradigm should look at Excel because so many people have managed to essentially learn how to do some light programming from looking at other people’s models and other people’s workbooks and kind of emulating what they see.

… I don’t think no-code will obviate the need for software programmers, I would hope that it can make many more people able to participate in software creation and kind of smooth the on ramp, which is right now, there’s like a really sharp, vertical part of that one.

Some of this sentiment resonates with one of my motivations for “why code?”: it gives people a way of looking at problems that helps them understand the extent to which they may be computable, or may be decomposed, as well as given them a tool that allows them to automate particular tasks, or build other tools that help them get stuff done.

See also:

And one to watch again: