Rally Review Charts Recap

At the start of the year, I was planning on spending free time tinkering with rally data visualisations and trying to pitch some notebook originated articles to Racecar Engineering. The start of lockdown brought some balance, and time away from screen and keyboard in the garden, but then pointless made up organisational deadlines kicked in and my 12 hour plus days in front of the screen kicked back in. As Autumn sets in, I’m going to try to cut back working hours, if not screen hours, by spending my mornings playing with rally data.

The code I have in place at the moment is shonky as anything and in desperate need of restarting from scratch. Most of the charts are derived from multiple separate steps, so I’m thinking about pipeline approaches, perhaps based around simple web services (so a pipeline step is actually a service call). This will give me an opportunity to spend some time seeing how production systems actually build pipeline and microservice architectures to see if there is anything I can pinch.

One class of charts, shamelessly stolen from @WRCStan of @PushingPace / pushingpace.com, are pace maps that use distance along the x-axis and a function of time on the y-axis.

One flavour of pace map uses a line chart to show the cumulative gap between a selected driver and other drivers. The following chart, for example, shows the progress of Thierry Neuville on WRC Rally Italia Sardegna, 2020. On the y-axis is the accumulated gap in seconds, at the end of each stage, to the identified drivers. Dani Sordo was ahead for the whole of the rally, building a good lead over the first four stages, then holding pace, then losing pace in the back end of the rally. Neuville trailed Seb Ogier over the first five stages, then after borrowing from Sordo’s set-up made a comeback and had a good battle with Ogier from SS6 right until the end.

Off the pace chart.

The different widths of the stage identify the stage length; the graph is thus a transposed distance-time graph, with the gradient showing the difference in pace in seconds per kilometer.
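A minimal sketch of the calculation behind this sort of chart follows. The stage distances and times are made-up illustrative values (not the Sardegna results), the driver codes are just labels, and the function names are hypothetical.

```python
from itertools import accumulate

# Made-up illustrative data: stage lengths (km) and stage times (s).
stage_km = [12.0, 14.0, 22.0]
stage_times = {
    "NEU": [480.0, 560.0, 900.0],
    "SOR": [478.0, 557.0, 903.0],
}

def cumulative_gap(times, driver, rival):
    """Accumulated gap (s) from driver to rival at the end of each stage.

    Assumed sign convention: positive means the driver is behind the rival.
    """
    deltas = [d - r for d, r in zip(times[driver], times[rival])]
    return list(accumulate(deltas))

def pace_gradient(times, distances_km, driver, rival):
    """Per-stage pace difference (s/km): the slope of each line segment."""
    return [(d - r) / km
            for d, r, km in zip(times[driver], times[rival], distances_km)]

gaps = cumulative_gap(stage_times, "NEU", "SOR")
slopes = pace_gradient(stage_times, stage_km, "NEU", "SOR")
```

Plotting gaps against the cumulative stage distance gives the line chart; slopes holds the gradient of each segment.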

The other type of pace chart uses lines to create some sort of stylised onion skin histogram. In this map, the vertical dimension is seconds per km gained / lost relative to each driver on the stage, with stage distance again on the x-axis. The area is thus an indicator of the total time gained / lost on the stage, which marks this out as a histogram.

In the example below, bars are filled relative to a specific driver. In this case, we’re plotting pace deltas relative to Thierry Neuville, and highlighting Neuville’s pace gap to Dani Sordo.

Pace mapper.
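The "area equals time" property of the pace map can be checked with a small sketch. The single-stage values are invented for illustration, and the sign convention (positive means the selected driver gained time on the rival) is an assumption.

```python
# Made-up illustrative values for a single stage.
stage_km = 14.0
stage_time_s = {"NEU": 560.0, "SOR": 557.0, "OGI": 562.8}

def pace_deltas(times, distance_km, driver):
    """Seconds per km gained (+) or lost (-) by driver relative to each rival."""
    base = times[driver]
    return {rival: (t - base) / distance_km
            for rival, t in times.items() if rival != driver}

# Bar heights in s/km, relative to the selected driver.
heights = pace_deltas(stage_time_s, stage_km, "NEU")

# Bar area = height (s/km) x width (km) = time gained or lost on the stage (s).
areas = {rival: height * stage_km for rival, height in heights.items()}
```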

The other sort of chart I’ve been producing is more of a chartable:

Rally review chart.

This view combines a tabular chart with in-cell bar charts and various embedded charts. The intention was to combine glanceable visual +/- deltas with actual numbers attached as well as graphics that could depict trends and spikes. The chart also allows you to select which driver to rebase the other rows to: this allows you to generate a report that tells the rally story from the perspective of a specified driver.

My thinking is that to create the Rally Review chart, I should perhaps have a range of services that each create one component, along with a tool that lets me construct the finished table from component columns in the desired order.

Some columns may contain a graphical summary over values contained in a set of neighbouring columns, in which case it would make sense for the service to itself be a combination of services: one to generate rebased data, others to generate and return views over the data. (Rebasing data may be expensive computationally, so if we can do it once, rather than repeatedly, that makes sense.)

In terms of transformations, then, there are at least two sorts of transformation service required:

  • pace transformations, that take time quantities on a stage and make pace out of them by taking the stage distance into account;
  • driver time rebasing transformations, that rebase times relative to a specified driver.
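As a rough sketch of how those two transformations might look as composable pipeline steps (plain functions here, though each could in principle sit behind a service call), assuming times are held as a simple dictionary of per-stage lists. All names and values are illustrative.

```python
from functools import reduce

def rebase(times, driver):
    """Rebase each driver's stage times relative to the specified driver."""
    base = times[driver]
    return {d: [t - b for t, b in zip(ts, base)] for d, ts in times.items()}

def to_pace(times, distances_km):
    """Convert per-stage time quantities (s) into pace (s/km)."""
    return {d: [t / km for t, km in zip(ts, distances_km)]
            for d, ts in times.items()}

def pipeline(data, *steps):
    """Apply each step in turn to the output of the previous one."""
    return reduce(lambda acc, step: step(acc), steps, data)

# Made-up illustrative data.
stage_km = [12.0, 14.0]
stage_times = {"NEU": [480.0, 560.0], "SOR": [478.0, 557.0]}

# Rebase once, then reuse the rebased data for any number of views.
rebased = rebase(stage_times, "NEU")
pace = pipeline(stage_times,
                lambda t: rebase(t, "NEU"),
                lambda t: to_pace(t, stage_km))
```

Keeping the rebased data around as an intermediate result means the potentially expensive step is only run once, however many views are generated from it.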

Rummaging through other old graphics (I must have the code somewhere but not sure where; this will almost certainly need redoing to cope with the data I currently have in the form I have it…), I also turn up some things I’d like to revisit.

First up, stage split charts that are inspired by seasonal subseries plots:

Stage split subseries

These plots show the accumulated delta on a stage relative to a specified driver. Positions on the stage at each split are shown by overplotted labels. If we made the x-axis a split distance dimension, the gradient would show pace difference. As it is, the subseries just indicate trend over splits:

Another old chart I used to quite like is based on a variant of quantised slope chart or bump chart (a bit like position charts).

This is fine for demonstrating changes in position for a particular loop but gets cluttered if there are more than three or four stages. The colour indicates whether a driver gained or lost time relative to the overall leader (I think!). The number represents the number of consecutive stage wins in the loop by that driver. The bold font on the right indicates the driver improved overall position over the course of the loop. The rows at the bottom are labelled with position numbers if they rank outside the top 10 (this should really be if they rank lower than the number of entries in the WRC class).

One thing I haven’t tried, but probably should, is a slope graph comparing times for each driver where there are two passes of the same stage.

I did have a go in the past at more general position / bump charts too but these are perhaps a little too cluttered to be useful:

Again, the overprinted numbers on the first position row indicate the number of consecutive stage wins for that driver; the labels on the lower rows are out of top 10 position labels.

What may be more useful would be adding the gap to leader or diff to car ahead, either on stage or overall, with colour indicating either that quantity, heatmap style, or for the overall case, whether that gap / difference increased or decreased on the stage.
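The two quantities mentioned here are straightforward to derive from a set of times. This is a minimal sketch with invented overall times and a hypothetical function name.

```python
def gaps(times):
    """Return {driver: (gap_to_leader, diff_to_car_ahead)} in seconds."""
    ranked = sorted(times.items(), key=lambda item: item[1])
    leader_time = ranked[0][1]
    result = {}
    previous_time = leader_time
    for driver, t in ranked:
        # The leader has a gap and a diff of zero by construction.
        result[driver] = (t - leader_time, t - previous_time)
        previous_time = t
    return result

# Made-up overall times (s) at the end of a stage.
overall = {"SOR": 1000.0, "NEU": 1005.0, "OGI": 1006.5}
table = gaps(overall)
```

Running the same function over the times at the end of the previous stage, and comparing the two results, would show whether each gap increased or decreased on the stage.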

Deconstructing the TM351 Virtual Computing Environment via VS Code

For 2020J, which is to say, the 2020 October presentation, of our TM351 Data Management and Analysis course, we’ve deprecated the original VirtualBox packaged virtual machine and moved to a monolithic Docker container that packages all the required software applications and services (a Jupyter notebook server, postgres and mongoDB database servers, and OpenRefine).

As with the VM, the container is headless and exposes applications over http via browser based user interfaces. We also rebranded away from “TM351 VM” to “TM351 VCE”, where VCE stands for Virtual Computing Environment.

Once Docker is installed, the environment is installed and launched purely from the command line using a docker run command. Students early in to the forums have suggested moving to docker compose, which simplifies the command line command significantly, but at the cost of having to supply a docker-compose.yaml file. With OU workflows, it can take weeks, if not months, to get files onto the VLE for the first time, and days to weeks to post updates (along with a host of news announcements and internal strife about the possibility of tutors/ALs and students having different versions of the file). As we need to support cross-platform operation, and as the startup command specifies file paths for volume mounts, we’d need different docker-compose files (I think?) because file paths on Mac/Linux hosts, versus Windows hosts, use a different file path syntax (forward vs back slashes as path delimiters). [If anyone can tell me how to write a docker-compose.yaml file with arbitrary paths on the volume mounts, please let me know via the comments…]
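One way of sidestepping the path syntax question, sketched here in Python rather than as a compose file, is to have a small launcher build the docker run arguments from whatever path form the host platform uses. This is an untested sketch. The image name and the container directory are placeholders rather than the real VCE values, and mapping to container port 8888 assumes the Jupyter server is listening on its default port.

```python
from pathlib import Path

def docker_run_args(host_dir, image, container_dir="/notebooks",
                    host_port=35180, container_port=8888):
    """Build an argument list for `docker run` with a volume mount.

    image and container_dir are placeholder values for illustration.
    """
    # Path renders the absolute path in the host platform's native syntax.
    host_path = Path(host_dir).expanduser().resolve()
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:{container_port}",
        "-v", f"{host_path}:{container_dir}",
        image,
    ]

args = docker_run_args(".", "example/tm351vce")
# To launch for real: subprocess.run(args, check=True)
```

Passing the arguments to subprocess as a list, rather than as a single command string, also avoids shell quoting problems with paths that contain spaces.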

Something else that has cropped up early in the forums is mention of VS Code, which presents a way to personalise the way in which the course materials are used.

By default, the course materials we provide for practical activities are all based on Jupyter notebooks, delivered via the Jupyter notebook server in the VCE (or via an OU hosted notebook server we are also exploring this year). The activities are essentially inlined using notebook code cells within a notebook that presents a linear teaching text narrative.

Students access the notebooks via their web browser, wherever the notebook server is situated. For students running the Docker VCE, notebook files (and OpenRefine project files) exist in a directory on the student’s own computer that is then mounted into the container; make changes to the notebooks in the container and those changes are saved in the notebooks mounted from host. Delete the container, and the notebooks are still on your desktop. For students using the online hosted notebook server, there is no way of synchronising files back to the student desktop, as far as I am aware; there was an opportunity to explore how we might allow students to use something like private Github repositories to persist their files in a space they control, but to my knowledge that has not been explored (a missed opportunity, to my mind…).

Using the VS Code Python extension, students installing VS Code on their own computer can connect to the Jupyter server running in the containerised VCE (I don’t know if the permissions allow this on the hosted server).

The following tm351vce.code-workspace file describes the required settings:

"folders": [
"path": "."
"settings": {
"python.dataScience.jupyterServerURI": "http://localhost:35180/?token=letmein"

The VSCode Python extension renders notebooks, so students can open local copies of files from their own desktop and execute code cells against the containerised kernel. If permissions on the hosted Jupyter service allow remote/external connections, this would provide a workaround for synching notebooks files: students would work with notebook files saved on their own computer but executed against the hosted server kernel.

Queries can be run against the database servers via the code cells in the normal way (we use some magic to support this for the postgres database).

If we make some minor tweaks to the config files for the PostgreSQL and MongoDB database servers, we can use the VS Code PostgreSQL extension and MongoDB extension to run queries from VS Code directly against the databases.

For example, the postgres database:


and the mongo database:


Note that this is now outside the narrative context of the notebooks, although it strikes me that we could generate .sql and .json text files from notebooks that show code literally and comment out the narrative text (the markdown text in the notebooks).
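A minimal sketch of that sort of conversion for the SQL case follows, working directly on the notebook JSON. It assumes that every code cell contains SQL and that any cell magic line (anything starting with %%, such as the %%sql used in the demo) should simply be dropped. Both assumptions would need checking against the actual notebooks.

```python
def cell_text(cell):
    """Notebook cell source may be a string or a list of line strings."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source

def notebook_to_sql(nb):
    """Render a notebook dict as SQL text: code literal, narrative commented out."""
    lines = []
    for cell in nb.get("cells", []):
        text_lines = cell_text(cell).splitlines()
        if cell.get("cell_type") == "code":
            # Assumption: cell magic lines are not wanted in the .sql file.
            lines.extend(l for l in text_lines if not l.startswith("%%"))
        elif cell.get("cell_type") == "markdown":
            # Narrative text becomes SQL line comments.
            lines.extend(f"-- {l}" for l in text_lines)
        lines.append("")
    return "\n".join(lines)

# A tiny made-up notebook for illustration.
demo = {"cells": [
    {"cell_type": "markdown", "source": ["# Counting rows\n", "How many rows?"]},
    {"cell_type": "code", "source": ["%%sql\n", "SELECT COUNT(*) FROM mytable;"]},
]}
sql_text = notebook_to_sql(demo)
```

Since .ipynb files are JSON documents, a real notebook could be read in with json.load before being passed to the function.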

However, we wouldn’t be able to work directly with the data returned from the database via Python/pandas dataframes, as we do in the notebook case. (Note also that in the notebooks we use a Python API for querying the mongo database, rather than directly issuing Javascript based queries.)

At this point you might ask why we would want to deconstruct / decompose the original structured notebook+notebook UI environment and allow students to use VS Code to access the computational environment, not least when we are in the process of updating the notebooks and the notebook environment to use extensions that add additional style and features to the user environment. Several reasons come to my mind that are motivated by finding ways in which we can essentially lose control, as educators, of the user interface whilst still being reasonably confident that the computational environment will continue to perform as we intend (this stance will probably make many of my colleagues shudder; I call it supporting personalisation…):

  • we want students to take ownership of their computational environment; this includes being able to access it from their own clients that may be better suited to their needs, eg in terms of familiarity, accessibility, productivity, etc;
  • a lot of our students are already working in software development and already have toolchains they are working with. Whilst we see benefits of using the notebook UI from a teaching and learning perspective, the fact remains that students can also complete the activities in other user environments. We should not hinder them from using their own environments — the code should still continue to run in the same way — as long as we explain how the experience may not be the same as the one we are providing, and also noting that some of the graphics / extensions we use in the notebooks may not work in the same way, or may not even work at all, in the VS Code environment.

If students encounter issues when using their own environment, rather than the one we provide, we can’t offer support. If the personalised learning environment is not as supportive for teaching and learning as the environment we provide, it is the student’s choice to use it. As with the Jupyter environment, the VS Code environment sits at the centre of a wide ecosystem of third party extensions. If we can make our materials available in that environment, particularly for students already familiar with that environment, they may be able to help us by identifying and demonstrating new ways, perhaps even more effective ways, of using the VS Code tooling to support their learning than the environment we provide. (One example might be the better support VS Code has for code linting and debugging, which are things we don’t teach, and that our chosen environment perhaps even prevents students who know how to use such tools from making use of them. Of course, you could argue we are doing students a service by grounding them back in the basics where they have to do their own linting and print() statement debugging… Another might be the Live Share/collaboration service that lets two or more users work collaboratively in the same notebook, which might be useful for personal tutorial sessions etc.)

From my perspective, I believe that, over time, we should try to create materials that continue to work effectively to support both teaching and learning in environments that students may already be working in, and not just the user interface environments we provide, not least because we potentially increase the number of ways in which students can see how they might make use of those tools / environments.

PS I do note that there may be licensing related issues with VS Code and the VS Code extensions store, which are not as open as they could be; VSCodium perhaps provides a way around that.

Quiz Night Video Chat Server

Our turn to host quiz night this week, so rather than use Zoom, I thought I’d give Jitsi a spin.

Installation onto a Digital Ocean box is easy enough using the Jitsi server marketplace droplet, although it does require you to set up a domain name so the https routing works, which in turn seems to be required to grant Jitsi access via your browser to the camera and microphone.

As per guidance here, to add a domain to a Digital Ocean account simply means adding the domain name (eg ouseful.org) to your account and then from your domain control panel, adding the Digital Ocean nameservers (ns1.digitalocean.com, ns2.digitalocean.com, ns3.digitalocean.com).

It’s then trivial to create and map a subdomain to a Digital Ocean droplet:

The installation process requires creating the droplet from the marketplace, assigning the domain name, and then from the terminal running some set-up scripts:

  • install server: ./01_videoconf.sh
  • add LetsEncrypt https support: ./02_https.sh

Out of the can, anyone can gain access, although you can add a password to restrict access to a video room once you’ve created it.

To lock access down when it comes to creating new meeting rooms, you can add simple auth to created user accounts. The Digital Ocean tutorial How To Install Jitsi Meet on Ubuntu 18.04 “Step 5 — Locking Conference Creation” provides step by step instructions. (It would be nice if a simple script were provided to automate that a little…)

The site is react powered, and I couldn’t spot an easy way to customise the landing page. The text all seems to be provided via language pack files (eg /usr/share/jitsi-meet/lang/main-enGB.json) but I couldn’t see how to actually put any changes into effect? (Again, another simple script in the marketplace droplet to help automate or walk you through that would be useful…)

Fragment: Revealing Otherwise Hidden Answers In Jupyter Notebooks

Some notes culled from an internal feedback forum regarding how to provide answers to questions set in Jupyter notebooks. To keep things simple, this does not extend to providing any automated testing (eg software code tests, computer marked assessments etc); just the provision of worked answers to questions. The sort of thing that might appear in the back of a text book referenced from a question set in the body of the book.

The issue at hand is: how do you provide friction that stops the student just looking at the answer without trying to answer the question themselves? And how much friction should you provide?

In our notebook using data management and analysis course, some of the original notebooks took a naive approach of presenting answers to questions in a separate notebook. This adds lots of friction and a quite unpleasant and irritating interaction, involving opening the answer notebook in a new tab, waiting for the kernel to start, and then potentially managing screen real estate and multiple windows to compare the original question and student’s own answer with the answer. Other technical issues include state management, if for example the desired answer was not in and of itself reproducible, but instead required some “boilerplate” setting up of the kernel state to get it into a situation where the answer code could be meaningfully run.

One of the advantages of having answer code in a separate notebook is that a ‘Run All’ action in the parent notebook won’t run the answer code.

Another approach taken in the original notebooks was to use an exercise extension that provided a button that could be clicked to reveal a hidden answer in the original notebook. This required an extension to be enabled in the environment in which the original notebook was authored, so that the question and answer cells could be selected and then marked up with exercise cell metadata via a toolbar button, and on the student machine, so that the exercise button could actively be rendered and the answer cell initially hidden by the notebook UI when the notebook was loaded in the browser. (I’m not sure if it would also be possible to explicitly code a button using HTML in a markdown cell that could hide/reveal another classed or id’d HTML element; I suspect any javascript would be sanitised out of the markdown?)

Most recently, all the notebooks containing exercises in our databases course now use a model where we use a collapsible heading to hide the answer. Clicking the collapsed header indicator in the notebook sidebar reveals the answer, which is typically composed of several cells, both markdown and code cells. The answer is typically provided in a narrative form, often in the style of a “tutor at your side”, providing not just an explained walkthrough of the answer but also identifying likely possible incorrect answers and misunderstandings, why they might at first appear to have been “reasonable” answers, and why they are ultimately incorrect.

This answer reveal mechanic is very low friction, and degrades badly: if the collapse headings extension is not enabled, the answer is rendered as just any other notebook cell.

When using these approaches, care needs to be taken when considering what the Run All action might do when the answer code cells are reached: does the answer code run? Does it create an error? Is the code ignored and code execution continued in cells after the hidden answer cell?

In production workflows where a notebook may be executed to generate a particular output document, such as a PDF with all cells run, how are answer cells to be treated? Should they be displayed? Should they be displayed and answer cells run? This might depend on audience: for example, a PDF output for a tutor may contain answers and answer outputs, a student PDF may actually be two or three documents: one containing the unanswered text, one containing the answer code but not executed, one containing the answer code and executed code outputs.
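If answer cells are marked with a cell tag, the different document variants can be derived from one executed notebook by filtering its JSON. The following is a sketch only: the "answer" tag name is hypothetical, and it relies on notebook cell tags being stored as a list under each cell's metadata.

```python
import copy

ANSWER_TAG = "answer"  # hypothetical tag name used to mark answer cells

def is_answer(cell, tag=ANSWER_TAG):
    """True if the cell carries the answer tag in its metadata."""
    return tag in cell.get("metadata", {}).get("tags", [])

def without_answers(nb, tag=ANSWER_TAG):
    """Copy of the notebook with answer cells removed (unanswered text)."""
    out = copy.deepcopy(nb)
    out["cells"] = [c for c in out.get("cells", []) if not is_answer(c, tag)]
    return out

def with_unexecuted_answers(nb, tag=ANSWER_TAG):
    """Copy that keeps answer code but strips its outputs."""
    out = copy.deepcopy(nb)
    for cell in out.get("cells", []):
        if cell.get("cell_type") == "code" and is_answer(cell, tag):
            cell["outputs"] = []
            cell["execution_count"] = None
    return out

# A tiny made-up executed notebook for illustration.
demo = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "Question 1"},
    {"cell_type": "code", "metadata": {"tags": ["answer"]},
     "source": "print(42)", "outputs": [{"output_type": "stream"}],
     "execution_count": 3},
]}

student_questions = without_answers(demo)
student_answers = with_unexecuted_answers(demo)
```

The executed notebook itself then serves as the third variant, with answer code and outputs, for example as the tutor version.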

Plagiarising myself from a forthcoming post, as well as adding friction, I wonder if what we’re actually trying to achieve by the way the answer is accessed or otherwise revealed might include various other psychological and motivational factors that affect learning. For example, setting up the anticipation that pays-off as a positive reward for a student who gets the answer right, or a slightly negative payoff that makes a student chastise themselves for clicking right to the answer. At the same time, we also need to accommodate students who may have struggled with a question and are reluctantly appealing to the answer as a request for help. There are also factors relating to the teaching, insofar as what we expect the student to do before they reveal the answer, what expectation students may have (or that we may set up) regarding what the “answer” might provide, how we expect the student to engage with the answer, and what we might expect them to do after reading the answer.

So what other approaches might be possible?

If the answer can be contained in just the contents of a single code cell, the answer can be placed in an external file, for example, week2_section1_activity3.py, in the same directory or an inconvenient to access solutions directory such as ../../solutions. This effectively hides the solution but allows a student to render it by running the following in a code cell:

%load ../../solutions/week2_section1_activity3.py

The content of the file will be loaded into the code cell.

One advantage of this approach is that a student has to take a positive action to render the answer code into the cell. You can make this easier or harder depending on how much friction you want to add. Eg low friction: just comment out the magic in the code cell, so all a student has to do is uncomment and run it. Higher friction: make them work out what the file name is and write it themselves, then run the code cell.

I’m not sure I’m convinced having things strewn across multiple files is the best way of adding the right amount and right sort of friction to the answer lookup problem though. It also causes maintenance issues, unless you are generating separate student activity notebook and student hint or answer notebooks from a single master notebook that contains activities and hints/answers in the same document.

Other approaches I’ve seen to hiding and revealing answers include creating a package that contains the answers, and then running a call onto that package to display the answer. For example, something like:

from course_activities import answers

An alternative to rendering answers from a python package would be to take more of a “macro” approach and define a magic which maybe has a slightly different feeling to it. For example:

%hint week2_section1_activity3

In this approach, you could explore different sorts of psychological friction, using nudges:

%oh_go_on_then_give_me_a_hint_for week2_section1_activity3


%ffs_why_is_this_so_hard week2_section1_activity3

If the magic incantation is memorable, and students know the pattern for creating phrases like week2_section1_activity3 eg from the notebook URL pattern, then they can invoke the hint themself by saying it. You could even have aliases for revealing the hint, which the student could use in different situations to remind themselves of their state of mind when they were doing the activity.

%I_did_it_but_what_did_you_suggest_for week2_section1_activity3
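A sketch of how such magics might be defined as an IPython extension follows. It assumes answers live in files named after the activity in a ../../solutions directory, as in the %load example, and the module layout and function names are hypothetical. The lookup is kept as a plain function so it can be used outside IPython.

```python
from pathlib import Path

SOLUTIONS_DIR = Path("../../solutions")  # assumed location of answer files

def find_hint(activity_id, solutions_dir=SOLUTIONS_DIR):
    """Return the answer text for an id such as week2_section1_activity3."""
    answer_file = Path(solutions_dir) / f"{activity_id}.py"
    if not answer_file.is_file():
        return f"No hint found for {activity_id}"
    return answer_file.read_text()

def hint(line):
    """Line magic body: the text after the magic name is the activity id."""
    print(find_hint(line.strip()))

# Different names for the same magic, to play with psychological friction.
ALIASES = (
    "hint",
    "oh_go_on_then_give_me_a_hint_for",
    "I_did_it_but_what_did_you_suggest_for",
)

def load_ipython_extension(ipython):
    """Called by IPython on %load_ext; registers each alias as a line magic."""
    for name in ALIASES:
        ipython.register_magic_function(hint, magic_kind="line", magic_name=name)
```

If this were saved as an importable module, running %load_ext with the module name in a notebook would make each alias available.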

With a little bit of thought, we could perhaps come up with a recipe that combines approaches to provide an experience that does degrade gracefully and that perhaps also allows us to play with different sorts of friction.

For example, suppose we extend the Collapsible Heading notebook extension so that we can tag a cell as an answer-header cell and, in a code cell below it, put a #%load answer.py line. In an unextended notebook, the student has to uncomment the load magic and run the cell to reveal the answer; in an extended notebook, the extension colour codes the answer cell and collapses the answer; when the student clicks to expand the answer, the extension looks for #%load lines in the collapsed area, uncomments and executes them to load in the answer, and expands the answer cells automatically.

We could go further and provide an extension that lets an author write an answer in eg a markdown cell and then “answerify” it by posting it as metadata in a preceding Answer reveal cell. In an unextended student notebook, the answer would be hidden in metadata. To reveal it, a student could inspect the cell metadata. Clunky for them, and the answer wouldn’t be rendered, but doable. In an extended notebook, clicking reveal would cause the extension to extract the answer from metadata and then render it in the notebook.
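The extraction step can be sketched in Python against the notebook JSON, although a real extension would do this in the browser. The "answer" metadata key is hypothetical.

```python
ANSWER_KEY = "answer"  # hypothetical metadata key holding the answer text

def reveal_answers(nb, key=ANSWER_KEY):
    """Return a copy of the notebook with metadata answers rendered.

    Each answer found in a cell's metadata is inserted as a new markdown
    cell immediately after the cell that carried it.
    """
    cells = []
    for cell in nb.get("cells", []):
        cells.append(cell)
        answer = cell.get("metadata", {}).get(key)
        if answer is not None:
            cells.append({"cell_type": "markdown", "metadata": {},
                          "source": answer})
    return {**nb, "cells": cells}

# A tiny made-up notebook for illustration.
demo = {"cells": [
    {"cell_type": "markdown",
     "metadata": {"answer": "Use a GROUP BY clause."},
     "source": "Answer reveal"},
]}
revealed = reveal_answers(demo)
```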

In terms of adding friction, we could perhaps add a delay between clicking a reveal answer button and displaying an answer, although if the delay is too long, a student may pre-emptively come to click the answer button before they actually want to see the answer so they don’t have to wait. The use of audible signals might also provide psychological nudges that subconsciously influence the student’s decision about when to reveal an answer.

There are lots of things to explore I think but they all come with a different set of opportunity costs and use case implications. But a better understanding of what we’re actually trying to achieve would be a start… And that’s presumably been covered in the interactive learning design literature? And if it hasn’t, then why not? (Am I overthinking all this again?!)

PS It’s also worth noting something I haven’t covered: notebooks that include interactive self-test questions, or self-test questions where code cells are somehow cross-referenced with code tests. (The nteract testbook recipe looks like it could be interesting in this respect, eg where a user might be able to run tests from one notebook against another notebook. The otter-grader might also be worth looking at in this regard.)


If you look back not that far in history, the word “computer” was a term applied to a person working in a particular role. According to Webster’s 1828 American Dictionary of the English Language, a computer was defined as “[o]ne who computes or reckons; one who estimates or considers the force and effect of causes, with a view to form a correct estimate of the effects”.

Going back a bit further, to Samuel Johnson’s magnum opus, we see a “computer” is defined more concisely as a “reckoner” or “accountant”.

In a disambiguation page, Wikipedia identifies Computer_(job_description), quoting Turing’s Computing Machinery and Intelligence paper in Mind (Volume LIX, Issue 236, October 1950, Pages 433–460):

The human computer is supposed to be following fixed rules; he has no authority to deviate from them in any detail. We may suppose that these rules are supplied in a book, which is altered whenever he is put on to a new job.

Skimming through a paper that appeared in my feeds today — CHARTDIALOGS: Plotting from Natural Language Instructions [ACL 2020; code repo] — the following jumped out at me:

In order to further inspect the quality and difficulty of our dataset, we sampled a subset of 444 partial dialogs. Each partial dialog consists of the first several turns of a dialog, and ends with a Describer utterance. The corresponding Operator response is omitted. Thus, the human has to predict what the Operator (the plotting agent) will plot, given this partial dialog. We created a new MTurk task, where we presented each partial dialog to 3 workers and collected their responses.

Humans. As computers. Again.

Originally, the computer was a person doing a mechanical task.

Now, a computer is a digital device.

Now a computer aspires to be AI, artificial (human) intelligence.

Now AI is, in many cases, behind the Wizard of Oz curtain, inside von Kempelen’s “The Turk” automaton (not…), a human.

Human Inside.

A couple of other things that jumped out at me, relating to instrumentation and comparison between machines:

The cases in which the majority of the workers (3/3 or 2/3) exactly match the original Operator, corresponding to the first two rows, happen 72.6% of the time. The cases when at least 3 out of all 4 humans (including the original Operator) agree, corresponding to row 1, 2 and 5, happen 80.6% of the time. This setting is also worth considering because the original Operator is another MTurk worker, who can also make mistakes. Both of these numbers show that a large fraction of the utterances in our dataset are intelligible implying an overall good quality dataset. Fleiss’ Kappa among all 4 humans is 0.849; Cohen’s Kappa between the original Operator and the majority among 3 new workers is 0.889. These numbers indicate a strong agreement as well.

Just like you might compare the performance of different implementations of an algorithm in code, we also compare the performance of their instantiation in digital or human computers.

At the moment, for “intelligence” tasks (and it’s maybe worth noting that Mechanical Turk has work packages defined as HITs, “Human Intelligence Tasks”) humans are regarded as providing the benchmark gold standard, imperfect as it is.

7.5 Models vs. Gold Human Performance (P3) The gold human performance was obtained by having one of the authors perform the same task as described in the previous subsection, on a subset


See also: Robot Workers?

Fragment — Figure-Ground: Opposites Jupyter and Excel

Via a Twitter trawl sourcing potential items for Tracking Jupyter, I came across several folk picking up on a recent Growing the Internet Economy podcast interview — Invest Like the Best, EP.178 — with John Collison, co-founder of digital payments company, Stripe, picking up on a couple of comments in particular.

Firstly, on Excel in the context of “no code” environments:

“[I]f you look at Excel, no one calls as a no-code tool, but Excel, I think is one of the most underappreciated programming environments in the world. And the number of Excel programmers versus people using how we think of as more traditional languages is really something to behold.”

One of the issues I have with using things like Scratch to teach adults to code is that it does not provide an environment that resonates with the idea of using code to do useful work. The way programming is taught in computing departments, as an academic discipline on the one hand and a software engineering, large project codebase activity on the other, has zero relevance to the way I use code every day, as a tool for building tools, exploring stateful things in a state transformative way, and getting things done through automation.

I would far rather we taught folk a line of code at a time principles using something like Excel. (There’s an added advantage to this that you also teach students in a natural way about concepts relating to vector based / columnar computation, as well as reactivity, neither of which are typically taught in introductory, maybe even advanced, academic programming classes. Certainly, after several years of teaching the pandas domain specific language in a data management and analysis course, we have only recently really articulated to ourselves how we really do need to develop the idea of vectorised computation more explicitly.)

Secondly, on Excel as an environment:

“[L]ots of features of Excel … make it a really nice programming environment and really nice to learn in, where the fact that it’s continuously executed means that unlike you running your code and it doesn’t work, and you’ve got some error, that’s hard to comprehend. Instead, you have a code that just continuously executed in the form of the sheets you see in front of you. And similarly the fact that its individual cells, and you kind of lay out the data spatial… Or the program spatially, where the code and the data is interspersed together, and no one part of it can get too big and diffuse.”

The continuous execution is typically a responsive evaluation of all cells based on an input change to one of them. In this sense, Excel has many similarities with the piecewise “REPL” (read-evaluate-print loop) execution model used by Jupyter (notebook) kernels, where a change to code in an input cell is evaluated when the cell is run and often an output data state is rendered, such as a chart or a table.

One of the replies to one of the shares, from @andrewparker, makes this explicit: “[w]hen writing code, the functions are always visible and the variables’ contents are hidden. Excel is programming where the opposite is true.”

In the spreadsheet, explicit input data is presented to hidden code (that is, formulas) and the result of code execution is then rendered in the form of transformed data. In many “working” spreadsheets, partial steps (“a line of code at a time”) are calculated across parallel columns, with the spreadsheet giving a macroscopic view over partial transformations of the data a step at a time, before returning the final calculations in the final column, or in the form of an interpreted display such as a graphical chart.

One of the oft-quoted criticisms against Jupyter notebooks is that “the state is hidden” (although if you treat Jupyter notebooks as a linear narrative and read and execute them as such, this claim is just so much nonsense…), but the contrast with Excel suggests viewing notebooks in a complementary way: rather than having the parallel columnar cells of the Excel case, where a function at the top of the column may be applied to data values from previous columns, you have a top-down linear exposition of the calculation where one code cell at a time is used to transform the state generated by the previous cell. (One of the ways I construct notebooks is to take an input dataset and display it at the top of the notebook, apply a line of code to transform it and display the result of that transformation, apply another line of code in another cell and view the result of that, and so on.) You can now see not only the state of the data after each transformative step, but also the formula (the line of code) that generated it from data rendered from an earlier step.

Again picking up on the criticism of notebooks: at any given time you may read a notebook as a cacophony of incoherent, partially executed cells. This situation may occur if you run a notebook to completion, then maybe change the input data at the top and run it half way, and then change the input data at the top again and run just the first few cells, leaving a notebook with rendered data everywhere generated by executions over different sets of data. This approach corresponds to the model of a spreadsheet worksheet where perhaps you have to click on each column in turn and hit return before the cells are updated, and where cell updates are only responsive to your action that triggers an update on selected cells. But if you get into the habit of only executing notebook cells using a restart-kernel-then-run-all execution model (which an extension could enforce) then this nonsense does not occur, and all the linear cells would be updated, in linear order.
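As a rough illustration of the stale state problem, here is a minimal sketch that treats a notebook as a list of code strings run against a shared namespace. This is only a stand-in for a real kernel: the `cells` list and the `run_cells` and `restart_and_run_all` helpers are made up for the example and are not part of any Jupyter API.

```python
# Each "cell" is a snippet of source run against a shared namespace,
# which is a rough stand-in for a notebook kernel's state.
cells = [
    "data = [1, 2, 3]",                  # cell 0: load the input data
    "doubled = [x * 2 for x in data]",   # cell 1: first transformation
    "total = sum(doubled)",              # cell 2: final result
]


def run_cells(cells, order, namespace=None):
    """Run the selected cells, in the given order, and return the namespace."""
    namespace = {} if namespace is None else namespace
    for i in order:
        exec(cells[i], namespace)
    return namespace


def restart_and_run_all(cells):
    """Fresh namespace, every cell, top to bottom."""
    return run_cells(cells, range(len(cells)))


# Run everything once, then edit the input cell and re-run only the first two cells.
kernel = restart_and_run_all(cells)
edited = ["data = [10, 20, 30]"] + cells[1:]
run_cells(edited, [0, 1], kernel)

stale_total = kernel["total"]                        # still derived from the old data
fresh_total = restart_and_run_all(edited)["total"]   # derived from the new data
print(stale_total, fresh_total)
```

After the partial re-run, `doubled` reflects the new input but `total` does not, which is exactly the sort of mixed state the restart-then-run-all habit avoids.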

And again, here there is a point of contrast: in the spreadsheet setting, any column or selection of cells may be created by the application of a formula to any other collection of cells in the workbook. In a Jupyter notebook, if you use the restart-kernel-then-run-all execution model, then the rendering of data as the output to each code cell is a linear sequence. (There are other notebook extensions that let you define dependent cells which could transform the execution order to a non-linear one, but why would you do that..?)

Things can still get messy, though. For example, from another, less recent (March, 2020) tweet I found in the wild: I just realized that making plots in Excel has the same “bad code” smell for me that doing research in a Jupyter notebook does: you’re mixing analysis, data, and results in a way that doesn’t let you easily reuse bits for another later analysis (@rharang) But then, reuse is another issue altogether.

Thinks: one of the things I think I need to think about is how the spatial layout of a spreadsheet could map onto the spatial layout of a notebook. It might be interesting to find some “realistic” spreadsheets containing plausible business related calculations and give them a notebook treatment…

Anyway, here’s a fuller excerpted transcript from the podcast:

Patrick: I’m curious how you think about the transition to what’s now being called the no-code movement. The first part of the question is, how under supplied is the world in terms of just talented software developers? But may that potentially not be as big a problem if we do get no-code tools that would allow someone like me that has dabbled in but is certainly not terribly technical on software, more so in data science to build things for myself and not need engineers. What do you think that glide path looks like over the next say 10 years?

John Collison: The answer to how short staffed we are on engineers is still clearly loads, …, [W]e’re still really short of software engineers.

And no-code, I don’t think no-code is fully a panacea, because I think the set of at even when you’re doing no-code, you’re still reasoning about the relations between different objects and data flows and things like that. And so I think when you’re doing, when you’re building an app with Zapier or something like that, you’re still doing a form of engineering, you’re just not necessarily writing codes. And so hopefully that’s something that can give leverage to people without necessarily needing to have to spend quite as much time in it. And this is not new by the way, if you look at Excel, no one calls as a no-code tool, but Excel, I think is one of the most underappreciated programming environments in the world. And the number of Excel programmers versus people using how we think of as more traditional languages is really something to behold.

And I actually think lots of features of Excel that make it a really nice programming environment and really nice to learn in, where the fact that it’s continuously executed means that unlike you running your code and it doesn’t work, and you’ve got some error, that’s hard to comprehend. Instead, you have a code that just continuously executed in the form of the sheets you see in front of you. And similarly the fact that its individual cells, and you kind of lay out the data spatial… Or the program spatially, where the code and the data is interspersed together, and no one part of it can get too big and diffuse. Anyway, I think there are all of these ways in which, anyone who’s developing a no-code or new software paradigm should look at Excel because so many people have managed to essentially learn how to do some light programming from looking at other people’s models and other people’s workbooks and kind of emulating what they see.

… I don’t think no-code will obviate the need for software programmers, I would hope that it can make many more people able to participate in software creation and kind of smooth the on ramp, which is right now, there’s like a really sharp, vertical part of that one.

Some of this sentiment resonates with one of my motivations for “why code?”: it gives people a way of looking at problems that helps them understand the extent to which they may be computable, or may be decomposed, as well as giving them a tool that allows them to automate particular tasks, or build other tools that help them get stuff done.

See also:

And one to watch again:

Family Faux Festivals ish-via Clashfinder

However many weeks we are into lockdown by now, we’ve been dabbling in various distributed family entertainments, from quizzes to online escape rooms. We’ve also already missed two festivals — Bearded Theory and the Isle of Wight Festival — with more to not come: Rhythm Tree, Festival at the Edge and Beautiful Days.

When we do go to festivals, I tend to prep by checking out the relevant Clashfinder site, listening to a couple of tracks from every band listed, figuring out which bands I intend to see and printing off enough copies of the Clashfinder listing to have spares..

With no festivals upcoming, I floated the idea we programme our own faux festival on the Clashfinder site, with each person getting two stages to programme as desired: a mid-size one and a smaller one.

Programming on the Clashfinder site means adding an act to a stage at a particular time and for a particular duration; you can optionally add various bits of metadata, such as the band’s name, homepage, or a Youtube video:

In the setup page for the particular Clashfinder site, you can enable automatic tagging: the system will try to identify the act and automatically add MusicBrainz metadata and generate relative links from it. Alternatively, you can disable this feature and the links you provide will be used as the link destinations:

On the public page for the festival, hovering over an act pops up a dialogue that lets you click through on any added links, such as any Youtube link you may have added:

As well as the graphical editor there is also a text editing option, which gives you more of a data centric view:

You can also export the data as CSV, Excel, JSON, XML etc. There’s also an Excel import facility.



One of the things I pondered was whether I could knock up a thing that would play out the festival in real time, or “as if” realtime, where you pretend it’s a particular day of festival and play out the videos in real time as if it were that day.

Here’s my first attempt:

It’s a single web page app that uses the data (manually copied over at the moment) from the Clashfinder site and lets you view the festival in real time or as-if real time.

The broadcast model is false. The client web page checks the time and if an act is on at that time the video will play. If there’s no act scheduled at any particular time, you get a listing for that stage for that day with a line through the acts you’ve missed.
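The logic of that client-side check can be sketched in a few lines. The actual page is a web app, so this Python version is only an illustration of the idea; the schedule, the act names, the `~~…~~` strike-through markup and all the function names are invented for the example.

```python
from datetime import date, datetime

# Hypothetical schedule for one stage on one day: (act, start, end).
SCHEDULE = [
    ("Act A", datetime(2020, 6, 13, 14, 0), datetime(2020, 6, 13, 14, 45)),
    ("Act B", datetime(2020, 6, 13, 15, 30), datetime(2020, 6, 13, 16, 30)),
    ("Act C", datetime(2020, 6, 13, 17, 0), datetime(2020, 6, 13, 18, 0)),
]


def as_if(now, festival_day):
    """Pretend today is `festival_day`, keeping the current time of day."""
    return datetime.combine(festival_day, now.time())


def current_act(schedule, now):
    """Return the act on stage at `now`, or None if the stage is empty."""
    for name, start, end in schedule:
        if start <= now < end:
            return name
    return None


def stage_listing(schedule, now):
    """List the day's acts, striking through the ones that have finished."""
    lines = []
    for name, start, end in schedule:
        label = f"{start:%H:%M}-{end:%H:%M} {name}"
        lines.append(f"~~{label}~~" if end <= now else label)
    return lines


def view(schedule, now):
    """Play the current act if there is one, otherwise show the listing."""
    act = current_act(schedule, now)
    return f"PLAY {act}" if act else "\n".join(stage_listing(schedule, now))


# A viewer visiting on some later date sees the festival day "as if" live.
pretend_now = as_if(datetime(2020, 8, 1, 15, 0), date(2020, 6, 13))
print(view(SCHEDULE, pretend_now))
```

Because every client runs the same check against its own clock, the play out stays roughly synchronised without anything being broadcast.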

Ideally, you want to schedule videos that are not part of a playlist. If a video is in a playlist, then when a video finishes, the next video seems to autoplay, which is a real pain if your scheduled slot extends more than a few seconds past the end time of the video…

(Hmm… I wonder, could you set an end time past the end of the video to see if that pauses autoplay of the next item in the playlist? Or maybe pass in a playlist with a dummy video, perhaps relating to your faux festival, to play in the immediate aftermath of an act video whilst still in their scheduled slot time?)

On the to do list is a simple templated Github repo that lets you submit a Clashfinder URL as an issue and it will then build and publish your site for you (eg using something akin to this proof-of-concept approach) using Github Pages.

This approach would work equally for scheduling faux conferences, schools programming, etc. The content play out is synchronised and locally pulled, rather than broadcast. If you want to get social, use whatever social networking channel you prefer.

Essentially, it’s using Clashfinder to schedule the play out of stage based Youtube playlists.

Note that if there’s a bunch of you scheduling things on the same Clashfinder event, there are no locks, so you need to refresh and update regularly, or you could find that saving a single typo fix from a stale page you’ve had open in edit mode for the last three days has wiped out the hundreds of commits the rest of your gang has made over those three days.

There’s lots of fettling I still want to do to the template page, but even in its current bare bones state, it sort of works…

First Foray into the Reclaim Cloud (Beta) – Running a Personal Jupyter Notebook Server

For years and years I’ve been hassling my evil twin brother (it’s a long story) Jim Groom about getting Docker hosting up and running as part of Reclaim, so when an invite to the Reclaim Cloud beta arrived today (thanks, Jim:-), I had a quick play (with more to come in following days and weeks, hopefully… or at least until he switches my credit off;-)

The environment is provided by Jelastic, (I’m not sure how the business model will work, eg in terms of what’s being licensed and what’s being resold…?).

Whilst there are probably docs, the test of a good environment is how far you can get by just clicking buttons, so here’s a quick recap of my first foray…

Let’s be having a new environment then..

Docker looks like a good choice:

Seems like we can search for public DockerHub containers (and maybe also private ones if we provide credentials?).

I’ll use one of my own containers, that is built on top of an official Jupyter stack container:

Select one and next, and a block is highlighted to show we’ve configured it…

When you click apply, you see loads of stuff available…

I’m going to cheat now… the first time round I forgot a step, and that step was setting a token to get into the Jupyter notebook.

If you look at my repo docs for the container I selected, you see that I recommend setting the Jupyter login token via an environment variable…

In the confusing screen, there’s a {...} Variables option that I guessed might help with that:

Just in passing, if your network connection breaks in a session, we get a warning and it tries to reconnect after a short period:

Apply the env var and hit the create button on the bewildering page:

And after a couple of minutes, it looks like we have a container running on a public IP address:

Which doesn’t work:

And it doesn’t work because the notebook isn’t listening on port 80, it autostarts on port 8888. So we need to look for a port map:

A bit of guessing now – we probably want an http port, which nominally maps, or at least defaults, to port 80? And then map that to the port the notebook server is listening on?

Add that and things now look like this as far as the endpoints go:

Try the public URL again, on the insecure http address:

Does Jim Rock?

Yes he does, and we’re in…

So what else is there? Does it work over https?

Hmmm… Let’s go poking around again and see if we can change the setup:

So, in the architecture diagram on the left, if we click the top Balancing block, we can get a load balancer and reverse proxy, which are the sorts of thing that can often handle certificates for us:

I’ll go for Nginx, cos I’ve heard of that…

It’s like a board game, isn’t it, where you get to put tokens on your personal board as you build your engine?! :-)

It takes a couple of mins to fire up the load balancer container (which is surely what it is?):

If we now have a look in the marketplace (I have to admit, I’d had skimmed through this at the start, and noticed there was something handy there…) we can see a Let’s Encrypt free SSL certificate:

Let’s have one of those then…

I’ll let you into another revisionist secret… I’d tried to install the SSL cert without the load balancer, but it refused to apply it to my container… and it really looked like it wanted to apply to something else. Which is what made me think of the nginx server…

Again we need to wait for it to be applied:

When it is, I don’t spot anything obvious to show the Let’s Encrypt cert is there, but I did get a confirmation (not shown in screenshots).

So can we log in via https?

Bah.. that’s a sort of yes, isn’t it? The cert’s there:

but there’s http traffic passing through, presumably?

I guess I maybe need another endpoint? https onto port 8888?

I didn’t try at the time (that’s for next time) because what I actually did was to save Jim’s pennies…

And confirm…

So… no more than half an hour from a zero start (I was actually tinkering whilst on a call, so only half paying attention too…).

As for the container I used, that was built and pushed to DockerHub by other tools.

The container was originally defined in a Github repo to run on MyBinder using not a Dockerfile, but requirements.txt and apt.txt text files in a binder/ directory.

The Dockerhub image was built using a Github Action:

And for that to be able to push from Github to DockerHub, I had to share my DockerHub username and password as a secret with the Github repo:

But with that done, when I make a release of the repo, having tested it on MyBinder, an image is automatically built and pushed to Dockerhub. And when it’s there, I can pull it into Reclaim Cloud and run it as my own personal service.

Thanks, Jim..

PS It’s too late to play more today now, and this blog post has taken twice as long to write as it took me to get a Jupyter notebook server up and running from scratch, but things on my to do list next are:

1) see if I can get the https access working;

2) crib from this recipe and this repo to see if I can get a multi-user JupyterHub with a Dockerspawner up and running from a simple Docker Compose script. (I can probably drop the Traefik proxy and Let’s Encrypt steps and just focus on the JupyterHub config; the Nginx reverse proxy can then fill the gap, presumably…)

Educational Content Creation in Jupyter Notebooks — Creating the Tools of Production As You Go

For the last few weeks (and still with 2-3 more weeks to go, at the current rate of progress), I’ve been updating some introductory course materials for a module due to present in 20J (which is to say, October, 2020).

Long time readers will be familiar with the RobotLab application we’ve been using in various versions of the module for the last 20 years and my on and off attempts looking for possible alternatives (for example, Replacing RobotLab…?).

The alternative I opted for is a new browser based simulator based on ev3devsim. Whilst my tinkering with that, in the form of nbev3devsim is related to this post, I’ll reserve discussion of it for another post…

So what is this post about?

To bury the lede further, the approach I’ve taken to updating the course materials has been to take the original activity materials, in their raw OU-XML form, convert them to markdown (using the tools I’ve also used for republishing OpenLearn content as editable markdown / text documents) and then rewrite them using the new simulator rather than the old RobotLab application. All this whilst I’m updating and building out the replacement simulator (which in part means that the materials drafted early in the process are now outdated as the simulator has been developed; but more of that in another post…).

Along the way, I’ve been trying to explore all manner of things, including building tools to support the production of media assets used in the course.

For example, the simulator uses a set of predefined backgrounds as the basis of various activities, as per the original simulator. The original backgrounds are not available in the right format / at the right resolution, so I needed to create them in some way. Rather than use a drawing package, and a sequence of hard to remember and hard to replicate mouse and menu actions, I scripted the creation of the diagrams:

This should make maintenance easier, and also provides a set of recipes I can build on, image objects I can process, and so on. (You can see the background generator recipes here.)
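To give a flavour of what scripting a background might look like, here is a minimal sketch that writes a simple oval line-following track as SVG text using only the standard library. It is not one of the actual background generator recipes; the `line_track_svg` function, its default sizes and the choice of SVG as the output format are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def line_track_svg(width=400, height=300, stroke=20):
    """Return SVG markup for a white background with a black oval track."""
    cx, cy = width / 2, height / 2
    # Keep the track a couple of stroke widths in from the edge.
    rx, ry = width / 2 - 2 * stroke, height / 2 - 2 * stroke
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<rect width="100%" height="100%" fill="white"/>'
        f'<ellipse cx="{cx}" cy="{cy}" rx="{rx}" ry="{ry}" '
        f'fill="none" stroke="black" stroke-width="{stroke}"/>'
        "</svg>"
    )


def save_background(svg_text, path):
    """Write the generated markup to a file so it can be reused as an asset."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(svg_text)


svg = line_track_svg()
root = ET.fromstring(svg)  # sanity check: the markup is well-formed XML
print(root.tag, len(root))
```

Changing the track then means changing a parameter and re-running the script, rather than trying to remember which menus were used last time.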

The original materials also included a range of flowcharts. The image quality of some of them was a bit ropey, so I started looking for alternatives.

I started off using mermaid.js. I was hoping to use a simple magic that would let me put the chart description into a magicked code cell and then render the result, but on a quick first attempt I couldn’t get that to work (managing js dependencies and scope is something I can’t get my head round). So instead, at the moment, the mermaid created flow charts I’m using are created on the fly from a call to a mermaid online API.

Using a live, online image generator is not ideal in presentation. For example, a student may be working on the materials whilst offline. It is okay for creating static assets in production and then saving those for embedding in the materials released to students.
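By way of example, the following sketch builds an image URL for a diagram and, in production, could save the result as a static asset. It assumes a rendering service along the lines of mermaid.ink, which as far as I know accepts a base64 encoded diagram description in the URL path; the base URL, the function names and the example diagram are all assumptions and may not match the API actually used.

```python
import base64
import urllib.request

# Assumed service endpoint; swap in whichever mermaid rendering API you use.
MERMAID_BASE = "https://mermaid.ink/img/"


def mermaid_image_url(diagram, base=MERMAID_BASE):
    """Encode a mermaid diagram description into an image URL."""
    encoded = base64.urlsafe_b64encode(diagram.encode("utf-8")).decode("ascii")
    return base + encoded


def save_static_asset(url, path):
    """Fetch the rendered image once, in production, and keep a local copy."""
    urllib.request.urlretrieve(url, path)


diagram = (
    "graph LR\n"
    "  A[Start] --> B{Sensor sees black?}\n"
    "  B -->|yes| C[Turn left]\n"
    "  B -->|no| D[Turn right]"
)
url = mermaid_image_url(diagram)
print(url)
# save_static_asset(url, "flowchart.png")  # needs a network connection
```

Saving the image at production time means the released materials only ever embed the local file, so students working offline are not affected.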

One other thing to note about the flow chart is the provision of the long description text, provided as an aid to visually impaired students using screen readers. I’ve been pondering image descriptions for a long time, and there are a few things I am exploring, and want to explore, as I’m updating the TM129 materials.

The first question is whether we need separate long description text at all, or whether the description should simply be inlined. When a diagram or chart is used in a text, there are at least two ways of seeing / reading it: first, as a sequence of marks on a page: “there is a box here with this label, connected by an arrow to a box to the right of it with that label”. And so on. In a chart, such as a scatterplot, something like “a scatterplot with x-axis labelled this and ranging from this to that, a y-axis labelled whatever ranging wherever, a series of blue points densely arranged in the area (coords)” etc etc.

I’ve done crude sketches previously of how we might start to render Grammar of Graphics (ggplot) described graphics and matplotlib chart objects as text (eg First Thoughts on Automatically Generating Accessible Text Descriptions of ggplot Charts in R) but I’ve not found anyone else internally keen to play with that idea (at least, not with me, or to my knowing), so I keep putting off doing more on that. But I do still think it could be a useful thing to do more of.

Another approach might be to generate text via a parser over the diagram’s definition in code (I’ve never really played with parsers; lark and plyplus could provide a start). Or, if the grammar is simple enough, provide students with a description early on of how to “read” the description language and then provide the “generator text” as the description text. (Even simple regexes might help, eg mapping -> to “right arrow” or “leads to” etc.) The visual diagram is often a function of the generator text and a layout algorithm (or, following UKgov public service announcements in abusing “+”, diagram = generator_text + layout) so as long as the layout algorithm isn’t deriving and adding additional content, but is simply re-presenting the description as provided, the generator text is the most concise long description.
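The simple regex idea might look something like the following sketch. The arrow notation, the choice of “leads to” as the spoken phrase and the `describe` function are all made up for the example, rather than being tied to any particular diagramming language.

```python
import re

# Match arrows such as "->" or "-->", along with any surrounding whitespace.
ARROW = re.compile(r"\s*-+>\s*")


def describe(generator_text):
    """Turn each 'a -> b' line of a diagram description into a short sentence."""
    sentences = []
    for line in generator_text.splitlines():
        line = line.strip()
        if line:
            sentences.append(ARROW.sub(" leads to ", line) + ".")
    return " ".join(sentences)


text = describe("Start -> Read sensor\nRead sensor --> Move forward")
print(text)
```

This only re-presents what is already in the generator text, which is the point: no layout information is added or lost.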

The second way of looking at / seeing / reading a chart is to try to interpret the marks made in ink in some way. This sort of description is usually provided in the main text, as a way of helping students learn to read the diagram / chart, and what areas of it to focus on. Note that the “meaning” of a chart is subject to stance and rhetoric. On a line chart, we might have a literal description of the ink, say “line up and to the right”, we might then “read” that as “increasing”, and we might then interpret that “increasing” as evidence of some effect, as a persuasive rhetorical argument in favour of or against something, and so on. Again, that sort of interpretation is the one we’d offer all students equally.

But when it comes to just the “ink on paper” bit, how should we best provide an accessible equivalent to the visual representation? Just as sighted students in their mind’s eye presumably don’t read lines between boxes as “box connected by a line to box” (or do they?), I wonder whether our long description should be read by visually impaired students through their screen reader as “box connected by a line to box”. Why do we map from a thing, a -> b represented visually in terms of the description provided to visually impaired students using a text description of a visual representation? Does it help? The visual representation itself is a re-presentation of a relationship in a graphical way that tries to communicate that relationship to the reader. The visual is the communicative medium. So why use text to describe a visual intermediary representation in a long description? Would another intermediary representation be more useful? I guess I’m saying: why describe a visually rendered matplotlib object to a visually impaired student in a visual way if we want to communicate the idea of what the matplotlib object represents? Why not describe the chart object, which defines whatever is being re-presented in a visual way, in other terms? (I guess one reason we want to describe the visual representation to visually impaired students is so that when they hear sighted people talking in visual terms, they know what they’re talking about…)


So, back to creating tools of production. The mermaid.js route as it currently stands is not ideal; and the flow charts it generates are perhaps “non-standard” in their symbol selection and layout. (Note that is something we could perhaps address by forking and fixing the mermaid.js library so that it does render things as we’d like to see them…)

Another possible flowcharter library I came across was flowchart.js. I did manage to wrap this in a jp_proxy_widget as flowchart_js_jp_proxy_widget to provide a means of rendering flowcharts from a simple description within a notebook:

You can find it here: innovationOUtside/flowchart_js_jp_proxy_widget

I also created a simple magic associated with it…

(Note that the jp_proxy_widget route to this magic is perhaps not the best way of doing things, but I’ve been exploring how to use jp_proxy_widget more generally, and this fitted with that; as a generic recipe, it could be handy. What would be useful is a recipe that does not involve jp_proxy_widget; nb-flowchartjs doesn’t seem to work atm, but could provide a clue as to how to do that…)

The hour or two spent putting that together means I now have a reproducible way of scripting the production of simple flowchart diagrams using flowchart.js. The next step is to try to figure out how to parse the flowchart.js diagram descriptions, and for simple ones at least, have a stab at generating a textualised version of them. (Although as mentioned above, is the diagram description text its own best description?)
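As a first stab at what that parsing step might look like, here is a minimal sketch that handles a small subset of the flowchart.js description language as I understand it (node definitions such as `op=>operation: Label`, and connections such as `st->op` or `cond(yes)->e`). The `textualise` function, the sentence templates and the example flowchart are invented for the illustration, and anything outside that subset is simply skipped.

```python
import re

# Node definition, eg "cond=>condition: Line detected?" (the label is optional).
NODE = re.compile(r"^(\w+)=>(\w+)(?::\s*(.*))?$")
# One step in a connection, eg "op" or "cond(yes)".
STEP = re.compile(r"^(\w+)(?:\(([^)]*)\))?$")


def textualise(description):
    """Return a list of sentences describing the connections in a flowchart."""
    nodes, sentences = {}, []
    for raw in description.splitlines():
        line = raw.strip()
        node = NODE.match(line)
        if node:
            name, kind, label = node.groups()
            nodes[name] = (kind, label or kind)
            continue
        if "->" not in line:
            continue
        parts = [STEP.match(p.strip()) for p in line.split("->")]
        if not all(parts):
            continue  # outside the subset this sketch understands
        steps = [p.groups() for p in parts]
        for (src, options), (dst, _) in zip(steps, steps[1:]):
            src_kind, src_label = nodes.get(src, ("node", src))
            dst_kind, dst_label = nodes.get(dst, ("node", dst))
            branch = (options or "").split(",")[0].strip()
            if branch in ("yes", "no"):
                sentences.append(
                    f'If the answer to "{src_label}" is {branch}, '
                    f'go to {dst_kind} "{dst_label}".'
                )
            else:
                sentences.append(
                    f'From {src_kind} "{src_label}", go to {dst_kind} "{dst_label}".'
                )
    return sentences


EXAMPLE = """
st=>start: Start
op=>operation: Drive forward
cond=>condition: Line detected?
e=>end: Stop

st->op->cond
cond(yes)->e
cond(no)->op
"""

result = textualise(EXAMPLE)
print("\n".join(result))
```

The output describes the relationships between the steps rather than the boxes and arrows used to draw them, which is one possible answer to the question of what a long description should contain.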

Fragment: Towards the Age of Coveillance?

There’s a lot of chat, and a lot of reports out (I’ll get around to listing them when I take the time to…) regarding the potential use of phone apps of various flavours regarding contact tracking as a possible tech solutionist contribution to any release of lockdown, particularly at scale over extended periods…

…so I’m really surprised that folk aren’t making use of the coveillance / #coveillance tag to refer to the strategy, playing on “covid-19”, “contact tracing”, “surveillance”, and even at a push, “panopticon” and so on…

From a quick search, the first reference I could find is from a course several years ago at Berkeley, 290. Surveillance, Sousveillance, Coveillance, and Dataveillance, Autumn/Fall 2009, taught by Deirdre Mulligan, which had the following description:

We live in an information society. The use of technology to support a wide array of social, economic and political interactions is generating an increasing amount of information about who, what and where we are. Through self documentation (sousveillance), state sponsored surveillance, and documentation of interaction with others (coveillance) a vast store of information — varied in content and form — about daily life is spread across private and public data systems where it is subject to various forms of processing, used for a range of purposes (some envisioned and intended, others not), and subject to various rules that meet or upend social values including security, privacy and accountability. This course will explore the complex ways in which these varied forms of data generation, collection, processing and use interact with norms, markets and laws to produce security, fear, control, vulnerability. Some of the areas covered include close-circuit television (CCTV) in public places, radio frequency identification tags in everyday objects, digital rights management technologies, the smart grid, and biometrics. Readings will be drawn from law, computer science, social sciences, literature, and art and media studies

This gives us a handy definition: coveillance: documentation of interaction with others

A more comprehensive discussion is given in the CC licensed 2012 book Configuring the Networked Self by Julie E. Cohen (printable PDF), specifically Chapter 6, pp. 13-16:

Coveillance, Self-Exposure, and the Culture of the Spectacle

Other social and technological changes also can alter the balance of powers and disabilities that exists in networked space. Imagine now that our café-sitting individual engages in some embarrassing and unsavory behavior— perhaps she throws her used paper cup and napkin into the bushes, or coughs on the milk dispenser. Another patron of the café photographs her with his mobile phone and posts the photographs on an Internet site dedicated to shaming the behavior. This example reminds us that being in public entails a degree of exposure, and that (like informational transparency) sometimes exposure can have beneficial consequences. (It also reminds us, again, that online space and real space are not separate.) Maybe we don’t want people to litter or spread germs, and if the potential for exposure reduces the incidence of those behaviors, so much the better. Or suppose our café-sitter posts her own location on an Internet site that lets its members log their whereabouts and activities. This example reminds us that exposure may be desired and eagerly pursued; in such cases, worries about privacy seem entirely off the mark. But the problem of exposure in networked space is more complicated than these examples suggest.
The sort of conduct in the first example, which the antisurveillance activist Steve Mann calls “coveillance,” figures prominently in two different claims about diminished expectations of privacy in public. Privacy critics argue that when technologies for surveillance are in common use, their availability can eliminate expectations of privacy that might previously have existed. Mann argues that because coveillance involves observation by equals, it avoids the troubling political implications of surveillance. But if the café-sitter’s photograph had been posted on a site that collects photographs of “hot chicks,” many women would understand the photographer’s conduct as an act of subordination. And the argument that coveillance eliminates expectations of privacy vis-à-vis surveillance is a non sequitur. This is so whether or not one accepts the argument that coveillance and surveillance are meaningfully different. If they are different, then coveillance doesn’t justify or excuse the exercise of power that surveillance represents. If they are the same, then the interest against exposure applies equally to both.
In practice, the relation between surveillance and coveillance is more mutually constituting than either of these arguments acknowledges. Many employers now routinely search the Internet for information about prospective hires, so what began as “ordinary” coveillance can become the basis for a probabilistic judgment about attributes, abilities, and aptitudes. At other times, public authorities seek to harness the distributed power of coveillance for their own purposes—for example, by requesting the identification of people photographed at protest rallies. Here what began as surveillance becomes an exercise of distributed moral and political power, but it is power called forth for a particular purpose.
Self-exposure is the subject of a parallel set of claims about voyeurism and agency. Some commentators celebrate the emerging culture of self-exposure. They assert that in today’s culture of the electronic image, power over one’s own image resides not in secrecy or effective data protection, which in any case are unattainable, but rather in the endless play of images and digital personae. We should revel in our multiplicity, and if we are successful in our efforts to be many different selves, the institutions of the surveillant assemblage will never be quite sure who is who and what is what. Conveniently in some accounts, this simplified, pop-culture politics of the performative also links up with the celebration of subaltern identities and affiliations. Performance, we are told, is something women and members of racial and sexual minorities are especially good at; most of us are used to playing different roles for different audiences. But this view of the social meaning of performance should give us pause.
First, interpreting self-exposure either as a blanket waiver of privacy or as an exercise in personal empowerment would be far too simple. Surveillance and self-exposure bleed into each other in the same ways that surveillance and coveillance do. As millions of subscribers to social-networking sites are now beginning to learn, the ability to control the terms of self-exposure in networked space is largely illusory: body images intended to assert feminist self-ownership are remixed as pornography, while revelations intended for particular social networks are accessed with relative ease by employers, police, and other authority figures. These examples, and thousands of others like them, argue for more careful exploration of the individual and systemic consequences of exposure within networked space, however it is caused.
Other scholars raise important questions about the origins of the desire for exposure. In an increasing number of contexts, the images generated by surveillance have fetish value. As Kirstie Ball puts it, surveillance creates a “political economy of interiority” organized around “the ‘authenticity’ of the captured experience.” Within this political economy, self-exposure “may represent patriotic or participative values to the individual,” but it also may be a behavior called forth by surveillance and implicated in its informational and spatial logics. In the electronic age, performances circulate in emergent, twinned economies of authenticity and perversity in which the value of the experiences offered up for gift, barter, or sale is based on their purported normalcy or touted outlandishness. These economies of performance do not resist the surveillant assemblage; they feed it. Under those circumstances, the recasting of the performative in the liberal legal language of self-help seems more than a little bit unfair. In celebrating voluntary self-exposure, we have not left the individualistic, consent-based structure of liberal privacy theory all that far behind. And while one can comfortably theorize that if teenagers, women, minorities, and gays choose to expose themselves, that is their business, it is likely that the burden of this newly liberatory self-commodification doesn’t fall equally on everyone.
The relation between surveillance and self-exposure is complex, because accessibility to others is a critical enabler of interpersonal association and social participation. From this perspective, the argument that privacy functions principally to enable interpersonal intimacy gets it only half right. Intimate relationships, community relationships, and more casual relationships all derive from the ability to control the presentation of self in different ways and to differing extents. It is this recognition that underlies the different levels of “privacy” enabled (at least in theory) by some—though not all—social-networking sites. Accessibility to others is also a critical enabler of challenges to entrenched perceptions of identity. Self-exposure using networked information technologies can operate as resistance to narratives imposed by others. Here the performative impulse introduces static into the circuits of the surveillant assemblage; it seeks to reclaim bodies and reappropriate spaces.
Recall, however, that self-exposure derives its relational power partly and importantly from its selectivity. Surveillance changes the dynamic of selectivity in unpredictable and often disorienting ways. When words and images voluntarily shared in one context reappear unexpectedly in another, the resulting sense of unwanted exposure and loss of control can be highly disturbing. To similar effect, Altman noted that loss of control over the space-making mechanisms of personal space and territory produced sensations of physical and emotional distress. These effects argue for more explicitly normative evaluation of the emerging culture of performance and coveillance, and of the legal and architectural decisions on which it relies.
Thus understood, the problems of coveillance and self-exposure also illustrate a more fundamental proposition about the value of openness in the information environment: openness is neither neutral nor univalent, but is itself the subject of a complex politics. Some kinds of openness serve as antidotes to falsehood and corruption; others serve merely to titillate or to deepen entrenched inequalities. Still other kinds of openness operate as self-defense; if anyone can take your child’s picture with his mobile phone without you being any the wiser, why shouldn’t you know where all of the local sex offenders live and what they look like? But the resulting “information arms races” may have broader consequences than their participants recognize. Some kinds of openness foster thriving, broadly shared education and public debate. Other, equally important varieties of openness are contextual; they derive their value precisely from the fact that they are limited in scope and duration. Certainly, the kinds of value that a society places on openness, both in theory and in practice, reveal much about that society. There are valid questions to be discussed regarding what the emerging culture of performance and coveillance reveals about ours.
It is exactly this conversation that the liberal credo of “more information is better” has disabled us from having. Jodi Dean argues that the credo of openness drives a political economy of “communicative capitalism” organized around the tension between secrets and publicity. That political economy figures importantly in the emergence of a media culture that prizes exposure and a punditocracy that assigns that culture independent normative value because of the greater “openness” it fosters. Importantly, this reading of our public discourse problematizes both secrecy and openness. It suggests both that there is more secrecy than we acknowledge and that certain types of public investiture in openness for its own sake create large political deficits.
It seems reasonable to posit that the shift to an information-rich, publicity-oriented environment would affect the collective understanding of selfhood. Many theorists of the networked information society argue that the relationship between self and society is undergoing fundamental change. Although there is no consensus on the best description of these changes, several themes persistently recur. One is the emergence and increasing primacy of forms of collective consciousness that are “tribal,” or essentialized and politicized. These forms of collective consciousness collide with others that are hivelike, dictated by the technical and institutional matrices within which they are embedded. Both of these collectivities respond in inchoate, visceral ways to media imagery and content.
I do not mean here to endorse any of these theories, but only to make the comparatively modest point that in all of them, public discourse in an era of abundant information bears little resemblance to the utopian predictions of universal enlightenment that heralded the dawn of the Internet age. Moreover, considerable evidence supports the hypothesis that more information does not inevitably produce a more rational public. As we saw in Chapter 2, information flows in networked space follow a “rich get richer” pattern that channels ever-increasing traffic to already-popular sites. Public opinion markets are multiple and often dichotomous, subject to wild swings and abrupt corrections. Quite likely, information abundance produces a public that is differently rational — and differently irrational — than it was under conditions of information scarcity. On that account, however, utopia still lies elsewhere.
The lesson for privacy theory, and for information policy more generally, is that scholars and policy makers should avoid investing emerging norms of exposure with positive value just because they are “open.” Information abundance does not eliminate the need for normative judgments about the institutional, social, and technical parameters of openness. On the contrary, it intensifies the need for careful thinking, wise policy making, and creative norm entrepreneurship around the problems of exposure, self-exposure, and coveillance. In privacy theory, and in other areas of information policy, the syllogism “if open, then good” should be interrogated rather than assumed.

From that book, we also get a pointer to the term appearing in the literature: Mann, Steve, Jason Nolan, and Barry Wellman. “Sousveillance: Inventing and Using Wearable Computing Devices for Data Collection in Surveillance Environments.” Surveillance and Society 1.3 (2003): 331–55 [PDF]:

In conditions of interactions among ordinary citizens being photographed or otherwise having their image recorded by other apparently ordinary citizens, those being photographed generally will not object when they can see both the image and the image capture device … in the context of a performance space. This condition, where peers can see both the recording and the presentation of the images, is neither “surveillance” nor “sousveillance.” We term such observation that is side-to-side “coveillance,” an example of which could include one citizen watching another.

Mann seems to have been hugely interested in wearables and the “veillance” opportunities afforded by them, for example, folk wearing forward facing cameras using something like Google Glass (remember that?!). But the point to pull from the definition is perhaps generalising “seeing” to mean things like “my device sees yours”: whilst the device(s) may be hidden, if the expectation is that a) we all have one, and b) it is observable, then we are knowingly in an (assumed) state of coveillance.

By the by, another nice quote from the same paper:

In such a coveillance society, the actions of all may, in theory, be observable and accountable to all. The issue, however, is not about how much surveillance and sousveillance is present in a situation, but how it generates an awareness of the disempowering nature of surveillance, its overwhelming presence in western societies, and the complacency of all participants towards this presence.

Also by the by, I note in passing a rather neat contrary position in the form of coveillance.org, “a people’s guide to surveillance: a hands-on introduction to identifying how you’re being watched in daily life, and by whom” created by “a collective of technologists, organizers, and designers who employ arts-based approaches to demystify surveillance and build communal counterpower”.

PS as promised, some references:

Please feel free to add further relevant links to the comments…

I also note (via a tweet a few days ago from Owen Boswarva) that moves are afoot in the UK to open up Unique Property Reference Numbers (UPRNs) and Unique Street Reference Numbers (USRNs) via the Ordnance Survey. These numbers uniquely reference properties and streets and would, you have to think, make for interesting possibilities as part of a coveillance app.

And finally, given all the hype around Google and Apple working “together” on a tracking app, partly because they are device operating system manufacturers with remote access (via updates) to lots of devices…, I note that I haven’t seen folk mentioning data aggregators such as Foursquare in the headlines, given they already aggregate and (re)sell location data to Apple, Samsung, etc. (typical review from within the last year from the New York Intelligencer: Ten Years On, Foursquare Is Now Checking In to You). They’re also acquisitive of other data slurpers, e.g. buying Placed from Snap last year (a service which “tracks the real-time location of nearly 6 million monthly active users through apps that pay users or offer other types of rewards in exchange for access to their data”) and, just recently, Factual, the blog post announcing which declares: “The new Foursquare will offer unparalleled reach and scale, with datasets spanning:”

  • More than 500 million devices worldwide
  • A panel of 25 million opted-in, always on users and over 14 billion user confirmed check-ins
  • More than 105 million points of interest across 190 countries and 50 territories

How come folk aren’t getting twitchy, yet?