Educational Content Creation in Jupyter Notebooks — Creating the Tools of Production As You Go

For the last few weeks (and with still 2-3 more weeks to go, at the current rate of progress), I’ve been updating some introductory course materials for a module due to present in 20J (which is to say, October, 2020).

Long time readers will be familiar with the RobotLab application we’ve been using in various versions of the module for the last 20 years, and with my on-and-off attempts at looking for possible alternatives (for example, Replacing RobotLab…?).

The alternative I opted for is a new browser-based simulator based on ev3devsim. Whilst my tinkering with that, in the form of nbev3devsim, is related to this post, I’ll reserve discussion of it for another post…

So what is this post about?

To bury the lede further, the approach I’ve taken to updating the course materials has been to take the original activity materials, in their raw OU-XML form, convert them to markdown (using the tools I’ve also used for republishing OpenLearn content as editable markdown / text documents), and then rewrite them using the new simulator rather than the old RobotLab application. All this whilst I’ve been updating and building out the replacement simulator (which in part means that the materials drafted early in the process are now outdated, the simulator having moved on since; but more of that in another post…).

Along the way, I’ve been trying to explore all manner of things, including building tools to support the production of media assets used in the course.

For example, the simulator uses a set of predefined backgrounds as the basis of various activities, as per the original simulator. The original backgrounds are not available in the right format / at the right resolution, so I needed to create them in some way. Rather than use a drawing package, and a sequence of hard-to-remember and hard-to-replicate mouse and menu actions, I scripted the creation of the diagrams:
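
By way of illustration, here’s a minimal sketch of the sort of recipe involved, using the Pillow (PIL) package to draw and save a simple line-following background. The dimensions and track layout are illustrative only; the actual recipes (linked below) are more involved:

    # Minimal sketch: script the creation of a simulator background
    # with Pillow, rather than a drawing package.
    from PIL import Image, ImageDraw

    WIDTH, HEIGHT = 1182, 571  # nominal canvas size (an assumption)

    img = Image.new("RGB", (WIDTH, HEIGHT), "white")
    draw = ImageDraw.Draw(img)

    # A simple line-following track: a thick black line with a turn in it
    draw.line([(100, 450), (800, 450)], fill="black", width=20)
    draw.line([(800, 450), (800, 150)], fill="black", width=20)

    img.save("line_follow_background.png")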

This should make maintenance easier, and also provides a set of recipes I can build on, image objects I can process, and so on. (You can see the background generator recipes here.)

The original materials also included a range of flowcharts. The image quality of some of them was a bit ropey, so I started looking for alternatives.

I started off using mermaid.js. I was hoping to use a simple magic that would let me put the chart description into a magicked code cell and then render the result, but on a quick first attempt I couldn’t get that to work (managing js dependencies and scope is something I can’t get my head round). So instead, at the moment, the mermaid-created flow charts I’m using are generated on the fly from a call to an online mermaid API.
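
For example, something along the following lines. (mermaid.ink is one such online renderer; the payload encoding shown here is my best guess at what it expects, so treat it as a sketch rather than gospel.)

    # Sketch: render a mermaid.js diagram description via an online API
    import base64
    import json
    import requests

    diagram = """graph TD
        A[Start] --> B{Obstacle detected?}
        B -->|yes| C[Turn]
        B -->|no| D[Drive forward]
    """

    # mermaid.ink appears to accept a base64 encoded JSON document
    # wrapping the diagram description (encoding details assumed)
    payload = json.dumps({"code": diagram, "mermaid": {"theme": "default"}})
    encoded = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")

    resp = requests.get(f"https://mermaid.ink/img/{encoded}")
    with open("flowchart.png", "wb") as f:
        f.write(resp.content)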

Using a live, online image generator is not ideal when the module is in presentation, however; a student may be working on the materials whilst offline, for example. But it is okay for creating static assets during production, which can then be saved for embedding in the materials released to students.

One other thing to note about the flow chart is the provision of long description text, provided as an aid to visually impaired students using screen readers. I’ve been pondering image descriptions for a long time, and there are a few things I am exploring, and want to explore further, as I update the TM129 materials.

The first question is whether we need long description text at all, or whether the description should be inlined. When a diagram or chart is used in a text, there are at least two ways of seeing / reading it. First, as a sequence of marks on a page: “there is a box here with this label, connected by an arrow to a box to the right of it with that label”, and so on. In a chart, such as a scatterplot, something like “a scatterplot with x-axis labelled this and ranging from this to that, a y-axis labelled whatever ranging wherever, a series of blue points densely arranged in the area (coords)”, etc.

I’ve previously done crude sketches of how we might start to render Grammar of Graphics (ggplot) described graphics and matplotlib chart objects as text (eg First Thoughts on Automatically Generating Accessible Text Descriptions of ggplot Charts in R), but I’ve not found anyone else internally keen to play with that idea (at least, not with me, or to my knowing), so I keep putting off doing more on it. But I do still think it could be a useful thing to do more of.

Another approach might be to generate text via a parser over the diagram’s definition in code (I’ve never really played with parsers; lark and plyplus could provide a start). Or, if the grammar is simple enough, provide students with a description early on of how to “read” the description language, and then provide the “generator text” itself as the description text. (Even simple regexes might help, eg mapping -> to “right arrow” or “leads to”, as in the sketch below.) The visual diagram is often a function of the generator text and a layout algorithm (or, following UK gov public service announcements in abusing “+”, diagram = generator_text + layout), so as long as the layout algorithm isn’t deriving and adding additional content, but is simply re-presenting the description as provided, the generator text is the most concise long description.
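
As a crude sketch of the regex idea, using a simplified, flowchart.js style description (the mapping rules are just for illustration):

    # Crude sketch: re-present a flowchart description as text
    # using simple regex substitutions
    import re

    generator_text = """st->op1->cond
    cond(yes)->e
    cond(no)->op1"""

    rules = [
        (re.compile(r"\(yes\)->"), ", if yes, leads to "),
        (re.compile(r"\(no\)->"), ", if no, leads to "),
        (re.compile(r"->"), " leads to "),
    ]

    def textualise(src):
        """Map connection syntax to (slightly) more readable phrases."""
        out = []
        for line in src.splitlines():
            line = line.strip()
            for pattern, replacement in rules:
                line = pattern.sub(replacement, line)
            out.append(line)
        return ". ".join(out) + "."

    print(textualise(generator_text))
    # st leads to op1 leads to cond. cond, if yes, leads to e. cond, if no, leads to op1.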

The second way of looking at / seeing / reading a chart is to try to interpret the marks made in ink in some way. This sort of description is usually provided in the main text, as a way of helping students learn to read the diagram / chart, and learn which areas of it to focus on. Note that the “meaning” of a chart is subject to stance and rhetoric. On a line chart, we might have a literal description of the ink, “line up and to the right”; we might then “read” that as “increasing”; and we might then interpret that “increasing” as evidence of some effect, as a persuasive rhetorical argument in favour of or against something, and so on. Again, that sort of interpretation is the one we’d offer all students equally.

But when it comes to just the “ink on paper” bit, how should we best provide an accessible equivalent to the visual representation? Just as sighted students in their mind’s eye presumably don’t read lines between boxes as “box connected by a line to box” (or do they?), I wonder whether our long description should be read to visually impaired students through their screen reader as “box connected by a line to box”. Why do we map from a thing, a -> b, represented visually, to a text description of that visual representation for visually impaired students? Does it help? The visual representation itself is a re-presentation of a relationship in a graphical way that tries to communicate that relationship to the reader. The visual is the communicative medium. So why use text to describe a visual intermediary representation in a long description? Would another intermediary representation be more useful? I guess I’m saying: why describe a visually rendered matplotlib object to a visually impaired student in a visual way if we want to communicate the idea of what the matplotlib object represents? Why not describe the chart object, which defines whatever is being re-presented in a visual way, in other terms? (I guess one reason we want to describe the visual representation to visually impaired students is so that when they hear sighted people talking in visual terms, they know what they’re talking about…)

Hmm…

So, back to creating tools of production. The mermaid.js route as it currently stands is not ideal; and the flow charts it generates are perhaps “non-standard” in their symbol selection and layout. (Note that this is something we could perhaps address by forking and fixing the mermaid.js library so that it does render things as we’d like to see them…)

Another possible flowcharting library I came across was flowchart.js. I did manage to wrap this in a jp_proxy_widget, as flowchart_js_jp_proxy_widget, to provide a means of rendering flowcharts from a simple description within a notebook:

You can find it here: innovationOUtside/flowchart_js_jp_proxy_widget

I also created a simple magic associated with it…
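
In outline, such a magic might look something like the following. (This is a sketch of the approach rather than the actual implementation in the repo, and the CDN URLs are illustrative.)

    # Sketch of a flowchart.js cell magic via jp_proxy_widget
    # (not the actual implementation in flowchart_js_jp_proxy_widget)
    from IPython.core.magic import register_cell_magic
    import jp_proxy_widget

    @register_cell_magic
    def flowchart(line, cell):
        """Render the flowchart.js description in the cell body."""
        widget = jp_proxy_widget.JSProxyWidget()
        # flowchart.js depends on raphael; load both from a CDN
        widget.load_js_files([
            "https://cdnjs.cloudflare.com/ajax/libs/raphael/2.3.0/raphael.min.js",
            "https://cdnjs.cloudflare.com/ajax/libs/flowchart/1.13.0/flowchart.min.js",
        ])
        # Parse the description and draw the SVG into the widget element
        widget.js_init("""
            element.empty();
            var chart = flowchart.parse(code);
            chart.drawSVG(element[0]);
        """, code=cell)
        return widget

A code cell starting with %%flowchart and containing a flowchart.js description would then render the corresponding diagram as the cell output.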

(Note that the jp_proxy_widget route to this magic is perhaps not the best way of doing things, but I’ve been exploring how to use jp_proxy_widget more generally, and this fitted with that; as a generic recipe, it could be handy. What would be useful is a recipe that does not involve jp_proxy_widget; nb-flowchartjs doesn’t seem to work atm, but could provide a clue as to how to do that…)

The hour or two spent putting that together means I now have a reproducible way of scripting the production of simple flowchart diagrams using flowchart.js. The next step is to try to figure out how to parse the flowchart.js diagram descriptions and, for simple ones at least, have a stab at generating a textualised version of them. (Although, as mentioned above, is the diagram description text its own best description?)

Fragment: Towards the Age of Coveillance?

There’s a lot of chat, and a lot of reports out (I’ll get around to listing them when I take the time to…), regarding the potential use of phone apps of various flavours for contact tracing, as a possible tech solutionist contribution to any release of lockdown, particularly at scale over extended periods…

…so I’m really surprised that folk aren’t making use of the coveillance / #coveillance tag to refer to the strategy, playing on “covid-19”, “contact tracing”, “surveillance”, and even, at a push, “panopticon”, and so on…

From a quick search, the first reference I could find is from a course taught several years ago at Berkeley by Deirdre Mulligan (290. Surveillance, Sousveillance, Coveillance, and Dataveillance, Autumn/Fall 2009), which had the following description:

We live in an information society. The use of technology to support a wide array of social, economic and political interactions is generating an increasing amount of information about who, what and where we are. Through self documentation (sousveillance), state sponsored surveillance, and documentation of interaction with others (coveillance) a vast store of information — varied in content and form — about daily life is spread across private and public data systems where it is subject to various forms of processing, used for a range of purposes (some envisioned and intended, others not), and subject to various rules that meet or upend social values including security, privacy and accountability. This course will explore the complex ways in which these varied forms of data generation, collection, processing and use interact with norms, markets and laws to produce security, fear, control, vulnerability. Some of the areas covered include close-circuit television (CCTV) in public places, radio frequency identification tags in everyday objects, digital rights management technologies, the smart grid, and biometrics. Readings will be drawn from law, computer science, social sciences, literature, and art and media studies

This gives us a handy definition of coveillance: “documentation of interaction with others”.

A more comprehensive discussion is given in the CC licensed 2012 book Configuring the Networked Self by Julie E. Cohen (printable PDF), specifically Chapter 6, pp. 13-16:

Coveillance, Self-Exposure, and the Culture of the Spectacle

Other social and technological changes also can alter the balance of powers and disabilities that exists in networked space. Imagine now that our café-sitting individual engages in some embarrassing and unsavory behavior— perhaps she throws her used paper cup and napkin into the bushes, or coughs on the milk dispenser. Another patron of the café photographs her with his mobile phone and posts the photographs on an Internet site dedicated to shaming the behavior. This example reminds us that being in public entails a degree of exposure, and that (like informational transparency) sometimes exposure can have beneficial consequences. (It also reminds us, again, that online space and real space are not separate.) Maybe we don’t want people to litter or spread germs, and if the potential for exposure reduces the incidence of those behaviors, so much the better. Or suppose our café-sitter posts her own location on an Internet site that lets its members log their whereabouts and activities. This example reminds us that exposure may be desired and eagerly pursued; in such cases, worries about privacy seem entirely off the mark. But the problem of exposure in networked space is more complicated than these examples suggest.
The sort of conduct in the first example, which the antisurveillance activist Steve Mann calls “coveillance,” figures prominently in two different claims about diminished expectations of privacy in public. Privacy critics argue that when technologies for surveillance are in common use, their availability can eliminate expectations of privacy that might previously have existed. Mann argues that because coveillance involves observation by equals, it avoids the troubling political implications of surveillance. But if the café-sitter’s photograph had been posted on a site that collects photographs of “hot chicks,” many women would understand the photographer’s conduct as an act of subordination. And the argument that coveillance eliminates expectations of privacy vis-à-vis surveillance is a non sequitur. This is so whether or not one accepts the argument that coveillance and surveillance are meaningfully different. If they are different, then coveillance doesn’t justify or excuse the exercise of power that surveillance represents. If they are the same, then the interest against exposure applies equally to both.
In practice, the relation between surveillance and coveillance is more mutually constituting than either of these arguments acknowledges. Many employers now routinely search the Internet for information about prospective hires, so what began as “ordinary” coveillance can become the basis for a probabilistic judgment about attributes, abilities, and aptitudes. At other times, public authorities seek to harness the distributed power of coveillance for their own purposes—for example, by requesting the identification of people photographed at protest rallies. Here what began as surveillance becomes an exercise of distributed moral and political power, but it is power called forth for a particular purpose.
Self-exposure is the subject of a parallel set of claims about voyeurism and agency. Some commentators celebrate the emerging culture of self-exposure. They assert that in today’s culture of the electronic image, power over one’s own image resides not in secrecy or effective data protection, which in any case are unattainable, but rather in the endless play of images and digital personae. We should revel in our multiplicity, and if we are successful in our efforts to be many different selves, the institutions of the surveillant assemblage will never be quite sure who is who and what is what. Conveniently in some accounts, this simplified, pop-culture politics of the performative also links up with the celebration of subaltern identities and affiliations. Performance, we are told, is something women and members of racial and sexual minorities are especially good at; most of us are used to playing different roles for different audiences. But this view of the social meaning of performance should give us pause.
First, interpreting self-exposure either as a blanket waiver of privacy or as an exercise in personal empowerment would be far too simple. Surveillance and self-exposure bleed into each other in the same ways that surveillance and coveillance do. As millions of subscribers to social-networking sites are now beginning to learn, the ability to control the terms of self-exposure in networked space is largely illusory: body images intended to assert feminist self-ownership are remixed as pornography, while revelations intended for particular social networks are accessed with relative ease by employers, police, and other authority figures. These examples, and thousands of others like them, argue for more careful exploration of the individual and systemic consequences of exposure within networked space, however it is caused.
Other scholars raise important questions about the origins of the desire for exposure. In an increasing number of contexts, the images generated by surveillance have fetish value. As Kirstie Ball puts it, surveillance creates a “political economy of interiority” organized around “the ‘authenticity’ of the captured experience.” Within this political economy, self-exposure “may represent patriotic or participative values to the individual,” but it also may be a behavior called forth by surveillance and implicated in its informational and spatial logics. In the electronic age, performances circulate in emergent, twinned economies of authenticity and perversity in which the value of the experiences offered up for gift, barter, or sale is based on their purported normalcy or touted outlandishness. These economies of performance do not resist the surveillant assemblage; they feed it. Under those circumstances, the recasting of the performative in the liberal legal language of self-help seems more than a little bit unfair. In celebrating voluntary self-exposure, we have not left the individualistic, consent-based structure of liberal privacy theory all that far behind. And while one can comfortably theorize that if teenagers, women, minorities, and gays choose to expose themselves, that is their business, it is likely that the burden of this newly liberatory self-commodification doesn’t fall equally on everyone.
The relation between surveillance and self-exposure is complex, because accessibility to others is a critical enabler of interpersonal association and social participation. From this perspective, the argument that privacy functions principally to enable interpersonal intimacy gets it only half right. Intimate relationships, community relationships, and more casual relationships all derive from the ability to control the presentation of self in different ways and to differing extents. It is this recognition that underlies the different levels of “privacy” enabled (at least in theory) by some—though not all—social-networking sites. Accessibility to others is also a critical enabler of challenges to entrenched perceptions of identity. Self-exposure using networked information technologies can operate as resistance to narratives imposed by others. Here the performative impulse introduces static into the circuits of the surveillant assemblage; it seeks to reclaim bodies and reappropriate spaces.
Recall, however, that self-exposure derives its relational power partly and importantly from its selectivity. Surveillance changes the dynamic of selectivity in unpredictable and often disorienting ways. When words and images voluntarily shared in one context reappear unexpectedly in another, the resulting sense of unwanted exposure and loss of control can be highly disturbing. To similar effect, Altman noted that loss of control over the space-making mechanisms of personal space and territory produced sensations of physical and emotional distress. These effects argue for more explicitly normative evaluation of the emerging culture of performance and coveillance, and of the legal and architectural decisions on which it relies.
Thus understood, the problems of coveillance and self-exposure also illustrate a more fundamental proposition about the value of openness in the information environment: openness is neither neutral nor univalent, but is itself the subject of a complex politics. Some kinds of openness serve as antidotes to falsehood and corruption; others serve merely to titillate or to deepen entrenched inequalities. Still other kinds of openness operate as self-defense; if anyone can take your child’s picture with his mobile phone without you being any the wiser, why shouldn’t you know where all of the local sex offenders live and what they look like? But the resulting “information arms races” may have broader consequences than their participants recognize. Some kinds of openness foster thriving, broadly shared education and public debate. Other, equally important varieties of openness are contextual; they derive their value precisely from the fact that they are limited in scope and duration. Certainly, the kinds of value that a society places on openness, both in theory and in practice, reveal much about that society. There are valid questions to be discussed regarding what the emerging culture of performance and coveillance reveals about ours.
It is exactly this conversation that the liberal credo of “more information is better” has disabled us from having. Jodi Dean argues that the credo of openness drives a political economy of “communicative capitalism” organized around the tension between secrets and publicity. That political economy figures importantly in the emergence of a media culture that prizes exposure and a punditocracy that assigns that culture independent normative value because of the greater “openness” it fosters. Importantly, this reading of our public discourse problematizes both secrecy and openness. It suggests both that there is more secrecy than we acknowledge and that certain types of public investiture in openness for its own sake create large political deficits.
It seems reasonable to posit that the shift to an information-rich, publicity-oriented environment would affect the collective understanding of selfhood. Many theorists of the networked information society argue that the relationship between self and society is undergoing fundamental change. Although there is no consensus on the best description of these changes, several themes persistently recur. One is the emergence and increasing primacy of forms of collective consciousness that are “tribal,” or essentialized and politicized. These forms of collective consciousness collide with others that are hivelike, dictated by the technical and institutional matrices within which they are embedded. Both of these collectivities respond in inchoate, visceral ways to media imagery and content.
I do not mean here to endorse any of these theories, but only to make the comparatively modest point that in all of them, public discourse in an era of abundant information bears little resemblance to the utopian predictions of universal enlightenment that heralded the dawn of the Internet age. Moreover, considerable evidence supports the hypothesis that more information does not inevitably produce a more rational public. As we saw in Chapter 2, information flows in networked space follow a “rich get richer” pattern that channels ever-increasing traffic to already-popular sites. Public opinion markets are multiple and often dichotomous, subject to wild swings and abrupt corrections. Quite likely, information abundance produces a public that is differently rational — and differently irrational — than it was under conditions of information scarcity. On that account, however, utopia still lies elsewhere.
The lesson for privacy theory, and for information policy more generally, is that scholars and policy makers should avoid investing emerging norms of exposure with positive value just because they are “open.” Information abundance does not eliminate the need for normative judgments about the institutional, social, and technical parameters of openness. On the contrary, it intensifies the need for careful thinking, wise policy making, and creative norm entrepreneurship around the problems of exposure, self-exposure, and coveillance. In privacy theory, and in other areas of information policy, the syllogism “if open, then good” should be interrogated rather than assumed.

From that book, we also get a pointer to the term appearing in the literature: Mann, Steve, Jason Nolan, and Barry Wellman. “Sousveillance: Inventing and Using Wearable Computing Devices for Data Collection in Surveillance Environments.” Surveillance and Society 1.3 (2003): 331–55 [PDF]:

In conditions of interactions among ordinary citizens being photographed or otherwise having their image recorded by other apparently ordinary citizens, those being photographed generally will not object when they can see both the image and the image capture device … in the context of a performance space. This condition, where peers can see both the recording and the presentation of the images, is neither “surveillance” nor “sousveillance.” We term such observation that is side-to-side “coveillance,” an example of which could include one citizen watching another.

Mann seems to have been hugely interested in wearables and the “veillance” opportunities afforded by them, for example folk wearing forward-facing cameras using something like Google Glass (remember that?!). But the point to pull from the definition is perhaps to generalise “seeing” to mean things like “my device sees yours”: whilst the device(s) may be hidden, if the expectation is that a) we all have one, and b) it is observable, then we are knowingly in an (assumed) state of coveillance.

By the by, another nice quote from the same paper:

In such a coveillance society, the actions of all may, in theory, be observable and accountable to all. The issue, however, is not about how much surveillance and sousveillance is present in a situation, but how it generates an awareness of the disempowering nature of surveillance, its overwhelming presence in western societies, and the complacency of all participants towards this presence.

Also by the by, I note in passing a rather neat contrary position in the form of coveillance.org, “a people’s guide to surveillance: a hands-on introduction to identifying how you’re being watched in daily life, and by whom” created by “a collective of technologists, organizers, and designers who employ arts-based approaches to demystify surveillance and build communal counterpower”.

PS as promised, some references:

Please feel free to add further relevant links to the comments…

I also note (via a tweet a few days ago from Owen Boswarva) that moves are afoot in the UK to open up Unique Property Reference Numbers (UPRNs) and Unique Street Reference Numbers (USRNs) via the Ordnance Survey. These numbers uniquely reference properties and would, you have to think, make for interesting possibilities as part of a coveillance app.

And finally, given all the hype around Google and Apple working “together” on a tracking app, partly because they are device operating system manufacturers with remote access (via updates) to lots of devices…, I note that I haven’t seen folk mentioning data aggregators such as Foursquare in the headlines, given they already aggregate and (re)sell location data to Apple, Samsung etc etc (typical review from within the last year from the New York Intelligencer: Ten Years On, Foursquare Is Now Checking In to You). They’re also acquisitive of other data slurpers, eg buying Placed from Snap last year (a service which “tracks the real-time location of nearly 6 million monthly active users through apps that pay users or offer other types of rewards in exchange for access to their data”), and just recently, Factual, the blog post announcing which declares: “The new Foursquare will offer unparalleled reach and scale, with datasets spanning:”

  • More than 500 million devices worldwide
  • A panel of 25 million opted-in, always on users and over 14 billion user confirmed check-ins
  • More than 105 million points of interest across 190 countries and 50 territories

How come folk aren’t getting twitchy, yet?

Cliffhangers in Statistical Reports…

Noting that the ONS Family spending in the UK reports run on an April to March (preceding financial year) basis, and that the 2018-19 report was published on March 19th, 2020, the next edition will make interesting reading, presumably ending with the start of a step change in spending in the last couple of weeks of March, 2020.

And who knows what the report and associated spreadsheets will look like in March 2021 reporting on the 19/20 financial year…

PS there are some loosely related ONS weekly figures, such as EARN01: Average weekly earnings, and monthly figures, such as the DWP Jobseeker’s Allowance figures, which could be interesting to keep track of in the meantime.

PPS The ONS are also getting up to speed with some additional weekly stats, eg Coronavirus and the social impacts on Great Britain (New indicators from the ONS Opinions and Lifestyle Survey).

Fragment: Bring Your Own Infrastructure (BYOI)

Over the years, I’ve posted various fragmentary thoughts on delivering software to students in BYOD (bring your own device) environments (eg Distributing Software to Students in a BYOD Environment from 5 years ago, BYOA (Bring Your Own Application) – Running Containerised Applications on the Desktop from 3 years ago, or Rethinking: Distance Education === Bring Your Own Device? yesterday).

Adding a couple more pieces to the jigsaw, today I notice this Coding Environment Landing Page at the University of Colorado:

The environment appears to be a JupyterHub environment with VSCode bundled inside, using the jupyter_codeserver_proxy extension, and the draw.io picture editor bundled as a JupyterLab extension.

Advice is also given on running arbitrary, proxied web apps within a user session using Jupyter server proxy (Proxying Web Applications). This is a great example of one of the points of contention I have with Jim Groom, “Domain of Your Own” evangelist, one that I’ve tried to articulate (not necessarily very successfully) several times over the years (eg in Cloudron – Self-Hosted Docker / Containerised Apps (But Still Not a Personal Application Server?) or Publish Static Websites, Docker Containers or Node.js Apps Just by Typing: now): in particular, the desire to (create,) launch and run applications on a temporary, per-session basis (during a study session, for the specific purpose of launching and reading an interactive paper in a “serverless” way, etc).
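
For reference, the jupyter-server-proxy route to registering an arbitrary web app, so that it is started and proxied within a user’s session, looks something along these lines (the OpenRefine command here is illustrative):

    # jupyter_server_config.py: register an app with jupyter-server-proxy
    # so it is started, and proxied, within the user's Jupyter session.
    # The command line and launcher entry are illustrative.
    c.ServerProxy.servers = {
        "openrefine": {
            "command": ["refine", "-p", "{port}"],  # {port} is filled in by the proxy
            "timeout": 120,
            "launcher_entry": {
                "title": "OpenRefine",
            },
        }
    }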

The Colorado example is a really nice example of a simple multi-user environment that can be used to support student computing with an intelligent selection of tools bundled inside the container. (I’m guessing increasing numbers of universities offer similar services. Anyone got additional examples?)

Another jigsaw piece comes in the form of eduID, a federated Swedish identity service that students can use to sign in to their university services, whichever university they attend. One advantage of this is that you can create an identity when you start a university application process and retain that identity throughout an HE career, even if you switch institution (for example, attending one as an undergrad, another as a postgrad). The eduID can also be linked to an ORCID iD, an international identifier scheme used to identify academic researchers.

What eduID does, then, is provide you with an identity that can be registered with an HE provider and used to access that HEI’s services. Your identity is granted, and grants you, access to their services.

So. Domain of Your Own. Hmmm… (I’ve been here before…) Distance education students, and even students in traditional universities, often study on a “bring your own device” basis. But what if that was an “Infrastructure of Your Own” basis? What would that look like?

I can imagine infrastructure being provided in various ways. For example:

  1. identity: a bring-your-own-identity service such as eduID;

  2. storage: I give the institution access to my Dropbox account, Google Drive account, or Microsoft OneDrive account, or something like a personal SparkleShare or Nextcloud server; when I load a personal context on an institutional service, if there is a personal user file area linked to it, it syncs to my remote linked storage;

  3. compute: if I need to install and run software as part of my course, I might normally be expected to install it on my own computer. But what if my computer is a spun-up-on-demand server in the cloud?

(It may also be worth trying to compare those to the levels I sketched out in a fragment from a year ago in Some Rambling Thoughts on Computing Environments in Education.)

I’m absolutely convinced that all the pieces are out there to support a simple web UI that would let me log in to, and launch, temporary services on on-demand servers (remote or local), and link and persist files I was working on using those services to a personal storage server somewhere. And that all it would take is some UI string’n’glue to pull them together.

PS Security wise, something like Tailscale (via @simonw) looks interesting for trying to establish personal private networks between personally hosted services.

PPS anyone else remember PLEs (personal learning environments) and distributed, DIY networked service oriented architecture ideas that were floating around a decade or so ago?

On Stats, and “The Thing”…

Although I work from home pretty much all of the time anyway, I’m rubbish in online meetings and tend to get distracted, often by going on a quick web trawl for things related to whatever happens to be in mind just as the meeting starts…

So for example, yesterday, I started off wondering about mortality stats relating to “the thing”. Public Health England publish a dashboard with some numbers and charts on it, but it’s really hard to know what to make of the numbers. You can also get COVID-19 Daily Deaths data from NHS England, reported against the following constraints:

All deaths are recorded against the date of death rather than the date the deaths were announced. Interpretation of the figures should take into account the fact that totals by date of death, particularly for most recent days, are likely to be updated in future releases. For example as deaths are confirmed as testing positive for COVID-19, as more post-mortem tests are processed and data from them are validated. Any changes are made clear in the daily files.

These figures will be updated at 2pm each day and include confirmed cases reported at 5pm the previous day. Confirmation of COVID-19 diagnosis, death notification and reporting in central figures can take up to several days and the hospitals providing the data are under significant operational pressure. This means that the totals reported at 5pm on each day may not include all deaths that occurred on that day or on recent prior days.

These figures do not include deaths outside hospital, such as those in care homes. This approach makes it possible to compile deaths data on a daily basis using up to date figures.

If the condition for adding a death to the covid tally is testing positive post mortem, then the asymptomatic boy racer who kills themself in a motorbike accident whilst making use of the open roads will count; but that’s not really the sort of number we’re interested in.

The ONS (Office for National Statistics) look like they’re trying to capture the number of deaths where “the thing” is the likely cause of death (Counting deaths involving the coronavirus (COVID-19)), adding a Covid19 tab to the weekly provisional mortality stats release, along with a related bulletin and covid19 deaths breakout collection:

I don’t think these stats are even labelled as “experimental”, so from my humble position I think the ONS should be commended on the way they’ve managed to pull this new release together so quickly, albeit with a lag in the numbers that results from due process around death certificate registration etc.

One thing that is perhaps unfortunate is that the NHS weekly winter sitreps stopped a few weeks ago; these stats track several hospital and critical care related measures at the hospital level, but they’re only released for a few months of the year. Whilst continuing the release would have added a burden, I think a lot of planners may have found them useful. (I hope the planners who really need them have access to them anyway.) By the by, some daily data collections relating to managing “the thing” were described in a letter from the PHE Incident Director at Public Health England’s National Infection Service on March 11th.

I also note the suspension of various primary care and secondary care data collections, which means that spotting various side effects of the emergency response activity may not be obvious for some time.

As far as the ONS go, they have published their own statement on ensuring the best possible information during COVID-19 through safe data collection as well as a special ONS — Coronavirus (COVID-19) landing page for all related datasets they are able to release.

[April 2nd, 2020: the ONS have also started publishing several more “faster” society and economic indicators.]

Pondering the ONS response, I started wondering whether there’s a history anywhere of the genesis and evolution of each ONS statistical measure. In my “listen to the online meeting as radio while doing an unrelated web trawl” mode, I turned up a few potential starting points relating to the history of official stats, including this presented paper on the evolution of the United Kingdom statistical system, which looks like it might have appeared in published form in this (special issue?) of the Statistical Journal of the IAOS – Volume 24, issue 1,2.

The Royal Statistical Society’s StatsLife online magazine also has a History of Statistics Section tag which pulls up more possibly useful starting points…

Broadening my search a little, I also found this briefing on sources of historical statistics from one of my favourite sources ever, the House of Commons Library; in turn this led to a more general web search on "sources of statistics" site:commonslibrary.parliament.uk/research-briefings which turns up a wealth of briefings on sourcing UK stats in particular subject areas.

And finally, for anyone out there who does have proper skills in the area and the ability to commit resource, there are various initiatives out there looking for volunteers. In particular:

Others with less specific or specialist skill might consider many of the other opportunities for technical sprints in communities that might typically lack access to cognitive surplus in developer communities. For example, this initiative exploring and developing Digital Tools for churches during the Coronavirus. (Don’t let “churches” put you off: for “church” read “body of people” or “community” and go from there…)

Rethinking: Distance Education === Bring Your Own Device?

In passing, an observation…

Many OU modules require students to provide their own computer, subject to a university wide “minimum computer specification” policy. This policy is a cross-platform one (students can run Windows, Mac or Linux machines) but does allow students to run quite old versions of operating systems. Because some courses require students to install desktop software applications, this also means that tablets and netbooks (eg Chromebooks) do not pass muster.

On the module I work primarily on, we supply students with a virtual machine preconfigured to meet the needs of the course. The virtual machine runs on a cross-platform application (VirtualBox) and will run on a min spec machine, although there is a hefty disk space requirement: 15GB of free space to install and run the VM (plus another 15-20GB you should always have free anyway if you want your computer to keep running properly, be able to install updates, etc.)

Part of the disk overhead comes from another application we require students to use called vagrant. This is a “provisioner” application that manages the operation of the VirtualBox virtual machine from a script we provide to students. The vagrant application caches the raw image of the VM we distribute so that fresh new instances of it can be created. (This means students can throw away the working copy of their VM and create a fresh one if they break things; trust me, in distance edu, this is often the best fix.)

One of the reasons why we (that is, I…) took the vagrant route for managing the VM was that it provided a route to ship VM updates to students, if required: just provide them with a new Vagrantfile (a simple text file) that is used to manage the VM, with an update routine added in. (In four years of running the course, we haven’t actually done this…)

Another reason for using Vagrant was that it provides an abstraction layer between starting and stopping the virtual machine (via a simple command line command such as vagrant up, or a desktop shortcut that runs a similar command) and the virtual machine application that runs the virtual machine. In our case, vagrant instructs VirtualBox running on the student’s own computer, but we can also create Vagrantfiles that allow students to launch the VM on a remote host if they have credentials (and credit…) for that remote host. For example, the VM could be run on Amazon Web Services/AWS, Microsoft Azure, Google Cloud, Linode, or Digital Ocean. Or on an OU host, if we had one.

For the next presentation of the module, I am looking to move away from the Virtualbox VM and move the VM into a Docker container†. Docker offers an abstraction layer in much the same way that vagrant does, but using a different virtualisation model. Specifically, a simple Docker command can be used to launch a Dockerised VM on a student’s own computer, or on a remote host (AWS, Azure, Google Cloud, Digital Ocean, etc.)
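
For example, using the Docker SDK for Python, launching the same containerised environment locally or remotely is largely a matter of which Docker engine you point at. (A sketch; the image name and remote address are illustrative.)

    # Sketch: launch the same container locally or on a remote host
    # using the docker Python SDK (pip install docker)
    import docker

    # Local: talk to the Docker engine on the student's own machine...
    client = docker.from_env()

    # ...or remote: point at a Docker engine on a cloud host instead
    # (illustrative address; ssh connections also require paramiko)
    # client = docker.DockerClient(base_url="ssh://student@vm.example.com")

    container = client.containers.run(
        "example/course-monolith",    # illustrative image name
        detach=True,
        ports={"8888/tcp": 8888},     # map the notebook server port
    )
    print(container.short_id, container.status)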

We could use separate linked Docker containers for each service used in the course — Jupyter notebooks, PostgreSQL, MongoDB, OpenRefine — or we could use a monolithic container that includes all the services. There are advantages and disadvantages to each that I really do need to set down on paper/in a blog post at some point…

So how does this help in distance education?

I’ve already mentioned that we require students to provide a certain minimum specification computer, but for some courses this hampers the activities we can engage students in. For example, in our databases course, giving students access to a large database running on their own computer may not be possible; for an upcoming machine learning course, access to a GPU is highly desirable for anything other than really simple training examples; and in an updated introductory robotics module, using realistic 3D robot simulators for even simple demos requires access to a gamer-level (GPU supported) computer.

In a traditional university, students who can’t run particular software on their own machines may have physical access to computers and computer labs running pre-installed, university-licensed software packages on machines capable of providing it.

In my (distance learning) institution, access to university hosted software is not the norm: students are expected to provide their own computer hardware (at least to minimum spec level) and install the software on it themselves (albeit software we provide, and software that we often build installers for, at least for users of Windows machines).

What we don’t do, however, is either train students in how to provision their own remote servers, or provide software to them that can easily be provisioned on remote servers. (I noted above that our vagrant manager could be used to deploy VMs to remote servers, and I did produce demo Vagrantfiles to support this, but it went no further than that.)

This has made me realise that we make OUr distance learning students pretty much wholly responsible for meeting any computational needs we require of them, whilst at the same time not helping them develop skills that allow them to avail themselves of self-service, affordable, metered, remote computation-on-tap (albeit with the constraint of requiring a network connection to access the remote service).

So what I’m thinking is that now really is the time to start upskilling OUr distance learners, at least in disciplines that are computationally related, early on, and in the following ways:

  1. a nice to have — provide some academic background: teach students about what virtualisation is;

  2. an essential skill, but with a really low floor — technical skills training: show students how to launch virtual servers of their own.

We should also make software available that is packaged in a way that the same environment can be run locally or remotely.

Another nice to have might be helping students reason about personal economic consequences, such as the affordability of different approaches in their local situation, which is to say: buying a computer and running things locally vs. buying something that can run a browser and run things remotely over a network connection.

As much as anything, this is about real platform independence: being open as to, and agnostic of, what physical compute device a student has available at home (whether it’s a gamer spec desktop computer or a bottom of the range Chromebook), and providing them with both software packages that really can run anywhere and the tools and skills to run them anywhere.

In many respects, using abstraction layer provisioning tools like vagrant and Docker, the skills needed to run software remotely are the same as those needed to run it locally, with the additional overhead that students have a once-only requirement to sign up to a remote host and set up credentials that allow them to access the remote service from the provisioner application that runs on their local machine.

Simple 2D ev3devsim Javascript Simulator Running as an ipywidget in Jupyter Notebooks

So…

…for a course revision upcoming, I’ve been tweaking a thing.

The thing is ev3devsim [repo], a Javascript powered 2D robot simulator that allows you to execute Python code, via Skulpt, in the browser to control a simple simulated robot.

The Python package used to control the robot is a Skulpt port of ev3dev-lang-python, a Python wrapper for the ev3dev Linux distribution for Lego EV3 robots. (Long time readers may recall I explored ev3dev for use in an OU residential school way back when, and posted a few related notebooks.)

Anyway… we want to use Python in the module revision, the legacy activities we want to update look similar, ish, sort of, almost, we may be able to use some of them, and I want to do the activities via Jupyter notebooks.

So I’ve had a poke around, and I think I’ve managed to make the fumblings of a start on an ipywidget wrapper for the simulator that will allow us to embed it in a notebook.

Because I don’t understand ipywidgets at all, I’m using jp_proxy_widget, which I first played with in the context of wrapping wavesurfer.js (Rapid ipywidgets Prototyping Using Third Party Javascript Packages in Jupyter Notebooks With jp_proxy_widget).

Here’s where I’m at [nbev3devsim; Binder demo available, if the code I checked in works!]:

The first thing to notice is that the terminal has gone. The idea is that you write the code in a code cell and inject it into the simulator. My model for doing this is via cell block magic, or by passing code in a variable into the simulator (for generality, I should probably also allow a link to a .py file).

The cell block magic still needs some work, I think; eg a temporary alert with a coloured background to say “code posted to simulator” that disappears on its own after a couple of seconds. I probably also need an easy way to preview the code currently assigned to the simulated robot.

You might also notice a chart display. This is actually a plotly streaming line chart that updates with sensor values (at the moment, just the ultrasound sensor; other sensors have different ranges, so do I scale those, or use different charts perhaps?)
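
(For what it’s worth, the general live-charting pattern, as distinct from the simulator’s actual implementation, can be demoed in a notebook using a plotly FigureWidget whose trace is extended as new readings arrive:)

    # General pattern (not the simulator's actual implementation):
    # a plotly FigureWidget trace extended with new sensor readings
    import time
    import random
    import plotly.graph_objects as go

    fig = go.FigureWidget()
    fig.add_scatter(y=[], mode="lines", name="ultrasonic")
    fig  # display the live chart in the notebook

    # In a later cell, append (simulated) readings to the trace:
    for _ in range(50):
        reading = random.uniform(0, 255)  # stand-in for a sensor value
        fig.data[0].y = fig.data[0].y + (reading,)
        time.sleep(0.1)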

There is also an output window your code can print messages to, as the following hello-world magic shows:

We can read state out of the simulator, though the way the callback works, this seems to require running code across two cells to get the result into the Python environment.
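
The pattern looks something like this, assuming widget is the simulator widget object (the state names here are illustrative, not the actual nbev3devsim API); the Javascript side calls back into Python asynchronously, which is why the value only turns up after the first cell has finished executing:

    # Sketch of the two-cell callback pattern (illustrative names only)
    state = {}

    def capture(value):
        # Called back from the widget's Javascript context
        state["latest"] = value

    # Cell 1: ask the Javascript side to report its state; jp_proxy_widget
    # proxies the Python callable so it can be called from Javascript
    widget.js_init("""
        callback(JSON.stringify(simulator_state));
    """, callback=capture)

    # Cell 2 (run once the callback has fired): the value is now
    # available in the Python environment
    state["latest"]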

I’ve also experimented with another approach, where the widget’s parent object grabs (or could be regularly updated to mirror, maybe?) logged sensor readings from inside the simulator, which means I can interrogate that object even as the simulator runs. (I also started to explore using a streaming dataframe for this data, but I’m not convinced that’s the best approach; certainly, trying to stream logged data from the simulator into a streaming chart in the notebook context is laggy compared to the chart embedded in the simulator context.)

With the data in the Python context, we can do what we like with it, like chart it, etc.

There are a lot of tweaks that need to be made, and things to be added, to run the full complement of activities we ran in the original presentation of the course.

I’d already started to explore what’s required to add Python functions to Skulpt (eg Simple Text to Speech With Skulpt), although I’m not sure if that’s blocking (could it be handled asynchronously if so?). Today I also managed to learn enough from this SO answer on making objects draggable to make the robot draggable on the canvas (I think; a PR is here, but not tested in a fresh / isolated environment yet (I only made a PR to give me s/thing to link to here!)). The biggest issue I had was converting mouse co-ordinates to robot world canvas co-ordinates. There are still issues there, eg in getting the size of the robot right, but the co-ordinate management in the simulator looks a bit involved to me, and I want to get my head round it enough that if I do start trying to simplify things, I don’t break other things!

Other things that really need adding:

  • ability to reset canvas in one go;
  • ability to rotate robot using mouse;
  • ability to add noise to motors and sensors;
  • configure robot from notebook code cell rather than simulator UI? (This could also be seen as an issue about whether to strip as much out of the widget as possible.)
  • predefine sensible robot configurations; (can we also have a single, centreline, front-mounted light sensor?)
  • add pen-up / pen-down support (perhaps have a drawing layer in the simulator for this?)
  • explore supporting multiple simulators embedded in one notebook (currently it’s at most one, I suspect in large part because of specific id values assigned to DOM elements?)

The layout is also really clunky, the main issue being how to see the code against the simulator (if you need to). Two columns might be better — notebook text and code cells in one, the simulator windows stacked in the other? — but then again, a wide simulator window is really useful. A floating / draggable simulator window might be another option? I did think the simulator window might be tear-offable in JupyterLab, but I have never managed to successfully tear off any jp_proxy_widget in JupyterLab (my experiences using JupyterLab for anything are generally really miserable ones).

The original module simulator allowed you to step through the code, but: a) I don’t know if that would be possible; b) I suspect my coding knowledge / skills aren’t up to it; and c) I really should be trying to write the activities, not sinking yet more time into the simulator. (One thing I do need to do is see if any of the code I wrote years ago when scoping things for the residential school is reusable, which could save some time…)

I also need to see if the simulator is actually any good for the practical activities we used in the original version of the course, or whether I need to write a whole new set of activities that do work in this simulator… Erm…