More Thoughts On Jupyter Notebook Search

Following on from the initial sketch in Searching Jupyter Notebooks Using lunr, here’s a quick first pass [gist] at pouring Jupyter notebook cell contents (code and markdown) into a SQLite database, running a query over it, and then inspecting the results using a modified NLTK text concordancer to show the search phrase in the context of where it’s located in a document.
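
Here’s a minimal sketch of the ingestion and query steps, assuming nbformat is installed and the notebooks live in the current directory; the database, table and column names are my own for illustration, not necessarily those used in the gist:

import glob
import sqlite3

import nbformat

conn = sqlite3.connect("nbsearch.db")

# An FTS4 virtual table gives us full text search over the cell contents
conn.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS nbcells "
    "USING fts4(nbpath, cell_type, source)"
)

for nbpath in glob.glob("*.ipynb"):
    nb = nbformat.read(nbpath, as_version=4)
    for cell in nb.cells:
        if cell.cell_type in ("code", "markdown"):
            conn.execute(
                "INSERT INTO nbcells VALUES (?, ?, ?)",
                (nbpath, cell.cell_type, cell.source),
            )
conn.commit()

# Run a full text query over the indexed cells
for nbpath, source in conn.execute(
        "SELECT nbpath, source FROM nbcells WHERE source MATCH ?", ("sqlite",)):
    print(nbpath, source[:80])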

The concordancer means we can offer a results listing more in keeping with a traditional search engine, showing just the text in the immediate vicinity of a search term. (Hmm, I’d need to check what happens if the search term appears multiple times in the search result text.) This means we can offer a tidier display than dumping the contents of a complete cell into the results listing.
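
For reference, here’s how the stock NLTK concordancer handles a toy example (assuming the punkt tokeniser data has been downloaded); it prints one key-word-in-context line per occurrence of the term:

from nltk.text import Text
from nltk.tokenize import word_tokenize

cell_source = ("Pour the cell contents into a SQLite database, "
               "then run the query against the SQLite index.")

# concordance() prints one key-word-in-context line per occurrence,
# so a term that appears twice in a cell shows up on two lines here
Text(word_tokenize(cell_source)).concordance("SQLite", width=50)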

The table the notebook data is added to is created so that it supports full text search. However, I imagine that any stemming we might apply is not well suited to indexing code.

Similarly, the NLTK tokeniser doesn’t handle code very well. For example, splits occur around # and % symbols, which means things like magics, such as %load_ext, aren’t recognised; instead, they’re split into separate tokens: % and load_ext.
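
A quick demo of the split, together with one possible workaround using a hand-rolled pattern (my own rough guess at a rule, not a tested code tokeniser; again, this assumes the NLTK data is available):

from nltk.tokenize import RegexpTokenizer, word_tokenize

# The default tokeniser splits the magic across the % symbol
print(word_tokenize("%load_ext sql"))
# ['%', 'load_ext', 'sql']

# A hand-rolled pattern that keeps %line and %%cell magics as single tokens
tokenizer = RegexpTokenizer(r"%{1,2}\w+|\w+|\S")
print(tokenizer.tokenize("%load_ext sql"))
# ['%load_ext', 'sql']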

A bigger issue for the db approach is that I need to find a way to update / clean the database as and when notebooks are saved, updated, deleted etc.
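
One possible piece of that jigsaw: the notebook server supports a post-save hook, configured in jupyter_notebook_config.py, which could at least catch saves and updates (deletions would need handling some other way). A sketch, where reindex_notebook is a hypothetical stand-in for the actual database update:

# In jupyter_notebook_config.py; reindex_notebook is a hypothetical function
# standing in for the "delete the old rows, insert fresh ones" step
def update_search_index(model, os_path, contents_manager, **kwargs):
    """Re-index a notebook's cells whenever the notebook is saved."""
    if model["type"] == "notebook":
        reindex_notebook(os_path)

c.FileContentsManager.post_save_hook = update_search_index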

Surveillance Art?

An interesting sounding site, Artificial Senses, which “visualizes sensor data of the machines that surround us to develop an understanding how they experience the world”.

Artificial Senses is a project by Kim Albrecht in collaboration with metaLAB (at) Harvard, and supported by the Berkman Klein Center for Internet & Society. The project is part of a larger initiative researching the boundaries between artificial intelligence and society.

But along the way, so you can “participate”, it prompts you for access to various sensors on the device you are viewing the page from. So for example, your location:

To your camera:

And to your microphone:

Here’s the Javascript:

var touching = true;
var seeing = false;
var hearing = false;
var orienting = false;
var moving = false;
var locating = false;

var isSafari = /^((?!chrome|android).)*safari/i.test(navigator.userAgent);

// // // // // // // // // // // // // // // // // // // // Touching

document.getElementById('liveTouching').style.visibility = "visible";
document.getElementById('touchingLiveButton').style.visibility = "visible";

// // // // // // // // // // // // // // // // // // // // Seeing

var constraintsSee = {
  audio: false,
  video: { }
};

function handleSuccessSee() {
  seeing = true;
  document.getElementById('liveSeeing').style.visibility = "visible";
  document.getElementById('seeingLiveButton').style.visibility = "visible";
}
function handleErrorSee(error) {
  console.log('navigator.getUserMedia error: ', error);
}

if (!isSafari) {
  navigator.mediaDevices.getUserMedia(constraintsSee).then(handleSuccessSee).catch(handleErrorSee);
}

// // // // // // // // // // // // // // // // // // // // Hearing

var constraintsHear = {
  audio: true,
  video: false
};

function handleSuccessHear() {
  hearing = true;
  document.getElementById('liveHearing').style.visibility = "visible";
  document.getElementById('hearingLiveButton').style.visibility = "visible";
}
function handleErrorHear(error) {
  console.log('navigator.getUserMedia error: ', error);
}

if (!isSafari) {
  navigator.mediaDevices.getUserMedia(constraintsHear).then(handleSuccessHear).catch(handleErrorHear);
}

// // // // // // // // // // // // // // // // // // // // Orienting

if (!orienting) {
  window.addEventListener('deviceorientation', function(event) {

    if (event.alpha !== null) {
      // orienting = true;
      document.getElementById('liveOrienting').style.visibility = "visible";
      document.getElementById('orientingLiveButton').style.visibility = "visible";
    }

  });
}  


// // // // // // // // // // // // // // // // // // // // Moving

if (!moving) {
  window.addEventListener('devicemotion', function(event) {

    if (event.acceleration.x !== null) {
      moving = true;
      document.getElementById('liveMoving').style.visibility = "visible";
      document.getElementById('movingLiveButton').style.visibility = "visible";
    }

  });
} 

// // // // // // // // // // // // // // // // // // // // Locating

navigator.geolocation.getCurrentPosition(function(position) {

  locating = true;

  document.getElementById('liveLocating').style.visibility = "visible";
  document.getElementById('locatingLiveButton').style.visibility = "visible";

});

One of the things I wanted to do in my (tiny) bit of the new OU level 1 course, a section on “location based computing”, was to try to get folk to reflect on how easily tracked we are through our computational devices. (If you want to play along with this browser based activity, sign up for a Microsoft Live account (OU staff/ALs can sign in with their OUCU@open.ac.uk credentials) and try these notebooks: TM112 Geo Activity Notebooks.)

The same course has a section on the mobile phone system more generally. I’m not sure if it has similarly minded activities that demonstrate the full range of sensors that can be found on most of today’s smartphones? If not, the Artificial Senses site might be worth adding as a resource – with a reminder for folk to disable site access to the sensors once they’ve done playing…

Jigsaw Pieces – Linux Service Indicators, Jupyter Kernel Monitoring and Environment Management

Something I’ve been pondering for some time is how to set up some simple Linux service monitoring so that I can display an indicator light in a web page to show whether a Linux service is running or not.

For example, in the TM351 VM, it could be handy to display some indicator lights in a Jupyter notebook status bar showing whether the database services we connect to from the notebooks are running correctly.

So here are some pieces that may contribute to that:

My thinking is:

  • use monit to monitor a process; if the process is down, write to a service status file in my www server directory, e.g. service_servicename_status.txt. If a service is running, the contents of this file are 1, otherwise 0 (a minimal sketch of the status file writer follows this list);
  • use the JQuery fragment to poll the status file every few seconds;
  • if the status file returns 0, display a red indicator, otherwise green.
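
Here’s a minimal sketch of the status file writer piece, assuming psutil is available; the service name and web directory are illustrative. monit could call something like this when a service changes state, or it could simply be cron’d:

from pathlib import Path

import psutil

def write_service_status(service_name, www_dir="/var/www/html"):
    """Write 1 to the status file if a matching process is found, else 0."""
    running = any(service_name in " ".join(p.info["cmdline"] or [])
                  for p in psutil.process_iter(["cmdline"]))
    status_file = Path(www_dir) / "service_{}_status.txt".format(service_name)
    status_file.write_text("1" if running else "0")

write_service_status("postgres")

The indicator page then just polls that file every few seconds and switches between red and green on the value.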

Here are some other monitoring / environment managing fragments I’m pondering:

  • something like ps_mem, a Python utility *to accurately report the in core memory usage for a program*. I’m wondering if I could use that to track how much memory each Jupyter notebook python kernel is taking up (or maybe monit can do that?) There’s an old extension that looks like it shows reports: nbtop. Or perhaps use psutil (via this issue, which seems to offer a solution?) – there’s a minimal sketch of that after this list;
  • a minimal example of setting up a notebook homepage tab for a hello world webpage; Writing a notebook server extension looks like it has the ingredients, and nb_conda provides a fuller working example. Actually, that extension looks useful for *Jupyter-as-a-learning-environment* because it lets you select different conda environments, which could be handy for running different activities.
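
For the psutil route, here’s the sort of thing I have in mind (a rough sketch; note it reports plain resident set size rather than ps_mem’s fairer shared memory accounting):

import psutil

# Report resident memory for each running ipykernel process
for p in psutil.process_iter(["pid", "cmdline", "memory_info"]):
    cmdline = " ".join(p.info["cmdline"] or [])
    if "ipykernel" in cmdline:
        print(p.info["pid"], p.info["memory_info"].rss // (1024 * 1024), "MB")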

Any other examples out there of Jupyter monitoring / environment management?

Interactive Authoring Environments for Reproducible Media: Stencila

One of the problems associated with keeping up with tech is that a lot of things that “make sense” do so not because of the introduction or availability of a new tool or application in and of itself, but because of the way it might make a new combination of tools possible that supports a complete end to end workflow, or that can be used to reengineer (a large part of) an existing workflow.

In the OU, it’s probably fair to say that the document workflow associated with creating course materials has its issues. I’m still keen to explore how a Jupyter notebook or Rmd workflow would work, particularly if the authored documents included recipes for embedded media objects such as diagrams, items retrieved from a third party API, or rendered from a source representation or recipe.

One “obvious” problem is that the Jupyter notebook or RStudio Rmd editor is “too hard” to work with (that is, it’s not Word).

A few days ago I saw a tweet mentioning the use of Stencila with Binderhub. Stencila? Apparently, “[a]n open source office suite for reproducible research”. From the blurb:

[T]oday’s tools for reproducible research can be intimidating – especially if you’re not a coder. Stencila make reproducible research more accessible with the intuitive word processor and spreadsheet interfaces that you and your colleagues are already used to.

That sounds appropriate… It’s available as a desktop app, but courtesy of minrk/jupyter-dar (I think?), it runs on binderhub and can be accessed via a browser too:

You can try it here.

As with Jupyter notebooks, you can edit and run code cells, as well as author text. But the UI is smoother than in Jupyter notebooks.

(This is one of the things I don’t understand about colleagues’ attitude towards emerging tech projects: they look at today’s UX and think that’s it, because that’s how it is inside an organisation – you take what you’re given and it stays the same for decades. In a living project, stuff tends to get better if it’s being used and there are issues with it…)

The Jupyter-Dar strapline pitches “Jupyter + DAR compatibility exploration for running Stencila on binder”. Hmm. DAR? That’s also new to me:

Dar stands for (Reproducible) Document Archive and specifies a virtual file format that holds multiple digital documents, complete with images and other assets. A Dar consists of a manifest file (manifest.xml) that describes the contents.

Dar is being designed for storing reproducible research publications, but the underlying concepts are suitable for any kind of digital publications that can be bundled together with their assets.

Repo: [substance/dar](https://github.com/substance/dar)

Sounds interesting. Which reminds me: how’s OpenCreate coming along, I wonder? (My permissions appear to have been revoked again; or the URL has changed.)

PS seems like there’s more activity in the “pure web” notebook application world. Hot on the heels of Mike Bostock’s Observable notebooks (rationale) comes iodide, “[a] frictionless portable notebook-style interface for literate scientific computing in the browser” (examples).

I don’t know if these things just require you to use Javascript, or whether they can also embed things like Brython.

I’m not sure I fully get the js/browser notebooks yet? I like the richer extensibility of things like Jupyter in terms of arbitrary language/kernel availability, though I suppose the web notebooks might be able to hook into other kernels using similar mechanics to those used by things like Thebelab?

I guess one advantage is that you can do stuff on a Chromebook, and without a network connection if you cache all the required JS packages locally? Although with ChromeOS newly offering support for Linux – and hence, Docker containers – natively, Chromebooks could get a whole lot more exciting over the next few months. From what I can tell, crosvm looks like a ChromeOS native equivalent to something like Virtualbox (with an equivalent of Guest Additions?). It’ll be interesting to see how well things like audio work. Reports suggest that graphical UIs will work, presumably using some sort of native X11 support rather than noVNC, so now could be a good time to start looking out for a souped up Pixelbook…

OER Methods – Generative Designs for Reuse-With-Modification

Via my feeds (The stuff ain’t enough), I notice Martin pointing to some UNESCO draft OER Recommendations.

Martin writes:

… the resources are a necessary starting point, but they are not an end point. Particularly if your goal is to “ensure inclusive and equitable quality education and promote lifelong opportunities for all”, then it is the learner support that goes around the content that is vital.

And on this, the recommendations are largely silent. There is a recommendation to develop “supportive policy” but this is focused on supporting the creation of OER, not the learners. Similarly the “Sustainability models for OER” are aimed at finding ways to fund the creation of OER. I think we need to move beyond this now. Obviously having the resources is important, and I’d rather have OER than nothing, but unless we start recognising, and promoting, the need for models that will support learners, then there is a danger of perpetuating a false narrative around OER – that content is all you need to ensure equity. It’s not, because people are starting from different places.

I’ve always thought that too much focus has been placed on “the resources”, but I’ve never really got to grips with how the resources are supposed to be (re)used, either by educators or learners.

For educators, reuse can often come in the form of “assign that thing someone else wrote, and wrap it with your own teaching context”, or “pinch that idea and modify it for your own use”. So if I see a good diagram, I might “reuse” it by inserting it in my own materials or I might redraw it with some tweaks.

Assessment reuse (“open assessment resources”?) can be handy too: a question form that someone else has worked up that I can make use of. In some cases, the question may include media assets, either exact or ‘not drawn to scale’. But in many cases, I would still need to do work to generalise or customise the answer, and work out my own correct answer or marking guide.

(See for example Generative Assessment Creation.)

If an asset is not being reused directly, but the idea is, with some customisation, or change in parameter values, then creating the new asset may require significant effort, as well as access to, and skills in using, particular drawing packages. In some cases the liquid paper method works: Tipp-Ex out the original numbers, write in your own, photocopy to produce the new asset. Digital cut or crop alternatives are available.

Another post in my feeds today – Enterprise Dashboards with R Markdown, via Rbloggers – described a rationale for using reproducible methods to generate dashboards:

We have been living with spreadsheets for so long that most office workers think it is obvious that spreadsheets generated with programs like Microsoft Excel make it easy to understand data and communicate insights. Everyone in a business, from the newest intern to the CEO, has had some experience with spreadsheets. But using Excel as the de facto analytic standard is problematic. Relying exclusively on Excel produces environments where it is almost impossible to organize and maintain efficient operational workflows. …

[A particular] Excel dashboard attempts to function as a real application by allowing its users to filter and visualize key metrics about customers. It took dozens of hours to build. The intent was to hand off maintenance to someone else, but the dashboard was so complex that the author was forced to maintain it. Every week, the author copied data from an ETL tool and pasted it into the workbook, spot checked a few cells, and then emailed the entire workbook to a distribution list. Everyone on the distribution list got a new copy in their inbox every week. There were no security controls around data management or data access. Anyone with the report could modify its contents. The update process often broke the brittle cell dependencies; or worse, discrepancies between weeks passed unnoticed. It was almost impossible to guarantee the integrity of each weekly report.

Why coding is important

Excel workbooks are hard to maintain, collaborate on, and debug because they are not reproducible. The content of every cell and the design of every chart is set without ever recording the author’s actions. There is no simple way to recreate an Excel workbook because there is no recipe (i.e., set of instructions) that describes how it was made. Because Excel workbooks lack a recipe, they tend to be hard to maintain and prone to errors. It takes care, vigilance, and subject-matter knowledge to maintain a complex Excel workbook. Even then, human errors abound and changes require a lot of effort.

A better approach is to write code. … When you create a recipe with code, anyone can reproduce your work (including your future self). The act of coding implicitly invites others to collaborate with you. You can systematically validate and debug your code. All of these things lead to better code over time.

Many of the issues described there are to do with maintenance. Many of the issues associated with “reusing OERs with modification” are akin to maintenance issues. (When an educator updates their materials year on year – maintenance – they are reusing materials they have permission to use, with modification.)

In both the maintenance and the wider reuse-with-modification activity, it can really help if you have access to the recipe that created the thing you are trying to maintain. Year on year reuse is not buying 10 exact clone pizzas in the first year, freezing 9, taking one out each year, picking off the original topping and adding this year’s topping du jour for the current course presentation. It’s about saving and/or sharing the recipe and generating a fresh version of the asset each year, perhaps with some modification to the recipe.

In other words, the asset created under the reuse-with-modification licence is not subtractive/additive to the original asset, it is (re)generative from the original recipe.

This is where things like Jupyter notebooks or Rmd documents come in – they can be used to deliver educational resources that are in principle reusable-with-modification because they are generative of the final asset: the asset is produced from a modifiable recipe contained within the asset.
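
As a toy example of what I mean, a recipe like the following might live in a notebook code cell; next year’s reuse-with-modification is then just a parameter change and a re-run (the numbers are made up for illustration):

import matplotlib.pyplot as plt

# This year's "topping du jour": tweak the values and regenerate the asset
years = [2015, 2016, 2017, 2018]
enrolments = [120, 135, 150, 142]

plt.bar(years, enrolments)
plt.xlabel("Year")
plt.ylabel("Enrolments")
plt.title("Course enrolments by year")
plt.savefig("enrolments.png")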

I’ve started trying to put together some simple examples of topic based recipes as Jupyter notebooks that can run on Microsoft’s (free) Azure Notebooks service: Getting Started With OER notebooks.

To run the notebooks, you need to create a Microsoft Live account, log in to notebooks.azure.com, and then clone the above linked repository.

OU staff and ALs should be able to log in using their oucu@open.ac.uk credentials. If you work for a company that uses Office 365 / Live online applications, ask them to enable notebooks too…

Once you have cloned the notebooks, you should be able to run them…

PS if you have examples of other things I should include in the demos, please let me know via the comments. I’m also happy to do demos, etc.

Generative Assessment Creation

It’s coming round to that time of year where we have to create the assessment material for courses with an October start date. In many cases, we reuse question forms from previous presentations but change the specific details. If a question is suitably defined, then large parts of this process could be automated.

In the OU, automated question / answer option randomisation is used to provide iCMAs (interactive computer marked assessments) via the student VLE using OpenMark. As well as purely text based questions, questions can include tables or images as part of the question.

One way of supporting such question types is to manually create a set of answer options, perhaps with linked media assets, and then allow randomisation of them.

Another way is to define the question in a generative way so that the correct and incorrect answers are automatically generated. (This seems to be one of those use cases for why ‘everyone should learn to code’;-)

Pinching screenshots from an (old?) OpenMark tutorial, we can see how a dynamically generated question might be defined. For example, create a set of variables:

and then generate a templated question, and student feedback generator, around them:
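
By way of a text-only analogue of those screenshots, here’s a minimal sketch of the pattern in Python (the variables and question wording are my own illustration, not OpenMark’s actual model):

import random

# Randomise the question variables
voltage = random.choice([6, 9, 12])
resistance = random.choice([2, 3, 4])
current = voltage / resistance

# Template the question, the correct answer and the feedback around them
question = ("A {} V battery is connected across a {} ohm resistor. "
            "What current flows?").format(voltage, resistance)
answer = "{:.1f} A".format(current)
feedback = "Use I = V / R: {} / {} = {:.1f} A.".format(voltage, resistance, current)

print(question, answer, feedback, sep="\n")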

Packages also exist for creating generative questions/answers more generally. For example, the R exams package allows you to define question/answer templates in Rmd and then generate questions and solutions in a variety of output document formats.


You can also write templates that include the creation of graphical assets such as charts:

Via my feeds over the weekend, I noticed that this package now also supports the creation of more general diagrams from a TikZ diagram template. For example, logic diagrams:

Or automata diagrams:

(You can see more exam templates here: www.r-exams.org/templates.)

As I’m still on a “we can do everything in Jupyter” kick, one of the things I’ve explored is various IPython/notebook magics that support diagram creation. At the moment, these are just generic magics that allow you to write TikZ diagrams, for example, that make use of various TikZ packages:

On the to do list is creating some example magics that template different question types.
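
Something along these lines, perhaps: a minimal sketch using IPython’s cell magic registration machinery, runnable in a live notebook, with a question template syntax invented purely for illustration:

import random

from IPython.core.magic import register_cell_magic

@register_cell_magic
def multiply_question(line, cell):
    """Fill a simple question template with random values."""
    a, b = random.randint(2, 9), random.randint(2, 9)
    print(cell.format(a=a, b=b, answer=a * b))

# In another notebook cell, the magic could then be used as:
#   %%multiply_question
#   What is {a} x {b}? (Answer: {answer})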

I’m not sure if OpenCreate is following a similar model? (I seem to have lost access permissions again…)

FWIW, I’ve also started looking at my show’n’tell notebooks again, trying to get them working in Azure notebooks. (OU staff should be able to log in to notebooks.azure.com using OUCU@open.ac.uk credentials.) For the moment, I’m depositing them at https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted, although some tidying may happen at some point. There are also several more basic demo notebooks I need to put together (e.g. on creating charts and using interactive widgets, digital humanities demos, R demos and (if they work!) polyglot R and python notebook demos, etc.). To use the notebooks interactively, log in and clone the library into your own user space.