Fragment: On Reproducible Open Educational Resources

Via O’Reilly’s daily Four Short Links feed, I notice the Open Logic Project, “a collection of teaching materials on mathematical logic aimed at a non-mathematical audience, intended for use in advanced logic courses as taught in many philosophy departments”. In particular, “it is open-source: you can download the LaTeX code [from Github]”.

However:

the TeX source does mean you need a (La)TeX environment to build it (and the project does bundle some of the custom .sty style files you need in the repo, which is handy).

Compare this with the Simple Maths Equations and Notation notebook I’ve started sketching as part of a self-started, informal “reproducible OERs with Jupyter notebooks” project I’m dabbling with:

Here, a Jupyter notebook contains LaTeX code that can then be rendered (in part?) through the notebook previewer – at least in so far as expressions are written in MathJax parseable code – and also within a live / running Jupyter notebook. Not only do I share the reproducible source code (as a notebook), I also share a link to at least one environment capable of running it, which allows it to be reused with modification. (Okay, in this case, not openly so, because you have to have an Azure Notebooks account. But the notebook could equally run on Binderhub or a local install, perhaps with one or two additional requirements if you don’t already run a scientific Python environment.)

In short, for a reproducible OER that supports reuse with modification, sharing the means of production also means sharing the machinery of production.

To simplify the user experience, the notebook environment can be preinstalled with packages needed to render a wider range of TeX code, such as drawings rendered using TikZ. Alternatively, code cells can be populated with package installation commands to customise a more vanilla environment, as I do in several demo notebooks:
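For example, a single code cell along the following lines at the top of a notebook (the package list will vary from notebook to notebook) is enough to pull any extra requirements into a vanilla environment:

#Install any additional packages this notebook needs
#(the package names here are just illustrative)
!pip install lcapy sympy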

What the Open Logic Project highlights is that reproducible OERs not only provide ready access to the “source code” of a resource so that it can be easily reused with modification, but that access to an open environment capable of processing that source code and rendering the output document also needs to be provided. (Open / reproducible science researchers have known this for some time…)

Getting a TeX/LaTeX environment up and running can be a faff – and can also take up a lot of disk space – so the runtime environment requirements are not negligible.

In the case of Jupyter notebooks, LaTeX support is available, and container images capable of running on Binderhub, for example, are relatively easily defined (see for example the Binder LaTeX example). (I’m also not sure how rich Stencila’s support for LaTeX is, and/or whether it requires an external LaTeX environment when running the Stencila desktop app?)

It also strikes me that another thing we should be doing is exporting a copy of the finished work, eg as a PDF or a complete, self-standing HTML archive, in case the machinery does break. This is also important where third party services are called. It may actually make sense to use something like requests for all third party URL requests, and save a cached version of all requests (using requests-cache) to provide a local copy of whatever it was that was called when the document was originally generated.
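By way of a sketch of what I mean, using the requests-cache package’s install_cache() (the cache name and example URL are arbitrary):

import requests
import requests_cache

#Cache every response fetched via requests in a local SQLite file
#(the cache name is arbitrary; it becomes third_party_cache.sqlite)
requests_cache.install_cache('third_party_cache')

#The first call hits the network; later calls are served from the local cache,
#so the document can still be rebuilt if the third party service goes away
response = requests.get('https://example.com/some/resource')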

See also: OER Methods – Generative Designs for Reuse-With-Modification

Jupyter Notebooks, Cognitive Tools and Philosophical Instruments

A placeholder post, as much as anything, to mark the AJET Call for Papers for a Special Issue on Re-Examining Cognitive Tools: New Developments, New Perspectives, and New Opportunities for Educational Technology Research as a foil for thinking about what Jupyter notebooks might be good for.

According to the EduTech Wiki, “[c]ognitive tools refer to learning with technology (as opposed to learning through technology)”, which doesn’t really make sense as a sentence and puts me off the idea of ed-tech academe straight away.

In the sense that cognitive tools support a learning process, I think they can do so in several ways. For example, in Programming in Jupyter Notebooks, via the Heavy Metal Umlaut I remarked on several different ways in which the same programme could be constructed within a notebook, each offering a different history and each representing a differently active approach to code creation and programming problem solving.

One of the things I try to do is reflect on my own practice, as I have been doing recently whilst trying to rework fragments of some OpenLearn materials as reproducible educational resources (which is to say, materials that generate their own resources and as such support reuse with modification more generally than many educational resources).

For example, consider the notebook at https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted/html/3.6.0%20Electronics.ipynb

You can also run the notebook interactively; sign in to Azure notebooks (if you’re OU staff, you can use your staff OUCU/OU password credentials) and clone my Getting Started library into your workspace. If notebooks are new to you, check out the 1.0 Using Jupyter Notebooks in Teaching and Learning - READ ME FIRST.ipynb notebook.

In creating the electronics notebook, I had to learn a chunk of stuff (the lcapy package was new to me and I had to get my head round circuitikz), but I found that trying to figure out how to make examples related to the course materials provided a really useful context for giving me things to try to do with the package. In that sense, the notebook was a cognitive tool (I guess) that supported my learning about lcapy.

For the https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted/html/1.05%20Simple%20Maths%20Equations%20and%20Notation.ipynb notebook, I had to start getting my head round sympy and, on the way, cobble together bits and pieces of code that might be useful when trying to produce maths related materials in a reproducible way. (For example, creating equations in sympy that can then be rendered, manipulated and solved throughout the materials in a way that’s appropriate for a set of educational (that is, teaching and/or learning) resources.)
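As a minimal sketch of the sort of thing I mean (the equation itself is just an arbitrary example):

import sympy as sym

x = sym.symbols('x')

#Define an equation once...
eq = sym.Eq(x**2 - 5*x + 6, 0)

#...render it as LaTeX for display in the teaching material...
print(sym.latex(eq))

#...and solve the self-same object elsewhere in the material
print(sym.solve(eq, x))  # [2, 3]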

Something else that came to mind is that the notebook medium, as both an authoring medium and a delivery medium (we can use it just to create assets, or we can also use it to deliver content to students), changes the sorts of things you might want to do in the teaching. For example, I had the opportunity to create self-test functions, and there is the potential for interactives that let students explore the effect of changing component values in a circuit. (We could also plot responses over a range of variable values, but I haven’t demoed that yet.) In a sense, the interactive affordances of the medium encouraged me to think of opportunities to create philosophical instruments that allow authors – as well as students – to explore the phenomena being described by the materials. Although I’m not a chemistry educator, putting together a reworking of some OpenLearn chemistry materials – https://notebooks.azure.com/OUsefulInfo/libraries/gettingstarted/html/3.1.2%20OpenLearn%20Chemistry%20Demos.ipynb – gave me some ideas about the different ways in which the materials could be worked up to support interactive / self-checking / constructive learning use. (That is, ways in which we could present the notebooks as interactive cognitive tools to support the learning process on the one hand, or as philosophical instruments that would allow the learner to explore the subject matter in an investigative and experimental way on the other.)

I like to think the way I’m using the Jupyter notebooks as part of an informal “reproducible-OER” exploration is in keeping with some of the promise of live authoring using the OU’s much vaunted, though still to be released, OpenCreate authoring environment (at least, as I understand the sorts of thing it is supposed to be able to support), with the advantage of being available now.

It’s important to recognise that Jupyter notebooks can be thought of as a medium that behaves in several ways. In the first case, it’s a rich authoring medium to work with – you can create things in it and for it, for example in the form of interactive widgets or reusable components such as IPython magics (for example, this interactive mapping magic: https://github.com/psychemedia/ipython_magic_folium ). Secondly, it’s a medium qua environment that can itself be extended and customised through the enabling and disabling of notebook extensions, such as ones that support WYSIWYG markdown editing, or hidden, frozen and read-only executable cells, which can be used to constrain the ways in which learners use some of the materials, perhaps as a counterpoint to getting them to engage more actively in editing other cells. Thirdly, it acts as a delivery medium, presenting content to readers who can engage with the content in an interactive way.
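On that first point, the barrier to entry for rolling your own line magic is quite low. Here’s a toy sketch (nothing like as useful as the folium magic linked above, and the magic name is made up) just to show the pattern:

from IPython.core.magic import register_line_magic

@register_line_magic
def shout(line):
    """A toy line magic that just echoes its argument in upper case."""
    return line.upper()

#Usage in a notebook code cell:
#    %shout reproducible oer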

I’m not sure if there are any good checklists of what makes a “cognitive tool” or a “philosophical instrument”, but if there are it’d be interesting to try to check Jupyter notebooks off against them…

An Easier Approach to Electrical Circuit Diagram Generation – lcapy

Whilst I might succumb in my crazier evangelical moments to the idea that academic authors (other than those who speak LaTeX natively) and media developers might engage in the raw circuitikz authoring described in Reproducible Diagram Generators – First Thoughts on Electrical Circuit Diagrams, the reality is that it’s probably just way too clunky, and a little bit too far removed from the everyday diagrams educators are likely to want to create, to get much, if any, take up.

However, understanding something of the capabilities of quite low level drawing packages, and reflecting (as in the last post) on some of the strategies we might adopt for creating reusable, maintainable, revisable with modification and extensible diagram scripts puts us in good stead for looking out for more usable approaches.

One such example is the Python lcapy package, a linear circuit analysis package that supports:

  • the description of simple electrical circuits at a slightly higher level than the raw circuitikz circuit creation model;
  • the rendering of the circuits, with a few layout cues, using circuitikz;
  • numerical analysis of the circuits in terms of response in time and frequency domains, and the charting of the results of the analysis; and
  • various forms of symbolic analysis of circuit descriptions in various domains.

Here are some quick examples to give a taste of what’s possible.

You can run the notebook (albeit subject to significant changes) that contains the original working for the examples used in this post on Binderhub.

Here’s a simple circuit:

And here’s how we can create it in lcapy from a netlist, annotated with cues for the underlying circuitikz generator about how to lay out the diagram.

from lcapy import Circuit

cct = Circuit()
cct.add("""
Vi 1 0_1 step 20; down
C 1 2; right, size=1.5
R 2 0; down
W 0_1 0; right
W 0 0_2; right, size=0.5
P1 2_2 0_2; down
W 2 2_2;right, size=0.5""")

cct.draw(style='american')

The things to the right of the semicolon on each line are the optional layout elements – they’re not required when defining the actual circuit itself.

The display of nodes and node labels is controllable, and the symbol styles are selectable between american, british and european stylings.
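For example (I’m going from memory of the lcapy schematic options here, so treat the keyword names as assumptions to check against the lcapy docs):

#Redraw the same circuit with European symbol styling, only drawing connection
#nodes and suppressing node labels (draw_nodes / label_nodes names assumed)
cct.draw(style='european', draw_nodes='connections', label_nodes=False)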

The lcapy/schematic.py module describes the various stylings as composites of circuitikz regionalisations, and could be easily extended to support a named house style, or perhaps accommodate a regionalisation passed in as an explicit argument value.

if style == 'american':
    style_args = 'american currents, american voltages'
elif style == 'british':
    style_args = 'american currents, european voltages'
elif style == 'european':
    style_args = ('european currents, european voltages, european inductors, european resistors')

As well as constructing circuits from netlist descriptions, we can also create them from network style descriptions:

from lcapy import R, C, L

cct2 = (R(1e6) + L(2e-3)) | C(3e-6)
#For some reason, the style= argument is not respected
cct2.draw()

The diagrams generated from networks are open linear circuits rather than loops, which may not be quite what we want. But these circuits are quicker to write, so we can use them to draft netlists for us that we may then want to tidy up a bit further.

print(cct2.netlist())

'''
W 1 2; right=0.5
W 2 4; up=0.4
W 3 5; up=0.4
R 4 6 1000000.0; right
W 6 7; right=0.5
L 7 5 0.002; right
W 2 8; down=0.4
W 3 9; down=0.4
C 8 9 3e-06; right
W 3 0; right=0.5
'''

Circuit descriptions can also be loaded in from a named text file, which is handy for course material maintenance as well as reuse of circuits across materials: it’s easy enough to imagine a library of circuit descriptions.

#Create a file containing a circuit netlist
sch='''
Vi 1 0_1 {sin(t)}; down
R1 1 2 22e3; right, size=1.5
R2 2 0 1e3; down
P1 2_2 0_2; down, v=V_{o}
W 2 2_2; right, size=1.5
W 0_1 0; right
W 0 0_2; right
'''

fn="voltageDivider.sch"
with open(fn, "w") as text_file:
    text_file.write(sch)

#Create a circuit from a netlist file
cct = Circuit(fn)

The ability to create – and share – circuit diagrams in a Python context that plays nicely with Jupyter notebooks is handy, but the lcapy approach becomes really useful if we want to produce other assets around the circuit we’ve just created.

For example, in the case of the above circuit, how do the various voltage levels across the resistors respond when we switch on the sinusoidal source?

import numpy as np
t = np.linspace(0, 5, 1000)
vr = cct.R2.v.evaluate(t)
from matplotlib.pyplot import figure, savefig
fig = figure()
ax = fig.add_subplot(111, title='Resistor R2 voltage')
ax.plot(t, vr, linewidth=2)
ax.plot(t, cct.Vi.v.evaluate(t), linewidth=2, color='red')
ax.plot(t, cct.R1.v.evaluate(t), linewidth=2, color='green')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Resistor voltage (V)');

Not the best example, admittedly, but you get the idea!

Here’s another example, where I’ve created a simple interactive to let me see the effect of changing one of the component values on the response of a circuit to a step input:

(The nice plotting of the diagram gets messed up unfortunately, at least in the way I’ve set things up for this example…)

As the code below shows, the @interact decorator from ipywidgets makes it trivial to create a set of interactive controls based around the arguments passed into a function:

import numpy as np
from matplotlib.pyplot import figure, savefig
from ipywidgets import interact
from lcapy import Circuit

@interact(R=(1,10,1))
def response(R=1):
    cct = Circuit()

    cct.add('V 0_1 0 step 10;down')
    cct.add('L 0_1 0_2 1e-3;right')
    cct.add('C 0_2 1 1e-4;right')
    cct.add('R 1 0_4 {R};down'.format(R=R))
    cct.add('W 0_4 0; left')

    t = np.linspace(0, 0.01, 1000)
    vr = cct.R.v.evaluate(t)

    fig = figure()
    #Note that we can add Greek symbols from LaTeX into the figure text
    ax = fig.add_subplot(111, title=r'Resistor voltage (R={}$\Omega$)'.format(R))
    ax.plot(t, vr, linewidth=2)
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Resistor voltage (V)')
    ax.grid(True)
    
    cct.draw()

Using the network description of a circuit, it only takes a couple of lines to define a circuit and then get the transient response to a step function for it:

Again, it doesn’t take much more effort to create an interactive that lets us select component values and explore the effect they have on the damping:

As well as the numerical analysis, lcapy also supports a range of symbolic analysis functions. For example, given a parallel resistor circuit, defined using a network description, we can find the overall resistance in simplest terms:

Or for parallel capacitors:

Some other elementary transformations we can apply – providing expressions for an input voltage in the time or Laplace/s domain:

We can also create pole-zero plots quite straightforwardly, directly from an expression in the s-domain:

This is just a quick skim through some of what’s possible with lcapy. So how and why might it be useful as part of a reproducible educational resource production process?

One reason is that several of the functions can reduce the “production distance” between different likely components of a set of educational materials.

For example, given a particular circuit description as a netlist, we can annotate it with direction cribs in order to generate a visual circuit diagram, and we can use a circuit created from it directly (or from the direction annotated script) to generate time or frequency response charts. (We can also obtain symbolic transfer functions.)

When trying to plot things like pole zero charts, where it is important that the chart matches a particular s-domain expression, we can guarantee that the chart is correct by deriving it directly from the s-domain expression, and then rendering that expression in pretty LaTeX equation form in the materials.

The ability to simplify expressions  – as in the example of the simplified expressions for overall capacitance or resistance in the parallel circuit examples above – directly from a circuit description whilst at the same time using that circuit description to render the circuit diagram, also reduces the amount of separation between those two outputs to zero – they are both generated from the self-same source item.

You can run the notebook (albeit subject to significant changes) that contains the original working for the examples used in this post on Binderhub.

Reproducible Diagram Generators – First Thoughts on Electrical Circuit Diagrams

I had a quick tinker with one of the demo notebooks I’m putting together to try to work through what I think I mean by various takes on the phrase “reproducible educational materials” this morning, so here’s a quick note to keep track of my thinking.

The images are drawn in Jupyter notebooks using tikzmagic loaded as %load_ext tikz_magic. (The original notebook this post is based on can be found here, although it’s likely subject to significant change…)

The circuitikz LaTeX package (manual) supports the drawing of electrical circuit diagrams. Circuits are constructed by drawing symbols that connect pairs of Cartesian co-ordinates. For example:

%%tikz -p circuitikz -s 0.3
    %Draw a resistor labelled R_1 connecting points (0,0) and (2,0)
    \draw (0,0) to[R, l=$R_1$] (2,0);


The requirement to specify co-ordinates means you need to think about the layout – drafting a circuit on graph paper can help with this.

But things may be simplified in maintenance terms if you label co-ordinates and join those.

For example, consider the following circuit:

This can be drawn according to the following script:

%%tikz -p circuitikz -s 0.3
    %Draw a resistor labelled R_1 connecting points (0,0) and (2,0)
    %Extend the circuit out with a wire to (4,0)
    \draw (0,0) to[R, l=$R_1$] (2,0) -- (4,0);

    %Add a capacitor labelled C_1 connecting points (2,0) and (2,-2)
    \draw (2,0) to[C, l=$C_1$] (2,-2);

    %Add a wire along the bottom
    \draw (0,-2) -- (4,-2);

There are a lot of explicitly set co-ordinate values in there, and it can be hard to see what they refer to. Even with such a simple diagram, making changes to it could become problematic.

In the same way that it is good practice to replace inline numerical values in computer programs with named constants or variables, we can start to make the figure more maintainable by naming nodal points and then connecting these named nodes:

%%tikz -p circuitikz -s 0.3

    %Define some base component size dimensions
    \def\componentSize{2}

    %Define the size of the diagram in terms of component width and height
    %That is, how many horizontally aligned components wide is the diagram
    % and how many vertically aligned components high
    \def\componentWidth{2}
    \def\componentHeight{1}

    %Define the y co-ordinate of the top and bottom rails
    \def\toprail{0}
    \def\height{\componentSize * \componentHeight}
    \def\bottomrail{\toprail - \height}

    %Define the right and left extent x coordinate values
    \def\leftside{0}
    \def\width{\componentSize * \componentWidth}
    \def\rightside{\leftside + \width}

    %Name the coordinate locations of particular nodes
    \coordinate (InTop) at (\leftside,\toprail);
    \coordinate (OutTop) at (\rightside,\toprail);
    \coordinate (InBottom) at (\leftside,\bottomrail);
    \coordinate (OutBottom) at (\rightside,\bottomrail);

    %Draw the top rail
    %Define a convenience x coordinate as the
    %  vertical aligned to the topmost component out
    %The number (1) in the product term below is based on
    %  how many components in from the left we are
    \def\R1outX{1 * \componentSize}

    %Add a resistor labelled R_1
    \coordinate (R1out) at (\R1outX,\toprail);
    \draw (InTop) to[R, l=$R_1$] (R1out) -- (OutTop);

    %Add a capacitor labelled C_1
    \coordinate (C1out) at (\R1outX,\bottomrail);
    \draw (R1out) to[C, l=$C_1$] (C1out);

    %Draw the bottom rail
    \draw (InBottom) -- (OutBottom);

Some reflections about possible best practice drawn (!) from this:

  • define named limits on x values to set the width of the diagram, such as \leftside and \rightside. This can be done by counting the number of components wide the diagram is (if we can assume components have width one).
  • name the maximum and minimum height (y) values such as \toprail and \bottomrail. Again, counting vertically placed components may help. Use relative definitions where possible to make the diagram easier to maintain.
  • define connections relative to each other to minimise the number of numerical values that need to be set explicitly;
  • name points sensibly; if we read the diagram from top left to bottom right, we can make use of easily recognised verticals by using named x coordinate values set relative to the topmost component out x co-ordinates (for example, \R1outX) and top leftmost component out y values; full cartesian co-ordinate pairs can then be named relative to nodes associated with top leftmost component outs (for example, the R1out coordinate);
  • the code to produce the diagram looks like overkill in its length, but lots of it could quickly become boilerplate that could potentially be included in a slightly higher level TeX package that bakes more definitions in. Despite the added length, it also makes the script more readable and supports a self-documenting, literate programming style.

PS imagining feedback… “Ah yes, but we don’t draw resistors like that, so it’s no good…” ;-)

%%tikz -p circuitikz -s 0.3
%To ward off the "we don't draw resistors like that" cries...
\ctikzset{resistor = european}

%Draw a resistor labelled R_1 connecting points (0,0) and (2,0)
\draw (0,0) to[R, l=$R_1$] (2,0);

And that’s another reason why this approach makes sense…

Docker Container and Vagrant / Virtualbox 101s

Some simple recipes for getting started with running some demo virtual applications in a variety of ways.

Docker and Zeit Now – flaskdemo

This first demo creates a simple flask app and a Dockerfile with just enough to get it running; it runs locally if you have Docker installed, and will also run on Zeit Now.

Create a new directory – flaskdemo – and create a couple of files in it, specifically:

flaskdemo.py

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5005)

Dockerfile

FROM python:3.6-slim-stretch

RUN pip3 install flask

EXPOSE 5005
COPY ./flaskdemo.py /var/flaskdemo.py
CMD ["python3","/var/flaskdemo.py"]

Download and run the Zeit Now application, then on the command line cd into the flaskdemo folder and type: now to launch the application, and now ls to look up the URL where it’s running.

If you prefer, and have docker installed locally:

  • on the command line, cd into the directory and then build the Docker container with the command docker build -t myflaskdemo . (the dot / . is important – it says “the path to the Dockerfile is the current directory”)
  • run the container with: docker run -d -p 8087:5005 myflaskdemo and you should see a hello message on port 8087.

Compare this with what I had to do to get a flask app running via the Reclaim CPanel.

Note that as well as deploying to my local docker instance, the docker-machine command line application also allows me to launch remote servers (for example, on Digital Ocean) and then build/deploy the container there (old example). It’s not quite as easy as Zeit Now though…

Docker demo – Jupyter notebook powered API

This demo shows how to knock up an API server where the API is defined in a Jupyter notebook and published using the Jupyter Kernel Gateway (see also Building a JSON API Using Jupyter Notebooks in Under 5 Minutes). This is probably all overkill, but it makes for a literate definition of your API-defining code…

Create a new directory – jupyterkgdemo – and copy the following Jupyter notebook into it (gist/psychemedia/jupyterkgdemo.ipynb):

jupyterkgdemo.ipynb
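(The linked gist contains the actual notebook; for orientation, a kernel gateway notebook-http API cell generally looks something like the following sketch – the endpoint and response here are purely illustrative.)

# GET /hello/:name
import json

#The kernel gateway passes the incoming request to the cell as a JSON string, REQUEST
request = json.loads(REQUEST)
name = request['path'].get('name', 'world')

#Whatever the cell prints is returned as the HTTP response body
print(json.dumps({'message': 'Hello, {}!'.format(name)}))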

Add the following Dockerfile:

FROM python:3.5-slim-stretch

RUN pip3 install jupyter_kernel_gateway

EXPOSE 5055

COPY ./jupyterkgdemo.ipynb /var/jupyterkgdemo.ipynb
CMD ["/usr/local/bin/jupyter","kernelgateway","--ip=0.0.0.0","--KernelGatewayApp.api='kernel_gateway.notebook_http'","--KernelGatewayApp.seed_uri='/var/jupyterkgdemo.ipynb'", "--port=5055"]

With Docker installed locally:

  • on the command line, cd into the directory and then build the Docker container with the command docker build -t myjupyterapidemo . (the dot / . is important – it says “the path to the Dockerfile is the current directory”)
  • run the container with: docker run -d -p 8089:5055 myjupyterapidemo and you should see a message on port 8089; go to localhost:8089/hello/whoever and you should get a message personalised to whoever.

Note that this doesn’t seem to run on Zeit Now; another port seems to be detected that I think breaks things?

Vagrant Demo

Vagrant is a tool that helps automate the deployment of virtual machines using Virtualbox. In some senses, it’s overkill; in others, it means we don’t have to provide instructions about setting up various Virtualbox bits and bobs…

Install vagrant and then add the following files to a vagrantdemo folder:

First, the simple flask app demo code we used before:

flaskdemo.py

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5005)

We can run this as a Linux service inside the VM – which means we need to provide a service definition file:

flaskserver.service

[Unit]
Description=Demo Flask Server

After=network.target

[Service]

Type=simple
PIDFile=/run/flaskdemo.pid

WorkingDirectory=/var/flaskdemo
ExecStart=/usr/bin/python3 flaskdemo.py

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

To install the packages we need, and copy the local files into the correct locations inside the VM, we can define a config shell script:

flaskserver.sh

#!/usr/bin/env bash

apt-get update && apt-get install -y python3 python3-dev python3-pip && apt-get clean

pip3 install flask

THISDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

mkdir -p /var/flaskdemo
cp $THISDIR/flaskdemo.py /var/flaskdemo/
cp $THISDIR/flaskserver.service /lib/systemd/system/flaskserver.service

# Enable autostart
systemctl enable flaskserver.service

# Refresh service config
systemctl daemon-reload

systemctl restart flaskserver

Now we need a Vagrantfile to marshal the VM:

Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-16.04"

  config.vm.network :forwarded_port, guest: 5005, host: 8077, auto_correct: true
  config.vm.synced_folder ".", "/vagrant"

  config.vm.provision :shell, :inline => <<-SH
    cd /vagrant/
    source ./flaskserver.sh
  SH
end

On the command line, cd into the folder and run vagrant up. You should be able to see the hello world on localhost:8077.

Note that as well as using vagrant to provision a VM using Virtualbox, other provisioners are available that would let me automatically fire up a server on a remote host (such as Digital Ocean using the Digital Ocean provisioner) and then run the same set-up script to build the machine there.

PS an example of running a crappy shiny demo (a file uploader) can be found via here. I'll add a demo to this post at some point, though not sure when… In testing, the R/shiny server image is too big to run under the Zeit Now free plan (The built image size (452.8M) exceeds the 100MiB limit).

Publish Static Websites, Docker Containers or Node.js Apps Just by Typing: now

A few days ago, in the latest episode of the Reclaim Today video show (any chance of an audio podcast feed too?) Jim Groom and Tim Owens chatted about Zeit Now [docs], a serverless hosting provider.

I’d half had an opportunity to get in on the call when the subject matter was mooted, but lackadaisicality on my part, plus the huge ‘sort my life out’ style “what did I / didn’t I say?” negative comedown I get after any audio/video recording, means I missed the chance and I’ll have to contribute this way…

First up, Tim already knows more about Zeit Now than I do.

Once you install the Zeit Now client on your desktop, you can just drag a folder onto the running app icon and it will be uploaded to the now service. If there’s a Dockerfile in the folder, a corresponding container will be built and an endpoint exposed. If there’s no Dockerfile, but there is a package.json, a node.js application will be created for you. And if there’s no Dockerfile and no package.json file, but there is an index.html page, you’ll find you’ve just published a static website.

As well as being able to drag the project’s containing folder onto the now icon, you can cd into the folder on the command line and just type: now. The files will be pushed and the container/node.js app/website created for you.

If you prefer, you can put the files into a Github repo, and connect the repo to the Zeit now service; whenever you make a commit to the repo, a webhook will trigger a rebuild from the repo of the service running on now.

This much (and more) I learned as a direct consequence of Reclaim Today and a quick read around, and it’s way more powerful than I thought. Building node apps is a real bugbear for me – node.js always seems to want to download the internet, and I can never figure out how to start an app (the docs on any particular app often assume you know what to type to get started, which I never can). Now all I need to do is type: now.

But… and there are buts: in the free plan, the resource limits are quite significant. There’s a limit on the size of files you can upload, and in the Docker case there seems to be a limit on the image size or image build size (I couldn’t see that in the pricing docs, though the Logs limit looked to be the same as the limiting size I could have on a container?).

It looks like you can run as many services as you want (the number of allowed deployments is infinite, where I think a deployment equates to a service: static web app, node.js app, Docker container) although you can hit bandwidth or storage caps. Another thing to note is that in the free plan, the application source files are public.

If anyone would like to buy me a coffee towards a Zeit Now subscription, that would be very nice:-). The Premium plan comes in at about a coffee a week…

Prior to watching the above show, my engagement with Zeit Now had been through using datasette. In the show, Tim tried to install datasette by uploading the datasette folder, but that’s not how it’s designed to be used. Instead, datasette is a Python package that runs as a command-line app. The app can do several things:

  • launch a local http server providing an interactive HTML UI to a SQLite database;
  • build a local Docker container capable of running a local http server providing an interactive HTML UI to a SQLite database;
  • create a running online service providing an interactive HTML UI to a SQLite database on Zeit Now or Heroku.

So how I had used datasette with Zeit Now was through the following – install the datasette python package (once only):

pip install datasette

And then, with the Zeit Now app installed and running on my local computer:

datasette publish now mydb.sqlite

This is then responsible for pushing the necessary files to Zeit Now and displaying the URL to the running service. Note that whilst the application runs inside a docker container on Zeit Now, I don’t need Docker installed on my own computer.

It struck me last night that this is a really powerful pattern which complements a workflow pattern I use elsewhere. To wit, many Python and R packages exist that create HTML pages from templates for displaying interactive charts or maps. The folium package, for example, is a Python package that plays nicely with Jupyter notebooks and can create an embedded map in a notebook. What’s embedded is an HTML page that is generated from a template. The folium package can also add data to the HTML page from a Python programme, so I can load location markers or shapefile datasets into Python and then push them into the map without needing to know anything about how to write the HTML or Javascript needed to render the map. In the R world, things like the leaflet package do something similar.
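By way of a rough sketch of the folium pattern (the co-ordinates and output filename are arbitrary):

import folium

#Create a map centred on a particular latitude / longitude
m = folium.Map(location=[50.69, -1.29], zoom_start=11)

#Push data into the map from Python - here, a single location marker
folium.Marker([50.69, -1.29], popup='An example marker').add_to(m)

#The map displays inline in a notebook, or can be saved as a standalone HTML page
m.save('example_map.html')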

This pattern – of Python, or R, packages that can create HTML assets – is really useful: it means you can easily create HTML assets that can be used elsewhere from a simple line or two of Python (or R) code.

This is where I think the datasette publish pattern comes in: now we have a pattern whereby a package generates not only an HTML application (in fact, an HTML site) but also provides the means to run it, either locally, packaged as a container, or via an online service (Zeit Now or Heroku). It should be easy enough to pull out the publish aspects of the code and use the same approach to create a python package that can run a Scripted Form locally, or create a container capable of running it, or push a runnable Jupyter notebook scripted API to Zeit Now.

On a related point, the Reclaim Today video show mentioned R/Shiny. As Tim pointed out, R is the programming language, but Shiny is more of an HTML application development framework, written in R. It provides high level R functions for creating interactive HTML UI interfaces and components and binding them to R variables. When one of the running HTML form elements is updated, the value is passed to a corresponding R variable; other R functions can respond to the updated value and generate new outputs (charts, filtered datatables, etc.) which are then passed back to Shiny for display in the HTML application. As with things like folium, the high level R functions, in this case, are responsible for generating much of the HTML / Javascript automatically.

Thinks: a demo of the datasette publish model for a Shiny app could be quite handy?

A couple more things that occurred to me after watching the video…

Firstly, the build size limits that Zeit Now seems to enforce. Looking at the Dockerfile in the datasette repo, I notice it uses a staged / multi-stage build. That is, the first part of the container builds some packages that are then copied into a ‘restarted’ build. Building / compiling some libraries can require a lot of heavy lifting, with dependencies required for the build not being required in the final distribution. The multi-stage build can thus be used to create relatively lightweight images that contain custom built packages without having to tidy up any of the heavier packages that were installed simply to support the build process. If a container intended for Zeit Now breached resource requirements because of the build, that could block the build, even if the final container is quite light. (I’m not sure if it does work like this – just let me think this through as if it does…)

One alternative would be to reduce the multi-stage build to a single stage Dockerfile, replacing the second stage FROM with a set of housekeeping routines to clear out the build dependencies (this is what multi-stage presumably replaces?), but this may still hit resource limits in the build stage. A second approach would be to split the multi-stage build into a first stage build that creates, and tidies up, a base container that can be imported directly into a standalone second stage Dockerfile. This is fine if you can create and push your own base container for the second stage Dockerfile to pull on. But if you don’t have Docker installed, that could be difficult. However, Docker Hub has a facility for building containers from github repos (Docker hub automated builds) in much the same way Zeit Now does. So I’m wondering – is there a now-like facility for pushing a build directory to Dockerhub and letting Dockerhub build it without resource limitation in the build step, and then let Zeit Now pull on the finally built, cleaned up image? Or is the fallback to build the base (first stage) container on Docker hub from a Github repo? (Again, there is the downside that the build files will be public.)
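For reference, the multi-stage pattern looks something like the following minimal sketch (this is not the actual datasette Dockerfile, and the package being built is hypothetical):

# Stage 1: install build tooling and compile/install the heavy dependencies
FROM python:3.6-slim-stretch AS builder
RUN apt-get update && apt-get install -y build-essential && \
    pip3 install --no-cache-dir some-heavy-package

# Stage 2: start again from a clean base image and copy across only the built
# packages, leaving the build tooling behind
FROM python:3.6-slim-stretch
COPY --from=builder /usr/local/lib/python3.6/site-packages /usr/local/lib/python3.6/site-packages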

The second point to mention is one that relates to next-generation hosting (consider Reclaim’s current hosting offering, where users can run prepackaged applications from CPanel, as first generation; as far as publishing running containers goes, the current generation model might be thought of as using something like Digital Ocean to first launch a server, and then running a container image on that server).

The Zeit Now model is essentially a serverless offering: end users can push applications to a server instance that is automatically created on their behalf. If I want to run a container in the cloud using docker from my desktop, the first thing I typically need to do is launch a server. Then I push the image to it. The Zeit Now model removes that part of the equation – I can assume the existence of the server, and not need to know anything about how to set it up. I can treat it simply as a Docker container hosting service.

The Zeit Now desktop tools make it really easy to use from the desktop, and the code is available too, but the Github linked deployment is also really powerful. So it’s perhaps worth trying to pick apart what steps are involved, and what alternative approaches are out there, if Reclaim was to pick up on this next generation approach.

So let’s find some entrails… It seems to me that the Zeit Now model can be carved up into several pieces:

  • source files for your website/application need to be somewhere (locally, or in Github)
  • if you’re running an application (node.js, Docker) then the environment needs setting up or building – so you need build files
  • before running an application, you need to build the environment somewhere. For Docker applications, you might want to build a base container somewhere else, and then just pull it directly into the Zeit Now environment.

To deploy the application, you need to fire up an environment and then add in the source files; the server is assumed.

So what’s needed to offer a rival service is something that can:

  • create server instances on demand;
  • create a custom environment on a server;
  • allow source files to be added to the environment;
  • run the application.

An example of another service that supports this sort of behaviour is Binderhub, an offshoot of the Jupyter project. Binderhub supports the build and on-demand deployment of custom environments / applications built according to the contents of a Github repository. Here’s an example of the TM112 environment running on MyBinder, and here’s the repo. An assumption is made that a Jupyter environment is run, but I’m guessing the machinery allows that assumption to be relaxed? Binderhub manages server deployment (so it supports a next generation hosting serverless model) as well as application build and deployment.

Supporting tooling includes repo2docker, which has some of the feel of datasette‘s local container build model in that it can build a docker container locally, from a local directory, if docker is installed, but will also build a container from a Github repository. A huge downside is that a local Docker install is required. A --push switch allows the locally built container to be pushed to Docker hub automatically. (The push is managed using the docker desktop application, which requires the installation of docker – which may be tricky for some users. What would be handy would be a standalone dockerhub CLI which would support remote, automated builds from a source directory pushed to Docker hub, as per Zeit Now and as already mentioned above.) Hmmm… it seems like someone has been working on a Docker hub CLI, but it looks like it’s just file listing operations that are supported; the remote build may not be possible from the Docker hub side unless we could find a way of co-opting how the automated builds from github pulls work?

One of the things I am trying to lobby for in the OU is a local Binderhub service (it would also be nice if there was a federated Binderhub service where separate organisations could volunteer compute resource accessed from a single URL…) One thing that strikes me is that it would be nice to see a localrepo2binder service that could push a local directory to a Binderhub instance, rather than require Binderhub to pull from a github repo. This would then mimic Zeit Now functionality…

PS this looks handy – exoframe (about) – like a self-hosted Zeit Now. As with many nodejs apps I’ve tried to try, I can’t seem to install an appropriate nodejs version properly enough to run the (client) app :-(

Quick Geo Queries Using Datasette – Fact Checking a Conversation About Local Building Works on the Isle of Wight

A few weeks ago, chatting with neighbours, comment was made about some earthworks appearing along the roadside a couple of miles away, out towards an area that has been licensed for fracking exploration. Concerns were raised as to whether this new earth embankment might be associated with that.

It is possible to run queries over historical planning applications on the Isle of Wight council website but the results are provided in tabular form:

However, I’ve also been running a scraper over Isle of Wight planning applications for some time, with the data stored in a simple SQLite database. A couple of the columns give latitude and longitude information associated with each application (oftentimes, quite crudely…), and multiple other columns are also available that can be queried over:

It only took a single command line command to fire up a datasette server that meant I could look for planning applications made in recent times in the area:

datasette data.sqlite

A simple query turns up applications in the appropriate parish – the map is automatically created by the datasette server when latitude and longitude columns are returned from a query:

Zooming in gives a possible candidate for the application behind the recent works – a better choice of selected columns would give a more useful tooltip:

A slightly refined query turns up the application in more detail:

And from there it’s easy enough to go to the application  – the one with ID 32578:

Handy…

See also this previous encounter with IW planning applications: All I Did Was Take the Dog Out For a Walk….

PS See also: Running Spatial Queries on Food Standards Agency Data Using Ordnance Survey Shapefiles in SpatiaLite via Datasette.

PPS Here’s a handy side effect of running the datasette command from inside a Jupyter notebook – a copy of the query log.


PPPS the scraper actually feeds a WordPress plugin I started trying to develop that displays currently open applications in a standing test blog page. I really should tidy that plugin code up and blog it one day…

Ad Cookie Opt-Outs At the Ad Industry Level

I’ve recently started clicking through on browser cookie alerts to cookie management pages that allow you to opt-out of tracking cookies, and noticed this at the bottom of the tumblr privacy page:

The industry links mentioned are to:

This made me wonder: when I click to disallow cookies from a particular third party on a particular website, does that withdrawal of permission only apply to that third party setting cookies in my browser on that particular website – in which case I have to try to make sure I set permissions against particular third parties on every website I visit, even if only occasionally, if I want to stop them tracking me?

Taking the cynical view (and not reading Ts and Cs that carefully…) I’m guessing that’s how it works. But presumably by opting out at the ad industry professional association level, I can prevent specified third parties from setting cookies on whatever website I visit (cf. telephone preference service)?

Anyway, here’s what happens on clicking through the above links. Note that by taking this option, you presumably allow all the linked adtech companies to profile your browser, grab your IP address and then set a cookie to say “no”…

First, Your Online Choices from the EDAA:

Next up, the DAA. First they run a check:

and then give you some blurb:

Then you can try to opt-out; not all opt-out requests seem to work though – I’m not sure where that leaves you?

One thing that would be handy would be a filter to show which sites are not responding to the opt-out:

Finally, the NAI:

Again, the opt-outs don’t necessarily all work:

Without reading any of the blurb, I assume you have to run this sort of opt-out on each of your browsers and/or devices?

But what happens if you opt out in the above cases, then unwittingly don’t disable an ad in the cookie control panel for each particular website? Or should those control panels respect / reflect the higher level opt out? (This probably calls for some proper experimentation…)

However, the big ad-tech players at least know how to do cross-browser and cross-device reconciliation, for example Google Ads: About cross-device attribution or Facebook: About cross-device reporting, and picking just one of the ad-tech companies from the various opt-out lists, it seems like other players do too: Intent Media: Cross Browser User Bridging with Dynamodb.

So what the ad industry single-page opt-outs should really do is find a way to enforce a single opt-out that works across websites, across browsers, and across devices… Do the current services support that? (You can tell I’m not a journalist – checking that would make this more of a story… ;-)

See also: Cookie Acceptance Notices and It’d Be a Shame if You Didn’t Receive the Full Benefits of this Website….

Scraping ASP Web Pages

For a couple of years now, I’ve been using a Python based web scraper that runs once a day on morph.io to scrape planning applications from the Isle of Wight website into a simple SQLite database. (It actually feeds a WordPress plugin I started tinkering with to display currently open applications in a standing test blog post. I really should tidy that extension up and blog it one day…)

In many cases you can get a copy of the HTML content of the page you want to scrape simply by making an http GET request to the page. Some pages, however, display content on a particular URL as a result of making a form request on a particular page that makes an http POST request to the same URL, and then gets content back dependent on the POSTed form variables.

In some cases, such as the Isle of Wight Council planning applications page, the form post is masked as a link that fires off a Javascript request that posts form content in order to obtain a set of query results:

The Javascript function draws on state baked into the page to make the form request. This state is required in order to get a valid response – and the list of current applications:

We can automate the grabbing of this state as part of our scraper by loading the page, grabbing the state data, mimicking the form content and making the POST request that would otherwise be triggered by the Javascript function run as a result of clicking the “get all applications” link:

import requests
from bs4 import BeautifulSoup

#Get the original page
#(url is the address of the planning applications page)
response = requests.get(url)

#Scrape the state data we need to validate the form request
soup=BeautifulSoup(response.content)
viewstate = soup.find('input' , id ='__VIEWSTATE')['value']
eventvalidation=soup.find('input' , id ='__EVENTVALIDATION')['value']
viewstategenerator=soup.find('input' , id ='__VIEWSTATEGENERATOR')['value']
params={'__EVENTTARGET':'lnkShowAll','__EVENTARGUMENT':'','__VIEWSTATE':viewstate,
        '__VIEWSTATEGENERATOR':viewstategenerator,
        '__EVENTVALIDATION':eventvalidation,'q':'Search the site...'}

#Use the validation data when making the request for all current applications
r = requests.post(url, data=params)

In the last couple of weeks, I’ve noticed daily errors from morph.io trying to run this scraper. Sometimes errors come and go, perhaps as a result of the server on the other end being slow to respond, or maybe even as an edge case in scraped data causing an error in the scraper, but the error seemed to persist, so I revisited the scraper.

Running the scraper script locally, it seemed that my form request wasn’t returning the list of applications – it was just returning the original planning application page. So why had my script stopped working?

Scanning the planning applications page HTML, all looked to be much as it was before, so I clicked through on the all applications link and looked at the data now being posted to the server by the official page using my Chrome browser’s Developer Tools (which can be found from the browser View menu):

Inspecting the form data, everything looked much as it had done before, perhaps except for the blank txt* arguments:

Adding those in to the form didn’t fix the problem, so I wondered if the page was now responding to cookies, or was perhaps sensitive to the user agent?

We can handle that easily enough in the scraper script:

#Use a requests session rather than making simple requests - this should allow the setting and preservation of cookies
# (Running in do not track mode can help limit cookies that are set to essential ones)
session = requests.Session()

#We can also add a user agent string so the scraper script looks like a real browser...
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
session.headers.update(headers)

response = session.get(url)
soup=BeautifulSoup(response.content)

viewstate = soup.find('input' , id ='__VIEWSTATE')['value']
eventvalidation=soup.find('input' , id ='__EVENTVALIDATION')['value']
viewstategenerator=soup.find('input' , id ='__VIEWSTATEGENERATOR')['value']
params={'__EVENTTARGET':'lnkShowAll','__EVENTARGUMENT':'','__VIEWSTATE':viewstate,
        '__VIEWSTATEGENERATOR':viewstategenerator,
        '__EVENTVALIDATION':eventvalidation,'q':'Search the site...'}

#Get all current applications using the same session
r=session.post(url,headers=headers,data=params)

But still no joy… so what headers were being used in the actual request on the live website?

Hmmm… maybe the server is now checking that requests are being made from the planning application webpage using the host, origin and/or referrer attributes? That is, maybe the server is only responding to requests it thinks are being made from its own web pages on its own site?

Let’s add some similar data to the headers in our scripted request:

session = requests.Session()
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
session.headers.update(headers)

response = session.get(url)
soup=BeautifulSoup(response.content)

viewstate = soup.find('input' , id ='__VIEWSTATE')['value']
eventvalidation=soup.find('input' , id ='__EVENTVALIDATION')['value']
viewstategenerator=soup.find('input' , id ='__VIEWSTATEGENERATOR')['value']
params={'__EVENTTARGET':'lnkShowAll','__EVENTARGUMENT':'','__VIEWSTATE':viewstate,
        '__VIEWSTATEGENERATOR':viewstategenerator,
        '__EVENTVALIDATION':eventvalidation,'q':'Search the site...'}

#Add in some more header data...
#Populate the referrer from the original request URL
headers['Referer'] = response.request.url
#We could (should) extract this info by parsing the Referer; hard code for now...
headers['Origin']= 'https://www.iow.gov.uk'
headers['Host']= 'www.iow.gov.uk'

#Get all current applications
r=session.post(url,headers=headers,data=params)

And… success:-)

*Hopefully by posting this recipe, the page isn’t locked down further… In mitigation, I haven’t described how to pull the actual planning applications data off the page…

It’d Be a Shame if You Didn’t Receive the Full Benefits of this Website…

…as in: “nice place you have here, it’d be a shame if anything happened to it”, or just “Sorry, you’re not welcome here, unless you give me a …”.

How many times have you seen such a notice, or been asked when buying a big ticket item in a shop for your: name, postcode, address, date of birth, …?

As James Bridle observes in New Dark Age: Technology and the End of the Future, p. 247:

[I]n most of our interactions with power, data is not something that is freely given but forcibly extracted — or impelled in moments of panic, like a stressed cuttlefish attempting to cloak itself from a predator.