Tagged: ipynb

Authoring Dynamic Documents in IPython / Jupyter Notebooks?

One of the reasons I started writing the Wrangling F1 Data With R book was to see how it felt writing combined text, code and code output materials in the RStudio/RMarkdown context. For those of you that haven’t tried it, RMarkdown lets you insert executable code elements inside a markdown document, either as code blocks or inline. The knitr library can then execture the code and display the code output (which includes tables and charts) and pandoc transforms the output to a desired output document format (such as HTML, or PDF, for example). And all this at the click of a single button.

In IPython (now Jupyter) notebooks, I think we can start to achieve a similar effect using a combination of extensions. For example:

  • python-markdown allows you to embed (and execute) python code inline within a markdown cell by enclosing it in double braces (For example, I could say “{{ print(‘hello world’}}”);
  • hide_input_all is an extension that will hide code cells in a document and just display their executed output; it would be easy enough to tweak this extension to allow a user to select which cells to show and hide, capturing that cell information as cell metadata;
  • Readonly allows you to “lock” a cell so that it cannot be edited; using a notebook server that implements this extension means you can start to protect against accidental changes being made to a cell by mistake within a particular workflow; in a journalistic context, assigning a quote to a python variable, locking that code cell, and then referencing that quote/variable in a python-markdown might be one of working, for example.
  • Printview-button will call nbconvert to generate an HTML version of the current notebook – however, I suspect this does not respect the extension based customisations that operate on cell metadata. To do that, I guess we need to generate our outptut via an nbconvert custom template? (The Download As... notebook option doesn’t seem to save the current HTML view of a notebook either?)

So – my reading is: tools are there to support the editing side (inline code, marking cells to be hidden etc) of dynamic document generation, but not necessarily the rendering to hard copy side, which need to be done via nbconvert extensions?

Related: Seven Ways of Running IPython Notebooks

Seven Ways of Running IPython Notebooks

We’re looking at using IPython notebooks for a MOOC on something or other, so here’s a quick review of the different ways I think we can provide access to them. Please let me know via the comments if there are other models…

User Desktop – Native App

Download a version of a scientific python distribution such as Anaconda (python 2.7 & 3) or Enthought Canopy (python 2.7) and run the notebook from within that. Runs cross platform, requires user admin privileges to install application.

In the context of a MOOC, this approach would require participants to download and install the python distribution on a desktop or laptop computer. This approach will not work on a tablet.
Delivery/use costs: none.
Publisher demands: notebook development.
Maintenance: dependence on distribution provider.
Support issues: installation of 3rd party software.
Custom branding/styling/extensions: can be installed after running a config script.

Browser Extension

The CoLaboratory (about) Chrome extension allows you to run IPython notebooks within the Chrome browser without the need to install any other software. Notebooks are saved to/opened from Google Drive. Python 2.7(?). Requires Google Chrome (cross-platform), Google Account (for Google Drive integration). This approach does not support the installation of arbitrary third party python libraries – only libraries compiled into the extension will work.

In the context of a MOOC, this approach would require participants to download and install Chrome and the CoLaboratory extension. This approach will not work on a tablet.
Delivery/use costs: none.
Publisher demands: notebook development.
Maintenance: dependence on extension publisher; (code available to fork).
Support issues: installation of 3rd party software.
Custom branding/styling/extensions: not supported(?)

iOS App – NO LONGOER AVAILABLE

As of March 2015 – no longer available from appstore
An iOS app, Computable (blog) makes notebooks available on an iPad. The app is free to preview notebooks bundled with the app, but requires a $10 in-app purchase to run your own notebooks. Integrated with Dropbox. Source code not available. Other reports of demos – but again, no code – available, such as: IPython notebook on iPhone. I don’t know if this approach supports the installation of arbitrary third party python libraries; that is, I don’t know if only libraries compiled into the application will work.

In the context of a MOOC, this approach would require participants to download and install the app and then pay the $10 fee. A Dropbox account is also required(?). This approach will only work on an iOS device.
Delivery/use costs: $10 to student.
Publisher demands: notebook development.
Maintenance: dependence on app publisher.
Support issues: installation of 3rd party software.
Custom branding/styling/extensions: not supported(?)

User Desktop – Virtual Machine

Run an IPython server within a virtual machine on the user’s desktop, exposing the notebook via a browser. Requires a virtual machine runner (eg VirtualBox, VMWare), port forwarding; some mechanism for starting up the VM and auto-running the notebook server. Devops tools may be used to deploy VMs either locally or in the cloud (eg JiffyLab (about Jiffylab); the OU course TM351 (in production) is currently exploring the use of VMs managed using vagrant to deliver IPython notebooks along with a range of other services).

Pre-defined IPython notebook VM examples: docker: ipython/notebook; Chef: IPython notebook cookbook.

In the context of a MOOC, this approach would require participants to download and install a virtual machine runner and the virtual machine image. This approach will not work on a tablet.
Delivery/use costs: none.
Publisher demands: development of VM image.
Maintenance: tracking VM runner compatability.
Support issues: installation of 3rd party software; installation of VM; port forwarding conflicts, locating the notebook server address.
Custom branding/styling/extensions: yes.

In the Cloud – Managed Services

Open a notebook running on a managed IPython notebook server such as Wakari or Authorea. Free plans available, requires personal account on managed service, internet connection.

In the context of a MOOC, this approach would require participants to have access to a network connection and create an account on a provider service. This approach will work on any device.
Deliver/use costs: Notebooks created under free plans will be public.
Custom branding/styling/extensions: no – branding may be associated with provider; provider identified extensions may be offered as part of hosted service.

In the Cloud – Ad Hoc tmp Notebooks

Nature recently did a splash on IPython notebooks (Interactive notebooks: Sharing the code) that included a demo notebook for readers to experiment with: IPython interactive demo.

The system uses a ‘temporary notebook’ server described here: Instant Temporary IPython Notebooks (code: Jupyter tmpnb). The sever spawns temporary notebooks that can be used to run particular activities. Other example use cases include the provision of notebooks at conferences/workshops for demo purposes, hour of code demos etc.

In the context of a MOOC, this approach would require participants to have access to a network connection. Notebooks will be temporary and cannot be persisted. This approach would be ideal for one off activities. This approach will work on any device. My assumption is that we would not be able direct participants to the current 3rd party provider of this service without prior agreement.
Delivery/use costs: server provision for running notebooks.
Publisher demands: management of cloud services if self-hosted.
Maintenance: tracking VM runner compatability.
Support issues:
Custom branding/styling/extensions: no.

In the Cloud – Hosted VMs

Another virtual machine approach, but participants fire up a prebuilt virtual machine in the cloud. Examples include: Yhat Sciencebox. Another example is the notebookcloud, a semi-managed service built around Google authentication and user’s personal AWS credentials; ((forked) legacy code). (From the docs: NotebookCloud is service that allows you to launch and control IPython Notebook servers on Amazon EC2 from your browser. This enables you to host your own Python programming environment, on your own Amazon virtual machine, and access it from any modern web browser.)

In the context of a MOOC, this approach would require participants to have access to a network connection and create an account on a provider service. This approach will work on any device.
Delivery/use costs: participant pays machine, storage and bandwidth costs according to usage.
Publisher demands: management of cloud services if self-hosted. Alternatively, we could host a version of NotebookCloud.
Maintenance: potential reliance on 3rd party VM configuration.
Support issues: account creation, VM management.
Custom branding/styling/extensions: no – branding may be associated with provider; provider identified extensions may be offered as part of hosted service.

Hosted MultiUser Systems

IPython and IPython Notebooks are under active development and it looks as if future releases will support multi-user services, for example JupyterHub: A multi-user server for Jupyter notebooks.

Summary

In order to make interactive use of IPython notebooks within a course, we need to make notebooks available, and provide a way of running them. A wide variety of models that support the running of notebooks exists. We can distinguish: where does the notebook server run; where does it load pre-existing notebooks from; where does it save notebooks to.

Running a notebook server incurs a resource cost in terms of installation, maintenance, (remote) access (eg managing multiple instances, port availability etc when offering a hosted service), any financial costs associated with running the service. Who covers the costs and meets any load issues depends on the solution adopted.

As far as usage goes, notebooks are accessed via a web browser and as such are accessible on any device. However, if a server runs locally rather than on a remote host, there are device dependencies to contend with that rule out some solutions on some platforms (VM can’t run on iOS; iOS app won’t run on Windows machine etc).

In an online course environment, we may be able separate concerns and suggest a variety of ways of running notebooks to participants, leaving it up to each participant to find a way of running the notebooks that works for them. Duty of care might then extend only insofar as making notebooks available that will run on all the platforms we have recommended.

In terms of pedagogy, we might distinguish between notebooks as used:

  1. to run essentially standalone exercises, and which we might treat as disposable (a model which the tmpnb solution fits beautifully because no persistence of state is required); this exercises might be pre-prepared notebooks made available to participants, or user created notebooks that users create as a scratch pad to run through a particular set of activities described elsewhere;
  2. for activities where the participant may want to save a work in progress and return to it at a later date (which requires persistence of state on a per user basis); this might be in the context of a notebook we provide to the students, or one they have created and are working on themselves;
  3. for activities where participants create their own notebooks and wish to preserve them.

From the OU perspective, we should probably take a view on the extent to which we develop solutions that work across one or more contexts, such as MOOCs delivered ex- of OU provisioned services (eg FutureLean); MOOCs delivered via an OU context (eg OpenLearn); courses delivered to OU fee paying students via OU systems.

There may be opportunities to develop solutions that work to support the delivery of OU courses, as well as OU MOOCs, and that are further licensable to other institutions, eg to support their own course delivery or MOOC delivery, or appropriate for release as open software in support of the wider community.

Edtech and IPython Notebooks – Activities and Answer Reveals

A few months ago I posted about an interaction style that I’d been using – and that stopped working – in IPython notebooks: answer reveals.

An issue I raised on the relevant git account account turned up a solution that I’ve finally gotten round to trying out – and extending with a little bit of styling. I’ve also reused chunks from another extension (read only code cells) to help style other sorts of cell.

Before showing where I’m at with the notebooks, here’s where OU online course materials are at the moment.

Teaching text is delivered via the VLE (have a look on OpenLearn for lots of examples). Activities are distinguished from “reading” text by use of a coloured background.

m269_closedAns

The activity requires a student to do something, and then a hidden discussion or anser can be revealed so that the student can check their answer.

m269_ansReveal

(It’s easy to be lazy, of course, and just click the button without really participating in the activity. In print materials, a frictional overhead was added by having answers in the back of the study guide that you would have to turn to. I often wonder whether we need a bit more friction in the browser based material, perhaps a time based one where the button can’t be clicked for x seconds after the button is first seen in the viewport (eg triggered using a test like this jQuery isOnScreen plugin)?!)

I drew on this design style to support the following UI in an IPython notebook:

Here’s a markdown cell in activity style and activity answer style.

M269_-_Python_-_Blue1a

The lighter blue background set into the activity is an invitation for students to type something into those cells. The code cell is identified as such by the code line In [ ] label. Whatever students type into those cells can be executed.

The heading is styled from a div element:

M269_-_Python_-_Blue

If we wanted to a slightly different header background style as in the browser materials, we could perhaps select a notebook heading style and then colour the background differently right across the width of the page. (Bah.. should have thought of that earlier!;-)

Markdown cells can also be styled to prompt students to make a text response (i.e. a response written in markdown, though I guess we could also use raw text cells). I don’t yet have a feeling for how much ownership students will take of notebooks and start to treat them as workbooks?

Answer reveal buttons can also be added in:

M269_-_Python_-_Blue2

Clicking on the Answer button displays the answer.

M269_-_Python_-_Blue3

At the moment, the answer background is the same colour as the invitation for student’s to type something, although I guess we could also interpret as part of the tutor-alongside dialogue, and the lighter signifies expected dialogic responses whether from the student or the “tutor” (i.e. the notebook author).

We might also want to make use of answer buttons after a code completion activity. I haven’t figured out the best way to do this as of yet.

M269_-_Python_-_Blue4

At the moment, the answer button only reveals text – and even then the text needs to be styled as HTML (the markdown parsing doesn’t work:-(

M269_-_Python_-_Blue5

I guess one approach might be to spawn a new code cell containing some code written in to the answer button div. Another might be to populate a code cell following the answer button with the answer code, hiding the cell and disabling it (so it can’t be executed/run), then revealing it when the answer button is clicked? I’m also not sure whether answer code should be runnable or not?

The mechanic for setting the cell state is currently a little clunky. There are two extensions, one for the answer button, one for setting the state other cells, that use different techniques for addressing the cells (and that really need to be rationalised). The extensions run styling automatically when the notebook is started, or when triggered. At the moment, I’m using icons from the orgininal code I “borrowed” – which aren’t ideal!

M269_-_Python_-_Blue0

The cell state setter button toggles selected code cells from activity to not-activity states, and markdown cells from activity-description to activity-student-answer to not-activity. The answer button button adds an answer button at every answer div (even if there’s already an answer button rendered). Both extensions style/annotate restarted notebooks correctly.

The current, hacky, user model I have in mind is that authors have an extended notebook with buttons to set the cell styles, and students have an extended notebook without buttons that just styles the notebook when it’s opened.

FWIW, here’s the gist containing extensions code.

Comments/thoughts appreciated…

Pivot Tables, pandas and IPython Notebooks

For the last few months, I’ve found a home in IPython Notebooks for dabbling with data. IPython notebooks provide a flexible authoring tool for combining text with executable code fragments, as well as the outputs from executing code, such as charts, data tables or automatically generated text reports. We can also embed additional HTML5 content into a notebook, either inline or as an iframe.

With a little judicious use of templates, we can easily take data from a source we are working with in the notebook, and then render a view of it using included components. This makes it easy to use hybrid approaches to working with data in the notebook context. (Note: the use of cell magics also let’s us operate on a data set using different languages in the same notebook – for example, Python and R.)

As an example of a hybrid approach to exploratory data analysis, how about the following?

The data manipulation library I’ve spent most of my time in to date in the notebooks is pandas. pandas is a really powerful tool for wrangling tabular data shapes, including reshaping them and running group reports on them. Among the operations pandas supports are pivot tables. But writing the code can be fiddly, and sometimes you just want an interactive hands on play with the data. IPython Notebooks do support widgets (though I haven’t played with them yet), so I guess I could try to write a simple UI for running different pivot table views over dataset in an interactive fashion.

But if I’m happy with reading the numbers the pivot table table reports as an end product, and don’t need access to the report as data, I can use a third party interactive pivot table widget such as Nicolas Kutchen’s Pivot Table component to work with the data in an interactive fashion.

Pivot_Table_demo

I’ve popped a quick demo of a clunky hacky way of feeding a pivot table widget from a pandas dataframe here: pivot table in IPython Notebook demo. A pandas dataframe is written as an HTML table and embedded in a templated page that generates the pivot table from the the HTML table. This page is saved as an HTML file and then loaded in as an IFrame. (We could add the HTML to an iframe using srcdoc, rather than saving it as a file and loading it back in, but I thought it might be handy to have access to a copy of the file. Also, I’m not sure if all browsers support srcdoc?)

(Note: we could also use the pivot table widget with a subset of a large dataset to generate dummy reports to find the ones we want, and test pandas code against the same subset of data against that output to check the code is generating the table we want, and then deploy the code against the full dataset.)

The pivot table has the data in memory as a hidden HTML table in the pivot table page, so performance may be limited for large datasets. On my machine, it was quite happy working with a month’s spending/transparency data from my local council, and provided a quick way of generating different summary views over the data. (The df dataframe is simply the result of loading in the spending data CSV file as a pandas dataframe – no other processing required. So the pivotTable() function could easily be modified to accept the location of a CSV file, such as a local address or a URL, load the file in automatically into a dataframe, and then render it as a pivot table.)

Pivot_Table_demo2

There’s also limited functionality for tunneling down into the data by filtering it within the chart (reather than having to generate a filtered view of the data that is then baked in as a table to the chart HTML page, for example):

Pivot_Table_demo3

I’ve been dabbling with various other embedded charts too which I’ll post when I get a chance.

Anscombe’s Quartet – IPython Notebook

Anyone who’s seen one of my talks that even touches on data and visualisation will probably know how it like to use Anscombe’s Quartet as a demonstration of why it makes sense to look at data, as well as to illustrate the notion of a macroscope, albeit one applied to a case of N=all where all is small…

Some time ago I posted a small R demo – The Visual Difference – R and Anscombe’s Quartet. For the new OU course I’m working on (TM351 – “The Data Course”), our focus is on using IPython Notebooks. And as there’s a chunk in the course about dataviz, I feel more or less obliged to bring Anscombe’s Quartet in:-)

As we’re still finding our way about how to make use of IPython Notebooks as part of an online distance education course, I’m keen to collect feedback on some of the ways we’re considering using the notebooks.

The Anscombe’s Quartet notebook has quite a simple design – we’re essentially just using the cells as computed reveals – but I’m still be keen to hear any comments about how well folk think it might work as a piece of standalone teaching material, particularly in a distance education setting.

The notebook itself is on github (ou-tm351), along with sample data, and a preview of the unexecuted notebook can be viewed on nbviewer: Anscombe’s Quartet – IPython Notebook.

Just by the by, the notebook also demonstrates the use of pandas for reshaping the dataset (as well as linking out to a demonstration of how to reshape the data using OpenRefine) and the ŷhat ggplot python library (docs, code) for visualising the dataset.

Please feel free to post comments here or as issues on the github repo.

New Ed Tech Toys for TM351…

I did a thing earlier this week to the internal OU CALRG conference about some of my thinking ongoing at the moment around new edtech toys for “the data course”, TM351.

Annotated slides here: Imagining TM351: from virtual machines to notebooks.

Having presented it, the slides need reordering, a bit more emphasis needs to be placed on role human readable text can play in notebooks (h/t Alistair Willis for that observation), and I need to do quite a bit more thinking about the spreadsheet-notebook comparison.

Also to do are more thoughts on the “(non)linearity”/”serialisation” aspects of authoring, reading and executing/working through that I touched on in another talk, from last week: From storymaps to notebooks: do your computing one step at a time.

Tracking Changes in IPython Notebooks?

Managing the tracking suggested changes to the same set of docs, along with comments and observations, from multiple respondents in is one of the challenges any organisation who business is largely concerned with the production of documents has to face.

Passing shared/social living documents by reference rather than value, so that folk don’t have to share multiple physical copies of the same document, each annotated separately, is one way. Tools like track changes in word processor docs, wiki page histories, or git diffs, is another.

All documents have an underlying representation – web pages have HTML, word documents have whatever XML horrors lay under the hood, IPython notebooks have JSON.

Change tracking solutions like git show differences to the raw representation, as in this example of a couple of changes made to a (raw) IPython notebook:

Track changes in github

Notebooks can also be saved in non-executable form that includes previously generated cell outputs as HTML, but again a git view of the differences would reveal changes at the HTML code level, rather than the rendered HTML level. (Tracked changes also include ‘useful’ ones, such as changes to cell contents, and (at a WYSWYG level at least) irrelevant ‘administrative’ value changes such as changes to hash values recored in the notebook source JSON.

Tracking changes in a WYSIWYG display shows the changes at the rendered, WYSIWYG level, as for example this demo of a track changes CKEditor plugin demonstrates [docs]:

lite - ck editor track changes

However, the change management features are typically implemented through additional additional metadata/markup to the underlying representation:

lite changes src

For the course we’re working on at the moment, we’re making significant use of IPython notebooks, requiring comments/suggested changes from multiple reviewers over the same set of notebooks.

So I was wondering – what would it take to have an nbviewer style view in something like github that could render WYSIWYG track changes style views over a modified notebook in just cell contents and cell outputs?

This SO thread maybe touches on related issues: Using IPython notebooks under version control.

A similar principle would work the same for HTML too, of course. Hmm, thinks… are there any git previewers for HTML that log edits/diffs at the HTML level but then render those diffs at the WYSIWYG level in a traditional track changes style view?

Hmm… I wonder if a plugin for Atom.io might do this? (Anyone know if atom.io can also run as a service? Eg could I put it onto a VM and then axis it through localhost:ATOMIOPORT?)

PS also on the change management thing in IPython Notebooks, and again something that might make sense in a got context, is the management of ‘undo’ features in a cell.

IPython notebooks have a powerful cell-by-cell undo feature that works at least during a current session (if you shut down a notebook and then restart it, I assume the cell history is lost?). [Anyone know a good link describing/summarising the history/undo features of IPython Notebooks?]

I’m keen for students to take ownership of notebooks and try things out within them, but I’m also mindful that sometimes they make make repeated changes to a cell, lose the undo history for whatever reason, and then reset the cell to the “original” contents, for some definition of “original” (such as the version that was issued to the learner by the instructor, or the version the learner viewed at their first use of the notebook.)

A clunky solution is for students to duplicatea each notebook before they start to work on it so they have an original copy. But this is a bit clunky. I just want an option to reveal a “reset” button by each cell and then be able to reset it. Or perhaps in line with the other cell operations, reset either a specific highlight cell, reset all, cells, or reset all cells above or below a selected cell.