OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Posts Tagged ‘TM351

Edtech and IPython Notebooks – Activities and Answer Reveals

leave a comment »

A few months ago I posted about an interaction style that I’d been using – and that stopped working – in IPython notebooks: answer reveals.

An issue I raised on the relevant git account account turned up a solution that I’ve finally gotten round to trying out – and extending with a little bit of styling. I’ve also reused chunks from another extension (read only code cells) to help style other sorts of cell.

Before showing where I’m at with the notebooks, here’s where OU online course materials are at the moment.

Teaching text is delivered via the VLE (have a look on OpenLearn for lots of examples). Activities are distinguished from “reading” text by use of a coloured background.

m269_closedAns

The activity requires a student to do something, and then a hidden discussion or anser can be revealed so that the student can check their answer.

m269_ansReveal

(It’s easy to be lazy, of course, and just click the button without really participating in the activity. In print materials, a frictional overhead was added by having answers in the back of the study guide that you would have to turn to. I often wonder whether we need a bit more friction in the browser based material, perhaps a time based one where the button can’t be clicked for x seconds after the button is first seen in the viewport (eg triggered using a test like this jQuery isOnScreen plugin)?!)

I drew on this design style to support the following UI in an IPython notebook:

Here’s a markdown cell in activity style and activity answer style.

M269_-_Python_-_Blue1a

The lighter blue background set into the activity is an invitation for students to type something into those cells. The code cell is identified as such by the code line In [ ] label. Whatever students type into those cells can be executed.

The heading is styled from a div element:

M269_-_Python_-_Blue

If we wanted to a slightly different header background style as in the browser materials, we could perhaps select a notebook heading style and then colour the background differently right across the width of the page. (Bah.. should have thought of that earlier!;-)

Markdown cells can also be styled to prompt students to make a text response (i.e. a response written in markdown, though I guess we could also use raw text cells). I don’t yet have a feeling for how much ownership students will take of notebooks and start to treat them as workbooks?

Answer reveal buttons can also be added in:

M269_-_Python_-_Blue2

Clicking on the Answer button displays the answer.

M269_-_Python_-_Blue3

At the moment, the answer background is the same colour as the invitation for student’s to type something, although I guess we could also interpret as part of the tutor-alongside dialogue, and the lighter signifies expected dialogic responses whether from the student or the “tutor” (i.e. the notebook author).

We might also want to make use of answer buttons after a code completion activity. I haven’t figured out the best way to do this as of yet.

M269_-_Python_-_Blue4

At the moment, the answer button only reveals text – and even then the text needs to be styled as HTML (the markdown parsing doesn’t work:-(

M269_-_Python_-_Blue5

I guess one approach might be to spawn a new code cell containing some code written in to the answer button div. Another might be to populate a code cell following the answer button with the answer code, hiding the cell and disabling it (so it can’t be executed/run), then revealing it when the answer button is clicked? I’m also not sure whether answer code should be runnable or not?

The mechanic for setting the cell state is currently a little clunky. There are two extensions, one for the answer button, one for setting the state other cells, that use different techniques for addressing the cells (and that really need to be rationalised). The extensions run styling automatically when the notebook is started, or when triggered. At the moment, I’m using icons from the orgininal code I “borrowed” – which aren’t ideal!

M269_-_Python_-_Blue0

The cell state setter button toggles selected code cells from activity to not-activity states, and markdown cells from activity-description to activity-student-answer to not-activity. The answer button button adds an answer button at every answer div (even if there’s already an answer button rendered). Both extensions style/annotate restarted notebooks correctly.

The current, hacky, user model I have in mind is that authors have an extended notebook with buttons to set the cell styles, and students have an extended notebook without buttons that just styles the notebook when it’s opened.

FWIW, here’s the gist containing extensions code.

Comments/thoughts appreciated…

Written by Tony Hirst

October 8, 2014 at 6:51 pm

Posted in OU2.0, Tinkering

Tagged with ,

Assessing Data Wrangling Skills – Conversations With Data

with one comment

By chance, a couple of days ago I stumbled across a spreadsheet summarising awarded PFI contracts as of 2013 (private finance initiative projects, 2013 summary data).

The spreadsheet has 800 or so rows, and a multitude of columns, some of which are essentially grouped (although multi-level, hierarchical headings are not explicitly used) – columns relating to (estimated) spend in different tax years, for example, or the equity partners involved in a particular project.

As part of the OU course we’re currently developing on “data”, we’re exploring an assessment framework based on students running small data investigations, documented using IPython notebooks, in which we will expect students to demonstrate a range of technical and analytical skills, as well as some understanding of data structures and shapes, data management technology and policy choices, and so on. In support of this, I’ve been trying to put together one or two notebooks a week over the course of a few hours to see what sorts of thing might be possible, tractable and appropriate for inclusion in such a data investigation report within the time constraints allowed.

To this end, the Quick Look at UK PFI Contracts Data notebook demonstrates some of the ways in which we might learn something about the data contained within the PFI summary data spreadsheet. The first thing to note is that it’s not really a data investigation: there is no specific question I set out to ask, and no specific theme I set out to explore. It’s more exploratory (that is, rambling!) than that. It has more of the form what I’ve started referring to as a conversation with data.

In a conversation with data, we develop an understanding of what the data may have to say, along with a set of tools (that is, recipes, or even functions) that allow us to talk to it more easily. These tools and recipes are reusable within the context of the data set or datasets selected for the conversation, and may be transferrable to other data conversations. For example, one recipe might be how to filter and summarise a dataset in a particular way, or generate a particular view or reshaping of it that makes it easy to ask a particular sort of question, or generate a particular sort of a chart (creating a chart is like asking a question where the surface answer is returned in a graphical form).

If we are going to use IPython notebook documented conversations with data as part of an assessment process, we need to tighten up a little more how we might expect to see them structured and what we might expect to see contained within them.

Quickly skimming through the PFI conversation, elements of the following skills are demonstrated:

- downloading a dataset from a remote location;
- loading it into an appropriate data representation;
- cleaning (and type casting) the data;
- spotting possibly erroneous data;
- subsetting/filtering the data by row and column, including the use of partial string matching;
- generating multiple views over the data;
- reshaping the data;
- using grouping as the basis for the generation of summary data;
- sorting the data;
- joining different data views;
- generating graphical charts from derived data using “native” matplotlib charts and a separate charting library (ggplot);
- generating text based interpretations of the data (“data2text” or “data textualisation”).

The notebook also demonstrates some elements of reflection about how the data might be used. For example, it might be used in association with other datasets to help people keep track of the performance of facilities funded under PFI contracts.

Note: the notebook I link to above does not include any database management system elements. As such, it represents elements that might be appropriate for inclusion in a report from the first third of our data course – which covers basic principles of data wrangling using pandas as the dominant toolkit. Conversations held later in the course should also demonstrate how to get data into an out of an appropriately chosen database management system, for example.

Written by Tony Hirst

September 24, 2014 at 11:00 am

Posted in Anything you want, Infoskills

Tagged with

Anscombe’s Quartet – IPython Notebook

Anyone who’s seen one of my talks that even touches on data and visualisation will probably know how it like to use Anscombe’s Quartet as a demonstration of why it makes sense to look at data, as well as to illustrate the notion of a macroscope, albeit one applied to a case of N=all where all is small…

Some time ago I posted a small R demo – The Visual Difference – R and Anscombe’s Quartet. For the new OU course I’m working on (TM351 – “The Data Course”), our focus is on using IPython Notebooks. And as there’s a chunk in the course about dataviz, I feel more or less obliged to bring Anscombe’s Quartet in:-)

As we’re still finding our way about how to make use of IPython Notebooks as part of an online distance education course, I’m keen to collect feedback on some of the ways we’re considering using the notebooks.

The Anscombe’s Quartet notebook has quite a simple design – we’re essentially just using the cells as computed reveals – but I’m still be keen to hear any comments about how well folk think it might work as a piece of standalone teaching material, particularly in a distance education setting.

The notebook itself is on github (ou-tm351), along with sample data, and a preview of the unexecuted notebook can be viewed on nbviewer: Anscombe’s Quartet – IPython Notebook.

Just by the by, the notebook also demonstrates the use of pandas for reshaping the dataset (as well as linking out to a demonstration of how to reshape the data using OpenRefine) and the ŷhat ggplot python library (docs, code) for visualising the dataset.

Please feel free to post comments here or as issues on the github repo.

Written by Tony Hirst

June 30, 2014 at 1:54 pm

Posted in OU2.0

Tagged with , , ,

Confused Again About VM Ecology… I Blame Not Blogging

Via a cc’d tweet from Martin Hawksey, this lovely post from Tom Smith/@everythingabili (who has the best ever twitter bio strapline) on How I Learn ( And What I’m Learning ).

I like to think that I used to write blog posts that had the same sort of sense as that post…

…but for the last few months at least, I don’t think I have.

“Working” for once – starting production on an OU course (TM351, due out October 2015 (sic; I’m gonna be late on the first draft of the 7 weeks of the course I’m responsible for: it’s due in a little over a fortnight…), and also helping out on an investigative project the School of Data is partnering on – has meant that most of the learnings and failings that I used to blog about have been turned inward to emails (which are even more poorly structured in terms of note-taking than this blog is) if at all.

Which is a shame and makes me not happy.

Reading through completed academic papers, making useful (I hope) use of them in the course draft, has been quite fun in part – and made me regret at times not writing up work of my own in a self-contained, peer reviewed way over the last decade or so; getting stuff “into the record”, properly citable, and coherent enough to actually be cited. But as you pick away at the papers, you know they’re a revisionist telling, a straightforward narrative of how the pieces fit together and in which nothing went wrong along the way; (you also know that the closer you get to trying to replicate a paper, the harder it is to find the missing pieces (process, trick, equation or insight) that actually make it work; remember school maths, or physics, and the textbook that goes from one equation to the next with a “hence”, but there’s no way in hell you can figure out how to make that step and you know you’ll be stuck when that bit comes up in the exam…?! That. Or worse. When you bang your head against a wall trying to get something to work, contort your mental models to make it work, sort of, then see the errata list correcting that thing. That, too.)

On the other hand, this blog is not coherent, shapes no whole, but is full of hence steps. Disjointed ones, admittedly. But they keep a track of all the bits I tried at and failed at and worked around, and they keep on getting me out of holes… Like the emails won’t. Lost. Wasted effort because the learning fumblings that are OUseful learning fumblings are lost and locked up in email hell.

It makes me very not happy.

So that, by way of intro, to this: a quick catchup follow-up to Cursory Thoughts on Virtual Machines in Distance Education Courses and Doodling With IPython Notebooks for Education, a partial remembering of the various shades of hell associated with them and trying to share them.

Here’s what I think I now want to do (whether or not it’s the right thing I’m not sure).

  • generate a script that will build a VM. We’ve opted for Virtualbox as the VM container. The VM will need to contain: pandas; IPython notebook (course team want it to run Python 3.3. I’ve lost track of how many hours I’ve spent trying and failing to get Python libraries I think we need trying to run under Python 3.3; wasted effort; I should have settled with Python 2.7 and then revisited 3.3 in several months hence; the 2.7 3.3 tweaks to any code we write for the course should manageable in migration terms. Pratting around on libraries that I’m guessing will get patched as native distributions move to 3.3 by default but don’t work yet is wasted effort. October. 2015. First presentation.); PostgreSQL (perhaps with some extensions); mongodb; ipythonblocks; blockdiag; I came across shellinabox today and wonder if we should run that; OpenRefine (CT against this – I think it’s good for developing mental models); python-nvd3; folium; a ggplot port to python; (CT take – too much new stuff; my take, we should go as high up the stack as we can in terms of the grammar of the calling functions); I think we should run R and RStudio too to make for a really useful VM, making the point that the language doesn’t matter and we use whatever’s appropriate to get things done, but I don’t think anyone else does. if. Which computer language is that from then? for. Or that one? How about in? print? Cars are for driving. Mine’s blue. I have no idea how it works. Can I drive your car? The red one. With the left-hand drive.
  • access the services running on the headless VM via a browser on host. I think we should talk to the databases using Python, but I may be in the minority. We may need something more graphical to talk to postgresql. If we do, I would argue it should be a browser based client – if it’s an app, we’re moving functionality provision outside of the VM.
  • use the script to build to machines with the same package config; CT seem to prefer a single build on a 32 bit machine. I think we should support 64 bit as well. And deployment on at least one cloud service – I;d go for Amazon, but that’s mainly because it’s the only one I’ve tried. If we could support more, even better.
  • as far as maintenance goes, I wrote the vagrant script to update libraries whenever the provisioner is run (which is quite a lot at the mo as I keep finding new things to add to the box!;-) This may or may not be sensible for student use. If there is a bug in a library, an update could help. If there is a security patch to the kernel, we should be updating as good citizens. The current plan is to ship a built box (which I think would have to go on to a USB stick – we can’t rely on folk having computers with a DVD any more, and a 1.5GB download seems to be proving unreliable without a proper download manager. As it is, students will have to download virtualbox and vagrant, and install those themselves. (Unless we can put installers for them on a USB stick too.) If we do ship a built box, we need to think of some way of kickstarting the services and perhaps rebooting the machine (and then kickstarting the services). There is a separate question of whether we should be also be able to update config scripts during presentation. This would presumably have to be done on the host. One way might be to put config scripts on a git server then use git to keep the config scripts on the students’ host machine up to date, but that would probably also require them to install a git commandline tool, even if we automated the rest. Did I say this all has to run cross platform? Students may be on Windows (likely?), Mac or Linux. I think the course should be doable, if painfully, via a tablet, which means the VM needs the cloud hosted option. If we could also find a way to help students configure their whatever platform host so that they could access services from the VM running on it via their tablet, so much the better.
  • files need to be shared between VM and host. This raises an interesting issue for a cloud hosted VM. Either, we need to find a way to synch files between desktop and cloud VM, persist state on the cloud host so that the VM can synch to it, or pop dropbox into the cloud VM (though there would then be a synch delay, as there would with a desktop synch). I favour persisting on the cloud, though there is then the question of the student who is working on a home machine one day and a cloud machine the next.
  • Starting and stopping services: students need to be able to start and stop services running on the VM without having to ssh in. Once click easy. A dashboard with buttons that show if a service is running or not, click a button to toggle the run state of the the service. No idea how to do this.

Here’s the approach I’ve taken:

  • reusing DataminerUK’s infinite-interns model as a starting point, I’m using vagrant to build and provision a VM using puppet. At the moment I have duplicate setups for two different Linux base boxes (precise64 and raring64. The plan is to move to the latest Ubuntu LTS.) I really should have a single setup with the different machine setups called by name from a single Vagrantfile. I think.
  • The puppet provisioner builds the box from a minimal base and starts the services running. It’s aggressive on updates. The precise64 box is running python 2.7 and the raring64 box 3.3. Getting pip3 running in the raring box was a pain, and I don’t know how to tell puppet to use the pip3 thing I eventually installed to update. At the moment I fudge with:
    exec { "pip3-update":
    command => "/usr/local/bin/pip3 install --upgrade pip"
    }

    but it breaks because I’m not convinced that is always the right path (I’d like to hedge on /usr/bin:/usr/local/bin), or that pip3 is actually installed when I try to exec it… I think what I really want to do is something like
    package {
    [
    JUST UPGRADE YOURSELF, PLEASE
    ]: ensure => latest,
    provider => 'pip3';
    }

    with an additional dependency check (=>) that pip3 has been installed first, and from all the other pip3 installs that pip3 has been upgraded first.
  • The IPython notebook is started by a config shell script called from puppet. I think I’m also using a config script to set up a user and test tables in Postgres (though I plan to move to the puppet extension as soon as I can get round to reading the docs/finding out how to do it).
  • There are times when a server needs restarting. At the moment I have to run vagrant provision to do that – or even vagrant halt;vagrant up, which means it also runs the updates. It’d be nice to just be able to run the bits that restart the services (the DBMS’, IPython notebook etc) without doing any of the package updates, installs, checks etc.
  • We need a tool to check whether services are running on the necessary ports to help debugging without requiring a user to SSH into the VM; I’ve also fixed on default ports. We really need to change ports if a default port is being used to a free port and then somehow tell the IPython notebook scripts which port each service is running on. With vagrant letting you run a VM from within a particular directory, being able to know what VMs are being run and from where, wherever vagrant on host started them, would be useful.
  • I don’t use a configurator for the postgres db (it needs seeding with some example tables) but should do – on my to do list is to look at https://github.com/puppetlabs/puppetlabs-postgresql . Similarly for mongo db – and perhaps https://github.com/puppetlabs/puppetlabs-mongodb
  • To make use of python-nvd3, suggested route is to use bower. I got the npm package manager to work but have failed to find a way of installing any actual packages [issue].

Issues to date, aside from things like port clashes and all manner of f**k ups because I distributed a README with an error in it and folk tried to follow it rather than patches posted elsewhere, have been impeded by not having a good way of logging and sharing errors. OU network issues have also contributed to the fun. I always avoid the OU staff network, but nothing seems to work on that. I suspect this is a proxy issue, but I’m unwilling to invest any time exploring it or how to config the VM to cope (no-one else has offered to go down this rat hole). Poxy proxies could well be an issue for some students, but I’m not sure what the best way forward is. Cloud hosted VMs?!

I also had a problem on OU eduroam – mongodb wants to get something signed from a keyserver before it will install mongodb, but OU eduroam (the network I always use) seems to block the keyserver. Tracking that down wasted a good hour.

Here are some other things I’ve heard about:

- https://github.com/psychemedia/notebookcloud This is cloned from https://github.com/carlsmith who appears to have taken his repo – and the app – down? It provided a dashboard for firing up notebook servers on Amazon cloud. If I hadn’t been ‘working’ I’d have blogged screenshots and the workflow. As it is, all I have are vague memories of how it worked and what it did and the ideas that sprung off of having an artefact to talk around. [Hmm... app seems to have come back up - maybe I caught it at a bad time... https://notebookcloud.appspot.com/login ]

- provisioning things: chef, vagrant, puppet, docker.

What should I be using for what?

I thought about different VMs for different services, but that adds too much VM weight and requires networking between the VMs, we could lead to “support issues”. Would docker help here? A base VM built from vagrant and puppet, then docker to put additional machines on top.

What I want is students to be able to:

- install minimum number of third party apps on their desktop (currently virtualbox and vagrant)
- click one button get their VM. My feeling is we should have a largely prebuilt box on a USB stick they can use as a base box, then do a top up build and provision. I suspect CT would like one click somewhere to fire up a machine, get services running, and open a tab to the IPython notebook in their browser, maybe a status button somewhere, a single button to restart any services that have stopped and options to suspend or shutdown machines. In terms of student workflow, I think suspending and resuming machines (if services can resume appropriately) would be the neatest workflow. Note: the course runs over 9 months…
- be able to access files on host that are used in the VM. If they are using multiple VMs (eg on cloud and desktop) to find a sensible way of synching notebooks and data/database state across those machines; which could be tricky at least as far as database state goes.
- if a student is not using postgresql or mongo – and they won’t for the first 8 weeks of the course – it could be useful to run the VM without those services running (perhaps aside from a quick self-test in week 1 so we can check out any issues as early as possible and give ourselves a week or two to come up with any patches before those apps are used in anger). So maybe a control panel to fire up the VM and the services you want to run. Yes mongo, no postgresql. No DB at all. And so on. Would docker help here? Decide on host somehow which services to run, fire up the VM, then load in and start up the required services. Next session, change which services are running in the VM?

All in all, I reckon I’m between 20 and 40% there (further along in terms of time?) but not sure how much further I can push it to the desired level of robustness without learning how to do this stuff a bit more properly… I’m also not really sure what’s practically and reliably possible and what’s best for what. We have to maximise the robustness of stuff ‘just working’ and minimise support issues because students are at a distance and we don’t know what platform they’re on. I think if I started from scratch and rewrote the scripts now they’d be a lot clearer, but I know that’d take half a day and the functional return – for now – I think would be minimal.

That said, I’ve done a fair amount of learning, though large chunks of it have been lost to email and not blogging. Which is a shame. Must do better. Must take public notes.

Written by Tony Hirst

May 15, 2014 at 11:37 pm

Posted in Anything you want

Tagged with

Visualising Pandas DataFrames With IPythonBlocks – Proof of Concept

A few weeks ago I came across IPythonBlocks, a Python library developed to support the teaching of Python programming. The library provides an HTML grid that can be manipulated using simple programming constructs, presenting the outcome of the operations in a visually meaningful way.

As part of a new third level OU course we’re putting together on databases and data wrangling, I’ve been getting to grips with the python pandas library. This library provides a dataframe based framework for data analysis and data-styled programming that bears a significant resemblance to R’s notion of dataframes and vectorised computing. pandas also provides a range of dataframe based operations that resemble SQL style operations – joining tables, for example, and performing grouping style summary operations.

One of the things we’re quite keen to do as a course team is identify visually appealing ways of illustrating a variety of data manipulating operations; so I wondered whether we might be able to use ipythonblocks as a basis for visualising – and debugging – pandas dataframe operations.

I’ve posted a demo IPython notebook here: ipythonblocks/pandas proof of concept [nbviewer preview]. In it, I’ve started to sketch out some simple functions for visualising pandas dataframes using ipythonblocks blocks.

For example, the following minimal function finds the size and shape of a pandas dataframe and uses it to configure a simple block:

def pBlockGrid(df):
    (y,x)=df.shape
    return BlockGrid(x,y)

We can also colour individual blocks – the following example uses colour to reveal the different datatypes of columns within a dataframe:

ipythinblocks pandas type colour

A more elaborate function attempts to visualise the outcome of merging two data frames:

ipythonblocks pandas demo

The green colour identifies key columns, the red and blue cells data elements from the left and right joined dataframes respectively, and the black cells NA/NaN cells.

One thing I started wondering about that I have to admit quite excited me (?!;-) was whether it would be possible to extend the pandas dataframe itself with methods for producing ipythonblocks visual representations of the state of a dataframe, or the effect of dataframe based operations such as .concat() and .merge() on source dataframes.

If you have any comments on this approach, suggestions for additional or alternative ways of visualising dataframe transformations, or thoughts about how to extend pandas dataframes with ipythonblocks style visualisations of those datastructures and/or the operations that can be applied to them, please let me know via the comments:-)

PS some thoughts on a possible pandas interface:

  • DataFrame().blocks() to show the blocks
  • .cat(blocks=True) and .merge(blocks=True) to return (df, blocks)
  • DataFrame().blocks(blockProperties={}) and eg .merge(blocks=True, blockProperties={})
  • blockProperties: showNA=True|False, color_base=(), color_NA=(), color_left=(), color_right=(), color_gradient=[] (eg for a .cat() on many dataframes), colorView=structure|datatypes|missing (the colorView reveals the datatypes of the columns, the structure origins of cells returned from a .merge() or .cat(), or a view of missing data (reveal NA/NaN etc over a base color), colorTypes={} (to set the colors for different datatypes)

Written by Tony Hirst

March 26, 2014 at 11:37 pm

Follow

Get every new post delivered to your Inbox.

Join 841 other followers