Deconstructing the TM351 Virtual Computing Environment via VS Code

For 2020J, which is to say, the 2020 October presentation, of our TM351 Data Management and Analysis course, we’ve deprecated the original VirtualBox packaged virtual machine and moved to a monolithic Docker container that packages all the required software applications and services (a Jupyer notebook server, postgres and mongoDB database servers, and OpenRefine).

As with the VM, the container is headless and exposes applications over http via browser based user interfaces. We also rebranded away from “TM351 VM” to “TM351 VCE”, where VCE stands for Virtual Computing Environment.

Once Docker is installed, the environment is installed and launched purely from the command line using a docker run command. Students early in to the forums have suggested moving to docker compose, which simplifies the command line command significantly, but also at the cost of having to supply a docker-compose.yaml . With OU workflows, it can take weeks, if not months, to get files onto the VLE for the first time, and days to weeks to post updates (along with a host of news announcements and internal strife about the possibility of tutors/ALs and students having different versions of the file). As we need to support cross-platfrom operation, and as the startup command specifies file paths for volume mounts, we’d need different docker-compose files (I think?) because file paths on Mac/Linux hosts, versus Windows hosts, use a different file path syntax (forward vs back slashes as path delimiters. [If anyone can tell me how to write a docker-compose.yaml files with arbitrary paths on the volume mounts, please let me know via the comments…]

Something else that has cropped up early in the forums is mention of VS Code, which presents a way to personalise the way in which the course materials are used.

By default, the course materials we provide for practical activities are all based on Jupyter notebooks, delivered via the Jupyter notebook server in the VCE (or via an OU hosted notebook server we are also exploring this year). The activities are essentially inlined using notebook code cells within a notebook that presents a linear teaching text narrative.

Students access the notebooks via their web browser, wherever the notebook server is situated. For students running the Docker VCE, notebook files (and OpenRefine project files) exist in a directory on the student’s own computer that is then mounted into the container; make changes to the notebooks in the container and those changes are saved in the notebooks mounted from host. Delete the container, and the notebooks are still on your desktop. For students using the online hosted notebook server, there is no way of synchronising files back to the student desktop, as far as I am aware; there was an opportunity to explore how we might allow students to use something like private Github repositories to persist their files in a space they control, but to my knowledge that has not been explored (a missed opportunity, to my mind…).

Using the VS Code Python extension, students installing VS Code on their own computer can connect to the Jupyter server running in the containerised VCE and (I don’t know if the permissions allow this on the hosted server).

The following tm351vce.code-workspace file describes the required settings:

{
"folders": [
{
"path": "."
}
],
"settings": {
"python.dataScience.jupyterServerURI": "http://localhost:35180/?token=letmein"
}
}

The VSCode Python extension renders notebooks, so students can open local copies of files from their own desktop and execute code cells against the containerised kernel. If permissions on the hosted Jupyter service allow remote/external connections, this would provide a workaround for synching notebooks files: students would work with notebook files saved on their own computer but executed against the hosted server kernel.

Queries can be run against the database servers via the code cells in the normal way (we use some magic to support this for the postgres database).

If we make some minor tweaks to the config files for the PostgreSQL and MongoDB database servers, we can use the VS Code PostgreSQL extension and MongoDB extension to run queries from VS Code directly against the databases.

For example, the postgres database:

image

and the mongo database:

image

Note that this is now outside the narrative context of the notebooks, although it strikes me that we could generate .sql and .json text files from notebooks that show code literally and comment out the narrative text (the markdown text in the notebooks).

However, we wouldn’t be able to work directly with the data returned from the database via Python/pandas dataframes, as we do in the notebook case. (Note also that in the notebooks we use a Python API for querying the mongo database, rather than directly issuing Javascript based queries.)

At this point you might ask why we would want to deconstruct / decompose the original structured notebook+notebook UI environment and allow students to use VS Code to access the computational environment, not least when we are in the process of updating the notebooks and the notebook environment to use extensions that add additional style and features to the user environment. Several reasons come to my mind that are motivated by finding ways in which we can essentially lose control, as educators, of the user interface whilst still being reasonably confident that the computational environment will continue to perform as we intend (this stance will probably make many of my colleagues shudder; I call it supporting personalisation…):

  • we want students to take ownership of their computational environment; this includes being able to access it from their own clients that may be better suited to their needs, eg in terms of familiarity, accessibility, productivity, etc;
  • a lot of our students are already working in software development and already have toolchains they are working with. Whilst we see benefits of using the notebook UI from a teaching and learning perspective, the fact remains that students can also complete the activities in other user environments. We should not hinder them from using their own environments — the code should still continue to run in the same way — as long as we explain how the experience may not be the same as the one we are providing, and also noting that some of the graphics / extensions we use in the notebooks may not work in the same way, or may not even work at all, in the VS Code environment.

If students encounter issues when using their own environment, rather than the one we provide, we can’t offer support. If the personalised learning environment is not as supportive for teaching and learning as the environment we provide, it is the student’s choice to use it. As with the Jupyter environment, the VS Code environment sits at the centre of a wide ecosystem of third party extesions. If we can make our materials available in that environment, particulalry for students already familiar with that environment, they may be able to help us by identifying and demonstrating new ways, perhaps even more effective ways, of using the VS Code tooling to support their learning than the enviorment we provide. (One example might be the better support VS Code has for code linting and debugging, which are things we don’t teach, and that our chosen environment perhaps even prevents students who know how to use such tools from making use of them. Of course, you could argue we are doing students a service by grounding them back in the basics where they have to do their own linting and print() statement debugging… Another might be the Live Share/collaboration service that lets two or more users work collaboratively in the same notebook, which might be useful for personal tutorial sessions etc.)

From my perspective, I believe that, over time, we should try to create materials that continue to work effectively to support both teaching and learning in environments that students may already be working in, and not just the user interface environments we provide, not least becuase we potentially increase the number of ways in which students can see how they might make use of those tools / environments.

PS I do note that there may be licensing related issues with VS Code and the VS Code extensions store, which are not as open as they could be; VSCodium perhaps provides a way around that.

OERs in Practice: Re-use With Modification

Over the years, I’ve never really got my head round what other people mean by OERs (Opern Educational Resources) in terms of how they might be used.

From my own perspective, wholesale reuse (“macro reuse”) of a course isn’t relevant to me. When tasked with writing an OU unit, if I just point to a CC licensed course somewhere else and say “use that”, I suspect it won’t go down well.

I may want to quote a chunk a material, but I can do that with books anyway. Or I may want to reuse an activity, and then depending on how much rework or modification is applied, I may reference the original or not.

Software reuse is an another possibility, linking out to or embedding a third party application, but that tends to fall under the banner of openly licensed software reuse as much as OER reuse. Sometimes the embed may be branded; sometimes it may be possible to remove the branding (depending on how the asset is created, and the license terms), sometimes the resource might be a purely white label resource that can be rebranded.

Videos and audio clips are another class of resource that I have reused, partly because they are harder to produce. Video clips tend to come in various forms: on the one hand, things like lectures retain an association with the originator (a lecture given by Professor X of university Y is very obviously associated with Professor X and university Y); on the other hand, an animations, like software embeds, might come in a branded form, white labelled, or branded as distributed but white label licensed so you can remove/rebrand if you want to put the effort in.

Images are also handy things to be able to reuse, again because they can be hard to produce in at least two senses: firstly, coming up with the visual or graphical idea, i.e. how to depict something in a way that supports teaching or learning; secondly, actually producing the finished artwork. One widely used form of image reuse in the OU is the “redrawing” of an image originally produced elsewhere. This represents a reuse, or re-presentation, of an idea. In a sense, the image is treated as a sketch that is then redrawn.

This level of “micro reuse” of a resource, rather than the “macro reuse” of a course, is not something that was invented by OERs – academics have always incorporated and referenced words and pictures created by others – but it can make reuse easier by simplifying the permissions pathway (i.e. simplifying what otherwise might be a laborious copyright clearance process).

One of the other ways of making use of “micro” resources is to reuse them with modification.

If I share a text with you as a JPG of a PDF document, it can be quite hard for you to grab the text and elide a chunk of it (i.e. remove a chunk of it and replace it with … ). If I share the actual text as text, for example, in a Word document, you can edit it as you will.

Reuse with modification is also a fruitful way of reusing diagrams. But it can be harder to achieve in practical terms. For example, in a physics or electronics course, or a geometry course, there are likely to be standard mechanical principle diagrams, electrical circuits or geometrical proofs that you are likely to want to refer to. These diagrams may exist as openly licensed resources, but… The numbers or letters you want to label the diagram with may not be the same as in the original. So what do you do? Redraw the diagram? Or edit the original, which may reduce the quality of the original or introduce some visual artefact the reveals the edit (“photocopy lines”!).

But what if the “source code” or means of producing the diagram. For example, if the diagram is created in Adobe Illustrator or CorelDRAW and the diagram made available as an Adobe Artwork .ai file or a  CorelDRAW .cdr file, and you have an editor (such as the original, or an alternative such as Inkscape) that imports those file formats, you can edit and regenerate a modified version of the diagram at the same level of quality as the original. You could also more easily restyle the diagram, even if you don’t change any of the content. For example, you could change line thickness, fonts or font sizes, positioning, and so on.

One of the problems with sharing image project files for particular applications is that the editing and rendering environment for working with project file is likely separate from your authoring environment. If, while writing the text, you change an item in the text and want to change the same item as referenced in the image, you need to go to the image editor, make the change, export the image, copy it back into your document. This makes document maintenance hard and subject to error. It’s easy for the values of the same item as referenced in the text and the diagram to drift. (In databases, this is why you should only ever store the value of something once and then refer to its value by reference. If I have your address stored in two places, and you change address, I have to remember to change both of them; it’s also quite possible that the address I have for you will drift between the two copies I have of it…)

One way round this is to include the means for creating and editing the image within your text document. This is like editing a Microsoft Word document and including a diagram by using Microsoft drawing tools within the document. If you share the complete document with someone else, they can modify the diagram quite easily. If you share a PDF of the document, they’ll find it harder to modify the diagram.

Another way of generating diagrams is to “write” it, creating a “program” that defines how to draw the diagram and that can be run in a particular environment to actually produce the diagram. By changing the “source code” for the diagram, and rerunning it, you can generate a modified version of the diagram in whatever format you choose.

This is what packages like TikZ support [docs].

And this is what I’ve been exploring in Jupyter notebooks and Binderhub, where the Jupyter notebook contains all the content in the output document, including the instructions to create image assets or interactives, and the Binder container contains all the software libraries and tools required to generate and embed the image assets and interactives within the document from the instructions contained within the document.

That’s what I was trying to say in Maybe Programming Isn’t What You Think It Is? Creating Repurposable OERs (which also contains a link to a runnable example).

PS by the by, I also stumbled across this old post, an unpursued bid, today, that I have no recollection of at all: OERs: Public Service Education and Open Production. Makes me wonder how many other unfinished bids I started…

Maybe Programming Isn’t What You Think It Is? Creating Repurposable & Modifiable OERs

With all the “everyone needs to learn programming” hype around, I am trying to be charitable when it comes to what I think folk might mean by this.

For example, whilst trying to get some IPython magic working, I started having a look at TikZ, a LaTex extension that supports the generation of scientific and mathematical diagrams (and which has been around for decades…).

Getting LaTeX environments up and running can be a bit of a pain, but several of the Binderhub builds I’ve been putting together include LateX, and TikZ,  which means I have an install-free route trying snippets of TikZ code out.

As an example, in my showntell/maths demo includes an OpenLearn_Geometry.ipynb notebook that includes a few worked examples of how to “write” some of the figures that appear in an OpenLearn module on geometry.

From the notebook:

The notebook includes several hidden code cells that generate the a range of geometric figures. To render the images, go to the Cell menu and select Run All.

To view/hide the code used to generate the figures, click on the Hide/Reveal Code Cell Inputs button in the notebook toolbar.

To make changes to the diagrams, click in the appropriate code input cell, make your change, and then run the cell using the Run Cell (“Play”) button in the toolbar or via the keyboard shortcut SHIFT-ENTER.

Entering Ctrl-Z (or CMD-Z) in the code cell will undo your edits…

Launch the demo notebook server on Binder here.

Here’s an example of one of the written diagrams (there may be better ways; I only started learning how to write this stuff a couple of days ago!)

Whilst tinkering with this, a couple of things came to mind.

Firstly, this is programming, but perhaps not as you might have thought of it. If we taught adult novices some of the basic programming and coding skills using Tikz rather than turtle, they’d at least be able to create professional looking diagrams. (Okay, so the syntax is admittedly probably a bit scary and confusing to start with… But it could be simplified with some higher level, more abstracted, custom defined macros that learners could then peek inside.)

So when folk talk about teaching programming, maybe we need to think about this sort of thing as well as enterprise Java. (I spent plenty of time last night on the Stack Exchange TEX site!)

Secondly, the availability of things like Binderhub make it easier to build preloaded distributions that can be run by anyone, from anywhere (or at least, for as long as public Binderhub services exist). Simply by sharing a link, I can point you to a runnable notebook, in this case, the OpenLearn geometry demo notebook mentioned above.

One of the things that excites me, but I can’t seem to convince others about, is the desirability of constructing documents in the way the OpenLearn geometry demo notebook is constructed: all the assets displayed in the document are generated by the document. What this means is that if I want to tweak an image asset, I can do. The means of production – in the example, the TikZ code – is provide; it’s also editable and executable within the Binder Jupyter environment.

When HTML first appeared, web pages were shonky as anything, but there were a couple of buts…: the HTML parsers were forgiving, and would do their best to whatever corruption of HTML was thrown at them; and the browsers supported the ability to View Source (which still exists today; for example, in Chrome, go to the View menu then select Developer -> View Source).

Taken together, this meant that: a) folk could copy and paste other people’s HTML and try out tweaks to “cool stuff” they’d seen on other pages; b) if you got it wrong, the browser would have a go at rendering it anyway; you also wouldn’t feel as if you’d break anything serious by trying things out yourself.

So with things like Binder, where we can build disposable “computerless computing environments” (which is to say, pre-configured computing environments that you can run from anywhere, with just a browser to hand), there are now lots of opportunities to do powerful computer-ingy things (technical term…) from a simple, line at a time notebook interface, where you (or others) can blend notes and/or instruction text along with code – and code outputs.

For things like the OpenLearn demo notebook, we can see how the notebook environment provides a means by which educators can produce repurposeable documents, sharing not only educational materials for use by learners, or appropriation and reuse by other educators, but also the raw ingredients for producing customised forms of the sorts of diagrams contained in the materials: if the figure doesn’t have the labels you want, you can change them and re-render the diagram.

In a sense, sharing repurposeable, “reproducible” documents that contain the means to generate their own media assets (at least, when run in an appropriate environment: which is why Binderhub is such a big thing…) is a way of sharing your working. That is, it encourages open practice, and the sharing of how you’ve created something (perhaps even with comments in the “code” explaining why you’ve done something in a particular way, or where the inspiration/prior art came from), as well as the what of the things you have produced.

That’s it, for now… I’m pretty much burned out on trying to persuade folk of the benefits of any of this any more…

PS TikZ and PGF TikZ and PGF: TeX packages for creating graphics programmatically. Far more useful than turtle and Scratch?

Open Education Versions of Open Source Software: Adding Lightness and Accessibility to User Interfaces?

In a meeting a couple of days ago discussing some of the issues around what sort of resources we might want to provide students to support GIS (geographical information system) related activities, I started chasing the following idea…

The OU has, for a long time, developed software application in-house that is provided to students to support one or more courses. More often than not, the code is devloped and maintained in-house, and not released / published as open source software.

There are a couple of reasons for this. Firstly, the applications typically offer a clean, custom UI that minimises clutter and is designed in order to support usability for learners learning about a particular topic. Secondly, we require software provided by students to be accessible.

For example, the RobotLab software, originally developed, an still maintained, by my colleague Jon Rosewell was created to support a first year undergrad short course, T184 Robotics and the Meaning of Life, elements of which are still used in one of our level 1 courses today. The simulator was also used for many years to support first year undergrad residential schools, as well as a short “build a robot fairground” activity in the masters level team engineering course.

As well as the clean design, and features that support learning (such as a code stepper button in RobotLab that lets students step through code a line at a time), the interfaces also pay great attention to accessibility requirements. Whilst these features are essential for students with particular accessibility needs, they also benefit all out students by adding to the improved usability of the software as a whole.

So those are two, very good reasons, for developing software in-house. But as a downside, it means that we limit the exposure of students to “real” software.

That’s not to say all our courses use in-house software: many courses also provide industry standard software as part of the course offering. But this can present problems too: third party software may come with complex user interfaces, or interfaces that suffer from accessibility issues. And software versions used in the course may drift from latest releases if the software version is fixed for the life of the course. (In fact, the software version may be adopted a year before the start of the course and then expected to last for 5 years of course presentation). Or if software is updated, this may cause significant updates to be made to the course material wrapping the software.

Another issue with professional software is that much of it is mature, and has added features over its life. This is fine for early adopters: the initial versions of the software are probably feature light, and add features slowly over time, allowing the user to grow with them. Indeed, many latterly added features may have been introduced to address issues surrounding a lack of functionality, power or “expressiveness” in use identfied by, and frustrating to, the early users, particularly as they became more expert in using the application.

For a novice coming to the fully featured application, however, the wide range of features of varying levels of sophistication, from elementary, to super-power user, can be bewildering.

So what can be done about this, particularly if we want to avail ourselves of some of the powerful (and perhaps, hard to develop) features of a third party application?

To steal from a motorsport engineering design principle, maybe we can add lightness?

For example, QGIS is a powerful, cross-platform GIS application. (We have a requirement for platfrom neutrality; some of us also think we should be browser first, but let’s for now accept the use of an application that needs to be run on a computer with a “desktop” applciation system (Windows, OS/X, Linux) rather than one running a mobile operating system (iOS, Android) or eveloped for use by a netbook (Chrome OS).)

The interface is quite busy, and arguably hard to quickly teach around from a standing start:

However, as well as being cross-platform, QGIS also happens to be open source.

That is, the source code is available [github: qgis/QGIS].

 

Which means that as well as the code that does all the clever geo-number crunching stuff, we have access to the code that defines the user interface.

*[UPDATE: in this case, we don’t need to customise the UI by forking the code and changing the UI definition files – QGIS provides a user interface configuration / customisation tool.]

For example, if we look for some menu labels in he UI:

we can then search the source code to find the files that contribute to building the UI:

In turn, this means we can take that code, strip out all the menu options and buttons we don’t need for a particular course, and rebuild QGIS with the simplified UI. Simples. (Or maybe not that simples when you actually start getting into the detail, depending on how the software is designed!)

And if the user interface isn’t as accessible as we’d like it, we can try to improve that, and contribute the imporvements back the to parent project. The advantage there is that if students go on to use the full QGIS application outside of the course, they can continue to benefit from the accessiblity improvements. As can every other user, whether they have accessibility needs or not.

So here’s what I’m wondering: if we’re faced with the decision between wanting to use an open source, third party “real” application with usability and access issues, why build the custom learning app, especially if we’re going to keep the code closed and have to maintain it ourselves? Why not join the developer community and produce a simplified, accessible skin for the “real” application, and feed accessibility improvements at least back to the core?

On reflection, I realised we do, of course, do the first part of this already (forking and customising), but we’re perhaps not so good at the latter (contributing accessibility or alt-UI patterns back to the community).

For operational systems, OU developers have worked extensively on Moodle, for example (and I think, committed to the parent project)… And in courses, the recent level 1 computing course uses an OU fork of Scratch called OUBuild, a cross-platform Adobe Air application (as is the original), to teach basic programming, but I’m not sure if any of the code changes have been openly published anywhere, or design notes on why the original was not appropriate as a direct/redistributed download?

Looking at the Scratch open source repos, Scratch looks to be licensed under BSD 3-clause “New” or “Revised” License (“a permissive license similar to the BSD 2-Clause License, but with a 3rd clause that prohibits others from using the name of the project or its contributors to promote derived products without written consent”). Although it doesn’t have to be, I’m not sure the OUBuild source code has been released anywhere or whether commits were made back to the original project? (If you know differently, please let me know:-)) At the very least, it’d be really handy if there was a public document somewhere that identifies the changes that were made to the original and why, which could be useful from a “design learning” perspective. (Maybe there is a paper being worked up somewhere about the software development for the course?) By sharing this information, we could perhaps influence future software design, for example by encouraging developers to produce UIs that are defined from configuration files that can be easily customised and selected from, in that that users can often select language packs).

I can think of a handful of flippant, really negative reasons why we might not want to release code, but they’re rather churlish… So they’re hopefully not the reasons…

But there are good reasons too (for some definition of “good”..): getting code into a state that is of “public release quality”; the overheads of having to support an open code repository (though there are benefits: other people adding suggestions, finding bugs, maybe even suggesting fixes). And legal copyright and licensing issues. Plus the ever present: if we give X away, we’re giving part of the value of doing our courses away.

At the end of the day, seeing open education in part as open and shared practice, I wonder what the real challenges are to working on custom educational software in a more open and collaborative way?

Want to Get Started With Open Data? Looking for an Introductory Programming Course?

Want to learn to code but never got round to it? The next presentation of OUr FutureLearn course Learn to Code for Data Analysis will teach you how to write you own programme code, a line a time, to analyse real open data datasets. The next presentation starts on 6 June, 2016, and runs for 4 weeks, and takes about 5 hrs per week.

I’ve often thought that there are several obstacles to getting started with programming. Firstly, there’s the rationale or context: why bother/what could I possibly use programming for? Secondly, there are the practical difficulties: to write and execute programmes, you need to get an programming environment set up. Thirdly, there’s the so what: “okay, so I can programme now, but how do I use this in the real world?”

Many introductory programming courses reuse educational methods and motivational techniques or contexts developed to teach children (and often very young children) the basics of computer programming to set the scene: programming a “turtle” that can drive around the screen, for example, or garishly coloured visual programming environments that let you plug logical blocks together as if they were computational Lego. Great fun, and one way of demonstrating some of the programming principles common to all programming languages, but they don’t necessarily set you up for seeing how such techniques might be directly relevant to an IT problem or issue you face in your daily life. And it can be hard to see how you might use such environments or techniques at work to help you get perform real tasks… (Because programmes can actually be good at that – automating the repetitive and working through large amounts of stuff on your behalf.) At the other extreme are professional programming environments, like geekily bloated versions of Microsoft Word or Excel, with confusing preference setups and menus and settings all over the place. And designed by hardcore programmers for hardcore programmers.

So the approach we’ve taken in the OU FutureLearn course Learn to Code for Data Analysis is slightly different to that.

The course uses a notebook style programming environment that blends text, programme code, and the outputs of running that code (such as charts and tables) in a single, editable web page accessed via your web browser.

Learn_to_Code_-_SageMathCloud

To motivate your learning, we use real world, openly licensed data sets from organisations such as the World Bank and the United Nations – data you can download and access for yourself – that you can analyse and chart using your own programme code. A line at a time. Because each line does it’s own thing, each line is useful, and you can see what each line does to your dataset directly.

So that’s the rationale: learn to code so you can work with data (and that includes datasets much larger than you can load into Excel…)

The practicalities of setting up the notebook environment still have to be negotiated, of course. But we try to help you there too. If you want to download and install the programming environment on your computer, you can do, in the form of the freely available Anaconda Scientific Computing Python Distribution. Or you can access an online versions of the notebook based programming environment via SageMathCloud and do all your programming online, through your browser.

So that’s the practical issues hopefully sorted.

But what about the “so what”? Well, the language you’ll be learning is Python, a widely used language programming language that makes it ridiculously easy to do powerful things.

Pyython cartoon - via https://xkcd.com/353/

But not that easy, perhaps..?!

The environment you’ll be using – Jupyter notebooks – is also a “real world” technology, inspired as an open source platform for scientific computing but increasingly being used by journalists (data journalism, anyone?) and educators. It’s also attracted the attention of business, with companies such as IBM supporting the development of a range of interactive dashboard tools and backend service hooks that allow programmes written using the notebooks to be deployed as standalone online interactive dashboards.

The course won’t take you quite that far, but it will get you started, and safe in the knowledge that whatever you learn, as well as the environment you’re learning in, can be used directly to support your own data analysis activities at work, or at home as a civically minded open data armchair analyst.

So what are you waiting for? Sign up now and I’ll see you in the comments:-)

Tinkering With MOOC Data – Rebasing Time

[I’ve been asked to take this post down because it somehow goes against, I dunno, something, but as a stop gap I’ll try to just remove the charts and leave the text, to see if I get another telling off…]

As a member of an organisation where academics tend to be course designers and course producers, and kept as far away from students as possible (Associate Lecturers handle delivery as personal tutors and personal points of contact), I’ve never really got my head around what “learning analytics” is supposed to deliver: it always seemed far more useful to me to think about course analytics as way of tracking how the course materials are working and whether they seem to be being used as intended. Rather than being interested in particular students, the emphasis would be more on how a set of online course materials work in much the same way as tracking how any website works. Which is to say, are folk going to the pages you expect, spending the time on them you expect, reaching goal pages as and when you expect, and so on.

Having just helped out on a MOOC, I was allowed to have a copy of the course related data files the provider makes available to partners:

I'm not allowed to show you this, apparently...

The course was on learning to code for data analysis using the Python pandas library, so I thought I’d try to apply what was covered in the course (perhaps with a couple of extra tricks…) to the data that flowed from the course…

And here’s one of the tricks… rebasing (normalising) time.

For example, one of the things I was interested in was how long learners were spending on particular steps and particular weeks on the one hand, and how long their typical study sessions were on the other. This could then all be aggregated to provide some course stats about loading which could feed back into possible revisions of the course material, activity design (and redesign) etc.

Here’s an example of how a randomly picked learner progressed through the course:

I'm not allowed to show you this, apparently...

The horizontal x-axis is datetime, the vertical y axis is an encoding of the week and step number, with clear separation between the weeks and steps within a week incrementally ordered. The points show the datetime at which the learner first visited the step. The points are coloured by “stint”, a trick I borrowed from my F1 data wrangling stuff: during the course of a race, cars complete several “stints”, where a stint corresponds to a set laps completed on a particular set of tyres; analysing races based on stints can often turn up interesting stories…

To identify separate study session (“stints”) I used a simple heuristic – if the gap between start-times of consecutively studied stints exceeded a certain threshold (55 minutes, say), then I assumed that the steps were considered in separate study sessions. This needs a bit of tweaking, possibly, perhaps including timestamps from comments or question responses that can intrude on long gaps to flag them as not being breaks in study, or perhaps making the decision about whether the gap between two steps is actually a long one compared to a typically short median time for that step? (There are similar issues in the F1 data, for example when trying to work out whether a pit stop may actually be a drive-through penalty rather than an actual stop.)

In the next example, I rebased the time for two learners based on the time they first encountered the first step of the course. That is, the “learner time” (in hours) is the time between them first seeing a particular step, and the time they first saw their first step. The colour field distiguishes between the two learners.

I'm not allowed to show you this, apparently...

We can draw on the idea of “stints”, or learner sessions further, and use the earliest time within a stint to act as the origin. So for example, for another random learner, here we see an incremental encoding on of the step number on the y-axis, with the weeks clearly separated, the “elapsed study session time” along the horizontal y-axis, and the colour mapping out the different study sessions.

I'm not allowed to show you this, apparently...

The spacing on the y-axis needs sorting out a bit more so that it shows clearer progression through steps, perhaps by using an ordered categorical axis with a faint horizontal rule separator to distinguish the separate weeks. (Having an interactive pop-up that contains some information the particular step each mark refers to, as well as information about how much time was spent on it, whether there was commenting activity, etc, what the mean and median study time for the step is, etc etc, could also be useful.) However, I have to admit that I find charting in pandas/matplotlib really tricky, and only seem to have slightly more success with seaborn; I think I may need to move this stuff over to R so that I can make use of ggplot, which I find far more intuitive…

Finally, whilst the above charts are at the individual learner level, my motivation for creating them was to better understand how the course materials were working, and to try to get my eye in to some numbers that I could start to track as aggregate numbers (means, medians, etc) over the course as a whole. (Trying to find ways of representing learner activity so that we could start to try to identify clusters or particular common patterns of activity / signatures of different ways of studying the course, is then another whole other problem – though visual insights may also prove helpful there.)

Running Executable Jupyter/IPython Notebooks Directly from Github With Binder

It’s taken me way too long to get round to posting this, but it’s a compelling idea that I think more notice should be taken of… binder ([code]).

The idea is quite simple – specify a public github project (username/repo) that contains one or more Jupyter (IPython) notebooks, hit “go”, and the service will automatically create a docker container image that includes a Jupyter notebook server and a copy of the files contained in the repository.

binder

(Note that you can specify any public Github repository – it doesn’t have to be one you have control over at all.)

Once the container image is created, visiting mybinder.org/repo/gitusername/gitrepo will launch a new container based on that image and display a Jupyter notebook interface at the redirected to URL. Any Jupyter notebooks contained within the original repository can then be opened, edited and executed as an active notebook document.

What this means is we could pop a set of course related notebooks into a repository, and share a link to mybinder.org/repo/gitusername/gitrepo. Whenever the link is visited, a container is fired up from the image and the user is redirected to that container. If I go to the URL again, another container is fired up. Within the container, a Jupyter notebook server is running, which means you can access the notebooks that were hosted in the Github repo as interactive, “live” (that is, executable) notebooks.

Alternatively, a user could clone the original repository, and then create a container image based on their copy of the repository, and then launch live notebooks from their own repository.

I’m still trying to find out what’s exactly going on under the covers of the binder service. In particular, a couple of questions came immediately to mind:

  • how long do containers persist? For example, at the moment we’re running a FutureLearn course (Learn to Code for Data Analysis) that makes use of IPython/Jupyter notebooks (https://www.futurelearn.com/courses/learn-to-code), but it requires learners to install Anaconda (which has caused a few issues). The course lasts 4 weeks, with learners studying a couple of hours a day maybe two days a week. Presumably, the binder containers are destroyed as a matter of course according to some schedule or rule – but what rule? I guess learners could always save and download their notebooks to the desktop and then upload them to a running server, but it would be more convenient if they could bookmark their container and return to it over the life of the course? (So for example, if Futurelearn was operating a binder service, joining the course could provide authenticated access to a container at http://www.futurelearn.com/courses/learn-to-code/USERID/notebook for the duration of the course, and maybe a week or two after? Following ResBaz Cloud – Containerised Research Apps as a Service, it might also allow for a user to export a copy of their container?)
  • how does the system scale? The FutureLearn course has several thousand students registered to it. To use the binder approach towards providing any student who wants one with a web-accessible, containerised version of the notebook application so they don’t have to insall one of their own, how easily would it scale? eg how easy is it to give a credit card to some back-end hosting company, get some keys, plug them in as binder settings and just expect it to work? (You can probably guess at my level devops/sysadmin ability/knowledge!;-)

Along with those immediate questions, a handful of more roadmap style questions also came to mind:

  • how easy would it be to set up the Jupyter notebook system to use an alternative kernel? e.g. to support a Ruby or R course? (I notice that tmpnb.org offers a variety of kernels, for example?)
  • how easy would it be to provide alternative services to the binder model? eg something like RStudio, for example, or OpenRefine? I notice that the binder repository initialisation allows you to declare the presence of a custom Dockerfile within the repo that can be used to fire up the container – so maybe binder is not so far off a general purpose docker-container-from-online-Dockerfile launcher? Which could be really handy?
  • does binder make use of Docker Compose to tie multiple applications together, as for example in the way it allows you to link in a Postgres server? How extensible is this? Could linkages of a similar form to arbitrary applications be configured via a custom Dockerfile?
  • is closer integration with github on the way? For example, if a user logged in to binder with github credentials, could files then saved or synched back from the notebook to that user’s corresponding repository?

Whatever – will be interesting to see what other universities may do with this, if anything…

See also Seven Ways of Running IPython Notebooks and ResBaz Cloud – Containerised Research Apps as a Service.

PS I just noticed an interesting looking post from @KinLane on API business models: I Have A Bunch Of API Resources, Now I Need A Plan, Or Potentially Several Plans. This has got me wondering: what sort of business plan might support a “Studyapp” – applications on demand, as a service – form of hosting?

Several FutureLearn courses, for all their web first rhetoric, require studentslearners to install software onto their own computers. (From what I can tell, FutureLearn aren’t interested in helping “partners” do anything that takes eyeballs away from FutureLearn.com. So I don’t understand why they seem reluctant to explore ways of using tech to provide interactive experiences within the FutureLearn context, like using embedded IPython notebooks, for example. (Trying to innovate around workflow is also a joke.) And IMVHO, the lack of innovation foresight within the OU itself (FutureLearn’s parent…) seems just as bad at the moment… As I’ve commented elsewhere, “[m]y attitude is that folk will increasingly have access to the web, but not necessarily access to a computer onto which they can install software applications. … IMHO, we are now in a position where we can offer students access to “computer lab” machines, variously flavoured, that can run either on a student’s own machine (if it can cope with it) or remotely (and then either on OU mediated services or via a commercial third party on which students independently run the software). But the lack of imagination and support for trying to innovate in our production processes and delivery models means it might make more sense to look to working with third parties to try to find ways of (self-)supporting our students.”. (See also: What Happens When “Computers” Are Replaced by Tablets and Phones?) But I’m not sure anyone else agrees… (So maybe I’m just wrong!;-)

That said, it’s got me properly wondering – what would it take for me to set up a service that provides access to MOOC or university course software, as a service, at least, for uncustomised, open source software, accessible via a browser? And would anybody pay to cover the server costs? How about if web hosting and a domain was bundled up with it, that could also be used to store copies of the software based activities once the course had finished? A “personal, persistent, customised, computer lab machine”, essentially?

Possibly related to this thought, Jim Groom’s reflections on The Indie EdTech Movement, although I’m thinking more of educators doing the institution stuff for themselves as a way of helping the students-do-it-for-themselves. (Which in turn reminds me of this hack around the idea of THEY STOLE OUR REVOLUTION LEARNING ENVIRONMENT. NOW WE’RE STEALING IT BACK !)

PS see also this by C. Titus Brown on Is mybinder 95% of the way to next-gen computational science publishing, or only 90%?

How Much Time Should an Online Course Take?

Five years or so ago, when MOOCs were still a new thing, I commented on what seemed to be the emerging typical duration of open online courses: Open Courses: About 10 Weeks Seems To Be It, Then?

For the OU’s 10 week short courses, which nominally required up to 10 hours study a week (the courses were rated at 10 CAT points), this meant a duration of 100 hours. The cost (at the time) of those courses was about £150, I think. So about £1.50 an hour purchase cost.

Looking at the upcoming OU FutureLearn course Learn to code for data analysis, the time commitment is 4 weeks at 3-4 hours per week, so about 15 hours. If you don’t want to pay anything, you don’t have to.

Although I can’t offhand find any previous OUseful.info blog posts comparing courses to things like books or games (and I guess, DVD/streamed TV “box sets”), as “cultural content consumption items”, it’s one of the reference points I often think about when it comes to trying to imagining how a course – formal (for credit), or informal – fits into the life of the student amongst other competing demands on the their time, attention and finances. If someone is going to take a course for the first time and spend time/attention/cash on it, does the study pattern neatly replace or substitute a previous pattern of activity, or does it require a more significant change in a learners daily or weekly habits. In other words, what are the attention economics associated with taking a course?

This was all brought to mind again lately when I spotted this post – Forty Hours – which opens with the observation that “the majority of videogames were made on the assumption that they would be played for forty hours. Now, games are being made to be played for longer and longer. (I’ve no idea if this is true or not; I don’t really follow game culture. Maybe the longer games are ones where there is an element of social (especially 2-way audio) enhanced gameplay?)

If true, this seems to contrast with the shortening of courses that is perhaps taking place on FutureLearn (again, I don’t have the data to back this up; it’s just an impression; nor do I have the data about evolving course length more widely in MOOC space. Presumably, the Open Education Research Hub is the sort of place where I should be able to find this sort of data?)

If that is the case, then why are games getting longer and online open courses shorter (if, indeed, they are? And in formal ed, where does semesterisation sit in all this?). As the Forty Hours post goes on:

[E]very major commercial game now attempts to ‘capture’ its audience for at least 200 hours, with multiplayer modes being the core method of retention. The forty hour model was a consequence of selling games-as-products, as boxed content that would be played then thrown onto a pile of completed games (although it turns out that the minority of players finish games). The 200 hour model is a consequence of selling games-as-services, with monetization now an on-going process throughout the time the players are engaged with the title in question. …

The big money is no longer out to hold a player’s attention for forty hours, but to hold a player’s attention long enough to get the next game out, or to hold on to groups of players in the hope to pull in a few big spenders, or to hold the player’s attention throughout the year with events crafted to maintain appeal and bring back those who are slipping away into other games. Hobby players – those who commit to a game service over the long term – often play other games on the side, which is a tiny crumb of good news for indies making smaller games. …

The game-as-product approach where the forty hour model had dominated still survives, but only where it has proved difficult or impossible to tie players down for longer lengths of time. The market for videogames is ceasing to be one of packaged experience (like movies and novels) and becoming a fight for retention, as more and more games in the upper market shift their design towards training new hobby players in a ongoing economy.

In other words, why are we looking to shorten the relationship someone has with a course? Is this so we can extend the relationship the platform has with the learner by getting them to take more, shorter courses rather than fewer longer courses? (UPDATE: Or as Helen Noble points out in a comment, is it because the MOOC is actually a loss leading tease intended to draw students into a longer formal commitment? As opposed to being an alumni touch point, encouraging a graduate to maintain some sort of content with their alma mater in the hope getting a donation or bequest out of them later in life?!)

In terms of the completion commitment pitch (that is, what sort of commitment is required of folk to complete a course, or a game), what do the attention spending, cultural content consumers respond to? And how do the economics of competing concerns play out?

(That sounds like a marketing concern, doesn’t it? But it presumably also impacts on learning design within and across courses?)

Distributing Software to Students in a BYOD Environment

Reading around a variety of articles on the various ways of deploying software in education, it struck me that in traditional institutions a switch is may be taking place between students making use of centrally provided computing services – including physical access to desktop computers – to students bringing their own devices on which they may want to run the course software themselves. In addition, traditional universities are also starting to engage increasingly with their own distance education students; and the rise of the MOOCs are based around the idea of online course provision – that is, distance education.

The switch from centrally provided computers to a BYOD regime contrasts with the traditional approach in distance education in which students traditionally provided their own devices and onto which they installed software packaged and provided by their educational institution. That is, distance education students have traditionally been BYOD users.

However, in much the same way that the library in a distance education institution like the OU could not originally provide physical information (book lending) services to students, instead brokering access agreements with other HE libraries, but now can provide a traditional a traditional library service through access to digital collections, academic computing services are perhaps now more in a position where they can provide central computing services, at scale, to their students. (Contributory factors include: readily available network access for students, cheaper provider infrastructure costs (servers, storage, bandwidth, etc).)

With this in mind, it is perhaps instructive for those of us working in distance education to look at how the traditional providers are coping with an an influx of BYOD users, and how they are managing access to, and the distribution of, software to this newly emerging class of user (for them) whilst at the same time continuing to provide access to managed facilities such as computing labs and student accessed machines.


Notes from: Supporting CS Education via Virtualization and Packages – Tools for Successfully Accommodating “Bring-Your-Own-Device” at Scale, Andy Sayler, Dirk Grunwald, John Black, Elizabeth White, and Matthew Monaco SIGCSE’14, March 5–8, 2014, Atlanta, GA, USA [PDF]

The authors describe “a standardized development environment for all core CS courses across a range of both school-owned and student-owned computing devices”, leveraging “existing off-the-shelf virtualization and software management systems to create a common virtual machine that is used across all of our core computer science courses”. The goal was to “provide students with an easy to install and use development environment that they could use across all their CS courses. The development environment should be available both on department lab machines, and as a VM for use on student-owned machines (e.g. as a ‘lab in a box’).”

From the student perspective, our solution had to: a) Run on a range of host systems; b) Be easy to install; c) Be easy to use and maintain; d) Minimize side-effects on the host system; e) Provide a stable experience throughout the semester.

From the instructor perspective, our solution had to: a) Keep the students happy; b) Minimize instructor IT overhead; c) Provide consistent results across student, grader, and instructor machines; d) Provide all necessary software for the course; e) Provide the ability to update software as the course progresses.

Virtualbox was adopted on the grounds that it runs cross-platform, is free, open source software, and has good support for running Linux guest machines. The VM was based on Ubuntu 12.04 (presumably the long term edition available at the time) and distributed as an .ova image.

To support the distribution of software packages for a particular course, Debian metapackages (that simply list dependencies; in passing, I note that the Anaconda python distribution supports the notion of python (conda) metapackages, but pip does not, specifically?) were created on a per course basis that could be used via apt-get to install all the necessary packages required for a particular course (example package files).

In terms of student support, the team published “a central web-page that provides information about the VM, download links, installation instructions, common troubleshooting steps, and related self-help information” along with “YouTube videos describing the installation and usage of the VM”. Initial distribution is provided using BitTorrent. Where face-to-face help sessions are required, VM images are provided on USB memory sticks to avoid download time delays. Backups are handled by bundling Dropbox into the VM and encouraging students to place their files there. (Github is also used.)

The following observation is useful in respect of student experience of VM performance:

“Modern CPUs provide extensions that enable a fast, smooth and enjoyable VM experience (i.e. VT-x). Unfortunately, many non-Apple PC manufacturers ship their machines with these extension disabled in the BIOS. Getting students to enable these extensions can be a challenge, but makes a big difference in their overall impression of VM usability. One way to force students to enable these extensions is to use a 64-bit and/or multi-core VM, which VirtualBox will not start without virtualization extensions enabled.”

The open issues identified by the team are the issue of virtualisation support; corrupted downloads of the VM (mitigation includes publishing a checksum for the VM and verifying against this); and the lack of a computer capable of running the VM (ARM devices, low specification Intel Atom computers). [On this latter point, it may be worth highlighting the distinction between hardware that cannot cope with running computationally intensive applications, hardware that has storage limitations, and hardware that cannot run particular virtualisation services (for example, that cannot run x86 virtualisation). See also: What Happens When “Computers” Are Replaced by Tablets and Phones?]


The idea of using package management is attractive, and contrasts with the approach I took when hacking together the TM351 VM using vagrant and puppet scripts. It might make sense to further abstract the machine components into a Debian metapackage and a simple python/pip “meta” package (i.e. one that simply lists dependencies). The result would be an installation reduced to a couple of lines of the form:

apt-get install ou_tm351=15J.0
pip install ou_tm351==15J.0

where packages are versioned to a particular presentation of an OU course, with a minor version number to accommodate any updates/patches. One downside to this approach is that it splits co-dependency relationships between python and Debian packages relative to a particular application. In the current puppet build files for the monolithic VM build, each application has its own puppet file that installs the additional libraries over base libraries required for a particular application. (In addition, particular applications can specify dependencies on base libraries.) For the dockerised VM build, each container image has it’s own Dockerfile that identifies the dependencies for that image.

Tracing its history (and reflecting the accumulated clutter of my personal VM learning journey!) the draft TM351 VM is currently launched and provisioned using vagrant, partly because I can’t seem to start the IPython Notebook reliably from a startup script:-( Distributing the machine as a start/stoppable appliance (i.e. as an Open Virtualization Format/.ova package) might be more convenient, if we could guarantee that file sharing with host works as required (sharing against a specific folder on host) and any port collisions experienced by the provided services can be managed and worked around?

Port collisions are less of an issue for Sayler et al. because their model is that students will be working within the VM context – a “desktop as a local service” (or “platform as a local service” model); the TM351 VM model provides services that run within the VM, some of which are exposed via http to the host – more of a “software as a local service” model. In the cloud, software-as-a-service and desktop-as-a-service models are end-user delivery models, where users access services through a browser or lightweight desktop client, compared with “platform-as-a-service” offerings where applications can be developed and delivered within a managed development environment offering high level support services, or “infrastructure as a service” offerings, which provide access to base computing components (computational processing, storage, networking, etc.)

Note that what interests me particularly are delivery models that support all three of the following models: BYOD, campus lab, and cloud/remotely hosted offerings (as a crude shorthand, I use ‘cloud’ to mean environments that are responsive in terms of firing up servers to meet demand). The notions of personal computing environments, course computing environments and personal course computing environments might also be useful, (for example, a course computing environment might be a generic container populated with course software, a personal course computing container might then be a container linked to a student’s identity, with persisted state and linked storage, or a course container running on a students own device) alongside research computing environments and personal research computing environments.

Open Practice Roundup…

Perhaps it’s just because my antennae are sensitised at the moment, post posting Open Practice and My Academic Philosophy, Sort Of… Erm, Maybe… Perhaps..?!, but here are a couple more folk saying much the same thing…

From @Downes getting on for five years ago now (The Role of the Educator), he mentions how several elements of his open practice (hacking useful code, running open online courses (though he just calls them “online courses”; five years ago, remember, before “open” was the money phrase?!;-), sharing through a daily links round up and conference presentations, and thinking about stuff) have led:

to an overall approach not only to learning online but to learning generally. It’s not simply that I’ve adopted this approach; it’s that I and my colleagues have observed this approach emerging in the community generally.

It’s an approach that emphasizes open learning and learner autonomy. It’s an approach that argues that course content is merely a tool employed to stimulate and support learning — a McGuffin, as I’ve called it in various presentations, “a plot element that catches the viewers attention or drives the plot of a work of fiction” — rather than the object of learning itself. It’s an approach that promotes a pedagogy of learning by engagement and activity within an authentic learning community — a community of practitioners, where people practice the discipline, rather than merely just talk about it.

It’s an approach that emphasizes exercises involving those competencies rather than deliberate acts of memorization or rote, an approach that seeks to grow knowledge in a manner analogous to building muscles, rather than to transfer or construct knowledge through some sort of cognitive process.

It’s an approach that fosters a wider and often undefined set of competencies associated with a discipline, a recognition that knowing, say, physics, isn’t just to know the set of facts and theories related to physics, but rather to embody a wider set of values, beliefs, ways of observing and even mannerisms associated with being a physicist (it is the caricature of this wider set of competencies that makes The Big Bang Theory so funny).

Concordant with this approach has been the oft-repeated consensus that the role of the educator will change significantly. Most practitioners in the field are familiar with the admonishment that an educator will no longer be a “sage on the stage”. But that said, many others resist the characterization of an educator as merely a “guide by the side.” We continue to expect educators to play an active role in learning, but it has become more difficult to characterize exactly what that role may be.

In my own work, I have stated that the role of the teacher is to “model and demonstrate.” What I have tried to capture in this is the idea that students need prototypes on which to model their own work. Readers who have learned to program computers by copying and adapting code will know what I mean. But it’s also, I suppose, why I see the footprints of Raymond Chandler all through William Gibson’s writing. We begin by copying successful practice, and then begin to modify that practice to satisfy our own particular circumstances and needs.

In order for this to happen, the instructor must be more than just a presenter or lecturer. The instructor, in order to demonstrate practice, is required to take a more or less active role in the disciplinary or professional community itself, demonstrating by this activity successful tactics and techniques within that community, and modeling the approach, language and world view of a successful practitioner. This is something we see in medicine already, as students learn as interns working alongside doctors or nurse practitioners.

Five years ago…

At the other end of the career spectrum, grad student Sarah Crissinger had to write a “one-page teaching philosophy” as part of a recent job application (Reflections on the Job Hunt: Writing a Teaching Philosophy). Reflecting on two different approaches to teaching she had witnessed from two different yoga classes, one good, one bad, she observed of the effective teacher that:

[h]e starts every class by telling students that the session isn’t about replicating the exact pose he is doing. It’s more about how your individual body feels in the pose. In other words, he empowers students to do what they can without feeling shame about not being as flexible as their neighbor. He also solidifies the expectations of the class by saying upfront what the goals are and then he reiterates those expectations by giving modifications for each pose and talking about how your body should feel instead of how it should look.

..which in part reminded me of cookery style promoted by James Barber, aka the urban peasant

Sarah Crissinger also made this nice observation:

Teachers reflect on teaching even when we don’t mean to.

That is, effective teachers are also adaptive learning machines… (Reflection is part of the self-correcting feedback path.)

See also: Sheila McNeil on How do you mainstream open education and OERs? A bit of feedback sought for #oer15, and the comments therefrom. Sheila’s approach also brings to mind The Art Of Guerrilla Research, which emphasises the “just do it” attitude of open practice…