Category: OU2.0

I Just Don’t Understand Why…

…there seems to be so much resistance in OU to Jupyter notebooks, when I’m seeing this sort of thing more and more….

Folk creating open educational resources to support their technical ramblings using IPython (which is to say, Jupyter) notebooks…

I just, …., whatever… #ffs

PS see also: Introducing learnr. I can just imagine what sort of response that would get… Whuurrr? Wossat? No idea…

Rolling Your Own IT – Automating Multiple File Downloads

Yesterday, I caught up with a video briefing on Transforming IT from the OU’s Director of IT, recorded earlier this year (OU internal link, which, being on Sharepoint, needs Microsoft authentication, rather than OU single sign on?).

The video, in part, described the 20 year history of some of the OU’s teaching related software services, which tended to be introduced piecemeal and which aren’t necessarily as integrated as they could be…

In the post Decision Support for Third Marking Significant Difference Double Marked Assessments, I mentioned part of the OU process for managing third marking.

Guidance provided for collecting scripts for third marking is something like this:

The original markers’ scores and feedback will be visible in OSCAR.

Electronically submitted scripts can be accessed in the eTMA system via this link: …

Please note the scripts can only be accessed via the EAB/admin tab in the eTMA system ensuring you add the relevant module code and student PI.

[My emphasis.]

Hmmm… OSCAR is accessed via a browser, and supports “app internal” links that display the overall work allocation, a table listing the students, along with their PIs, and links to various data views including the first and second marks table referred to in the post mentioned above.

The front end to the eTMA system is a web form that requests a course code and student PI, which then launches another web page listing the student’s submitted files, a confirmation code that needs to be entered in OSCAR to let you add third marks, and a web form that requires you to select a file download type from a drop down list with a single option and a button to download the zipped student files.

So that’s two web things…

To download multiple student files requires a process something like this:

  • look up the module code and each student’s PI from the OSCAR work allocation table;
  • for each student in turn, enter the module code and PI into the eTMA system front end;
  • from the student’s page, select the file type from the drop down list and download the zipped files, noting the confirmation code as you go.

So why not just have something on the OSCAR work allocation page that lets you select – or select all – the students and download all the files, or get all the confirmation codes?

Thinks… I could do that, sort of, over  coffee…. (I’ve tried to obfuscate details while leaving the general bits of code that could be reused elsewhere in place…)

First up, we need to login and get authenticated:

#Login
!pip3 install MechanicalSoup

import mechanicalsoup
import pandas as pd

#Obfuscated details - fill these in as appropriate
USERNAME = ''
PASSWORD = ''
LOGIN_URL = ''
FORM_ID = '#'        #CSS selector for the login form, e.g. '#loginForm'
USERNAME_FIELD = ''  #name of the username input in that form
PASSWORD_FIELD = ''  #name of the password input in that form

def getSession():
    browser = mechanicalsoup.StatefulBrowser()
    browser.open(LOGIN_URL)
    browser.select_form(FORM_ID)
    browser[USERNAME_FIELD] = USERNAME
    browser[PASSWORD_FIELD] = PASSWORD
    browser.submit_selected()
    return browser

s = getSession()

Now we need a list of PIs. We could scrape these from OSCAR, but that’s a couple of extra steps, and it’s easier just to copy and paste the table from the web page for now:

#Get student PIs - copy and paste table from OSCAR for now

txt='''
CODE\tPI NAME\tMARKING_TYPE\tSTATUS
...
CODE\tPI NAME\tMARKING_TYPE\tSTATUS
'''

#Put that data into a pandas dataframe then pull out the PIs
from io import StringIO

df = pd.read_csv(StringIO(txt), sep='\t')
#The PI is the first token in the "PI NAME" column
pids = [i[0] for i in df['PI NAME'].str.split()]

We now have a list of student PIs, which we can iterate through to download the relevant files:

#Download the zip file for each student
import zipfile, io, random

def downloader(pid, outdir='etmafiles'):
  print('Downloading assessment for {}'.format(pid))
  !mkdir -p {outdir}
  #FORM_ELEMENT*, FILETYPE, FILE_DETAILS() and ETMA_DOWNLOAD_URL_PATTERN() are obfuscated placeholders
  payload = {FORM_ELEMENT1: FILETYPE, FORM_ELEMENT2: FILE_DETAILS(pid)}
  url = ETMA_DOWNLOAD_URL_PATTERN(pid)
  #Download the file...
  r=s.post(url,data=payload)

  #...and treat it as a zipfile
  z = zipfile.ZipFile(io.BytesIO(r.content))
  #Save a bit more time for the user by unzipping it too...
  z.extractall(outdir)

#Here's the iterator...
for pid in pids:
  try:
    downloader(pid)
  except:
    print('Failed for {}'.format(pid))

We can also grab the “student page” from the eTMA system and scrape it for the confirmation code. (On the to do list: try to post the confirmation code back to OSCAR to authorise the upload of third marks, as well as auto-posting a list of marks and comments back.)

#Scraper for confirmation codes
def getConfirmationCode(pid):
  print('Getting confirmation code for {}'.format(pid))
  url=ETMA_STUDENT_PAGE(pid, ASSESSMENT_DETAILS)
  r=s.open(url)
  p=s.get_current_page()

  #scrapy bit
  elements=p.find(WHATEVER)
  confirmation_code, pid=SCRAPE(elements)
  return [confirmation_code, pid]

codes=pd.DataFrame()

for pid in pids:
  try:
    tmp=getConfirmationCode(pid)
    # Add data to dataframe...
    codes = pd.concat([codes, pd.DataFrame([tmp], columns=['PI','Code'])])
  except:
    print('Failed for {}'.format(pid))

codes

So… yes, the systems don’t join up in the usual workflow, but it’s easy enough to hack together some glue as an end-user developed application: given that the systems are based on quite old-style HTML thinking, they are simple enough to scrape and treat as a de facto roll-your-own API.

Checking the time, it has taken me pretty much as long to put the above code together as it has to write this post and generate the block diagram shown above.

With another hour, I could probably learn enough about the new plotly Dash package (like R/shiny for python?) to create a simple browser-based app UI for it.
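By way of illustration, here’s a minimal, untested sketch of what such a Dash front end might look like, reusing the downloader() and getConfirmationCode() functions and the pids list defined above (the layout and callback wiring are just my guess at a starting point):

from dash import Dash, html, dcc, Input, Output, State

app = Dash(__name__)

app.layout = html.Div([
    dcc.Dropdown(id='pid', options=[{'label': p, 'value': p} for p in pids]),
    html.Button('Download files', id='go', n_clicks=0),
    html.Div(id='status'),
])

@app.callback(Output('status', 'children'),
              Input('go', 'n_clicks'),
              State('pid', 'value'),
              prevent_initial_call=True)
def download_for_student(n_clicks, pid):
    #Reuse the scraper functions defined earlier in this post
    downloader(pid)
    confirmation_code, _ = getConfirmationCode(pid)
    return 'Downloaded files for {}; confirmation code: {}'.format(pid, confirmation_code)

if __name__ == '__main__':
    app.run(debug=True)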

Of course, this isn’t enterprise grade for a digital organisation, where everything is form/button/link/click easy, but it’s fine for a scruffy digital org where you appropriate what you need and string’n’glue’n’gaffer tape let you get stuff done (and also prototype, quickly and cheaply, things that may be useful, without spending weeks and months arguing over specs and font styles).

Indeed, it’s the digital equivalent of the workarounds all organisations have, where you know someone or something who can hack a process, or a form, or get you that piece of information you need, using some quirky bit of arcane knowledge, or hidden backchannel, that comes from familiarity with how the system actually works, rather than how people are told it is supposed to work. (I suspect this is not what folk mean when they are talking about a digital first organisation, though?!;-)

And if it’s not useful? Well it didn’t take that much time to try it to see if it would be…

Keep on Tuttling…;-)

PS the blockdiagram above was generated using an online service, blockdiag. Here’s the code (need to check: could I assign labels to a variable and use those to cut down repetition?):

blockdiag {
  A [label="Work Allocation"];
  B [label="eTMA System"];
  C [label="Student Record"];
  D [label="Download"];
  DD [label="Confirmation Code"]
  E [label="Student Record"];
  F [label="Download"];
  FF [label="Confirmation Code"]
  G [shape="dots"];
  H [label="Student Record"];
  I [label="Download"];
  II [label="Confirmation Code"];

  OSCAR -> A -> B;

  B -> C -> D;
  C -> DD;

  B -> E -> F;
  E -> FF;
  B -> G;

  B -> H -> I;
  H -> II;
}

Is that being digital? Is that being cloud? Is that being agile (e.g. in terms of supporting maintenance of the figure?)?

Decision Support for Third Marking Significant Difference Double Marked Assessments

In the OU, project report assessed courses tend to see project reports  double marked – once by the student’s own tutor, and once by another marker from the same marking pool. Despite our best efforts at producing marking guides and running a marker co-ordination event with a few sample scripts, marks often differ. University regulations suggest that marks that differ by more than 15% or that straddle grade boundaries should be third marked, although these rules can be tweaked a bit.

With a shed load of third marking about to turn up tomorrow, and that needs to be turned round for next Tuesday, I thought I’d have a quick look at what information provided by the first two markers is available to support the third marking effort.

For the course I have to third mark, the mark recording tool we use – OSCAR (Online Score Capture for Assessment Records) – makes available the marks for each marker in a table that also identifies the marking categories. The mark scheme we have in this particular case has five unequally weighted categories, with 8 marks available in each. (Note that this means there are 64 marks in all, and a delta of 10 marks roughly equates to a 15% difference and an automatic sigdiff/third marking flag. If I am doing my sums right and have the grade boundaries about right, it also means two markers may give the same grade/classification (pass 1, pass 2, etc.) but still raise a sigdiff.)

To try to make it easier to see where significant differences were arising between two markers, I prototyped a simple spreadsheet based display that calculated the weighted marks and charted them in a couple of ways:

  • a dodged bar chart, by category, so that we could see which categories the markers differed in;
  • a stacked bar chart that shows the total score awarded by each marker.

The stacked bar chart is also overloaded with another bar that loosely identifies the grade boundaries. (The colours in this bar do not relate to the legend.) Ideally, I’d have used grade boundaries as vertical gridlines in the chart to make it clear which grade band a final mark fell into, but I’m not familiar with Excel charting and couldn’t see how to do that offhand. (Also, I guessed at where the grade boundaries are, so don’t read anything too much into the ones presented.)

(I also came across a gotcha in my version of Excel on a Mac… the charts don’t update when I paste the new data in. Instead I have to cut them and paste them back into the sheet, at which point they do update. WTF?!)

A couple of other things that should be quick to add to the prototype:

  • a statement of the grade awarded by each marker (pass 1, pass 2, fail), perhaps also qualified (strong pass 2 (at the top of the band), bare pass 3 (at the bottom of the band), solid pass 4 (in the middle of the band)), for example;
  • a statement of the average mark and the grade that would result. (One of the heuristics for awarding marks from markers that differ by a small amount is to use the average.)

I should probably also add a slot for third marker marks to be displayed…

More elaborate would be some rules to generate a brief text report that identifies which topics the markers differ significantly on, for example by how many awarded marks, and what this translates to in terms of weighted marks (or even percentage marks).

One reason for doing this is to try to make life easier – a report may not need completely remarking if the markers just differ in one particular respect, for example (which may even be the result of an error when entering the original marks). Such a tool may also be useful at an award board for getting a quick, visually informative view of how markers awarded marks to a particular script.

But this sort of tool may also help us start to understand better why and how markers are marking differently, and what sorts of change we might need to make to the marking scheme or marking guidance. (Seeing the differences in a particular category in a visual way often leaves you with a different feeling to seeing a table of the numerical marks.)

It also provides an environment for tinkering with some automated feedback generators, powered by the marks.

Of course, I’d rather be developing these views in Jupyter notebooks/dashboards, or R, and if we had easy access to the data it wouldn’t be hard to roll a simple app together. But as a digital first, cloud organisation, we get to view each set of double marks, one HTML page at a time.
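For example, here’s a rough sketch of how the two chart views might be put together in pandas/matplotlib, assuming the weighted category marks for the two markers were available in a dataframe (the category names, marks and grade boundaries below are all made up for illustration):

import pandas as pd
import matplotlib.pyplot as plt

#Hypothetical weighted marks for two markers across five categories
marks = pd.DataFrame({'Marker 1': [12, 9, 6, 10, 5],
                      'Marker 2': [8, 9, 4, 12, 7]},
                     index=['Category A', 'Category B', 'Category C',
                            'Category D', 'Category E'])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

#Dodged bar chart, by category, showing where the markers differ
marks.plot.bar(ax=ax1, title='Weighted marks by category')

#Stacked bar chart of the total score awarded by each marker
marks.T.plot.bar(stacked=True, ax=ax2, title='Total weighted mark per marker')

#Grade boundaries as horizontal gridlines rather than an overloaded extra bar
for boundary in [26, 35, 44, 53]:  #guessed boundaries, as in the Excel prototype
    ax2.axhline(boundary, linestyle='--', linewidth=0.5)

plt.tight_layout()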

PS I don’t think a scraper would be too hard to write to pull down the marker returns for each student on a course, which are handily all linked to from a single page, and pop them into a single dataframe…. Hypothetically, here’s how we might be able to get in, for example, using the python MechanicalSoup package, which works with python3 (mechanize requires python2)…

!pip3 install MechanicalSoup
import mechanicalsoup

#USERNAME, PASSWORD, LOGIN_URL, FORM_ID and INTRANET_URL to be set as appropriate

def getSession():
    browser = mechanicalsoup.StatefulBrowser()
    browser.open(LOGIN_URL)
    browser.select_form(FORM_ID) #in form: #loginForm
    browser["username"] = USERNAME
    browser["password"] = PASSWORD
    resp = browser.submit_selected()
    return browser

browser=getSession()

response=browser.open(INTRANET_URL)

Of course, this sort of thing probably goes against the Computing Code of Conduct… I’m also not sure if IT folk are paranoid enough to look for rapid bursts of machine generated requests and lock an account out if they spot it? But that’s not too hard to work around – just put a random delay in between page requests when running the scrape (which is a nice behaviour anyway).
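For what it’s worth, here’s a hedged sketch of how the per-student marker returns might be pulled into a single dataframe – the listing URL and the link selector are pure guesses at the page structure, and the random delay keeps the requests polite:

import time, random
import pandas as pd

#MARKS_LISTING_URL and the link selector below are hypothetical placeholders
browser.open(MARKS_LISTING_URL)
page = browser.get_current_page()

marks = pd.DataFrame()
for link in page.select('a.marker-return'):
    browser.open(browser.absolute_url(link['href']))
    #Grab the first HTML table on the marks page into a dataframe
    tmp = pd.read_html(str(browser.get_current_page()))[0]
    marks = pd.concat([marks, tmp])
    #Random delay between page requests
    time.sleep(random.uniform(2, 10))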

Innovation Starts At Home…?

Mention was made a couple of times last week in the VC’s presentation to the OU about the need to be more responsive in our curriculum design and course production. At the moment it can take a team of up to a dozen academics over two years to put an introductory course together, which is then intended to last, without significant change (other than in the preparation of assessment material), for five years or more.

The new “agile” production process is currently being trialled with a new authoring tool, OpenCreate, which is available to a few select course teams as a partially complete “beta”. I think it is “cloud” based. And maybe also promoting the new “digital” first strategy. (I wonder how many letters in the KPMG ABC bingo card consulting product the OU paid for, and how much per letter? Note: A may also stand for “analytics”.)

I asked if I could have a play with the OpenCreate tool, such as it is, last week, but was told it was still in early testing (so a good time to be able to comment, then?) and so, “no”. (So instead, I went back to one of the issues I’d raised a few days ago on somebody else’s project on Github to continue helping with the testing of a feature suggestion. That was a few days ago; the suggestion has already been implemented and the issue is now closed as completed, making my life easier and hopefully improving the package too. Individuals know how to do agile. Organisations don’t. ;-))

So why would I want to play with OpenCreate now, while it’s still flaky? Partly because I suspect the team are working on a UI and have settled elements of the backend. For all the f**kwitted nonsense the consultants may have been spouting about agile, beta, cloud, digital solutions, any improvements are going to come from the way the users use the tools. And maybe workarounds they find. And by looking at how the thing works, I may be able to explore other bits of the UI design space, and maybe even bits of the output space…

Years ago, the OU moved to an XML authoring route, defining an XML schema (OU-XML) that could be used to repurpose content for multiple output formats (HTML, ePub, Word/docx). By the by, these are all standardised document formats, which means other people also build tooling around them. The OU-XML document format was an internal standard. Which meant only the OU developed tools for it. Or people we paid. I’m not sure if, or how much, Microsoft were paid to produce the OU’s custom authoring extensions for Word that would output OU-XML, for example… Another authoring route was an XML editor (currently oXygen, I believe). OU-XML also underpinned OpenLearn content.

That said, OU-XML was a standard, so it was in principle possible for people who had knowledge of it to author tools around it. I played with a few myself, though they never generated much interest internally.

  • generating mind maps from OU/OpenLearn structured authoring XML documents: these provided the overview of a whole course and could also be used as a navigation surface (revisited here and here); I made these sorts of mindmaps available as an additional asset in the T151 short course, but they were never officially recognised;
  • I then started treating a whole set of OU-XML documents *as a database* which meant we could generate *ad hoc* courses on a particular topic by searching for keywords across OpenLearn courses and then returning a mindmap constructed around components in different courses, again displaying the result as a mindmap (Generating OpenLearn Navigation Mindmaps Automagically). Note this was all very crude and represented playtime. I’d have pushed it further if anyone internally had shown any interest in exploring this more widely.
  • I also started looking at ways of liberating assets and content, which meant we could perform OpenLearn Searches over Learning Outcomes and Glossary Items. That is, take all the learning outcomes from OpenLearn docs and search over them to find units with learning outcomes on a given topic. Or provide a “metaglossary” generated (for free) from glossary terms introduced in all OpenLearn materials. Note that I *really* wanted to do this as a cross-OU course content demo, but as the OU has become more digital, access to content has become less open. (You used to be able to look at complete OU print course materials in academic libraries. Now you need a password to access the locked down digital content; I suspect access expires for students after a period of time too; and it also means students can’t sell on their old course materials.)
  • viewing OU-XML documents as a structured database meant we could also asset strip OpenLearn for images, providing a search tool to look up images related to a particular topic. (Internally, we are encouraged to reuse previously created assets, but the discovery problem – helping authors discover what previously created assets are available – has never really been addressed; I’m not sure the OU Digital Archive is really geared up for this, either?)
  • we could also extract links from courses and use them as a course powered custom search engine. This wasn’t very successful at the course level (not enough links), but might have been interesting across multiple courses;
  • a first proof of concept pass at a tool to export OU-XML documents from Google docs, so you could author documents using Google docs and then upload the result into the OU publishing system.

Something that has also been on my to do list for a long time are templates to convert Rmd (Rmarkdown) and Jupyter notebook ipynb documents to OU-XML.
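On the Jupyter side, I imagine something along the following lines – a rough, untested sketch that assumes a hypothetical ou_xml.tpl Jinja template doing the real work of emitting OU-XML from the notebook structure:

import nbformat
from nbconvert.exporters import TemplateExporter

exporter = TemplateExporter()
exporter.template_file = 'ou_xml.tpl'  #hypothetical OU-XML Jinja template

nb = nbformat.read('notebook.ipynb', as_version=4)
body, resources = exporter.from_notebook_node(nb)

with open('notebook.xml', 'w') as f:
    f.write(body)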

So… if I could get to see the current beta OpenCreate tool, I might be able to see what document format authors were being encouraged to author into. I know folk often get the “woahh, too complicated…” feeling when reading OUseful.info blog posts*, but at the end of the day, whatever magic dreams folk have for using tech, it boils down to a few poor sods having to figure out how to do that using three things: code, document formats (which we might also view as data representations more generally) and transport mechanisms (things like http; and maybe we could also class things like database connections here). Transport moves stuff between stuff. Representations represent the stuff you want to move. Code lets you do stuff with the represented stuff, and also move it between other things that do black box transformations to it (for example, transforming it from one representation to another).

That’s it. (My computing colleagues might disagree. But they don’t know how to think about systems properly ;-)

If OpenCreate is a browser based authoring tool, the content stuff created by authors will be structured somehow, and possibly previewed somehow. There’ll also be a mechanism for posting the authored stuff into the OU backend.

If I know what (document) format the content is authored in, I can use that as a standard and develop my own demonstration authoring tools and routes around it on the input side. For example, a converter that converts Jupyter notebook, or Rmd, or Google docs authored content into that format.

If there is structure in the format (as there was in OU-XML), I can use that as a basis for exploring what might be done if we can treat the whole collection of OU authored course materials as a database and exploring what sorts of secondary products, or alternative ways of using that content, might be possible.

If the formats aren’t sorted yet, maybe my play would help identify minor tweaks that could make content more, or less, useful. (Of course, this might be a distraction.)

I might also be able to comment on the UI…

But is this likely to happen? Is it f**k, because the OU is an enterprise that’s sold corporate, enterprise IT thinking from muppets who only know “agile” (or is that “analytics”?), “beta”, “cloud” and “digital” as bingo terms that people pay handsomely for. And we don’t do any of them because nobody knows what they mean…

* So for example, in Pondering What “Digital First” and “University of the Cloud” Mean…, I mention things like “virtual machines” and “Docker” and servers and services. If you think that’s too technical, you know what you can do with your cloud briefings…

The OU was innovative because folk understood technologies of all sorts and made creative use of them. Many of our courses included emerging technologies that were examples of the technologies being taught in the courses. We ate the dogfood we were telling students about. Now we’ve put the dog down and just show students cat pictures given to us by consultants.

Pondering What “Digital First” and “University of the Cloud” Mean…

In a briefing to OU staff from senior management earlier this week, VC Peter Horrocks channelled KPMG consultants with talk of the OU becoming “digital first”, and reimagining itself as a “University of the Cloud”, updating the original idea of it being a “University of the Air” [Open University jobs at risk in £100m ‘root and branch’ overhaul].

I have no clear idea what “digital” means, nor what it would mean to be a “university of the cloud” (if things are cloudy, does that mean we can’t do blue skies thinking; or that there is a silver lining somewhere?!;-). But here are a few things I’d like to explore that are based on trends that I think have been emerging for the last few years (and which I can date from historical OUseful.info blog posts, both here and in the original ouseful archive, which dates back to 2005..)

From Applications to Apps

In recent years, we’ve seen a move away from installed software applications that are self contained and run, offline, on a host computer, and towards installed apps that often have tight integration to online services. Apps may run, in part, offline, but they prefer it when there’s a network connection. Online apps exist solely elsewhere and are accessed via a browser.

Where the code lives, and where data files are stored, has implications for the user. If you’re using an online app, you need a reliable network connection. If all you use are online apps, a tablet or a Chromebook are fine. This in turn impacts on providers of services that make use of software (such as the OU, for example). I’ve been wittering on for years that if all students have is a Chromebook, then we’re excluding them if our courses require them to have a computer onto which you can install a “traditional” software application. This tends to fall on deaf ears – two new level 1 courses, both currently in production, that don’t launch until later this year and next year, and that would typically be expected to have a life of several years, make use of desktop software installs. I suspect this is not university of the cloud style thinking.

So Browser First…

The view I have had for several years is that all software services we expect students to be able to access should be accessed via a browser. This frees us up to deliver applications onto the desktop that expose themselves to students via the browser, or deliver the services from a remote online host. This could be an OU delivered service (for example, via the OpenSTEM Lab), a third party delivered service (such as Azure Notebooks), or a service managed on a remote host by the student themselves (for example, the TM351 Amazon/AWS AMI we are testing at the moment).

For TM351, we took an early decision to use just such browser accessed tools for the course, in particular Jupyter notebooks and Open Refine (along with some other “headless” database services). For convenience, these were packaged inside a single virtual machine that could be installed on a “traditional” computer (Windows or Mac). Running the virtual machine exposed the services via the browser. A shared directory meant student files were kept on the host computer but could be accessed by the services running inside the VM. Although we did not explicitly provide support for students who only had access to a tablet or Chromebook that could not run the VM, a proof of concept solution using linked Docker containers that could be run on a cloud host was available as an emergency fall back.

In updating the TM351 VM for the next presentation of the course, we are also exploring making the VM available at least as an AWS (Amazon Web Services) machine instance (AMI), which would allow a student to run the VM, at their own cost, on a remote Amazon server and access the course software via their browser.

The applications that live inside the TM351 virtual machine have also been broken out into separate Docker containers. These can be combined and launched in a multitude of ways. For example, OpenRefine running on its own, Jupyter notebooks running on their own, Jupyter notebooks + PostgreSQL running in a linked fashion, Jupyter notebooks + MongoDB running in a linked fashion. The use of Docker means that the services can also be run locally on an offline student computer running Docker, or they can be run on a remote server (that is, in the cloud) and accessed via a browser. This approach means we can continue to provide software to students that runs on their own computer, but we can also provide exactly that same software from a remote host that lives “in the cloud”.

A couple of other examples of using VMs do exist in a couple of computing courses, but from what I can tell there is little interest in trying to push our thinking about how virtualised computing can be used to support either computing courses or other courses with computing needs. The part of the OU that provides “digital” support to the OU has little, if any, experience in providing cloud services, and to date there seems to have been little, if any, capacity in trying to explore this area. Digital. Cloud. Hmmm…

The use of containerised services can also extend outside the computing curriculum to other subject areas. I’ve tried to float the idea of a “digital applications shelf” in the Library that publishes standalone, virtualised (containerised) services, along with scripts for combining them (example, or as in the case of linked TM351 applications) but never really got very far with it. It only takes a little bit of imagination to see how this might work (a Dockerhub image shelf, a repository of Docker compose scripts that can wire containers created from images pulled off the shelf together), but that imagination, again, seems to be lacking. (I could be spectacularly and completely wrong, of course!;-)

What I don’t think we should be doing is making remote desktops available to students that run installed software applications (I don’t think we should give them access to a remote Windows desktop, for example, running the Windows installed software we might traditionally have developed). We should be using software that runs as a service and is delivered directly through the browser. Service based, personal computing in the cloud.

As well as providing software that students can run themselves, there’s also the question of students being provided directly with OU hosted (or at least, OU badged) online applications. We started to have some early discussions internally about a “computational wing” for the OpenSTEM Lab, but that appears to have stalled. My personal roadmap for that would be to start off by making use of a couple of open source programming environments that can be accessed via a browser and that can be run at scale (Jupyter notebooks, RStudio Server, Shiny Server). This would give us operational experience in installing and managing this sort of service – or we could pay someone else to do it… Between them, these three environments support a wide variety of computational activities. Shiny Server, and Jupyter Dashboards, also provide a means for rapidly developing and publishing small interactive applications. Shiny server has already been used in at least one course to deliver some simple interactive applications created by a self-admitted not-very-technical academic.

Exploring these technologies can also support self-service research computing, although I got the feeling that non-teaching related research may be losing support… (That said, I don’t see many/any folk researching “cloud/digital infrastructure and workflows”, or emerging trends in personal computing tech and end user application wrangling, for teaching purposes or otherwise…)

Digital Production Methods

I know I’m biased, but I think the OU is way behind the curve in document creation and production methods. Over the last two or three years, reproducible research methods have spawned a range of innovations in supporting the creation and publication of interactive and “generated” content (examples). This speeds up production and cuts down on maintenance. Documents carry the “source code” for the creation of media assets contained within them, and produce assets that reflect the state of other parts of the document around them. This avoids problems of drift between things like code fragments and the outputs they produce, as well as syntax errors introduced as part of the editing process.

The depiction of media assets as computational objects also means they can be restyled by applying a different stylistic theme to the asset without changing the actual content (example1, example2).

The ability to generate both static and interactive outputs from the same media asset object is also very powerful. For example, the mpld3 python package can take a matplotlib chart object that would naturally be rendered using an image format and generate an interactive HTML chart from it – no extra work required.
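By way of a trivial example, the same figure object can be saved as a static image asset or rendered as interactive HTML:

import matplotlib.pyplot as plt
import mpld3

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [3, 1, 4, 2])

fig.savefig('chart.png')             #static image asset
html_chart = mpld3.fig_to_html(fig)  #interactive HTML version of the same chart object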

Helper libraries (with customisable templates) also mean that it can be quite straightforward to generate complex, templated interactive code. I may not know how to write the code to publish an interactive Google map, but I don’t have to when there are any number of packages out there that will create the code for me, and put a marker on the map in the appropriate place if I give them a location.
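For example, the folium package (Leaflet maps rather than Google Maps, but the same principle) will generate the embedded map and marker from little more than a location:

import folium

#Approximate location for Walton Hall, Milton Keynes - illustrative only
location = [52.025, -0.709]

m = folium.Map(location=location, zoom_start=15)
folium.Marker(location, popup='The Open University').add_to(m)
m.save('map.html')  #self-contained interactive HTML map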

Publishing workflows, such as the ones based around the R knitr package or Jupyter notebook nbconvert tool also mean that source documents represented using a simple text format (markdown) can be rendered in a variety of styles and document formats (HTML pages, PDF, .docx, HTML slideshows).

At one point I started to explore Jupyter based workflows, but FutureLearn head in the sand + OU production process fascism put a rapid halt to that. Can it really be nearly 3 years since I used knitr to first publish my Wrangling F1 Data With R book to Leanpub?! (That was motivated originally purely as a way of exploring how to go from RMarkdown to a print publication in an automated way.)

I’m not sure at all what the new OU OpenCreate tool looks like, or supports, or how the workflow flows (I asked for a beta invite… still no reply…) but I wonder if any of the team even looked at things like the knitr workflow and if so, what they thought of it and how the OpenCreate workflow compares? And to what extent asset creation and whole-content maintenance plays an integrated part in it? I also wonder how “digital” it is…

Related, in sense of “digital first” – via Cameron Neylon: As a researcher…I’m a bit bloody fed up with Data Management.

PS The cuts that aren’t cuts because some of the money will be spent elsewhere may also mean I might need a new job soon. FWIW, I tend to work from home, swear a lot on Twitter about whatever I happen to be thinking about at the time, and am not a team player. I am convinced my imposter syndrome is actually the Dunning-Kruger effect in play. Skillset: quirky.

Using Jupyter Notebooks For Assessment – Export as Word (.docx) Extension

One of the things we still haven’t properly worked out in our Data management and analysis (TM351) course is how best to handle Jupyter notebook based assignments. The assignments are set using a notebook that describes the tasks to be completed, and that the student then completes. We then need some mechanism for:

  • students to submit the assessment electronically;
  • markers mark assessments for their students: if the document contains a lot of OU text, it can be hard for the marker to locate the student text;
  • markers may provide on-script feedback; this means the marker needs to be able to edit the document and make changes/annotations.
  • markers return scripts to students;
  • students read feedback – so they need to be able to locate and distinguish the marker feedback within the document.

One Frankenstein process we tried was for students to save a Jupyter notebook file as a Markdown or HTML document and then convert it to a Microsoft Word document using pandoc.

This document could then be submitted and marked in a traditional way, with markers using comments and track changes to annotate the student script. Unfortunately, our original 32 bit VM meant we had to use an old version of pandoc, with the result that tabular data was not handled at all well in the conversion-to-Word process.

Updating to a 64 bit virtual machine means we can update pandoc, and the Word document conversion is now much smoother. However, the conversion process still requires students to export the notebook as HTML and then use pandoc to convert the HTML to the Microsoft Word .docx format. (The Jupyter nbconvert utility does not currently export to Word.)
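The conversion step itself is little more than a single pandoc call, which could equally well be wrapped in a line or two of Python (assuming pandoc is installed on the path):

import subprocess

#Convert the exported HTML version of the notebook to a .docx file
subprocess.run(['pandoc', 'assignment.html', '-o', 'assignment.docx'], check=True)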

So to make things a little easier, here’s my first attempt at a Download Jupyter Notebook as Word (.docx) extension to do just that. It makes use of the Jupyter notebook custom bundler extensions API, which allows you to add additional options to the notebook File -> Download menu option. The code I used was also cribbed from the dashboards_bundlers package, which converts a notebook to a dashboard and then downloads it.

One thing it doesn’t handle at the moment are things like embedded interactive maps. I’ve previously come up with a workaround for generating static images of interactive maps created using the folium package by using selenium to render the map and grab a screenshot of it; I’m not sure if that would work in our headless VM, though? (One to try, I guess?) There’s also a related thread in the folium repo issue tracker.
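Something along the following lines gives the flavour of the bundler module – treat this as a sketch of the idea (notebook to HTML via nbconvert, HTML to .docx via pandoc, then returned as a download) rather than the finished extension:

import os
import subprocess
import tempfile

import nbformat
from nbconvert import HTMLExporter


def _jupyter_bundlerextension_paths():
    """Declare the bundler so it appears in the File -> Download menu."""
    return [{
        'name': 'wordexport',
        'label': 'MS Word (.docx)',
        'module_name': 'wordexport.wordexport',
        'group': 'download'
    }]


def bundle(handler, model):
    """Convert the notebook model to .docx and return it to the browser."""
    name = os.path.splitext(os.path.basename(model['name']))[0]
    nb = nbformat.from_dict(model['content'])

    #Notebook -> HTML
    html_body, _ = HTMLExporter().from_notebook_node(nb)

    with tempfile.TemporaryDirectory() as tmpdir:
        html_file = os.path.join(tmpdir, name + '.html')
        docx_file = os.path.join(tmpdir, name + '.docx')
        with open(html_file, 'w') as f:
            f.write(html_body)

        #HTML -> .docx via pandoc (assumed to be installed in the VM)
        subprocess.run(['pandoc', html_file, '-o', docx_file], check=True)

        with open(docx_file, 'rb') as f:
            data = f.read()

    handler.set_header('Content-Disposition',
                       'attachment; filename="{}.docx"'.format(name))
    handler.set_header('Content-Type',
                       'application/vnd.openxmlformats-officedocument.wordprocessingml.document')
    handler.finish(data)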

The above script is placed in a wordexport folder inside a package folder containing a simple setup.py script:

from setuptools import setup

setup(name='wordexport',
      version='0.0.1',
      description='Export Jupyter notebook as .docx file',
      author='Tony Hirst',
      author_email='tony.hirst@open.ac.uk',
      license='MIT',
      packages=['wordexport'],
      zip_safe=False)

The package can be installed and the extension enabled using a riff along the lines of the following command-line commands:

echo "...wordexport install..."
#Install the wordexport (.docx exporter) extension package
pip3 install --upgrade --force-reinstall ${THISDIR}/jupyter_custom_files/nbextensions/wordexport

#Enable the wordexport extension
jupyter bundlerextension enable --py wordexport.wordexport  --sys-prefix
echo "...wordexport done"

Restart the Jupyter server after enabling the extension, and the result should be a new MS Word (.docx) option in the notebook File -> Download menu option.

First Attempt at Running the TM351 VM as an AMI on Amazon Web Services

One of the things that’s been on my to do list for ages is trying to get a version of the TM351 virtual machine (VM) up and running on Amazon Web Services (AWS) as an Amazon Machine Instance (AMI). This would allow students who are having trouble running the VM on their own computer to access the services running in the cloud.

(Obviously, it would be preferable if we could offer such a service via OU operated servers, but I can’t do politics well enough, and don’t have the mentality to attend enough of the necessary say-the-same-thing-again-again meetings, to make that sort of thing happen.)

So… a first attempt is up on the eu-west-1 region in all its insecure glory: TM351 AMI v1. The security model is by obscurity as much as anything – there’s no model for setting separate passwords for separate students, for example, or checking back against an OU auth layer. And I suspect everything runs as root…

(One of the things we have noticed in (brief) testing is that the Getting Started instructions don’t work inside the OU, at least if you try to limit access to your (supposed) IP address. Reminds me of when we gave up trying to build the OU VM from machines on the OU network, because solving proxy and blocked port issues was an irrelevant problem to have to worry about when working from the outside…)

Open Refine doesn’t seem to want to run alongside the other services in the free tier micro (1GB) machine instance, but at 2GB everything seems okay. (I don’t know if possible race conditions in starting services mean that Open Refine could start and then block the Jupyter service’s request for resource. I need to do an Apollo 13 style startup sequence exploration to see if all the services can run in 1GB, I guess!) One thing I’ve added to the to do list is to split things out into separate AMIs that will work on the 1GB free tier machines. I also want to check that I can provision the AMI from Vagrant, so students could then launch a local VM or an Amazon instance that way, just by changing the vagrant provider. (Shared folders/volumes might get a bit messed up in that case, though?)

If services can run one at a time in the 1GB machines, it’d be nice to provide a simple dashboard to start and stop the services to make that easier to manage. Something that looks a bit like this, for example, exposed via an authenticated web page:

This needn’t be too complex – I had in mind a simple Python web app that could run under nginx (which currently provides a simple authentication layer for Open Refine to sit behind) and then just runs simple systemctl start, stop and restart commands on the appropriate service.

#fragment...
import os
os.system('systemctl restart jupyter.service')
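Fleshed out only slightly, and picking Flask purely for the sake of example (nothing here is settled, and the service names other than jupyter are guesses), the whole thing might look something like this, with nginx handling the authentication in front of it:

import subprocess
from flask import Flask, abort, jsonify

app = Flask(__name__)

#Whitelisted services and actions - anything else 404s
SERVICES = ['jupyter', 'refine', 'postgresql', 'mongodb']
ACTIONS = ['start', 'stop', 'restart']


@app.route('/service/<name>')
def status(name):
    if name not in SERVICES:
        abort(404)
    #systemctl is-active exits with 0 if the service is running
    running = subprocess.run(['systemctl', 'is-active', '--quiet',
                              '{}.service'.format(name)]).returncode == 0
    return jsonify({'service': name, 'running': running})


@app.route('/service/<name>/<action>')
def control(name, action):
    if name not in SERVICES or action not in ACTIONS:
        abort(404)
    subprocess.run(['systemctl', action, '{}.service'.format(name)], check=False)
    return status(name)


if __name__ == '__main__':
    app.run(port=8899)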

I’m not sure how the status display should be updated (based on whether a service is running or not), or on what heartbeat it should update. There may be better ways, of course, in which case please let me know via the comments :-)

I did have a quick look round for examples, but the dashboards/monitoring tools that do exist, such as pydash, are far more elaborate than what I had in mind. (If you know of a simple example to do the above, or can knock one up for me, please let me know via the comments. And the simpler the better ;-)

If we are to start exploring the use of browser accessed applications running inside user-managed VMs, this sort of simple application could be really handy… (Another approach would be to use a VM running docker, and then have a container manager running, such as portainer.)