Category: Tinkering

Student Workload Planning – Section Level Word Count Reports in MS Word Docs

One of the things the OU seems to have gone in for big time lately is “learning design”, with all sorts of planning tools and who knows what to try and help us estimate student workloads.

One piece of internal research I saw suggested that we “adopt a University-wide standard for study speed of 35 words per minute for difficult texts, 70 words per minute for normal texts and 120 words per minute for easy texts”. This is complemented by a recommended level 1 (first year equivalent) 60:40 split between module-directed (course text) work and student-directed (activities, exercises, self-assessment questions, forum activity etc) work. Another constraint is the available study time per week – for a 30 CAT point course (300 hours study), this is nominally set at 10 hours study per week. I seem to recall that retention charts show that retention rates go down as mean study time goes up anywhere close to this…

One of the things that seems to have been adopted is the assumption that the first year equivalent study material should all be rated at the 35 words per minute level. For 60% module led work, at 10 hours a week, this gives 35 * 60 * 6 = 12,600 words of reading per week. With novels coming in around 500 words a page, that’s 25 pages of reading or so.
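As a quick sanity check on that sort of arithmetic, here is a minimal Python sketch. The function names and default values are my own assumptions, taken from the figures quoted above (35 words per minute, 10 hours a week, a 60% module-directed share, 500 words a page).

```python
def weekly_reading_words(wpm=35, hours_per_week=10, directed_fraction=0.6):
    """Words readable in the module-directed share of a study week."""
    directed_minutes = hours_per_week * 60 * directed_fraction
    return round(wpm * directed_minutes)


def pages(words, words_per_page=500):
    """Rough page count at a given number of words per page."""
    return words / words_per_page


words = weekly_reading_words()
print(words, "words, or about", round(pages(words)), "pages")
```

Swapping in wpm=70 or wpm=120 shows how much the weekly budget changes if the material is rated as normal or easy text instead.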

This is okay for dense text but we tend to write with a strong narrative, using relatively straightforward prose, explaining things a step at a time, with plenty of examples. Dense sentences are rewritten and the word count goes up (but not the reading rate… Not sure I understand that?)

As part of the production process, materials go through multiple drafts and several stages of critical reading by third parties. Part of the critical reading process is to estimate (or check) workload. To assist this, materials are chunked and should be provided with word counts and estimated study times. The authoring process uses Microsoft Word.

As far as I can tell, there is an increasing drive to segment all the materials and chunk them all to be just so, one more step down the line towards rigidly templated materials. For a level 1 study week, the template seems to be five sections per week with four subsections each, each subsection about 500 words or so. (That is, 10 to 20 blog posts per study week…;-)

I’m not sure what, if any, productivity tools there are to automate the workload guesstimates, but over coffee this morning I thought I’d have a go at writing a Visual Basic macro to do some of the counting for me. I’m not really familiar with VB, in fact, I’m not sure I’ve ever written a macro before, but it seemed to fall together okay if the document was structured appropriately.

To wit, the structure I adopted was: a section break to separate each section and subsection (which meant I could count words in each section); a heading as the first line after a section break (so the word count could be associated with the (sub)section heading). This evening, I also started doodling a convention for activities, where an activity would include a line on its own of the form – Estimated study time: NN minutes – which could then be used as a basis for an activity count and an activity study time count.
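Outside Word, the activity convention is easy to test with a few lines of Python. This is only a sketch of the idea, not the macro itself: the paragraphs are assumed to have been extracted into a list of strings, and the pattern simply mirrors the "Estimated study time: NN minutes" line described above.

```python
import re

# Assumed convention: a line on its own of the form
# "Estimated study time: NN minutes"
ACTIVITY_RE = re.compile(r"^\s*Estimated study time:\s*(\d+)\s*minutes\s*$")


def activity_times(paragraphs):
    """Return the study time, in minutes, of each activity line found."""
    times = []
    for para in paragraphs:
        match = ACTIVITY_RE.match(para)
        if match:
            times.append(int(match.group(1)))
    return times


# Made-up paragraphs; the trailing \r mimics a Word paragraph mark
sample = [
    "Section 1 Introduction",
    "Some narrative text explaining things a step at a time.",
    "Estimated study time: 15 minutes",
    "More narrative text.",
    "Estimated study time: 30 minutes\r",
]
times = activity_times(sample)
print(len(times), "activities,", sum(times), "minutes")
```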

Running the macro generates a pop up report and also inserts the report at the cursor insertion point. The report for a section looks something like this:


A final summary report also gives the total number of words.

It should be easy enough to also insert wordcounts into the document at the start of each section, though I’m not sure (yet) how I could put a placeholder in at the start of each section that the macro could update with the current wordcount each time I run it? (Also how the full report could just be updated, rather than appended to the document, which could get really cluttered…) I guess I could also create a separate Word doc, or maybe populate an Excel spreadsheet, with the report data.

Another natural step would be to qualify each subsection with a conventional line declaring the estimated reading complexity level, detecting this, and using it with a WPM rate to estimate the study time of the reading material. Things are complicated somewhat by my version of Word (on a Mac) not supporting regular expressions, but then, in the spirit of trying to build tools at the same level of complexity as the level at which we’re teaching, regex are probably out of scope (too hard, I suspect…)
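For what it's worth, here is how that step might look as a Python sketch. The "Reading level:" line is a hypothetical convention of my own, and the words per minute rates are the ones quoted from the internal research earlier in the post.

```python
import re

# Study speeds (words per minute) quoted earlier in the post
WPM = {"difficult": 35, "normal": 70, "easy": 120}

# Hypothetical convention: a line on its own such as "Reading level: normal"
LEVEL_RE = re.compile(r"^\s*Reading level:\s*(difficult|normal|easy)\s*$",
                      re.IGNORECASE)


def reading_minutes(text, default_level="difficult"):
    """Estimate reading time for a subsection from its declared level."""
    level = default_level
    body_words = 0
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            level = match.group(1).lower()
        else:
            # Only count words on lines that are not the level declaration
            body_words += len(line.split())
    return body_words / WPM[level]


subsection = "Reading level: normal\n" + " ".join(["word"] * 700)
print(reading_minutes(subsection), "minutes")
```

If no level line is found, the sketch falls back to the "difficult" rate, which matches the assumption that level 1 material is all rated at 35 words per minute.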

To my mind, exploring such productivity tools is the sort of thing we should naturally do; at least, it’s the sort of thing that felt natural in a technology department. Computing seems different; computing doesn’t seem to be about understanding the technical world around us and getting our hands dirty with it. It’s about… actually, I’m not sure what it’s about. The above sketch really was a displacement activity – I have no misconceptions at all that the above will generate any interest at all, not even as a simple daily learning exercise (I still try to learn, build or create something new every day to keep the boredom away…) In fact, the “musical differences” between my view of the world and pretty much everyone else’s is getting to the stage where I’m not sure it’s tenable any more. The holiday break can’t come quickly enough… Roll on HoG at the weekend…

Sub WordCount()

    Dim NumSec As Integer
    Dim S As Integer
    Dim P As Integer
    Dim Summary As String

    Dim SubsectionCnt As Integer
    Dim SubsectionWordCnt As Long
    Dim SectionText As String

    Dim ActivityTime As Integer
    Dim OverallActivityTime As Integer
    Dim SectionActivities As Integer

    Dim ParaText As String

    Dim ActivityTimeStr As String

    ActivityTime = 0
    OverallActivityTime = 0
    SectionActivities = 0

    SubsectionCnt = 0
    SubsectionWordCnt = 0

    NumSec = ActiveDocument.Sections.Count
    Summary = "Word Count" & vbCrLf

    For S = 1 To NumSec
        'The first paragraph after a section break is taken to be the heading
        SectionText = ActiveDocument.Sections(S).Range.Paragraphs(1).Range.Text

        'A heading starting with "Section" closes the previous top-level section,
        'so report on it and reset the running totals before counting this one
        If InStr(SectionText, "Section") = 1 Then
            OverallActivityTime = OverallActivityTime + ActivityTime
            Summary = Summary & vbCrLf & "SECTION SUMMARY" & vbCrLf _
            & "Subsections: " & SubsectionCnt & vbCrLf _
            & "Section Wordcount: " & SubsectionWordCnt & vbCrLf _
            & "Section Activity Time: " & ActivityTime & vbCrLf _
            & "Section Activity Count: " & SectionActivities & vbCrLf & vbCrLf
            SubsectionCnt = 0
            SubsectionWordCnt = 0
            ActivityTime = 0
            SectionActivities = 0
        End If

        'Look for activity lines of the form "Estimated study time: NN minutes"
        For P = 1 To ActiveDocument.Sections(S).Range.Paragraphs.Count
            ParaText = ActiveDocument.Sections(S).Range.Paragraphs(P).Range.Text
            If InStr(ParaText, "Estimated study time:") > 0 Then
                ActivityTimeStr = ParaText
                ActivityTimeStr = Replace(ActivityTimeStr, "Estimated study time: ", "")
                ActivityTimeStr = Replace(ActivityTimeStr, " minutes", "")
                'Val reads the leading number and ignores the trailing paragraph mark
                ActivityTime = ActivityTime + CInt(Val(ActivityTimeStr))
                SectionActivities = SectionActivities + 1
            End If
        Next P

        Summary = Summary & "[Document Section " & S & "] " _
        & SectionText _
        & "Word count: " _
        & ActiveDocument.Sections(S).Range.Words.Count _
        & vbCrLf

        SubsectionCnt = SubsectionCnt + 1
        SubsectionWordCnt = SubsectionWordCnt + ActiveDocument.Sections(S).Range.Words.Count
    Next S

    'Include the activity time from the final section in the overall total
    OverallActivityTime = OverallActivityTime + ActivityTime

    Summary = Summary & vbCrLf & vbCrLf & "Overall document wordcount: " & _
        ActiveDocument.Words.Count

    Summary = Summary & vbCrLf & "Activity Time: " & OverallActivityTime & " minutes"
    MsgBox Summary

    Selection.Paragraphs(1).Range.InsertAfter vbCr & Summary & vbCrLf
End Sub

PS I’ve no idea what idiomatic VB is supposed to look like; all the examples I saw seemed universally horrible… If you can give me any pointers to cleaning the above code up, feel free to add them in the comments…

PPS Thinks… I guess each section could also return a readability score? Does VB have a readability score function? VB code anywhere implementing readability scores?

Getting Started With Neo4j and Companies House OpenData

One of the things that’s been on my to do list for ages has been to start playing with the neo4j graph database. I finally got round to having a dabble last night, and made a start trying to figure out how to load some sample data in.

The data I looked at came in two flavours, both bulk data downloads from Companies House: a JSON dataset containing beneficial ownership/significant control data, and a tabular, CSV dataset containing basic company information.

To simplify running neo4j, I created a simple docker-compose.yml file that would fire up a couple of linked containers – one running neo4j, the other running a Jupyter notebook that I could run queries from. (Actually, I think neo4j has its own web UI, but I’m more comfortable in writing Python scripts in the Jupyter environment.)

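#visit 7474 and change the default password - eg to: neo4jch
neo4jch:
  image: neo4j
  ports:
    - "7474:7474"
    - "1337:1337"
  volumes:
    - /opt/data

#the name of this service was lost from the listing; jupyter is a placeholder
jupyter:
  image: jupyter/scipy-notebook
  ports:
    - "8890:8888"
  links:
    - neo4jch:neo4j
  volumes:
    - ./notebooks:/home/jovyan/work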

To launch things, I tend to run Kitematic, launch a docker command line, cd to the directory containing the above YAML file, then run docker-compose up -d. Kitematic then provides links to the neo4j and Jupyter web page UIs. One thing to note is that neo4j seems to want its default password changing – go to the container’s page on port 7474 and reset the password – I changed mine to neo4jch. Once launched, the containers can be suspended with the command docker-compose stop and resumed with docker-compose start.

I’ve popped an example notebook up here, along with a couple of sample data files, that shows how to load both sorts of data (the hierarchical JSON data, and the flat CSV table) into neo4j, along with a couple of sample queries.

That said, I’m not sure how good the examples are – I still need to read the documentation! (For example, via @markhneedham, “MERGE is MATCH/CREATE so you can use the same query on new/existing companies” which should let me figure out how to properly create company information nodes and then link to them from beneficial owners.)

Here are some examples of my starting attempts at the data ingest. Firstly, for JSON data that looks like this:

{
  "company_number": "09145694",
  "data": {
    "address": {
      "address_line_1": "****",
      "locality": "****",
      "postal_code": "****",
      "premises": "****",
      "region": "****"
    },
    "country_of_residence": "England",
    "date_of_birth": {
      "month": *,
      "year": *
    },
    "etag": "****",
    "kind": "individual-person-with-significant-control",
    "links": {
      "self": "/company/09145694/persons-with-significant-control/individual/bIhuKnMFctSnjrDjUG8n3NgOrlU"
    },
    "name": "***",
    "name_elements": {
      "forename": "***",
      "middle_name": "***",
      "surname": "***",
      "title": "***"
    },
    "nationality": "***",
    "natures_of_control": [ ... ],
    "notified_on": "2016-04-06"
  }
}

The following bit of Cypher script seems to load the data in:

import codecs
import json

#assumes graph is a py2neo Graph object already connected to the neo4j container
with codecs.open('snapshot_beneficialsmall.txt', 'r', 'utf-8-sig') as f:
    for line in f:
        jdata = json.loads(line)
        query = """
WITH {jdata} AS jd
MERGE (beneficialowner:BeneficialOwner {name: jd.data.name}) ON CREATE
  SET beneficialowner.nationality = jd.data.nationality, beneficialowner.country_of_residence = jd.data.country_of_residence
MERGE (company:Company {companynumber: jd.company_number})
MERGE (beneficialowner)-[:BENEFICIALOWNEROF]->(company)
FOREACH (noc IN jd.data.natures_of_control | MERGE (beneficialowner)-[:BENEFICIALOWNEROF {kind:noc}]->(company))
"""
        graph.run(query, jdata = jdata)

For the CSV data, I tried the following recipe:

import csv
#Ideally, we create a company:Company node with a company either here
#and then link to it from the beneficial ownership data?
#assumes graph is a py2neo Graph object already connected to the neo4j container
with open('snapshotcompanydata.csv','r') as csvfile:
    #need to clean the column names by stripping whitespace
    reader = csv.DictReader(csvfile,skipinitialspace=True)
    for row in reader:
        query = """
        WITH {row} AS row
        MERGE (company:Company {companynumber: row.CompanyNumber}) ON CREATE
        // the property name was lost from the listing; company.name is a placeholder
        SET company.name = row.CompanyName

        MERGE (address:Address {postcode : row["RegAddress.PostCode"]}) ON CREATE
        SET address.line1=row['RegAddress.AddressLine1'], address.line2=row['RegAddress.AddressLine2']
        MERGE (company)-[:LOCATION]->(address)

        MERGE (companyactivity:SICCode {siccode:row['SICCode.SicText_1']})
        MERGE (company)-[:ACTIVITY]->(companyactivity)
        """
        graph.run(query, row = row)

Note the way that “dotted” column names are handled.

What these early experiments suggest is that I should probably spend a bit of time trying to model the data to work out what sort of graph structure makes sense. My gut reaction was to define node types identifying beneficial owners, companies and SIC codes. Differently attributed BENEFICIALOWNEROF edges identify what sort of control a beneficial owner has.
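To make that gut-reaction model a bit more concrete, here is a small Python sketch that maps one beneficial ownership record onto edges, without touching the database. The function name and the tuple layout are my own assumptions, and the record is made up, though shaped like the JSON example earlier in the post.

```python
def psc_to_edges(record):
    """Map one record to (owner, relation, company, attributes) tuples,
    one edge per declared nature of control."""
    owner = record["data"]["name"]
    company = record["company_number"]
    return [
        (owner, "BENEFICIALOWNEROF", company, {"kind": noc})
        for noc in record["data"].get("natures_of_control", [])
    ]


# Made-up record for illustration only
record = {
    "company_number": "00000000",
    "data": {
        "name": "A N Other",
        "natures_of_control": [
            "ownership-of-shares-75-to-100-percent",
            "voting-rights-75-to-100-percent",
        ],
    },
}
for edge in psc_to_edges(record):
    print(edge)
```

Sketching the edges like this first makes it easier to see whether the kind attribute belongs on the edge, as here, or whether each sort of control should be an edge type in its own right.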


However, for generality, I think I should define a more general person node, who could also have DIRECTORROLE edges linking them to companies with attributes corresponding to things like “director”, “company secretary”, “nominee director” etc? (I don’t think director information is available as a download from Companies House, but it could be accreted/cached into my own database each time I look up director information via the Companies House API.)

A couple of other things that need addressing: constraints (so for example, we should only have one node per company number – the correlate of company numbers being a unique key in a relational datatable; via @markhneedham, s/thing like CREATE CONSTRAINT ON (c:Company) ASSERT c.companynumber IS UNIQUE maybe…); and indexes – it would probably make sense to create an index on something like company numbers, for example.

Next on the to do list, some example queries on the data as I currently have it modelled to see what sorts of question we can ask and what sorts of network we can extract (I may need to add in more than the sample of data – which means I may also need to look at optimising the way the data is imported?). This might also inform how I should be modelling the data!;-)

Related: Trawling the Companies House API to Generate Co-Director Networks.

See also: Getting Started With the Neo4j Graph Database – Linking Neo4j and Jupyter SciPy Docker Containers Using Docker Compose and Accessing a Neo4j Graph Database Server from RStudio and Jupyter R Notebooks Using Docker Containers.

PS also via @markhneedham, one to explore when eg annotating a pre-existing node with additional attributes from a new dataset, something along lines of MERGE (c:Company {…}) SET c.newProp1 = “boo”, c.newProp2 = “blah” etc…

A First Attempt at An Amazon Echo Alexa Skills App Using Python: Parlibot, A UK Parliament Agent

Over the last couple of years, I’ve been dabbling with producing simple textual reports from datasets that can be returned in response to simple natural language style queries using chat interfaces such as Slack (for example, Sketching a Slack Slash Parliamentary Auto-Responder Using AWS Lambda Functions). The Amazon Echo, which launches in the UK at the end of September, provides another context for publishing natural language style responses, in this case in the form of spoken responses to spoken requests.

In the same way that apps brought a large amount of feature level functionality to mobile phones, the Amazon Echo provides an opportunity for publishers to develop “skills” that can respond to particular voice commands issued within hearing of the Echo. Amazon is hopeful that one class of commands – Smart Home Skills – will be used to bootstrap a smart home ecosystem that allows you to interact with smart-home devices through voice commands, such as commands to turn your lights on and off, or questions about the status of your home (“is the garage door still open?”, for example). Another class of services relate to more general information based services, or even games, which can be developed using a second API environment, the Alexa Skills Kit. For a full range of available skills, see the Alexa Skills Store.

The Alexa Skills Kit has a similar sort of usability to other AWS services (i.e. it’s a bit rubbish…), but I thought I’d give it a go repurposing some old functions around the UK Parliament API, such as finding out which committees a particular MP sits on, or who are the members of a particular committee, as well as some new ones.

For example, I thought it might be amusing to try to implement a skill that could respond to questions like the following :

  • what written statements were published last week?
  • were there any written statements published last Tuesday?

using some of the “natural language” date-related Python functions I dabbled with yesterday.

One of the nice things about the Alexa Skills API is that it also supports conversational contexts. For example, an answer to one of the above questions (generated by my code) might take the form “There were 27 written statements published then”, but session state associated with that response can also be passed back as metadata to the Alexa service, and then returned from Alexa as session metadata attached to a follow-up question. The answer to the follow-up question can then draw on context generated earlier in the conversation. So for example, exchanges such as the following now become possible:

  • Q: were there any written statements published last Tuesday?
  • A: There were 27 written statements published then. Do you want to know them all?
  • Q: No, just the ones from DCLG.
  • A: Okay, there were three written statements issued by the Department for Communities and Local Government last week. One on …. by….; etc etc 
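Stripped of any Alexa machinery, the round trip of session state in an exchange like that can be sketched in plain Python. Everything here is made up for illustration: the handler names, the STATEMENTS data and the use of a simple dict to stand in for the session metadata.

```python
# Made-up data standing in for a real Parliament API lookup
STATEMENTS = [
    {"dept": "DCLG", "title": "Statement A"},
    {"dept": "DCLG", "title": "Statement B"},
    {"dept": "DCMS", "title": "Statement C"},
]


def written_statements(session, period):
    """First turn: answer the question and stash state for any follow-up."""
    session["period"] = period
    session["typ"] = "WrittenStatementIntent"
    text = ("There were {} written statements published then. "
            "Do you want to know them all?").format(len(STATEMENTS))
    return text, session


def limit_by_dept(session, dept):
    """Follow-up turn: reuse the period remembered from the first turn."""
    matches = [s["title"] for s in STATEMENTS if s["dept"] == dept]
    text = "There were {} written statements from {} {}: {}.".format(
        len(matches), dept, session["period"], ", ".join(matches))
    return text, session


session = {}
answer, session = written_statements(session, "last Tuesday")
print(answer)
answer, session = limit_by_dept(session, "DCLG")
print(answer)
```

The point is that the second handler never sees the original question; it only knows the period because it came back with the session.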

So how can we build an Alexa Skill? I opted for implementing one using Python, with the answer engine running on my Reclaim Hosting webserver rather than as an AWS Lambda Function, which I think Amazon would prefer. (The AWS Lambda functions are essentially free, but it means you have to go through the pain of using another AWS service.) For an example of getting a Python application up and running on your own web host using cPanel, see here.

To make life simpler, I installed the Flask-ASK library (docs), which extends the Flask web application framework so that it plays nicely with the Alexa Skills API. (There’s a standalone tutorial that runs without the need for any web hosting described here: Flask-Ask: A New Python Framework for Rapid Alexa Skills Kit Development.)

The Flask-Ask library allows you to create two sorts of response types in your application that can respond to “intents” defined as part of the Alexa skill itself:

  • a statement, which is a response to Alexa that essentially closes a session;
  • and a question, which keeps the session open and allows you to pop session state into the response so you can get it back as part of the next intent command issued from Alexa in that conversation.

The following bit of code shows how to decorate a function that will handle a particular Alexa Skill intent. The session variable can be used to pass session state back to Alexa that can be returned as part of the next intent. The question() wrapper packages up the response (txt) appropriately and keeps the conversational session alive.

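@ask.intent('WrittenStatementIntent')
def writtenStatement(period,myperiod):
    session.attributes['period'] = period
    session.attributes['myperiod'] = myperiod
    session.attributes['typ'] = 'WrittenStatementIntent'
    #assumed: statementGrabber() returns a summary sentence (txt) and the full listing (tmp)
    txt,tmp = statementGrabber(period,myperiod)
    if tmp!='': txt='{} Do you want to hear them all?'.format(txt)
    else: txt="I don't know of any."
    return question(txt)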

We might then handle a response identified as to the affirmative (“yes, tell me them all”) using something like the following, which picks up the session state from the response, generates a paragraph describing all the written statements and returns it, suitably packaged, as a session ending statement().

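@ask.intent('AllOfThemIntent')
def sayThemAll():
    period= session.attributes['period']
    myperiod= session.attributes['myperiod']
    #assumed: statementGrabber() returns a summary sentence (txt) and the full listing (tmp)
    txt,tmp = statementGrabber(period,myperiod)
    return statement(tmp)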

So how do we define things on the Alexa side? (An early draft of my config can be found here.) To start with, we need to create a new skill and give it a name. A unique ID is created for the application that is passed in all service requests that we can use as a key to decide whether or not to accept and respond to a request from the Alexa Skill server in our application logic. (For convenience, I defined an open service that can accept all requests. I’m not sure if Flask-Ask has a setting that allows the application to be tied to one or more Alexa Skill IDs?)


The second thing we need to do is actually define the interactions that the skill will engage in. This is composed of three parts:

  • an Intent Schema, defined as a JSON object, that specifies a list of intents that the skill can handle. Each intent must be given a unique label (for example, “AllOfThemIntent”), and may be associated with one or more slots. Each slot has a name and a type. The name corresponds to the name of a variable that may be captured and passed (under that name) to the application handler; the type is either a predefined Amazon data type (for example, AMAZON.DATE, which captures date like things, including some simple natural language date terms such as yesterday) or a custom data type;
  • one or more user-defined custom data types, defined as a list of keywords that Alexa will try to match exactly (I think? I don’t think fuzzy match, partial match or regular expression matching is supported? If it is, please let me know how via the comments…)
  • some sample utterances, keyed by intent and giving an example of a phrase that the skill should be able to handle; slots may be included in the example utterances, using the appropriate name as provided in the corresponding intent definition.


In the above case, I start to define a conversation where a WrittenStatementIntent is intended to identify written statements published on a particular day or over a particular period, and then a follow up AllOfThemIntent can be used to list the details of all of them or a  LimitByDeptIntent can be used to limit the reporting to just statements from a specific department.

When you update the interaction model, it needs rebuilding which may take some time (wait for the spinny thing over the Interaction Model menu item to stop before you try to test anything).

The next part of the definition is used to specify where the application logic can be found. As mentioned, this may be defined as an AWS Lambda function, or you can host it yourself on an https server. In the latter case, for a Flask app, you need to provide a URL where the root of the application is served from.


If you are using your own host, you need to provide some information about the trust certificate. I published my application logic as an app on Reclaim Hosting, which appears to offer https out of the can (though I haven’t tried it for a live/published Alexa skill yet.)


With the config stuff all in place, you now just need to make sure some application logic is in place to handle it.

For reference, along with the stub of application logic shown above (which just needs a dummy statementGrabber() function that optionally accepts a couple of arguments and that returns a couple of text strings for testing purposes) I also topped my application with the following set-up components (note that as part of the WSGI handling that cPanel uses to run the app, I am creating an application variable that points to it).

import logging
from random import randint
from flask import Flask, render_template
from flask_ask import Ask, statement, question, session

app = Flask(__name__)
#the application variable is what the WSGI handling used by cPanel picks up
application = app

ask = Ask(app, "/")

At the end of the application code, we can fire it up…

if __name__ == '__main__':
    app.run()

Get the app running on the server, and now we can test it from the Alexa Skills environment. Unlike deployed skills accessed via the echo, we don’t need to “summon” the app for testing purposes – we can just enter the utterance directly. The JSON code passed to the server is displayed as the Service Request and the Service Response from the application server is also displayed.


The test panel can also handle conversations established by using Flask-Ask question() wrappers, as shown below:


In this case, we filter down on the written statements for last Thursday to just report on the ones issued by the Department for Culture, Media and Sport.

It’s worth noting that Alexa seems to have a limit on the number of characters allowed when generating a voice output (8000 characters). For large responses, this suggests that adding some sort of sensible paging handler to the application logic could make sense; for example, something that chunks up the response, tells it to you piece by piece, and prompts you between each chunk to check you want to hear the next part.
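A paging handler along those lines might start from something like the following Python sketch, which packs whole sentences into chunks that stay under a character limit. The function name is my own, the 8000 default is just the figure mentioned above, and prompting between chunks is left out.

```python
import re


def chunk_response(text, limit=8000):
    """Split text into chunks of at most `limit` characters,
    breaking between sentences where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = (current + " " + sentence).strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for part in chunk_response("One. Two. Three.", limit=10):
    print(repr(part))
```

Each chunk could then be returned as a question, with the index of the next chunk kept in the session state, so that a "yes" from the user triggers the next part.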

With testing done, and a working app up and running, all that remains is to go through the legal fluff required to submit the app for publishing (which I haven’t done; a note says you can’t edit the app whilst it’s undergoing approval, but I’m not sure if you can then go back to editing it once it is published?)

A couple of things I learned along the way: firstly, when defining slots, it can be useful to have a controlled vocabulary to hand. For Parliament, things like the Members’ API Reference Data Service can be handy, eg for generating a list of MP names or committee names (in another post I’ll give some more examples about some of the queries I can run). Secondly, when thinking about conversation design, you need to think about the various bits of state that can be associated with a conversation. For example, when making a query about an MP, it makes sense to retain the name of (or an identifier for) the MP as part of the session state so that you can refer to that later. If a conversation went “who is the MP for the Isle of Wight?”, “what committees are they on?”, “who else is on those committees?”, it would make sense to capture the list of committees as state somehow when responding to the second question.

One approach I took to managing state within the application was to cache calls to URLs requested in forming the response to one question. If I preserved enough session state to allow me to pull that cached data, I could reanalyse it without having to re-request it from the original URL when putting together a response to a follow up question.
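The caching idea can be sketched in a few lines of Python. This is not the code I used: the class name is hypothetical, the fetch function is passed in so that a fake one can stand in for a real HTTP request, and the example.com URL is a placeholder.

```python
class CachedFetcher:
    """Remember the response for each URL so follow-up questions can reuse it."""

    def __init__(self, fetch):
        self._fetch = fetch   # any callable that takes a URL and returns data
        self._cache = {}

    def get(self, url):
        # Only call out the first time a URL is requested
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


calls = []


def fake_fetch(url):
    # Stand-in for a real HTTP request
    calls.append(url)
    return {"url": url, "items": [1, 2, 3]}


fetcher = CachedFetcher(fake_fetch)
first = fetcher.get("https://example.com/statements?period=last-week")
second = fetcher.get("https://example.com/statements?period=last-week")
print(len(calls), "request(s) made")
```

All the session state then needs to hold is enough information to rebuild the URL, since the URL is the cache key.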

Something it would be nice to have is a list of synonyms for terms in the slots definition, and maybe even a crude lookup that could be used as part of an OpenRefine style reconciliation service to try to partially match slot terms. (I’m not sure how well the model building does this anyway, eg if you put near misses in the slot definitions; or whether it just does exact matching?)

Another takeaway is that it probably makes sense to try to design the code for generating text from data or APIs so that it can be used in a variety of contexts – Slack, Alexa/Echo, email, press release generation, etc, – without much, if any, retooling. Ideally, it would make sense to define a set of text generation functions or API calls that could in turn be called via use-case application wrappers (eg one for Slack, one for Alexa, etc). Issues arise here when it comes to conversation management. Alexa manages conversations via session state, for example. But maybe can help here, by acting as application independent conversational middleware? That’ll be the next app I need to play with…

PS If you would like to see further posts here exploring Amazon Echo/Alexa skills, why not help me explore the context and gift me an Echo from my Patronage Wishlist?

PPS Example queries: ask parlibot Andrew Turner committees; ask parlibot research papers on animals; any written answers.

PPPS code example.

“Natural Language” Time Periods in Python

Mulling over a search feed that includes date range limits, I had a quick look for a python library that includes “natural language” functions for describing different date ranges. Not finding anything offhand, I popped some quick starter-for-ten functions up at this gist, which should also be embedded below.

It includes things like today(), tomorrow(), last_week(), later_this_month() and so on.
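To give a flavour of the sort of thing I mean, here is a minimal sketch of a few such functions. It is not the code in the gist: the optional ref argument (a reference date, mainly there to make testing easier) and the choice to return date ranges as (start, end) tuples are assumptions made for this example.

```python
import datetime


def today(ref=None):
    """The reference date, defaulting to the real current date."""
    return ref or datetime.date.today()


def tomorrow(ref=None):
    return today(ref) + datetime.timedelta(days=1)


def last_week(ref=None):
    """(Monday, Sunday) of the week before the one containing ref."""
    d = today(ref)
    this_monday = d - datetime.timedelta(days=d.weekday())
    start = this_monday - datetime.timedelta(days=7)
    return start, start + datetime.timedelta(days=6)


def later_this_month(ref=None):
    """(ref, last day of the month containing ref)."""
    d = today(ref)
    # Day 28 plus 4 days always lands in the next month
    first_of_next = (d.replace(day=28) + datetime.timedelta(days=4)).replace(day=1)
    return d, first_of_next - datetime.timedelta(days=1)


ref = datetime.date(2016, 9, 14)
print(tomorrow(ref), last_week(ref), later_this_month(ref))
```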

If you know of a “proper” library that does this, please let me know via the comments…

PS more handy fragments:

#Get month and year between two dates
import datetime
from dateutil.rrule import rrule, MONTHLY

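#the year was lost from the original listing; 2016 is a placeholder
strt_dt = datetime.date(2016,4,1)
end_dt = datetime.date(2016,10,1)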

dates = ['_'.join([dt.strftime('%B').lower(), dt.strftime('%Y')]) for dt in rrule(MONTHLY, dtstart=strt_dt, until=end_dt)]

Creating a Simple Python Flask App via cPanel on Reclaim Hosting

I’ve had my Reclaim Hosting package for a bit over a year now, and not really done anything with it, so I had a quick dabble tonight looking for a way of installing and running a simple Python Flask app.

Searching around, it seems that CPanel offers a way in to creating a Python application:


Seems I then get to choose a python version that will be installed into a virtualenv for the application. I also need to specify the name of a folder in which the application code will live and select the domain and path I want the application to live at:


Setting up the app generates a folder into which to put the code, along with a public folder (into which resources should go) and a passenger_wsgi.py file that is used by a piece of installed sysadmin voodoo magic (Phusion Passenger) to actually handle the deployment of the app. (An empty folder is also created in the public_html folder corresponding to the app’s URL path.)


Based on the Minimal Cyborg How to Deploy a Flask Python App for Cheap tutorial, the passenger_wsgi.py file needs to link to my app code.

Passenger is a web application server that provides a scriptable API for managing the running of web apps (Passenger/py documentation).

For running Python apps, the passenger_wsgi.py file is used to launch the application. If you change the wsgi file, I think yo…

A flask app is normally called by running a command of the form python myapp.py on the commandline. In the case of a python application, the Passenger web application manager uses a passenger_wsgi.py file associated with the application to manage it. In the case of our simple Flask application, this corresponds to creating an object called application that represents it. If we create an application in a file, myapp.py, and create a variable application that refers to it, we can run it via the passenger_wsgi.py file by simply importing it: from myapp import application.

WSGI works by defining a callable object called application inside the WSGI file. This callable expects a request object, which the WSGI server provides; and returns a response object, which the WSGI server serializes and sends to the client.
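To see what that means without any framework at all, here is a bare WSGI callable in Python. In WSGI terms the "request object" is really two things passed by the server: an environ dictionary describing the request and a start_response callable for sending the status and headers. The fake_start_response function is just a stand-in so the callable can be exercised by hand.

```python
def application(environ, start_response):
    """A bare WSGI callable: reads the request environ,
    returns an iterable of bytes as the response body."""
    path = environ.get("PATH_INFO", "/")
    body = "Hello from {}\n".format(path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call it by hand, the way a WSGI server would
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


result = application({"PATH_INFO": "/demo"}, fake_start_response)
print(captured["status"], result)
```

A Flask app object can be called in exactly the same way, which is why it can be handed straight to the WSGI server under the name application.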

Flask’s application object, created by a MyApp = Flask(__name__) call, is a valid WSGI callable object. So our WSGI file is as simple as importing the Flask application object (MyApp) from myapp.py, and calling it application.

But first we need to create the application – for our demo, we can do this using a single file in the app directory. First create the file:


then open it in the online editor:


Borrowing the Minimal Cyborg “Hello World” code:

from flask import Flask
app = Flask(__name__)
application = app # our hosting requires application in passenger_wsgi

@app.route("/")
def hello():
    return "This is Hello World!\n"

if __name__ == "__main__":
    app.run()

I popped it into the file and saved it.

(Alternatively, I could have written the code in an editor on my desktop and uploaded the files.)

We now need to edit the passenger_wsgi.py file so that it loads in the app code and gets from it an object that the Passenger runner can work with. The simplest approach seemed to be to load in the file (from myapp) and get the variable pointing to the flask application from it (import application). I think that Passenger requires the object be made available in a variable called application?


That is, comment out the original contents of the file (just in case we want to crib from them later!) and import the application from the app file: from myapp import application.

So what happens if I now try to run the app?


Okay – it seemed to do something but threw an error – the flask package couldn’t be imported. Minimal Cyborg provides a hint again, specifically “make sure the packages you need are installed”. Back in the app config area, we can identify packages we want to add, and then update the virtualenv used for the app to install them.

And if we now try to run the app again:

So now it seems I have a place I can pop some simple Python apps – like some simple Slack/slash command handlers, perhaps…

PS if you want to restart the application, I’m guessing all you have to do is click the Restart button in the appropriate Python app control panel.

Simple Demo of Green Screen Principle in a Jupyter Notebook Using MyBinder

One of my favourite bits of edtech, in the form of open educational technology infrastructure, at the moment is mybinder (code), which allows you to fire up a semi-customised Docker container and run Jupyter notebooks based on the contents of a github repository. This makes it trivial to share interactive Jupyter notebook demos, as long as you’re happy to make your notebooks public and pop them into github.

As an example, here’s a simple notebook I knocked up yesterday to demonstrate how we could create a composited image from a foreground image captured against a green screen, and a background image we wanted to place behind our foregrounded character.

The recipe was based on one I found in a Bryn Mawr College demo (Bryn Mawr is one of the places I look to for interesting ways of using Jupyter notebooks in an educational context.)

The demo works by looking at each pixel in turn in the foreground (greenscreened) image and checking its RGB colour value. If it looks to be green, use the corresponding pixel from the background image in the composited image; if it’s not green, use the colour values of the pixel in the foreground image.
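The per-pixel rule can be sketched in plain Python, without any imaging library. This is not the notebook's actual code: the is_green and composite names, the default threshold values and the tiny 2x2 "images" (nested lists of RGB tuples) are all assumptions made for illustration.

```python
def is_green(pixel, r_max=100, g_min=150, b_max=100):
    """Treat a pixel as green screen if green is high and red and blue are low.
    The default thresholds are illustrative guesses, not tuned values."""
    r, g, b = pixel
    return r <= r_max and g >= g_min and b <= b_max


def composite(foreground, background, **thresholds):
    """Build a new image, taking the background pixel wherever the
    foreground pixel looks green and the foreground pixel otherwise."""
    return [
        [bg if is_green(fg, **thresholds) else fg
         for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


# Two tiny 2x2 images: rows of (R, G, B) tuples.
foreground = [[(0, 255, 0), (200, 50, 50)],
              [(10, 200, 20), (0, 255, 0)]]
background = [[(1, 1, 1), (2, 2, 2)],
              [(3, 3, 3), (4, 4, 4)]]

result = composite(foreground, background)
print(result)
```

Passing different r_max, g_min and b_max values to composite changes which pixels count as "green".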

The trick comes in setting appropriate threshold values to detect the green coloured background. Using Jupyter notebooks and ipywidgets, it’s easy enough to create a demo that lets you try out different “green detection” settings using sliders to select RGB colour ranges. And using mybinder, it’s trivial to share a copy of the working notebook – fire up a container and look for the Green screen.ipynb notebook: demo notebooks on mybinder.


(You can find the actual notebook code on github here.)

I was going to say that one of the things I don’t think you can do at the moment is share a link to an actual notebook, but in that respect I’d be wrong… The reason I thought so was that to launch a mybinder instance, eg from the psychemedia/ou-tm11n github repo, you’d use a mybinder URL that identifies the repo; this then launches a container instance at a dynamically created location – eg http://SOME_IP_ADDRESS/user/SOME_CONTAINER_ID – with a URL and container ID that you don’t know in advance.

The notebook contents of the repo are copied into a notebooks folder in the container when the container image is built from the repo, and accessed down that path on the container URL, such as http://SOME_IP_ADDRESS/user/SOME_CONTAINER_ID/notebooks/Green%20screen%20-%20tm112.ipynb.

However, on checking, it seems that any path added to the mybinder call is passed along and appended to the URL of the dynamically created container.

Which means you can add the path to a notebook in the repo to the notebooks/ path when you call mybinder, and the path will be passed through to the launched container.
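As a small sketch of how such a path is put together, the snippet below appends a percent-encoded notebook filename to a base URL. The notebook_url helper is hypothetical, and the base used here is the placeholder container address from above; when calling mybinder, the same notebooks/ path would be appended to the launch URL instead.

```python
from urllib.parse import quote


def notebook_url(base_url, notebook_filename):
    """Append the notebooks/ path for a notebook to a base URL,
    percent-encoding characters such as spaces in the filename."""
    return base_url.rstrip("/") + "/notebooks/" + quote(notebook_filename)


# Placeholder container address, as in the text; the real one is not known in advance.
container_base = "http://SOME_IP_ADDRESS/user/SOME_CONTAINER_ID"
url = notebook_url(container_base, "Green screen - tm112.ipynb")
print(url)
```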

In other words, you can share a link to a live notebook running on a dynamically created container – such as this one – by calling mybinder with the local path to the notebook.

You can also go back up to the Jupyter notebook homepage from a notebook page by going up a level in the URL to the notebooks folder.

I like mybinder a bit more each day:-)

Querying Panama Papers Neo4j Database Container From a Linked Jupyter Notebook Container

A few weeks ago I posted some quick doodles showing, on the one hand, how to get the Panama Papers data into a simple SQLite database and, on the other, how to link a neo4j graph database to a Jupyter notebook server using Docker Compose.

As the original Panama Papers investigation used neo4j as its backend database, I thought putting the data into a neo4j container could give me the excuse I needed to start looking at neo4j.

Anyway, it seems as if someone has already pushed a neo4j Docker container image preseeded with the Panama Papers data, so here’s my quickstart.

To use it, you need to have Docker installed, download the docker-compose.yaml file and then run:

docker-compose up

If you do this from a command line launched from Kitematic, Kitematic should provide you with a link to the neo4j database, running on the Docker IP address and port 7474. Log in with the default credentials (neo4j / neo4j) and change the password to panamapapers (all lower case).

Download the quickstart notebook into the newly created notebooks directory, and you should be able to see it from the notebooks homepage on the Docker IP address, port 8890 (or again, just follow the link from Kitematic).

I’m still trying to find my way around both the py2neo Python wrapper and the neo4j Cypher query language, so the demo thus far is not that inspiring!

And I’m not sure when I’ll get a chance to look at it again…:-(