OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Tinkering’ Category

OpenRefine Docker Containers

leave a comment »

I had a go at building a couple of docker containers for OpenRefine, one from the latest release and one from the latest code on github:

In order to create the virtual machine, you should:

  • install boot2docker
  • run boot2docker
  • Either: to run with a project directory solely within the container, in the boot2docker terminal, enter the command docker run --name openrefine -d -p 3334:3333 psychemedia/openrefine
  • Or: to run with a project directory mounted from a shared folder on the host, in the boot2docker terminal, enter the command docker run -d -p 3334:3333 -v /path/to/yourSharedDirectory:/mnt/refine --name openrefine psychemedia/openrefine
  • Or: to run with a project directory in a linked data volume, in the boot2docker terminal, enter the command docker run -d -p 3334:3333 -v /mnt/refine --name openrefine psychemedia/openrefine

(To use the latest release rather than a recent build use psychemedia/docker-openrefine rather than psychemedia/openrefine.)

The port number you will be able to find OpenRefine on is given by the first number set in the flag -p NNNN:3333. To access OpenRefine via port 3334, use -p 3334:3333 etc.

OpenRefine will then be available via your browser at the URL http://IPADDRESS:NNNN. To find the required value of IPADDRESS can be found using the command boot2docker ip

The returned IP address (eg is the IP address you can find OpenRefine on, for example:

Written by Tony Hirst

January 17, 2015 at 1:53 pm

Posted in School_Of_Data, Tinkering

Tagged with

Using Docker to Build Linked Container Course VMs

with 2 comments

Having spent bits of last year tinkering with vagrant and puppet as part of a workflow for building and deploying course related VMs in a scaleable way for a distance education context (trying to be OUseful here…) I’ve more recently started pondering whether it makes more sense to create virtual machines from linked data containers.

Some advantages of the “all in one flat VM” approach seem to be that we can construct puppet files to build particular components and then compose the final machine configuration from a single Vagrant script pulling in those separate components. Whilst this works when developing a VM for use by students on their own machines, it perhaps makes less sense if we were to provide remote hosted access to student VMs. There is an overhead associated with running a VM which needs to be taken into account if you need to scale. In terms of help desk support, the all-in-one VM approach offers a couple of switch it off and switch it on again opportunities: a machine can be shutdown and restarted/reprovisioned, or if necessary can be deleted and reinstalled though this latter loses any state that was saved internally in the VM by the student). If a particular application in the VM needs shutting down and restarting, then a specific stop/start instruction is required for each application.

On the other hand, a docker route in which each virtual application is launched inside its own container, and those containers are then linked together to provide the desired student VM configuration, means that if an application needs to be restarted, we can just destroy the container and fire up a replacement (though we’d probably need to find ways of preserving – or deleting – state associated with a particular application container too). If applications run as services, and for example I have a notebook server connected to a database server, if I destroy the database server container, I should be able to point the notebook server to the new database server – if I know the address of the new database server…

After a bit of searching around, I came across an example of creating a configuration not too dissimilar from the TM351 virtual machine configuration, but built from linked containers: Using Docker for data science, part 2 [Calvin Giles]. The machine is constructed from several containers, wired together using this fig script:

    command: echo created
    image: busybox
        - "~/Google Drive/notebooks:/notebooks"
    command: echo created
    image: busybox
        - "~/Google Drive/data:/data"

    command: echo created
    image: busybox
        - /var/lib/postgresql/data

    image: postgres
        - "5433:5432"
        - devpostgresdata

        - PASSWORD
    image: calvingiles/data-science-environment
        - devpostgres:postgres
        - "443:8888"
        - notebooks
        - data

(WordPress code plugin & editor sucking atm wrt the way it keeps trying to escape stuff…)

(Fig is a tool for building multiple docker containers and wiring them together, a scripted version of something like Panamax. The main analysis application – calvingiles/data-science-environment – is a slight extension of ipython/scipyserver.)

With fig and boot2docker installed, and the fig script downloaded into the current working directory:

curl -L https://gist.githubusercontent.com/calvingiles/b6123c301954fe68e29a/raw/data-science-environment-fig.yml > fig.yml

the following two lines of code make sure that any previous copies of the containers are purged, and a new set of containers fired up with the desired password settings:

fig rm

(The script also creates Google Drive folders into which copies of the notebooks will be located and shared between the VM containers and the host.)

The notebooks can then be accessed via browser, (you need to log in with the specified password – MyPass from the example above); the location of the notebooks is https//IP.ADDRESS:443 (note the https, which may require you saying “yes, really load the page” to Google Chrome – though it is possible to configure the server to use just http) where IP.ADDRESS can be found by running boot2docker ip.

One thing I had trouble with at first was connecting the IPython notebook to the PostgreSQL database server (I couldn’t see it on localhost). I found I needed to connect to the actual IP address within the VM of the database container.

I found this address (IPADDRESS) from the docker commandline using: fig run devpostgres env (where devpostgres is the name of the database server container). The port is the actual server port number rather than the forwarded port number:

import psycopg2
con = psycopg2.connect(host=IPADDRESS,port=5432,user='postgres',password='PGPass')

I also came up with a workaround (as described in this issue I raised) but this seems messy to me – there must be a better way? Note how we connect to the forwarded port:

#Via http://blog.michaelhamrah.com/2014/06/accessing-the-docker-host-server-within-a-container/
#Get the IP address of the docker host server inside the VM
# I assume this is like a sort of 'localhost' for the space in which the containers float around?
IPADDRESS=!netstat -nr | grep '^0\.0\.0\.0' | awk '{print $2}'

#Let's see if we can connect to the db using the forwarded port address
import psycopg2
con = psycopg2.connect(host=IPADDRESS[0],port='5433',user='postgres', password='PGPass')

#Alternativley, connect via SQL magic
!pip3 install ipython-sql
%load_ext sql
postgrescon = 'postgresql://postgres:PGPass@'+IPADDRESS[0]+':5433'

#Then cell magic via:
%%sql $postgrescon

This addressing fiddliness also raises an issue about how we would run container bundles for several students in the same VM under a hosted offering – how would any particular student know how to connect to “their” personal database server(s). [UPDATE: doh! Container linking passes name information into a container as an environmental variable: Linking Containers Together.] Would we also need to put firewall rules in place to partition the internal VM network so that a student could only see other containers from their bundle? And in the event of switch-it-off/destroy-it/start-it-up-again actions, how would any new firewall rules and communication of where to find things be managed? Or am I overcomplicating?!

Anyway – this is a candidate way for constructing a VM out of containers in an automated way. So what are the relative merits, pros, cons etc of using the vagrant/puppet/all-in-one-VM approach as opposed to the containerised approach in an educational context? Or indeed, for different education contexts (eg trad-uni computer lab with VMs running in student desktops (if that’s allowed!); distance education student working on their home machine with a locally hosted VM; multiple students connecting to VM configurations hosted on a local cluster, or on AWS/Google Cloud etc?

Any comments – please add them below… I am sooooooo out of my depth in all this!

PS Seems I can connect with con = psycopg2.connect(host='POSTGRES',port='5432',user='postgres', password="PGPass")

Check other environments with:

import os

Written by Tony Hirst

January 14, 2015 at 5:03 pm

Posted in OU2.0, Tinkering

Tagged with , , , ,

Dockerising Open Data Databases – First Fumblings

leave a comment »

A new year, and I’m trying to reclaim some joy, fun, tech mojo by getting back into the swing of appropriating stuff and seeing what we can do with it. My current universal screwdriver is Docker, so I spent an hour or two this yesterday afternoon and a large chunk of today poking around various repos looking for interesting things to make use of.

The goal I set myself was to find a piece of the datawrangling jigsaw puzzle that would let me easily load a previously exported SQL database into MySQL running in a container, the idea being we should be able to write a single command that points to a directory containing some exported SQL stuff, and then load that data straight into a container in which you can start querying it . (There’s possibly (probably?) a well known way of doing this for the folk who know how to do it, but what do I know….?!;-)

The solution I settled with was a set of python scripts published by Luis Elizondo / luiselizondo [about] that wrap some docker commands to produce a commandline tool for firing up and populating MySQL instances, as well as connecting to them: docker-mysql-scripts.

The data I thought I’d play with is the 2013 UK MOT results database, partly because I found a database config file (that I think may originally have been put together by Ian Hopkinson). My own archive copy of the relevant scripts can be found here.

#Scripts: https://gist.github.com/psychemedia/86980a9e86a6de87a23b
#Create a container mot2013 with MySQL pwd: mot
#/host/path/to/ should contain:
## mot_boot.sql #Define the database tables and populate them
## test_result_2013.txt  #download and uncompress: http://data.dft.gov.uk/anonymised-mot-test/12-03/test_result_2013.txt.gz 
## test_item_2013.txt  #download and uncompress: http://data.dft.gov.uk/anonymised-mot-test/12-03/test_item_2013.txt.gz 
## mdr_test_lookup_tables/ #download and unzip: http://data.dft.gov.uk/anonymised-mot-test/mdr_test_lookup_tables.zip

#Create a container 'mot2013' with MYSQL password: mot
dmysql-server mot2013 mot

#Create a database: motdata
dmysql-create-database mot2013 motdata

#Populate the database using the mot_boot.sql script
dmysql-import-database mot2013 /host/path/to/mot_boot.sql --database motdata

I can then login in to the database with the commsnd dmysql mot2013, connect to the appropriate database from the MySQL client with the SQL command USE motdata; and then run queries on the 2013 MOT data.

The scripts also allow the database contents to be managed via a separate data container:

#Create a new container called: test with an external data container
dmysql-server --with-volume test test
#This creates a second container - TEST_DATA - that manages the database files
#The name is given by: upper($CONTAINERNAME)+'_DATA'

#Create a dummy db
dmysql-create-database test testdb

#We can connect to the database as before
#dmysql test
mysql> exit

#Delete this container, removing any attached volumes with the -v flag
docker rm -v -f test

#Create a new container connected to the original TEST_DATA container
docker run --volumes-from TEST_DATA --name test4 -e MYSQL_ROOT_PASSWORD="test" -d mysql

#Connect to this new container
dmysql test4
#We should see the testdb database there...

#NOTE - I think that only one database server container can be connected to the TEST_DATA container at a time

So far, so confusing… Here’s a brief description of the current state of my understanding/confusion:

What really threw me is that the database container (either the original mot2013 container or the test database with the external data container) don’t appear to store the data inside the container itself. (So for example, the TEST_DATA container does not contain the database.) Instead, the data appears to be contained in a “hidden” volume that is mounted outside the container. I came a cropper with this originally by deleting containers using commands of the form docker rm [CONTAINER_NAME] and then finding that the docker VM was running out of memory. This deletes the container, but leaves a mounted volume (that is associated with the deleted container name) hanging around. To remove those volumes automatically, containers should be removed with commands of the form docker rm -v [CONTAINER_NAME]. What makes things difficult to tidy up is that the mounted volumes can’t be seen using the normal docker ps or docker ps -a commands; instead you need to install docker-volumes to identify them and delete them. (There’s a possible fix that shows how to store the data inside the container, rather than in an externally mounted volume – I think?! – linked to from this post, but I haven’t tried it because I was trying to use root Dockerfile images.)

The use of the external data volumes also means we can’t easily bundle up a data container using docker commit and then hope to create new containers from it (such a model would allow you to spend an hour or two loading a large-ish data set into a database, then push a data container containing that db to dockerhub; other users could then pull down that image, create a container from it and immediately attach it to a MySQL container without having to go through the pain of building the database; this would provide a nifty way of sharing ready-to-query datasets such as the MOT database. You could just pull a data image, mount it as the data volume/container, and get started with running queries).

On the other hand, it is possible to mount a volume inside a container by running the container with a -v flag and specifying the mount point (docker volumes). Luis Elizondo’s scripts allow you to set-up these data volume containers by running dmysql-server with the –with-volume flag as shown in the code snippet above, but at the moment I can’t see how to connect to a pre-existing data container? (See issue here.)

So given that we can mount volumes inside a linked to data container, does this mean we have a route to portable data containers that could be shared by a docker datahub? It seems not… because as it was, I wasted quite a bit more time learning the fact that docker data Container volumes can’t be saved as images! (To make a data container portable, I think we’d need to commit it as an image, share the image, then create a container from that image? Or do I misunderstand this aspect of the docker workflow too…?!)

That said, there does look to be a workaround in the form of flocker. However, I’ve fed up with this whole thing for a bit now… What I hoped would be a quick demo of: get data in docker MySQL container; package container and put image on dockerhub; pull down image, create container and start using data immediately turned into a saga of realising quite how much I don’t understand docker, what it does and how it does it.

I hate computers…. :-(

Written by Tony Hirst

January 13, 2015 at 6:39 pm

Posted in Tinkering

Tagged with

Exporting Markdown and XML From Google Docs

leave a comment »

Just over a year ago, we started production of a new OU course using Google docs as the medium within which we’d share draft course materials. This was something of an experiment to see whether the online social document medium encouraged sharing and discussion of ideas, resources, ongoing feedback and comment of the work in progress amongst the course team, rather than, or at least, in addition to, the traditional handover of significant chunks of the course at set handover dates. (In case you’re wondering, it didn’t…)

The question we’re now faced with is, how do we get the content that’s in Google docs into the next stage of the OU document workflow.

The actual HTML based course materials that appear in the VLE (or the ebook versions of them, etc) are generated automatically from an “OU Structured Content” XML document. The XML documents are prepared using the <oXygen/> XML editor, extended with an OU structured content framework that includes the requisite DTDs/schema files, hooks for rendering, previewing and publishing into the OU environment and so on. (Details about the available tags can be found here: tag guide [OU internal link].)

Whilst the preferred authoring route is presumably that authors use the <oXygen/> editor from the start, the guidance also suggests that many authors use Word (and an appropriate OU style sheet) and then copy and paste content over into the XML editor, at which point some amount of tidying and retagging may be required.

As Google docs doesn’t seem to support the addition of custom style elements or tags (users are limited to customising the visual style of provided style elements), we need to find another way of getting content from Google docs and into <oXygen/>. One approach would be to grab a copy of the Google doc into Microsoft Word, apply an OU template to mark up the content using appropriate custom style elements, and then copy the content over to the XML editor, upon which point it will probably need further tidying.

Another approach is to try to export the data as an XML document. Looking around, I found a Google Apps script script (sic) that allows you to export the content in a Google doc as markdown. Whilst markdown documents don’t have the same tree-like document structure as XML, it did provide an example of how to parse a Google docs document. My first attempt at a script to export a Google doc in the OU XML format can be found here: export Google doc as OU SC-XML. (Note: the script is subject to change; now I have a basic operational/functional spec, I can start to try to tidy up the code and try to parse out more structure…)

Having got a minimal exporter working, the question now arises as to where effort needs to be spent next. The exporter produces a minimal form of OU-XML that sometimes validates and (in early testing) sometimes doesn’t. (If the script is working properly it should produce output that always validate as XML; then it should produce output that always validate as OU-XML). Should time be spent improving the script to produce better XML, or can we live with the fact that the exporter gets the document some way in to <oXygen/>, but further work will be required to fix a few validation breaks?

Another issue that arises is how rich a form of OU-XML we try to export. When working in Microsoft Word environment, a document style can be defined using elements that map onto the elements in OU-XML. When working in Google docs, we need to define a convention that the parser can respond to.

At the moment, the parser is sensitive to:

  • HEADING1: treated as Session;
  • HEADING2: treated as Section;
  • HEADING3: treated as SubSection;
  • LIST_ITEM: numbered and unnumbered lists are treated as BulletedList; sublists to depth 1 are supported as BulletedSubsidiaryList
  • coloured text: treated as AuthorComment; at the moment this may incorrectly grab title elements as such;
  • INLINE_IMAGE: images are rendered as relatively referenced Figure elements with an empty Description element. A copy of the image is locally stored. (Note: INLINE_DRAWING elements are unsupported – there’s no way of exporting them; maybe I should export an empty Figure with a Description saying there’s a missing INLINE_DRAWING?)
  • TABLE: rendered as Table with empty TableHead;
  • bold, italic, LinkUrl: rendered as b, i and a tags respectively;
  • font.COURIER_NEW: rendered as ComputerCode.
  • By convention, we should be able to detect and parse things like activities, exercises and SAQs. Some mechanism needs to be supported for identifying the block elements in such cases. For example, one convention might take the form:

    Exercise N



    The ^Exercise and ^End$ elements denote the block; heading style (eg HEADING4 for the ^Exercise (and perhaps HEADING5 for the ^End$?) could further aid detection?

    Another approach would be to use horizontal lines to denote the start and stop of a block. For example:


    My Answer

    where represents a horizontal line and denotes the start and end of a block. Again, heading styles within the block could either identify or reinforce a particular block element type.

    Rendering a preview of the OU-XML as it would appear in the OU VLE is possible by uploading the OU-XML file, or a zip file containing it and related assets, to an OU URL that sits behind an OU authentication care. The <oXygen/> editor handles previews by using the default web client – your default browser – to post a selected XML document to the appropriate OU upload/preview URL. Unfortunately, the functions that allow http POST operations from Google app script run on Google servers, which means that we can’t just create a button in Google docs that would post and XML export of the current Google doc to the authentication-required OU URL. (I don’t know if this would be possible in an OU/Google apps domain?).

    I’m not sure if a workaround would be to launch a preview window in the browser from Google docs containing a copy of the OU XML version of a document, highlighting the XML, then using a bookmarklet to post the highlighted XML to the OU preview service URL within the browser context using a browser where the user has logged in to the OU web domain? Alternatively, could a Chrome application access both content from Google Drive and then post to the authenticated OU preview URL using browser permissions? (That is, can a Chrome app access both Google Drive using a user’s Google permissions, or machine access permissions, and then post content grabbed from that source to the OU URL using permissions granted to the browser?) As ever, my ignorance about browser security policies on the one hand, and the Google Apps/Chrome apps security model on the other, make it hard to know what workarounds might be possible.

    If any members of OU staff would like to try out the exporter, please get in touch for hints, or let me know how you get on:-) In addition, if any members of OU staff are using Google docs for course production, I’d love to know how you’re using them and how you’re getting on:-)

    PS via @mahawksey, I see that we can associate metadata with a doc using PropertiesService.getDocumentProperties(). Could be handy for adding things like course code, author metadata, publishing route, template etc to a doc. I’m not sure if we can also associate metadata with a folder, though I guess we could also include a file in a folder that contains metadata relating to files held more generally within the same folder?

Written by Tony Hirst

December 4, 2014 at 10:27 pm

Posted in OU2.0, Tinkering

Tagged with

Edtech and IPython Notebooks – Activities and Answer Reveals

A few months ago I posted about an interaction style that I’d been using – and that stopped working – in IPython notebooks: answer reveals.

An issue I raised on the relevant git account account turned up a solution that I’ve finally gotten round to trying out – and extending with a little bit of styling. I’ve also reused chunks from another extension (read only code cells) to help style other sorts of cell.

Before showing where I’m at with the notebooks, here’s where OU online course materials are at the moment.

Teaching text is delivered via the VLE (have a look on OpenLearn for lots of examples). Activities are distinguished from “reading” text by use of a coloured background.


The activity requires a student to do something, and then a hidden discussion or anser can be revealed so that the student can check their answer.


(It’s easy to be lazy, of course, and just click the button without really participating in the activity. In print materials, a frictional overhead was added by having answers in the back of the study guide that you would have to turn to. I often wonder whether we need a bit more friction in the browser based material, perhaps a time based one where the button can’t be clicked for x seconds after the button is first seen in the viewport (eg triggered using a test like this jQuery isOnScreen plugin)?!)

I drew on this design style to support the following UI in an IPython notebook:

Here’s a markdown cell in activity style and activity answer style.


The lighter blue background set into the activity is an invitation for students to type something into those cells. The code cell is identified as such by the code line In [ ] label. Whatever students type into those cells can be executed.

The heading is styled from a div element:


If we wanted to a slightly different header background style as in the browser materials, we could perhaps select a notebook heading style and then colour the background differently right across the width of the page. (Bah.. should have thought of that earlier!;-)

Markdown cells can also be styled to prompt students to make a text response (i.e. a response written in markdown, though I guess we could also use raw text cells). I don’t yet have a feeling for how much ownership students will take of notebooks and start to treat them as workbooks?

Answer reveal buttons can also be added in:


Clicking on the Answer button displays the answer.


At the moment, the answer background is the same colour as the invitation for student’s to type something, although I guess we could also interpret as part of the tutor-alongside dialogue, and the lighter signifies expected dialogic responses whether from the student or the “tutor” (i.e. the notebook author).

We might also want to make use of answer buttons after a code completion activity. I haven’t figured out the best way to do this as of yet.


At the moment, the answer button only reveals text – and even then the text needs to be styled as HTML (the markdown parsing doesn’t work:-(


I guess one approach might be to spawn a new code cell containing some code written in to the answer button div. Another might be to populate a code cell following the answer button with the answer code, hiding the cell and disabling it (so it can’t be executed/run), then revealing it when the answer button is clicked? I’m also not sure whether answer code should be runnable or not?

The mechanic for setting the cell state is currently a little clunky. There are two extensions, one for the answer button, one for setting the state other cells, that use different techniques for addressing the cells (and that really need to be rationalised). The extensions run styling automatically when the notebook is started, or when triggered. At the moment, I’m using icons from the orgininal code I “borrowed” – which aren’t ideal!


The cell state setter button toggles selected code cells from activity to not-activity states, and markdown cells from activity-description to activity-student-answer to not-activity. The answer button button adds an answer button at every answer div (even if there’s already an answer button rendered). Both extensions style/annotate restarted notebooks correctly.

The current, hacky, user model I have in mind is that authors have an extended notebook with buttons to set the cell styles, and students have an extended notebook without buttons that just styles the notebook when it’s opened.

FWIW, here’s the gist containing extensions code.

Comments/thoughts appreciated…

Written by Tony Hirst

October 8, 2014 at 6:51 pm

Posted in OU2.0, Tinkering

Tagged with ,

Pivot Tables, pandas and IPython Notebooks

For the last few months, I’ve found a home in IPython Notebooks for dabbling with data. IPython notebooks provide a flexible authoring tool for combining text with executable code fragments, as well as the outputs from executing code, such as charts, data tables or automatically generated text reports. We can also embed additional HTML5 content into a notebook, either inline or as an iframe.

With a little judicious use of templates, we can easily take data from a source we are working with in the notebook, and then render a view of it using included components. This makes it easy to use hybrid approaches to working with data in the notebook context. (Note: the use of cell magics also let’s us operate on a data set using different languages in the same notebook – for example, Python and R.)

As an example of a hybrid approach to exploratory data analysis, how about the following?

The data manipulation library I’ve spent most of my time in to date in the notebooks is pandas. pandas is a really powerful tool for wrangling tabular data shapes, including reshaping them and running group reports on them. Among the operations pandas supports are pivot tables. But writing the code can be fiddly, and sometimes you just want an interactive hands on play with the data. IPython Notebooks do support widgets (though I haven’t played with them yet), so I guess I could try to write a simple UI for running different pivot table views over dataset in an interactive fashion.

But if I’m happy with reading the numbers the pivot table table reports as an end product, and don’t need access to the report as data, I can use a third party interactive pivot table widget such as Nicolas Kutchen’s Pivot Table component to work with the data in an interactive fashion.


I’ve popped a quick demo of a clunky hacky way of feeding a pivot table widget from a pandas dataframe here: pivot table in IPython Notebook demo. A pandas dataframe is written as an HTML table and embedded in a templated page that generates the pivot table from the the HTML table. This page is saved as an HTML file and then loaded in as an IFrame. (We could add the HTML to an iframe using srcdoc, rather than saving it as a file and loading it back in, but I thought it might be handy to have access to a copy of the file. Also, I’m not sure if all browsers support srcdoc?)

(Note: we could also use the pivot table widget with a subset of a large dataset to generate dummy reports to find the ones we want, and test pandas code against the same subset of data against that output to check the code is generating the table we want, and then deploy the code against the full dataset.)

The pivot table has the data in memory as a hidden HTML table in the pivot table page, so performance may be limited for large datasets. On my machine, it was quite happy working with a month’s spending/transparency data from my local council, and provided a quick way of generating different summary views over the data. (The df dataframe is simply the result of loading in the spending data CSV file as a pandas dataframe – no other processing required. So the pivotTable() function could easily be modified to accept the location of a CSV file, such as a local address or a URL, load the file in automatically into a dataframe, and then render it as a pivot table.)


There’s also limited functionality for tunneling down into the data by filtering it within the chart (reather than having to generate a filtered view of the data that is then baked in as a table to the chart HTML page, for example):


I’ve been dabbling with various other embedded charts too which I’ll post when I get a chance.

Written by Tony Hirst

October 8, 2014 at 9:30 am

Posted in Tinkering

Tagged with

Lazyweb Request – Node-RED & F1 Timing Data

A lazyweb request, because I’m rushing for a boat, going to be away from reliable network connections for getting on for a week, and would like to be able to play from a running start when I get back next week…

In context of the Tata/F1 timing data competition, I’d like to be able to have a play with the data in Node-RED. A feed-based, flow/pipes like environment, Node-RED’s been on my “should play with” list for some time, and this provides a good opportunity.

The data as provided looks something like:

<transaction identifier="101" messagecount="121593" timestamp="14:57:10.878"><data column="23" row="1" colour="PURPLE" value="31.6"/></transaction>
<transaction identifier="103" messagecount="940109" timestamp="14:57:11.219"><data column="2" row="1" colour="YELLOW" value="1:41:13" clock="true"/></transaction>
<transaction identifier="101" messagecount="121600" timestamp="14:57:11.681"><data column="2" row="3" colour="WHITE" value="77"/></transaction>
<transaction identifier="101" messagecount="121601" timestamp="14:57:11.681"><data column="3" row="3" colour="WHITE" value="V. BOTTAS"/></transaction>
<transaction identifier="101" messagecount="121602" timestamp="14:57:11.681"><data column="4" row="3" colour="YELLOW" value="17.7"/></transaction>
<transaction identifier="101" messagecount="121603" timestamp="14:57:11.681"><data column="5" row="3" colour="YELLOW" value="14.6"/></transaction>
<transaction identifier="101" messagecount="121604" timestamp="14:57:11.681"><data column="6" row="3" colour="WHITE" value="1:33.201"/></transaction>
<transaction identifier="101" messagecount="121605" timestamp="14:57:11.686"><data column="9" row="3" colour="YELLOW" value="35.4"/></transaction>


as a text file. (In the wild, it would be a real time data feed over http or https.)

What I’d like as a crib to work from is a Node-RED demo that has:

1) a file reader that reads the data in from the data file and plays it in as a stream in “real time” according to the timestamps, given a dummy start time;

2) an example of handling state – eg keeping track of drivernumber. (The row is effectively race position, Looking at column 2 (driverNumber), we can see what position a driver is in. Keep track of (row,driverNumber) pairs and if a driver changes position, flag it along with what the previous position was);

3) an example of appending the result to a flat file – for example, building up a list of statements “Driver number x has moved from position M to position N” over time.

Shouldn’t be that hard, right? And it would provide a good starting point for other people to be able to have a play without hassling over how to do the input/output bits?

Written by Tony Hirst

July 4, 2014 at 6:15 am

Posted in Tinkering

Tagged with , ,


Get every new post delivered to your Inbox.

Join 1,184 other followers