
Programmer Employability – Would HE Prepare You for a Guardian Developer Job Interview?

A post on the Guardian developer blog – The Guardian’s new pairing exercises – describes part of the recruitment process used by the Guardian when appointing new developers: candidates are paired with a Guardian developer and set an exercise that assesses “how they approach solving a problem and structure code”.

Originally, all candidates were set the same exercise. To reduce fatigue (in the sense of a loss of interest, enthusiasm or engagement) among the Guardian developers doing the pairing, a wider range of exercises was developed that the paired developer can choose from when it’s their turn to work with a candidate.

The exercises used in the process are publicly available – github: guardian/pairing-tests.

Candidates can prepare for the test if they wish but as there are so many tests it is possible to be given an exercise not seen before. It also gives an idea of the skills the Guardian is looking for, the problems do not test knowledge on method names of a language of choice but instead focus on solving a problem in manageable parts over an hour.

One example is to implement a set of rules as per Conway’s Game of Life:

Requirement Cards:

  • A cell can be made “alive”
  • A cell can be “killed”
  • A cell with fewer than two live neighbours dies of under-population
  • A cell with 2 or 3 live neighbours lives on to the next generation
  • A cell with more than 3 live neighbours dies of overcrowding
  • An empty cell with exactly 3 live neighbours “comes to life”
  • The board should wrap
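To get a feel for how the cards break down into manageable parts, here is a minimal Python sketch of a wrapping board. It is not taken from the Guardian repository; the class name `Board` and its method names are made up for illustration, and a candidate would be free to structure things quite differently.

```python
class Board:
    """A wrapping (toroidal) Game of Life board that stores only the live cells."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.live = set()

    def _wrap(self, x, y):
        # "The board should wrap": coordinates are taken modulo the board size
        return (x % self.width, y % self.height)

    def make_alive(self, x, y):
        self.live.add(self._wrap(x, y))

    def kill(self, x, y):
        self.live.discard(self._wrap(x, y))

    def is_alive(self, x, y):
        return self._wrap(x, y) in self.live

    def live_neighbours(self, x, y):
        # Build the neighbourhood as a set so that, on very small boards,
        # a cell reached twice via wrapping is only counted once
        cells = {self._wrap(x + dx, y + dy)
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                 if (dx, dy) != (0, 0)}
        cells.discard(self._wrap(x, y))
        return len(cells & self.live)

    def step(self):
        """Apply the four life/death rules to every cell at once."""
        next_live = set()
        for x in range(self.width):
            for y in range(self.height):
                n = self.live_neighbours(x, y)
                if self.is_alive(x, y) and n in (2, 3):
                    next_live.add((x, y))   # lives on to the next generation
                elif not self.is_alive(x, y) and n == 3:
                    next_live.add((x, y))   # empty cell comes to life
                # everything else dies (or stays empty)
        self.live = next_live
```

Each requirement card maps onto one small method or one branch in `step`, which is probably the sort of decomposition the exercise is designed to draw out.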

Others include the parsing and preliminary analysis of an election results dataset, or a set of controls for a simple robot.

So this got me wondering about a few things…:

  • how does this sort of activity design compare with the sort of assessment activity we give to students as part of a course?
  • how does the assessment of the exercise – in terms of what the recruiter learns about the candidate’s problem solving, programming/coding and interpersonal skills – compare with the assessment design and marking guides we use in HE, and by extension, the sort of assessment we train students up for?
  • how comfortable would a recent graduate be taking part in a paired exercise?

It also made me think again how unemployable I am!;-)

In passing, I also note that if you take a peek behind the Guardian homepage, they’re still running their developer ad there:


But how many graduates would think to look? (I seem to remember that sort of ad made me laugh out loud the first time I saw one whilst rooting through someone’s web page trying to figure out how they’d done something or other…)

(Hmmm… thinks: could that make the basis of an exercise – generate an ASCII-art text banner for an arbitrary phrase? Or emoji?! How would *I* go about doing that (other than just installing a python package that already does it)?! And if I did come up with an exercise and put in a pull request, or made a contribution to one of their other projects on github, would it land me an interview?!;-)
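One naive way in, sketched in Python: store each character as a few rows of text and join the rows side by side. The tiny `FONT` table below is invented for illustration and only covers a couple of letters; a real exercise would need a full alphabet and some thought about unknown characters.

```python
# A hypothetical 5-row "font": each glyph is a list of equal-height strings
FONT = {
    "H": ["#  #", "#  #", "####", "#  #", "#  #"],
    "I": ["###", " # ", " # ", " # ", "###"],
    " ": ["  ", "  ", "  ", "  ", "  "],
}
UNKNOWN = ["???"] * 5  # placeholder glyph for characters not in the font


def banner(phrase, font=FONT):
    """Render a phrase as a multi-line ASCII-art banner."""
    glyphs = [font.get(ch, UNKNOWN) for ch in phrase.upper()]
    rows = []
    for r in range(5):
        # Row r of the banner is row r of every glyph, separated by a space
        rows.append(" ".join(g[r] for g in glyphs).rstrip())
    return "\n".join(rows)


print(banner("hi"))
```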

The Rise of Transparent Data Journalism – The BuzzFeed Tennis Match Fixing Data Analysis Notebook

The news today was led in part by a story broken by the BBC and BuzzFeed News – The Tennis Racket – about match fixing in Grand Slam tennis tournaments. (The BBC contribution seems to have been done under the ever listenable File on Four: Tennis: Game, Set and Fix?)

One interesting feature of this story was that “BuzzFeed News began its investigation after devising an algorithm to analyse gambling on professional tennis matches over the past seven years”, backing up evidence from leaked documents with “an original analysis of the betting activity on 26,000 matches”. (See also: How BuzzFeed News Used Betting Data To Investigate Match-Fixing In Tennis, and an open access academic paper that inspired it: Rodenberg, R. & Feustel, E.D. (2014), Forensic Sports Analytics: Detecting and Predicting Match-Fixing in Tennis, The Journal of Prediction Markets, 8(1).)
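As a rough illustration of what "detecting an unusual betting pattern" could look like in code, the Python sketch below converts decimal odds into implied probabilities and flags matches where the implied probability shifted a lot between the opening and closing odds. This is not BuzzFeed's code: the field names, the sample records and the `threshold` value are all assumptions made up for the example, and the published notebook should be consulted for the actual method.

```python
def implied_probability(decimal_odds):
    """Naive implied probability of an outcome, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds


def odds_movement(opening_odds, closing_odds):
    """Change in implied probability between opening and closing odds."""
    return implied_probability(closing_odds) - implied_probability(opening_odds)


def flag_unusual(matches, threshold=0.10):
    """Return (match_id, movement) for matches whose odds moved more than threshold.

    `threshold` is an illustrative cut-off, not a value taken from the article.
    """
    flagged = []
    for m in matches:
        move = odds_movement(m["opening_odds"], m["closing_odds"])
        if abs(move) > threshold:
            flagged.append((m["match_id"], round(move, 3)))
    return flagged


# Invented records, purely to show the shape of the input
sample = [
    {"match_id": "A", "opening_odds": 2.0, "closing_odds": 1.25},
    {"match_id": "B", "opening_odds": 2.0, "closing_odds": 1.9},
]
print(flag_unusual(sample))
```

A large movement on its own proves nothing, of course; at best it is a feature that points a reporter at matches worth a closer look.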

Feature detecting algorithms such as this (where the feature is an unusual betting pattern) are likely to play an increasing role in the discovery of stories from data, step 2 in the model described in this recent Tow Center for Digital Journalism Guide to Automated Journalism:


See also: Notes on Robot Churnalism, Part I – Robot Writers

Another interesting aspect of the story behind the story was the way in which BuzzFeed News opened up the analysis they had applied to the data. You can find it described on Github – Methodology and Code: Detecting Match-Fixing Patterns In Tennis – along with the data and a Jupyter notebook that includes the code used to perform the analysis: Data and Analysis: Detecting Match-Fixing Patterns In Tennis.


You can even run the notebook to replicate the analysis yourself, either by downloading it and running it using your own Jupyter notebook server, or by using the online mybinder service: run the tennis analysis yourself on

(I’m not sure if the BuzzFeed or BBC folk tried to do any deeper analysis, for example poking into point summary data as captured by the Tennis Match Charting Project? See also this Tennis Visuals project that makes use of the MCP data. Tennis betting data is also collected here: If you’re into the idea of analysing tennis stats, this book is one way in: Analyzing Wimbledon: The Power Of Statistics.)

So what are these notebooks anyway? They’re magic, that’s what!:-)

The Jupyter project is an evolution of an earlier IPython (interactive Python) project that included a browser based notebook style interface for allowing users to write and execute code, as well as seeing the result of executing the code, a line at a time, all in the context of a “narrative” text document. The Jupyter project funding proposal describes it thus:

[T]he core problem we are trying to solve is the collaborative creation of reproducible computational narratives that can be used across a wide range of audiences and contexts.

[C]omputation in science is ultimately in service of a result that needs to be woven into the bigger narrative of the questions under study: that result will be part of a paper, will support or contest a theory, will advance our understanding of a domain. And those insights are communicated in papers, books and lectures: narratives of various formats.

The problem the Jupyter project tackles is precisely this intersection: creating tools to support in the best possible ways the computational workflow of scientific inquiry, and providing the environment to create the proper narrative around that central act of computation. We refer to this as Literate Computing, in contrast to Knuth’s concept of Literate Programming, where the emphasis is on narrating algorithms and programs. In a Literate Computing environment, the author weaves human language with live code and the results of the code, and it is the combination of all that produces a computational narrative.

At the heart of the entire Jupyter architecture lies the idea of interactive computing: humans executing small pieces of code in various programming languages, and immediately seeing the results of their computation. Interactive computing is central to data science because scientific problems benefit from an exploratory process where the results of each computation inform the next step and guide the formation of insights about the problem at hand. In this Interactive Computing focus area, we will create new tools and abstractions that improve the reproducibility of interactive computations and widen their usage in different contexts and audiences.

The Jupyter notebooks include two types of interactive cell – editable text cells into which you can write simple markdown and HTML text that will be rendered as text; and code cells into which you can write executable code. Once executed, the results of that execution are displayed as cell output. Note that the output from a cell may be text, a datatable, a chart, or even an interactive map.
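On disk, a notebook is a JSON document that lists its cells in order. The Python sketch below builds a minimal two-cell notebook as a plain dictionary and summarises it. It assumes version 4 of the notebook format; the cell contents and the helper name `summarise` are made up for illustration.

```python
import json

# A minimal notebook: one text (markdown) cell and one code cell
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A heading\n", "Some *narrative* text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # filled in once the cell has been run
            "outputs": [],            # text, tables, charts etc. end up here
            "source": ["print(1 + 1)"],
        },
    ],
}


def summarise(nb):
    """Return (cell type, first line of source) for each cell."""
    summary = []
    for cell in nb["cells"]:
        text = "".join(cell["source"])
        first_line = text.splitlines()[0] if text else ""
        summary.append((cell["cell_type"], first_line))
    return summary


print(summarise(notebook))
# The same structure, serialised as it would be saved in an .ipynb file
print(json.dumps(notebook, indent=1)[:60])
```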

One of the nice things about the Jupyter notebook project is that the executable cells are connected via the Jupyter server to a programming kernel that executes the code. An increasing number of kernels are supported (e.g. for R, Javascript and Java as well as Python) so once you hook in to the Jupyter ecosystem you can use the same interface for a wide variety of computing tasks.

There are multiple ways of running Jupyter notebooks, including the mybinder approach described above – I describe several of them in the post Seven Ways of Running IPython Notebooks.

As well as having an important role to play in reproducible data journalism and reproducible (scientific) research, notebooks are also a powerful, and expressive, medium for teaching and learning. For example, we’re just about to start using Jupyter notebooks, delivered via a virtual machine, for the new OU course Data management and analysis.

We also used them in the FutureLearn course Learn to Code for Data Analysis, showing how code could be used a line at a time to analyse a variety of opendata sets from sources such as the World Bank Indicators database and the UN Comtrade (import/export data) database.

PS for sports data fans, here’s a list of data sources I started to compile a year or so ago: Sports Data and R – Scope for a Thematic (Rather than Task) View? (Living Post).

Another Notebook-UI Cloud Environment – WolframCloud

If Wolfram didn’t make it so difficult to gain free and open access to their tools, I’d probably use them – and blog about them – a lot more. But as the impression I always seem to come away with is that it’ll probably cost me, or others, a not insignificant amount to use the tools regularly, or I’ll find myself using a proprietary (not open) system, I’m loath to invest much time in their products and even less likely to write about it and do their advertising for them.

But that’s not to say that they aren’t doing some interesting and probably useful stuff, or that my impressions are correct. A few weeks ago, for example, I came across the Wolfram Cloud that provides online access to Wolfram Mathematica Notebooks:


Access is free, once you’ve given them an email address and a password, and allows you to create up to 5 notebooks and create 30 day “cloud deployments” (Mathematica objects that can be accessed via a URL) – a very nifty tool for quick interactive web publishing, but it just all feels too closed for me…:-(

And too commercially driven…


So what’s actually involved with the collaboration capabilities, I wonder? “As an instructor, share editable Explorations and notebooks with your students, creating an interactive learning environment.” Does that mean collaborative editing, so a tutor and a student can edit the same notebook at the same time and follow each other’s work? No idea – I can’t find a way to try it and if they want to make it that hard for me to evaluate their offering I won’t bother…

By the by, it’s also worth noting that having bought in to a plan, you then need to start watching how you spend credit within that plan…


I guess I want my computing to be in principle free and in principle open… and it seems to me that there is nothing in principle free (or open?) about anything to do with Mathematica… (I always get the feeling I should be grateful they’re letting me try something out…)

An approach – and all round offering – I personally find far more compelling is SageMathCloud (review).

See also: IBM DataScientistWorkBench = OpenRefine + RStudio + Jupyter Notebooks in the Cloud, Via Your Browser and Course Management and Collaborative Jupyter Notebooks via SageMathCloud.

Data as Justification?

Way back when, I spent a couple of years doing research on intelligent software agents, which included a chunk of time looking at formal agent logics.

One of the mantras of a particular flavour of epistemic logic we relied on comes in the form of the following definition: “knowledge is justified true belief”.

This unpacks as follows: you know something if you believe it AND your belief is true AND your belief is justified. So for example, I toss a coin and you believe it falls heads up. It is heads-up but I haven’t shown you that, so you don’t know it, even though you believe it and that belief is a true belief (the thing you believe – that the coin is heads up – is true). However, if I show you the coin is heads-up, you now know it because your belief is justified as well as being true. (I seem to recall it gets a bit more fiddly as you introduce time into this explicitly too…)

When we start to look at what data can do for us, one of those things is to provide justification for our beliefs. Hans Rosling’s ever amusing ignorance tests demonstrate why we sometimes need our beliefs challenging and his data rich presentations (such as the OU co-produced Don’t Panic shows on BBC2) use data to either confirm our beliefs – reinforcing our knowledge – or show them to be false beliefs (that is, beliefs we have, but that don’t correspond to the state of the world, i.e. beliefs that are untrue).

As well as acting as justification for a belief, data can also create beliefs. But even if the data is true, we still need to take care that any beliefs we generate from the data are justified.

For example, you may or may not find this sentence confusing – more than half of UK wage earners earn less than the average salary. If you think of an average as a mean value, then a quick example easily demonstrates this: four office workers are sat in a bar with “average”-ish incomes, and in walks Mark Zuckerberg. Add the respective incomes together and divide by five. How many people in the bar now have an income higher than that average (mean) value?

However, if you regard an average in terms of the median – mid-point – value, then one person will have the median income and, assuming the original four had slightly different incomes, two will have an income below it, and two will have an income above it.
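The bar example is easy to check in a few lines of Python. The income figures below are invented purely for the arithmetic (including the very large fifth one), so only the shape of the result matters.

```python
from statistics import mean, median

# Four made-up "average"-ish incomes, then a made-up very large one walks in
office_workers = [24_000, 26_000, 28_000, 30_000]
bar = office_workers + [1_000_000_000]

mean_income = mean(bar)
median_income = median(bar)

above_mean = sum(1 for income in bar if income > mean_income)
below_median = sum(1 for income in bar if income < median_income)
above_median = sum(1 for income in bar if income > median_income)

print("mean:", mean_income, "people above it:", above_mean)
print("median:", median_income, "below:", below_median, "above:", above_median)
```

With these numbers only one person in the bar earns more than the mean, while the median splits the group two below and two above.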

So when your data point is an average, even if it is correctly calculated (i.e. the data is true), you need to take care what sort of belief you take away from it… Because even if you correctly identify which average is being talked about, you may still come away with a false belief about how the values are distributed. (Not all distributions are, erm, normal…)

And it goes without saying that you also need to be critical of the data itself. Because it may or may not be true…

Impressions from Data Sketches

By chance, I spotted a tweet this evening from @owenboswarva pointing to a #DVLA data release: number of licence holders with penalty points, broken down by postcode district [link] | #FOI #opendata.

A quick search turned up some DfT driving license open data that includes a couple of postcode district related datasets – one giving a count of number of license holders with a particular number of points in each district, one breaking out the number of full and provisional licence holders by gender in each district. The metadata in the sheets also suggests that the datasets are monthly releases, but that doesn’t seem to be reflected by what’s on the site.


I haven’t done any data sketches for a bit, so I thought I’d have a quick play with the data to see whether any of the Isle of Wight postcode areas seemed to have a noticeably higher percentage rate of points holders than other bits of the island, dividing the number of license holders with points by the total number of license holders in each postcode district…
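The calculation itself is just a ratio per district. A minimal Python sketch, using placeholder district codes and invented counts rather than the actual DfT figures:

```python
def points_rate(with_points, all_holders):
    """Percentage of licence holders with points per district, highest first."""
    rates = {}
    for district, total in all_holders.items():
        if total:  # skip districts with no recorded licence holders
            rates[district] = 100.0 * with_points.get(district, 0) / total
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Placeholder districts and made-up counts, for illustration only
holders_with_points = {"XX1": 1200, "XX2": 150, "XX3": 120}
all_licence_holders = {"XX1": 20000, "XX2": 2000, "XX3": 1500}

for district, pct in points_rate(holders_with_points, all_licence_holders):
    print(district, round(pct, 1))
```

Swapping the denominator for a population estimate gives the second sort of comparison, with the caveat that the two counts may come from different years.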


If you’re not familiar with Isle of Wight postal geography, Wikipedia identifies the postcode districts as follows:


So Seaview and Yarmouth, then, are places to watch out for (places that tend to have older, well to do populations…)

I then wondered how the number of license holders might compare based on population estimates. From a quick search, I could only find population estimates for postcode districts based on 2011 census figures, which are five years out of date now.


The main thing that jumped out at me was that where the number of license holders exceeds the population estimate, there must have been a noticeable population increase in the area since the census… The really high percentages perhaps also suggest that those areas have older populations (something we could check from the population demographics from the last census). Secondly, we might note that the proportions rank differently to the first table, though again Yarmouth and Seaview head the leaderboard. This got me thinking that there are perhaps age effects making a difference here. This is something we could start to explore in a little more detail using a couple of the other DfT tables, one that describes the number of licenses issued by gender and age, and another that counts the number of license holders carrying a particular number of points by age and gender. (These two tables are at a national level, though, rather than broken out by postcode district.)

I guess I should really have coloured the numbers using a choropleth map, or using a new trick I learned earlier this week, displayed the numbers on labels located at the postcode district centroid…


(The map shows Land Registry House Prices Paid data for November, 2015.)

Maybe another day…

Regulating Autonomous Vehicles: Land, Sea and Air…

Whilst the phrase “autonomous vehicle” is quite possibly meaningless to many people, the notion of driverless cars seems to be rapidly leaving the world of science fiction (was it ever even a thing in science fiction?!), even to the extent of a RoboRace driverless support series being announced for the Formula-E (electric single seater race cars) championship in 2016-17.

Whilst putting together a general interest talk on The Current Future of Robotics for an OU residential school last year, it struck me that tracking the development of regulation around a technology might be a useful way of signalling those technologies likely to make some sort of impact over the next 3-10 years: if a technology is going to become widespread in the physical world, there are likely to be policy considerations as well as Health and Safety guidance notes, if not regulations, associated with it. (As an aside, it might be worth considering what sorts of regulatory protections are associated with the roll out of widespread digital technologies…)

So for example, towards the end of last year, the California Department of Motor Vehicles released draft regulations that “require a third-party testing organization to conduct a vehicle test to provide an independent performance verification of the vehicle. That is in addition to the manufacturer being required to certify the robot car meets safety and performance standards” (New DMV Robot Car Rules Prioritize Safety; Follow Consumer Watchdog’s Call To Require Steering Wheel And Pedals; Privacy & Cybersecurity Also Addressed.). You can see the regulations here: Autonomous Vehicles in California: Deployment of Autonomous Vehicles for Public Operation. Interestingly, accidents involving autonomous vehicles must be reported, and are listed here: Report of Traffic Accident Involving an Autonomous Vehicle (OL 316).

In the UK, a light touch is being applied to regulation around autonomous vehicles in the hope of driving (?!) related research activity. See, for example, the guidance that appeared last year (July, 2015) in the form of the DfT Automated vehicle technologies testing: code of practice. (A historical review is also available in the form of the Driverless cars in the UK: a regulatory review policy paper from February 2015.)

Associated work also reviewed the development of autonomous vehicles in a more general sense – that is, not just focussing on “robot cars”. For example, the Innovate UK network’s Robotics and Autonomous Systems Special Interest Group’s RAS 2020 Robotics and Autonomous Systems Roadmap [PDF] (July, 2014) identifies autonomous aerospace and marine robots, as well as autonomous road transportation systems, as areas worthy of development.

So when it comes to thinking about autonomous vehicles, what else is going on apart from robot cars and where might we be able to pick up regulatory or legislative signals that might enable – or announce – their widespread arrival?

In the aerospace domain, drones are being hooked up to ever more autonomous control systems, but when it comes to thinking of “autonomous vehicles” I think we should separate the levels of autonomy associated with how the system operates as a transportation system (e.g. in terms of navigation and control of the vehicle purely as a transportation system) and the operation of other systems mounted on the transportation system (for example, weapons systems). There are undoubtedly considerable concerns associated with the use of military drones (for example, as reviewed in the University of Birmingham’s Policy Commission report of October, 2014, on The Security Impact of Drones [PDF] or the House of Commons Library briefing paper of Sept/Oct 2015 providing an [o]verview of military drones used by the UK Armed Forces [PDF]; see also the POST Briefing on Civilian Drones from October, 2014) but many of these relate to concerns about the uses to which drones might be put as platforms (for example, mounting surveillance systems or lethal weapons systems).

More generally, then, when it comes to considering the aerospace potential of autonomous vehicles, we might look to reviews produced by industrial programmes such as ASTRAEA (Autonomous Systems Technology Related Airborne Evaluation & Assessment), representing as they do a broader set of concerns, such as this review of the challenges facing the development of unmanned aerial vehicles, or this ASTRAEA progress report presentation from December 2014 reviewing the process steps associated with “[e]nabling the routine use of Unmanned Aircraft Systems (UAS) in all classes of airspace without the need for restrictive or special conditions of operation”.

Cars and drones, then… so far, so familiar. But in the marine world, too, there is much interest in the development of Maritime Autonomous Systems (MAS) (“smart ships”) (for an introduction, see this blog post from a policy advisor at the UK Chamber of Shipping). In terms of regulation, the UK Marine Industries Alliance: Maritime Autonomous Systems Group was set up to explore how a regulatory framework may extend to the marine environment; for example, The Development of a UK Regulatory Framework for Marine Autonomous Systems [PDF] and this Maritime Autonomous Systems (Surface) MAS(S) Industry Code of Practice.

As well as tracking regulatory activity, and any lobbying associated with it (for example, from folk like KPMG and their March 2015 report on Connected and Autonomous Vehicles – The UK Economic Opportunity [PDF], or this RAND Autonomous Vehicle Technology – Guide for policy makers from 2014), it probably also makes sense to see how interested parties might also be seeking to protect their own interests in the form of any insurance they may seek to take out, and how the insurance industry responds to such approaches. So for example, this 2014 report from Lloyd’s on Autonomous vehicles – Handing Over Control: opportunities and risks for insurance [PDF] is perhaps a useful place to start?

Pondering New Ways of Programming Lego EV3 Mindstorms Bricks

We’re due to update our first level residential school day long robotics activity for next year, moving away from the trusty yellow RCX Lego Mindstorms bricks that have served us well for getting on a decade or so, I guess, and up to the EV3 Mindstorms bricks.

Students programmed the old kit via a brilliant user interface developed by my colleague Jon Rosewell, originally for the “Robotics and the Meaning of Life” short course, but quickly adopted for residential school, day school, and school-school (sic) outreach activities.


The left hand side contained a palette of textual commands that could be dragged onto the central canvas in order to create a tree-like programme. Commands could be dragged and relocated within the tree. Commands taking variable values had the values set by selecting the appropriate line of code and then using the dialogue in the bottom left corner to set the value.

The programme could be downloaded and executed on the RCX brick, or used to control the behaviour of the simple simulated robot in the right hand panel. A brilliant, brilliant piece of educational software. (A key aspect of this is how easy it is to support in a lab or classroom with a dozen student groups who need help debugging their programmes. The UI is clear, easily seen over the shoulder, and buggy code can typically be fixed easily with a word or two of explanation. The text-but-not metaphor reduces typos (it’s really a drag and drop UI but with text blocks rather than graphical blocks) as well as producing pretty readable code.)

For the new residential school, we’ve been trying to identify what makes sense software wise. The default Lego software is based on LabVIEW; I think it looks a bit toylike (which isn’t necessarily a problem), but IMHO it could be hard to help debug in a residential school setting, which probably is an issue. “Real” LabVIEW can also be used to program the bricks (I think), but again the complexity of the software, and similar issues in quick-fire debugging, are potential blockers. Various third party alternatives to the Lego software are possible: LeJOS, a version of Java that’s been running on Mindstorms bricks for what feels like forever, is one possibility; ev3dev, a Linux distribution for the brick that lets you run things like Python, is another; and the python-ev3 python package is a third. You can also run an IPython notebook from the brick – that is, the IPython notebook server runs on the brick and you can then access the notebook via a browser running on a machine with a network connection to the brick…

So as needs must (?!;-), I spent a few hours today setting up an EV3 with ev3dev, python-ev3 and an IPython notebook server. Following along the provided instructions, everything seemed to work okay with a USB connection to my Mac, including getting the notebooks to auto-run on boot, but I couldn’t seem to get an ssh or http connection with a bluetooth connection. I didn’t have a nano-wifi dongle either, so I couldn’t try a wifi connection.

The notebooks seem to be rather slow when running code cells, although responsiveness when I connected to the brick via an ssh terminal from my mac seemed pretty good for running command line commands at least. Code popped into an executable, shebanged python file can be run from the brick itself simply by selecting the file from the on-board file browser, so a couple of workflows immediately suggest themselves:

  • programme the brick via an IPython notebook running on the brick, executing code a cell at a time to help debug it;
  • write the code somewhere, pop it into a text file, copy it onto the brick and then run it from the brick;

It should also be possible to export the code from a notebook into an executable file that could be run from the on-brick file browser.
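As a sketch of what that export might involve, the Python function below pulls the code cells out of a notebook file, writes them to a script with a shebang line, and marks the script as executable. It assumes a version 4 notebook file (older IPython notebook formats are laid out differently), the function name `export_code_cells` and the default shebang are made up for the example, and cells containing IPython magics would need stripping by hand.

```python
import json
import os
import stat


def export_code_cells(notebook_path, script_path, shebang="#!/usr/bin/env python"):
    """Write the code cells of a notebook to an executable, shebanged script."""
    with open(notebook_path) as f:
        nb = json.load(f)

    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown/text cells
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        chunks.append(source)

    with open(script_path, "w") as f:
        f.write(shebang + "\n\n" + "\n\n".join(chunks) + "\n")

    # Add execute permission so the file can be launched directly
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return script_path
```

The resulting file could then be copied onto the brick in the same way as any other hand-written script.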

Another option might be to run IPython on the brick, accessed from an ssh terminal, to support interactive development a line at a time:


This seems to be pretty quick/responsive, and offers features such as autocomplete prompts, though perhaps not as elegantly as the IPython notebooks manage.

However, the residential school activities require students to write complete programmes, so the REPL model of the interactive IPython interpreter is perhaps not the best environment?

Thinking more imaginatively about setting, if we had wifi working, and with a notebook server running on the brick, I could imagine programming and interacting with the brick from an IPython notebook accessed via a browser on a tablet (assuming it’s easy enough to get network connections working over wifi?). This could be really attractive for opening up how we manage the room for the activity, because it would mean we could get away from the computer lab/computer workstation model for each group and have a far more relaxed lab setting. The current model has two elbow height demonstration tables about 6′ x 3’6 around which students gather for mildly competitive “show and tell” sessions, so having tablets rather than workstations for the programming could encourage working directly around the tables as well?

That the tablet model might be interesting to explore originally came to me when I stumbled across the Microsoft Touch Develop environment, which provides a simple programming environment with a keyboard reminiscent of that of a ZX Spectrum with single keyboard keys inserting complete text commands.


Sigh… those were the days…:


Unfortunately there doesn’t seem to be an EV3 language pack for Touch Develop:-(

However, there does appear to be some activity around developing a Python editor for use in Touch Develop, albeit just a straightforward text editor:

As you may have noticed, this seems to have been developed for use with the BBC Microbit, which will be running MicroPython, a version of Python3 purpose built for microcontrollers (/via The Story of MicroPython on the BBC micro:bit).

It’s maybe worth noting that TouchDevelop is accessed via a browser and can be run in the cloud or locally (touchdevelop local).

We’re currently also looking for a simple Python programming environment for a new level 1 course, and I wonder if something of this ilk might be appropriate for that…?

Finally, another robot related ecosystem that crossed my path this week, this time via @Downes – the Poppy Project, which proudly declares itself as “an open-source platform for the creation, use and sharing of interactive 3D printed robots”. Programming is via pypot, a python library that also works with the (also new to me) V-REP virtual robot experimentation platform, a commercial environment though it does seem to have a free educational license. (The poppy project folk also seem keen on IPython notebooks, auto-running them from the Raspberry Pi boards used to control the poppy project robots, not least as a way of sharing tutorials.)

I half-wondered if this might be relevant for yet another new course, this time at level 2, on electronics – though it will also include some robotics elements, including access (hopefully) to real robots via a remote lab. These will be offered as part of the OU’s OpenSTEM Lab which I think will be complementing the current, and already impressive, OpenScience Lab with remotely accessed engineering experiments and demonstrations.

Let’s just hope we can get a virtual computing lab opened too!

PS some notes to self about using the ev3dev:

  • for IP xx.xx.xx.xx, connect with: ssh root@xx.xx.xx.xx and password r00tme
  • notebook startup with permission 755 in: /etc/init.d/ipev3
    ipython notebook --no-browser --notebook-dir=/home --ip=* --port=8889 &

    update-rc.d ipev3 defaults then update-rc.d ipev3 disable (also: enable | start | stop | remove) (this doesn’t work with current init.d file – need proper upstart script?)
  • look up connection file: eg in notebook run %connect_info, then, from a local copy of the json file and the appropriate IP address xx.xx.xx.xx, run: ipython qtconsole --existing ~/Downloads/ev3dev.json --ssh root@xx.xx.xx.xx with password r00tme
  • alternatively, on the brick, find the location of the connection file, first via the profile (ipython locate profile) and then inside it, e.g. ls -al /root/.ipython/profile_default/security to find it and view it.
