Category: OBU

DON’T PANIC – The Latest OU/Hans Rosling Co-Pro Has (Almost) Arrived…

To tie in with the UN’s Sustainable Development Goals summit later this week, the OU teamed up with production company Wingspan Productions and data raconteur Professor Hans Rosling to produce a second “Don’t Panic” lecture performance that airs on BBC2 at 8pm tonight: Don’t Panic – How to End Poverty in 15 Years.

Here’s a trailer…

…and here’s the complementary OpenLearn site: Don’t Panic – How to End Poverty in 15 Years (OpenLearn).

If you saw the previous outing – DON’T PANIC — The Facts About Population – you’ll find this one takes a similar format, once again using the Musion projection system to render “holographic” data visualisations that Hans tells his stories around.

(I did try to suggest that a spinning 3d chart could be quite compelling up on the big screen to illustrate different country trajectories over time, but was told it’d probably be too complicated a graphic for the audience to understand! ;-)

Off the back of the previous co-production, the OU commissioned a series of video shorts featuring Hans Rosling that review several ways in which we can make sense of global development using data and statistics:

One idea for making use of these videos was to incorporate them into a full open course on FutureLearn, but for a variety of (internal?) reasons, that idea was canned. However, some of the material I’d started sketching for that possibility has finally seen the light of day. It appears as a series of OpenLearn lessons relating to the first three short films listed above, with the videos cut into bite-size fragments and interspersed throughout a narrative text and embedded interactive data charts:

You might also pick up on some of the activity possibilities that are included too…

Note that those lessons are not quite presented as originally handed over… I was half hoping OpenLearn might have a go at displaying them as “scrollytelling” immersive stories as something of an experiment, but that appears not to be the case (maybe I should have actually published the stories first?! Hmmm…!). Anyway, here’s what I originally drafted, using Storybuilder:

If you have any comments on the charts, or feedback on the immersive story presentation (did it work for you, or did you find it irritating?), please let me know via the comments below.

PS if you are interested in doing a FutureLearn course with a development data feel, at least in part, check out OUr forthcoming FutureLearn MOOC, Learn to Code for Data Analysis. Using interactive IPython notebooks, you’ll learn how to start wrangling and visualising open datasets (including weather data, World Bank indicators data, and UN Comtrade import and export data) using the Python programming language and the pandas data wrangling package.

The Luddites on the BBC…

The history of the Luddites fascinates me and it’s something I hope to properly immerse myself in one day…

The story is something I keep meaning to pitch as an OU/BBC co-pro, though there are already other BBC Radio 4 vehicles that would make a good home for the story:

  • In Our Time comes to mind in the first instance, and would provide an opportunity to review the overtones of revolution and the clampdown on secret societies that were prevalent at the time and which presumably coloured the state response that turned Huddersfield into a garrison town.
  • The Long View might take another tack, providing a look at the nature of innovation and the human response to it in a time of price hikes: economic factors had a role to play in fomenting civil unrest, as a hike in wheat prices made the daily bread unaffordable, particularly for those skilled workers whose trade was being replaced by mechanisation.

A recent BBC documentary featuring Huddersfield’s Simon Armitage* on The Pendle Witch Child used animation to nice effect as a way of dramatising that story from just over the border, and could perhaps also work as a way of retelling particular elements of the Luddite tale on television. Armitage’s probation officer background perhaps stood him in good stead for appreciating the social context of the Pendle witch trials, and this could again be brought to bear when considering the 1812 rebellion: the York trials that resulted saw 17 men being hanged.

* who was born in Marsden. Perfick. It couldn’t be much more of a local story to him!

In the meantime, here’s a round-up stub for BBC content on the topic of the Luddites… Please let me know of anything I’m missing…

Generating d3js Motion Charts from rCharts

Remember Gapminder, the animated motion chart popularised by Hans Rosling in his TED Talks and Joy of Stats TV programme? Well it’s back on TV this week in Don’t Panic – The Truth About Population, a compelling piece of OU/BBC co-produced stats theatre featuring Hans Rosling, and a Pepper’s Ghost illusion brought into the digital age courtesy of the Musion projection system:

Whilst considering what materials we could use to support the programme, we started looking for ways to make use of the Gapminder visualisation tool that makes several appearances in the show. Unfortunately, neither Gapminder (requires Java?), nor the Google motion chart equivalent of it (requires Flash?), appears to work with a certain popular brand of tablet that is widely used as a second screen device…

Looking around the web, I noticed that Mike Bostock had produced a version of the motion chart using d3.js: The Wealth & Health of Nations. Hmmm…

Playing with that rendering on a tablet, I had a few problems when trying to highlight individual countries – the interaction interfered with an invisible date slider control – but a quick shout out to my OU colleague Pete Mitton resulted in a tweaked version of the UI with the date control moved to the side. I also added a tweak to allow specified countries to be highlighted. You can find an example here (source).

Looking at how the data was pulled into the chart, it seems to be quite a convoluted form of JSON. After banging my head against a wall for a bit, a question on Stack Overflow about how to wrangle the data from something that looked like this:

Country Region  Year    V1  V2
AAAA    XXXX    2001    12  13
BBBB    YYYY    2001    14  15
AAAA    XXXX    2002    36  56
AAAA    XXXX    1999    45  67

to something that looked like this:

  [ {"Country": "AAAA",
      "V1": [ [1999,45], [2001,12], [2002,36] ],
      "V2": [ [1999,67], [2001,13], [2002,56] ]
    },
    {"Country": "BBBB",
      "V1": [ [2001,14] ],
      "V2": [ [2001,15] ]
    } ]

resulted in a handy function from Ramnath Vaidyanathan that fitted the bill.
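For comparison, the same long-to-nested reshaping can be sketched in Python with pandas (using the hypothetical column names from the example table above):

```python
import pandas as pd

# Long-format data, as in the example table above
df = pd.DataFrame({
    "Country": ["AAAA", "BBBB", "AAAA", "AAAA"],
    "Region":  ["XXXX", "YYYY", "XXXX", "XXXX"],
    "Year":    [2001, 2001, 2002, 1999],
    "V1":      [12, 14, 36, 45],
    "V2":      [13, 15, 56, 67],
})

def nest(group):
    # Sort each country's rows by year, then zip each indicator
    # column into a list of [year, value] pairs
    g = group.sort_values("Year")
    return {
        "Country": g["Country"].iloc[0],
        "V1": [list(p) for p in zip(g["Year"], g["V1"])],
        "V2": [list(p) for p in zip(g["Year"], g["V2"])],
    }

records = [nest(g) for _, g in df.groupby("Country")]
```

The resulting list of dicts serialises directly (e.g. via `json.dumps(records)`) to the nested JSON structure the chart expects.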

One of the reasons that I wanted to use R for the data transformation step, rather than something like Python, was that I was keen to try to get a version of the motion charts working with the rCharts library. Such is the way of the world, Ramnath is the maintainer of rCharts, and with his encouragement I had a go at getting the motion chart to work with that library, heavily cribbing from @timelyportfolio’s rCharts Extra – d3 Horizon Conversion tutorial on getting things to work with rCharts along the way.

For what it’s worth, my version of the code is posted here: rCharts_motionchart.

I put together a couple of demos that seem to work, including the one shown below that pulls data from the World Bank indicators API and then chucks it onto a motion chart…

UPDATE: I’ve made things a bit easier compared to the original recipe included in this post… we can now generate a fertility/GDP/population motion chart for a range of specified countries, using data pulled directly from the World Bank development indicators API, with just a couple of lines of R code that take the required country codes (e.g. 'GB','US','ES','BD').

It’s not so hard to extend the code to pull in other datasets, either…

Anyway, here’s the rest of the original post… Remember, it’s easier now;-) [Code: See example/demo1.R]

To start with, here are a couple of helper functions:


#A handy helper function for getting country data - this doesn't appear in the WDI package?
getWorldBankCountries <- function(){
  #NB supply the World Bank countries API URL here (left blank above)
  wbCountries <- fromJSON("")
  wbCountries <- data.frame(t(sapply(wbCountries[[2]], unlist)))
  wbCountries$longitude <- as.numeric(wbCountries$longitude)
  wbCountries$latitude <- as.numeric(wbCountries$latitude)
  levels(wbCountries$region.value) <- gsub("\\(all income levels\\)", "", levels(wbCountries$region.value))
  wbCountries
}

#' Return a function that extracts the element at a given index
pluck_ = function (element){
  function(x) x[[element]]
}

#' Zip two (or more) vectors into a list of element-wise groups
zip_ <- function(..., names = F){
  x = list(...)
  y = lapply(seq_along(x[[1]]), function(i) lapply(x, pluck_(i)))
  if (names) names(y) = seq_along(y)
  y
}

#' Sort a vector based on elements at a given position
sort_ <- function(v, i = 1){
  v[sort(sapply(v, '[[', i), index.return = T)$ix]
}

This next bit still needs some refactoring, and a bit of work to get it into a general form:

#I chose to have a go at putting all the motion chart parameters into a list



##This bit needs refactoring - grab some data; the year range is pulled from the motion chart config;
##It would probably make sense to pull countries and indicators etc into the params list too?
##That way, we can start to make this block a more general function?


data <- WDI(indicator=c('SP.DYN.TFRT.IN','SP.POP.TOTL','NY.GDP.PCAP.CD'),start = params$start, end = params$end,country=c("BD",'GB'))


#Another bit of Ramnath's magic -
dat2 <- dlply(data, .(Country, Region), function(d){
  list(
    Country = d$Country[1],
    Region = d$Region[1],
    Fertility = sort_(zip_(d$Year, d$Fertility)),
    GDP = sort_(zip_(d$Year, d$GDP)),
    Population = sort_(zip_(d$Year, d$Population))
  )
})

#cat(rjson::toJSON(setNames(dat2, NULL)))

To minimise the amount of motion chart configuration, can we start to set limits based on the data values?

#This really needs refactoring/simplifying/tidying/generalising
#I'm not sure how good the range finding heuristics I'm using are, either?!
  if (!('ymin' %in% names(params))) params$ymin= signif(min(0.9*data[[params$y]]),3)
  if (!('ymax' %in% names(params))) params$ymax= signif(max(1.1*data[[params$y]]),3)
  if (!('xmin' %in% names(params))) params$xmin= signif(min(0.9*data[[params$x]]),3)
  if (!('xmax' %in% names(params))) params$xmax= signif(max(1.1*data[[params$x]]),3)
  if (!('rmin' %in% names(params))) params$rmin= signif(min(0.9*data[[params$radius]]),3)
  if (!('rmax' %in% names(params))) params$rmax= signif(max(1.1*data[[params$radius]]),3)
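The heuristic above pads the data range by 10% either side and rounds to 3 significant figures; the same idea can be sketched in Python (my own illustration of the heuristic, not the original code):

```python
from math import floor, log10

def signif(x, digits=3):
    # Round to a number of significant figures, like R's signif()
    if x == 0:
        return 0.0
    return round(x, -int(floor(log10(abs(x)))) + (digits - 1))

def axis_limits(values, pad=0.1, digits=3):
    # Pad the data range by 10% either side, then round the limits
    lo = signif(min(values) * (1 - pad), digits)
    hi = signif(max(values) * (1 + pad), digits)
    return lo, hi

limits = axis_limits([123, 456, 789])
```

Whether a fixed 10% pad is a good range-finding heuristic for wildly skewed indicators (GDP per capita, say) is an open question, as the comment in the R code admits.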


This is the function that generates the rChart:


#We can probably tidy the way that the parameters are mapped...
#I wasn't sure whether to try to maintain the separation between params and rChart$params?
rChart.generator = function(params, h=400, w=800){
  rChart <- rCharts$new()
  rChart$setTemplate(script = "../motionchart/layouts/motionchart_Demo.html")

  #Map the config list onto the chart template; yearMin is shown here, and
  #the remaining parameters (yearMax, axis limits etc.) are set the same way
  rChart$set( yearMin = params$start )

  rChart$set( data = rjson::toJSON(setNames(dat2, NULL)) )

  rChart
}



Aside from tidying – and documenting/commenting – the code, the next thing on my to do list is to see whether I can bundle this up in a Shiny app. I made a start sketching a possible UI, but I’ve run out of time to do much more for a day or two… (I was also thinking of country checkboxes for either pulling in just that country data, or highlighting those countries.)


#Sketch of the UI - the shinyUI/pageWithSidebar wrapper is reconstructed here,
#and items is assumed to be a vector of the available indicator names
shinyUI(pageWithSidebar(
  headerPanel("Motion Chart demo"),
  sidebarPanel(
    selectInput(inputId = 'x',
                label = "X",
                choices = items,
                selected = 'Fertility'),
    selectInput(inputId = 'y',
                label = "Y",
                choices = items,
                selected = 'GDP'),
    selectInput(inputId = 'r',
                label = "Radius",
                choices = items,
                selected = 'Population')
    #The next line throws an error (a library is expected? But I don't want to use one?)
  ),
  mainPanel()
))

As ever, we’ve quite possibly run out of time on getting much up on the OpenLearn website by Thursday to support the programme as it airs, which is partly why I’m putting this code out now… If you manage to do anything with it that would allow folk to dynamically explore a range of development indicators over the next day or two (especially GDP, fertility, mortality, average income, income distributions (this would require different visualisations?)), we may be able to give it a plug from OpenLearn, and maybe via any tweetalong campaign that’s running as the programme airs…

If you do come up with anything, please let me know via the comments, or twitter (@psychemedia)…

TV Courses

A couple of weeks ago I spotted a BBC news article announcing that a University launches online course with TV show:

In what is being claimed as the biggest ever experiment in “edutainment”, a US television company is forming a partnership with a top-ranking Californian university to produce online courses linked to a hit TV show.

This blurring of the digital boundaries between academic study and the entertainment industry will see a course being launched next month based on the post-apocalypse drama series the Walking Dead.

Television shows might have spin-off video games or merchandising, but this drama about surviving disaster and social collapse is now going to have its own university course.

The OU has supplemented courses with material from TV broadcasts for several decades, and has also wrapped factual programming with OU courses. We’ve even commissioned drama pieces that have been woven into OU courses. But something about wrapping Hollywood hype also seemed familiar… and then I remembered Hollywood Science. But it’s not available on iPlayer, unfortunately, and I don’t think it went to DVD either…which makes this all something of a non-post!

Recent Robotics Reviews on OpenLearn…

A few years ago I worked on an OU robotics course ambitiously titled “Robotics and the Meaning of Life” (the working title had been “Joy, Fun, Robotics”), elements of which have been woven into a new OU course, Technologies in practice (hmm, thinks – would folk be interested in a course on data in practice?).

As well as providing a general introduction to robotics technology, the course reviewed a range of social, political and ethical issues that might impact on a society in which mobile, intelligent, autonomous machines were part of our everyday experience. As part of our current co-pro series of the BBC World Service Click radio programme, we’ve been exploring some of the issues associated with recent developments in robotic vehicles. This has also provided an opportunity for me to start scouting around some of the emerging laws that are being considered with a view to regulating the operation – and behaviour – of autonomous intelligent robots. So here’s a quick round up of some of the related articles that I’ve recently posted to OpenLearn…

  • A dark future for warehousing? – robots are playing an increasingly important role in the logistics industry, with robot workers increasingly finding a role in warehouses. This post reviews several different ways in which robots can work with – and instead of – human workers in today’s modern warehouses.
  • Robot cars, part 1: Parking the future for now – the DARPA robot vehicle challenges demonstrated how autonomous robot vehicles could cope with off-road and urban driving conditions, leading in part to the development of things like the Google autonomous car that is currently being tested on public roads in several US states. Whilst the mass availability of such vehicles is still only a remote possibility for a variety of reasons (from cost and safety issues, to legal and ethical considerations), autonomous driving in certain limited situations is now possible. In this post, we look at one such situation, disliked by many a driver – parking – and see how our cars may soon be managing that aspect of driving on our behalf.
  • Robot cars, part 2: Convoys of the near future – along with the fiddliness of parking, the monotony of stop-start traffic jams and convoy style motorway driving provide another environment in which autopilot systems may be able to improve not only the driving experience, but also road safety. In this post, I review some recent demonstrations in autonomous driver support systems suited to these particular road conditions.
  • Naughty robot: Where’s your human operator? – a wealth of regulations at international, national and even regional (state) level cover the operation of our public highways and public airspace. But when the robots start taking control of their own actions and decision-making in these spaces, do we need further regulation to limit the behaviour of robots as distinct from humans? And when it comes to allowing autonomous robots to bear arms, is that a situation we are comfortable with? In this post, I review some of the emerging laws that are developing around not only the testing and use of autonomous robot cars on our public highways, but also in consideration of autonomous flying vehicles – drones – in both domestic and military settings. In part, this sets up the question: will there be one law for humans and another for robots?

Hear the latest episode of Click radio here: #BBCClickRadio, or keep track of the OU supported special editions via OpenLearn: OU on the BBC: Click – A Route 66 of the future

Twitter Audience Profiling – OU/BBC Feynman Challenger Co-Pro

Another strong piece of TV commissioning via the Open University Open Media Unit (OMU) aired this week in the guise of The Challenger, a drama documentary telling the tale of Richard Feynman’s role in the accident enquiry around the space shuttle Challenger disaster. (OMU also produced an ethical game if you want to try your own hand at leading an ethics investigation.)

Running a quick search for tweets containing the terms feynman challenger to generate a list of names of Twitter users commenting around the programme, I grabbed a sample of their friends (max 197 per person) and then plotted the commonly followed accounts within that sample.
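The “commonly followed accounts” step amounts to counting, across the sampled commenters, how many of them follow each account. A minimal sketch, with made-up data standing in for the Twitter API calls themselves:

```python
from collections import Counter

# friends_sample maps each commenting user to a sample of the
# accounts they follow (hypothetical names, for illustration only)
friends_sample = {
    "user1": ["ProfBrianCox", "BBCNews", "nasa"],
    "user2": ["ProfBrianCox", "nasa"],
    "user3": ["BBCNews", "ProfBrianCox"],
}

# Count how many sampled users follow each account
counts = Counter(acct for friends in friends_sample.values()
                 for acct in friends)

# Keep accounts followed by at least two sampled users - these are
# the "commonly followed" nodes that get plotted in the map
common = {acct: n for acct, n in counts.items() if n >= 2}
```

In the maps below, those commonly followed accounts become nodes, linked by co-following, and the layout algorithm does the rest.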


If you treat this image as a map, you can see regions where the accounts are (broadly) related by topic or interest category. What regions can you see?! (For more on this technique, see Communities and Connections: Social Interest Mapping.)

I also ran a search for tweets containing bbc2 challenger:


Let’s peek into some of the regions… “Space”-related twitter accounts, for example:


Or news media:


(from which we might conclude that the audience was also a Radio 4 audience?!;-)

How about a search on bbc2 feynman?


Again, we see distinct regions. As with the other maps, the programme audience also seems to have an interest in following popular science writers:


Interesting? Possibly – the maps provide a quick profile of the audience, and maybe confirm it’s the sort of audience we might have expected. Notable perhaps is the prominence of Brian Cox and Dara O’Briain, who’ve also featured heavily in BBC science programming. Around the edges, we also see what sorts of comedy or entertainment talent appeal to the audience – no surprises to see David Mitchell, Charlie Brooker and Armando Iannucci in there, though I wouldn’t necessarily have factored in Eddie Izzard (though we’d need to look at “proper” baseline interest levels of general audiences to see whether any of these comedians are over-represented in these samples compared to commonly followed folk in a “random” sample of UK TV watchers on Twitter. The patterns of following may be “generally true” rather than highlighting folk atypically followed by this audience.)

Useful? Who knows…?!

(I have PDF versions of the full plots if anyone wants copies…)

Local News Templates – A Business Opportunity for Data Journalists?

As well as serendipity, I believe in confluence…

A headline in the Press Gazette declares that Trinity Mirror will be roll[ing] out five templates across 130-plus regional newspapers as emphasis moves to digital. Apparently, this follows a similar initiative by Johnston Press midway through last year: Johnston to roll out five templates for network of titles.

It seems that “key” to the Trinity Mirror initiative is the creation of a new “Shared Content Unit” based in Liverpool that will provide features content to Trinity’s papers across the UK [which] will produce material across the regional portfolio in print and online including travel, fashion, food, films, books and “other content areas that do not require a wholly local flavour”.

[Update – 25/3/13: Trinity Mirror to create digital data journalism unit to produce content for online and printed titles]

In my local rag last week, (the Isle of Wight County Press), a front page story on the Island’s gambling habit localised a national report by the Campaign for Fairer Gambling on Fixed Odds Betting Terminals. The report included a dataset (“To find the stats for your area download the spreadsheet here and click on the arrow in column E to search for your MP”) that I’m guessing (I haven’t checked…) provided some of the numerical facts in the story. (The Guardian Datastore also republished the data (£5bn gambled on Britain’s poorest high streets: see the data) with an additional column relating to “claimant count”, presumably the number of unemployment benefit claimants in each area (again, I haven’t checked…)) Localisation appeared in several senses:

IWCP gambling

So for example, the number of local betting shops and Fixed Odds betting terminals was identified, the mooted spend across those and the spend per head of population. Sensemaking of the figures was also applied by relating the spend to an equivalent number of NHS procedures or police jobs. (Things like the BBC Dimensions How Big Really provide one way of coming up with equivalent or corresponding quantities, at least in geographical area terms. (There is also a “How Many Really” visualisation for comparing populations.) Any other services out there like this? Maybe it’s possible to craft Wolfram Alpha queries to do this?)

Something else I spotted, via RBloggers, was a post by Alex Singleton of the University of Liverpool on an Open Atlas around the 2011 Census for England and Wales; he has “been busy writing (and then running – around 4 days!) a set of R code that would map every Key Statistics variable for all local authority districts”. The result is a set of PDF docs for each Local Authority district mapping out each indicator. As well as publishing the separate PDFs, Alex has made the code available.

So what’s confluential about those?

The IWCP article localises the Fairer Gambling data in several ways:
– the extent of the “problem” in the local area, in terms of numbers of betting shops and terminals;
– a consideration of what the spend equates to on a per capita basis (the report might also have used the population of over-18s to work out the average “per adult islander”); note that there are also at least a couple of significant problems with calculating per capita averages in this example: first, the Island is a holiday destination, and the population swings over the summer months; secondly, do holidaymakers spend differently to residents on these machines?
– a corresponding quantity explanation that recasts the numbers into an equivalent spend on matters with relevant local interest.
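The population-swing caveat is easy to illustrate with made-up numbers: the same total spend produces quite different headline figures depending on which denominator you pick.

```python
# Hypothetical figures, for illustration only - not taken from the report
total_spend = 10_000_000       # annual spend on the terminals
resident_population = 140_000  # resident population
summer_population = 200_000    # residents plus holidaymakers

per_resident = total_spend / resident_population
per_summer_head = total_spend / summer_population
# The per-head figure falls from roughly 71 to 50 as the
# denominator grows: a swing of over 40% from one editorial choice
```

Which denominator is “right” is an editorial judgement; the point is that a localised story should say which one it used.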

The Census Atlas takes one recipe and uses it to create localised reports for each LA district. (I’m guessing that with a quick tweak, separate reports could be generated for the different areas within a single Local Authority.)

Trinity Mirror’s “Shared Content Unit” will produce content “that do[es] not require a wholly local flavour”, presumably syndicating it to its relevant outlets. But it’s not hard to also imagine a “Localisable Content” unit that develops applications that can help produce localised variants of “templated” stories produced centrally. This needn’t be quite as automated as the line taken by computational story generation outfits such as Narrative Science (for example, Can the Computers at Narrative Science Replace Paid Writers? or Can an Algorithm Write a Better News Story Than a Human Reporter?) but instead could produce a story outline or shell that can be localised.

A shorter term approach might be to centrally produce data driven applications that can be used to generate charts, for example, relevant to a locale in an appropriate style. So for example, using my current tool of choice for generating charts, R, we could build something centrally and then allow the local press to grab data relevant to them and generate a chart in an appropriate style (for example, Style your R charts like the Economist, Tableau … or XKCD). This approach saves duplication of effort in getting the data, cleaning it, building basic analysis and chart tools around it, and so on, whilst allowing for local customisation in the data views presented. An increasing number of workflows are also springing up around R that support this way of working (for example, RPubs, knitr, github, and a new phase for the lab notebook: Create elegant, interactive presentations from R with Slidify, and [Wordpress] Bloggin’ from R).

Using R frameworks such as Shiny, we can quickly build applications such as my example NHS Winter Sitrep data viewer (about) that explores how users may be able to generate chart reports at Trust or Strategic Health Authority level, and (if required) download data sets related to those areas alone for further analysis. The data is scraped and cleaned once, “centrally”, and common analyses and charts coded once, “centrally”, and can then be used to generate items at a local level.

The next step would be to create scripted story templates that allow journalists to pull in charts and data as required, and then add local colour – quotes from local representatives, corresponding quantities that are somehow meaningful. (I should try to build an example app from the Fairer Gaming data, maybe, and pick up on the Guardian idea of also adding in additional columns…again, something where the work can be done centrally, looking for meaningful datasets and combining it with the original data set.)

Business opportunities also arise outside media groups. For example, a similar service idea could be used to provide story templates – and pull-down local data – to hyperlocal blogs. Or a ‘data journalism wire service’ could develop applications to aid in the creation of data supported stories on a particular topic. PR companies could do a similar thing (for example, appifying the Fairer Gambling data as I “appified” the NHS Winter sitrep data, maybe adding in data such as the actual location of fixed odds betting terminals). On my to do list is packaging up the recently announced UCAS 2013 entries data.

The insight here is not to produce interactive data apps (aka “news applications”) for “readers” who have no idea how to use them, what to read from them, or what stories they might tell; rather, it is to produce interactive applications for generating charts and data views that can be used by a “data” journalist. Rather than having a local journalist working with a local team of developers and designers to get a data flavoured story out, a central team produces a single application that local journalists can use to create a localised version of a particular story that has local meaning but operates at national scale.

Note that by concentrating specialisms in a central team, there may also be the opportunity to then start exploring the algorithmic annotation of local data records. It is worth noting that Narrative Science are already engaged in this sort of activity too, as for example described in this ProPublica article on How To Edit 52,000 Stories at Once, a news application that includes “short narrative descriptions of almost all of the more than 52,000 schools in our database, generated algorithmically by Narrative Science”.

PS Hmm… I wonder… is there time to get a proposal together on this sort of idea for the Carnegie Trust Neighbourhood News Competition? Get in touch if you’re interested…