It’s too nice a day to be inside hacking around with Parliament data as a remote participant in today’s Parliamentary hack weekend (resource list), but if it had been a wet weekend I might have toyed with one of the following:
– revisiting this cleaning script for Analysing UK Lobbying Data Using OpenRefine (actually, a look at who funds/offers support for All Party Groups. The idea was to get a dataset of people who provide secretariat and funds to APPGs, as well as who works for them, and then do something with that dataset…)
– tinkering with data from Question Time and Any Questions…
Given panellists (the BBC could be more helpful here in the way it structures its data…), see if we can identify parliamentarians (MP suffix? Lord/Lady title?) and look them up using the new-to-me, not-yet-played-with-it UK Parliament – Members’ Names Data Platform API. Not sure if reconciliation works on parliamentarian lookup (indeed, not sure if there is a reconciliation service anywhere for looking up MPs, members of the House of Lords, etc?)
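By way of a starter for ten, the name-spotting heuristics might look something like the following in R (a minimal sketch of my own, with made-up panellist names, not a tested recipe):

#Crude heuristics for spotting parliamentarians in a panellist list:
#an "MP" suffix, or a Lord/Lady/Baroness style title
panellists <- c("Jane Smith MP", "Lord Example of Somewhere",
                "A N Journalist", "Baroness Example")  #illustrative names
isMP <- grepl(" MP$", panellists)
isPeer <- grepl("^(Lord|Lady|Baroness) ", panellists)
panellists[isMP | isPeer]
#[1] "Jane Smith MP" "Lord Example of Somewhere" "Baroness Example"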
From the Members’ Names API, we can get things like gender, constituency, whether or not they were holding a (shadow) cabinet post, maybe whether they were on a particular committee at the time, etc. From programme pages, we may be able to get the location of the programme recording. So this opens up the possibility of mapping the geo-coverage of Question Time/Any Questions, both in terms of where the programmes visit as well as which constituencies are represented on them.
If we were feeling playful, we could also have a stab at which APPGs have representation on those programmes!
It also suggests a simpler hack – of just providing a briefing around the representatives appearing on a particular episode in terms of their current (or at the time) parliamentary status (committees, cabinet positions, APPGs etc etc)?
Remember Gapminder, the animated motion chart popularised by Hans Rosling in his TED Talks and Joy of Stats TV programme? Well it’s back on TV this week in Don’t Panic – The Truth About Population, a compelling piece of OU/BBC co-produced stats theatre featuring Hans Rosling, and a Pepper’s Ghost illusion brought into the digital age courtesy of the Musion projection system:
Whilst considering what materials we could use to support the programme, we started looking for ways to make use of the Gapminder visualisation tool that makes several appearances in the show. Unfortunately, neither Gapminder (requires Java?) nor the Google motion chart equivalent of it (requires Flash?) appears to work with a certain popular brand of tablet that is widely used as a second screen device…
Looking around the web, I noticed that Mike Bostock had produced a version of the motion chart using d3.js: The Wealth & Health of Nations. Hmmm…
Playing with that rendering on a tablet, I had a few problems when trying to highlight individual countries – the interaction interfered with an invisible date slider control – but a quick shout out to my OU colleague Pete Mitton resulted in a tweaked version of the UI with the date control moved to the side. I also added a tweak to allow specified countries to be highlighted. You can find an example here (source).
Looking at how the data was pulled into the chart, it seemed to use quite a convoluted form of JSON. After banging my head against a wall for a bit, I posted a question on Stack Overflow about how to wrangle the data into that form from something that looked like this:
Country  Region  Year  V1  V2
AAAA     XXXX    2001  12  13
BBBB     YYYY    2001  14  15
AAAA     XXXX    2002  36  56
AAAA     XXXX    1999  45  67
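For reference, the target format – the shape used by Bostock’s The Wealth & Health of Nations demo, and the shape the dlply/zip_ code below produces – groups the rows by country, carrying each indicator as a list of [year, value] pairs sorted by year. Roughly (values taken from the toy table above):

[
  {
    "Country": "AAAA", "Region": "XXXX",
    "V1": [[1999, 45], [2001, 12], [2002, 36]],
    "V2": [[1999, 67], [2001, 13], [2002, 56]]
  },
  ...
]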
One of the reasons I wanted to use R for the data transformation step, rather than something like Python, was that I was keen to try to get a version of the motion charts working with the rCharts library. As luck would have it, Ramnath is the maintainer of rCharts, and with his encouragement I had a go at getting the motion chart to work with that library, heavily cribbing along the way from @timelyportfolio’s rCharts Extra – d3 Horizon Conversion tutorial on getting things to work with rCharts.
For what it’s worth, my version of the code is posted here: rCharts_motionchart.
I put together a couple of demos that seem to work, including the one shown below that pulls data from the World Bank indicators API and then chucks it onto a motion chart…
UPDATE: I’ve made things a bit easier compared to the original recipe included in this post… we can now generate a fertility/GDP/population motion chart for a range of specified countries, using data pulled directly from the World Bank development indicators API, with just a couple of lines of R code.
To start with, here are a couple of helper functions:
require('WDI')
#A handy helper function for getting country data - this doesn't appear in the WDI package?
#---- https://code.google.com/p/google-motion-charts-with-r/source/browse/trunk/demo/WorldBank.R?r=286
getWorldBankCountries <- function(){
  require(RJSONIO)
  wbCountries <- fromJSON("http://api.worldbank.org/country?per_page=300&format=json")
  wbCountries <- data.frame(t(sapply(wbCountries[[2]], unlist)))
  wbCountries$longitude <- as.numeric(wbCountries$longitude)
  wbCountries$latitude <- as.numeric(wbCountries$latitude)
  levels(wbCountries$region.value) <- gsub("\\(all income levels\\)", "", levels(wbCountries$region.value))
  return(wbCountries)
}
#----http://stackoverflow.com/a/19729235/454773
#' Return a function that extracts the element at a given position
pluck_ = function (element){
  function(x) x[[element]]
}

#' Zip two vectors into a list of pairs
zip_ <- function(..., names = F){
  x = list(...)
  y = lapply(seq_along(x[[1]]), function(i) lapply(x, pluck_(i)))
  if (names) names(y) = seq_along(y)
  return(y)
}

#' Sort a list of vectors based on elements at a given position
sort_ <- function(v, i = 1){
  v[sort(sapply(v, '[[', i), index.return = T)$ix]
}
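(To see what those helpers actually do, here’s a quick worked example – my addition, not part of the original recipe:)

#Zip two vectors into (year, value) pairs, then sort the pairs by year
sort_(zip_(c(2001, 2002, 1999), c(12, 36, 45)))
#i.e. a list of (year, value) pairs ordered by year:
#  (1999,45), (2001,12), (2002,36)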
library(plyr)
This next bit still needs some refactoring, and a bit of work to get it into a general form:
#I chose to have a go at putting all the motion chart parameters into a list
params=list(
  start=1950,
  end=2010,
  x='Fertility',
  y='GDP',
  radius='Population',
  color='Region',
  key='Country',
  yscale='log',
  xscale='linear',
  rmin=0,
  xmin=0
)
##This bit needs refactoring - grab some data; the year range is pulled from the motion chart config;
##It would probably make sense to pull countries and indicators etc into the params list too?
##That way, we can start to make this block a more general function?
tmp=getWorldBankCountries()[,c('iso2Code','region.value')]
names(tmp)=c('iso2Code','Region')
data <- WDI(indicator=c('SP.DYN.TFRT.IN','SP.POP.TOTL','NY.GDP.PCAP.CD'),start = params$start, end = params$end,country=c("BD",'GB'))
names(data)=c('iso2Code','Country','Year','Fertility','Population','GDP')
data=merge(data,tmp,by='iso2Code')
#Another bit of Ramnath's magic - http://stackoverflow.com/a/19729235/454773
dat2 <- dlply(data, .(Country, Region), function(d){
  list(
    Country = d$Country[1],
    Region = d$Region[1],
    Fertility = sort_(zip_(d$Year, d$Fertility)),
    GDP = sort_(zip_(d$Year, d$GDP)),
    Population = sort_(zip_(d$Year, d$Population))
  )
})
#cat(rjson::toJSON(setNames(dat2, NULL)))
To minimise the amount of motion chart configuration, can we start to set limits based on the data values?
#This really needs refactoring/simplifying/tidying/generalising
#I'm not sure how good the range finding heuristics I'm using are, either?!
#(The data frame is now passed in explicitly, rather than relied on as a global)
paramsTidy=function(params, data){
  if (!('ymin' %in% names(params))) params$ymin = signif(min(0.9*data[[params$y]]), 3)
  if (!('ymax' %in% names(params))) params$ymax = signif(max(1.1*data[[params$y]]), 3)
  if (!('xmin' %in% names(params))) params$xmin = signif(min(0.9*data[[params$x]]), 3)
  if (!('xmax' %in% names(params))) params$xmax = signif(max(1.1*data[[params$x]]), 3)
  if (!('rmin' %in% names(params))) params$rmin = signif(min(0.9*data[[params$radius]]), 3)
  if (!('rmax' %in% names(params))) params$rmax = signif(max(1.1*data[[params$radius]]), 3)
  params
}
params=paramsTidy(params, data)
This is the function that generates the rChart:
require(rCharts)
#We can probably tidy the way that the parameters are mapped...
#I wasn't sure whether to try to maintain the separation between params and rChart$params?
rChart.generator=function(params, h=400, w=800){
  rChart <- rCharts$new()
  rChart$setLib('../motionchart')
  rChart$setTemplate(script = "../motionchart/layouts/motionchart_Demo.html")
  rChart$set(
    countryHighlights='',
    yearMin=params$start,
    yearMax=params$end,
    x=params$x,
    y=params$y,
    radius=params$radius,
    color=params$color,
    key=params$key,
    ymin=params$ymin,
    ymax=params$ymax,
    xmin=params$xmin,
    xmax=params$xmax,
    rmin=params$rmin,
    rmax=params$rmax,
    xlabel=params$x,
    ylabel=params$y,
    yscale=params$yscale,
    xscale=params$xscale,
    width=w,
    height=h
  )
  #Note: dat2 is picked up from the calling environment
  rChart$set( data = rjson::toJSON(setNames(dat2, NULL)) )
  rChart
}
rChart.generator(params,w=1000,h=600)
Aside from tidying – and documenting/commenting – the code, the next thing on my to do list is to see whether I can bundle this up in a Shiny app. I made a start sketching a possible UI, but I’ve run out of time to do much more for a day or two… (I was also thinking of country checkboxes for either pulling in just that country data, or highlighting those countries.)
items=c("Fertility", "GDP", "Population")
names(items)=items

shinyUI(pageWithSidebar(
  headerPanel("Motion Chart demo"),
  sidebarPanel(
    selectInput(inputId = 'x',
                label = "X",
                choices = items,
                selected = 'Fertility'),
    selectInput(inputId = 'y',
                label = "Y",
                choices = items,
                selected = 'GDP'),
    selectInput(inputId = 'r',
                label = "Radius",
                choices = items,
                selected = 'Population')
  ),
  mainPanel(
    #The next line throws an error (a library is expected? But I don't want to use one?)
    showOutput("motionChart", '')
  )
))
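For completeness, here’s a guess at the matching server side – a minimal, untested sketch of my own (not from the original recipe), assuming the params list and rChart.generator() function defined above are in scope:

#server.R – speculative sketch
library(shiny)
library(rCharts)

shinyServer(function(input, output) {
  output$motionChart <- renderChart({
    #Override the chart axes/radius from the select boxes
    p <- params
    p$x <- input$x
    p$y <- input$y
    p$radius <- input$r
    rChart.generator(p)
  })
})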
As ever, we’ve quite possibly run out of time to get much up on the OpenLearn website by Thursday to support the programme as it airs, which is partly why I’m putting this code out now… If you manage to do anything with it that would allow folk to dynamically explore a range of development indicators over the next day or two (especially GDP, fertility, mortality, average income, and income distributions (though that last would require different visualisations?)), we may be able to give it a plug from OpenLearn, and maybe via any tweetalong campaign that’s running as the programme airs…
If you do come up with anything, please let me know via the comments, or twitter (@psychemedia)…
Many of you will know that the OU co-produces several BBC television programmes, including Coast and The Money Programme, as well as a wide range of one-off series.
If you want to keep up-to-date with OU/BBC programmes, you can now watch BBC/OU programmes on their own dedicated DeliTV channel: just bookmark http://pipes.yahoo.com/ouseful/bbcouiplayer to your DeliTV collection:-)
If you’re interested in the technical details of how this channel was put together, read on…
What I originally hoped to do was make use of an earlier hack that underpinned Recent OU Programmes on the BBC, via iPlayer (also available on iPhone: iPhone 7 Day OU Programme CatchUp, via BBC iPlayer). Unfortunately, the pipework behind those applications has broken (note to self: repair them… – DONE:-) because they relied on using a search of the BBC website, a search that now appears to be broken in Yahoo Pipes (something to do with a robots.txt exclusion):-(
So it was time for a rethink…
My source of recent OU/BBC programmes is the @open2 twitter feed, which gives the title of the programme and the channel:
So what I needed was to find a way of getting the iPlayer programme IDs for these programmes. My first thought was to take each programme title from the @open2 feed and search Twitter with the name using the from:iplayer_bbcone search limit. But the @iplayer_bbcone feed doesn’t seem to be complete, so I ruled that out…
Digging around the iPlayer site, I found a list of feeds containing content by channel currently on iPlayer (I think? God only knows how this’ll scale if they start to do much longer than 7 day catch-up…?!) – BBC iPlayer feeds
[DOH! Something just jumped out at me there… have you seen it yet…? Important post to follow after this one…:-)]
So I created a pipe (BBC TV – Current Programmes on iPlayer) that pulled together the BBC TV feeds, and allowed you to “search” them by title (i.e. search by filtering…;-):
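(The filter-as-search idea is trivial to mock up outside Pipes, too – e.g. in R, with made-up feed titles:)

#Illustrative only: filter feed item titles on a search string, ignoring case
feedTitles <- c("Coast", "Doctor Who", "The Money Programme")
searchTerm <- "coast"
feedTitles[grepl(searchTerm, feedTitles, ignore.case = TRUE)]
#[1] "Coast"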
One thing I noticed in one of the @open2 tweets was a capitalisation error, which would fail to match the titles in the filter, so I used a regular expression to remove the effects of capitalisation at the filter stage. (I found the trick from a quick search of the Pipes forums, in a reply by @hapdaniel: replace the grabbed text using the \L prefix (i.e. I used \L$1 as the replacement text to convert everything in the $1 string to lower case; \U works for upper case, while \l and \u apply the conversion to the first character only).)
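(The same trick works in R’s PCRE mode, incidentally, which is handy for prototyping the regular expressions locally before building them into a pipe – a quick illustration of my own:)

#With perl=TRUE, \L in the replacement lower-cases the backreference
gsub("(.*)", "\\L\\1", "The Money Programme", perl = TRUE)
#[1] "the money programme"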
I could then run the titles from the @open2 feed through the BBC programmes pipe to grab the programme URIs on iPlayer.
So here’s the pipe. We start by getting the last 50 items from the @open2 updates feed (using ?count=50 to get more than the default number of items from the feed), use a regular expression to parse the tweets and identify the programme titles, remove the duplicate programme title items from the feed using the Unique block, put the time each tweet was sent into a universal/canonical form, and then filter by date so we only get tweets from the last 7 days.
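I don’t have the exact expression to hand, but the title-parsing step amounts to something like this (an illustrative R sketch of my own, with a made-up tweet format – the real @open2 tweets differ):

#Illustrative only: assume tweets shaped like "Title, 9pm, BBC Two"
tweets <- c("Coast, 8pm, BBC Two", "The Money Programme, 7pm, BBC Two")
titles <- sub("^([^,]+),.*$", "\\1", tweets)
unique(tolower(titles))
#[1] "coast"               "the money programme"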
We then run each item through the BBC programmes filter described above and return the recent programmes feed:
A couple of tweaks to the DeliTV pipe handle, you know, stuff ;-) and you can now bookmark this pipe – BBC/OU 7 Day TV Catchup (or its RSS feed output) to delicious, tagged so that it appears in your DeliTV feed, and you have a channel dedicated to recent BBC/OU TV programmes that have been broadcast on BBC One to Four and that are currently available on iPlayer :-)
Yesterday we had a presentation at the OU from George Entwistle (Controller of Knowledge Commissioning) and Simon Nelson (Controller of Multiplatform Commissioning) about the BBC Multiplatform strategy. I don’t think anything was mentioned that was not for public consumption, except perhaps for a couple of future concepts that I hadn’t already picked up from the various BBC blogs, but the implications were not to blog, so I won’t…
That said, here’s one of the things that quite amused me – an observation by Simon Nelson about the significant effect that Doctor Who has on Google Search Trends:
Do you see that big spike in search volumes in April? Any ideas what caused it…?
[Arghhhh, sometimes I so need to be able to do split screen screen captures in my browser. Anyone know how? Ah sod it…
<html>
<head>
<title></title>
</head>
<frameset rows="30%, 70%">
<!-- htmlspecialchars() added to escape the user-supplied URL -->
<frame src="<?php echo htmlspecialchars($_GET['url']); ?>">
<frame src="<?php echo htmlspecialchars($_GET['url']); ?>">
</frameset>
</html>
]
Any ideas about what might have caused the May spike?
Here are the broadcast dates for another Doctor Who Series 4 episode:
Using the Google Insights for Search tool, we can actually tunnel down a little more on when the search volume blip occurred:
(Apparently, things like Merlin also produced a bump… If you find some more, why not add them to Trendspotting, with an explanation of what caused the temporary upsurge in search traffic volume on Google for a particular search term?;-)
Another observation Simon made was that Google News didn’t know what caused the spike – but the BBC did…
So I’m thinking that a neat little mashup might be to add a pulse line – about where the news volume box is on Google Trends – that shows a pulse corresponding to the broadcast dates of BBC programmes that can be found using the search trend’s search terms.
That is, take a search phrase (“pompeii” or “agatha christie”, say) and run it through the BBC programme catalogue; pull out the transmission dates of the programmes that the search term throws up; then plot a heartbeat corresponding to those transmission dates in a window beneath the Google Trends data, so that by eye you can spot potential correlations between search volumes and programme transmissions.
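As a rough illustration of the pulse idea (my sketch, with made-up dates standing in for real transmission data):

#Hypothetical transmission dates for programmes matching a search term
txDates <- as.Date(c("2008-04-05", "2008-04-12", "2008-04-19"))
#Draw each date as a vertical 'pulse' along a timeline
plot(txDates, rep(1, length(txDates)), type = "h", yaxt = "n", ylim = c(0, 1),
     xlab = "Transmission date", ylab = "", main = "Transmission pulse line")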
Alternatively, a trace could be run against volumes of web traffic using those search terms on the BBC website, or audience figures for the programmes that are turned up using those search terms…
(Are there any easy-to-use correlation tools out there on the public web, I wonder, that will try to find correlations between two different time series?)
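(For what it’s worth, if you’re happy to work locally rather than on the public web, base R has this covered: the ccf() function in the stats package estimates the cross-correlation between two time series over a range of lags. A quick sketch with dummy data:)

#Dummy series standing in for weekly search volumes and transmission activity
set.seed(1)
searchVolume <- ts(rnorm(52))
txActivity <- ts(rnorm(52))
#Plot the cross-correlation between the two series at a range of lags
ccf(searchVolume, txActivity, main = "Search volume vs transmissions")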
Why’s this interesting? Because in the race for eyeballs, if you know what’s likely to be driving traffic on a particular future date, you can put some content out there in advance to try to capture some of that traffic…
It might also be interesting to try looking for trends in programme names on Twitter (using something like Twist, maybe, although I’m guessing that tweet volumes for programme names may be too low to register? Which is maybe where having programme hashtags comes in?)
[UPDATE: BBC programmes on iPlayer buzztracking – Shownar]
Although I managed to get third party Youtube movies embedded in an online OU course earlier this year, mentioning the use of embedded Youtube resources in our course materials still causes moments of tension in course team meetings (“what about the rights?”, “can we trust the video will stay at that URL?” and so on), so I keep an eye out for the appearance of embedded Youtube movies on other sites that I can use as examples of how other publishers are happy to make use of embedded resources from other sites…
…like this one for example – embedded Youtube music videos on the bbc.co.uk domain: