Picking up on Political Representation on BBC Political Q&A Programmes – Stub, the quickest of hacks…
In OpenRefine, create a new project by importing data from a couple of URLs – data from the BBC detailing episode IDs for Any Questions and Question Time:
Import the data as XML, highlighting a single programme code row as the import element.
The data we get looks like this – /programmes/b007ck8s#programme – so we can add a column by fetching URLs using 'http://www.bbc.co.uk'+value.split('#')[0]+'.json' to get JSON data back for each row. (split() returns an array, so we need the [0] to pick out the part before the #.)
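For anyone wanting to check the URL construction outside OpenRefine, here is a minimal Python sketch of the same idea. The function name and the base parameter are my own labels for illustration, not anything the BBC defines.

```python
def programme_json_url(ref, base="http://www.bbc.co.uk"):
    """Turn a reference like '/programmes/b007ck8s#programme' into a .json URL."""
    path = ref.split("#")[0]  # keep only the part before the fragment
    return base + path + ".json"


print(programme_json_url("/programmes/b007ck8s#programme"))
# -> http://www.bbc.co.uk/programmes/b007ck8s.json
```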
Parse the JSON that comes back using something like value.parseJson()['programme']['medium_synopsis'] to create a new column containing the medium synopsis information.
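The same lookup can be sketched in Python. The sample below is a hand-written stand-in rather than a real BBC response: the only structure assumed is the programme / medium_synopsis nesting used in the expression above.

```python
import json

# Hand-written stand-in for a response; only the two keys used above are assumed.
sample = json.dumps(
    {"programme": {"medium_synopsis": "Topical debate from Colchester, with David Dimbleby."}}
)


def medium_synopsis(raw):
    """Return the medium synopsis from a programme JSON string, or None if absent."""
    data = json.loads(raw)
    return data.get("programme", {}).get("medium_synopsis")


print(medium_synopsis(sample))
```

Using .get() rather than direct indexing means a programme with no synopsis gives None instead of an error, which is roughly how a blank cell behaves in OpenRefine.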
The medium synopsis elements typically look like Topical debate from Colchester, with David Dimbleby. On the panel are Peter Hain, Sir Menzies Campbell, Francis Maude, singer Beverley Knight and journalist Cristina Odone. Which is to say they often contain the names of the panellists.
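Because the synopsis often follows an "On the panel are …" pattern, a crude pattern match gets part of the way there. To be clear, this is a rough sketch of a different technique from entity extraction: it assumes that exact phrasing, leaves descriptors such as "singer" attached, and will trip over names containing a full stop.

```python
import re
from typing import List


def panel_names(synopsis: str) -> List[str]:
    """Split the 'On the panel are A, B and C.' sentence into its parts."""
    match = re.search(r"On the panel are (.+?)\.(?:\s|$)", synopsis)
    if not match:
        return []
    # Split on commas and on the word "and" joining the final two names.
    parts = re.split(r",\s*|\s+and\s+", match.group(1))
    return [p.strip() for p in parts if p.strip()]


synopsis = (
    "Topical debate from Colchester, with David Dimbleby. "
    "On the panel are Peter Hain, Sir Menzies Campbell, Francis Maude, "
    "singer Beverley Knight and journalist Cristina Odone."
)
print(panel_names(synopsis))
```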
We can try to extract the names contained within each synopsis using the Zemanta API (key required) accessed via the Named-Entity Recognition extension for Google Refine / OpenRefine.
These seem to come back in reconciliation API form with the name set to a name and the id to a URL. We can get a concatenated list of the URLs that are returned by creating a column around something like this: forEach(cell.recon.candidates,v,v.id).sort().join('||') but I’m not sure that’s useful.
We can create a column based just on the name of the matched entity using cell.recon.match.name.
Let’s use the row view and fill down on programme IDs, then have a look at a duplicate facet and view only rows that are duplicated (that is, where an extracted named entity appears more than once). We can also use a text facet to see which names appear in multiple episodes of Question Time and/or Any Questions.
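What the duplicate facet is doing can be sketched in a few lines of Python: group programme IDs by name and keep the names seen in more than one programme. The rows here are placeholders ("p1", "Person A" and so on), not real data.

```python
from collections import defaultdict


def repeat_guests(rows):
    """Map each name to the programmes it appears in, keeping only repeat appearances."""
    seen = defaultdict(set)
    for programme_id, name in rows:
        seen[name].add(programme_id)
    return {name: sorted(pids) for name, pids in seen.items() if len(pids) > 1}


# Placeholder (programme ID, name) rows, as you might get after filling down.
rows = [("p1", "Person A"), ("p1", "Person B"), ("p2", "Person A"), ("p3", "Person C")]
print(repeat_guests(rows))
```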
Selecting a single name allows us to see the programmes that person appeared on. If we pull out the time of first broadcast (value.parseJson()['programme']['first_broadcast_date']) and Edit Cells-Common Transforms-To date, we can also use a date facet to select out programmes first broadcast within a particular date range.
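The date step looks something like this in Python. I'm assuming the first_broadcast_date value is an ISO 8601 timestamp such as 2007-01-18T22:35:00Z; that format, and the example value, are assumptions rather than something checked against the feed.

```python
from datetime import date, datetime


def broadcast_date(stamp):
    """Parse an ISO 8601 timestamp (assumed format) into a calendar date."""
    # Older Python versions do not accept a trailing 'Z', so normalise it first.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00")).date()


def in_range(stamp, start, end):
    """True if the broadcast date falls within [start, end], inclusive."""
    return start <= broadcast_date(stamp) <= end


print(in_range("2007-01-18T22:35:00Z", date(2007, 1, 1), date(2007, 12, 31)))
```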
We can also run a text filter to limit records to episodes including a particular person and then use the Date facet to highlight the episodes in which they appeared on the timeline:
What this suggests is that we can use OpenRefine as some sort of ‘application shell’ for creating information tools around a particular dataset without actually having to build UI components ourselves?
If we custom export a table using programme IDs and matched names, and then rename the columns Source and Target, we can visualise them in something like Gephi (you can use the recipe described in the second part of this walkthrough: An Introduction to Mapping Company Networks Using Gephi and OpenCorporates, via OpenRefine).
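If you would rather build the edge list in code than via the custom exporter, a sketch along these lines should do. Source and Target are the column names the Gephi recipe uses; the row values are placeholders.

```python
import csv
import io


def edge_list_csv(rows):
    """Serialise (programme ID, matched name) pairs as a Source,Target CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(rows)
    return buf.getvalue()


print(edge_list_csv([("p1", "Person A"), ("p1", "Person B")]))
```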
The directed graph we load into Gephi connects entities (participant names, location names) with programme IDs. There is a handy tool – Multimode Networks Projection – that can collapse the graph so that entities are connected to other entities that they shared a programme ID with.
(If you forget to remove the programme nodes, a degree range filter to select only nodes with degree greater than 2 tidies the graph up.)
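The projection idea itself is simple enough to sketch in Python: group entities by programme, then link every pair that shares one, counting shared programmes as an edge weight. This is my own illustration of the idea, not how the Gephi plugin is implemented, and the edge values are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def project(edges):
    """Collapse (programme ID, entity) edges into weighted entity-entity edges."""
    by_programme = defaultdict(set)
    for programme_id, entity in edges:
        by_programme[programme_id].add(entity)
    weights = defaultdict(int)
    for entities in by_programme.values():
        # Sorting gives each undirected pair a single canonical key.
        for a, b in combinations(sorted(entities), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = [("p1", "A"), ("p1", "B"), ("p2", "A"), ("p2", "B"), ("p2", "C")]
print(project(edges))
```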
If we run PageRank on the graph (now as an undirected graph), layout using ForceAtlas2 and size nodes according to PageRank, we can look into the heart of the UK political establishment as evidenced by appearances on Question Time and Any Questions.
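For a feel of what the PageRank step is doing, here is a bare-bones power iteration over an undirected, weighted edge dictionary. It is a textbook-style sketch with a damping factor of 0.85 and a fixed number of iterations (both my choices); Gephi's implementation may differ in its details, so don't expect identical numbers.

```python
def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank over undirected weighted edges {(a, b): weight}."""
    neighbours = {}
    for (a, b), w in weights.items():
        # Undirected: record the edge in both directions.
        neighbours.setdefault(a, {})[b] = w
        neighbours.setdefault(b, {})[a] = w
    n = len(neighbours)
    rank = {node: 1.0 / n for node in neighbours}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / n for node in neighbours}
        for node, out in neighbours.items():
            total = sum(out.values())
            for other, w in out.items():
                # Each node shares its rank with neighbours in proportion to edge weight.
                new_rank[other] += damping * rank[node] * w / total
        rank = new_rank
    return rank


# Placeholder star graph: A shares a programme with both B and C.
ranks = pagerank({("A", "B"): 1, ("A", "C"): 1})
print(sorted(ranks, key=ranks.get, reverse=True))
```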
The next step would probably be to try to pull info about each recognised entity from dbPedia (forEach(cell.recon.candidates,v,v.id).sort() seems to pull out dbpedia URIs) but grabbing data from dbPedia seems to be borked in my version of OpenRefine atm:-(
Anyway – a quick hack that took longer to write up than it did to do…
OpenRefine project file here.
It’s too nice a day to be inside hacking around with Parliament data as a remote participant in today’s Parliamentary hack weekend (resource list), but if it had been a wet weekend I may have toyed with one of the following:
– revisiting this cleaning script for Analysing UK Lobbying Data Using OpenRefine (actually, a look at who funds/offers support for All Party Groups. The idea was to get a dataset of people who provide secretariat and funds to APPGs, as well as who works for them, and then do something with that dataset…)
– tinkering with data from Question Time and Any Questions…
On that last one:
These give us generatable URLs for programmes by month, with URLs of the form http://www.bbc.co.uk/programmes/b006t1q9/broadcasts/2013/01, but how do we get a JSON version of that?! Adding .json on the end doesn’t work?!:-( UPDATE – this could be a start, via @nevali – use the pattern /programmes/PID.rdf, such as http://www.bbc.co.uk/programmes/b006qgvj.rdf
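Generating the monthly broadcast URLs is the easy bit. A quick Python sketch, following the URL form shown above (the function name is just a label I've made up):

```python
def broadcast_month_urls(pid, year, base="http://www.bbc.co.uk/programmes"):
    """Build the twelve monthly broadcast listing URLs for a programme in a given year."""
    return [f"{base}/{pid}/broadcasts/{year}/{month:02d}" for month in range(1, 13)]


urls = broadcast_month_urls("b006t1q9", 2013)
print(urls[0])
# -> http://www.bbc.co.uk/programmes/b006t1q9/broadcasts/2013/01
```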
We can get bits of information (albeit in semi-structured form) about panellists in data form from programme URL hacks like this: http://www.bbc.co.uk/programmes/b007m3c1.json
Note that some older programmes don’t list all the panellists in the data? So a visit to Wikipedia – http://en.wikipedia.org/wiki/List_of_Question_Time_episodes#2007 – may be in order for Question Time (there isn’t a similar page for Any Questions?)
Given panellists (the BBC could be more helpful here in the way it structures its data…), see if we can identify parliamentarians (MP suffix? Lord/Lady title?) and look them up using the new-to-me, not-yet-played-with-it UK Parliament – Members’ Names Data Platform API. Not sure if reconciliation works on parliamentarian lookup (indeed, not sure if there is a reconciliation service anywhere for looking up MPs, members of the House of Lords, etc?)
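A first pass at spotting likely parliamentarians from the name string alone might look like the sketch below. It only checks for an MP suffix or a Lord/Lady title; Baroness is my own addition to the list. It is a guess at candidates for lookup, not a reliable test, since plenty of titled people don't sit in the Lords and MPs are often listed without the suffix.

```python
import re

# Title list is a heuristic; 'Baroness' is an added assumption beyond Lord/Lady.
TITLE = re.compile(r"^(Lord|Lady|Baroness)\b")
SUFFIX = re.compile(r"\bMP$")


def looks_like_parliamentarian(name):
    """Heuristic: True if the name starts with a peerage-style title or ends with 'MP'."""
    name = name.strip()
    return bool(TITLE.search(name) or SUFFIX.search(name))


for candidate in ["Peter Hain MP", "Lord Example", "singer Beverley Knight"]:
    print(candidate, looks_like_parliamentarian(candidate))
```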
From the Members’ Names API, we can get things like gender, constituency, whether or not they were holding a (shadow) cabinet post, maybe whether they were on a particular committee at the time etc. From programme pages, we may be able to get the location of the programme recording. So this opens up the possibility of mapping the geo-coverage of Question Time/Any Questions, both in terms of where the programmes visit and which constituencies are represented on them.
If we were feeling playful, we could also have a stab at which APPGs have representation on those programmes!
It also suggests a simpler hack – of just providing a briefing around the representatives appearing on a particular episode in terms of their current (or at the time) parliamentary status (committees, cabinet positions, APPGs etc etc)?