Playing With Processing – arc() and General Election Data, 2005

Over the weekend, I thought I’d have a little bit more of a play with the data I got into shape in Data Handling in Action – Visualising UK General Election 2005 Results. One of the thing’s I’ve been pondering with in Processing is how to draw circles and arcs, so having discovered the arc() formula, I decided to have a little play with it…

[UPDATE – there maybe a minor oops or two in some of the following images… e.g. scaling (proportion of vote cast vs size of constituency may be mangled in a one of charts; and if neither of lib/lab/con won a seat, votes may still show (in order of most-least from among the big three)… But having doodled, now I know the sorts of things I want to measure and plot for doing it properly next time!;-)]

Here we have the 2005 general election results for each constituency for Labour, the Lib Dems and the Conservative party. Each concentric circle is one constituency, in alphabetical order, going from inside to out. The colours are as you might expect, the length of an arc is the percentage of the overall vote in that constituency polled by the corresponding party. The six plots show all the possible orderings…

General Election 2005

It strikes me that I assumed the origin that the arc() function takes… Maybe I should set it more deliberately (but according to what criterion?)

I also experimented with a range of plots based in equal segments, with origin axes 120 degrees apart. Visualisation folk won’t like these because the whole circle does not add up 100% of anything…

Here, each party has 120 degrees; the length of each arc is the percentage of valid votes * 120 degrees… (I think… pretty sure it’s not percentage of possible electorate!)

General election 2005

In the following case, the 120 degree angle about each party is 100% of the eligible voters; the arc length is: 120 degrees * number of votes / number of eligible voters.

faceofdemocracy2 general election 2005

If we just look at winners, the following allocates 120 degrees to the votes cast in the constituency; arcs show number of votes for winner / number of valid votes cast (err, maybe… unless it’s per size of consituency… hmmm):

General election 2005

The next chart shows the size of the majority, as a percentage of the whole circle:

General election 2005

Finally, here’s just the winning party as a whole ring:

General election 2005 - winning seats

UPDATE: and here’s a view showing the percentage votes in order (winner first, second placed candidate, then third, again from just taking into account the three main parties… and ah, I’ve maybe realised an oops for constituencies where one of the three main parties doesn’t win….maybe their votrs are showing too…?! Hmm……)

General Election 2005

If I had any more time to play on these, I’d look at ordering the circles in various different ways. But I don’t have any more time to play on this just now…

PS For more election related data, particulalry in Linked Data from, see @kitwallace’s Election nominations in the Talis Data Incubator and An exercise in RDF – 2005 Election data.

PPS see also this summary/round-up of UK media interactive swingometers

UK General Election 2010 – Interactive Maps and Swingometers

So it seems like the General Election has been a Good Thing for the news media’s interactive developer teams… Here’s a quick round up of some of the interactives I’ve found…

First up, the BBC’s interactive election seat calculator:

BBC election interactive

This lets you set the percentage vote polled by each party and it will try to predict the outcome…

The Guardian swingometer lets you play with swing from any two of the three big parties to the third:

Guardian swingometer

The Daily Telegraph swingometer lets you look at swing between any two parties…

Telegraph election map

The Economist also lets you explore pairwise swings

Economist - election map

The Times doesn’t really let you do much at all… and I wonder – is Ladbrokes in there as product placement?!

Time election interactive

Sky doesn’t go in for modeling or prediction, it’s more of just a constituency browser

Sky Election Map

The Sun probably has Tiffany, 23…

From elsewhere, this swingometer from the Charts & numbers – UK Election 2010 blog lets you model swings between the various parties

Swingometer

As to what swing is? It’s defined in this Parliamentary briefing doc [PDF]/

Data Handling in Action – Visualising UK General Election 2005 Results

Over the weekend, a bit of random googling with filetype:csv OR filetype:xls limits turned up an archive of what looked like a complete set of UK general election results from 2005 on the Electoral Commission website.

(By the by, I also came across the National Digital Archive of Datasets (NDAD), which looks like it could be interesting: “The National Digital Archive of Datasets (NDAD) preserves and provides online access to archived digital datasets and documents from UK central government departments. Our collection spans 40 years of recent history, with the earliest available dataset dating back to about 1963.”)

Anyway, the election results data was presented in a CSV file with a format that looked like:

Election results data

The first thing I wanted to try to do with it was to produce an election votes treemap (e.g. along similar lines to Work In Progress – Voting Treemaps). The target visualisation tool was Many Eyes, simply because it’s quick and easy to use (I think a little bit of setting up is still required to use the wrapper I’ve placed around the JIT treemaps…:-(

In order to use the Many Eyes treemap to it’s full effect, I needed each row to start with the constituency name. Looking at the above screengrab of the Electoral Commission data, we see that the constituency is only explicitly stated for the winning candidate. So what’s the best way of filling in the blanks?

In a couple of recent talks, I’ve made the claim that library folk should be in the business of teaching on effective data handling skills, and this is a good case in point. So what would you do?

The naive response is to go into the spreadsheet and by hand click and drag each separate constituency name into the blanks. With over six hundred constituencies, that should take you how long..?!

Two rather more efficient ways came to my mind. Firstly, use a Python script to read in the CSV file, check to see if the constituency field is populated, and if it isn’t, populate it with the constituency name from the previous row:

import csv
filein='candidates_49576.csv'
fileout='election2005.csv'
fout=open(fileout,"wb+")
csv_file = csv.writer(fout)
file_obj=open(filein,"rb")
for line in csv.reader(file_obj, 
              delimiter=',',    # Your custom delimiter.
              skipinitialspace=True): # Strips whitespace after delimiter.
  if line: # Make sure there's at least one entry.
     if line[0]=='':
       line[0]=last
     else:
       last=line[0]
     csv_file.writerow(  line )
fout.close()

As ever, most of the above code isn’t mine, it’s stolen (in this case, from Stack Overflow).

The second approach I thought of was to use a Google Apps script… (this exercise is left to the reader… feel free to post a copy of the script, or a link to a spreadsheet containing it, in the comments;-)

The third approach was to use a spreadsheet formula and create a new, fully populated constituency column. Here’s one formula that I thought of:

Fill in the blanks... Excel formula

That is, if the cell to the left is empty, copy the contents of the cell above…

So now I had a copy of the data in the form I needed, I could just copy and paste it from a spreadsheet into Many Eyes

You can see a copy of a derived treemap here. The view below uses colour to show the constituencies where the winning candidate won more than half of the valid votes cast in that constituency:

WHere constituencies gave a true majority mandate to a party, general election 2005

The next thing I wanted to do was to look at the actual number of votes a winning candidate beat the second placed candidate by, and also view this as a percentage of the total number of valid votes cast in that constituency. To do this, I created three new columns:

– one containing the total number of valid votes cast;
– one containing the number of votes the winning candidate beat the second placed candidate by;
– one containing the number of votes the winning candidate beat the second placed candidate by divided by the total number of valid votes cast, given as a percentage.

So how would you calculate those items?

Go on – think about it – how would you do it?

Here’s how I did it…

To count the total number of votes cast in a constituency, I used the formula:
=SUMIF(A:A,A2,D:D)
(Note that before yesterday, I’d never heard of this formula… I just went googling for help on something like excel sum similar items…;-)

SUMIF

Select and drag this cell to populate the column…

For the vote majority formula, I only wanted this to apply to the winning candidate, so I put a trap in to see whether the constituency from the previous row is the same as the current row. If it isn’t, then because I assume the first candidate is the winning candidate, I can find the difference between the current row’s votes and that of the second placed candidate, which is immediately below:

=IF(A1<>A2,D2-D3,0)

Once again, select and drag this cell to populate the column…

I guess I really should also trap just to check that the row below is in the same constituency in case someone (the Speaker?) stands unopposed:
=IF(A1<>A2,IF(A2=A3,D2-D3),0)

The percentage value is found simply by dividing the absolute number of votes majority by the total number of valid votes cast and multiplying by 100.

I uploaded the data to Many Eyes here.

So now we can look to see the number of votes majority an individual had, for example, compared to the turn out:

GEneral election 2005

This chart brings to mind two other things to me. Firstly it shows all the candidates – it would be more convenient in this case to just show the data for the winning candidates. Secondly, it suggests that we might also plot the number of votes behind the winning candidate each of the other candidates was… but that’s for another day…

What I will do though is consider the first point, specifically how we might just look at the data for the winning candidates… One way of achieving this is to use Google Spreadsheets as a database using the Google spreadsheets API as described in First Steps Towards a Generic Google Spreadsheets Query Tool, or At Least, A Guardian Datastore Interactive Playground.

Uploading the data to a Google Spreadsheet, and publishing it means I can interrogate it via the API in my Datastore Explorer (howto):

Guardian Datasore explorer

The query also generates a link to a CSV version of the data. It’s not hard then to pop the data into Many Eyes so now we can visualise the winning MP majorities more easily – here we see the votes ahead of the second placed candidate vs. the valid number of votes cast:

VOtes majority vs votes cast, 2005 General Election

We can also use the Datastore explorer to interrogate the database. For example, which MPs are defending a slim majority in terms of absolute number of votes?

Not many votes in it...

Okay, that’s more than enough for now – this was only supposed to be a half hour post… What I hope this post has done, though, is show how I mix and match tools and techniques as a I wander around a data set, as well as happily trawling the net to look for solutions to problems that I’m sure someone else must have solved before (such as summing a number of separate items labeled the same way in an Excel spreadsheet).

PS Somewhere along the line, I think when looking at majorities, a couple of points had negative values – the visualisation had turned up missing rows, like this one:

Data in need of cleaning...