One of the things I try to impress on folk whenever I do a talk about web stats is how showing charts of mean (that is, simple averages) numbers of visitors or time on site is all but useless, because the actual distribution of values is not likely to be normal, and so the simple averages reported are, too all intents and purposes, not good for anything.
So when putting together a change treemap to summarise the changes in funding of quangos form government departments (Visualising Where the Money Goes: Westminster Quangos, Part 1), what I did originally was to visualise the change in funding for each quango within each department, rather than just displaying the following simpler, overall funding by government department treemap that I posted about previously:
The reason being? The overall change treemap might show increases or decreases in expenditure for a department as a whole, but it doesn’t reveal whether the funded quangos were all treated equally, or whether lots of small quangos received a cut whilst one big one received a huge rise in funding, for example:
So how did I create this treemap? The simplest way is just to create a query on the original spreadsheet that pulls in 4 columns – department, quango, and two expnditure columns (one for 2006 and one for 2007). A query built around this kernel, in fact:
select C,D,R,S order by C,R desc
(To see how such a query is put together, see Part 1 of this exercise.)
To generate a query that only displays quangos that had a non-zeor expenditure in both 2006 and 2007, just add:
where (R>0 AND S>0) after the select statement and before order by.
Just as an aside, note that it’s possible to to pull the output of this query into another spreadsheet. This allows you to add derived columns to the spreadsheet, for example, busing relative formaula that act on quantities automativcally inported into other columns in the spreadsheet. (One thing I intend to explore with the data store explorer is a 1-click way of creating a new spreadsheet that pulls in a query created using the explorer. See also: Using Google Spreadsheets and Viz API Queries to Roll Your Own Data Rich Version of Google Squared on Steroids (Almost…))
The CSV feed can then be pulled into a Many Eyes Wikified and visulaised in a variety of ways that reveal both the overall expenditure of quangos funded from within each department, as well as the relative funding over 2006/2007 of each quango:
So for example, for the two biggest quangos by expenditure in Defence, one spent more over 2006/7, and the other spent less…
Using the same data page , we can create other visualisations within the Many Eyes Wikified environment that allow us to explore the data in a little more detail. So for example, here’s a bit more detail about the funding of quangos in the Department of Health. Parallel bands show that quangos spent equivalent amounts in 2006 and 2007, lines that diverge going from left to right show an increase in expenditure, and lines that converge going from left to right depict decreasing expenditure.
A scatter chart readily shows large changes in expnditure over the two years:
And some charts are just shiny ;-)
Compared with just trawling through the spreadsheet data, I think there is a lot to be said for using visualisations to identify out of the norm data points (e.g. using the scatterplot), or unpacking totals (as in the case of departmentally funded quango expenditure) in a quick and easy way as a starting point to exploring the data in a more systematic way, for example in order to ask journalistic questions (whatever they might be).