Creating Olympic Medal Treemap Visualisations Using OTS R Libraries
In London Olympics 2012 Medal Tables At A Glance? I posted some treemap visualisations of the Olympics medal tables generated using a Google Visualisation Chart treemap component. I thought it might be worth posting a quick R generated example too, using the off-the-shelf/straight out of CRAN treemap component. (If you want to play along, download the data as CSV from here.)
The original data looks like this:
but ideally we want it to look like this:
I posted a quick recipe showing how to do this sort of reshaping in Google Refine, but in R it’s even easier – just melt the Gold, Silver and Bronze columns into a pair of columns…
Here’s the full code to do the reshaping and generate a simple treemap:
#load in the data from a file
odata = read.csv("~/Downloads/nbc_olympic_medalscrape.csv")
#Reshape the data
require(reshape)
odatar=melt(odata,id=c('cc','ccevent','Event'))
#And generate the treemap in the simplest possible way
require(treemap)
tmPlot(odatar,
index=c("cc", "Event","variable"),
vSize="value", vColor='value',
type="value")
And here’s the treemap, with country blocks ordered in this case by total medal haul:
(To view the countries ordered according to number of Golds, a quick fix would be to order hierarchy with the medal type shown at the highest level of the tree: index=c("variable","cc", "Event").)
Generating variant views (I described six variants in the original post) is easy enough – just tweak the order of the elements of the index setting. (I should have named the melt created columns something more sensible than the default, shouldn’t I?) Note that the vSize and vColor setting of value refers to a column name: value is the default name melt gives to the column of medal counts, while variable is the default name of the column that identifies the medal type. The type setting of value, on the other hand, is literal: it says use the numerical value, and doesn’t refer to a column name.
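For anyone more at home in Python than R, the reshaping step can be sketched with nothing but the standard library. This is only an illustration of the wide-to-long melt, not a replacement for the R recipe: the sample rows are made up, and the medalType and count names are the "more sensible" column names I might have picked, not anything the treemap code above expects. The row order also differs from R's melt, which stacks all the rows for one variable before moving on to the next.

```python
import csv
import io

def melt(rows, id_cols, value_cols, var_name="variable", value_name="value"):
    """Turn one wide row per event into one long row per (event, medal type)."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_row = {key: row[key] for key in id_cols}
            long_row[var_name] = col              # which medal column this came from
            long_row[value_name] = int(row[col])  # the medal count itself
            long_rows.append(long_row)
    return long_rows

# Made-up sample rows for illustration only; not the real medal data.
SAMPLE_CSV = """cc,ccevent,Event,Gold,Silver,Bronze
AAA,AAA_Rowing,Rowing,1,0,2
BBB,BBB_Rowing,Rowing,0,1,0
"""

wide = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
long_rows = melt(wide, ["cc", "ccevent", "Event"], ["Gold", "Silver", "Bronze"],
                 var_name="medalType", value_name="count")
for r in long_rows:
    print(r)
```

Each wide row becomes three long rows, one per medal type, which is the shape the treemap index hierarchy needs.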
Out of the can – simples enough… So what might we be able to do with a little bit more treatment? Examples via the comments, please ;-)
Pragmatic Visualisation – GDS Transaction Data as a Treemap
A week or two ago, the Government Data Service started publishing a summary document containing website transaction stats from across central government departments (GDS: Data Driven Delivery). The transactional services explorer uses a bubble chart to show the relative number of transactions occurring within each department:
The sizes of the bubbles are related to the volume of transactions (although I’m not sure what the exact relationship is?). They’re also positioned on a spiral, so as you work clockwise round the diagram starting from the largest bubble, the next bubble in the series is smaller (the “Other” catchall bubble is the exception, sitting as it does on the end of the tail irrespective of its relative size). This spatial positioning helps communicate relative sizes when the actual diameter of two bubbles next to each other is hard to differentiate between.
Clicking on a link takes you down into a view of the transactions occurring within that department:
Out of idle curiosity, I wondered what a treemap view of the data might reveal. The order of magnitude differences in the number of transactions across departments meant that the resulting graphic was dominated by departments with large numbers of transactions, so I did what you do in such cases and instead set the size of the leaf nodes in the tree to be the log10 of the number of transactions in a particular category, rather than the actual number of transactions. Each node higher up the tree was then simply the sum of values in the lower levels.
The result is a treemap that I decided shows “interestingness”, which I defined for the purposes of this graphic as being some function of the number and variety of transactions within a department. Here’s a nested view of it, generated using a Google chart visualisation API treemap component:
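The size transformation is easy to sketch. In the minimal Python example below, the department names, service names and transaction counts are all invented for illustration (they are not the GDS figures): each leaf is sized as log10 of its count, and each department is sized as the sum of its leaves.

```python
import math
from collections import defaultdict

# Hypothetical (department, service, transactions) rows; not the GDS figures.
rows = [
    ("Dept A", "Service 1", 1_000_000),
    ("Dept A", "Service 2", 1_000),
    ("Dept B", "Service 3", 100),
    ("Dept B", "Service 4", 10),
    ("Dept B", "Service 5", 10),
]

def leaf_size(transactions):
    """Size of a leaf node: log10 of its transaction count."""
    return math.log10(transactions)

def department_sizes(rows):
    """Size of each parent node: the sum of the sizes of its leaves."""
    totals = defaultdict(float)
    for dept, _service, transactions in rows:
        totals[dept] += leaf_size(transactions)
    return dict(totals)

sizes = department_sizes(rows)
print(sizes)
```

With these made-up numbers, Dept A has roughly 8,000 times as many transactions as Dept B, but its block is only a little over twice the size, because Dept B picks up area from having more separate services. One caveat of this sketch: a category with a single transaction gets a size of zero, and a count of zero cannot be logged at all, so real data would need a rule for those cases.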
The data I grabbed had a couple of usable structural levels that we can make use of in the chart. Here’s going down to the first level:
…and then the second:
Whilst the block sizes aren’t really a very good indicator of the number of transactions, it turns out that the default colouring does indicate relative proportions in the transaction count reasonably well: deep red corresponds to a low number of transactions, dark green a large number.
As a management tool, I guess the colours could also be used to display percentage change in transaction count within an area month on month (red for a decrease, green for an increase), though a slightly different size transformation function might be sensible in order to draw out the differences in relative transaction volumes a little more?
I’m not sure how well this works as a visualisation that would appeal to hardcore visualisation puritans, but as a graphical macroscopic device, I think it does give some sort of overview of the range and volume of transactions across departments that could be used as an opening gambit for a conversation with this data?
Practical Visualisation Tools Presentation: #CASEprog
Last week I gave a presentation at the DCMS describing some hands-on tools for getting started with creating data powered visualisations (Visualisation Tools to Support Data Engagement) at the invitation of James Doeser from the Arts Council, in the context of the DCMS CASE (Culture and Sport Evidence) Programme, #CASEprog:
I’ve also posted a resource list as a delicious stack: CASEprog – Visualisation Tools (Resource List).
Whilst preparing the presentation, I had a dig through the DCLG sponsored Improving Visualisation for the Public Sector site, which provides pathways for identifying appropriate visualisation types based on data type, policy objectives/communication goals and anticipated audience level. It struck me that being able to pick an appropriate visualisation type is one thing, but being able to create it is another.
My presentation, for example, was based very much around tools that could provide a way in to actually creating visualisations, as well as shaping and representing data so that it can be plugged straight in to particular visualisation views.
So I’m wondering, is there maybe an opportunity here for a practical programme of work that builds on the DCLG Improving Visualisation toolkit by providing worked, and maybe templated, examples, with access to code and recipes wherever possible, for actually creating examples of exemplar visualisation types from actual open/public data sets that can be found on the web?
Could this even be the basis for a set of School of Data practical exercises, I wonder, to actually create some of these examples?
Exporting and Displaying Scraperwiki Datasets Using the Google Visualisation API
In Visualising Networks in Gephi via a Scraperwiki Exported GEXF File I gave an example of how we can publish arbitrary serialised output file formats from Scraperwiki using the GEXF XML file format as a specific example. Of more general use, however, may be the ability to export Scraperwiki data using the Google visualisation API DataTable format. Muddling around the Google site last night, I noticed the Google Data Source Python Library that makes it easy to generate appropriately formatted JSON data that can be consumed by the (client side) Google visualisation library. (This library provides support for generating line charts, bar charts, sortable tables, etc, as well as interactive dashboards.) A tweet to @frabcus questioning whether the gviz_api Python library was available as a third party library on Scraperwiki resulted in him installing it (thanks, Francis:-), so this post is by way of thanks…
Anyway, here are a couple of examples of how to use the library. The first is a self-contained example (using code pinched from here) that transforms the data into the Google format and then drops it into an HTML page template that can consume the data, in this case displaying it as a sortable table (GViz API on scraperwiki – self-contained sortable table view [code]):
Of possibly more use in the general case is a JSONP exporter (example JSON output (code)):
Here’s the code for the JSON feed example:
import scraperwiki
import gviz_api
#Example of:
## how to use the Google gviz Python library to cast Scraperwiki data into the Gviz format and export it as JSON
#Based on the code example at:
#http://code.google.com/apis/chart/interactive/docs/dev/gviz_api_lib.html
scraperwiki.sqlite.attach( 'openlearn-units' )
q = 'parentCourseCode,name,topic,unitcode FROM "swdata" LIMIT 20'
data = scraperwiki.sqlite.select(q)
description = {"parentCourseCode": ("string", "Parent Course"),"name": ("string", "Unit name"),"unitcode": ("string", "Unit Code"),"topic":("string","Topic")}
data_table = gviz_api.DataTable(description)
data_table.LoadData(data)
json = data_table.ToJSon(columns_order=("unitcode","name", "topic","parentCourseCode" ),order_by="unitcode")
scraperwiki.utils.httpresponseheader("Content-Type", "application/json")
print 'ousefulHack('+json+')'
I hardcoded the wraparound function name (ousefulHack), which then got me wondering: is there a safe/trusted/approved way of grabbing arguments out of the URL in Scraperwiki so this could be set via a calling URL?
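I don't know what the approved Scraperwiki route for reading URL arguments is, so the following is only a generic sketch of the second half of the problem: given a raw query string (however the platform hands it over), pull out a callback name, check it is a plain JavaScript identifier, and wrap the JSON in it. The callback parameter name and the jsonp_body helper are hypothetical names chosen for this example.

```python
import json
import re
from urllib.parse import parse_qs

# Allow only plain ASCII JavaScript identifiers, optionally dotted (e.g. "app.handle").
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*")

def jsonp_body(query_string, payload, default_callback="ousefulHack"):
    """Wrap payload as JSONP, using ?callback=... from the query string if present."""
    params = parse_qs(query_string)
    callback = params.get("callback", [default_callback])[0]
    # Refuse anything that is not a bare identifier, so the URL cannot inject script.
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("unsafe callback name: %r" % callback)
    return "%s(%s)" % (callback, json.dumps(payload))

print(jsonp_body("callback=myFunc", {"unitcode": "T215"}))
print(jsonp_body("", {"unitcode": "T215"}))
```

The validation step matters: whatever arrives in the URL ends up being executed as JavaScript in the calling page, so passing it through unchecked would be asking for trouble.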
Anyway, what this shows (hopefully) is an easy way of getting data from Scraperwiki into the Google visualisation API data format and then consuming either via a Scraperwiki view using an HTML page template, or publishing it as a Google visualisation API JSONP feed that can be consumed by an arbitrary web page and used directly to drive Google visualisation API chart widgets.
PS as well as noting that the gviz python library “can be used to create a google.visualization.DataTable usable by visualizations built on the Google Visualization API” (gviz_api.py sourcecode), it seems that we can also use it to generate a range of output formats: Google viz API JSON (.ToJSon), as a simple JSON Response (.ToJSonResponse), as Javascript (“JS Code”) (.ToJSCode), as CSV (.ToCsv), as TSV (.ToTsvExcel) or as an HTML table (.ToHtml). A ToResponse method (ToResponse(self, columns_order=None, order_by=(), tqx="")) can also be used to select the output response type based on the tqx parameter value (out:json, out:csv, out:html, out:tsv-excel).
PPS looking at eg
https://spreadsheets.google.com/tq?key=rYQm6lTXPH8dHA6XGhJVFsA&pub=1
which can be pulled into a javascript google.visualization.Query(), it seems we get the following returned:
google.visualization.Query.setResponse({"version":"0.6","status":"ok","sig":"1664774139","table":{ "cols":[ ... ], "rows":[ ... ] }})
I think google.visualization.Query.setResponse can be a user defined callback function name; maybe worth trying to implement this one day?
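As a first step towards trying that, here is a rough sketch of how a data source might pick the wrapper function from the tqx parameter. It assumes tqx is a semicolon separated list of key:value pairs (as the out:json style values above suggest) and that the callback name travels under a responseHandler key; that key name is my assumption and should be checked against the Google documentation. The parse_tqx and wrap_response helpers are hypothetical names, not part of gviz_api.

```python
DEFAULT_HANDLER = "google.visualization.Query.setResponse"

def parse_tqx(tqx):
    """Split a tqx string such as 'reqId:0;out:json' into a dict."""
    params = {}
    for part in tqx.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            params[key.strip()] = value.strip()
    return params

def wrap_response(tqx, json_body):
    """Wrap an already-serialised JSON body in the requested handler function."""
    params = parse_tqx(tqx)
    # 'responseHandler' is an assumed key name; fall back to the default wrapper.
    handler = params.get("responseHandler", DEFAULT_HANDLER)
    return "%s(%s)" % (handler, json_body)

print(wrap_response("reqId:0", '{"status":"ok"}'))
print(wrap_response("reqId:0;responseHandler:myCallback", '{"status":"ok"}'))
```

A real implementation would also want to validate the handler name before echoing it back, for the same reason as with any JSONP callback.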
Creating Simple Interactive Visualisations in R-Studio: Subsetting Data
Watching a fascinating Google Tech Talk by Hadley Wickham on The Future of Interactive Graphics in R – A Joint Visualization and UseR Meetup, I was reminded of the manipulate command provided in R-Studio that lets you create slider and dropdown widgets that in turn let you dynamically interact with R based visualisations, for example by setting data ranges or subsetting data.
Here are a couple of quick examples, one using the native plot command, the other using ggplot. In each case, I’m generating an interactive visualisation that lets me display as a line chart two user selected data series from a larger data set.
[Data file used in this example]
Here’s a crude first attempt using plot:
hun_2011comprehensiveLapTimes <- read.csv("~/code/f1/generatedFiles/hun_2011comprehensiveLapTimes.csv")
View(hun_2011comprehensiveLapTimes)
library("manipulate")
#Use a shorter name for the lap time data frame
h=hun_2011comprehensiveLapTimes
manipulate(
plot(lapTime~lap,data=subset(h,car==cn1),type='l',col=car) +
lines(lapTime~lap,data=subset(h,car==cn2 ),col=car),
cn1=slider(1,25),cn2=slider(1,25)
)
This has the form manipulate(command1+command2, uiVar=slider(min,max)), so we see for example two R commands to plot the two separate lines, each of them filtered on a value set by the corresponding slider variable.
Note that we plot the first line using plot, and the second line using lines.
The second approach uses ggplot within the manipulate context:
#ggplot is provided by the ggplot2 package
library(ggplot2)
manipulate(
ggplot(subset(h,h$car==Car_1|car==Car_2)) +
geom_line(aes(y=lapTime,x=lap,group=car,col=car)) +
scale_colour_gradient(breaks=c(Car_1,Car_2),labels=c(Car_1,Car_2)),
Car_1=slider(1,25),Car_2=slider(1,25)
)
In this case, rather than explicitly adding additional line layers, we use the group setting to force the display of lines by group value. The initial ggplot command sets the context, and filters the complete set of timing data down to the timing data associated with at most two cars.
We can add a title to the plot using:
manipulate(
ggplot(subset(h,h$car==Car_1|car==Car_2)) +
geom_line(aes(y=lapTime,x=lap,group=car,col=car)) +
scale_colour_gradient(breaks=c(Car_1,Car_2),labels=c(Car_1,Car_2)) +
#ggtitle() replaces opts(title=...), which recent versions of ggplot2 no longer support
ggtitle(paste("F1 2011 Hungary: Laptimes for car",Car_1,'and car',Car_2)),
Car_1=slider(1,25),Car_2=slider(1,25)
)
My reading of the manipulate function is that if you make a change to one of the interactive components, the variable values are captured and then passed to the R command sequence, which then executes as normal. (I may be wrong in this assumption of course!) Which is to say: if you write a series of chained R commands, and can abstract out one or more variable values to the start of the sequence, then you can create corresponding interactive UI controls to set those variable values by placing the command series within the manipulate() context.
Slides from OU Rise Library Analytics Workshop: Rambling about Visualisation
For what it’s worth, slides from my presentation yesterday… As ever, they’re largely pointless without commentary…
… and even with the commentary, it was all a bit more garbled than usual (I forgot to breathe, had no real idea in my own mind what I wanted to say, etc etc…)
On reflection, here’s what I took from thinking back about what I should have tried to say:
- my assumption is that folk who are interested in asking data related questions should feel as if they can actually work with the data itself (direct data manipulation); I appreciate this is already way off the mark for some people who want someone else to work the data and then just read reports about it – but then that means you can’t ask or discover your own questions about the data, just read answers (maybe) to questions that someone else has asked, presented in a way they decided;
- you need to feel confident in working with data files – or at least, you need to be prepared to have a go at working with data files! (Bear in mind that many of the blog posts I write are write ups – of a sort – of how to do something I didn’t know how to do a couple of hours before… The web usually has answers to most of the questions that I come up against – and if I can’t find the answers, I can often request them via things like Twitter or Stack Overflow…) This can range from using command line tools, to using applications that let you take data in using one format and get it out as another;
- different tools do different things; if you can get a dataset into a tool in the right way, it may be able to do magical things very very easily indeed…
- three tools that can do a lot without you having to know a lot (though you may have to follow a tutorial or two to pick up the method/recipe….or at least recognise a picture you like and a dataset whose shape you can replicate using your own data, and then the ability to see which bits you need to cut and paste into the command line…):
-=- Gephi: great for plotting networks and graphs. It can also be appropriated to draw line charts (if you can work out how to ‘join the dots’ in the data file by turning the line into a set of points connected by edges) or scatter plots (just load in nodes – no edges connecting them – and lay it out using Gephi’s geolayout tool, which also lets you plot “rectilinear” plots based on x and y axis values). (I haven’t worked out a reliable way of working with CSV in Gephi – yet…) It’s amazing what you can describe as a graph when you put your mind to it…
-=- gnuplot: command line tool for plotting scatter plots and line graphs (eg from time series) using data stored in simple text file (e.g. TSV or CSV)
-=- R (and ggplot if you’re feeling adventurous and want “pretty”, nicely designed graphs out); another command line tool (I find R-Studio helps) that again loads in data from a CSV file; R can generate statistical graphs very easily from the command line (it does the stats calculations for you given the raw data).
- Visual analytics/graphical data analysis is a process – you tease out questions and answers through directly manipulating the data and engaging with it in a visual way;
- when you see a visualisation you like, look at it closely: what do you see? Spending five mins or so looking at a Gestalt psychology/visual perception tutorial will give you all sorts of tricks and tips for how to construct visualisations so that structure your eye can detect will jump out at you;
- I think I may have confused folk talking about “dimensions”: what I meant was, how many columns could you represent in a given visualisation at the same time, if each data point corresponds to a single row in a data set. So for example, if you have an x-y plot (2 dimensions), with different symbols (1 dimension) available for plotting the points, as well as different colours (1 dimension) and different possible size (1 dimension) for each symbol, along with a label (1 dimension) for each point, and maybe control over the size (1 dimension), colour (1 dimension) and even font (1 dimension) applied to the label, you might find you can actually plot quite a few columns/dimensions for each data point on your chart… Whether or not you can actually decipher it is another matter of course! My Gephi charts generally have 2 explicit dimensions (node size and colour), as well as making use of two spatial dimensions (x, y) to lay out points that are in some sense “close” to each other in network space. It’s worth remembering though, that if you’re using a tool to engage in a conversation with a dataset as you try to get it to tell its story to you, it may not matter that the visualisation looks a mess to anyone else (a bit like an involved conversation may not make sense if someone else suddenly tries to join it). (Presentation graphics, on the other hand, are usually designed to communicate something that the data is trying to say to another person in a very explicit way.)
- working with data is a tactile thing… you have to be prepared to get your hands dirty…
OU Related Courses Network Visualisation Using Protovis and Open University Open Data
This is something I’ve been meaning to do for ages, so spurred on by Martin Hawksey’s wonderful Google Gadgets port of my ad hoc Twitter network visualisation thing using Protovis (which Martin points out doesn’t work with IE9), I finally got round to it today: a wiring up of the OU modules Linked Data to the protovis app:
The data is pulled in from the OU Linked Data endpoint via Sparqlproxy (which provides a JSON output from the query that I can pull directly into the web page).
The query I’m using looks for courses related to the course of interest, and the courses related to those courses:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?name1 ?code2 ?name2 ?code3 ?name3 from <http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#undergraduate>.
?x <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name1.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z.
?z <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name2.
?x <http://purl.org/vocab/aiiso/schema#code> 'T215'^^xsd:string.
?z <http://purl.org/vocab/aiiso/schema#code> ?code2.
?z <http://purl.org/goodrelations/v1#isSimilarTo> ?zz.
?zz <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name3.
?zz <http://purl.org/vocab/aiiso/schema#code> ?code3.
} LIMIT 100
(The endpoint is data.open.ac.uk/query; the explicit 'T215' course code identifier is parameterised in the URI that runs the query through Sparqlproxy.)
There’s all sorts of opportunities for colouring the nodes (eg to distinguish between the focal point course, its direct neighbours, and the neighbours of those neighbours) but that’s an exercise for another day. I should probably have a go at labelling them sensibly too…
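To give a feel for the parameterisation, here is a hedged Python sketch that drops a course code into a cut-down version of the query (direct neighbours only) and URL-encodes it. It builds a URL against the endpoint directly rather than via Sparqlproxy, whose URL format I haven't reproduced here; the http:// scheme, the use of a query parameter (as in the standard SPARQL protocol) and the helper names are assumptions of this sketch.

```python
import re
from urllib.parse import urlencode

ENDPOINT = "http://data.open.ac.uk/query"  # endpoint named above; scheme assumed

# Cut-down version of the query: direct neighbours of one course only.
QUERY_TEMPLATE = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?code2 ?name2 from <http://data.open.ac.uk/context/course> where {
?x <http://purl.org/vocab/aiiso/schema#code> '%(code)s'^^xsd:string.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z.
?z <http://purl.org/vocab/aiiso/schema#code> ?code2.
?z <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name2.
} LIMIT 100"""

def build_query(course_code):
    """Insert a course code into the query, refusing anything but letters and digits."""
    if not re.fullmatch(r"[A-Za-z0-9]+", course_code):
        raise ValueError("unexpected course code: %r" % course_code)
    return QUERY_TEMPLATE % {"code": course_code}

def build_url(course_code):
    """URL-encode the query so it can be passed as a single URL argument."""
    return ENDPOINT + "?" + urlencode({"query": build_query(course_code)})

print(build_url("T215"))
```

Checking the course code before substituting it keeps stray quotes or braces from rewriting the query.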
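One way the colouring might work is to record, for each node, how many hops it is from the focal course when turning the query results into nodes and edges. The sketch below is plain Python rather than Protovis code; the result rows, course codes other than T215 and the colour names are all invented for illustration, and it assumes each row carries the code2 and code3 values from the query's select list.

```python
def build_graph(rows, focus_code):
    """Return ({course code: hops from the focal course}, sorted list of undirected edges)."""
    depth = {focus_code: 0}
    # First pass: everything directly similar to the focal course is one hop away.
    for row in rows:
        depth.setdefault(row["code2"], 1)
    # Second pass: anything not already placed is a neighbour of a neighbour.
    for row in rows:
        depth.setdefault(row["code3"], 2)
    edges = set()
    for row in rows:
        for a, b in ((focus_code, row["code2"]), (row["code2"], row["code3"])):
            if a != b:
                edges.add(tuple(sorted((a, b))))  # store each undirected edge once
    return depth, sorted(edges)

# Invented result rows; only T215 is a course code taken from the query above.
rows = [
    {"code2": "X100", "code3": "T215"},
    {"code2": "X100", "code3": "Y200"},
    {"code2": "Z300", "code3": "X100"},
]
depth, edges = build_graph(rows, "T215")
colours = {0: "red", 1: "orange", 2: "grey"}  # hypothetical palette
for code, hops in sorted(depth.items()):
    print(code, hops, colours[hops])
print(edges)
```

Doing the direct neighbours in a first pass means a course that shows up both as a neighbour and as a neighbour of a neighbour is always coloured as the closer of the two.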
(The ability to drag nodes around within the graph has also been added (back) – Martin noticed the order of a couple of the Protovis commands influenced whether this worked or not. Being able to relayout the chart reminds me how rubbish the force layout algorithm Protovis uses actually is!)
Drawing on Martin’s work (i.e. directly pinching his Google Gadget definition!) I also created a widget/gadget (XML) that lets you view the network of courses around a course in your own page…
Here’s the config page:
Of course, this being a WordPress.com hosted blog, I don’t think I can directly embed the gadget to prove that it works…
Related:
- data.open.ac.uk Linked Data Now Exposing Module Information
- Getting Started With data.open.ac.uk Course Linked Data
- Open University Undergraduate Module Map
PS to do – a reimagining of this, probably using arbor.js, where we just do the direct neighbours of a course code, but allow nodes to be clickable so that additional nodes and edges can be added to the graph dynamically… It might also be interesting to support search by keywords, and display courses that match keywords (in one colour) as well as related courses (in another), along with edges showing which courses are related…?
Google Visualisation API Controls Support Interactive Data Queries Within a Web Page
The only way I can keep up with updates to Google warez at the moment is to feed off tips, tricks and noticings shared by @mhawksey. Yesterday, Martin pointed out to me a couple of new controls offered by the Google visualization API – interactive dashboard controls (documentation), and an in-page chart editor.
What the interactive components let you do is download a dataset from a Google spreadsheet and then dynamically filter the data within the page.
So for example, over on the F1Datajunkie blog I’ve been posting links to spreadsheets containing timing data from recent Formula One races. What I can now do is run a query on one of the spreadsheets to pull down particular data elements into the web page, and then filter the results within the page using a dynamic control. An example should make that clear (unfortunately, I can’t embed a live demo in this hosted WordPress blog page:-(

I’ve posted a copy of the code used to generate that example as gist here: Google Dynamic Chart control, feeding off Google Spreadsheet/visualisation API query
Here’s the key code snippet – the ControlWrapper populates the control using the unique data elements found in a specified column (by label) within the downloaded dataset, and is then bound to a chart type which updates when the control is changed:
var data = response.getDataTable();
var namePicker = new google.visualization.ControlWrapper({
'controlType': 'CategoryFilter',
'containerId': 'filter_div',
'options': {
'filterColumnLabel': 'driver',
'ui': {
'labelStacking': 'vertical',
'allowTyping': false,
'allowMultiple': false
}
}
});
var laptimeChart = new google.visualization.ChartWrapper({
'chartType': 'LineChart',
'containerId': 'chart_div',
'options': {
'width': 800,
'height': 800
}
});
var dashboard = new google.visualization.Dashboard(document.getElementById('dashboard_div')).
bind(namePicker, laptimeChart).
draw(data)
As well as drop down lists, there is a number range slider control which can be used to set the minimum and maximum values of a numerical filter, and a string filter that lets you filter data within a column using a particular term (it doesn’t seem to support Boolean search operators though…) Read more about the controls here: Google visualisation API chart controls
Something else I hadn’t noticed before: sort events applied to tables can also be used to trigger the sorting of data within a chart, which means you can offer interactions akin to some of those found on Many Eyes.
Whilst looking through the Google APIs interactive playground, I also noticed a couple of other in-page data shaping tools that I hadn’t noticed before: group and join
Group, which lets you group rows in a table and present an aggregated view of them:
That is, if you have data loaded into a datatable in a web page, you can locally produce summary reports based on that data using the supported group operation?
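To make the idea concrete, here is what a group operation amounts to, sketched in Python rather than with the Google JavaScript API itself. The lap time rows and driver names are invented, and group_by is a hypothetical helper written for this example.

```python
from collections import defaultdict

def group_by(rows, key, value, aggregate):
    """Collect the value column per distinct key, then reduce each group with aggregate."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row[value])
    return {k: aggregate(v) for k, v in buckets.items()}

# Invented lap time rows for illustration only.
laps = [
    {"driver": "Driver A", "lap": 1, "lapTime": 95.0},
    {"driver": "Driver A", "lap": 2, "lapTime": 93.5},
    {"driver": "Driver B", "lap": 1, "lapTime": 96.0},
    {"driver": "Driver B", "lap": 2, "lapTime": 94.0},
]

fastest = group_by(laps, "driver", "lapTime", min)    # fastest lap per driver
lap_counts = group_by(laps, "driver", "lapTime", len) # number of laps per driver
print(fastest)
print(lap_counts)
```

Swapping the aggregate function (min, max, sum, len and so on) gives a different summary report from the same table.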
There’s also a join operation that allows you to merge data from two datatables where there is a common column (or at least, common entries in a given column) between the two tables:
What the join command means is that you can merge data from separate queries onto one or more Google spreadsheets within the page.
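Again, a sketch in Python (not the Google API) of what such a join does with two small tables that share a column. The tables, team names and the inner_join helper are all made up for the example, and this version keeps only rows whose key appears in both tables.

```python
def inner_join(left, right, key):
    """Merge rows from two tables wherever they share the same value in the key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(rrow)
            merged.update(lrow)  # on a column name clash, the left table's value wins
            joined.append(merged)
    return joined

# Invented tables standing in for the results of two separate spreadsheet queries.
teams = [
    {"driver": "Driver A", "team": "Team X"},
    {"driver": "Driver B", "team": "Team Y"},
    {"driver": "Driver C", "team": "Team Z"},
]
fastest_laps = [
    {"driver": "Driver A", "fastestLap": 93.5},
    {"driver": "Driver B", "fastestLap": 94.0},
]

for row in inner_join(teams, fastest_laps, "driver"):
    print(row)
```

Driver C drops out of the result because there is no matching row in the second table; a different join type would keep it with an empty fastest lap.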
With all these programming components in place, it means that Google visualisation API support is now comprehensive enough to do all sorts of interactive visualisations within the page. (I’m not sure of any other libraries that offer quite so many tools for wrangling data in the page? The YUI datatable supports sorting and filtering, but I think that’s about it for data manipulation?)
I guess it also means that you can start to treat a web page as a database containing one or more datatables within it, along with tool support/function calls that allow you to work that database and display the results in a variety of visual ways?! And more than that, you can use interactive graphical components to construct dynamic queries onto the data in a visual way?!
PS here are a couple of other ways of using a Google spreadsheet as a database:
- Using Google Spreadsheets as a Database with the Google Visualisation API Query Language
- Using Google Spreadsheets Like a Database – The QUERY Formula
On the Public Understanding of – and Public Engagement With – Statistics: Reflections on the OU Statistics Group Conference on “Visualisation and Presentation in Statistics”
Last week I attended the OU Statistics conference on Visualisation and Presentation in Statistics (VIPS) (notes: here and here)
One of the things that struck me from conversations and some of the presentations was that statistics – and in particular public engagement around statistics – appears to be lagging science efforts in this area.
When I first moved to the OU as a lecturer a dozen or so years ago, I got involved with various activities that, at the time, were classed as “public understanding of science and technology”, though at the time the whole sci-comm area was in a state of flux and ideas were moving towards a focus on public engagement with science. As a member of the NESTA Crucible one year, I saw how there was also concern around engagement with science and technology policy, and how it could be moved “upstream”, to a point where dialogue with various publics could actually contribute to, and even influence, policy development.
(The NESTA Crucible experience significantly influenced my world view and was one of the most rewarding schemes I have ever been involved with…)
Since then, it seems to me that the school science curriculum has witnessed a similar change, with a move away from a focus purely on the basic science (and perhaps industrial applications?) to one that includes a consideration of socio-technical considerations (one might say, policy implications…)
At the VIPS event, one of the phrases that jumped out at me in at least one presentation (aside from repeated mentions of RSS…;-) talked about difficulties in promoting the public understanding of statistics. Ally this with the fact that the school maths curriculum seems not to have evolved so much (“averages”, means and histograms still seem to be the focus?!) and I wonder: is statistics today where science was a decade or so ago?
The recent rhetoric around – and actual release of – “open public data” suggests that, as citizens and journalists, there is an increasing number of opportunities to hold governments and public bodies to account using evidentiary data and maybe also engage in data-driven (or at least data informed) policy formulation. With so much data out there, and so many possible ways of combining and interrogating it – so many possible different questions to ask and places to ask them – there are increasingly opportunities for informed amateurs to make a very real contribution (in the same way that amateur astronomers can make a real contribution to the recording and analysis of astronomical observations).
The growing instrumentation of our world also means that there are increasing amounts of data about ourselves that we can have access to in the form of personal data dashboards (for example, think of various social media/reputation tools, but also expect to see various tools appearing that allow you to mine your health/fitness, financial or shopping transaction data, for example). These dashboards will be visually rich, and designed to give at-a-glance overviews of the state of this, or that quantity or metric. But to get most from them, we will need to include more complex and powerful visualisation types, and find a way of helping people learn how to “see” them, “read” them and interpret them.
So to what extent do we need to engage with the “public understanding of statistics” as compared to the development of skills in the public appreciation of statistics and improvements in the way the public can engage with each other and with policy makers in discussions where statistics play a role? (Public engagement in statistics? Public engagement with statistics?)
Over the last few weeks, I’ve started trying to immerse myself in the world of statistical graphics, on the basis that our perceptual apparatus is pretty good at pattern detection and can help us get to grip with visually meaningful properties of distributions of data without us necessarily having to understand much in the way of formal statistics. (Of course, the visual apparatus can also be conned by misleading graphs and charts, which is where some semblance of critical understanding and, dare I say it, statistical literacy, comes in.)
My intuition is that it will be easier to develop a visual literacy in the reading and interpretation of charts (i.e. building on “folk statistical graphics/visual statistics”) than a widespread mathematical understanding of statistics. (I suspect that for most people, pie charts – and more recently ‘donut’ charts – as well as line graphs and simple bar charts are about the limit of what they are comfortable with, along with thematic maps (in particular, choropleth maps) and (in recent years again?) proportional symbol maps. I also know from asking even well informed audiences that awareness of more recently developed techniques, such as treemaps, is not widespread.)
At the moment, the infographics designers appear to be leading the charge into public consciousness of data-driven graphics, but as I’m finding out, the stats community has a wealth of visual techniques already to hand that are maybe “sounder” in terms of deriving visual representations that reflect statistical properties and concerns than the tricks the infographics crowd are using. (This is all just my anecdotal opinion, and not based in any formal research!)
Many infographics build on a common visual grammar (in the West, line charts up to the right increase over time; for area based charts, the bigger the area the more of something is being represented). But many infographics are also limited by the chart types we are all familiar with (line charts, bar charts, coloured maps…) Maybe the place to start is the stats community finding ways of introducing new-to-the-majority statistical graphs into the mainstream media along with a strong narrative to explain what is going on in those charts (and not necessarily so much discussion about the actual maths and stats…)?
Quick Summary of Second and Third Sessions of “Visualisation and Presentation in Statistics”
Kevin McConway (http://statistics.open.ac.uk/People/k.j.mcconway, @kjm2): showing off some gratuitous use of numbers to illustrate Guardian stories #ouvpstats
Where do surveys reported in the press come from? ONS, market research companies. PR companies…… #ouvpstats
Get paid to do a (PR?) survey onepoll.com and youngpoll.com #ouvpstats
Not PR commissioned polls, err, maybe, err, hmmm…. http://72point.com/ #ouvpstats
Why are there numbers in the news? PR, Entertainment, eyecandy. Special status of “number facts” #ouvpn
Mary Poovey, “A History of the Modern Fact” http://www.press.uchicago.edu/ucp/books/book/chicago/H/bo3614698.html #ouvpn
Need to distinguish between facts, analysis and narrative… #ouvpstats
What’s wrong with PR stats? ’tis the road to cynicism, or looking good rather than communicating well #ouvpstats
So what can we do about it? Statisticians need to engage with the public and work with journalists #ouvpstats
Statisticians’ view of journalists: innumerate, distort and oversimplify, don’t understand quantitative reasoning, won’t listen #ouvpstats
Journalists’ view of statisticians: illiterate, pedantic, boring, focus on ifs and buts, won’t listen #ouvpstats
Journalists work to tight timescales, have a view of “newsworthiness”, are good storytellers #ouvpstats
Martin Bland ( https://hsciweb.york.ac.uk/research/public/Staff.aspx?ID=129 )
Comparing papers in one issue of the Lancet and BMJ from 1972 and from 2010, mean population size has gone up 2-3 orders of magnitude (tens to thousands+)
Description of stats: 1972: very cursory; 2010: far more comprehensive statistical methods reported. Shift from significance testing to estimation
Move towards evidence-based medicine starting around 1990s (bound to include statistics)
“Why do we need some large, simple randomized trials?” Yusuf et al. 1984
Move to confidence intervals, not p-values: Gardner & Altman
http://www.bmj.com/content/292/6522/746.abstract
Journals started to introduce systematic requirements and statistical referees
Consort guidelines for stats in randomised medical trials
http://www.consort-statement.org/
Statisticians should point out where wrong conclusions have been drawn as a result of stats mistakes…
Rosemary Bailey
http://www.maths.qmul.ac.uk/~rab/
Problems with box and whisker plots (referred to as box and aerial/antenna plots?), which are now popular in medicine, biology, engineering (not least because folk don’t know what the whisker means). Antenna doesn’t take into account variability across conditions. [My naive understanding of these diagrams is that they are trying to say something different? But my knowledge is so hazy I can't argue for what I do think they describe!]
Hasse diagrams – cords, dyes and constants(?) [I'm a bit lost at this point...]
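[On the point above about folk not knowing what the whisker means: here is a minimal R sketch of one common convention, the Tukey-style default that R's own boxplot() uses. The data are made up for illustration, and this may well not be the convention the talk had in mind; it only shows what base R reports for the whisker ends.]

```r
# Hypothetical sample with one extreme value (not data from the talk)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# boxplot.stats() returns the numbers that boxplot() would draw.
# coef = 1.5 is R's default: a whisker runs to the most extreme
# data point lying within 1.5 x the hinge spread of the box.
bs <- boxplot.stats(x, coef = 1.5)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
box_and_whiskers <- bs$stats

# Points beyond the whiskers, drawn individually by boxplot()
outliers <- bs$out

# The upper "fence" that separates whisker points from outliers
hinge_spread <- bs$stats[4] - bs$stats[2]
upper_fence <- bs$stats[4] + 1.5 * hinge_spread

print(box_and_whiskers)
print(outliers)
```

Under this convention the whisker ends are actual data points rather than a measure of variability such as a standard error, so the extreme value 30 is reported as an outlier instead of stretching the upper whisker.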
Michel van de Velden
http://www.erim.eur.nl/ERIM/People/Person_Details?p_aff_id=799
Perceptual maps – multivariate methods for plotting high-dimensional data
Exploit natural spatial recognition/visual abilities
Examples: Tufte 1983, Cleveland and McGill 1987, Wainer 2005
Caption should convey enough info to allow reader in possession of data (and appropriate tools) to recreate the perceptual map
Shape parameter (aspect ratio) – ratio of x scale to y scale. If it can be 1, it should be… (changes aspect ratio of photo of Kate Middleton to make the point about distortion if not 1 when it could/should be…)
If perception of map relies in part on angle of point/line, need to know where the origin is.
Excel charts – hard to explicitly set an exact aspect ratio (same with many tools?)
Perceptual maps may require guidance as to how to read a map – e.g. icons
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1572196
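[On the aspect ratio and origin points above: in base R, at least, an exact aspect ratio is easy to set. The sketch below uses made-up two-dimensional coordinates (the names coords, dim1 and dim2 are hypothetical, not from the talk), fixes the shape parameter at 1 with asp = 1, marks the origin, and then checks that one data unit covers the same physical length on both axes.]

```r
# Hypothetical 2-D map coordinates with very different spreads per axis
set.seed(1)
coords <- data.frame(dim1 = rnorm(20, sd = 3),
                     dim2 = rnorm(20, sd = 1))

# Draw to a null PDF device so the sketch also runs without a screen
pdf(file = NULL, width = 7, height = 5)

# asp = 1 forces one unit on the x axis to equal one unit on the y axis
plot(coords$dim1, coords$dim2, asp = 1,
     xlab = "Dimension 1", ylab = "Dimension 2")

# Show where the origin is, since angles are read relative to it
abline(h = 0, v = 0, lty = 2)

# Check: data units per inch should match on the two axes
usr <- par("usr")  # plotted data range: x1, x2, y1, y2
pin <- par("pin")  # plot region size in inches: width, height
units_per_inch_x <- (usr[2] - usr[1]) / pin[1]
units_per_inch_y <- (usr[4] - usr[3]) / pin[2]

dev.off()
```

Leaving out asp = 1 lets R stretch each axis to fill the plot region, which is the kind of distortion the Kate Middleton photo was used to illustrate.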
[Me]
Jill Leyland, Vice President, Royal Statistical Society
Lots of folk think UK official statistics are not free of political interference, nor do they necessarily trust(?) them; the UK scores very poorly compared to the rest of Europe.
National Stats have high integrity and are free of political interference. Perception of political interference is one reason for the low degree of trust. UKSA (UK Statistics Authority) scrutinises official statistics: “promoting and safeguarding the production and publication of statistics that serve the public good”
No political interference, but: many key stats produced in depts; UKSA role not fully understood (scrutineer as well as publisher); pre-release access – Ministers can see statistics 24 hrs before they are released (up to 5 days in Scotland and Wales), and suspicion that Ministers may use this time for mischief…
Role of media – UK media are interested in statistics, but “stats are wrong” stories get more coverage than “stats are right”, and journalists often don’t understand statistical issues (as well as tight deadlines, no specialist knowledge). BUT official statisticians could do better; ONS website a joke… (though new one due to launch at end of August). Far too little interaction with stats users outside government.
What can be done? Continuing efforts to improve presentation; need to differentiate between independent national statistics and those produced by departments. Better education for journalists [and statisticians, e.g. in terms of communications?]; reduction/elimination of pre-release access.
















