OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Search Results

Creating Olympic Medal Treemap Visualisations Using OTS R Libraries

with one comment

In London Olympics 2012 Medal Tables At A Glance? I posted some treemap visualisations of the Olympics medal tables generated using a Google Visualisation Chart treemap component. I thought it might be worth posting a quick R generated example too, using the off-the-shelf/straight out of CRAN treemap component. (If you want to play along, download the data as CSV from here.)

The original data looks like this:

but ideally we want it to look like this:

I posted a quick recipe showing how to do this sort of reshaping in Google Refine, but in R it’s even easier – just melt the Gold, Silver and Bronze columns into a pair of columns…

Here’s the full code to do the reshaping and generate a simple treemap:

#load in the data from a file
odata = read.csv("~/Downloads/nbc_olympic_medalscrape.csv")

#Reshape the data
require(reshape)
odatar=melt(odata,id=c('cc','ccevent','Event'))

#And generate the treemap in the simplest possible way
require(treemap)
tmPlot(odatar, 
       index=c("cc", "Event","variable"), 
       vSize="value", vColor='value',
       type="value")

And here’s the treemap, with country blocks ordered in this case by total medal haul:

(To view the countries ordered according to number of Golds, a quick fix would be to order hierarchy with the medal type shown at the highest level of the tree: index=c("variable","cc", "Event").)

Generating variant views (I described six variants in the original post) is easy enough – just tweak the order of the elements of the index setting. (I should have named the melt created columns something more sensible than the default, shouldn’t I?) Note that the vSize and vColor setting of "value" (sic) refers to the name of the melt created column that holds the medal counts (the medal type ends up in the "variable" column). The type setting of "value" says use the numerical value… (i.e. it’s literal – it doesn’t refer to a column name…)
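To make the reshaping step a little more concrete, here is a rough sketch of what a melt does, written in plain Python rather than R. The column names (cc, ccevent, Event, Gold, Silver, Bronze) follow the recipe above, but the sample rows, the medal counts and the medalType/count column names are made up for illustration, and the row ordering may differ from what the reshape library produces.

```python
# Illustrative sketch only: a hand-rolled wide-to-long reshape.
# The sample rows and the medalType/count names are assumptions.

def melt(rows, id_cols, var_name="variable", value_name="value"):
    """Turn each non-id column of each row into its own (variable, value) row."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_cols}
        for col, val in row.items():
            if col in id_cols:
                continue
            long_row = dict(ids)          # copy the identifying columns
            long_row[var_name] = col      # e.g. "Gold"
            long_row[value_name] = val    # e.g. the number of golds
            long_rows.append(long_row)
    return long_rows

wide = [
    {"cc": "AAA", "ccevent": "AAA_Rowing", "Event": "Rowing",
     "Gold": 1, "Silver": 0, "Bronze": 2},
    {"cc": "BBB", "ccevent": "BBB_Judo", "Event": "Judo",
     "Gold": 0, "Silver": 1, "Bronze": 0},
]

long = melt(wide, id_cols=("cc", "ccevent", "Event"),
            var_name="medalType", value_name="count")
for r in long:
    print(r)
```

Each wide row becomes three long rows, one per medal type, which is the shape the treemap index and size settings expect.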

Out of the can – simples enough… So what might we be able to do with a little bit more treatment? Examples via the comments, please ;-)

Written by Tony Hirst

August 8, 2012 at 12:25 am

Posted in Rstats, Visualisation

Pragmatic Visualisation – GDS Transaction Data as a Treemap


A week or two ago, the Government Data Service started publishing a summary document containing website transaction stats from across central government departments (GDS: Data Driven Delivery). The transactional services explorer uses a bubble chart to show the relative number of transactions occurring within each department:

The sizes of the bubbles are related to the volume of transactions (although I’m not sure what the exact relationship is?). They’re also positioned on a spiral, so as you work clockwise round the diagram starting from the largest bubble, the next bubble in the series is smaller (the “Other” catchall bubble is the exception, sitting as it does on the end of the tail irrespective of its relative size). This spatial positioning helps communicate relative sizes when the actual diameter of two bubbles next to each other is hard to differentiate between.

Clicking on a link takes you down into a view of the transactions occurring within that department:

Out of idle curiosity, I wondered what a treemap view of the data might reveal. The order of magnitude differences in the number of transactions across departments meant that the resulting graphic was dominated by departments with large numbers of transactions, so I did what you do in such cases and instead set the size of the leaf nodes in the tree to be the log10 of the number of transactions in a particular category, rather than the actual number of transactions. Each node higher up the tree was then simply the sum of values in the lower levels.
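As a minimal sketch of that sizing rule (log10 at the leaves, plain sums above them), something like the following would do. The department names, service names and transaction counts are invented for the example, and counts are assumed to be greater than zero.

```python
import math
from collections import defaultdict

# Made-up (department, service, transactions) rows, for illustration only.
transactions = [
    ("Dept A", "Service 1", 1_000_000),
    ("Dept A", "Service 2", 100),
    ("Dept B", "Service 3", 10_000),
]

def leaf_size(count):
    """Leaf node size: log10 of the transaction count (count must be > 0)."""
    return math.log10(count)

def department_sizes(rows):
    """Size of each department block: the sum of its leaf sizes."""
    totals = defaultdict(float)
    for dept, _service, count in rows:
        totals[dept] += leaf_size(count)
    return dict(totals)

sizes = department_sizes(transactions)
print(sizes)
```

On raw counts Dept A would be about a hundred times the size of Dept B; on the log scale it is only twice the size, and a department with many small services gains area for each one.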

The result is a treemap that I decided shows “interestingness”, which I defined for the purposes of this graphic as being some function of the number and variety of transactions within a department. Here’s a nested view of it, generated using a Google chart visualisation API treemap component:

The data I grabbed had a couple of usable structural levels that we can make use of in the chart. Here’s going down to the first level:

…and then the second:

Whilst the block sizes aren’t really a very good indicator of the number of transactions, it turns out that the default colouring does indicate relative proportions in the transaction count reasonably well: deep red corresponds to a low number of transactions, dark green a large number.

As a management tool, I guess the colours could also be used to display percentage change in transaction count within an area month on month (red for a decrease, green for an increase), though a slightly different size transformation function might be sensible in order to draw out the differences in relative transaction volumes a little more?

I’m not sure how well this works as a visualisation that would appeal to hardcore visualisation puritans, but as a graphical macroscopic device, I think it does give some sort of overview of the range and volume of transactions across departments that could be used as an opening gambit for a conversation with this data?

Written by Tony Hirst

August 2, 2012 at 12:29 pm

Posted in Data, Visualisation

Practical Visualisation Tools Presentation: #CASEprog

with 4 comments

Last week I gave a presentation at the DCMS describing some hands-on tools for getting started with creating data powered visualisations (Visualisation Tools to Support Data Engagement), at the invitation of the Arts Council’s James Doeser, in the context of the DCMS CASE (Culture and Sport Evidence) Programme, #CASEprog:

I’ve also posted a resource list as a delicious stack: CASEprog – Visualisation Tools (Resource List).

Whilst preparing the presentation, I had a dig through the DCLG sponsored Improving Visualisation for the Public Sector site, which provides pathways for identifying appropriate visualisation types based on data type, policy objectives/communication goals and anticipated audience level. It struck me that being able to pick an appropriate visualisation type is one thing, but being able to create it is another.

My presentation, for example, was based very much around tools that could provide a way in to actually creating visualisations, as well as shaping and representing data so that it can be plugged straight in to particular visualisation views.

So I’m wondering, is there maybe an opportunity here for a practical programme of work that builds on the DCLG Improving Visualisation toolkit by providing worked, and maybe templated, examples, with access to code and recipes wherever possible, for actually creating examples of exemplar visualisation types from actual open/public data sets that can be found on the web?

Could this even be the basis for a set of School of Data practical exercises, I wonder, to actually create some of these examples?

Written by Tony Hirst

June 1, 2012 at 9:55 am

Exporting and Displaying Scraperwiki Datasets Using the Google Visualisation API

with 5 comments

In Visualising Networks in Gephi via a Scraperwiki Exported GEXF File I gave an example of how we can publish arbitrary serialised output file formats from Scraperwiki using the GEXF XML file format as a specific example. Of more general use, however, may be the ability to export Scraperwiki data using the Google visualisation API DataTable format. Muddling around the Google site last night, I noticed the Google Data Source Python Library that makes it easy to generate appropriately formatted JSON data that can be consumed by the (client side) Google visualisation library. (This library provides support for generating line charts, bar charts, sortable tables, etc, as well as interactive dashboards.) A tweet to @frabcus questioning whether the gviz_api Python library was available as a third party library on Scraperwiki resulted in him installing it (thanks, Francis:-), so this post is by way of thanks…

Anyway, here are a couple of examples of how to use the library. The first is a self-contained example (using code pinched from here) that transforms the data into the Google format and then drops it into an HTML page template that can consume the data, in this case displaying it as a sortable table (GViz API on scraperwiki – self-contained sortable table view [code]):

Of possibly more use in the general case is a JSONP exporter (example JSON output (code)):

Here’s the code for the JSON feed example:

import scraperwiki
import gviz_api

#Example of:
## how to use the Google gviz Python library to cast Scraperwiki data into the Gviz format and export it as JSON

#Based on the code example at:
#http://code.google.com/apis/chart/interactive/docs/dev/gviz_api_lib.html

scraperwiki.sqlite.attach( 'openlearn-units' )
q = 'parentCourseCode,name,topic,unitcode FROM "swdata" LIMIT 20'
data = scraperwiki.sqlite.select(q)

description = {"parentCourseCode": ("string", "Parent Course"),"name": ("string", "Unit name"),"unitcode": ("string", "Unit Code"),"topic":("string","Topic")}

data_table = gviz_api.DataTable(description)
data_table.LoadData(data)

json = data_table.ToJSon(columns_order=("unitcode","name", "topic","parentCourseCode" ),order_by="unitcode")

scraperwiki.utils.httpresponseheader("Content-Type", "application/json")
print 'ousefulHack('+json+')'

I hardcoded the wraparound function name (ousefulHack), which then got me wondering: is there a safe/trusted/approved way of grabbing arguments out of the URL in Scraperwiki so this could be set via a calling URL?
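I don’t know what the approved Scraperwiki route for reading URL arguments is, so the following is only a sketch of the general idea in plain Python 3 (the listing above is Python 2): parse a query string, pull out a callback name, and only use it if it looks like a plain Javascript identifier. The callback parameter name and the jsonp_body helper are hypothetical, and how the query string actually reaches a Scraperwiki view is not covered here.

```python
import re
from urllib.parse import parse_qs

# Accept only plain (optionally dotted) JavaScript identifiers, e.g. "a.b.c".
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*")

def jsonp_body(query_string, json_text, default="ousefulHack", param="callback"):
    """Wrap json_text in the callback named in the query string.

    Falls back to the default name if the parameter is missing or
    does not look like a safe function name.
    """
    requested = parse_qs(query_string).get(param, [default])[0]
    callback = requested if _CALLBACK_RE.fullmatch(requested) else default
    return callback + "(" + json_text + ")"

print(jsonp_body("callback=myChartLoader", '{"rows": []}'))
```

Checking the name against a whitelist pattern matters because whatever arrives in the URL is written straight into a script that another page will execute.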

Anyway, what this shows (hopefully) is an easy way of getting data from Scraperwiki into the Google visualisation API data format and then consuming either via a Scraperwiki view using an HTML page template, or publishing it as a Google visualisation API JSONP feed that can be consumed by an arbitrary web page and used directly to drive Google visualisation API chart widgets.

PS as well as noting that the gviz python library “can be used to create a google.visualization.DataTable usable by visualizations built on the Google Visualization API” (gviz_api.py sourcecode), it seems that we can also use it to generate a range of output formats: Google viz API JSON (.ToJSon), a simple JSON response (.ToJSonResponse), Javascript (“JS Code”) (.ToJSCode), CSV (.ToCsv), TSV (.ToTsvExcel) or an HTML table (.ToHtml). A ToResponse method (ToResponse(self, columns_order=None, order_by=(), tqx="")) can also be used to select the output response type based on the tqx parameter value (out:json, out:csv, out:html, out:tsv-excel).

PPS looking at eg
https://spreadsheets.google.com/tq?key=rYQm6lTXPH8dHA6XGhJVFsA&pub=1
which can be pulled into a javascript google.visualization.Query(), it seems we get the following returned:
google.visualization.Query.setResponse({"version":"0.6","status":"ok","sig":"1664774139","table":{ "cols":[ ... ], "rows":[ ... ] }})
I think google.visualization.Query.setResponse can be a user defined callback function name; maybe worth trying to implement this one day?
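By way of a sketch of how that might work: assuming the tqx value is a semicolon separated list of key:value pairs (as the out:json, out:csv values above suggest), and assuming the protocol lets the caller name the callback via a responseHandler key (worth checking against the Google docs before relying on it), a feed could pick its output type and wrapper function like this. The function names here are hypothetical and gviz_api itself is not used.

```python
# Assumption: tqx looks like "out:json;responseHandler:myFunc".
DEFAULT_HANDLER = "google.visualization.Query.setResponse"

def parse_tqx(tqx):
    """Split a tqx string into a dict of key -> value."""
    params = {}
    for pair in tqx.split(";"):
        if not pair:
            continue                      # tolerate empty segments
        key, _, value = pair.partition(":")
        params[key.strip()] = value.strip()
    return params

def response_settings(tqx):
    """Pick the output type and callback name, with assumed defaults."""
    params = parse_tqx(tqx)
    return {
        "out": params.get("out", "json"),
        "handler": params.get("responseHandler", DEFAULT_HANDLER),
    }

print(response_settings("out:json;responseHandler:ousefulHack"))
```

The handler value would still want validating before being written into a response, since it comes from the calling URL.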

Written by Tony Hirst

April 3, 2012 at 11:28 am

Posted in onlinejournalismblog, Tinkering


Creating Simple Interactive Visualisations in R-Studio: Subsetting Data


Watching a fascinating Google Tech Talk by Hadley Wickham on The Future of Interactive Graphics in R – A Joint Visualization and UseR Meetup, I was reminded of the manipulate command provided in R-Studio that lets you create slider and dropdown widgets that in turn let you dynamically interact with R based visualisations, for example by setting data ranges or subsetting data.

Here are a couple of quick examples, one using the native plot command, the other using ggplot. In each case, I’m generating an interactive visualisation that lets me display as a line chart two user selected data series from a larger data set.

manipulate UI builder in RStudio

[Data file used in this example]

Here’s a crude first attempt using plot:

hun_2011comprehensiveLapTimes <- read.csv("~/code/f1/generatedFiles/hun_2011comprehensiveLapTimes.csv")
View(hun_2011comprehensiveLapTimes)

library("manipulate")
#Use a shorter name for the data frame
h=hun_2011comprehensiveLapTimes

manipulate(
plot(lapTime~lap,data=subset(h,car==cn1),type='l',col=car) +
lines(lapTime~lap,data=subset(h,car==cn2 ),col=car),
cn1=slider(1,25),cn2=slider(1,25)
)

This has the form manipulate(command1+command2, uiVar=slider(min,max)), so we see for example two R commands to plot the two separate lines, each of them filtered on a value set by the corresponding slider variable.

Note that we plot the first line using plot, and the second line using lines.

The second approach uses ggplot within the manipulate context:

library("ggplot2")

manipulate(
ggplot(subset(h,h$car==Car_1|car==Car_2)) +
geom_line(aes(y=lapTime,x=lap,group=car,col=car)) +
scale_colour_gradient(breaks=c(Car_1,Car_2),labels=c(Car_1,Car_2)),
Car_1=slider(1,25),Car_2=slider(1,25)
)

In this case, rather than explicitly adding additional line layers, we use the group setting to force the display of lines by group value. The initial ggplot command sets the context, and filters the complete set of timing data down to the timing data associated with at most two cars.

We can add a title to the plot using:

manipulate(
ggplot(subset(h,h$car==Car_1|car==Car_2)) +
geom_line(aes(y=lapTime,x=lap,group=car,col=car)) +
scale_colour_gradient(breaks=c(Car_1,Car_2),labels=c(Car_1,Car_2)) +
opts(title=paste("F1 2011 Hungary: Laptimes for car",Car_1,'and car',Car_2)),
Car_1=slider(1,25),Car_2=slider(1,25)
)

My reading of the manipulate function is that if you make a change to one of the interactive components, the variable values are captured and then passed to the R command sequence, which then executes as normal. (I may be wrong in this assumption of course!) Which is to say: if you write a series of chained R commands, and can abstract out one or more variable values to the start of the sequence, then you can create corresponding interactive UI controls to set those variable values by placing the command series within the manipulate() context.
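That reading can be mimicked in a few lines of Python, purely as a toy model of the idea and not as a description of how RStudio actually implements manipulate. The manipulate and subset_laps functions below are hypothetical, and the lap data is made up; the point is just that the control values are held in one place and the same expression is re-run whenever one of them changes.

```python
def subset_laps(rows, car_1, car_2):
    """Keep only the rows for the two selected cars."""
    return [r for r in rows if r["car"] in (car_1, car_2)]

def manipulate(render, **controls):
    """Return a function that re-runs `render` with updated control values."""
    state = dict(controls)                # current slider settings
    def on_change(**changes):
        state.update(changes)             # capture the new values
        return render(**state)            # re-evaluate the expression
    return on_change

# Made-up lap times for three cars.
laps = [
    {"car": 1, "lap": 1, "lapTime": 101.2},
    {"car": 2, "lap": 1, "lapTime": 102.5},
    {"car": 3, "lap": 1, "lapTime": 103.1},
]

update = manipulate(lambda Car_1, Car_2: subset_laps(laps, Car_1, Car_2),
                    Car_1=1, Car_2=2)
first = update()             # initial settings
second = update(Car_2=3)     # "move" the second slider
print(first, second)
```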

Written by Tony Hirst

August 5, 2011 at 1:05 pm

Posted in Anything you want


Slides from OU Rise Library Analytics Workshop: Rambling about Visualisation

with 2 comments

For what it’s worth, slides from my presentation yesterday… As ever, they’re largely pointless without commentary…

… and even with the commentary, it was all a bit more garbled than usual (I forgot to breathe, had no real idea in my own mind what I wanted to say, etc etc…)

On reflection, here’s what I took from thinking back about what I should have tried to say:

- my assumption is that folk who are interested in asking data related questions should feel as if they can actually work with the data itself (direct data manipulation); I appreciate this is already way off the mark for some people who want someone else to work the data and then just read reports about it – but then that means you can’t ask or discover your own questions about the data, just read answers (maybe) to questions that someone else has asked, presented in a way they decided;

- you need to feel confident in working with data files – or at least, you need to be prepared to have a go at working with data files! (Bear in mind that many of the blog posts I write are write ups – of a sort – of how to do something I didn’t know how to do a couple of hours before… The web usually has answers to most of the questions that I come up against – and if I can’t find the answers, I can often request them via things like Twitter or Stack Overflow…) This can range from using command line tools, to using applications that let you take data in using one format and get it out as another;

- different tools do different things; if you can get a dataset into a tool in the right way, it may be able to do magical things very very easily indeed…

- three tools that can do a lot without you having to know a lot (though you may have to follow a tutorial or two to pick up the method/recipe….or at least recognise a picture you like and a dataset whose shape you can replicate using your own data, and then the ability to see which bits you need to cut and paste into the command line…):

-=- Gephi: great for plotting networks and graphs. It can also be appropriated to draw line charts (if you can work out how to ‘join the dots’ in the data file by turning the line into a set of points connected by edges) or scatter plots (just load in nodes – no edges connecting them – and lay it out using Gephi’s geolayout tool, which also lets you plot “rectilinear” plots based on x and y axis values). (I haven’t worked out a reliable way of working with CSV in Gephi – yet…) It’s amazing what you can describe as a graph when you put your mind to it…

-=- gnuplot: command line tool for plotting scatter plots and line graphs (eg from time series) using data stored in simple text file (e.g. TSV or CSV)

-=- R (and ggplot if you’re feeling adventurous and want “pretty”, nicely designed graphs out); another command line tool (I find R-Studio helps) that again loads in data from a CSV file; R can generate statistical graphs very easily from the command line (it does the stats calculations for you given the raw data).

- Visual analytics/graphical data analysis is a process – you tease out questions and answers through directly manipulating the data and engaging with it in a visual way;

- when you see a visualisation you like, look at it closely: what do you see? Spending five mins or so looking at a Gestalt psychology/visual perception tutorial will give you all sorts of tricks and tips for how to construct visualisations so that structure your eye can detect will jump out at you;

- I think I may have confused folk talking about “dimensions”: what I meant was, how many columns could you represent in a given visualisation at the same time, if each data point corresponds to a single row in a data set. So for example, if you have an x-y plot (2 dimensions), with different symbols (1 dimension) available for plotting the points, as well as different colours (1 dimension) and different possible size (1 dimension) for each symbol, along with a label (1 dimension) for each point, and maybe control over the size (1 dimension), colour (1 dimension) and even font (1 dimension) applied to the label, you might find you can actually plot quite a few columns/dimensions for each data point on your chart… Whether or not you can actually decipher it is another matter of course! My Gephi charts generally have 2 explicit dimensions (node size and colour), as well as making use of two spatial dimensions (x, y) to lay out points that are in some sense “close” to each other in network space. It’s worth remembering though, that if you’re using a tool to engage in a conversation with a dataset as you try to get it to tell its story to you, it may not matter that the visualisation looks a mess to anyone else (a bit like an involved conversation may not make sense if someone else suddenly tries to join it). (Presentation graphics, on the other hand, are usually designed to communicate something that the data is trying to say to another person in a very explicit way.)

- working with data is a tactile thing… you have to be prepared to get your hands dirty…
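The ‘join the dots’ trick mentioned for Gephi in the list above can be sketched in a few lines of Python: each point on the line becomes a node carrying its x and y values, and each consecutive pair of points becomes an edge. The field names (id, x, y, source, target) and the sample points are made up for illustration, and the sketch stops short of writing a file Gephi could load.

```python
def line_to_graph(points):
    """Turn an ordered list of (x, y) points into nodes and edges."""
    nodes = [{"id": i, "x": x, "y": y} for i, (x, y) in enumerate(points)]
    # One edge between each point and the next one along the line.
    edges = [{"source": i, "target": i + 1} for i in range(len(points) - 1)]
    return nodes, edges

# A made-up four point time series.
nodes, edges = line_to_graph([(0, 1.0), (1, 1.5), (2, 1.2), (3, 2.0)])
print(nodes)
print(edges)
```

Dropping the edges and keeping only the nodes gives the scatter plot case.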

Written by Tony Hirst

July 5, 2011 at 1:47 pm

Posted in Data, Presentation

OU Related Courses Network Visualisation Using Protovis and Open University Open Data

with 2 comments

This is something I’ve been meaning to do for ages, so spurred on by Martin Hawksey’s wonderful Google Gadgets port of my ad hoc Twitter network visualisation thing using Protovis (which Martin points out doesn’t work with IE9), I finally got round to it today: a wiring up of the OU modules Linked Data to the protovis app:

The data is pulled in from the OU Linked Data endpoint via Sparqlproxy (which provides a JSON output from the query that I can pull directly into the web page).

The query I’m using looks for courses related to the course of interest, and the courses related to those courses:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?name1 ?code2 ?name2 ?code3 ?name3 from <http://data.open.ac.uk/context/course> where {
?x a <http://purl.org/vocab/aiiso/schema#Module>.
?x <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#undergraduate>.
?x <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name1.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z.
?z <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name2.
?x <http://purl.org/vocab/aiiso/schema#code> 'T215'^^xsd:string.
?z <http://purl.org/vocab/aiiso/schema#code> ?code2.
?z <http://purl.org/goodrelations/v1#isSimilarTo> ?zz.
?zz <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name3.
?zz <http://purl.org/vocab/aiiso/schema#code> ?code3.
} LIMIT 100

(The endpoint is data.open.ac.uk/query; the explicit ‘T215’ course code identifier is parameterised in the URI that runs the query through Sparqlproxy.)
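As a rough illustration of parameterising the course code, the following Python sketch drops a checked course code into a cut-down version of the query and URL encodes it. It assumes the endpoint accepts the query in a standard query parameter; the Sparqlproxy URL format isn’t shown here, so the sketch doesn’t attempt it. The query_url helper and the shortened query template are hypothetical.

```python
import re
from urllib.parse import urlencode

ENDPOINT = "http://data.open.ac.uk/query"

# Cut-down version of the query: direct neighbours of one course only.
QUERY_TEMPLATE = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?name1 ?code2 ?name2 from <http://data.open.ac.uk/context/course> where {
?x <http://purl.org/vocab/aiiso/schema#code> '%s'^^xsd:string.
?x <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name1.
?x <http://purl.org/goodrelations/v1#isSimilarTo> ?z.
?z <http://courseware.rkbexplorer.com/ontologies/courseware#has-title> ?name2.
?z <http://purl.org/vocab/aiiso/schema#code> ?code2.
} LIMIT 100"""

def query_url(course_code):
    """Build a query URL for one course code (assumes a 'query' parameter)."""
    # Only letters and digits, so nothing can break out of the quoted literal.
    if not re.fullmatch(r"[A-Za-z0-9]+", course_code):
        raise ValueError("unexpected course code: %r" % course_code)
    return ENDPOINT + "?" + urlencode({"query": QUERY_TEMPLATE % course_code})

print(query_url("T215"))
```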

There’s all sorts of opportunities for colouring the nodes (eg to distinguish between the focal point course, its direct neighbours, and the neighbours of those neighbours) but that’s an exercise for another day. I should probably have a go at labelling them sensibly too…
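For what it’s worth, working out which colour band a node belongs to is straightforward once the query results are in hand. This Python sketch assumes each result row has been flattened to a dict with code2 and code3 keys (the variable names used in the query); the sample rows, the course codes other than T215, and the build_graph helper are made up for illustration.

```python
def build_graph(rows, focal_code):
    """Return (edges, level): level 0 = focal course, 1 = direct
    neighbour, 2 = neighbour of a neighbour."""
    edges = set()
    level = {focal_code: 0}
    for r in rows:
        edges.add((focal_code, r["code2"]))
        edges.add((r["code2"], r["code3"]))
    # Assign direct neighbours first so they are never demoted to level 2.
    for r in rows:
        level.setdefault(r["code2"], 1)
    for r in rows:
        level.setdefault(r["code3"], 2)
    return edges, level

# Made-up rows; only T215 comes from the query above.
rows = [
    {"code2": "X100", "code3": "T215"},
    {"code2": "X100", "code3": "X200"},
    {"code2": "X300", "code3": "X100"},
]
edges, level = build_graph(rows, "T215")
print(sorted(edges))
print(level)
```

A lookup from level to colour could then be handed to whatever is drawing the nodes.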

(The ability to drag nodes around within the graph has also been added (back) – Martin noticed the order of a couple of the Protovis commands influenced whether this worked or not. Being able to relayout the chart reminds me how rubbish the force layout algorithm Protovis uses actually is!)

Drawing on Martin’s work (i.e. directly pinching his Google Gadget definition!) I also created a widget/gadget (XML) that lets you view the network of courses around a course in your own page…

Here’s the config page:

Of course, this being a WordPress.com hosted blog, I don’t think I can directly embed the gadget to prove that it works…

Related:
- data.open.ac.uk Linked Data Now Exposing Module Information
- Getting Started With data.open.ac.uk Course Linked Data
- Open University Undergraduate Module Map

PS to do – a reimagining of this, probably using arbor.js, where we just do the direct neighbours of a course code, but allow nodes to be clickable so that additional nodes and edges can be added to the graph dynamically… It might also be interesting to support search by keywords, and display courses that match keywords (in one colour) as well as related courses (in another), along with edges showing which courses are related…?

Written by Tony Hirst

June 8, 2011 at 4:49 pm

Posted in OU2.0, Tinkering


Google Visualisation API Controls Support Interactive Data Queries Within a Web Page

with 3 comments

The only way I can keep up with updates to Google warez at the moment is to feed off tips, tricks and noticings shared by @mhawksey. Yesterday, Martin pointed out to me a couple of new controls offered by the Google visualization API – interactive dashboard controls (documentation), and an in-page chart editor.

What the interactive components let you do is download a dataset from a Google spreadsheet and then dynamically filter the data within the page.

So for example, over on the F1Datajunkie blog I’ve been posting links to spreadsheets containing timing data from recent Formula One races. What I can now do is run a query on one of the spreadsheets to pull down particular data elements into the web page, and then filter the results within the page using a dynamic control. An example should make that clear (unfortunately, I can’t embed a live demo in this hosted WordPress blog page :-( ).

I’ve posted a copy of the code used to generate that example as gist here: Google Dynamic Chart control, feeding off Google Spreadsheet/visualisation API query

Here’s the key code snippet – the ControlWrapper populates the control using the unique data elements found in a specified column (by label) within the downloaded dataset, and is then bound to a chart type which updates when the control is changed:

  var data = response.getDataTable();
  var namePicker = new google.visualization.ControlWrapper({
    'controlType': 'CategoryFilter',
    'containerId': 'filter_div',
    'options': {
      'filterColumnLabel': 'driver',
      'ui': {
        'labelStacking': 'vertical',
        'allowTyping': false,
        'allowMultiple': false    
      }
    }
  });

  var laptimeChart = new google.visualization.ChartWrapper({
    'chartType': 'LineChart',
    'containerId': 'chart_div',
    'options': {
      'width': 800,
      'height': 800
    }
  });
  
  var dashboard = new google.visualization.Dashboard(document.getElementById('dashboard_div')).
    bind(namePicker, laptimeChart).
    draw(data)
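Stripped of the widgets, what the bound control and chart are doing to the data is roughly the following, sketched here in Python rather than Javascript. This is only a model of the behaviour, not the Google library’s implementation; the lap time rows and the two helper functions are made up, with driver matching the filterColumnLabel used above.

```python
def category_values(rows, column):
    """The unique values in a column, i.e. what the picker would offer."""
    return sorted({r[column] for r in rows})

def filter_rows(rows, column, selected):
    """The rows passed on to the chart once a value has been picked."""
    return [r for r in rows if r[column] == selected]

# Made-up lap time rows.
data = [
    {"driver": "AAA", "lap": 1, "lapTime": 101.2},
    {"driver": "BBB", "lap": 1, "lapTime": 102.5},
    {"driver": "AAA", "lap": 2, "lapTime": 100.9},
]

choices = category_values(data, "driver")
chart_rows = filter_rows(data, "driver", choices[0])
print(choices, chart_rows)
```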

As well as drop down lists, there is a number range slider control which can be used to set minimum and maximum values of a numerical filter, and a string filter that lets you filter data within a column using a particular term (it doesn’t seem to support Boolean search operators though…) Read more about the controls here: Google visualisation API chart controls

Something else I hadn’t noticed before: sort events applied to tables can also be used to trigger the sorting of data within a chart, which means you can offer interactions akin to some of those found on Many Eyes.

Whilst looking through the Google APIs interactive playground, I also noticed a couple of other in-page data shaping tools that I hadn’t noticed before: group and join

Group, which lets you group rows in a table and present an aggregated view of them:

That is, if you have data loaded into a datatable in a web page, you can locally produce summary reports based on that data using the supported group operation?
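As a sketch of the sort of summary report meant here, this is a group-and-aggregate over rows held in memory, written in Python as a stand-in for the in-page Javascript operation. The group helper, the column names and the lap times are all made up for illustration.

```python
from collections import defaultdict

def group(rows, key, column, aggregate=sum):
    """Group rows on `key` and aggregate the values found in `column`."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r[key]].append(r[column])
    return {k: aggregate(v) for k, v in buckets.items()}

# Made-up lap times.
laps = [
    {"driver": "AAA", "lapTime": 101.2},
    {"driver": "BBB", "lapTime": 102.5},
    {"driver": "AAA", "lapTime": 100.9},
]

best = group(laps, "driver", "lapTime", aggregate=min)   # fastest lap each
counts = group(laps, "driver", "lapTime", aggregate=len) # laps completed
print(best, counts)
```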

There’s also a join operation that allows you to merge data from two datatables where there is a common column (or at least, common entries in a given column) between the two tables:

What the join command means is that you can merge data from separate queries onto one or more Google spreadsheets within the page.
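The merge itself amounts to something like the following, again sketched in Python with made-up tables rather than the Javascript API. This version only keeps rows that have a match in both tables; I haven’t checked which join variants the Google operation offers, so treat that choice as an assumption.

```python
def inner_join(left, right, key):
    """Merge rows from two tables wherever they share a value in `key`."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[key], []):   # unmatched left rows are dropped
            merged = dict(l)
            merged.update(r)
            joined.append(merged)
    return joined

# Made-up tables, as if pulled from two separate queries.
times = [{"driver": "AAA", "lapTime": 101.2}, {"driver": "CCC", "lapTime": 105.0}]
teams = [{"driver": "AAA", "team": "Team One"}, {"driver": "BBB", "team": "Team Two"}]

merged_rows = inner_join(times, teams, "driver")
print(merged_rows)
```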

With all these programming components in place, Google visualisation API support is now comprehensive enough to do all sorts of interactive visualisations within the page. (I’m not sure of any other libraries that offer quite so many tools for wrangling data in the page? The YUI datatable supports sorting and filtering, but I think that’s about it for data manipulation?)

I guess it also means that you can start to treat a web page as a database containing one or more datatables within it, along with tool support/function calls that allow you to work that database and display the results in a variety of visual ways?! And more than that, you can use interactive graphical components to construct dynamic queries onto the data in a visual way?!

PS here are a couple of other ways of using a Google spreadsheet as a database:
- Using Google Spreadsheets as a Database with the Google Visualisation API Query Language
- Using Google Spreadsheets Like a Database – The QUERY Formula

Written by Tony Hirst

June 6, 2011 at 2:33 pm

Quick Summary of Second and Third Sessions of “Visualisation and Presentation in Statistics”

with 2 comments

Kevin McConway (
http://statistics.open.ac.uk/People/k.j.mcconway
@kjm2 ): showing off some gratuitous use of numbers to illustrate Guardian stories #ouvpstats
Where do surveys reported in the press come from? ONS, market research companies. PR companies…… #ouvpstats
Get paid to do a (PR?) survey onepoll.com and youngpoll.com #ouvpstats
Not PR commissioned polls, err, maybe, err, hmmm….
http://72point.com/
#ouvpstats
Why are there numbers in the news? PR, Entertainment, eyecandy. Special status of “number facts” #ouvpn
Mary Poovey “A History of the Modern Fact”
http://www.press.uchicago.edu/ucp/books/book/chicago/H/bo3614698.html
#ouvpn
Need to distinguish between facts, analysis and narrative… #ouvpstats
What’s wrong with PR stats? ’tis the road to cynicism, or looking good rather than communicating well #ouvpstats
So what can we do about it? Statisticians need to engage with the public and work with journalists #ouvpstats
Statisticians’ view of journalists: innumerate, distort and oversimplify, don’t understand quantitative reasoning, won’t listen #ouvpstats
Journalists’ view of statisticians: illiterate pedantic, boring, focus on ifs and buts, won’t listen #ouvpstats
Journalists work to tight timescales, have a view of “newsworthiness”, are good storytellers #ouvpstats

Martin Bland ( https://hsciweb.york.ac.uk/research/public/Staff.aspx?ID=129 )
From papers during one issue from 1972 and 2010 Lancet and BMJ, mean population size has gone up 2-3 orders of magnitude (tens to thousands+)
Description of stats: 1972: very cursory; 2010: far more comprehensive statistical method reported. Shift from significance testing to estimation
Move towards evidence-based medicine starting around 1990s (bound to include statistics)
“Why do we need some large, simple randomized trials?” Yusuf et al. 1984
Move to confidence intervals not p-values Gardner & Altman
http://www.bmj.com/content/292/6522/746.abstract

Journals started to introduce systematic requires and statistical referees
Consort guidelines for stats in randomised medical trials
http://www.consort-statement.org/

Statisticians should point out where wrong conclusions have been drawn as a result of stats mistakes…

Rosemary Bailey
http://www.maths.qmul.ac.uk/~rab/

Problems with box and whisker plots (referred to as box and aerial/antenna plot?), which are now popular in medicine, biology, engineering (not least because folk don’t know what the whisker means). Antenna doesn’t take into account variability across conditions. [My naive understanding of these diagrams is that they are trying to say something different? But my knowledge is so hazy I can't argue for what I do think they describe!]
Hasse diagrams – cords, dyes and constants(?) [I'm a bit lost at this point...]

Michel van de Velden
http://www.erim.eur.nl/ERIM/People/Person_Details?p_aff_id=799

Perceptual maps – multivariate methods for plotting high-dimensional data
Exploit natural spatial recognition/visual abilities
Examples: Tufte 1983, Cleveland and McGill 1987, Wainer 2005
Caption should convey enough info to allow reader in possession of data (and appropriate tools) to recreate the perceptual map
Shape parameter (aspect ratio) – ratio of x scale to y scale. If it can be 1, it should be… (changes aspect ratio of photo of Kate Middleton to make the point about distortion if not 1 when it could/should be…)
If perception of map relies in part on angle of point/line, need to know where the origin is.
Excel charts – hard to explicitly set an exact aspect ratio (same with many tools?)
Perceptual maps may require guidance as to how to read a map – e.g. icons
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1572196

[Me]

Jill Leyland, Vice President, Royal Statistical Society
Lots of folk think UK official statistics are not free of political interference, nor do they necessarily trust(?) them; UK scores very poorly compared to rest of Europe.
National Stats have high integrity and are free of political interference. Perception of political interference is one reason for the low degree of trust. UKSA (UK Statistics Authority) scrutinises official statistics: “promoting and safeguarding the production and publication of statistics that serve the public good”
No political interference, but: many key stats produced in depts, UKSA role not fully understood (scrutineer as well as publisher); pre-release access – Ministers can see statistics 24 hrs before they are released (up to 5 days in Scotland and Wales), and suspicion that Ministers may use this time for mischief…
Role of media – UK media are interested in statistics, but “stats are wrong” stories get more coverage than “stats are right”, and journalists often don’t understand statistical issues (as well as tight deadline, no specialist knowledge). BUT official statisticians could do better; ONS website a joke… (though new one due to launch at end of August). Far too little interaction with stats users outside government.
What can be done? Continuing efforts to improve presentation; need to differentiate between independent national statistics and those produced by departments. Better education for journalists [and statisticians eg ito communications?]; reduction/elimination of pre-release access.

Written by Tony Hirst

May 18, 2011 at 4:26 pm

Posted in Anything you want

