Archive for the ‘digital storytelling’ Category
Over on F1DataJunkie, 2011 Season Review Doodles…
Things have been a little quiet, post wise here, of late, in part because of the holiday season… but I have been posting notes on a couple of charts in progress over on the F1DataJunkie blog. Here are links to the posts in chronological order – they capture the evolution of the chart design(s) to date:
- F1 2011 Progress Throughout the Year
- F1 2011 Review – Another Look at Fastest Laptime Evolution
- F1 2011 Review – Qualifying Progress
- F1 2011 Review – Grid/Final Classification Deltas
- F1 2011 Review – Grid vs Final Classification, Redux
- F1 2011 Review – Driver and Race Position Charts
You can find a copy of the data I used to create the charts here: F1 2011 Year in Review spreadsheet.
I used R to generate the charts (scripts are provided and/or linked to from the posts, or included in the comments – I’ll tidy them and pop them into a proper Github repository if/when I get a chance), loading the data into RStudio using this sort of call:
require(RCurl)
# Run a query against a Google Spreadsheet and read the CSV response into a data frame
gsqAPI = function(key,query,gid=0){
  url = paste( sep="",'http://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=',
               curlEscape(query), '&key=', key, '&gid=', curlEscape(gid) )
  return( read.csv( url, na.strings = "null" ) )
}
key='0AmbQbL4Lrd61dEd0S1FqN2tDbTlnX0o4STFkNkc0NGc'
sheet=4
qualiResults2011=gsqAPI(key,'select *',sheet)
If any other folk out there are interested in using R to wrangle with F1 data, either from 2011 or looking forward to 2012, let me know and maybe we could get a script collection going on Github:-)
Getting My Eye In Around F1 Quali Data – Parallel Coordinate Plots, Sort of…
Looking over the sector times from the qualifying session for tomorrow’s Hungarian Grand Prix, I noticed that Vettel was only fastest in one of the sectors.
Whilst looking for an easy way of shaping an R data frame so that I could plot categorical values sector1, sector2, sector3 on the x-axis, and then a line for each driver showing their time in the sector on the y-axis (I still haven’t worked out how to do that? Any hints? Add them to the comments please…;-), I came across a variant of the parallel coordinate plot hidden away in the lattice package:
What this plot does is, for each row (i.e. each driver), take values from separate columns (i.e. times from each sector), normalise them, and then plot lines between the normalised values, one “axis” per column; each row defines a separate category.
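To make the normalisation step concrete, here is a minimal sketch in base R. The data frame a and its values are made up for illustration (they are not real timing data), the helper minmax is a hypothetical name, and I am assuming the plot rescales each column to its own min-max range, which I believe is the lattice default.

```r
# Hypothetical sector times (seconds) for four drivers; not real timing data
a <- data.frame(
  driver  = c("VET", "HAM", "BUT", "ALO"),
  sector1 = c(28.6, 28.7, 28.9, 28.8),
  sector2 = c(29.1, 29.0, 29.3, 29.2),
  sector3 = c(22.4, 22.6, 22.5, 22.7)
)

# Min-max normalise a column so its fastest time maps to 0 and its slowest to 1
# (hypothetical helper; assumes the column is not constant)
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

sectors <- c("sector1", "sector2", "sector3")
normed <- as.data.frame(lapply(a[sectors], minmax))

# One line per driver (row), one "axis" per sector (column)
matplot(t(normed), type = "l", xaxt = "n", xlab = "", ylab = "normalised time")
axis(1, at = seq_along(sectors), labels = sectors)
```

Because each column is rescaled separately, a gap of hundredths in one sector can look as large as a gap of tenths in another.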
The normalisation obviously hides the magnitude of the differences between the time deltas in each sector (the min-max range might be hundredths in one sector, tenths in another), but this plot does show us that there are different groupings of cars – there is clear(?!) white space in the diagram:
Whilst the parallel co-ordinate plot helps identify groupings of cars, and shows where they may have similar performance, it isn’t so good at helping us get an idea of which sector had most impact on the final lap time. For this, I think we need to have a single axis in seconds showing the delta from the fastest time in the sector. That is, we should have a parallel plot where the parallel axes have the same scale, but in terms of sector time, a floating origin (so e.g. the origin for one sector might be 28.6s and for another, 22.4s). For convenience, I’d also like to see the deltas shown on the y-axis, and the categorical ranges arranged on the x-axis (in contrast to the diagrams above, where the different ranges appear along the y-axis).
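One way of getting at that sort of view is sketched below in base R: subtract the fastest time in each sector from every time in that sector, then draw one line per driver with the sectors along the x-axis and the deltas (in seconds) on the y-axis. The data frame a, its values and the column names are assumptions for illustration, not the actual timing data.

```r
# Hypothetical sector times (seconds) for four drivers; not real timing data
a <- data.frame(
  driver  = c("VET", "HAM", "BUT", "ALO"),
  sector1 = c(28.6, 28.7, 28.9, 28.8),
  sector2 = c(29.1, 29.0, 29.3, 29.2),
  sector3 = c(22.4, 22.6, 22.5, 22.7)
)
sectors <- c("sector1", "sector2", "sector3")

# Delta to the fastest time in each sector: a common scale in seconds,
# with a floating origin (the fastest time) for each sector
deltas <- sapply(a[sectors], function(x) x - min(x))
rownames(deltas) <- a$driver

# Sectors as categories on the x-axis, one line per driver
cols <- seq_len(nrow(a))
matplot(t(deltas), type = "b", pch = 1, lty = 1, col = cols,
        xaxt = "n", xlab = "", ylab = "delta to fastest sector time (s)")
axis(1, at = seq_along(sectors), labels = sectors)
legend("topleft", legend = a$driver, col = cols, lty = 1)
```

Transposing the matrix with t() is what puts the sectors on the x-axis: matplot draws one line per column, so after transposing each driver becomes a column.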
PS I also wonder to what extent we can identify signatures for the different teams? Eg the fifth and sixth slowest cars in sector 1 have the same signature across all three sectors and come from the same team; and the third and fourth slowest cars in sector 2 have a very similar signature (and again, represent the same team).
Where else might we look for signatures? In the speed traps maybe? Here’s what the parallel plot for the speed traps looks like:
(In this case, max is better = faster speed.)
To combine the views (timings and speed), we might use a formulation of the flavour:
# parallel() was renamed parallelplot() in later releases of lattice
parallelplot(~data.frame(a$sector1,a$sector2,a$sector3, -a$inter1,-a$inter2,-a$finish,-a$trap))
This is a bit too cluttered to pull much out of though? I wonder if changing the order of parallel axes might help, e.g. by trying to come up with an order that minimises the number of crossed lines?
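As a rough sketch of how such an ordering might be found: two lines cross between a pair of adjacent axes when the two rows sit in opposite orders on those axes, so we can count crossings for every pair of columns and then brute-force the axis order with the fewest crossings in total. The function names crossings, perms and best_axis_order are hypothetical, the data is made up, and the exhaustive search is only sensible for a handful of axes (seven axes means 5040 orderings).

```r
# Count line crossings between two adjacent axes: a pair of rows crosses
# when their order on one axis is the reverse of their order on the other
crossings <- function(u, v) {
  du <- outer(u, u, "-")
  dv <- outer(v, v, "-")
  sum((du * dv)[upper.tri(du)] < 0)
}

# All orderings of a vector (grows factorially, so keep the input short)
perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (p in perms(x[-i])) out[[length(out) + 1]] <- c(x[i], p)
  }
  out
}

# Return the column names in the order that minimises total crossings
best_axis_order <- function(df) {
  n <- ncol(df)
  cross <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    cross[i, j] <- crossings(df[[i]], df[[j]])
  }
  # Cost of an ordering: sum of crossings between each adjacent pair of axes
  cost <- function(ord) sum(cross[cbind(ord[-n], ord[-1])])
  orders <- perms(seq_len(n))
  costs <- sapply(orders, cost)
  names(df)[orders[[which.min(costs)]]]
}

# Made-up values, one row per car
a <- data.frame(
  sector1 = c(1, 2, 3, 4),
  sector2 = c(4, 3, 2, 1),
  sector3 = c(1, 2, 4, 3)
)
best_axis_order(a)
```

The normalisation used by the parallel plot preserves the order of values within each column, so crossings counted on the raw values match those in the chart, provided "bigger is better" columns such as speeds are negated first, as in the call above.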
And if we colour lines by team, can we see any characteristics?
Using a dashed, rather than solid, line makes the chart a little easier to read (more white space). Using a thicker line also helps bring out the colours.
# parallel() was renamed parallelplot() in later releases of lattice
parallelplot(~data.frame(a$sector1,-a$inter1,-a$inter2,a$sector2,a$sector3, -a$finish,-a$trap),col=a$team,lty=2,lwd=2)
Here’s another ordering of the axes:
Here are the sector times ordered by team (min is better):
Here are the speeds by team (max is better):
Again, we can reorder this to try to make it easier(?!) to pull out team signatures:
(I wonder – would it make sense to try to order these based on similarity eg derived from a circuit guide?)
Hmmm… I need to ponder this…
Playing With R/ggplot2 Online (err, I think..?!)
Trying to get my head round what to talk about in another couple of presentations – an online viz tools presentation for the JISC activity data synthesis project tomorrow, and an OU workshop around the iChart eSTeEM project – I rediscovered an app that I’d completely forgotten about: an online R server that supports the plotting of charts using the ggplot library (err, I think?!): http://www.yeroon.net/ggplot2/
Example of how to use http://www.yeroon.net/ggplot2/
By the by, I have started trying to get my head round R using RStudio, but the online ggplot2 environment masks the stats commands and just focusses on helping you create quick charts. I randomly uploaded one of my F1 timing data files from the British Grand Prix, had a random click around, and in 8(?) clicks – from uploading the file, to rendering the chart – I’d managed to create this:
What it shows is a scatterplot for each car showing, on each leader lap, how far (in seconds) that car is behind the leader. When the plotted points drop from 100 or so seconds behind to just a few seconds behind, that car has been lapped.
What this chart shows (which I stumbled across just by playing with the environment) is a birds-eye view over the whole of the race, from each driver’s point of view. One thing I don’t make much use of is the colour dimension – or the size of each plotted point – but if I tweak the input file to include the number of laps a car is behind the leader, their race position, the number of pitstops they’ve had, or their current tyre selection, I could easily view a couple more of these dimensions.
Where there’s a jump in the plotted points for a lap or two, if the step/break goes above the trend line (the gap to the leader increases by 20s or so), the car has pitted before the leader. If the jump goes below the trend line (the gap to the leader has decreased), the leader has pitted before the car in question.
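For anyone wanting to recreate this sort of chart locally, here is a minimal sketch. It uses lattice rather than ggplot2 (only because lattice ships with R), the data frame d and its column names are made up, and the 60 second threshold for spotting a car being lapped is an assumption, not something taken from the timing data.

```r
library(lattice)

# Hypothetical gap-to-leader data: seconds behind the leader on each leader lap
d <- data.frame(
  car       = rep(c("car A", "car B"), each = 6),
  leaderLap = rep(1:6, times = 2),
  gap       = c(2, 4, 6, 8, 10, 12,       # car A drifts back steadily
                40, 60, 80, 100, 3, 6)    # car B's gap collapses on lap 5
)

# Flag a car as lapped when its gap drops by more than 60s from one leader
# lap to the next (assumed threshold); the first lap of each car is never flagged
d$lapped <- ave(d$gap, d$car, FUN = function(g) c(0, diff(g)) < -60) == 1

# One scatterplot panel per car, with the lapped points picked out by colour
print(xyplot(gap ~ leaderLap | car, data = d, groups = lapped, auto.key = TRUE))
```

The same flag column could be mapped to colour or point size in the online ggplot2 environment by including it in the uploaded file.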
But that’s not really the point; what is the point is that here is a solution (and I think mirroring options are a possibility) for hosting within an institution an interactive chart generator. I also wonder to what extent it would be possible to extend the environment to detect single sign on credentials and allow a student to access a set of files related to a particular course, for example? Alternatively, it looks as if there is support for loading files in from Google Docs, so would it be possible to use this environment as a way of providing a graphing environment for data files stored (and maybe shared via a course) within a student’s Google Apps account?
On the Public Understanding of – and Public Engagement With – Statistics: Reflections on the OU Statistics Group Conference on “Visualisation and Presentation in Statistics”
Last week I attended the OU Statistics conference on Visualisation and Presentation in Statistics (VIPS) (notes: here and here).
One of the things that struck me from conversations and some of the presentations was that statistics – and in particular public engagement around statistics – appears to be lagging science efforts in this area.
When I first moved to the OU as a lecturer a dozen or so years ago, I got involved with various activities that, at the time, were classed as “public understanding of science and technology”, though at the time the whole sci-comm area was in a state of flux and ideas were moving towards a focus on public engagement with science. As a member of the NESTA Crucible one year, I saw how there was also concern around engagement with science and technology policy, and how it could be moved “upstream”, to a point where dialogue with various publics could actually contribute to, and even influence, policy development.
(The NESTA Crucible experience significantly influenced my world view and was one of the most rewarding schemes I have ever been involved with…)
Since then, it seems to me that the school science curriculum has witnessed a similar change, with a move away from a focus purely on the basic science (and perhaps industrial applications?) to one that includes a consideration of socio-technical considerations (one might say, policy implications…)
At the VIPS event, one of the phrases that jumped out at me in at least one presentation (aside from repeated mentions of RSS…;-) concerned difficulties in promoting the public understanding of statistics. Ally this with the fact that the school maths curriculum seems not to have evolved so much (“averages”, means and histograms still seem to be the focus?!) and I wonder: is statistics today where science was a decade or so ago?
The recent rhetoric around – and actual release of – “open public data” suggests that, as citizens and journalists, there is an increasing number of opportunities to hold governments and public bodies to account using evidentiary data and maybe also engage in data-driven (or at least data informed) policy formulation. With so much data out there, and so many possible ways of combining and interrogating it – so many possible different questions to ask and places to ask them – there are increasingly opportunities for informed amateurs to make a very real contribution (in the same way that amateur astronomers can make a real contribution to the recording and analysis of astronomical observations).
The growing instrumentation of our world also means that there are increasing amounts of data about ourselves that we can have access to in the form of personal data dashboards (for example, think of various social media/reputation tools, but also expect to see various tools appearing that allow you to mine your health/fitness, financial or shopping transaction data). These dashboards will be visually rich, and designed to give at-a-glance overviews of the state of this or that quantity or metric. But to get the most from them, we will need to include more complex and powerful visualisation types, and find a way of helping people learn how to “see” them, “read” them and interpret them.
So to what extent do we need to engage with the “public understanding of statistics” as compared to the development of skills in the public appreciation of statistics and improvements in the way the public can engage with each other and with policy makers in discussions where statistics play a role? (Public engagement in statistics? Public engagement with statistics?)
Over the last few weeks, I’ve started trying to immerse myself in the world of statistical graphics, on the basis that our perceptual apparatus is pretty good at pattern detection and can help us get to grips with visually meaningful properties of distributions of data without us necessarily having to understand much in the way of formal statistics. (Of course, the visual apparatus can also be conned by misleading graphs and charts, which is where some semblance of critical understanding and, dare I say it, statistical literacy, comes in.)
My intuition is that it will be easier to develop a visual literacy in the reading and interpretation of charts (i.e. building on “folk statistical graphics/visual statistics”) than a widespread mathematical understanding of statistics. (I suspect that for most people, pie charts – and more recently ‘donut’ charts – as well as line graphs and simple bar charts are about the limit of what they are comfortable with, along with thematic maps (in particular, choropleth maps) and (in recent years again?) proportional symbol maps. I also know from asking even well informed audiences that awareness of more recently developed techniques, such as treemaps, is not widespread.)
At the moment, the infographics designers appear to be leading the charge into public consciousness of data-driven graphics, but as I’m finding out, the stats community has a wealth of visual techniques already to hand that are maybe “sounder” in terms of deriving visual representations that reflect statistical properties and concerns than the tricks the infographics crowd are using. (This is all just my anecdotal opinion, and not based in any formal research!)
Many infographics build on a common visual grammar (in the West, line charts up to the right increase over time; for area based charts, the bigger the area the more of something is being represented). But many infographics are also limited by the chart types we are all familiar with (line charts, bar charts, coloured maps…) Maybe the place to start is the stats community finding ways of introducing new-to-the-majority statistical graphs into the mainstream media along with a strong narrative to explain what is going on in those charts (and not necessarily so much discussion about the actual maths and stats…)?
My Presentation at OU Statistics Conference – Visualisation Tools for the Rest of Us
Slides from my presentation to the OU Visualisation and Presentation in Statistics conference earlier today… will update this post with notes and links as and when I get round to it! In the meantime, you’ll have to use Google… (though other search engines are available). (Slides via Slideshare)
Should Academic Journal Papers Have Video Trailers?
I don’t read academic journal papers very much any more, partly because folk rarely link to them, but today I read a paper (“Narrative Visualization: Telling Stories with Data”, Edward Segel, Jeffrey Heer, IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis), 2010) in response to this video trailer that brought it to my attention (Journalism in the Age of Data, Ch. 3: Telling “Data Stories”):
I encourage you to watch the video – not necessarily for what it’s about, but for the way that a journal article is used to hold bits of the video together. Note that the video is not just about the paper, but it’s not hard to see how a video could be made that was just about the paper…
So I wonder: should we be making voiced over “papercasts” of academic papers to provide a quick summary of what they contain, and maybe also enriching them with photos and footage relating to what the content of the paper is about? (I know this might not make sense for the subject matter of every paper, but if a journal paper is about a particular online tool, for example, here would be an opportunity to show a few seconds of the tool in use, and contextualise it/demonstrate it a little more interestingly than a single, simple screenshot can convey?)
UPDATE: @der_no tweets: “Always enjoyed technical papers preview @ #SIGGRAPH (esp considering many of actual papers are beyond me)” See an example conference papers trailer here – SIGGRAPH 2010 : Technical Papers Trailer:
If the conference matter is appropriate (robotics related conferences come to my mind, for example), couldn’t this sort of approach provide an additional legacy resource that can continue to give an event life after the fact?
PS I believe that several of the OpenLearn folk are also looking at ways of pulling together video and audio in the way they package their material, for example looking at the use of Xtranormal videos, or Slideshare slidecasts. (Note that it’s easy (or used to be!) to publish Xtranormal clips into Youtube, and Youtube clips can also be embedded in Slideshare presentations, so all manner of fusions of content become possible!)
PPS Very, very loosely related to the above is another thread I want to link in to, here. That is, the extent to which academics might take up various sorts of (“new”) media training to explore different ways of engaging with (and maybe helping reinvent?) scientific communication. For example, a recent initiative in the OU has seen more than a few brave academic volunteers engaging in podcast training as part of Martin’s Podstars project (I couldn’t find a better link?!).
Running parallel to this, the OBU’s media training team have been helping other academics put together short showreels that have since been published on the OU podcast site – OU Experts:
In terms of finding training materials that are already out there, it struck me that the BBC College of Journalism might be a good start, particularly in the skills area?
Multi-Dimensional and Multiple-Perspective Storytelling
In Reversible, Reverse History and Side-by-Side Storytelling I linked to an example of side-by-side video stories where two videos are played next to each other. @cogdog then shared a link in the comments to the HBO video cube which adds a whole other dimension!
(I seem to remember the BBC experimenting with similar forms in the past – e.g. co-broadcasting different perspectives of the same story on different channels, or even different media, such as radio and television. Anyone got any links to write-ups of those?)
Anyway, today, via GigaOm, I came across something related - a virtual choir:
For some reason, that video also reminded me of this?!
Anyway, it struck me that backchannel commentary is another approach that demonstrates this multiple perspective idea. Which is why I think that the Twitter video captions idea has a long way to run yet. (By the by, Martin Hawksey has done a great job posting a version of Gordon Brown’s Building Britain’s Digital Future announcement with twitter subtitles. If anyone wants to volunteer or help fund us to develop this app a little further, we have the ideas but just need to secure the developer time…:-)