The Chart Equivalent of Comic Sans..?

Whilst looking at the apparently conflicting results from a couple of recent polls by YouGov on press regulation (reviewed in a piece by me over on OpenLearn: Two can play at that game: When polls collide in support of a package on the OU/BBC co-produced Radio 4 programme, More Or Less), my eye was also drawn to the different ways in which the survey results were presented graphically.

The polls were commissioned by The Sun newspaper on the one hand, and the Media Standards Trust/Hacked Off on the other. If you look at the poll data (The Sun/YouGov [PDF] and Media Standards Trust/YouGov [PDF] respectively), you’ll see that it’s reported in a standard format. (I couldn’t find actual data releases, but the survey reports look as if they are generated in a templated way, so getting the core of a generic scraper together for them shouldn’t be too difficult…) But how was that represented to readers (text based headlines and commentary aside)?

Here are a couple of grabs from the Sun’s story (State-run watchdog ‘will gag free press’):

Pie-charts in 3D, with a tilt… gorgeous… erm, not… And the colour choice for the bar chart inner-column text is a bit low on contrast compared to the background, isn’t it?

It looks a bit like the writer took a photo of the print edition of the story on their phone, uploaded it and popped it into the story, doesn’t it?

I guess credit should be given for keeping the risk responses separate in the second image, when they could have just gone for the headline figures as pulled out in the YouGov report:

So what I’m wondering now is the extent to which a chart’s “theme” or style reflects the authority or formal weight we might ascribe to it, in much the same way as different fonts carry different associations? Anyone remember the slating that CERN got for using Comic Sans in their Higgs-Boson discovery announcement (eg here, here or here)?

Things could hardly have been more critical if they had used CrappyGraphs or an XKCD style chart generator (as for example described in Style your R charts like the Economist, Tableau … or XKCD ; or alternatively, XKCD-style for matplotlib).

XKCD - Science It works [XKCD]

Oh, hang on a minute, it almost looks like they did!

Anyway – back to the polls. The Media Standards Trust reported on their poll using charts that had a more formal look about them:

The chart annotations are also rather clearer to read.

So what, if anything, do we learn from this? That maybe you need to think about chart style, in the same way you might consider your font selection. From the R charts like the Economist, Tableau … or XKCD post, we also see that some of the different applications we might use to generate charts have their own very distinctive, and recognisable, style (as do many Javascript charting libraries). A question therefore arises about the extent to which you should try to come up with your own distinctive (but still clear) style that fits the tone of your communication, as well as its context and in sympathy with any necessary branding or house styling.

PS with respect to the Sun’s copyright/syndication notice, and my use of the images above:

I haven’t approached the copyright holders seeking permission to reproduce the charts here, but I would argue that this piece is just working up to being research into the way numerical data is reported, as well as hinting at criticism and review. So there…

PPS As far as bad charts go, misrepresentation and underhand attempts at persuasion through graphic style are also possible, as SimplyStatistics describes: “The statisticians at Fox News use classic and novel graphical techniques to lead with data”. See also: OpenLearn – Cheating with Charts.

Narrative Charts Tell the Tale…

A couple of days ago, I got a message from @fantasticlife asking if I’d done any tinkerings around what turned out to be “narrative charts”. I kept misapprehending what he was after (something to do with continuity?!;-), so here’s a summary of various graphical devices for looking at narrative texts that we passed back and forth, along with some we didn’t..

A Sankey diagram typically uses variable thickness lines to show flow between different elements in a system. (For this reason it’s often used to show energy flows through a system, though it can also be used to good effect to show money flows.) The chart Michael linked to comes from xkcd:

xkcd narrative chart

In this chart, we have time along the horizontal x-axis. The y-axis is ambiguous (some sort of nominal ordering?) and the line thickness appears to represent army size.

To a certain extent, this diagram is reminiscent of Minard’s famous chart…

(See also What Makes a Minard? for some contemporary Minard diagrams. Is code available, I wonder?)

However, in the case of Minard’s chart (which I personally don’t like at all!), the x and y co-ordinates represent map co-ordinates – the thick lines aren’t thick lines in a line chart (which a glanced “up and to the right” view might make you assume), they’re flow lines across a map.

I got distracted for a while by the Sankey aspect, and dug around my own bits of code. For example, Generating Sankey Diagrams from rCharts, an rCharts wrapper for the d3.js Sankey diagram. Michael was particularly interested in being able to group lines vertically (though I wasn’t sure what the y-axis would actually correspond to: some loose function of “location”, maybe as a categorical variable? Time was definitely to be on the horizontal x-axis); a posting on Stack Overflow (d3 sankey charts – manually position node along x axis) seemed likely to be able to help with that.

I then started going off on one…

Would a variant of nltk style lexical dispersion plots help, using characters rather than word categories? That would show when a character was in scene, but not much else?

lexical dispersion
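As a rough illustration of the dispersion idea applied to characters rather than word categories, here is a minimal sketch in plain Python (it is not nltk's own dispersion plot). The function names, the naive regex tokeniser and the toy text are all assumptions made up for the example.

```python
import re

def character_offsets(text, characters):
    """Map each character name to the word offsets at which it is mentioned.

    Assumes single-word names and a naive tokeniser; matching is case-insensitive.
    """
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    wanted = {name.lower(): name for name in characters}
    offsets = {name: [] for name in characters}
    for position, token in enumerate(tokens):
        if token in wanted:
            offsets[wanted[token]].append(position)
    return offsets, len(tokens)

def text_dispersion(offsets, n_tokens, width=40):
    """Render one row per character, with a '|' wherever they are mentioned."""
    rows = []
    for name, positions in offsets.items():
        cells = ["."] * width
        for position in positions:
            # Scale the word offset onto a fixed-width row.
            cells[position * width // n_tokens] = "|"
        rows.append(f"{name:>10} {''.join(cells)}")
    return "\n".join(rows)

# Made-up toy text, purely for illustration.
toy_text = (
    "Macbeth meets Banquo. Duncan arrives. "
    "Macbeth plots; Duncan sleeps. Banquo suspects Macbeth."
)
offsets, n_tokens = character_offsets(toy_text, ["Macbeth", "Banquo", "Duncan"])
print(text_dispersion(offsets, n_tokens))
```

As suspected, all this really shows is when a character is mentioned; it says nothing about where they are or who they are with.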

How about sentence drawing, in which we show “turns” taken by different speakers?

sentence drawing

This shows something, but again, not relevant…

Nor are Kurt Vonnegut’s shapes-of-stories diagrams that plot some sort of emotional state on y and time on x:

Hmmm… Michael wanted to be able to look at scenes on x and presumably some function of location on y. Hmm… why? And how might we actually order those axes? Scenes occur in order in a film or play, but scene is a ranked, ordinal value. That said, scenes also have duration in terms of screentime, which may or may not be the same as the “interval” that the scene portrays in terms of the world it represents (this must have a name? eg a 20 second screen time scene shows a plane flying and this represents x hours in the story). The scene may also have a ‘calendar time’ associated with it in the story – so where you have a flashback scene this corresponds to a previous calendar time in the represented world. Did Michael want any of these dimensions capturing?

Related to shapes of stories, here’s how someone analysed some 100,000 plots: Examining the arc of 100,000 stories: a tidy analysis.

And then there’s location… how should these be represented? Locations are a distance apart and, perhaps more importantly from a continuity point of view, a travel time apart; as well as maybe a timezone difference apart. Did that need capturing in any way? Ordering axes for this could be quite hard if we wanted close things in space (distance? travel time?) to be close together on a single axis (A is ten minutes from B and C, B is ten minutes from C: how do you show that intransitive relation on a single dimension?) [Maybe relevant? Storygraph: Extracting patterns from spatio-temporal data, A Shrestha et al., Advances in Visual Computing.] Hmm… If we can capture distance between locations, and some sensible notion of time relating to scenes, could we maybe use line thickness to show that a person has lots of time to move between one (time, location) and another, as compared to scenetime? Do filmwriters have tools to support this? Do the police…?! Is the Mythology Engine relevant?
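On the single-axis ordering problem, the A, B, C example can be checked with a small brute-force sketch. The function name and the way distances are stored (keyed by a frozenset of the two location names) are my own assumptions for the example; it only looks for exact layouts and makes no attempt at a best-effort approximation.

```python
from itertools import product

def place_on_line(names, distance):
    """Try to place locations on one axis so every pairwise distance is kept.

    `distance` maps frozenset({a, b}) to the separation between a and b.
    Returns a dict of positions, or None if no exact 1D layout exists.
    """
    anchor, *others = names
    # With the anchor fixed at 0, every other location must sit at +d or -d
    # from it, so trying each combination of signs covers all possible layouts.
    for signs in product((1, -1), repeat=len(others)):
        positions = {anchor: 0}
        for name, sign in zip(others, signs):
            positions[name] = sign * distance[frozenset((anchor, name))]
        if all(
            abs(positions[a] - positions[b]) == distance[frozenset((a, b))]
            for i, a in enumerate(names)
            for b in names[i + 1:]
        ):
            return positions
    return None

# A, B and C are all ten minutes from each other: no layout on one axis works.
triangle = {frozenset("AB"): 10, frozenset("AC"): 10, frozenset("BC"): 10}
print(place_on_line(["A", "B", "C"], triangle))

# If A to C takes twenty minutes instead, B can sit between them.
in_a_row = {frozenset("AB"): 10, frozenset("AC"): 20, frozenset("BC"): 10}
print(place_on_line(["A", "B", "C"], in_a_row))
```

So for three mutually equidistant locations some distortion is unavoidable, and any ordering of a location axis is a compromise.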

How about thinking about it as a graph? I’ve used Gephi before as a foil for getting me to think about ordered series as connected events in a graph – for example, Visualising F1 Timing Sheet Data. If we encode scene number as the x-coordinate and location number as the y-coordinate, with each graph line being the connected series of scenes a particular individual is in, then we can simply use a line chart to connect “individual lines” to different scene and location numbers. We’d also have a couple of extra dimensions to play with – node size and node colour, at each location. We’d also have the opportunity to play with edge (that is, line) colour and edge thickness?
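A minimal sketch of that encoding, assuming each scene is recorded as a location plus a list of the characters in it (the field names, the function name and the toy scenes are all made up for illustration). Location numbers are simply handed out in order of first appearance, which is an arbitrary choice.

```python
def character_lines(scenes):
    """Turn an ordered list of scenes into one polyline per character.

    x is the scene number (list order); y is a location number assigned
    in order of first appearance.
    """
    location_number = {}
    lines = {}
    for scene_number, scene in enumerate(scenes, start=1):
        y = location_number.setdefault(scene["location"], len(location_number) + 1)
        for character in scene["characters"]:
            lines.setdefault(character, []).append((scene_number, y))
    return lines, location_number

# Made-up toy scenes.
scenes = [
    {"location": "castle", "characters": ["Ann", "Bob"]},
    {"location": "forest", "characters": ["Bob"]},
    {"location": "castle", "characters": ["Ann", "Cat"]},
    {"location": "forest", "characters": ["Ann", "Bob", "Cat"]},
]
lines, location_number = character_lines(scenes)
for character, points in lines.items():
    print(character, points)
```

Each list of (scene, location) points could then be handed to any line chart routine, one line per character, leaving node size, node colour, edge colour and edge thickness free for other dimensions.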

Maybe I need to try to do some demos? But no time for that right now…

How about trying to find some? Here are some discovered via @jamesjefferies:

Here’s a view of connected (by travel between) locations in Game of Thrones:

game of thrones connected places

There’s also an animation of events in Game of Thrones, but I can’t quite figure out how to read it?!

game of thrones events

Let’s go back to the sort of thing Michael was after – narrative charts..

@imhelenj found a related if cluttered interactive describing the evolution of web tech:

web history narrative chart

Then Michael shared a link to Comic Book Narrative Charts, a project for automatically generating xkcd style narrative charts:

xkcd narrative chart d3js

Hovering over these charts, I noticed they were interactive d3.js charts. A quick View Source and the code for generating the chart dynamically from a characters file and a narrative file appeared to be there. Which I think is what Michael wanted all along…!

(By the by, the post also describes how the developers started thinking about fixing the vertical y-coordinate values. Here’s another example of someone thinking aloud around producing a narrative chart for the Holy Week story.)

Ho hum, an interesting set of detours nonetheless – and it got me thinking about the time-space complexity of a scene based tale that could keep me confused for weeks! :-)

PS this is quite interesting – visualising a process, via Tactical Tech Drawing By Numbers project:

visualise process

PPS some more bits: @r4isstatic points to Some visualisations of stories and narratives, another summary post similar to this one. Also via Paul Rissen, and picking up on whether the police have any interesting actor/event/time/location diagramming techniques, Vispol – An Interactive Scenario Visualization.

Elsewhere, I find Storyline Visualizations, which includes a paper (Design Considerations for Optimizing Storyline Visualizations, Y Tanahashi and K-L Ma, IEEE Trans on Visualisation and Computer Graphics, 18(12) 2012, pp2679-2688) and some Python code.

PPPS Some more… A collection by Stewart McKie of techniques for visualising screenplays: Screenplay Visualization: Concepts and Practice. The posts I wrote on the Digital Worlds game design uncourse blog about narrative structure. Sort of via Scott Wilson, some crime analysis software from xanalys.com (Link Explorer – White Paper) which includes descriptions of an event chart, a transaction chart and an activity timeline:

xanalys event chart

xanalys transaction chart

xanalys activity timeline

Via the comments, this rather lovely animated discourse map:

trinker.github animated discourse map

A Couple of Interesting Interactive Data Storytelling Devices

A couple of interesting devices for trying to engage folk in a data mediated story. First up, a chart that is reminiscent in feel of Hans Rosling’s ignorance test, in which (if you aren’t familiar with it) audiences are asked a range of data-style questions and then demonstrate their ignorance based on their preconceived ideas about what they think the answer to the question is – and which is invariably the wrong answer (with the result that audiences perform worse than random – or as Hans Rosling often puts it, worse than a chimpanzee; by the by, Rosling recently gave a talk at the World Bank, which included a rendition of the ignorance test. Rosling’s dressing down of the audience – who make stats based policy and help spend billions in the areas covered by Rosling’s questions, yet still demonstrated their complete lack of grasp about what the numbers say – is worth watching alone…).

Anyway – the chart comes from the New York Times, in a post entitled You Draw It: How Family Income Predicts Children’s College Chances. A question is posed and the reader is invited to draw the shape of the curve they think describes the relationship between family income and college chances:

nty-youDrawIt

Once you’ve drawn your line and submitted it, you’re told how close your answer is to the actual result:

You_Draw_It__How_Family_Income_Predicts_Children’s_College_Chances_-_The_New_York_Times

Another display demonstrates the general understanding calculated from across all the submissions.

You_Draw_It__How_Family_Income_Predicts_Children’s_College_Chances_-_The_New_York_Times2

Textual explanations also describe the actual relationship, putting it into context and trying to explain the basis of the relationship. As ever, a lovely piece of work, and once again with Amanda Cox in the credits…

The second example comes from Bloomberg, and riffs on the idea of immersive stories to produce a chart that gets updated as you scroll through (I saw this described as “scrollytelling” by @arnicas):

What_s_Really_Warming_the_World__Climate_deniers_blame_natural_factors__NASA_data_proves_otherwise

The piece is about global warming and shows the effect of various causal factors on temperature change, at first separately, and then in additive composite form to show how they explain the observed increase. It’s a nice device for advocacy, methinks…

It also reminds me that I never got round to trying out the Knight Lab Storymap.js with a zoomified/tiled chart image as the basis for a storymap (or should that be, storychart? For other storymappers, see Seven Ways to Create a Storymap). I just paid out the $19 or so for a copy of zoomify to convert large images to tilesets to work with that app, though I guess I really should have tried to hack a solution out with something like ImageMagick (I think that can export tiles?) or Inkscape (which would let me convert a vector image to tiles, I think?). Anyway, I just need a big image to try out now, which I guess I could generate from some F1 data using ggplot?

Data Structure + Narrative Chart = StoryLine?

A couple of years ago, prompted by a query from Michael Smethurst/@fantasticlife (then of the BBC, now of UK Parliament), I put together a post that described several ways for visually exploring the structure of a story or narrative – Narrative Charts Tell the Tale… (see also: From Storymaps to Notebooks).

One of the chart types described was the XKCD inspired narrative chart:

xkcd__Movie_Narrative_Charts

which led to demos (produced by Michael, drawing on a third party library – comic book narrative charts?) such as this one:

Macbeth_-_Macbeth

The data is supplied in two data files: an XML file that identifies the characters, and a JSON file that contains a list of scenes, with each scene comprising a set of characters associated with the scene.

More recently, the chart style was taken up by ABC News in an attempt to untangle a complicated story around a political scandal:

www_abc_net_au_news_2014-08-21_untangling-the-web-how-the-icac-scandal-unfolded_5686346

The code for that demo is available here – Github/abcnews/d3-layout-narrative (also check out the interesting way in which they annotated the source) – and described in the post Automating XKCD-Style Narrative Charts.

The code library defines the layout engine, with the data for the graphic contained in a separate JSON file that contains a list of characters and a list of scenes:

icacDataCallback({"characters":{"name":"characters", "elements":[{"id":"EO1", "name":"Eddie Obeid", "bio":"A former member of the New South Wales Parliament, Mr Obeid was a Labor powerbroker who ICAC has previously found used his influence within the party to corruptly further coal mining interests for himself and his family.", "affiliation":"ALP", "investigated":"yes", "imageurl":"http://www.abc.net.au/news/image/5049312-1x1-160x160.jpg", "imagecredit":"AAP: Dean Lewins","rowNumber":1},...]},
"scenes":{"name":"scenes","elements":[{"id":"", "title":"", "plot":"Nick Di Girolamo becomes aware of a deal Australian Water Holdings (AWH) has with Sydney Water to provide and manage water and sewerage pipes in Sydney’s north-west that allows AWH to charge all its costs to Sydney Water. By 2007 Mr Di Girolamo is CEO of AWH and a majority owner of the company. ICAC has heard Mr Di Girolamo embarked on a plan to use the contract with Sydney Water to try transform the organisation into a major infrastructure company. ", "characters":"ND1", "date":"2006", "core-participants":"yes", "eightbyfive":"", "ofarrell":"", "rowNumber":1},...]}})

The list order of scenes appears to define the order in which they appear in the chart.
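To get a feel for working with data of this shape, here is a minimal Python sketch that strips the callback wrapper and lists the cast of each scene in file order. The miniature payload is made up (the excerpts above are elided with "..." so can't be parsed as they stand), the function names are my own, and the idea that a scene with several characters stores their ids as one comma-separated string is an assumption: the excerpts only show a single id or an empty string.

```python
import json
import re

def parse_jsonp(payload):
    """Strip a `callbackName( ... )` wrapper and parse the JSON inside it."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", payload, flags=re.DOTALL)
    if match is None:
        raise ValueError("payload does not look like JSONP")
    return json.loads(match.group(1))

def scene_cast(data):
    """List (scene position, character ids) pairs in file order.

    Assumes multiple ids are held in one comma-separated string.
    """
    cast = []
    for position, scene in enumerate(data["scenes"]["elements"], start=1):
        ids = [cid.strip() for cid in scene["characters"].split(",") if cid.strip()]
        cast.append((position, ids))
    return cast

# Made-up miniature payload with the same overall shape as the excerpts.
payload = (
    'demoCallback({"characters":{"name":"characters","elements":['
    '{"id":"A1","name":"Character A"},{"id":"B1","name":"Character B"}]},'
    '"scenes":{"name":"scenes","elements":['
    '{"title":"Opening","characters":""},'
    '{"title":"Meeting","characters":"A1, B1"},'
    '{"title":"Aftermath","characters":"B1"}]}});'
)
data = parse_jsonp(payload)
print(scene_cast(data))
```

The scene position here is just the list index, in line with the observation that list order appears to drive chart order.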

A more recent chart captures the storylines of all the Star Wars movies – the different coloured threads are perhaps a useful device for highlighting players in a political story, or distinguishing teams or players in a sports based storyline?

www_abc_net_au_news_2015-12-16_star-wars-every-scene_7013826

Again, the structure of the data is based around characters and scenes, with additional metadata elements.

starwarsDataCallback({"characters":{"name":"characters", "elements":[{"id":"R2D", "name":"R2-D2", "bio":"A resourceful astromech droid, R2-D2 served Padmé Amidala, Anakin Skywalker and Luke Skywalker in turn, showing great bravery in rescuing his masters and their friends from many perils. A skilled starship mechanic and fighter pilot's assistant, he formed an unlikely but enduring friendship with the fussy protocol droid C-3PO.", "affiliation":"Light", "initialgroup":"0", "core":"*", "remove":""},...]},
"scenes":{"name":"scenes", "elements":[{"title":"Opening Logos", "plot":"", "episode":"Episode I", "characters":"", "dvaffiliation":"light"},...]}});

The rendering of the charts – from which we can read the story and get an idea of the flow of events – is simply a visual realisation of the way the data is structured and ordered in the data file.

Which has got me thinking: could this be a handy way of viewing events detected from something like the F1 timing data? For example, pit stops and accidents are a given in the timing sheets, and it’s easy enough to detect when the lead changes; I also started exploring things like undercut detection, and so on. The actors are known (the drivers) and events can be sequenced by lap number, or race elapsed time. (Qualifying also presents an opportunity for telling the story using a narrative chart.)
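As a sketch of the simplest case, lead changes, suppose we already have the name of the leader at the end of each lap (that input format, the function name, the scene field names and the driver codes are all assumptions made up for the example, not anything taken from the timing sheets themselves). Each change of leader becomes a scene featuring the two drivers involved, sequenced by lap number.

```python
def lead_change_scenes(leaders_by_lap):
    """Turn a lap-by-lap list of leaders into narrative-chart style scenes.

    leaders_by_lap[0] is the leader at the end of lap 1, and so on.
    Each lead change becomes one scene featuring the old and new leader.
    """
    scenes = []
    for lap, (previous, current) in enumerate(
        zip(leaders_by_lap, leaders_by_lap[1:]), start=2
    ):
        if current != previous:
            scenes.append({
                "title": f"Lap {lap}: lead change",
                "lap": lap,
                "characters": [previous, current],
            })
    return scenes

# Made-up driver codes and laps.
leaders = ["AAA", "AAA", "BBB", "BBB", "AAA", "CCC"]
lead_scenes = lead_change_scenes(leaders)
for scene in lead_scenes:
    print(scene)
```

Pit stops, accidents and detected undercuts could be appended in the same way, with the combined list sorted by lap before being written out.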

There are two ways to approach this: first, I could just try to create some data files. Second, I wonder if I could text mine some race reports, treat each paragraph as a possible event, extract driver names (and perhaps even event keywords?) from each paragraph, and then render the race report down as a narrative chart data file? And then start to iterate, improving the race report parser on the one hand, and building story trope generators (and detectors) into the timing sheet analysis in order to generate storylines automatically?
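A first, very naive pass at the text mining route might look something like the following sketch: split the report on blank lines, look for known surnames in each paragraph, and keep the paragraphs that name at least one driver. The report text, driver names, ids and output field names are all invented for the example, and simple surname matching is an assumption that would need replacing with something more robust.

```python
import re

def report_to_scenes(report, drivers):
    """Treat each paragraph as a candidate event and note which drivers it names.

    `drivers` maps a surname as it appears in the text to a character id.
    Paragraphs that mention no known driver are dropped.
    """
    scenes = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", report) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        cast = [
            driver_id
            for surname, driver_id in drivers.items()
            if re.search(rf"\b{re.escape(surname)}\b", paragraph)
        ]
        if cast:
            scenes.append({"paragraph": number, "plot": paragraph, "characters": cast})
    return scenes

# Made-up race report and drivers.
report = """Redd led away from pole as the lights went out.

Rain began to fall on lap 12.

Greene pitted early and undercut Redd for the lead.

Blau retired with a gearbox problem."""
drivers = {"Redd": "RED", "Greene": "GRE", "Blau": "BLA"}

report_scenes = report_to_scenes(report, drivers)
for scene in report_scenes:
    print(scene["paragraph"], scene["characters"])
```

The paragraph number stands in for scene order here; tying each paragraph back to a lap number is the harder part, and is where the timing sheet analysis would have to come in.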

That is, can we use the narrative chart data format as an intermediary representation for picking apart and analysing human generated race reports, and as a target for automated storypoint identification routines?

See also: Notes on Robot Churnalism, Part I – Robot Writers.