Data Structure + Narrative Chart = StoryLine?

A couple of years ago, prompted by a query from Michael Smethurst/@fantasticlife (then of the BBC, now of UK Parliament), I put together a post that described several ways for visually exploring the structure of a story or narrative – Narrative Charts Tell the Tale… (see also: From Storymaps to Notebooks).

One of the chart types described was the XKCD inspired narrative chart:


which led to demos (produced by Michael, drawing on a third party library – comic book narrative charts?) such as this one:


The data is supplied in two data files: an XML file that identifies the characters, and a JSON file that contains a list of scenes, with each scene comprising a set of characters associated with the scene.

More recently, the chart style was taken up by ABC News in an attempt to untangle a complicated story around a political scandal:


The code for that demo is available here – Github/abcnews/d3-layout-narrative (also check out the interesting way in which they annotated the source) – and is described in the post Automating XKCD-Style Narrative Charts.

The code library defines the layout engine, with the data for the graphic contained in a separate JSON file that contains a list of characters and a list of scenes:

icacDataCallback({"characters":{"name":"characters", "elements":[{"id":"EO1", "name":"Eddie Obeid", "bio":"A former member of the New South Wales Parliament, Mr Obeid was a Labor powerbroker who ICAC has previously found used his influence within the party to corruptly further coal mining interests for himself and his family.", "affiliation":"ALP", "investigated":"yes", "imageurl":"", "imagecredit":"AAP: Dean Lewins","rowNumber":1},...]},
"scenes":{"name":"scenes","elements":[{"id":"", "title":"", "plot":"Nick Di Girolamo becomes aware of a deal Australian Water Holdings (AWH) has with Sydney Water to provide and manage water and sewerage pipes in Sydney’s north-west that allows AWH to charge all its costs to Sydney Water. By 2007 Mr Di Girolamo is CEO of AWH and a majority owner of the company. ICAC has heard Mr Di Girolamo embarked on a plan to use the contract with Sydney Water to try transform the organisation into a major infrastructure company. ", "characters":"ND1", "date":"2006", "core-participants":"yes", "eightbyfive":"", "ofarrell":"", "rowNumber":1},...]}})

The list order of scenes appears to define the order in which they appear in the chart.
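
As a minimal sketch of reading a file like this in Python: the payload is JSON wrapped in a callback (JSONP), so the wrapper has to be stripped before parsing. The function names are my own, the sample payload is a made-up, cut-down stand-in for the real file, and I am assuming that a scene's characters field holds a comma separated list of character ids (the excerpt above only shows a single id).

```python
import json
import re


def parse_jsonp(payload):
    """Strip a wrapper such as icacDataCallback(...) and parse the JSON inside it."""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", payload, re.DOTALL)
    if match is None:
        raise ValueError("payload does not look like a JSONP callback")
    return json.loads(match.group(1))


def scene_casts(data):
    """Return the character ids for each scene, keeping the list order of the scenes."""
    casts = []
    for scene in data["scenes"]["elements"]:
        # ASSUMPTION: multiple ids are comma separated within one string.
        ids = [c.strip() for c in scene.get("characters", "").split(",") if c.strip()]
        casts.append(ids)
    return casts


# Made-up sample in the same shape as the excerpt above (not real data).
sample = (
    'demoCallback({"characters":{"name":"characters","elements":'
    '[{"id":"A1","name":"Character A"},{"id":"B2","name":"Character B"}]},'
    '"scenes":{"name":"scenes","elements":'
    '[{"characters":"A1"},{"characters":"A1, B2"},{"characters":""}]}});'
)

data = parse_jsonp(sample)
names = {c["id"]: c["name"] for c in data["characters"]["elements"]}
for position, cast in enumerate(scene_casts(data), start=1):
    print(position, [names[c] for c in cast])
```

Because the scenes are read in list order, the position counter stands in for the left-to-right order of scenes in the chart.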

See also: Y. Tanahashi and K. Ma, “Design Considerations for Optimizing Storyline Visualizations,” in IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 12, pp. 2679-2688, Dec. 2012, doi: 10.1109/TVCG.2012.212. [h/t @hydrosquall]

A more recent chart captures the storylines of all the Star Wars movies – the different coloured threads are perhaps a useful device for highlighting players in a political story, or distinguishing teams or players in a sports-based storyline?


Again, the structure of the data is based around characters and scenes, with additional metadata elements.

starwarsDataCallback({"characters":{"name":"characters", "elements":[{"id":"R2D", "name":"R2-D2", "bio":"A resourceful astromech droid, R2-D2 served Padmé Amidala, Anakin Skywalker and Luke Skywalker in turn, showing great bravery in rescuing his masters and their friends from many perils. A skilled starship mechanic and fighter pilot's assistant, he formed an unlikely but enduring friendship with the fussy protocol droid C-3PO.", "affiliation":"Light", "initialgroup":"0", "core":"*", "remove":""},...]},
"scenes":{"name":"scenes", "elements":[{"title":"Opening Logos", "plot":"", "episode":"Episode I", "characters":"", "dvaffiliation":"light"},...]}});

The rendering of the charts – from which we can read the story and get an idea of the flow of events – is simply a visual realisation of the way the data is structured and ordered.

Which has got me thinking: could this be a handy way of viewing events detected from something like the F1 timing data? For example, pit stops and accidents are a given in the timing sheets, it’s easy enough to detect when the lead changes, and I have started exploring things like undercut detection. The actors are known (the drivers) and events can be sequenced by lap number or by race elapsed time. (Qualifying also presents an opportunity for telling the story using a narrative chart.)
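
By way of a rough sketch of the lead change case, in Python: given the race leader at the end of each lap, emit a scene whenever the leader differs from the previous lap. The input format, the function name, the "lap" field and the three letter driver codes are all made up for illustration, and I am again assuming scenes list their characters as a comma separated string of ids.

```python
def lead_change_scenes(lap_leaders):
    """Build one scene per lead change from (lap, driver_code) pairs in lap order."""
    scenes = []
    previous = None
    for lap, leader in lap_leaders:
        if previous is not None and leader != previous:
            scenes.append({
                "title": f"Lead change on lap {lap}",
                "lap": lap,  # hypothetical field, used here only for sequencing
                "characters": f"{previous},{leader}",
            })
        previous = leader
    return scenes


# Made-up data: AAA leads, BBB takes over on lap 3, AAA retakes the lead on lap 5.
laps = [(1, "AAA"), (2, "AAA"), (3, "BBB"), (4, "BBB"), (5, "AAA")]
for scene in lead_change_scenes(laps):
    print(scene["title"], scene["characters"])
```

Other detectors (pit stops, accidents, undercuts) could emit scenes in the same shape, with the combined list then sorted by lap before being written out.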

There are two ways to approach this: first, I could just try to create some data files. Second, I wonder if I could text mine some race reports, treat each paragraph as a possible event, extract driver names (and perhaps even event keywords?) from each paragraph, and then render the race report down as a narrative chart data file? And then start to iterate, improving the race report parser on the one hand, and on the other building story trope generators (and detectors) into the timing sheet analysis in order to generate storylines automatically?
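
A first pass at the second route might look something like the following Python sketch, which treats blank-line separated paragraphs as candidate scenes and uses naive whole-word surname matching to find the drivers. The function name, the driver lookup and the sample report are all invented for illustration, and the output only mimics the characters/scenes shape shown earlier rather than anything the layout library is known to require.

```python
import re


def report_to_narrative(report, drivers):
    """Turn a race report into characters/scenes data.

    drivers maps a character id to the surname to look for in the text.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", report) if p.strip()]
    scenes = []
    for paragraph in paragraphs:
        present = [
            cid for cid, surname in drivers.items()
            if re.search(rf"\b{re.escape(surname)}\b", paragraph)
        ]
        if present:  # paragraphs that mention no driver are skipped
            scenes.append({
                "title": "",
                "plot": paragraph,
                "characters": ",".join(present),
            })
    characters = [{"id": cid, "name": surname} for cid, surname in drivers.items()]
    return {
        "characters": {"name": "characters", "elements": characters},
        "scenes": {"name": "scenes", "elements": scenes},
    }


# Made-up report and drivers.
report = (
    "Alpha led away from pole.\n\n"
    "Rain arrived on lap ten.\n\n"
    "Bravo passed Alpha after the restart."
)
narrative = report_to_narrative(report, {"A": "Alpha", "B": "Bravo"})
for scene in narrative["scenes"]["elements"]:
    print(scene["characters"], "|", scene["plot"])
```

Surname matching like this will miss pronouns, nicknames and team references, which is exactly the sort of thing the iteration on the parser would need to address.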

That is, can we use the narrative chart data format as an intermediary representation for picking apart and analysing human generated race reports, and as a target for automated storypoint identification routines?

See also: Notes on Robot Churnalism, Part I – Robot Writers.

Googling the Future – from the Present and the Past

An XKCD cartoon today described Googling the future using search terms such as “in <year>” and “by <year>”:

So I tried it:

Hmm – results from the future?

So I had a play in Google News… could this be a good way of searching forecasts?

By searching the past, we can search for old forecasts of the future…

I leave it as an exercise for the reader to search results from 2006, 2001, and 1991 for the 5, 10 and 20 year forecasts respectively for this year… let me know in the comments if anything interesting turns up;-)

See also: Google Impact…? The “Google Suggest” Factor

PS And this: Quantifying the Advantage of Looking Forward, which looks, for different countries, at the ratio of searches for year+1 and year-1 over the course of a year, then plots the resulting quotient against GDP. The results appear to suggest that there is a correlation between GDP and the forward looking tendency of the population. (But is this right? Do the search volumes get normalised (on Google Trends) by the volume of the first term at the start of the trend period? If the user numbers are growing over the course of the year, might we be skewing the future looking component because of loaded terms at the end of the year?)
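
For what it's worth, my reading of the quotient described above can be sketched in a few lines of Python. This is an assumption about the calculation rather than the paper's actual method: the function name and the weekly volumes are made up, and it deliberately does nothing about normalisation, which is the very thing I am unsure about.

```python
def forward_looking_quotient(next_year_volumes, previous_year_volumes):
    """Ratio of total searches for year+1 to total searches for year-1."""
    previous_total = sum(previous_year_volumes)
    if previous_total == 0:
        raise ValueError("no searches recorded for the previous year term")
    return sum(next_year_volumes) / previous_total


# Made-up weekly volumes: searches for next year grow towards the end of the period.
quotient = forward_looking_quotient([10, 20, 30], [20, 20, 20])
print(quotient)
```

A quotient above 1 would suggest more searching for the coming year than for the one just gone, but summing over the year like this gives extra weight to whichever weeks have the most users.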