Algorithmic Truthiness

With a media that failed to hold jokers to account when it had the chance, preferring “balanced” reporting that biases news reports and gives equal measure to unequally validated ideas, and social media opting for truthiness rather than fact to generate momentum for spreading (fake) news, commentators tell us we’re now in a “post-truth”/“post-factual” world.

As the OED defines it, truthiness is “the quality of seeming or being felt to be true, even if not necessarily true”.

Although the definition could be debated…

[Video: “Post-truth is just a rip-off of truthiness” (YouTube)]

Sound familiar?

A few years ago, at the dawn of the age of Big Data, the idea of segmenting and modelling large datasets in a “theory-free” way (Big data and the end of theory?) perhaps gave an inkling that truthiness was on its way in, big time. (Compare this also with anti-expert rhetoric over the last couple of years. I’m all for slamming certain classes of academic outlook and activity, but I also think there are reasons for trusting certain sorts of claims more than others…)

The fact that data processing algorithms are likely to have ever increasing power over what we read – not only in terms of selecting which stories to show us in our personalised news feeds, but also because other machines may themselves have written the stories we’re reading – means that we need to start getting a feel for what sorts of biases are likely to be baked into these algorithms.

In contrast to earlier generations of rule based expert systems that could be asked to “explain” their reasoning, today’s systems are often black box statistical machines. Whereas rule based systems used logical reasoning to come up with answers, Deep Learning algorithms and their ilk have gut reactions: rule based expert systems reasoned towards a truth associated with the logical statements asserted into them in an explainable way; black boxes have gut reactions and deal in truthiness.

But whereas we might be suspicious about a person making a truthy claim (“that doesn’t sound quite right to me…”), once we start to trust machines – because they appear to be right-ish, most of the time – we start to over-trust them. I think – I haven’t checked. Sounds truthy to me…

So with a tech news report doing the rounds at the moment that a “Neural Network Learns to Identify Criminals by Their Faces”, it seems that the paper authors “have demonstrated that via supervised machine learning, data-driven face classifiers are able to make reliable inference on criminality” as well as identifying “a law of normality for faces of noncriminals. After controlled for race, gender and age, the general law-biding public have facial appearances that vary in a significantly lesser degree than criminals”. (It’s not hard to imagine this being used as a ranking factor for something…) The (best) false positive rate looked on one of the charts (figure 4 in the paper) to be around 6%. Are the decisions “true”, then, or just “truthy”? What level of false positivity makes the difference? (Bear in mind behaviourist training – partial reinforcement can be really powerful…) I also wonder if the researchers ran the same training schedule against IQ? Or etc etc
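To get a feel for what a 6% false positive rate might mean in practice, here is a minimal sketch of the standard base rate calculation. Only the (roughly) 6% figure comes from the chart mentioned above; the true positive rate and the prevalence values are hypothetical numbers I've made up for illustration, and `positive_predictive_value` is just an illustrative helper name.

```python
def positive_predictive_value(prevalence, true_positive_rate, false_positive_rate):
    """Probability that a flagged person really is in the target class (Bayes' rule)."""
    true_positives = prevalence * true_positive_rate
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: only the ~6% false positive rate is taken from the text.
for prevalence in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(
        prevalence, true_positive_rate=0.9, false_positive_rate=0.06
    )
    print(f"prevalence {prevalence:.0%}: P(in class | flagged) = {ppv:.1%}")
```

Under these assumed numbers, the same classifier looks far less trustworthy as the thing it is looking for gets rarer in the population being screened, which is one way a “right-ish” output can slide from true towards truthy.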

(In passing, another recent preprint on arXiv – Lip Reading Sentences in the Wild – reports on an automated lip reading system trained on several hours of people talking on BBC television (the UK based researchers were licence fee payers, I suspect, but the Google Deepmind sponsor..?!). (If you’d rather read a pop sci write up, New Scientist has one here: Google’s DeepMind AI can lip-read TV shows better than a pro.) For reference, the best word error rate the researchers report is 3.3%. So are the outputs true or truthy?)

So… I’m wondering… algorithmic truthiness: the extent to which the outputs of an algorithm feel as if they could be true, even if not necessarily true. … a useful conceit, or not?

Or maybe we need an alt definition, such as “The extent to which you believe the output of an algorithm to be true rather than what you know to be true”?!

“Local stories of national interest” – New Johnston Press (Data Journalism) Investigations Unit

Complementing the approach of Trinity Mirror, who launched a cross-group data journalism unit back in 2013, Johnston Press has pulled together a (virtual?) Investigations Unit made up from several investigative and data skilled reporters from across the Johnston Press regional titles (press release).

The unit’s first campaign is focussed on sentences awarded for causing death by dangerous driving. The campaign allows the unit to report on national datasets in their own right, as well as developing local stories based on examples taken from the national dataset, bubbling up local stories to wider national interest as campaign hooks. From the press release announcing the launch of the unit, it seems as if this campaigning style of national/local investigative reporting will underpin the unit’s activities.

“As well as carrying out investigations, and telling powerful human interest stories, the unit has a campaigning and lobbying role at its heart” – Johnston Press press release.

The use of campaigns means the same theme can be kept alive and repeatedly reported on as an ongoing series over an extended period of time, tracked nationally but reported in a local context on the one hand, promoting local campaigns and then reporting them widely on the other.

The national/local model is one that I’ve long thought makes sense, though I’ve not really considered it in terms of the local to national twist. Instead, I’ve been framing it as an opportunity to address centrally common pain points that may be experienced trying to produce a story from data at a local level, as discussed in these thoughts on a locally targeted, nationally scoped datawire.

National dataset local story

One advantage of this approach is scale: graphics communicating national level statistics can be produced centrally and reused across local titles, perhaps with local customisation; local stories can be used to provide relevance to generic “national context” inserts reused across titles; and story templates can be customised to generate local reports from the same national dataset.

Another advantage with looking at national datasets is that they can help flag the newsworthiness of a local story given its national context (for example, national rankings generate story points for the top M, bottom N rankings).
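As a minimal sketch of the ranking idea, the following flags areas that fall in the top M or bottom N of a national table. The area names and values are invented for illustration, and `flag_rankings` is a hypothetical helper, not something taken from any of the tools mentioned in this post.

```python
def flag_rankings(values, top_m=3, bottom_n=3):
    """Return a story point for each area in the top M or bottom N.

    `values` maps an area name to a statistic; higher values rank first.
    """
    ranked = sorted(values, key=values.get, reverse=True)
    total = len(ranked)
    flags = {}
    for rank, area in enumerate(ranked, start=1):
        if rank <= top_m:
            flags[area] = f"ranked {rank} of {total} nationally (top {top_m})"
        elif rank > total - bottom_n:
            flags[area] = f"ranked {rank} of {total} nationally (bottom {bottom_n})"
    return flags


# Hypothetical national dataset: one value per local area.
national = {"Area A": 10, "Area B": 8, "Area C": 5, "Area D": 3, "Area E": 1}
for area, story_point in flag_rankings(national, top_m=1, bottom_n=2).items():
    print(area, "-", story_point)
```

Each flagged area then becomes a candidate local story, with the national ranking providing the hook.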

I haven’t spent much time thinking about the campaign aspect, but on quick reflection I think that campaigns can act as nice wrappers for a wider range of templated activities and outputs.

For example, I’ve written a couple of times about the notion of story templates, noting how these have been rolled out in previous years by at least the Johnston Press and Trinity Mirror (Local News Templates – A Business Opportunity for Data Journalists?).

And eighteen months or so ago, I was fortunate enough to spend a couple of days seeing how Ruby Kitchen, then of the Harrogate Advertiser, now of the Yorkshire Post / Yorkshire Evening Post and the Johnston Press Investigations Unit, worked on a Food Standards Agency story (Data Journalism in Practice). One of the takeaways for me from that was what was involved in actually making use of leads thrown up from a data trawl and then chasing down people for comment. The work involved in putting together an investigation at a single local level may need to be repeated for other locales, but the process can be reused – the investigatory process can be templated.

On the way back home from Harrogate, I’d started fantasising about putting together a training pack based on the Food Standards Agency food hygiene ratings data (h/t Andy Dickinson for tangentially reminding me of this a couple of days ago :-), with a dual objective in mind: firstly, to produce a training pack for demonstrating various aspects of how to practically work with national datasets at a local level; secondly, to template a data journalism investigation that could be worked through by local or hyperlocal journalists, or journalism students, to produce a feature on local food hygiene ratings. (It’s still sitting on the to do pile… Maybe I should have tried kickstarter!)

(Note that it’s not just news organisations that can scale templated systems, or reuse locally developed solutions for national benefit. For example, see the post Putting Public Open Data to Work…? for several examples of online services developed by local councils and used to publish local data that can also be scaled across other council areas.)

Whilst newspaper groups such as Trinity Mirror or Johnston Press have the scale in terms of the number of local outlets to merit a co-ordinated centre reducing the pain once for working with national datasets and then scaling out the benefits across the regional and local titles, independent hyperlocals are often more resource bound when it comes to pursuing investigations (though The Bristol Cable among others repeatedly shows how hyperlocal led investigations are possible).

Whilst I keep not starting to properly scope a hyperlocal datawire service, Will Perrin’s  Local News Engine seems to have gained some traction in its development recently (Early proof of concept for Local News Engine [code]). This service “is testing the theory that story leads can be found in local data where a newsworthy person or place is engaged in a newsworthy activity”, searching local datasources (license applications, planning applications) for notable names (see for example What data are we using in Local News Engine? and Who, what and where is newsworthy for Local News Engine?). The approach taken – named entity extraction cross-referenced with the names of local notables – complements an alternative approach that I favour for the datawire that would flag local stories from national datasets based on things like top N, bottom M rankings, outliers, notable trends or dramatic change in statistics for a local area from a national dataset based on a comparison with previous data releases, other locales and national averages.
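To make the datawire side of that a little more concrete, here is a rough sketch of flagging outliers and dramatic changes for local areas. It is only an illustration: the data is made up, the thresholds are arbitrary, the unweighted mean of the local values stands in for a proper national average, and `story_leads` is a hypothetical function name.

```python
from statistics import mean, pstdev


def story_leads(current, previous, z_threshold=2.0, change_threshold=0.25):
    """Flag areas that are outliers in the current release, or that have
    changed a lot since the previous release.

    `current` and `previous` map area names to a statistic.
    """
    baseline = mean(current.values())  # crude stand-in for a national average
    spread = pstdev(current.values())
    leads = []
    for area, value in sorted(current.items()):
        if spread > 0 and abs(value - baseline) / spread >= z_threshold:
            leads.append(
                (area, "outlier", f"{value} against an average of {baseline:.1f}")
            )
        old = previous.get(area)
        if old:  # skip areas with no (or zero) previous value
            change = (value - old) / old
            if abs(change) >= change_threshold:
                leads.append(
                    (area, "change", f"{change:+.0%} since the previous release")
                )
    return leads


# Hypothetical values for two consecutive data releases.
this_release = {"A": 10, "B": 11, "C": 9, "D": 10, "E": 30, "F": 10}
last_release = {"A": 10, "B": 11, "C": 9, "D": 5, "E": 29, "F": 10}
for lead in story_leads(this_release, last_release):
    print(lead)
```

A lead here is nothing more than a prompt for a reporter to go and look; whether it is actually a story still needs the chasing-down work described earlier.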

PS you can tell this is a personal blog post, not a piece of journalism – I didn’t reach out to anyone from the Johnston Press, or Trinity Mirror, or get in touch with Will Perrin to check facts or ask for comment. It’s all just my personal comment, bias, interpretation and opinion….

PPS See also Archant’s Investigations Unit (2015 announcement) – h/t Andy Dickinson.

Be Wary Of Simulations

An old (well, relatively speaking – from March this year) video has recently resurfaced on the Twitterz describing how researchers are (were) using virtual worlds to train Deep Learning systems for possible use in autonomous vehicles:

It reminded me of a demo by Karl Sims at a From Animals to Animats conference years ago in which he’d evolved creatures in a 3D world to perform various forms of movement:

One thing I remember, but not shown in the video above, related to one creature being evolved to jump as high as it could. Apparently, it found a flaw in the simulated physics of the world within which the critters were being evolved that meant it could jump to infinity…

In turn, Sims critters reminded me of a parable about neural networks getting image recognition wrong*, retold here: Detecting Tanks. In trying to track down the origins of that story, references are made to this November 1993 Fort Carson RSTA Data Collection Final Report. In passing, I note that the report (on collecting visual scene information to train systems to detect military vehicles in natural settings) refers to a Surrogate Semiautonomous Vehicle (SSV) Program; which in turn makes me think: how many fits and starts has autonomous vehicle research gone through prior to its current incarnation?

* In turn, this reminds me of another possibly apocryphal story – of a robot trained to run a maze being demoed for some important event. The robot ran the maze fine, but then the maze was moved to another part of the lab for the Big Important Demo. At which point, the robot messed up completely: rather than learning the maze, the robot had trained its escape based on things it could see in the lab – such as the windows – that were outside the maze. The problem with training machines is you’re never quite sure what they’re focussing on…

PS via Pete Mitton, another great simulation snafu story: the tale of the kangaroos. Anyone got any more?:-)

Datadive Reproducibility – Time for a DataBox?

Whilst at the Global Witness “Beneficial Ownership” datadive a couple of weeks ago, one of the things I was pondering – how to make the weekend’s discoveries reproducible on the one hand, useful as a set of still working legacy tooling on the other – blended into another: how to provide an on-ramp for folk attending the event who were not familiar with the data or the way in which it was provided.

Event facilitators DataKind worked in advance with Global Witness to produce an orientation exercise based around a sample dataset. Several other prepped datasets were also made available via USB memory sticks distributed as required to the three different working groups.

The orientation exercise was framed as a series of questions applied to a core dataset, a denormalised flat 250MB or so CSV file containing just over a million or so rows, with headers. (I think Excel could cope with this – not sure if that was by design or happy accident.)

For data wranglers expert at working with raw datafiles and their own computers, this doesn’t present much of a problem. My gut reaction was to open the datafile into a pandas dataframe in a Jupyter notebook and twiddle with it there; but as pandas holds dataframes in memory, this may not be the best approach, particularly if you have multiple large dataframes open at the same time. As previously mentioned, I think the data also fit into Excel okay.

Another approach after previewing the data, even if just by looking at it on the command line with a head command, was to load the data into a database and look at it from there.
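As a rough sketch of that preview-then-load pattern, the following uses only the Python standard library. Note that it swaps in SQLite (via `sqlite3`) rather than the Postgres database I actually used, simply so that it runs without a database server; the function names and file name are hypothetical. It assumes every row has the same number of columns as the header row, and it loads every column as text.

```python
import csv
import sqlite3
from itertools import islice


def preview(path, n=5, encoding="utf-8"):
    """Return the first n rows of a CSV file (a bit like `head`)."""
    with open(path, newline="", encoding=encoding) as f:
        return list(islice(csv.reader(f), n))


def load_csv(path, db_path, table, encoding="utf-8", batch_size=10000):
    """Stream a CSV file with a header row into a SQLite table in batches,
    so the whole file never has to sit in memory at once."""
    conn = sqlite3.connect(db_path)
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        headers = next(reader)
        # Quote column names, doubling any embedded double quotes.
        quoted = ['"' + h.replace('"', '""') + '"' for h in headers]
        columns = ", ".join(f"{q} TEXT" for q in quoted)
        placeholders = ", ".join("?" for _ in headers)
        with conn:  # commit on success, roll back on error
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            while True:
                rows = list(islice(reader, batch_size))
                if not rows:
                    break
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', rows
                )
    return conn
```

Something like `conn = load_csv("sample.csv", "datadive.db", "psc")` followed by `conn.execute("SELECT COUNT(*) FROM psc").fetchone()` would then let you start asking questions of the data with SQL.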

This immediately begs several questions of course  – if I have a database set up on my machine and import the database without thinking about it, how can someone else recreate that? If I don’t have a database on my machine (so I need to install one and get it running) and/or I don’t then know how to get data into the database, I’m no better off. (It may well be that there are great analysts who know how to work with data stored in databases but don’t know how to do the data engineering stuff of getting the database up and running and populated with data in the first place.)

My preferred solution for this at the moment is to see whether Docker containers can help. And in this case, I think they can. I’d already had a couple of quick plays looking at getting the Companies House significant ownership data into various databases (Mongo, neo4j) and used a recipe that linked a database container with a Jupyter notebook server that I could write my analysis scripts in (linking RStudio rather than Jupyter notebooks is just as straightforward).

Using those patterns, it was easy enough to create a similar recipe to link a Postgres database container to a Jupyter notebook server. The next step – loading the data in. Now it just so happens that in the days before the datadive, I’d been putting together some revised notebooks for an OU course on data management and analysis that dealt with quick ways of loading data into a Postgres database, so I wondered whether those notes provided enough scaffolding to help me load the sample core data into a database: a) even if I was new to working with databases, and b) in a reproducible way. The short answer was “yes”. Putting the two steps together, the results can be found here: Getting started – Database Loader Notebook.

With the data in a reproducibly shareable and “live” queryable form, I put together a notebook that worked through the orientation exercises. Along the way, I found a new-to-me HTML5/d3js package for displaying small  interactive network diagrams, visjs2jupyter. My attempt at the orientation exercises can be found here: Orientation Activities.

Whilst I am all in favour of expert datawranglers using their own recipes, tools and methods for working with the data – that’s part of the point of these expert datadives – I think there may also be mileage in providing a base install where the data is in some sort of immediately queryable form, such as in a minimal, even if not properly normalised, database. This means that datasets too large to be manipulated in memory or loaded into Excel can be worked with immediately. It also means that orientation materials can be produced that pose interesting questions that can be used to get a quick overview of the data, or tutorial materials produced that show how to work with off-the-shelf powertool combinations (Jupyter notebooks / Python/pandas / PostgreSQL, for example, or RStudio /R /PostgreSQL ).

Providing a base set up to start from also acts as an invitation to extend that environment in a reproducible way over the course of the datadive. (When working on your own computer with your own tooling, it can be way too easy to forget what packages (apt-get, pip and so on) you have pre-installed, the absence of which will break any outcome code you share with others who do not have the same environment. Creating a fresh environment for the datadive, and documenting what you add to it, can help with that; testing in a linked container, but otherwise isolated, context really helps you keep track of what you needed to add to make things work!)
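One lightweight way of documenting what you add, sketched here using Python's standard `importlib.metadata` module, is to snapshot the installed Python packages at the start of the datadive and again at the end, and then compare the two. This only covers Python packages, not anything installed via apt-get, and the helper names are hypothetical.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of 'name==version' strings for the Python
    packages visible in the current environment."""
    found = {
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }
    return sorted(found, key=str.lower)


def added_since(baseline, current):
    """Return the packages (or versions) in `current` that were not in `baseline`."""
    return sorted(set(current) - set(baseline), key=str.lower)
```

Calling `installed_packages()` in the base environment, saving the result, and then running `added_since(saved, installed_packages())` at the end of the weekend gives a first draft of the requirements that others will need.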

If you also keep track of what you needed to do to handle undeclared file encodings, weird separator characters, or password protected zip files from the provided files, it means that others should be able to work with the files in a reliable way…

(Just a note on that point for datadive organisers – metadata about file encodings, unusual zip formats, weird separator encodings etc is a useful thing to share, rather than have to painfully discover….)
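In the absence of that metadata, a sketch of the sort of painful discovery involved might look like the following, which tries a few candidate encodings in turn and uses the standard library's `csv.Sniffer` to guess the separator. The list of encodings and candidate delimiters are my assumptions rather than anything specific to the datadive files, `read_rows` is a hypothetical helper, and it reads the whole file into memory, so treat it as a way of probing a file rather than a loader.

```python
import csv
import io


def read_rows(path, encodings=("utf-8-sig", "cp1252", "latin-1"), delimiters=",;\t|"):
    """Try each encoding in turn, then guess the delimiter from a sample.

    Returns the encoding and delimiter that worked, along with the rows,
    so that the findings can be written down and shared.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess; try the next encoding
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=delimiters)
        rows = list(csv.reader(io.StringIO(text, newline=""), dialect))
        return {"encoding": encoding, "delimiter": dialect.delimiter, "rows": rows}
    raise ValueError(f"Could not decode {path} with any of {encodings}")
```

Note that latin-1 will decode any byte sequence, so as a last resort it always “works”, even if the resulting characters are wrong; a successful decode is a guess, not proof.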

Using tools like Docker is one way of improving the shareability of immediately queryable data, but is there an even quicker way? One thing I want to explore on my to do list is the idea of a “databox”, a Raspberry Pi image that when booted runs a database server and Jupyter notebook (or RStudio) environment. The database can be pre-seeded with data for the datadive, so all that should be required is for an individual to plug the Raspberry Pi into their computer with an ethernet cable, and run from there. (This won’t work for really large datasets – the Raspberry Pi lacks grunt – but it’s enough to get you started.)

Note that these approaches scale out to other domains, such as data journalism projects (each project on its own Raspberry Pi SD card or docker-compose setup…)

What Nationality Did You Say You Were, Again?

For the first time in way too long, I went to a data dive over the weekend, facilitated by DataKind on behalf of Global Witness, for a couple of days messing around with the UK Companies House  Significant Control (“beneficial ownership”) register.

One of the data fields in the data set is the nationality of a company’s controlling entity, where that’s a person rather than a company. The field is a free text one, which means that folk completing a return have to write their own answer in to the box, rather than selecting from a specified list.

The following are the more popular nationalities, as declared…

[Image: the more popular declared nationalities]

Note that “English” doesn’t count – for the moment, the nationality should be declared as “British”…

And some less popular ones  – as well as typos…:

[Image: some less popular declared nationalities, including typos]

So how can we start to clean this data?

One of the libraries I discovered over the weekend was fuzzyset, which lets you add “target” strings to a set and then do a fuzzy match retrieval from the set using a word or phrase you have been provided with.

If we find a list of recognised nationalities, we could add these to a canonical “nationality” set, and then try to match supplied nationalities against them.
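As a minimal sketch of that idea, the following uses `difflib` from the Python standard library rather than fuzzyset itself, so it runs without installing anything; the scores it produces will not be the same as fuzzyset's. The short list of canonical nationalities is a made-up subset for illustration, and `best_match` is a hypothetical helper.

```python
import difflib

# Hypothetical subset of a canonical nationality list.
CANONICAL = ["British", "Irish", "French", "German", "Polish", "Bangladeshi"]


def best_match(declared, canonical=CANONICAL, cutoff=0.6):
    """Return (canonical nationality, similarity score) for a declared
    nationality, or (None, 0.0) if nothing scores at least `cutoff`."""
    lookup = {c.lower(): c for c in canonical}
    cleaned = declared.strip().lower()
    hits = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    if not hits:
        return None, 0.0
    score = difflib.SequenceMatcher(None, cleaned, hits[0]).ratio()
    return lookup[hits[0]], score


for declared in ["Britsh", "Brutish", "zzzz"]:
    print(declared, "->", best_match(declared))
```

The `cutoff` parameter is doing a lot of work here: set it too low and everything matches something, set it too high and simple typos get missed.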

The UK Foreign & Commonwealth Office register of country names, a register that lists formalised country names for use in government, also includes nationalities – so maybe we can use that?

Adding the FCO nationalities to a  fuzzyset list, and then matching nationalities from the significant control register against those nationalities, gives a glimpse into the cleanliness (or otherwise!) of the data. For example, here’s what was matched against “British”:

British | Britsh | Bristish | Brisith | Scottish | Britsih | British/Greek | Greek/British | Briitish | British/Czech | Bitish | Brtisih | British/Welsh | Brirish | Brtish | British. | British Norfolk | British Cornish | British Subject | British English | Uk British | British/Irish | Britiah | British/Swedish | Biitish | Brititsh | British/English | Briish | British/Persian | Britiish | Brittish | French British | British/German | British/Syrian | Britihs | Briitsh | British /English | British / English | Brits | Kenyan/British | Britis | American British | Btitish | British/Bahrain Dual | Brtitish | Polish/British | Dual British/Irish | Brirtish | British- | British Uk | Brutish | Britich | British (Naturalised) | British (Canada Born) | Brithish | British Irish | British & Usa | Britisch | British/French | British/Israeli | Britrish | Britsh - English | American/British | Britisb | White British | Birtish | English / British | British/Turkish | Dual Usa/British | British/Swiss | Biritish | Britishu | Britisah | European British | British / Scottish | British & Israeli | British Swiss | Scotish | British Welsh | Britisn | Briti | Britihs & Irish | Britishi | Brfitish | Usa And British | American / British | British-United Kingdom | British Usa | Britisg | Israeli/British | Britih | Welsh British | Us & British | British Indian | British Asian | B Ritish | Emaratis | British/Bosnian | White Brtitish | British - English | Welsh/British | German/British | British & Irish | British-Israeli | British / Greek | Great British | Beitish | White Uk British | Belizean & British | Brithish English | Brituish | Britiash | Indian British | British Caribbean | Swedish/British | Britisjh | British Amercian | Britisk | Turkish/British | Brtiish | Br5itish | Brritish | Welsh, British | Brtitsh | U.K British | Britidh | Kurdish/British | English British | Brith | Irish/British | Britisj | British/Pakistan | I'M British | Britisih | American & British | British / Welsh | British / 
Swiss | Brittsh | British Icelandic | Swiss / British | Brotish | British Sikh | English/British | Britiswh | Bristsh | British European | British And Usa | British / Israeli | British Bengali | British Afghan | Brithsh | Brit6ish | British/Indian | British/Libyan | British/Polish | British Israeli | British National | Swiss British | Briritsh | Britishh | British / Irish | Brithis | Britshi | British And Thai | Britush | Britiss | British, English | Bfritish | Btritish | Brisitsh | White English | British/Mosotho | Usa & British | British/ Eu National | Finnish/British | Israeli + British | British And Polish | Bartish | Nritish | Brishish | British Manx | German And British | Britiosh | British (Bermudian) | Britishbritish | Naturalised British | English - British | Welsh - British | Dual American/British | British,Uk | British And Us | Uk Brittish | British Overseas | British & Swiss | English-British | British & Polish | Us/British | Swiss & British | British And Greek | Iraqi, British | Breitish | Black British | U.K. British | Afghan British | Brit / English | British/Asian | Awhite British | Asian British | British / Polish | Caucasian British | Britosh | Bristih | Britsish | British Libyan | Britisth | Brisish | British & Spanish | Britinsh | Britisht | Britsith | Britash | Irish / British | Brisitish | Brirtsh | Bruitish | Dutch / British | Bristis | Ritish | Welsh, Bristish | British Resident | British And French | British/ English | British (Welsh) | French/British | Dual British - French | Bristiah | Great Britain & Usa | British & Us | Uk Scottish | British Scott | Brititish | Dual: British, Usa | .British | British (Scots) | Scottish Uk | British/Scottish | Brittiish | British-Irish | Btittish | Scottish. | Britisy | Bruttish | Dual British Irish | Scottish/British

In passing, English matched best with Bangladeshi, so we maybe need to tweak the lookup somewhere, perhaps adding English, Scottish, Northern Irish, Welsh, and maybe the names of UK counties, into the fuzzyset, and then in post-processing mapping from these to British?

Also by the by, word had it that Companies House didn’t consider there to be any likely significant data quality issues with this field… so that’s alright then….

PS For various fragments of code I used to have a quick look at the nationality data, see this gist. If you look through the fuzzy matchings to the FCO nationalities, you’ll see there are quite a few false attributions. It would be sensible to look at the confidence ratings on the matches, and perhaps set thresholds for automatically allocating submitted nationalities to canonical nationalities. In a learning system, it may be possible to bootstrap – add high confidence mappings to the fuzzyset (with a map to the canonical nationality) and then try to match again the nationalities still unmatched at a particular level of confidence?
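Here is a rough sketch of what that thresholded bootstrapping might look like. Again, it uses `difflib` from the standard library as a stand-in for fuzzyset, the threshold value is arbitrary, and the function and parameter names are hypothetical. The optional `aliases` argument is one way of handling things like English or Scottish, by seeding them with a mapping to British.

```python
import difflib


def similarity(a, b):
    """Similarity score between 0 and 1, ignoring case."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def bootstrap_match(declared_values, canonical, aliases=None, accept=0.85, rounds=3):
    """Map declared nationalities to canonical ones, feeding high
    confidence matches back in as extra match targets for the next round.

    Returns (known, unmatched): `known` maps every accepted spelling to a
    canonical nationality; `unmatched` is the set still below threshold.
    """
    known = {c: c for c in canonical}
    known.update(aliases or {})  # e.g. {"English": "British"}
    unmatched = set(declared_values) - set(known)
    for _ in range(rounds):
        newly_matched = {}
        for value in unmatched:
            target, score = max(
                ((k, similarity(value, k)) for k in known), key=lambda pair: pair[1]
            )
            if score >= accept:
                newly_matched[value] = known[target]
        if not newly_matched:
            break  # nothing new learned this round
        known.update(newly_matched)
        unmatched -= set(newly_matched)
    return known, unmatched


known, unmatched = bootstrap_match(
    ["Britsh", "Brtsh", "English", "Martian"],
    canonical=["British", "Irish"],
    aliases={"English": "British"},
)
print(known)
print(unmatched)
```

The risk with bootstrapping like this is drift: each accepted typo becomes a new target, so a false attribution accepted in an early round can pull further bad matches in behind it. Keeping the acceptance threshold high, and eyeballing what gets added in each round, seems sensible.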

You’ll Know the Drones Are Coming When…

… legislation, regulations and codes of conduct mention them.

For example, I spotted a scene in this video today – The First Starship Robot Delivery in Redwood City, California – showing a delivery bot trundling its way through city streets…

[Image: still from “The First Starship Robot Delivery in Redwood City, California” (YouTube)]

which got me wondering: does the driver have to give way and stop?

Cue The Highway Code and The Zebra, Pelican and Puffin Pedestrian Crossings Regulations and General Directions 1997.

First, The Highway Code:

[Image: excerpt from The Highway Code, “Using the road (159 to 203)”, gov.uk guidance]

Hmm… nothing about drones or delivery bots there…

What do the regulations say?

Significance of give-way lines at Zebra crossings

14.  A give-way line included in the markings placed pursuant to regulation 5(1)(b) and Part II of Schedule 1 shall convey to vehicular traffic proceeding towards a Zebra crossing the position at or before which a vehicle should be stopped for the purpose of complying with regulation 25 (precedence of pedestrians over vehicles at Zebra crossings).

Precedence of pedestrians over vehicles at Zebra crossings

25.—(1) Every pedestrian, if he is on the carriageway within the limits of a Zebra crossing, which is not for the time being controlled by a constable in uniform or traffic warden, before any part of a vehicle has entered those limits, shall have precedence within those limits over that vehicle and the driver of the vehicle shall accord such precedence to any such pedestrian.

(2) Where there is a refuge for pedestrians or central reservation on a Zebra crossing, the parts of the crossing situated on each side of the refuge for pedestrians or central reservation shall, for the purposes of this regulation, be treated as separate crossings.

See also recent news reports about how the First self-driving cars will be unmarked so that other drivers don’t try to bully them

Time to set up an alert on things like: drone OR unmanned site:www.legislation.gov.uk/uksi

And for example, we already have things like The Air Navigation Order 2016 which covers “Small unmanned aircraft” and “Small unmanned surveillance aircraft” (as referenced in The Air Navigation (Restriction of Flying) (Wales Rally GB) Regulations 2016) or The Air Navigation (Restriction of Flying) (Nuclear Installations) Regulations 2016 which references “small unmanned aircraft”.

PS The above reminds me…

[Photo]

Spectator Centric Motor Racing Circuit Commentary

A bit over a decade ago, and several times since, I’ve idly wondered about being able to compete virtually in replay of an actual sporting event (Re:Play – The Future of Sports Gaming? “I’ll Take it From Here…”). Every so often, the idea pops up again (for example, Real racing in the virtual world), but now, it seems that real time gaming against live F1 racers [is] “only two years away”:

“We launched our virtual Grand Prix channel this year, which gives us the platform to produce a fully virtual version of the race live using the data,” said Morrison [John Morrison, Chief Technical Officer, Formula One Management]. “The thing we have to crack is we have to produce accurate positioning.
“Then we can do the gaming stuff and you can be in the car racing against other drivers. I reckon we are about two years away from that. We need accuracy to the nearest centimetre, so cars aren’t touching when they shouldn’t be touching. Right now we are more at 100-200mm accuracy.”

Whatever…

With multiple cameras offering 360 views, there are increasing opportunities for providing customised viewing perspectives using real footage. But simulated views from arbitrary viewpoints are also possible. For example, think of the virtual camera views that can be generated by Hawk Eye over a snooker table and then apply the same thing to 3D rendered models of F1 cars as they drive round a circuit (which has also been lidar scanned):

But that’s video… What about providing audio commentaries for spectators at a circuit that are created specifically for the listener according to where they are on the circuit?

For example, as a particular car goes by, I want my personal commentary to tell me what position they are in, as well as having bits of more general commentary about what’s going on elsewhere on the circuit. Through knowing the position of the cars on the circuit, and the position of the listener on the circuit (for example, based on wifi hotspot triangulation), we should be able to automatically generate a textual commentary that passes on information about the cars that the spectator can see from their current location, and then render that commentary to audio via a text to speech service.
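As a very rough sketch of the text generation step, the following picks out the cars near a spectator and produces a line of commentary for each. It makes some big simplifying assumptions: car and spectator locations are given as a distance along the lap from the start line, “can see” is modelled as being within a fixed distance along the track (real sightlines will be messier), and the driver names, positions and lap length are invented. The text to speech step is not shown.

```python
from dataclasses import dataclass


@dataclass
class Car:
    driver: str
    race_position: int
    track_distance_m: float  # distance along the lap from the start line


def gap_along_track(a_m, b_m, lap_length_m):
    """Shortest distance between two points on a closed circuit."""
    gap = abs(a_m - b_m) % lap_length_m
    return min(gap, lap_length_m - gap)


def local_commentary(cars, spectator_distance_m, lap_length_m, visible_range_m=150.0):
    """Generate a line of commentary for each car near the spectator,
    ordered by race position."""
    visible = [
        car
        for car in cars
        if gap_along_track(car.track_distance_m, spectator_distance_m, lap_length_m)
        <= visible_range_m
    ]
    visible.sort(key=lambda car: car.race_position)
    return [
        f"{car.driver}, currently P{car.race_position}, is passing your section of the track."
        for car in visible
    ]


# Hypothetical snapshot of car locations on a 5 km circuit.
cars = [
    Car("Driver A", 1, 100.0),
    Car("Driver B", 2, 4950.0),
    Car("Driver C", 3, 2500.0),
]
for line in local_commentary(cars, spectator_distance_m=50.0, lap_length_m=5000.0):
    print(line)
```

The wrap-around in `gap_along_track` matters for a spectator sat near the start line, who should hear about a car approaching the end of the lap as well as one that has just started it.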

Increasingly, I think there is a market in the automated generation of sports commentaries from sports data; it’s just I hadn’t thought about generating commentaries from a particular perspective to support the viewing of a live event from a particular location (“location specific” or “location sensitive” commentary).

The Associated Press (AP) would perhaps agree, aspiring as they are to the automation of 80 percent of their content production by 2020 (The AP wants to use machine learning to automate turning print stories into broadcast ones). They’re also looking at generating multiple versions of the same story, appropriate for different formats, from a single source.

Apparently, [o]n average, when an AP sportswriter covers a game, she produces eight different versions of the same story. Aside from writing the main print story, they have to write story summaries, separate ledes for both teams, convert the story to broadcast format, and more. How much easier it would be to just write one version and then generate the alternative presentations from it, which leads to this:

… a cross-sectional team of five AP staffers has been working on developing a framework to automate the process of converting print stories to broadcast format.

The team built a prototype that just identifies elements in print stories that need to be altered for broadcast. (Stories are shorter, sentences are more concise, attribution comes at the beginning of a sentence, numbers are rounded, and more.)

Hmmm… for location specific commentaries, I see another possibility: a generic commentary about events happening across a motor-racing circuit, intercut with live, custom commentary relating to what the spectator can actually see in front of them at that time, as if the commentator were sat by their side.

Related: eg in terms of automatically generating race commentaries from data – Detecting Undercuts in F1 Races Using R.