Notes on Robot Churnalism, Part I – Robot Writers

In Some Notes on Churnalism and a Question About Two Sided Markets, I tried to pull together a range of observations about the process of churnalism, in which journalists propagate PR copy without much, if any, critique, contextualisation or corroboration.

If that view in any way represents a fair description of how some pre-packaged content, at least, makes its way through to becoming editorial content, where might the robots fit in? To what extent might we start to see “robot churnalism“, and what form or forms might it take?

There are two particular ways in which we might consider robot churnalism:

  1. “robot journalists” that produce copy acts as a third conveyor belt complementary to PA-style wire and PR feedstocks;
  2. robot churnalists as ‘reverse’ gatekeepers, choosing what wire stories to publish where based on traffic stats and web analytics.

A related view is taken by Philip Napoli (“Automated media: An institutional theory perspective on algorithmic media production and consumption.” Communication Theory 24.3 (2014): 340-360; a shorter summary of the key themes can be found here) who distinguishes roles for algorithms in “(a) media consumption and (b) media production”. He further refines the contributions algorithms may make in media production by suggesting that “[t]wo of the primary functions that algorithms are performing in the media production realm at this point are: (a) serving as a demand predictor and (b) serving as content creator.”

Robot Writers

“Automated content can be seen as one branch of what is known as algorithmic news” writes Christer Clerwall (2014, Enter the Robot Journalist, Journalism Practice, 8:5, pp519-531), a key component of automated journalism “in which a program turns data into a news narrative, made possible with limited — or even zero — human input” (Matt Carlson (2015) The Robotic Reporter, Digital Journalism, 3:3, 416-431).

In a case study based around the activities of Narrative Science, a company specialising in algorithmically created, data driven narratives, Carlson further conceptualises “automated journalism” as “algorithmic processes that convert data into narrative news texts with limited to no human intervention beyond the initial programming”. He goes on:

The term denotes a split from data analysis as a tool for reporters encompassed in writings about “computational and algorithmic journalism” (Anderson 2013) to indicate wholly computer-written news stories emulating the compositional and framing practices of human journalism (ibid, p417).

Even several years ago, Arjen van Dalen observed that “[w]ith the introduction of machine-written news computational journalism entered a new phase. Each step of the news production process can now be automated: “robot journalists” can produce thousands of articles with virtually no variable costs” (The Algorithms Behind the Headlines, Journalism Practice, 6:5-6, 648-658, 2012, p649).

Sport and financial reporting examples abound from the bots of Automated Insights and Narrative Science (for example, Notes on Narrative Science and Automated Insights or Pro Publica: How To Edit 52,000 Stories at Once, and more recently e.g. Robot-writing increased AP’s earnings stories by tenfold), with robot writers generating low-cost content to attract page views, “producing content for the long tail, in virtually no time and with low additional costs for articles which can be produced in large quantities” (ibid, p649).

Although writing back in 2012, van Dalen noted in his report on “the responses of the journalistic community to automatic content creation” that:

[t]wo main reasons are mentioned to explain why automated content generation is a trend that needs to be taken seriously. First, the journalistic profession is more and more commercialized and run on the basis of business logics. The automation of journalism tasks fits in with the trend to aim for higher profit margins and lower production costs. The second reason why automated content creation might be successful is the quality of stories with which it is competing. Computer-generated news articles may not be able to compete with high quality journalism provided by major news outlets, which pay attention to detail, analysis, background information and have more lively language or humour. But for information which is freely available on the Internet the bar is set relatively low and automatically generated content can compete (ibid, p651).

As Christer Clerwall writes in Enter the Robot Journalist, (Journalism Practice, 8:5, 2014, pp519-531):

The advent of services for automated news stories raises many questions, e.g. what are the implications for journalism and journalistic practice, can journalists be taken out of the equation of journalism, how is this type of content regarded (in terms of credibility, overall quality, overall liking, to mention a few aspects) by the readers? p520.

van Dalen puts it thus:

Automated content creation is seen as serious competition and a threat for the job security of journalists performing basic routine tasks. When routine journalistic tasks can be automated, journalists are forced to offer a better product in order to survive. Central in these reflections is the need for journalists to concentrate on their own strengths rather than compete on the strengths of automated content creation. Journalists have to become more creative in their writing, offer more in-depth coverage and context, and go beyond routine coverage, even to a larger extent than they already do today (ibid, p653).

He then goes on to produce the following SWOT analysis to explore just how the humans and the robots compare:

algo_behind_headlines

One possible risk associated with the automated production of copy is that it becomes published without human journalistic intervention, and as such is not necessarily “known”, or even read, by any member at all of the publishing organisation. To paraphrase Daniel Jackson and Kevin Moloney, “Inside Churnalism: PR, journalism and power relationships in flux”, Journalism Studies, 2015, this would represent an extreme example of churnalism in the sense of “the use of unchecked [robot authored] material in news”.

This is dangerous, I think, on many levels. The more we leave the setting of the news agenda and the identification of news values to machines, the more we lose any sensitivity to what’s happening in the world around us and what stories are actually important to an audience as opposed to merely being Like-bait titillation. (As we shall see, algorithmic gatekeepers that channel content to audiences based on various analytics tools respond to one definition of what audiences value. But it is not clear that these are necessarily the same issues that might weigh more heavily in a personal-political sense. Reviews of the notion of “hard” vs. “soft” news (e.g. Scherr, S., & Legnante, G. (2011). Hard and soft news: A review of concepts, operationalizations and key findings. Journalism, 13(2) pp221–239)) may provide lenses to help think about this more deeply?)

Of course, machines can also be programmed to look for links and patterns across multiple sources of information and at far greater scale than a human journalist could hope to cover, but we are then in danger of creating some sort of parallel news world, where events are only recognised, “discussed” and acted upon by machines and human actors are oblivious to them. (For an example, The Wolf of Wall Tweet: A Web-reading bot made millions on the options market. It also ate this guy’s lunch that describes how bots read the news wires and trade off the back them. They presumably also read wire stories created by other bots…)

So What It Is That Robot Writers Actually Do All Day?

In a review of Associated Press’ use of Automated Insight’s Wordsmith application (In the Future, Robots Will Write News That’s All About You), Wired reported that Wordsmith “essentially does two things. First, it ingests a bunch of structured data and analyzes it to find the interesting points, such as which players didn’t do as well as expected in a particular game. Then it weaves those insights into a human readable chunk of text.”

One way of getting deeper into the mind of a robot writer is to look to the patents held by the companies who develop such applications. For example, in The Anatomy of a Robot Journalist, one process used by Narrative Science is characterised as follows:

narrativeScience

Identifying newsworthy features (or story points) is a process of identifying features and then filtering out the ones that are somehow notable. Angles are possibly defined as in terms of sets of features that need to be present within a particular dataset for that angle to provide a possible frame for story. The process of reconciling interesting features with angle points populates the angle with known facts, and a story engine then generates the natural language text within a narrative structure suited to an explication of the selected angle.

(An early – 2012 – presentation by Narrative Science’s Larry Adams also reviews some of the technicalities: Using Open Data to Generate Personalized Stories.)

In actual fact, the process may be a relatively straightforward one, as demonstrated by the increasing numbers of “storybots” that populate social media. One well known class of examples are earthquake bots that tweet news of earthquakes (see also: When robots help human journalists: “This post was created by an algorithm written by the author”). (It’s easy enough to see various newsworthiness filters might work here: a geo-based one for reporting a story locally, a wider interest one for reporting an earthquake above a particular magnitude, and so on.)

It’s also easy enough to create your own simple storybot (or at least, an “announcer bot”) using something like IFTT that can take in an RSS feed and make a tweet announcement about each new item. A collection of simple twitterbots produced as part of a journalism course on storybots, along with code examples, can be found here: A classroom experiment in Twitter Bots and creativity. Here’s another example, for a responsive weatherbot that tries to geolocate someone sending a message to the bot and respond to them with a weather report for their location.


Not being of a journalistic background, and never having read much on media or communications theory, I have to admit I don’t really have a good definition for what angles are, or a typology for them in different topic areas, and I’m struggling to find any good structural reviews of the idea, perhaps because it’s so foundational? For now, I’m sticking with a definition of “an angle” as being something along the lines of the thing you want focus on and dig deeper around within the story (the thing you want to know more about or whose story you want to tell; this includes abstract things: the story of an indicator value for example, over time). The blogpost Framing and News Angles: What is Bias? contrasts angles with the notions of framing and bias. Entman, Robert M. “Framing: Towards clarification of a fractured paradigm.” McQuail’s reader in mass communication theory (1993): 390-397 [pdf] seems foundational in terms of the framing idea, De Vreese, Claes H. “News framing: Theory and typology.” Information design journal & document design 13.1 (2005): 51-62 [PDF] offers a review (of sorts) of some related literature, and Reinemann, C., Stanyer, J., Scherr, S., & Legnante, G. (2011). Hard and soft news: A review of concepts, operationalizations and key findings. Journalism, 13(2) pp221–239 (PDF) perhaps provides another way in to related literature? Bias is presumably implicit in the selection of any particular frame or angle? Blog posts such as What makes a press release newsworthy? It’s all in the news angle look to be linkbait, perhaps even stolen content (eg here’s a PDF), but I can’t offhand find a credible source or inspiration for the original list? Resource packs like this one on Working with the Media from the FAO gives a crash course into what I guess are some of the generally taught basics around story construction?


One comment

  1. Pingback: The Week in Robots