If Only I’d Been More Focussed… National-Local Data Robot Media Wire

And so it came to pass that Urbs Media started putting out their Arria NLG generated local data stories, customised from national data sets, on the PA news wire, as reported by the Press Gazette (First robot-written stories from Press Association make it into print in ‘world-first’ for journalism industry) and Hold the Front Page (Regional publishers trial new PA robot reporting project).

Ever keen to try new approaches out, my local hyperlocal, OnTheWight, have already run a couple of the stories. Here’s an example: Few disadvantaged Isle of Wight children go to university, figures show.

Long term readers might remember that this approach is one that OnTheWight have explored before, of course, as described in OnTheWight: Back at the forefront of next wave of automated article creation.

Back in 2015, I teamed up with them to explore some ideas around “robot journalism”, reusing some of my tinkerings to automate the production of a monthly data story OnTheWight run around local jobless statistics. You can see a brief review from the time here and an example story from June 2015 here. The code was actually developed a bit further to include some automatically generated maps (example) but the experiment had petered out by then (“musical differences”, as I recall it!;-) (I think we’re talking again now.. ;-) I’d half imagined actually making a go of some sort of offering around this, but hey ho… I still have some related domains I bought on spec at the time…

At the time, we’d been discussing what to do next. The “Big Idea” as I saw it was that doing the work of churning through a national dataset, with data at the local level, once (for OnTheWight), meant that the work was already done for everywhere.


To this end, I imagined a “datawire” – you can track the evolution of that phrase through OUseful.info posts here – that could be used to distribute localised press releases automatically generated from national datasets. One of the important things for OnTheWight was getting data reports out quickly once a data set had been released. (I seem to remember we raced each other – the manual route versus the robot one.) My tools weren’t fully automated – I had to keep hitting reload to fetch the data rather than having a cron job start pinging the Nomis website around the time of the official release – but that was as much because I didn’t run any servers as anything. One thing we did do was automatically push the robot generated story into the OnTheWight WordPress blog draft queue, from where it could be checked and published by a human editor. The images were handled circuitously (I don’t think I had a key to push image assets to the OnTheWight image server?)
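As a rough illustration of the “keep hitting reload” step done by machine, here is a minimal polling sketch. It is not the code used at the time: the `fetch` callable is a hypothetical stand-in for whatever downloads the current data (no real Nomis endpoint is assumed), and the interval and attempt limits are arbitrary.

```python
import hashlib
import time
from typing import Callable, Optional


def fingerprint(payload: bytes) -> str:
    """Return a stable digest of the downloaded data."""
    return hashlib.sha256(payload).hexdigest()


def wait_for_release(
    fetch: Callable[[], bytes],
    last_seen: str,
    interval_seconds: float = 60.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Poll `fetch` until the data differs from `last_seen`, or give up.

    `fetch` is a hypothetical callable that downloads the current dataset;
    `last_seen` is the fingerprint of the previous release.
    """
    for attempt in range(max_attempts):
        payload = fetch()
        if fingerprint(payload) != last_seen:
            return payload  # something new has been published
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before trying again
    return None  # nothing new within the allowed number of attempts
```

A scheduler such as cron would start this shortly before the official release time, and whatever comes back would be handed on to the report generator.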

The data wire idea was actually sketched out a couple of years ago at a community journalism conference (Time for a Local Data Wire?), and that was perhaps where our musical differences about the way forward started to surface? :-(

One thing you may note is the focus on producing press releases, with the intention that a journalist could build a story around the data product, rather than the data product standing in wholesale for a story.

I’m not sure this differs much from the model being pursued by Urbs Media, the organisation that’s creating the PA data stories, and that is funded in part at least by a Google Digital News Initiative (DNI) grant: PA awarded €706,000 grant from Google to fund a local news automation service in collaboration with Urbs Media.

FWIW, give me three quarters of a million squids, or Euros, and that’d do me as a private income for the rest of my working life; which means I’d be guilt free enough to play all the time…!

One of the things that I think the Urbs stories are doing is including quotes on the national statistical context taken from the original data release. For example:

Which reminds me – I started to look at the ONS JSON API when it appeared (example links), but don’t think I got much further than an initial play... One to revisit, to see if it can be used as a source from which automated quote extraction is possible…

Our original job stats stories never really evolved as far as being the inspiration for contextualising reporting – they were more or less a literal restatement of the “data generated press release”. I seem to recall that this notion of data-to-text-to-published-copy started to concern me, and I began to explore it in a series of posts on “robot churnalism” (for example, Notes on Robot Churnalism, Part I – Robot Writers and Notes on Robot Churnalism, Part II – Robots in the Journalism Workplace).

(I don’t know how many of the stories returned in that search were from PA stories. I think that regional news group operators such as Johnston Press and Archant also run national units producing story templates that can be syndicated, so some templated stories may come from there.)

I think there are a couple more posts in that series still in my draft queue somewhere which I may need to finish off… Perhaps we’ll see, as the new stories start to play out, whether the copy is reprinted as is or used to inspire more contextualised local reporting around the data.

I also recall presenting on the topic of “Robot Writers” at ILI in 2016 (I wasn’t invited back this year:-(

So what sort of tech is involved in producing the PA data wire stories? From the preview video on the Urbs Media website, the technology behind the Radar project – Reporters and Data and Robots – looks to be the Articulator Lite application developed by Arria NLG. If you haven’t been keeping up, Arria NLG is the UK equivalent of companies like Narrative Science and Automated Insights in the US, which I’ve posted about on and off for the last few years (for example, Notes on Narrative Science and Automated Insights).

Anyway, it’ll be interesting to see how the PA / Urbs Media thing plays out. I don’t know if they’re automating the charts’n’maps production thing yet, but if they do then I hope they generate easily skinnable graphic objects that can be themed using things like ggthemes or matplotlib styles.

There’s a stack of practical issues and ethical issues associated with this sort of thing, and it’ll be interesting to see if any concerns start to be aired, or oopses appear. The reporting around the Births by parents’ characteristics in England and Wales: 2016 release could easily be seen as judgemental, for example.

PS I wonder if they run a Slack channel data wire? (Slackbot Data Wire, Initial Sketch.) Maybe there’s still a gap in the market for one of my ideas?! ;-)

I’ve Not Been Keeping Up With Robot Journalists (and Completely Missed Mentioning Document Automation Before Now…)

It’s been some time since I last had a look around at where people are at with robot journalism. The companies that tend to come to mind (for me) in this area are still Automated Insights (US, Wordsmith), Narrative Science (US, Quill), Arria NLG (UK), AX Semantics (DE), Yseop (FR, Compose) and Tencent (CN, Dreamwriter) (was it really four years ago when I first started paying attention to this?!) but things have moved on since then, so I probably need to do another round up…

A recent addition I hadn’t noticed comes from the Washington Post, (owner: a certain Mr Jeff Bezos), and their Heliograf tool (via Wired: What News-Writing Bots Mean for the Future of Journalism); this currently seems to write under the byline powered by Heliograf and had its first major outing covering the 2016 Olympics (The Washington Post experiments with automated storytelling to help power 2016 Rio Olympics coverage); since then, it’s also moved into US election reporting (The Washington Post to use artificial intelligence to cover nearly 500 races on Election Day).

The model appears, in part, to be using Heliograf as a drafting tool, which is one of the more interesting ways I always thought this sort of stuff would play out: “using Heliograf’s editing tool, Post editors can add reporting, analysis and color to stories alongside the bot-written text. Editors can also overwrite the bot text if needed” (my emphasis, via).

It seems that automated election reporting has also been used in Finland. Via @markkuhanninen, “[o]ur national broadcasting company Yle did the same for elections and some sports news (NHL ice hockey). All articles seem to be in Finnish.”

(Hmmm… thinks… maybe I should have tried doing something for the UK General Election? Anyone know if anyone was using robot journalists to produce ward or constituency level reports for the UK general election or local elections?)

Looking back to 2015, Swedish publisher Mittmedia (interview) look like they started off (as many do) with simple weather reporting, as well as using NLG to report internally on media stats. The Swedish national wire service TT Nyhetsbyrån also look to have started building their own robot reporter (TT building “reporter robot”) which perhaps made its debut last November? “The first robot reporter has left the factory on @ttnyhetsbyran and given out in reality. Hope it behaves” (translated by a bot…) (@matsrorbecker).

And in Norway, it looks like the Norwegian news agency NTB also started out with some automated sports reports last year, I think using orbit.ai? (Norwegian News Agency is betting on automation for football coverage, and an example of the recipe: Building a Robot Journalist).

2016 also saw Bloomberg start to look at making more use of automation: Bloomberg EIC: Automation is ‘crucial to the future of journalism’.

Offhand, I haven’t found a specific mention of Thomson Reuters using automation for producing business reports (although I suspect they do), but I did notice that they appear to have been in the document automation game for years with their Contract Express application, a template solution that supports the automated creation of legal documents. A quick skim around suggests that document automation is a commodity level service in the legal industry, with lots of players offering a range of template based products.

Thinking in terms of complexity, I wonder if it’s useful imagining automated journalism in the context of something like: mail merge, document automation (contracts), automated reporting (weather, sports, financial, election, …)? Certainly, weather reporting and sports reporting appear to be common starting points, perhaps because they are “low risk” as folk get comfortable with producing and publishing automated copy.

I also wonder (again?) about how bylines are used, and have been evolving, to attribute the automated creation of news content. Is anyone maintaining a record or collection of such things, I wonder?

Robot Journalists or Robot Press Secretaries? Why Automated Reporting Doesn’t Need to be That Clever

Automated content generators, aka robot journalists, are turning up everywhere at the moment, it seems: the latest to cross my radar being a mention of “Dreamwriter” from Chinese publisher Tencent (End of the road for journalists? Tencent’s Robot reporter ‘Dreamwriter’ churns out perfect 1,000-word news story – in 60 seconds) to add to the other named narrative language generating bots I’m aware of, Automated Insights’ Wordsmith and Narrative Science’s Quill, for example.

Although I’m not sure of the detail, I assume that all of these platforms make use of quite sophisticated NLG (natural language generation) algorithms, to construct phrases, sentences, paragraphs and stories from atomic facts, identified story points, journalistic tropes and encoded linguistic theories.

One way of trying to unpick the algorithms is to critique, or even try to reverse engineer, stories known to be generated by the automatic content generators, looking for clues as to how they’re put together. See for example this recent BBC News story on Robo-journalism: How a computer describes a sports match.

Chatting to media academic Konstantin Dörr/@kndoerr in advance of the Future of Journalism conference in Cardiff last week (I didn’t attend the conference, just took the opportunity to grab a chat with Konstantin a couple of hours before his presentation on the ethical challenges of algorithmic journalism) I kept coming back to thoughts raised by my presentation at the Community Journalism event the day before [unannotated slides] about the differences between what I’m trying to explore and these rather more hyped-up initiatives.

In the first place, as part of the process, I’m trying to stay true to posting relatively simple – and complete – recipes that describe the work I’m doing so that others can play along. Secondly, in terms of the output, I’m not trying to do the NLG thing. Rather, I’m taking a template based approach – not much more than a form letter mail merge approach – to putting data into a textual form. Thirdly, the audience for the output is not the ultimate reader of a journalistic piece; rather, the intended audience is an intermediary, a journalist or researcher who needs an on-ramp providing them with useable access to data relevant to them that they can then use as the possible basis for a story.

In other words, the space I’m exploring is in part supporting end-user development / end user programming (for journalist end-users, for example), in part automated or robotic press secretaries (not even robot reporters; see for example Data Reporting, not Data Journalism?) – engines that produce customised press releases from a national dataset at a local level that report a set of facts in a human readable way, perhaps along with supporting assets such as simple charts and very basic observational analysis (this month’s figures were more than last month’s figures, for example).

This model is one that supports a simple templated approach for a variety of reasons:

  1. each localised report has the same form as any other localised report (eg a report on jobseeker’s allowance figures for the Isle of Wight can take the same form as a report for Milton Keynes);
  2. it doesn’t matter so much if the report reads a little strangely, as long as the facts and claims are correct, because the output is not intended for final publication, as is, to the public – rather, it could be argued that it’s more like a badly written, fact based press statement that at least needs to go through a copy editor! In other words, we can start out scruffy…
  3. the similarity in form of one report to another is not likely to be a distraction to the journalist in the way that it would be to a general public reader presented with several such stories and expecting an interesting – and distinct – narrative in each one. Indeed, the consistent presentation might well aid the journalist in quickly spotting the facts and deciding on a storyline and what contextualisation may be required to add further interpretative value to it.
  4. targeting intermediary users rather than end users: the intermediary users all get to add their own value or style to the piece before the wider publication of the material, or use the data in support of other stories. That is, the final published form is not decided by the operator of the automatic content generator; rather, the automatically generated content is there to be customised, augmented, or used as supporting material, by an intermediary, or simply act as a “conversational” representation of a particular set of data provided to an intermediary.
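To make the “form letter mail merge” idea concrete, here is a minimal sketch using Python’s standard `string.Template`. The wording of the template, the function names and the figures in the usage note are all invented for illustration; this is not the template actually used for the OnTheWight stories.

```python
from string import Template

# One template serves every area: only the merged-in values change.
REPORT = Template(
    "In $month, $count people in $area were claiming Jobseeker's Allowance, "
    "$comparison."
)


def describe_change(current: int, previous: int, previous_label: str) -> str:
    """Very basic observational analysis: more, fewer or the same."""
    difference = current - previous
    if difference > 0:
        return f"{difference} more than in {previous_label}"
    if difference < 0:
        return f"{-difference} fewer than in {previous_label}"
    return f"the same number as in {previous_label}"


def local_release(area: str, month: str, count: int,
                  previous_month: str, previous_count: int) -> str:
    """Merge one area's figures into the shared press-release template."""
    return REPORT.substitute(
        area=area,
        month=month,
        count=f"{count:,}",
        comparison=describe_change(count, previous_count, previous_month),
    )
```

Calling `local_release("the Isle of Wight", "June", 1200, "May", 1250)` with those made-up figures gives a single scruffy but factual sentence that a copy editor can then tidy up; calling it with Milton Keynes figures gives a report of exactly the same form.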


The generation of the local datasets from the national dataset is trivial – having generated code to slice out one dataset (by postcode or local authority, for example), we can slice out any other. The generation of the press releases from the local datasets can make use of the same template. This can be applied locally (a hyperlocal using its own template, for example) or centrally created and managed as part of a datawire service.
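The “slice out one, slice out any” point can be sketched in a few lines. The column names and the figures in `NATIONAL_CSV` below are invented for illustration and do not reflect the layout of any real national release.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for a national dataset with one row per area per month.
NATIONAL_CSV = """area,month,claimants
Isle of Wight,May,1250
Isle of Wight,June,1200
Milton Keynes,May,2100
Milton Keynes,June,2150
"""


def slice_by_area(national_csv: str) -> dict:
    """Split a national table into one small local dataset per area."""
    local = defaultdict(list)
    for row in csv.DictReader(io.StringIO(national_csv)):
        local[row["area"]].append(
            {"month": row["month"], "claimants": int(row["claimants"])}
        )
    return dict(local)
```

Each value in the returned dictionary is a local dataset that can be passed to the same press-release template, whichever area it is for.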

At the moment, the couple of automatically generated stories published with OnTheWight have been simple fact reporting, albeit via a human editor, rather than acting as the starting point for a more elaborate, contextualised, narrative report. But how might we extend this approach?

In the case of Jobseeker’s Allowance figures, contextualising paragraphs such as the recent closure of a local business, or the opening of another, as possible contributory factors to any month on month changes to the figures, could add colour or contextualisation to a monthly report.

Or we might invert the use of the figures, adding them as context to workforce, employment or labour related stories. For example, in the event of a company closure, we might contextualise the number of jobs lost relative to local unemployment figures. (This fact augmented reporting is more likely to happen if the figures are readily available/to hand, as they are via autoresponder channels such as a Slackbot Data Wire.)

But I guess we have to start somewhere! And that somewhere is the simple (automatically produced, human copy edited) reporting of the facts.

PS in passing, I note via Full Fact that the Department of Health “will provide press officers [with an internal ‘data document’] with links to sources for each factual claim made in a speech, as well as contact details for the official or analyst who provided the information”, Department of Health to speed up responses to media and Full Fact. Which gets me thinking: what form might a press office publishing “data supported press releases” take, cf. a University Expert Press Room or Social Media Releases and the University Press Office, for example?

Robot Journalism in Germany

By chance, I came across a short post by uber-ddj developer Lorenz Matzat (@lorz) on robot journalism over the weekend: Robot journalism: Revving the writing engines. Along with a mention of Narrative Science, it namechecked another company that was new to me: [b]ased in Berlin, Retresco offers a “text engine” that is now used by the German football portal “FussiFreunde”.

A quick scout around brought up this Retresco post on Publishing Automation: An opportunity for profitable online journalism [translated] and their robot journalism pitch, which includes “weekly automatic Game Previews to all amateur and professional football leagues and with the start of the new season for every Game and detailed follow-up reports with analyses and evaluations” [translated], as well as finance and weather reporting.

I asked Lorenz if he was dabbling with such things and he pointed me to AX Semantics (an Aexea GmbH project). It seems their robot football reporting product has been around for getting on for a year or so (Robot Journalism: Application areas and potential [translated]), which makes me wonder how siloed my reading has been in this area.

Anyway, it seems as if AX Semantics have big dreams. Like heralding Media 4.0: The Future of News Produced by Man and Machine:

The starting point for Media 4.0 is a whole host of data sources. They share structured information such as weather data, sports results, stock prices and trading figures. AX Semantics then sorts this data and filters it. The automated systems inside the software then spot patterns in the information using detection techniques that revolve around rule-based semantic conclusion. By pooling pertinent information, the system automatically pulls together an article. Editors tell the system which layout and text design to use so that the length and structure of the final output matches the required media format – with the right headers, subheaders, the right number and length of paragraphs, etc. Re-enter homo sapiens: journalists carefully craft the information into linguistically appropriate wording and liven things up with their own sugar and spice. Using these methods, the AX Semantics system is currently able to produce texts in 11 languages. The finishing touches are added by the final editor, if necessary livening up the text with extra content, images and diagrams. Finally, the text is proofread and prepared for publication.

A key technology bit is the analysis part: “the software then spot patterns in the information using detection techniques that revolve around rule-based semantic conclusion”. Spotting patterns and events in datasets is an area where automated journalism can help navigate the data beat and highlight things of interest to the journalist (see for example Notes on Robot Churnalism, Part I – Robot Writers for other takes on the robot journalism process). If notable features take the form of possible story points, narrative content can then be generated from them.
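As a toy illustration of rule-based feature spotting (and certainly not a description of how AX Semantics do it), the sketch below applies two hand-written rules to a short time series and returns candidate story points. The function name, the 5% threshold and the phrasing of the story points are all assumptions.

```python
def notable_features(series, min_change_pct=5.0):
    """Apply simple rules to a date-ordered list of (label, value) pairs.

    Returns a list of short strings, each a possible story point.
    """
    points = []
    if len(series) < 2:
        return points
    (prev_label, prev), (label, latest) = series[-2], series[-1]
    earlier = [value for _, value in series[:-1]]

    # Rule 1: is the latest figure a new high or low for the period covered?
    if latest > max(earlier):
        points.append(f"{label} is the highest figure in the last {len(series)} periods")
    elif latest < min(earlier):
        points.append(f"{label} is the lowest figure in the last {len(series)} periods")

    # Rule 2: is the change since the previous period big enough to mention?
    if prev:
        change = 100.0 * (latest - prev) / prev
        if abs(change) >= min_change_pct:
            direction = "rise" if change > 0 else "fall"
            points.append(f"a {abs(change):.1f}% {direction} since {prev_label}")
    return points
```

An empty list means the rules found nothing worth flagging; anything else is a prompt for the journalist, or raw material for a text generator.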

To support the process, it seems as if AX Semantics have been working on a markup language: ATML3 (I’m not sure what it stands for? I’d hazard a guess at something like “Automated Text ML” but could be very wrong…) A private beta seems to be in operation around it, but some hints at tooling are starting to appear in the form of ATML3 plugins for the Atom editor.

One to watch, I think…

Notes on Robot Churnalism, Part II – Robots in the Journalism Workplace

In the first part of this series (Notes on Robot Churnalism, Part I – Robot Writers), I reviewed some of the ways in which robot writers are able to contribute to the authoring of news content.

In this part, I will consider some of the impacts that might arise from robots entering the workplace.

Robot Journalism in the Workplace

“Robot journalists” have some competitive advantages which are hard for human journalists to compete with. The strengths of automated content generation are the low marginal costs, the speed with which articles can be written and the broad spectrum of sport events which can be covered.
Arjen van Dalen, The Algorithms Behind the Headlines, Journalism Practice, 6:5-6, 648-658, 2012, p652

One thing machines do better is create value from large amounts of data at high speed. Automation of process and content is the most under-explored territory for reducing costs of journalism and improving editorial output. Within five to 10 years, we will see cheaply produced information monitored on networks of wireless devices.
Post Industrial Journalism: Adapting to the Present, Chris Anderson, Emily Bell, Clay Shirky, Tow Center for Digital Journalism Report, December 3, 2014

Year on year, it seems, the headlines report how the robots are coming to take over a wide range of professional jobs and automate away the need to employ people to fill a wide range of currently recognised roles (see, for example, this book: The Second Machine Age [review], this Observer article: Robots are leaving the factory floor and heading for your desk – and your job, this report: The Future of Employment: How susceptible are jobs to computerisation? [PDF], this other report: AI, Robotics, and the Future of Jobs [review], and this business case: Rethink Robotics: Finding a Market).

Stories also abound fearful of a possible robotic takeover of the newsroom: ‘Robot Journalist’ writes a better story than human sports reporter (2011), The robot journalist: an apocalypse for the news industry? (2012), Can an Algorithm Write a Better News Story Than a Human Reporter? (2012), Robot Writers and the Digital Age (2013), The New Statesman could eventually be written by a computer – would you care? (2013), The journalists who never sleep (2014), Rise of the Robot Journalist (2014), Journalists, here’s how robots are going to steal your job (2014), Robot Journalist Finds New Work on Wall Street (2015).

It has to be said, though, that many of these latter “inside baseball” stories add nothing new, perhaps reflecting the contributions of another sort of robot to the journalistic process: web search engines like Google…

Looking to the academic literature, in his 2015 case study around Narrative Science, Matt Carlson describes how “public statements made by its management reported in news about the company reveal two commonly expressed beliefs about how its technology will improve journalism: automation will augment— rather than displace — human journalists, and it will greatly expand journalistic output” p420 (Matt Carlson (2015), The Robotic Reporter, Digital Journalism, 3:3, 416-431).

As with the impact of many other technological innovations within the workplace, “[a]utomated journalism’s ability to generate news accounts without intervention from humans raises questions about the future of journalistic labor” (Carlson, 2015, p422). In contrast to the pessimistic view that “jobs will be lost”, there are at least two possible positive outcomes for jobs that may result from the introduction of a new technology: firstly, that the technology helps transform the original job and in so doing helps make it more rewarding, or allows the original worker to “do more”; secondly, that the introduction of the new technology creates new roles and new job opportunities.

On the pessimistic side, Carlson describes how:

many journalists … question Narrative Science’s prediction that its service would free up or augment journalists, including Mathew Ingram (GigaOm, April 25, 2012): “That’s a powerful argument, but it presumes that the journalists who are ‘freed up’ because of Narrative Science … can actually find somewhere else that will pay them to do the really valuable work that machines can’t do. If they can’t, then they will simply be unemployed journalists.” This view challenges the virtuous circle suggested above to instead argue that some degree of displacement is inevitable. (Carlson, 2015, p423)

On the other hand:

[a]ccording to the more positive scenario, machine-written news could be complementary to human journalists. The automation of routine tasks offers a variety of possibilities to improve journalistic quality. Stories which cannot be covered now due to lack of funding could be automated. Human journalists could be liberated from routine tasks, giving them more time to spend on quality, in-depth reporting, investigative reporting. (van Dalen, p653)

This view thus represents the idea of algorithms working alongside the human journalists, freeing them up from the mundane tasks and allowing them to add more value to a story… If a journalist has 20 minutes to spend on a story, and that time is spent searching a database and pulling out a set of numbers that may not even be very newsworthy, how much more journalistically productive could that journalist be if a machine gave them the data and a canned summary of it for free, then allowing the journalist to use the few minutes allocated to that story to take the next step – adding in some context, perhaps, or contacting a second source for comment?

A good example of the time-saving potential of automated copy production can be seen in the publication of earnings reports by AP, as reported by trade blog journalism.co.uk, who quoted vice president and managing editor Lou Ferrara’s announcement of a tenfold increase in stories from 300 per quarter produced by human journalists, to 3,700 with machine support (AP uses automation to increase story output tenfold, June, 2015).

The process AP went through during testing appears to be one that I’m currently exploring with my hyperlocal, OnTheWight, for producing monthly Jobseeker’s Allowance reports (here’s an example of the human produced version, which in this case was corrected after a mistake was spotted when checking that an in-testing machine generated version of the report was working correctly..! As journalism.co.uk reported about AP, “journalists were doing all their own manual calculations to produce the reports, which Ferrara said had ‘potential for error’.” Exactly the same could have been said of the OnTheWight process…)

In the AP case, “during testing, the earnings reports were produced via automation and journalists compared them to the relevant press release and figured out bugs before publishing them. A team of five reporters worked on the project, and Ferrara said they still had to check for everything a journalist would normally check for, from spelling mistakes to whether the calculations were correct.” (I wonder if they check the commas, too?!) The process I hope to explore with OnTheWight builds in the human checking route, taking the view that the machine should generate press-release style copy that does the grunt work in getting the journalist started on the story, rather than producing the complete story for them. At AP, it seems that automation “freed up staff time by one fifth”. The process I’m hoping to persuade OnTheWight to adopt is that to begin with, the same amount of time should be spent on the story each month, but month on month we automate a bit more and the journalistic time is then spent working up what the next paragraph might be, and then in turn automate the production of that…

Extending the Promise?

In addition to time-saving, there is the hope that the wider introduction of robot journalists will create new journalistic roles:

Beyond questions of augmentation or elimination, Narrative Science’s vision of automated journalism requires the transformation of journalistic labor to include such new positions as “meta-writer” or “metajournalist” to facilitate automated stories. For example, Narrative Science’s technology can only automate sports stories after journalists preprogram it with possible frames for sports stories (e.g., comeback, blowout, nail-biter, etc.) as well as appropriate descriptive language. After this initial programming, automated journalism requires ongoing data management. Beyond the newsroom, automated journalism also redefines roles for non-journalists who participate in generating data. (Carlson, 2015, p423)

In the first post of this series, I characterised the process used by Narrative Science which included the application of rules for detecting signals and angles, and the linkage of detected “facts” to story points within a particular angle that could then be used to generate a narrative told through automatically generated natural language. Constructing angles, identifying logical processes that can identify signals and map them on to story elements, and generating turns of phrase that can help explicate narratives in a natural way are all creative acts that are likely to require human input for the near future at least, albeit tasking the human creative with the role of supporting the machine. This is not necessarily that far removed from some of the skills already employed by journalists, however. As Carlson suggests, “Scholars have long documented the formulaic nature underlying compositional forms of news exposed by the arrival of automated news. … much journalistic writing is standardized to exclude individual voice. This characteristic makes at least a portion of journalistic output susceptible to automation” (p425). What’s changing, perhaps, is that now the journalists must learn to capture those standardised forms and map them onto structures that act as programme fodder for their robot helpers.
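To give a feel for what “preprogramming possible frames” might involve, here is a small sketch that picks an angle for a football result and pairs it with a stock opening phrase. It is a guess at the general shape of the task rather than Narrative Science’s actual method: the thresholds, angle names beyond those quoted above, phrases and team names are all made up.

```python
from dataclasses import dataclass


@dataclass
class Match:
    home: str
    away: str
    half_time: tuple  # (home goals, away goals) at half time
    full_time: tuple  # (home goals, away goals) at full time


def pick_angle(match: Match) -> str:
    """Map the signals in a result onto one of a few hand-written frames."""
    half_margin = match.half_time[0] - match.half_time[1]
    final_margin = match.full_time[0] - match.full_time[1]
    if final_margin == 0:
        return "draw"
    if half_margin * final_margin < 0:  # the half-time leader ended up losing
        return "comeback"
    if abs(final_margin) >= 4:  # arbitrary threshold for a one-sided game
        return "blowout"
    if abs(final_margin) == 1:
        return "nail-biter"
    return "win"


# The descriptive language a "meta-writer" would supply for each frame.
OPENERS = {
    "draw": "{home} and {away} shared the points in a {hg}-{ag} draw.",
    "comeback": "{winner} came from behind at half time to beat {loser} {wg}-{lg}.",
    "blowout": "{winner} swept {loser} aside {wg}-{lg}.",
    "nail-biter": "{winner} edged past {loser} {wg}-{lg}.",
    "win": "{winner} beat {loser} {wg}-{lg}.",
}


def opening_sentence(match: Match) -> str:
    """Generate an opening line from the chosen angle's stock phrase."""
    hg, ag = match.full_time
    home_won = hg > ag
    return OPENERS[pick_angle(match)].format(
        home=match.home, away=match.away, hg=hg, ag=ag,
        winner=match.home if home_won else match.away,
        loser=match.away if home_won else match.home,
        wg=max(hg, ag), lg=min(hg, ag),
    )
```

The creative work sits in choosing the rules and writing the phrases; once those are captured, the machine just applies them to every result that comes in.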

Audience Development

Narrative Science also see potential in increasing the size of the total potential audience by accommodating the very specific needs of a large number of niche audiences.

“While Narrative Science flaunts the transformative potential of automated journalism to alter both the landscape of available news and the work practices of journalists, its goal when it comes to compositional form is conformity with existing modes of human writing. The relationship here is telling: the more the non-human origin of its stories is undetectable, the more it promises to disrupt news production. But even in emulating human writing, the application of Narrative Science’s automation technology to news prompts reconsiderations of the core qualities underpinning news composition. The attention to the quality and character of Narrative Science’s automated news stories reflects deep concern both with existing news narratives and with how automated journalistic writing commoditizes news stories.” Carlson, 2015, p424

In the midst of this mass of stories, it’s possible that there will be some “outliers” that are of more general interest which can, with some additional contextualisation and human reporting, be made relevant to a wider audience.

There is also the possibility of searching for “meta-stories” that tell not the specifics of particular cases, but identify trends across the mass of stories as a whole. (Indeed, it is by looking for such trends and patterns that outliers may be detected.) In addition, patterns that only become relevant when looking across all the individual stories might in turn lead to additional stories. (For example, a failing school operated by a particular provider is perhaps of only local interest, but if it turns out that the majority of schools operated by a particular provider were turned round from excellent to failing by that provider, questions might, perhaps, be worth asking…?!)
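By way of illustration, here is a minimal sketch of how outliers might be picked out from the same indicator reported across many local areas. The function name, the area names and the figures are all made up, and the modified z-score with a 3.5 cut-off is just one common rule of thumb that I am assuming here, not a method described in any of the sources.

```python
from statistics import median

def find_outliers(values_by_area, threshold=3.5):
    """Return areas whose value sits unusually far from the median.

    Uses a modified z-score based on the median absolute deviation (MAD).
    The 3.5 threshold is an assumed, conventional default.
    """
    values = list(values_by_area.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so nothing can be scored
        return {}
    scores = {area: 0.6745 * (v - med) / mad for area, v in values_by_area.items()}
    return {area: round(s, 2) for area, s in scores.items() if abs(s) > threshold}

# Hypothetical jobless rates (%), purely for illustration
rates = {"Area A": 2.1, "Area B": 2.3, "Area C": 2.0, "Area D": 2.2, "Area E": 6.5}
outliers = find_outliers(rates)
print(list(outliers))
```

Anything flagged this way is only a prompt for a human to go and ask why that area looks different, not a story in itself.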

When it comes to the case for expanding the range of content that is available, Narrative Science’s hope appears to be that:

[t]he narrativization of data through sophisticated artificial intelligence programs vastly expands the terrain of news. Automated journalism becomes a normalized component of the news experience. Moreover, Narrative Science has tailored its promotional discourse to reflect the economic uncertainty of online journalism business models by suggesting that its technology will create a virtuous circle in which increased news revenue supports more journalists (Carlson, 2015, p 421).

The alternative, fearful view, of course, is that revenues will be protected by reducing the human wage bill, using robot content creators operating at a near zero marginal cost on particular story types to replace human content creation.

Whether news organisations will use automation to extend the range of producers in the newsroom, or contribute to the reduction of human creative input to the journalistic process, is perhaps still to be seen. As Anderson, Bell & Shirky noted, “the reality is that most journalists at most newspapers do not spend most of their time conducting anything like empirically robust forms of evidence gathering.” Perhaps now is the time for them to stop churning the press releases and statistics announcements – after all, the machines can do that faster and better – and concentrate more on contextualising and explaining the machine generated stories, as well as spending more time out hunting for stories and pursuing their own investigative leads?

Notes on Robot Churnalism, Part I – Robot Writers

In Some Notes on Churnalism and a Question About Two Sided Markets, I tried to pull together a range of observations about the process of churnalism, in which journalists propagate PR copy without much, if any, critique, contextualisation or corroboration.

If that view in any way represents a fair description of how some pre-packaged content, at least, makes its way through to becoming editorial content, where might the robots fit in? To what extent might we start to see “robot churnalism“, and what form or forms might it take?

There are two particular ways in which we might consider robot churnalism:

  1. “robot journalists” that produce copy that acts as a third conveyor belt complementary to PA-style wire and PR feedstocks;
  2. robot churnalists as ‘reverse’ gatekeepers, choosing what wire stories to publish where based on traffic stats and web analytics.

A related view is taken by Philip Napoli (“Automated media: An institutional theory perspective on algorithmic media production and consumption.” Communication Theory 24.3 (2014): 340-360; a shorter summary of the key themes can be found here) who distinguishes roles for algorithms in “(a) media consumption and (b) media production”. He further refines the contributions algorithms may make in media production by suggesting that “[t]wo of the primary functions that algorithms are performing in the media production realm at this point are: (a) serving as a demand predictor and (b) serving as content creator.”

Robot Writers

“Automated content can be seen as one branch of what is known as algorithmic news” writes Christer Clerwall (2014, Enter the Robot Journalist, Journalism Practice, 8:5, pp519-531), a key component of automated journalism “in which a program turns data into a news narrative, made possible with limited — or even zero — human input” (Matt Carlson (2015) The Robotic Reporter, Digital Journalism, 3:3, 416-431).

In a case study based around the activities of Narrative Science, a company specialising in algorithmically created, data driven narratives, Carlson further conceptualises “automated journalism” as “algorithmic processes that convert data into narrative news texts with limited to no human intervention beyond the initial programming”. He goes on:

The term denotes a split from data analysis as a tool for reporters encompassed in writings about “computational and algorithmic journalism” (Anderson 2013) to indicate wholly computer-written news stories emulating the compositional and framing practices of human journalism (ibid, p417).

Even several years ago, Arjen van Dalen observed that “[w]ith the introduction of machine-written news computational journalism entered a new phase. Each step of the news production process can now be automated: “robot journalists” can produce thousands of articles with virtually no variable costs” (The Algorithms Behind the Headlines, Journalism Practice, 6:5-6, 648-658, 2012, p649).

Sport and financial reporting examples abound from the bots of Automated Insights and Narrative Science (for example, Notes on Narrative Science and Automated Insights or Pro Publica: How To Edit 52,000 Stories at Once, and more recently e.g. Robot-writing increased AP’s earnings stories by tenfold), with robot writers generating low-cost content to attract page views, “producing content for the long tail, in virtually no time and with low additional costs for articles which can be produced in large quantities” (ibid, p649).

Although writing back in 2012, van Dalen noted in his report on “the responses of the journalistic community to automatic content creation” that:

[t]wo main reasons are mentioned to explain why automated content generation is a trend that needs to be taken seriously. First, the journalistic profession is more and more commercialized and run on the basis of business logics. The automation of journalism tasks fits in with the trend to aim for higher profit margins and lower production costs. The second reason why automated content creation might be successful is the quality of stories with which it is competing. Computer-generated news articles may not be able to compete with high quality journalism provided by major news outlets, which pay attention to detail, analysis, background information and have more lively language or humour. But for information which is freely available on the Internet the bar is set relatively low and automatically generated content can compete (ibid, p651).

As Christer Clerwall writes in Enter the Robot Journalist, (Journalism Practice, 8:5, 2014, pp519-531):

The advent of services for automated news stories raises many questions, e.g. what are the implications for journalism and journalistic practice, can journalists be taken out of the equation of journalism, how is this type of content regarded (in terms of credibility, overall quality, overall liking, to mention a few aspects) by the readers? p520.

van Dalen puts it thus:

Automated content creation is seen as serious competition and a threat for the job security of journalists performing basic routine tasks. When routine journalistic tasks can be automated, journalists are forced to offer a better product in order to survive. Central in these reflections is the need for journalists to concentrate on their own strengths rather than compete on the strengths of automated content creation. Journalists have to become more creative in their writing, offer more in-depth coverage and context, and go beyond routine coverage, even to a larger extent than they already do today (ibid, p653).

He then goes on to produce the following SWOT analysis to explore just how the humans and the robots compare:

[Image: algo_behind_headlines (van Dalen’s SWOT analysis comparing human and robot journalists)]

One possible risk associated with the automated production of copy is that it becomes published without human journalistic intervention, and as such is not necessarily “known”, or even read, by any member at all of the publishing organisation. To paraphrase Daniel Jackson and Kevin Moloney, “Inside Churnalism: PR, journalism and power relationships in flux”, Journalism Studies, 2015, this would represent an extreme example of churnalism in the sense of “the use of unchecked [robot authored] material in news”.

This is dangerous, I think, on many levels. The more we leave the setting of the news agenda and the identification of news values to machines, the more we lose any sensitivity to what’s happening in the world around us and what stories are actually important to an audience as opposed to merely being Like-bait titillation. (As we shall see, algorithmic gatekeepers that channel content to audiences based on various analytics tools respond to one definition of what audiences value. But it is not clear that these are necessarily the same issues that might weigh more heavily in a personal-political sense. Reviews of the notion of “hard” vs. “soft” news (e.g. Reinemann, C., Stanyer, J., Scherr, S., & Legnante, G. (2011). Hard and soft news: A review of concepts, operationalizations and key findings. Journalism, 13(2) pp221–239) may provide lenses to help think about this more deeply?)

Of course, machines can also be programmed to look for links and patterns across multiple sources of information and at far greater scale than a human journalist could hope to cover, but we are then in danger of creating some sort of parallel news world, where events are only recognised, “discussed” and acted upon by machines and human actors are oblivious to them. (For an example, see The Wolf of Wall Tweet: A Web-reading bot made millions on the options market. It also ate this guy’s lunch, which describes how bots read the news wires and trade off the back of them. They presumably also read wire stories created by other bots…)

So What Is It That Robot Writers Actually Do All Day?

In a review of Associated Press’ use of Automated Insights’ Wordsmith application (In the Future, Robots Will Write News That’s All About You), Wired reported that Wordsmith “essentially does two things. First, it ingests a bunch of structured data and analyzes it to find the interesting points, such as which players didn’t do as well as expected in a particular game. Then it weaves those insights into a human readable chunk of text.”

One way of getting deeper into the mind of a robot writer is to look to the patents held by the companies who develop such applications. For example, in The Anatomy of a Robot Journalist, one process used by Narrative Science is characterised as follows:

[Image: narrativeScience (the Narrative Science process as characterised in The Anatomy of a Robot Journalist)]

Identifying newsworthy features (or story points) is a process of identifying features and then filtering out the ones that are somehow notable. Angles are possibly defined in terms of sets of features that need to be present within a particular dataset for that angle to provide a possible frame for a story. The process of reconciling interesting features with angle points populates the angle with known facts, and a story engine then generates the natural language text within a narrative structure suited to an explication of the selected angle.
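To make that a bit more concrete, here is a toy sketch of the features, angles and story engine steps as I read them. It is very much my own guess at the shape of such a pipeline, not Narrative Science’s actual method: every name, rule and template below is hypothetical, and the claimant figures are invented.

```python
# Hypothetical illustration only: none of these names or rules come from Narrative Science.

def derive_features(data):
    """Step 1: derive candidate features (story points) from the raw figures."""
    change = data["this_month"] - data["last_month"]
    return {
        "change": change,
        "direction": "rose" if change > 0 else "fell" if change < 0 else "was unchanged",
        # Notable only if the figure fell and is below everything seen before
        "record_low": change < 0 and data["this_month"] < min(data["history"]),
    }

# Step 2: each angle names the features that must be present for it to apply,
# plus a template for telling the story from that angle. Order sets priority.
ANGLES = [
    {
        "name": "record low",
        "requires": lambda f: f["record_low"],
        "template": "The number of claimants in {area} fell to {this_month}, the lowest figure on record.",
    },
    {
        "name": "monthly change",
        "requires": lambda f: f["change"] != 0,
        "template": "The number of claimants in {area} {direction} by {abs_change} to {this_month}.",
    },
    {
        "name": "no change",
        "requires": lambda f: True,
        "template": "The number of claimants in {area} was unchanged at {this_month}.",
    },
]

def write_story(data):
    """Step 3: pick the first matching angle and fill it with the known facts."""
    features = derive_features(data)
    angle = next(a for a in ANGLES if a["requires"](features))
    facts = {**data, **features, "abs_change": abs(features["change"])}
    return angle["name"], angle["template"].format(**facts)

example = {"area": "Exampleshire", "this_month": 1450, "last_month": 1500,
           "history": [1500, 1600, 1480]}
angle_name, sentence = write_story(example)
print(angle_name, "->", sentence)
```

The creative work is all in deciding which features count as notable and which angles they support; the text generation at the end is the easy part.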

(An early – 2012 – presentation by Narrative Science’s Larry Adams also reviews some of the technicalities: Using Open Data to Generate Personalized Stories.)

In actual fact, the process may be a relatively straightforward one, as demonstrated by the increasing numbers of “storybots” that populate social media. One well known class of examples are earthquake bots that tweet news of earthquakes (see also: When robots help human journalists: “This post was created by an algorithm written by the author”). (It’s easy enough to see how various newsworthiness filters might work here: a geo-based one for reporting a story locally, a wider interest one for reporting an earthquake above a particular magnitude, and so on.)
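A rough sketch of the two filters just mentioned might look something like the following. The function names, the 100 km radius and the magnitude thresholds are all values I have made up for illustration, not ones used by any actual earthquake bot.

```python
from math import radians, sin, cos, asin, sqrt

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

def audiences_for(quake, newsroom, local_radius_km=100, local_min_mag=3.0, wide_min_mag=6.0):
    """Return the audiences (if any) for which a quake report counts as newsworthy.

    All thresholds are assumed defaults, chosen only for the example.
    """
    audiences = []
    near = distance_km(quake["lat"], quake["lon"],
                       newsroom["lat"], newsroom["lon"]) <= local_radius_km
    if near and quake["mag"] >= local_min_mag:
        audiences.append("local")      # geo-based filter
    if quake["mag"] >= wide_min_mag:
        audiences.append("general")    # wider interest filter
    return audiences

# Hypothetical newsroom and events
newsroom = {"lat": 34.05, "lon": -118.25}
nearby_small = {"lat": 34.2, "lon": -118.5, "mag": 3.4}
distant_large = {"lat": 35.7, "lon": 139.7, "mag": 6.8}
distant_small = {"lat": 35.7, "lon": 139.7, "mag": 2.0}

print(audiences_for(nearby_small, newsroom))
print(audiences_for(distant_large, newsroom))
print(audiences_for(distant_small, newsroom))
```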

It’s also easy enough to create your own simple storybot (or at least, an “announcer bot”) using something like IFTTT that can take in an RSS feed and make a tweet announcement about each new item. A collection of simple twitterbots produced as part of a journalism course on storybots, along with code examples, can be found here: A classroom experiment in Twitter Bots and creativity. Here’s another example, for a responsive weatherbot that tries to geolocate someone sending a message to the bot and respond to them with a weather report for their location.
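If you wanted to roll your own announcer bot rather than use a hosted service, the core of it is only a few lines. The sketch below uses nothing but the Python standard library and works on a made-up RSS 2.0 snippet; it stops at generating the announcement text, because actually fetching the feed and posting to a social network would need credentials and a client library that I am not assuming here. The function name and the 280 character limit are my own choices for the example.

```python
import xml.etree.ElementTree as ET

def announcements(rss_xml, seen_links, max_len=280):
    """Turn RSS items we have not seen before into short announcement strings."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not link or link in seen_links:
            continue  # skip malformed items and ones already announced
        seen_links.add(link)
        room = max_len - len(link) - 1  # leave space for " " plus the link
        if len(title) > room:
            title = title[: room - 1].rstrip() + "…"
        posts.append(f"{title} {link}")
    return posts

# Made-up feed, purely for illustration
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>First story</title><link>https://example.com/1</link></item>
<item><title>Second story</title><link>https://example.com/2</link></item>
</channel></rss>"""

seen = set()
first_run = announcements(SAMPLE_FEED, seen)   # both items are new
second_run = announcements(SAMPLE_FEED, seen)  # nothing new to announce
print(first_run)
print(second_run)
```

Run on a schedule, with the set of seen links saved between runs, that is more or less all an announcer bot does.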


Not being of a journalistic background, and never having read much on media or communications theory, I have to admit I don’t really have a good definition for what angles are, or a typology for them in different topic areas, and I’m struggling to find any good structural reviews of the idea, perhaps because it’s so foundational? For now, I’m sticking with a definition of “an angle” as being something along the lines of the thing you want to focus on and dig deeper around within the story (the thing you want to know more about or whose story you want to tell; this includes abstract things: the story of an indicator value for example, over time). The blogpost Framing and News Angles: What is Bias? contrasts angles with the notions of framing and bias. Entman, Robert M. “Framing: Towards clarification of a fractured paradigm.” McQuail’s reader in mass communication theory (1993): 390-397 [pdf] seems foundational in terms of the framing idea, De Vreese, Claes H. “News framing: Theory and typology.” Information design journal & document design 13.1 (2005): 51-62 [PDF] offers a review (of sorts) of some related literature, and Reinemann, C., Stanyer, J., Scherr, S., & Legnante, G. (2011). Hard and soft news: A review of concepts, operationalizations and key findings. Journalism, 13(2) pp221–239 (PDF) perhaps provides another way in to related literature? Bias is presumably implicit in the selection of any particular frame or angle? Blog posts such as What makes a press release newsworthy? It’s all in the news angle look to be linkbait, perhaps even stolen content (eg here’s a PDF), but I can’t offhand find a credible source or inspiration for the original list? Resource packs like this one on Working with the Media from the FAO give a crash course in what I guess are some of the generally taught basics around story construction?