If Only I’d Been More Focussed… National-Local Data Robot Media Wire

And so it came to pass that Urbs Media started putting out their Arria NLG-generated local data stories, customised from national data sets, on the PA news wire, as reported by the Press Gazette (First robot-written stories from Press Association make it into print in ‘world-first’ for journalism industry) and Hold the Front Page (Regional publishers trial new PA robot reporting project).

Ever keen to try new approaches out, my local hyperlocal, OnTheWight, have already run a couple of the stories. Here’s an example: Few disadvantaged Isle of Wight children go to university, figures show.

Long term readers might remember that this approach is one that OnTheWight have explored before, of course, as described in OnTheWight: Back at the forefront of next wave of automated article creation.

Back in 2015, I teamed up with them to explore some ideas around “robot journalism”, reusing some of my tinkerings to automate the production of a monthly data story OnTheWight run around local jobless statistics. You can see a brief review from the time here and an example story from June 2015 here. The code was actually developed a bit further to include some automatically generated maps (example), but the experiment had petered out by then (“musical differences”, as I recall it! ;-) (I think we’re talking again now… ;-) I’d half imagined actually making a go of some sort of offering around this, but hey ho… I still have some related domains I bought on spec at the time…

At the time, we’d been discussing what to do next. The “Big Idea”, as I saw it, was that doing the work of churning through a national dataset with local-level data once (for OnTheWight) meant that the work was already done for everywhere else too.
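By way of illustration, here’s a minimal Python sketch of that “do it once, get it everywhere” idea. It assumes a hypothetical national table with one row per local area (the AreaFigure fields, the template wording and the figures are all made up for the example; this isn’t how either my old code or Urbs Media’s works). A single template, plus a simple rule for picking “rose” or “fell”, turns every row into a local paragraph.

```python
from dataclasses import dataclass


# Hypothetical record: one row per local area from a national data release.
@dataclass
class AreaFigure:
    area: str
    current: int   # this month's count
    previous: int  # last month's count


def describe_change(current: int, previous: int) -> str:
    """Choose the wording for the month-on-month change (simple rule-based text)."""
    diff = current - previous
    if diff > 0:
        return f"rose by {diff:,} to {current:,}"
    if diff < 0:
        return f"fell by {-diff:,} to {current:,}"
    return f"was unchanged at {current:,}"


def localise(rows: list[AreaFigure]) -> dict[str, str]:
    """Apply the same (hypothetical) template to every area in the national table."""
    return {
        row.area: (
            f"The number of jobless benefit claimants in {row.area} "
            f"{describe_change(row.current, row.previous)}, figures show."
        )
        for row in rows
    }


if __name__ == "__main__":
    # Made-up figures, purely for illustration.
    national = [AreaFigure("Area A", 2310, 2400), AreaFigure("Area B", 1200, 1150)]
    for text in localise(national).values():
        print(text)
```

Because the template only has to be written and checked once, adding another area is just another row in the national table.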


To this end, I imagined a “datawire” – you can track the evolution of that phrase through OUseful.info posts here – that could be used to distribute localised press releases automatically generated from national datasets. For OnTheWight, it was important to get data reports out quickly once a data set had been released. (I seem to remember we raced each other – the manual route versus the robot one.)

My tools weren’t fully automated – I had to keep hitting reload to fetch the data rather than having a cron job start pinging the Nomis website around the time of the official release, but that was as much because I didn’t run any servers as anything else. One thing we did do was automatically push the robot-generated story into the OnTheWight WordPress blog’s draft queue, from where it could be checked and published by a human editor. The images were handled more circuitously (I don’t think I had a key to push image assets to the OnTheWight image server?)
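For the record, here’s roughly what a more automated version might look like, sketched in Python under some loud assumptions. fetch is a hypothetical callable you’d write yourself to grab the release (from Nomis or wherever) and return None until it appears. The draft is pushed using the current WordPress REST API (a POST to /wp-json/wp/v2/posts with status set to draft, authenticated with an application password), which isn’t necessarily what we used at the time.

```python
import base64
import json
import time
import urllib.request
from typing import Callable, Optional


def wait_for_release(fetch: Callable[[], Optional[str]],
                     interval_s: float = 60.0,
                     max_tries: int = 30,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll until fetch() returns data, instead of hitting reload by hand.

    fetch is a hypothetical, user-supplied callable that returns the release
    data, or None if it hasn't been published yet.
    """
    for attempt in range(max_tries):
        data = fetch()
        if data is not None:
            return data
        if attempt < max_tries - 1:  # don't wait after the final try
            sleep(interval_s)
    return None


def build_draft_request(site: str, user: str, app_password: str,
                        title: str, body: str) -> urllib.request.Request:
    """Build (but don't send) a WordPress REST API request creating a draft post."""
    payload = json.dumps({"title": title, "content": body, "status": "draft"})
    token = base64.b64encode(f"{user}:{app_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{site.rstrip('/')}/wp-json/wp/v2/posts",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending the request is then just urllib.request.urlopen(req); because the post is created as a draft, a human editor still gets to check it before it goes live.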

The data wire idea was actually sketched out a couple of years ago at a community journalism conference (Time for a Local Data Wire?), and that was perhaps where our musical differences about the way forward started to surface? :-(

One thing you may note is the focus on producing press releases, with the intention that a journalist could build a story around the data product, rather than the data product standing in wholesale for a story.

I’m not sure this differs much from the model being pursued by Urbs Media, the organisation that’s creating the PA data stories, and that is funded in part at least by a Google Digital News Initiative (DNI) grant: PA awarded €706,000 grant from Google to fund a local news automation service in collaboration with Urbs Media.

FWIW, give me three quarters of a million squids, or Euros, and that’d do me as a private income for the rest of my working life, which means I’d be guilt-free enough to play all the time…!

One of the things I think the Urbs stories are doing is including quotes on the national statistical context, taken from the original data release. For example:

Which reminds me – I started to look at the ONS JSON API when it appeared (example links), but I don’t think I got much further than an initial play… One to revisit, to see if it can be used as a source for automated quote extraction…
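As to what automated quote extraction might involve, here’s a naive Python sketch that picks out sentences from a release’s text that contain a figure and mention a national geography. It doesn’t touch the ONS API at all: release_text is assumed to have been fetched already, and the keyword list and limit are arbitrary choices made for the example.

```python
import re

# Split after ., ! or ? when followed by whitespace (so "12.5%" stays intact).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A figure: digits with optional thousands separators, decimals and a % sign.
FIGURE = re.compile(r"\d[\d,]*(?:\.\d+)?%?")


def extract_context_quotes(release_text: str,
                           keywords: tuple[str, ...] = ("England", "Wales", "UK"),
                           limit: int = 2) -> list[str]:
    """Return up to `limit` sentences quoting a figure for a national geography.

    The keywords and limit are illustrative assumptions, not an ONS convention.
    """
    sentences = [s.strip() for s in SENTENCE_END.split(release_text) if s.strip()]
    picked = []
    for sentence in sentences:
        if FIGURE.search(sentence) and any(k in sentence for k in keywords):
            picked.append(sentence)
            if len(picked) == limit:
                break
    return picked
```

Anything this crude would need a human to check the picked quotes, but it shows the shape of the problem: find the sentences in the release that carry the national headline figures.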

Our original job stats stories never really evolved far enough to become the inspiration for contextualised reporting – they were more or less a literal restatement of the “data generated press release”. I seem to recall that this notion of data-to-text-to-published-copy started to concern me, and I began to explore it in a series of posts on “robot churnalism” (for example, Notes on Robot Churnalism, Part I – Robot Writers and Notes on Robot Churnalism, Part II – Robots in the Journalism Workplace).

(I don’t know how many of the stories returned in that search were PA stories. I think that regional news group operators such as Johnston Press and Archant also run national units producing story templates that can be syndicated, so some templated stories may come from there.)

I think there are a couple more posts in that series still in my draft queue somewhere, which I may need to finish off… Perhaps we’ll see how the new stories play out: whether the copy gets reprinted as is, or is used to inspire more contextualised local reporting around the data.
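One crude way of tracking that would be to compare published stories with the original wire copy. The Python sketch below uses the standard library’s difflib.SequenceMatcher on word sequences; the 0.8 threshold for calling something “reprinted as is” is an arbitrary assumption, not a calibrated figure.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Lower-cased word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def reuse_ratio(wire_copy: str, published: str) -> float:
    """Similarity in [0, 1] between the wire copy and a published story."""
    return SequenceMatcher(None, words(wire_copy), words(published), autojunk=False).ratio()


def classify(wire_copy: str, published: str, threshold: float = 0.8) -> str:
    """Label a story; the default threshold is an illustrative assumption."""
    if reuse_ratio(wire_copy, published) >= threshold:
        return "reprinted as is"
    return "reworked"
```

Comparing word sequences rather than characters means small edits such as an added attribution (“, figures show”) barely move the score, while genuinely rewritten copy scores low.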

I also recall presenting on the topic of “Robot Writers” at ILI in 2016 (I wasn’t invited back this year:-(

So what sort of tech is involved in producing the PA data wire stories? From the preview video on the Urbs Media website, the technology behind the Radar project – Reporters and Data and Robots – looks to be the Articulator Lite application developed by Arria NLG. If you haven’t been keeping up, Arria NLG is the UK equivalent of companies like Narrative Science and Automated Insights in the US, which I’ve posted about on and off for the last few years (for example, Notes on Narrative Science and Automated Insights).

Anyway, it’ll be interesting to see how the PA / Urbs Media thing plays out. I don’t know if they’re automating the charts’n’maps production thing yet, but if they do then I hope they generate easily skinnable graphic objects that can be themed using things like ggthemes or matplotlib styles.
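To make the “skinnable” idea a bit more concrete: ggthemes and matplotlib styles need R or a plotting stack, so this Python sketch swaps in plain SVG instead, where each chart element carries a class name and the look comes entirely from a separate stylesheet. The theme names and CSS rules are made up for the example, and it assumes non-negative values.

```python
from xml.sax.saxutils import escape

# Hypothetical themes: only the stylesheet changes, never the chart markup.
THEMES = {
    "plain": ".bar { fill: #888888; } .label { font: 12px sans-serif; fill: #000000; }",
    "bold": ".bar { fill: #ff9900; } .label { font: bold 12px serif; fill: #333333; }",
}


def bar_chart_svg(values: dict[str, float], theme_css: str,
                  width: int = 300, bar_h: int = 20, gap: int = 5) -> str:
    """Horizontal bar chart whose appearance is set entirely by theme_css.

    Assumes non-negative values; labels take the left-hand 100px.
    """
    top = max(values.values(), default=0) or 1  # avoid dividing by zero
    parts = []
    for i, (label, value) in enumerate(values.items()):
        y = i * (bar_h + gap)
        bar_w = value / top * (width - 100)
        parts.append(
            f'<text class="label" x="0" y="{y + bar_h - 5}">{escape(label)}</text>'
            f'<rect class="bar" x="100" y="{y}" width="{bar_w:.1f}" height="{bar_h}"/>'
        )
    height = len(values) * (bar_h + gap)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f"<style>{theme_css}</style>{''.join(parts)}</svg>"
    )
```

Re-skinning the same chart for a different publication is then just a matter of passing a different stylesheet, e.g. bar_chart_svg(data, THEMES["bold"]).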

There’s a stack of practical and ethical issues associated with this sort of thing, and it’ll be interesting to see if any concerns start to be aired, or oopses appear. The reporting around the Births by parents’ characteristics in England and Wales: 2016 release could easily be seen as judgemental, for example.

PS I wonder if they run a Slack channel data wire (cf. Slackbot Data Wire, Initial Sketch)? Maybe there’s still a gap in the market for one of my ideas?! ;-)
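For what it’s worth, pushing a story notification into a channel needn’t be much more than a POST to a Slack incoming webhook, which accepts a JSON payload with a text field. The Python sketch below only builds the request; the webhook URL and the message format are placeholders, and you’d send it with urllib.request.urlopen.

```python
import json
import urllib.request


def slack_webhook_request(webhook_url: str, area: str, summary: str) -> urllib.request.Request:
    """Build (but don't send) a POST to a Slack incoming webhook.

    The message layout (bold area heading, then summary) is an illustrative choice.
    """
    message = {"text": f"*New data story for {area}*\n{summary}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```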