Data Reporting, not Data Journalism?

As a technology optimist, I don’t tend to go in so much for writing up long critical pieces about tech (i.e. I don’t do the “proper” academic thing), instead preferring to spend time trying to work out how to make use of it, either on its own or in tandem with other technologies. (My criticality is often limited to quickly ruling out things I don’t want to waste my time on because they lack interestingness, or to spending time working out how to appropriate one thing to do something it perhaps wasn’t intended to do originally.)

I also fall into the camp of lacking in confidence about things other people think I know about, generally assuming there are probably “proper” ways of doing things for someone properly knowledgeable in a tradition, although I don’t know what they are. (Quite where this situates me on a scale of incompetence, I’m not sure… e.g. Why the Unskilled Are Unaware: Further Explorations of (Absent) Self-Insight Among the Incompetent h/t @Downes).

A couple of days ago, whilst doodling with a data-to-text notebook (the latest incarnation of which can be found here), I idly asked @fantasticlife whether it was an example of data journalism (we’ve been swapping links about bot-generated news reports for some time), placing it also in the context of The Changing Task Composition of the US Labor Market: An Update of Autor, Levy, and Murnane (2003), David H. Autor & Brendan Price, June 21, 2013, and in particular this chart about the decline in work-related “routine cognitive tasks”:

[Chart from Autor & Price (2013): the decline in work-related “routine cognitive tasks”]

His response? “No, that’s data reporting”.

So now I’m wondering: how do data reporting and data journalism differ? And to what extent is writing some code along the lines of:

import decimal
import random

def pc(amount,rounding=''):
    # Format a proportion (e.g. 0.019) as a percentage string to one decimal place
    if rounding=='down': rounding=decimal.ROUND_DOWN
    elif rounding=='up': rounding=decimal.ROUND_UP
    else: rounding=decimal.ROUND_HALF_UP

    ramount=float(decimal.Decimal(100 * amount).quantize(decimal.Decimal('.1'), rounding=rounding))
    return '{0:.1f}%'.format(ramount)

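def otwMoreLess(now,then):
    # Pick a comparison word; random.choice leaves room for adding synonyms later
    delta=now-then
    if delta>0:
        txt=random.choice(['more'])
    elif delta<0:
        txt=random.choice(['less'])
    else:
        # Equal values: fall back to a neutral word so txt is always defined
        txt='different'
    return txt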

def otwPCmoreLess(this,that):
    delta=this-that
    return '{delta} {diff}'.format(delta=pc(abs(delta)),diff=otwMoreLess(this,that))

otw3='''
That means {localrate} of the resident {localarea} population {poptype} are {claim} \
– {regiondiff} than the rest of the {region} ({regionrate}), \
and {ukdiff} than the whole of the UK ({ukrate}).
'''.format(localrate=pc(jsaLocal_rate),
           localarea=jsaLocal['GEOGRAPHY_NAME'].iloc[0],
           poptype=decase(get16_64Population(localcode)['CELL_NAME'].iloc[0].split('(')[1].split(' -')[0]),
           claim=decase(jsaLocal['MEASURES_NAME'].iloc[0]),
           regiondiff=otwPCmoreLess(jsaLocal_rate,jsaRegion_rate),
           region=jsaRegion['GEOGRAPHY_NAME'].iloc[0],
           regionrate=pc(JSA_rate(regionCode)),
           ukdiff=otwPCmoreLess(jsaLocal_rate,jsaUK_rate),
           ukrate=pc(jsaUK_rate))

print(otw3)

to produce text from nomis data of the form:

That means 1.9% of the resident Isle of Wight population aged 16-64 are persons claiming JSA – 0.5% more than the rest of the South East (1.4%), and 0.5% less than the whole of the UK (2.4%).

an output that is data reporting, an example of journalistic practice? For example, is locating the data source a journalistic act? Is writing a script to parse the data into words through the medium of code a journalistic act?
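For anyone who wants to poke at the idea without the nomis data to hand, here is a minimal, self-contained sketch of the same comparison sentence. The rates are hard-coded from the example text above, and `pct`, `more_or_less` and `jsa_sentence` are hypothetical names made up for illustration, not the notebook's own functions. Unlike the snippet above, it works in `Decimal` throughout and spells out the case where two rates are equal.

```python
from decimal import Decimal, ROUND_HALF_UP

def pct(rate):
    """Format a proportion (e.g. '0.019') as a percentage to one decimal place."""
    value = (Decimal(rate) * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return f"{value}%"

def more_or_less(this, that):
    """Describe how `this` compares with `that`, including the equal case."""
    delta = Decimal(this) - Decimal(that)
    if delta == 0:
        return "the same as"
    word = "more than" if delta > 0 else "less than"
    return f"{pct(abs(delta))} {word}"

def jsa_sentence(area, local, region_name, region, uk):
    """Build the comparison sentence from an area name and three rates."""
    return (
        f"That means {pct(local)} of the resident {area} population aged 16-64 "
        f"are persons claiming JSA – {more_or_less(local, region)} the rest of "
        f"the {region_name} ({pct(region)}), and {more_or_less(local, uk)} "
        f"the whole of the UK ({pct(uk)})."
    )

# Rates passed as strings so Decimal sees exact values (assumed inputs, taken
# from the example output rather than fetched from nomis)
print(jsa_sentence("Isle of Wight", "0.019", "South East", "0.014", "0.024"))
```

Whether deciding what to say when two rates are equal counts as an editorial decision is, I suppose, part of the question.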

Does it become “more journalistic” if the text generating procedure comments on the magnitude of the changes? For example, using something like:

def _txt6(filler=','):

    def _magnitude(term):
        # A heuristic: qualify the change by its size relative to the latest total
        # (p is presumably an inflect engine, so p.a() adds "a"/"an")
        if propDelta<0.05:
            mod=random.choice(['slight'])
        elif propDelta<0.10:
            mod=random.choice(['significant'])
        else:
            mod=random.choice(['considerable','large'])
        term=' '.join([mod,term])
        return p.a(term)
            
    txt6=txt2[:-1]
    
    propDelta= abs(yeardelta)/mostRecent['Total']
    
    if yeardelta==0:
        txt6+=', exactly the same amount.'
    else:
        if yeardelta <0: direction=_magnitude(random.choice([ 'decrease','fall']))
        else: direction=_magnitude(random.choice(['increase','rise']))
        
        txt6+='{_filler} {0} of {1} since then.'.format(p.a(direction),
                                                         abs(yeardelta),
                                                         _filler= filler )
    return txt6

print(txt1)
print(_txt6())

to produce further qualified text (in terms of commenting on the magnitude of the amount) that can take a variety of slightly different forms?

– The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502.
– This compares with a figure of 2720 from a year ago (August 2013), a large fall of 1218 since then.
– This compares with a figure of 2720 from a year ago (August 2013), a considerable fall of 1218 since then.
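The snippet above leans on variables defined elsewhere in the notebook, so as a rough, runnable sketch of the same magnitude heuristic, something like the following would do. The function name `describe_change` and the `seed` parameter are my own inventions for illustration; the 5% and 10% cut-offs are carried over from the heuristic above and are just as arbitrary here.

```python
import random

def describe_change(now, then, seed=None):
    """Return a phrase such as 'a large fall of 1218' comparing `now` with `then`."""
    rng = random.Random(seed)  # seedable, so a draft can be reproduced exactly
    delta = now - then
    if delta == 0:
        return "exactly the same amount"
    # Proportion relative to the most recent figure, as in the snippet above;
    # the cut-offs are arbitrary heuristics, not statistical tests
    prop = abs(delta) / now if now else float("inf")
    if prop < 0.05:
        adjective = "slight"
    elif prop < 0.10:
        adjective = "significant"
    else:
        adjective = rng.choice(["considerable", "large"])
    noun = rng.choice(["decrease", "fall"] if delta < 0 else ["increase", "rise"])
    # None of the adjectives starts with a vowel sound, so "a" is always right here
    return f"a {adjective} {noun} of {abs(delta)}"

# Figures from the Isle of Wight example: August 2014 vs August 2013
print(describe_change(1502, 2720))
```

Passing a `seed` fixes the choice of synonyms, which might matter if a drafted sentence needs to be regenerated identically later.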

At what point does the interpretation we can bake into the text generator become “journalism”, if at all? Can the algorithm “do journalism”? Is the crafting of the algorithm “journalism”? Or is it just computer-assisted reporting?

In the context of routine cognitive tasks – such as the task of reporting the latest JSA figures in this town or that city – is this sort of automation likely? Is it replacing one routine cognitive task with another, or is it replacing it with a “non-routine analytical task”? Might it be a Good Thing, allowing the ‘reporting’ to be done automatically, at least in draft form, and freeing up journalists to do other tasks? Or a Bad Thing, replacing a routine, semi-skilled task with automation? To what extent might the algorithms also embed more intelligence and criticism, for example, automatically flagging up figures that are “interesting” in some way? (Would that be algorithmic journalism? Or would it be more akin to an algorithmic stringer?)
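To make the “flagging” idea a little more concrete, here is one very crude sketch of what an algorithmic stringer might do: pick out areas whose year-on-year change is bigger than some threshold. Everything here is an assumption for illustration: the function name `flag_interesting`, the 25% threshold, and the two placeholder areas, which are made up. Only the Isle of Wight figures come from the example above.

```python
def flag_interesting(figures, threshold=0.25):
    """Return (area, change) pairs where the year-on-year change, as a
    proportion of last year's figure, is at least `threshold` in size."""
    flagged = []
    for area, (then, now) in figures.items():
        if then == 0:
            continue  # no baseline to compare against
        change = (now - then) / then
        if abs(change) >= threshold:
            flagged.append((area, round(change, 3)))
    # Largest movements first, regardless of direction
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)

claimants = {
    "Isle of Wight": (2720, 1502),  # (a year ago, most recent), from the example above
    "Exampleton": (1000, 980),      # made-up placeholder
    "Sampleford": (400, 520),       # made-up placeholder
}
print(flag_interesting(claimants))
```

Choosing the threshold is itself a judgement about what counts as “interesting”, which is perhaps where the editorial selection sneaks back in.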

PS Twitter discussion thread around this post from 29/9/14

14 comments

  1. Andy Turner

    Please forgive me, I am no expert either. I consider the difference as something like the difference between data processing and data analysis, the latter producing information and debate. Data reporting that is contextualised, grounded in and set within reflective discussion, becomes more than reporting. Data reporting, like data exploration, can be the basis of forming understanding and opinion, which can be wrapped up with reflection and reference to an existing narrative; it can also be used to provoke a reaction and start a narrative. Perhaps data journalism has to have an interactive element to it?

    • Tony Hirst

      @Andy

      As a starting point, if reporting is a “relaying of facts” function, then I imagine journalistic elements to include:
      – editorial selection of what facts to relay;
      – some sort of quality control (verification or validation of the facts, for example, or a qualification that they ‘are as yet unverified/unconfirmed’ etc);
      – the addition of any contextualisation or explanation of the facts.

      If we have data reporting as “simply” the relaying of the facts in a “readable” way, then the journalistic process might be what marshals the code and the data so that it can produce the report?

      “Data journalism” is then perhaps more about the way in which a journalist uses one or more data sets as a “source” in the discovery or development of a story?

  2. Andy Turner

    I’ve got an idea what data journalism is, but I don’t have the confidence yet to add a link and create a new Wikipedia page on it to link from https://en.wikipedia.org/wiki/Journalism. In terms of the people doing the work, maybe there is another difference. There are some important differences between university researchers and journalists especially in terms of ethics and protecting data and sources. I’m not sure if any of that would also be relevant to the differences between data reporting and data journalism.
