
Routine Sources, Court Reporting, the Data Beat and Metadata Journalism

In The Re-Birth of the “Beat”: A hyperlocal online newsgathering model (Journalism Practice 6(5-6), 2012, pp. 754-765), Murray Dick cites several earlier studies to suggest that routine sources are responsible for generating a significant percentage of local news reports:

Schlesinger [Schlesinger, Philip (1987) Putting ‘Reality’ Together: BBC News. Taylor & Francis: London] found that BBC news was dependent on routine sources for up to 80 per cent of its output, while Franklin and Murphy [Franklin, Bob and Murphy, David (1991) Making the Local News: Local Journalism in Context. Routledge: London] later established that the local press relied upon local government, courts, police, business and voluntary organisations for 67 per cent of their stories (in [Keeble, Richard (2009) Ethics for Journalists, 2nd Edition. Routledge: London], pp. 114-15).

As well as human sources, news gatherers may also look to data sources as part of a regular beat, either at a local level, such as local council transparency (that is, spending) data, or national data sources with a local scope. For example, the NHS publish accident and emergency statistics at provider organisation level on a weekly basis, and nomis, the official labour market statistics publisher, publish unemployment figures at local council level on a monthly basis. Inspection and ratings bodies such as the Care Quality Commission (CQC) and the Food Standards Agency (FSA) publish inspections data for local establishments as it becomes available, and other national agencies publish data annually that can be broken down to a local level: if you want to track car MOT failures at postcode-region level, the DVLA have the data that will help you do it.

To a certain extent, adding data sources to a regular beat, or making a beat purely from data sources, enables the automatic generation of data-driven press releases that can shorten the production process for a particular class of routine stories – essentially, reports on “the latest figures” (see, for example, my nomis Labour Market Statistics textualisation sketch).

Data sources can also be used to support the newsgathering process by processing the data to raise alerts or draw attention to particular facts that might otherwise go unnoticed. Where the data is numerical, this might mean sorting a national dataset on some indicator value and flagging to a particular local news outlet that their local X is in the top M or bottom N of similar establishments in the rest of the country – and that there may be a story there. Where the data is textual, keyword searches might pull out paragraphs or records of particular interest, or running a text through an entity recognition engine such as Thomson Reuters’ OpenCalais might help automatically identify individuals or organisations of interest.
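By way of illustration, here’s a minimal sketch of the numerical case using pandas; the filename and column names are hypothetical, and “Isle of Wight” simply stands in for whichever patch a local outlet covers.

```python
import pandas as pd

# Hypothetical national dataset - the filename and column names
# ("establishment", "local_authority", "indicator") are made up for the example.
df = pd.read_csv("national_indicator.csv")

N = 10
ranked = df.sort_values("indicator", ascending=False).reset_index(drop=True)

# Establishments in our patch that fall in the national top or bottom N.
local = ranked[ranked["local_authority"] == "Isle of Wight"]
alerts = local[(local.index < N) | (local.index >= len(ranked) - N)]

for idx, row in alerts.iterrows():
    print(f"ALERT: {row['establishment']} ranks {idx + 1} of {len(ranked)} "
          f"nationally on this indicator - possible story?")
```

The point is less the particular ranking than the pattern: run the whole national dataset through a transparent rule, and only surface the handful of rows that might be locally newsworthy.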

In the context of this post, I will be considering the role that metadata about court cases contained within court lists and court registers might play in helping news media identify possibly newsworthy stories arising from court proceedings. I will also explore the extent to which the metadata may be processed, both to help identify court proceedings that may be worth reporting on, and to produce statistical summaries that may themselves be newsworthy and provide a more balanced view of the activity of the courts than the impression one might get simply from the balance of media coverage.


Local Data Journalism – Care Homes

The front page of this week’s Isle of Wight County Press describes a tragic incident relating to a particular care home on the Island earlier this year:

[Photographs of the Isle of Wight County Press front page report]

(Unfortunately, the story doesn’t seem to appear on the County Press’ website? Must be part of a “divide the news into print and online, and never the twain shall meet” strategy?!)

As I recently started pottering around the CQC website and various datasets they publish, I thought I’d jot down a few notes about what I could find. The clues from the IWCP article were the name of the care home – Waxham House, High Park Road, Ryde – and the proprietor – Sanjay Ramdany.

Using the “CQC Care Directory – With Filters” from the CQC data and information page, I found a couple of homes registered to that provider.

1-120578256, 19/01/2011, Waxham House, 1 High Park Road, Ryde, Isle of Wight, PO33 1BP
1-120578313, 19/01/2011, Cornelia Heights, 93 George Street, Ryde, Isle of Wight, PO33 2JE

1-101701588, Mr Sanjay Prakashsingh Ramdany & Mrs Sandhya Kumari Ramdany
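For what it’s worth, here’s a minimal sketch of that lookup, assuming the “CQC Care Directory – With Filters” spreadsheet has been saved locally as a CSV; the filename and column names are guesses from memory rather than checked against the current download.

```python
import pandas as pd

# Assumed local copy of the "CQC Care Directory - With Filters" download, saved as CSV;
# the column names used below are guesses and should be checked against the actual file.
directory = pd.read_csv("cqc_care_directory.csv")

mask = directory["Provider Name"].str.contains("Ramdany", case=False, na=False)
print(directory.loc[mask, ["Location ID", "Location Name", "Location Postal Code"]])
```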

Looking up “Waxham House” on the CQC website gives us a copy of the latest report outcome:

[Screenshot: CQC latest inspection report summary for Waxham House]

Looking at the breadcrumb navigation, it seems we can directly get a list of other homes operated by the same proprietors:

[Screenshot: CQC provider page listing the homes operated by the same provider]

I wonder if we can search the site by proprietor name too?

[Screenshot: CQC search results by proprietor name]

Looks like it…

So how did their other home fare?

[Screenshots: CQC inspection report findings for Cornelia Heights]

Hmmm…

By the by, according to the Food Standards Agency, how’s the food?

[Screenshots: Food Standards Agency food hygiene ratings for Waxham House and Cornelia Heights]
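(As an aside, these ratings can also be pulled programmatically; here’s a minimal sketch against the FSA’s food hygiene ratings API – the endpoint, header and field names are as I recall them, so treat them as assumptions to check against the API documentation at api.ratings.food.gov.uk.)

```python
import requests

# Query the FSA food hygiene ratings API for establishments by name and area.
# Endpoint, header and field names are from memory - treat as assumptions to check
# against the documentation at api.ratings.food.gov.uk.
def hygiene_rating(name, address=""):
    resp = requests.get(
        "http://api.ratings.food.gov.uk/Establishments",
        params={"name": name, "address": address},
        headers={"x-api-version": "2"},
    )
    resp.raise_for_status()
    for e in resp.json().get("establishments", []):
        print(e["BusinessName"], "- rating:", e["RatingValue"])

hygiene_rating("Waxham House", "Ryde")
hygiene_rating("Cornelia Heights", "Ryde")
```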

And how much money is the local council paying these homes?

(Note – I haven’t updated the following datasets for a bit – I also note I need to add dates to the transaction tables. local spending explorer info; app.)

[Click through on the image to see the app – hit Search to remove the error message and load the data!]

[Screenshots: IW Council local spending explorer results for Waxham House and Cornelia Heights]

Why the refunds?

A check on OpenCorporates for director names turned up nothing.

I’m not trying to offer any story here about the actual case reported by the County Press; rather, this is a partial sketch of how we can start to look for data around a story, to see whether open data sources have anything more to add.

AP Business Wire Service Takes on Algowriters

Via @simonperry, news that AP will use robots to write some business stories (Automated Insights are one of several companies I’ve been tracking over the years who are involved in such activities, eg Notes on Narrative Science and Automated Insights).

The claim is that using algorithms to do the procedural writing frees up time for journalists to do more of the sensemaking. One way I see this is that we can use data2text techniques to produce human-readable press releases from things like statistical releases, which has at least a couple of advantages.

Firstly, the grunt – and error-prone – work of running the numbers (calculating month-on-month or year-on-year changes, handling seasonal adjustments, etc.) can be handled by machines using transparent and reproducible algorithms. Secondly, churning numbers into simple words (“x went up month on month from Sept 2013 to Oct 2013 and down year on year from 2012”) makes them searchable using words, rather than having to write our own database or spreadsheet queries with lots of inequalities in them.
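As a toy illustration of the “numbers into words” idea (not the nomis sketch itself, just an indicative fragment with made-up figures):

```python
# Toy data2text fragment: turn a pair of figures into a searchable sentence.
# The numbers are made up purely for the example.
def describe_change(label, previous, current, period):
    if current == previous:
        return f"{label} was unchanged {period} at {current}."
    direction = "up" if current > previous else "down"
    pct = abs(current - previous) / previous * 100
    return f"{label} went {direction} {period}, from {previous} to {current} ({pct:.1f}%)."

print(describe_change("The claimant count", 2480, 2510, "month on month (Sept 2013 to Oct 2013)"))
print(describe_change("The claimant count", 2650, 2510, "year on year (Oct 2012 to Oct 2013)"))
```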

In this respect, something that’s been on my to-do list for way too long is to produce some simple “press release” generators based on ONS releases (something I touched on in Data Textualisation – Making Human Readable Sense of Data).

Matt Waite’s upcoming course on “automated story bots” looks like it might produce some handy resources in this regard (code repo). In the meantime, he has already shared the code described in “How to write 261 leads in a fraction of a second” here: ucr-story-bot.

For the longer term, on my “to ponder” list is the question of what something like “The Grammar of Graphics” might be for data textualisation. (For background, see A Simple Introduction to the Graphing Philosophy of ggplot2.)

For example, what might a ggplot2 inspired gtplot library look like for converting data tables not into chart elements, but textual elements? Does it even make sense to try to construct such a grammar? What would the corollaries to aesthetics, geoms and scales be?

I think I perhaps need to mock up some examples to see if anything comes to mind, and what the function names, as well as the outputs, might look like, let alone the code to implement them! Or maybe code first is the way to go, to get a feel for how to build up the grammar from sensible-looking implementation elements? Or, more likely, perhaps a bit of iteration may be required?!
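By way of a first doodle, here’s the sort of thing I have in mind – a purely hypothetical gtplot-style API, sketched only to get a feel for what textual “aesthetics” and “geoms” might look like; none of this exists as a library:

```python
# Purely hypothetical mock-up of a "grammar of text" (gtplot-style) API - nothing
# here exists as a library; it's just a doodle of what the calling conventions and
# textual "aesthetics"/"geoms" might feel like.
class gtplot:
    def __init__(self, data):
        self.data, self.layers = data, []

    def aes(self, x, y):
        # Map table columns onto textual "aesthetics".
        self.x, self.y = x, y
        return self

    def geom_change(self, period="month on month"):
        # A textual "geom": describe the latest change over the given period.
        self.layers.append(period)
        return self

    def render(self):
        rows = sorted(self.data, key=lambda r: r[self.x])
        prev, curr = rows[-2][self.y], rows[-1][self.y]
        direction = "rose" if curr > prev else "fell" if curr < prev else "held steady"
        return f"{self.y} {direction} {self.layers[0]} from {prev} to {curr}."

data = [{"month": "2013-09", "claimants": 2480}, {"month": "2013-10", "claimants": 2510}]
print(gtplot(data).aes(x="month", y="claimants").geom_change().render())
# -> "claimants rose month on month from 2480 to 2510."
```

Even in this crude form it suggests where the analogies might sit: the aesthetics map columns to slots in a sentence, and the geoms pick the sort of statement (a change, a comparison, a ranking) to generate from them.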

Personal Recollections of the “Data Journalism” Phrase

@digiphile’s been doing some digging around current popular usage of the phrase data journalism – here are my recollections…

My personal recollection of the current vogue is that “data driven journalism” was the phrase that dominated the discussions/community I was witness to around early 2009, though for some reason my blog doesn’t give any evidence for that (must take better contemporaneous notes of first noticings of evocative phrases;-). My route in was via “mashups”, mashup barcamps, and the like, where folk were experimenting with building services on newly emerging (and reverse engineered) APIs; things like crime mapping and CraigsList maps were in the air – putting stuff on maps was very popular I seem to recall! Yahoo were one of the big API providers at the time.

I noted the launch of the Guardian datablog and datastore in my personal blog/notebook here – http://blog.ouseful.info/2009/03/10/using-many-eyes-wikified-to-visualise-guardian-data-store-data-on-google-docs/ – though for some reason don’t appear to have linked to a launch post. With the arrival of the datastore it looked like there were to be “trusted” sources of data we could start to play with in a convenient way, accessed through Google docs APIs:-) Some notes on the trust thing here: http://blog.ouseful.info/2009/06/08/the-guardian-openplatform-datastore-just-a-toy-or-a-trusted-resource/

NESTA did an event on News Innovation London in July 2009, a review of which by @kevglobal mentions “discussions about data-driven journalism” (sic on the hyphen). I seem to recall that Journalism.co.uk (@JTownend) were also posting quite a few noticings around the use of data in the news at the time.

At some point, I did a lunchtime session at the Guardian for their developers – there was a lot about Yahoo Pipes, I seem to remember! (I also remember pitching the Guardian Platform API to developers in the OU as a way of possibly getting fresh news content into courses. No-one got it…) I recall haranguing Simon Rogers on a regular basis about their lack of data normalisation (which I think in part led to the production of the Rosetta Stone spreadsheet) and their lack of use (at the time) of Fusion Tables. Twitter archives may turn something up there. Maybe Simon could go digging in the Twitter archives…?;-)

There was a session on related matters at the first(?) news:rewired event in early 2010, but I don’t recall the exact title of the session (I was in a session with Francis Irving/@frabcus from the then nascent Scraperwiki): http://blog.ouseful.info/2010/01/14/my-presentation-for-newsrewired-doing-the-data-mash/ Looking at the content of that presentation, it’s heavily dominated by notions of data flow; the data driven journalism (hence #ddj) phrase seemed to fit this well.

Later that year, in the summer, there was a roundtable event hosted by the European Journalism Centre (EJC) on “data driven journalism” – I recall meeting Mirko Lorenz there (who maybe had a background in business data, and has since helped launch datawrapper.de) and Jonathan Gray – who went on to help edit the Data Journalism Handbook – among others.
http://blog.ouseful.info/2010/08/25/my-slides-from-the-data-driven-journalism-round-table-ddj/

For me, the focus at the time was very much on using technology to help flow data into usable content (e.g. in a similar but perhaps slightly weaker sense than the more polished content generation services that Narrative Science/Automated Insights have since come to work on, or other data-driven visualisations, or what I guess we might term local information services; more about data-driven applications with a weak local news, or specific theme or issue, general news relevance, perhaps). I don’t remember where the sense of the journalist was in all this – maybe as someone who would be able to take the flowed data, or use tools that were being developed to get the stories out of data with tech support?

My “data driven journalism” phrase notebook timeline
http://blog.ouseful.info/?s=%22data%20driven%20journalism%22&order=asc

My “data journalist” phrase notebook timeline
http://blog.ouseful.info/?s=%22data%20journalist%22&order=asc

My first blogged use of the data journalism phrase – in quotes, as it happens, so it must have been a relatively new-sounding phrase to me – was here: http://blog.ouseful.info/2009/05/20/making-it-a-little-easier-to-use-google-spreadsheets-as-a-database-hopefully/ (h/t @paulbradshaw)

Seems like my first use of the “data journalist” phrase was in picking up on a job ad – so certainly the phrase was common to me by then.
http://blog.ouseful.info/2010/12/04/what-is-a-data-journalist/

As a practice and a commonplace, things still seemed to be developing in 2011 enough for me to comment on a situation where the Guardian and Telegraph teams were co-opetitively bootstrapping each other’s ideas: http://blog.ouseful.info/2011/09/13/data-journalists-engaging-in-co-innovation/

I guess the deeper history of CAR (computer-assisted reporting), database journalism and precision journalism may throw up earlier trace references, though maybe not ones representing the situations that led to the phrase gaining traction in “popular” usage?

Certainly, I’m now wondering what the relative rise in popularity of “data journalist” versus “data journalism” was. For me, “data driven journalism” was a phrase I was familiar with way before the other two, though I do recall a sense of unease about its applicability to news stories that were perhaps “driven” by data more in the sense of being motivated or inspired by it, or whose origins lay in a dataset, rather than “driven” in the live, active sense of someone using an interface powered by flowing data.

An(other Attempt at an) Intro to Data Journalism…

I was pleased to be invited back to the University of Lincoln again yesterday to give a talk on data journalism to a couple of dozen or so journalism students…

I’ve posted a copy of the slides, as well as a set of annotated handouts onto slideshare, and to get a bump in my slideshare stats for meaningless page views, I’ve embedded the latter here too…

I was hoping to generate a copy of the slides (as images) embedded in a markdown version of the notes but couldn’t come up with a quick recipe for achieving that…

When I get a chance, it looks as if the easiest way will be to learn some VBA/Visual Basic for Applications macro scripting… So for example:

* How do I export powerpoint slide notes to individual text files?
* Using VBA To Export PowerPoint Slides To Images

If anyone beats me to it, note that I’m actually on a Mac, so from the looks of things on Stack Overflow, hacks will be required to get the VBA to actually work properly?
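One VBA-free alternative (which also works on a Mac) might be the python-pptx library, which can at least handle the notes-to-text-files part, though not the slides-to-images export; a minimal sketch, assuming the deck has been saved as slides.pptx:

```python
from pptx import Presentation  # pip install python-pptx

# Export the notes text for each slide in a deck to numbered text files.
# Assumes the deck has been saved as slides.pptx in the working directory.
prs = Presentation("slides.pptx")
for i, slide in enumerate(prs.slides, start=1):
    notes = slide.notes_slide.notes_text_frame.text if slide.has_notes_slide else ""
    with open(f"slide_{i:02d}_notes.txt", "w") as f:
        f.write(notes)
```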

So what is a data journalist exactly? A view from the job ads…

A quick snapshot of how the data journalism scene is evolving at the moment based on job ads over the last few months…

Via mediauk – I’m not sure when this post for a Junior Data Journalist at Trinity Mirror Regionals (Manchester) was advertised (maybe it was for its new digital journalism unit?) – here’s what they were looking for:

Trinity Mirror is seeking to recruit a junior data journalist to join its new data journalism unit.

Based in Manchester, the successful applicant will join a small team committed to using data to produce compelling and original content for its website and print products.

You will be expected to combine a high degree of technical skill – in terms of finding, interrogating and visualising data – with more traditional journalistic skills, like recognising stories and producing content that is genuinely useful to consumers.

Reporting to the head of data journalism, the successful candidate will be expected to help create and develop data-based packages, solve problems, find and ‘scrape’ key sources of data, and assist with the production of regular data bulletins flagging up news opportunities to editors and heads of content across the group.

You need to have bags of ideas, be as comfortable with sport as you are with news, know the tools to source and turn data into essential information for our readers and have a strong eye for detail.
This is a unique opportunity for a creative, motivated and highly-skilled individual to join an ambitious project from its start.

News International were also recruiting a data journalist earlier this year, but I can’t find a copy of the actual ad.

From March, £23k-26k pa was on offer for a “Data Journalist” role that involved:

Identification of industry trends using quantitative-based research methods
Breaking news stories using digital research databases as a starting point
Researching & Analysing commercially valuable data for features, reports and events

Maintaining the Insolvency Today Market Intelligence Database (MID)
Mastering search functions and navigation of public databases such as London Gazette, Companies House, HM Court Listings, FSA Register, etc.
Using data trends as a basis for news stories and then using qualitative methods to structure stories and features.
Researching and producing content for the Insolvency cluster of products. (eg. Insolvency Today, Insolvency News, Insolvency BlackBook, Insolvency & Rescue Awards, etc.)
Identifying new data sources and trends, relevant to the Insolvency cluster.
Taking news stories published from rival sources and creating ‘follow up’ and analysis pieces using fresh data.
Occasional reporting from the High Court.
Liaising with the sales, events and marketing teams to share relevant ideas.
Sharing critical information with, and supporting sister editorial teams in the Credit and Payroll clusters.
Attending industry events to build contacts, network and represent the company.

On the other hand, a current, rather clueless-looking ad is offering £40k-60k for a “Data Journalist/Creative Data Engineer”:

Data Journalist/Creative Data Engineer is required by a leading digital media company based in Central London. This role is going to be working alongside a team of data modellers/statistical engineers and “bringing data to life”; your role will specifically be looking over data and converting it from “technical jargon” to creative, well written articles and white papers. This role is going to be pivotal for my client and has great scope for career progression.

To be considered for this role, you will ideally be a Data Journalist at the moment in the digital media space. You will have a genuine interest in the digital media industry and will have more than likely produced a white paper in the past or articles for publications such as AdAge previously. You will have a creative mind and will feel confident taking information from data and creating creative and persuasive written articles. Whilst this is not a technical role by anymeans it would definitely be of benefit if you had some basic technical knowledge with data mining or statistical modelling tools.

Here’s what the Associated Press were looking for from “Newsperson/Interactive Data Journalist”:

The ideal candidate will have experience with database management, data analysis and Web application development. (We use Ruby for most of our server-side coding, but we’re more interested in how you’ve solved problems with code than in the syntax you used to solve them.) Experience with the full lifecycle of a data project is vital, as the data journalist will be involved at every stage: discovering data resources, helping craft public records requests, managing data import and validation, designing queries and working with reporters and interactive designers to produce investigative stories and interactive graphics that engage readers while maintaining AP’s standards of accuracy and integrity.

Experience doing client-side development is a great advantage, as is knowledge of data visualization and UI design. If you have an interest in DevOps, mapping solutions or advanced statistical and machine learning techniques, we will want to hear about that, too. And if you have shared your knowledge through technical training or mentorship, those skills will be an important asset.

Most importantly, we’re looking for someone who wants to be part of a team, who can collaborate and communicate with people of varying technical levels. And the one absolute requirement is intellectual curiosity: if you like to pick up new technologies for fun and aren’t afraid to throw yourself into research to become the instant in-house expert on a topic, then you’re our kind of candidate.

And a post that’s still open at the time of writing – “Interactive Data Journalist” with the FT:

The Financial Times is seeking an experienced data journalist to join its Interactive News team, a growing group of journalists, designers and developers who work at the heart of the FT newsroom to develop innovative forms of online storytelling. This position is based at our office in London.

You will have significant experience in obtaining, processing and presenting data in the context of news and features reporting. You have encyclopedic knowledge of the current best practices in data journalism, news apps, and interactive data visualisation.

Wrangling data is an everyday part of this job, so you are a bit of a ninja in Excel, SQL, Open Refine or a statistics package like Stata or R. You are conversant in HTML and CSS. In addition, you will be able to give examples of other tools, languages or technologies you have applied to editing multimedia, organising data, or presenting maps and statistics online.

More important than your current skillset, however, is a proven ability to solve problems independently and to constantly update your skills in a fast-evolving field.

While you will primarily coordinate the production of interactive data visualisations, you will be an all-round online journalist willing and able to fulfil other roles, including podcast production, writing and editing blog posts, and posting to social media.

We believe in building people’s careers by rotating them into different jobs every few years so you will also be someone who specifically wants to work for the FT and is interested in (or prepared to become interested in) the things that interest us.

So does that make it any clearer what a data journalist is or does?!

PS you might also find this relevant: Tow Center for Digital Journalism report on Post Industrial Journalism: Adapting to the Present