Writing Each Row of a Spreadsheet as a Press Release?

A few days ago, I saw via the @HSCICOpenData Twitter feed that an annually released dataset on Written Complaints in the NHS has just been published.

The data comes in the form of a couple of spreadsheets in which each row describes a count of the written complaints received and upheld under a variety of categories for each GP and dental practice, or local NHS trust.

The practice level spreadsheet looks like this:

nhs_complaints_data

Each practice is identified solely by a practice code – to find the name and address of the actual practice requires looking up the code in another dataset.

The column headings supplied in the CSV document only partially identify each column (and indeed, there are duplicates such as Total number of written complaints received, that a spreadsheet reader might disambiguate by adding numerical suffix to) – a more complete description (that shows how the columns are actually hierarchically defined) is provided in an associated metadata spreadsheet.

nhs_complaints_metadata

For a reporter wanting to know whether or not any practices in their area fared particularly badly in terms of the number of upheld complaints, the task might be broken down as follows:

  1. identify the practices in of interest from their practice codes (which requires finding a set of practice codes of interest);
  2. for each of those practices, look along the row to see whether or not there are any large numbers in the complaints upheld column.

But if you have a spreadsheet with 10, 20, 30 or more columns, scanning along a row looking for items of interest can rapidly become quite a daunting task.

So an idea I’ve been working on, which I suspect harkens back to the earliest days of database reporting, is to look at ways of turning each row of data into a text based, human readable report.

Something like the following, for example:

Sketching_a_handcrafted_data2text_report_for_diabetes_prescribing_

Each record, each “Complaint Report”, is a textual rendering of a single row from the practice complaints spreadsheet, with a bit of administrative metadata enrichment in the form of the practice name, address (and in later versions, telephone number).

These reports are quicker to scan, and could be sort or highlighted depending on the number of upheld complaints, for example. The journalist can then quickly review the reports, and identify any practices that might be worth phoning up for a comment to ask why they appear to have received a large number of upheld complaints in a particular area, for example… Data driven press releases used to assist reporting, in other words.

FWIW, I popped up a sketch script that generates the above report from the data, and also pulls in practice administrative metadata from an epracurr spreadsheet, here: NHS complaints spreadsheet2text sketch. See also: Data Driven Press Releases From HSCIC Data – Diabetes Prescribing.

PS I’m not Microsoft Office suite user, but I suspect you can get a fair way along this sort of process by using a mail merge? There may be other ways of generating templated reports too. Any Microsoft Office users fancy letting me know how you’d go about doing something like the above in Word and Excel? I’d guess complicating factors are the requirements to make use of the column headers and only display the items associated with non-zero counts, which perhaps requires some macro magic? Things could perhaps be simplified by reshaping the data, perhaps putting it into a long form by melting the complaints columns, or melting the complaints columns cannily to provide two value columns, once for complaints received and one for complaints upheld?

Screenshot_26_08_2015_23_48

Then you could filter out the blank rows before the merge.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: