AP Business Wire Service Takes on Algowriters

Via @simonperry, news that AP will use robots to write some business stories (Automated Insights are one of several companies I’ve been tracking over the years who are involved in such activities, eg Notes on Narrative Science and Automated Insights).

The claim is that using algorithms to do the procedural writing opens up time for the journalists to do more of the sensemaking. One way I see this is that we can use data2text techniques to produce human readable press releases of things like statistical releases, which has a couple of advantages at least.

Firstly, the grunt – and error prone – work of running the numbers (calculating month on month or year on year changes, handling seasonal adjustments etc) can be handled by machines using transparent and reproducible algorithms. Secondly, churning numbers into simple words (“x went up month on month from Sept 2013 to Oct 2013 and down year on year from 2012”) makes them searchable using words, rather than having to write our own database or spreadsheet queries with lots of inequalities in them.
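As a minimal sketch of that sort of data2text step, something like the following would turn three observations into a sentence. The function names and the figures are made up for illustration, and no seasonal adjustment is attempted:

```python
def change_phrase(new, old):
    """Describe the change from old to new in words, with a rounded percentage."""
    pct = round(100.0 * (new - old) / old, 1)
    if pct > 0:
        return f"went up {pct}%"
    if pct < 0:
        return f"went down {abs(pct)}%"
    return "was unchanged"


def describe(name, series, month, prev_month, prev_year_month):
    """Build one sentence comparing `month` with the previous month
    and with the same month a year earlier."""
    now = series[month]
    mom = change_phrase(now, series[prev_month])
    yoy = change_phrase(now, series[prev_year_month])
    return (f"{name} {mom} month on month from {prev_month} to {month} "
            f"and {yoy} year on year from {prev_year_month}.")


# Made-up figures, purely for illustration (not taken from any real release).
series = {"Oct 2012": 110.0, "Sept 2013": 100.0, "Oct 2013": 104.0}
sentence = describe("x", series, "Oct 2013", "Sept 2013", "Oct 2012")
print(sentence)
```

Because the wording comes from a fixed set of phrases (“went up”, “went down”, “was unchanged”), a plain text search for “went down” finds every falling series without anyone having to write a query full of inequalities.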

In this respect, something that’s been on my to do list for way too long is to produce some simple “press release” generators based on ONS releases (something I touched on in Data Textualisation – Making Human Readable Sense of Data).

Matt Waite’s upcoming course on “automated story bots” looks like it might produce some handy resources in this regard (code repo). In the meantime, he already shared the code described in How to write 261 leads in a fraction of a second here: ucr-story-bot.

For the longer term, on my “to ponder” list is what might something like “The Grammar of Graphics” be for data textualisation? (For background, see A Simple Introduction to the Graphing Philosophy of ggplot2.)

For example, what might a ggplot2 inspired gtplot library look like for converting data tables not into chart elements, but textual elements? Does it even make sense to try to construct such a grammar? What would the corollaries to aesthetics, geoms and scales be?

I think I perhaps need to mock-up some examples to see whether anything comes to mind, and what the function names, as well as the outputs, might look like, let alone the code to implement them! Or maybe code first is the way, to get a feel for how to build up the grammar from sensible looking implementation elements? Or more likely, perhaps a bit of iteration may be required?!

Open Data (or Not) About Designated Public Place Orders on the Isle of Wight

A couple of notices taken out in this week’s Isle of Wight County Press by the Isle of Wight Council give notice that a couple more areas are to become “designated public spaces”, which means that the police and other recognised officers are allowed to confiscate alcohol (no drinking on the beach…).


Despite my best efforts, I failed to find a listing on the Isle of Wight Council website detailing all designated public spaces on the island, or showing maps of their extent.

As with other council orders that have a geographical component, I am ever hopeful that appropriate maps will be made available, in the case of designated public places as a shapefile, not least because I generally have no idea what boundaries the notices refer to when they just describe road names and other geographical features (“along the beach to the point in line with the junction of Whitecross Lane and Whitecross Farm Lane”, for example…). The notice does say “as shown on the map to be attached to the Order”, but I can’t readily find the order on the council website to see whether the map has yet been attached (maybe it’s buried in papers relating to licensing committee meetings?) I did manage to find a map (search: designated public place lake/) but only because I read URLs…

spot the map

Re: finding maps relating to DPPOs, I’ve had this problem before. The same criticisms still apply – the Home Office don’t appear to maintain a single register of DPPOs in force, despite the original guidance note saying they would, although they will release the data at a crude level in response to an FOI request. Again, I wonder about setting up an FOI repeater to schedule monthly submissions of the same request to handle this sort of query to the Home Office.

Drawing on the fan in/fan out notion, a Home Office aggregation of the data represents a large fan-in of data from separate councils, and seems to be the easiest place to get hold of such data. However, it would help if they also requested and made available information about the geographical extent of each order as a shapefile.

Here’s the map, though I’m not clear whether I’m breaching copyright in displaying it without permission:

dppo map - licensed

Returning to the notice, I can’t find it displayed on the Isle of Wight County Press website, or on OnTheWight, our hyperlocal blog, despite online fora being legitimate spaces for publishing notices, so I guess buying the County Press is another of those effective taxes I need to pay to keep up with council notices (spending – (click search to run…)).

(As well as posting notices in the IWCP, why doesn’t the council have a /notices area of the website to which it could post its statutory announcements in addition to any other media channels it uses that would act as a convenient, authoritative and archival repository of such notices? [NB quite a few councils do have an area for public notices on their website. For example, do a web search for intitle:"public notices" site:gov.uk] In addition, an area of the site detailing orders in place and in force on the Island so we could keep up with how freedoms are being restricted at any time by the council.)

In passing, whilst trying to find details about DPPOs on the island, I came across a copy of the consultation around the DPPOs on the Gurnard Parish Council website:

police request dppo

“The Isle of Wight County Council has received a request from the Hampshire Constabulary to consider the imposition of Designated Public Place Orders (DPPOs) in …”

Hmm… so does the Hampshire Constabulary website list the extent of DPPOs in the region it covers? (I note on WhatDoTheKnow several people have FOId information about DPPOs from police forces.)

I’m pretty sure there’s a DPPO covering Ryde (here’s an example of additional evidence from Hampshire Constabulary offered in support of DPPO in Ryde in 2007 that was used to support an application for an order at that time) but I can’t find any statement about where, or if, such an order applies on the Isle of Wight Council website, or on the appropriate local neighbourhood area of the Hampshire Constabulary site.

hampshire dppo

So the only way for me to find out is to go round Ryde looking for signs, and maybe (outside chance) a notice that has a map of the area? Hmm…

PS Just by the by, shapefiles would also help when it comes to working out which bits of the beach I can and can’t walk the dog on…

shapefiles would help...

Open Data, Transparency, Fan-In and Fan-Out

In digital electronics, the notions of fan in and fan out describe, respectively, the number of inputs a gate (or, on a chip, a pin) can handle, or the number of output connections it can drive. I’ve been thinking about this notion quite a bit, recently, in the context of concentrating information, or data, about a particular service.

For example, suppose I want to look at the payments made by a local council, as declared under transparency regulations. I can get the data for a particular council from a particular source. If we consider each organisation that the council makes a payment to as a separate output (that is, as a connection that goes between that council and the particular organisation), the fan out of the payment data gives the number of distinct organisations that the council has made a declared payment to.

One thing councils do is make payments to other public bodies who have provided them with some service or other. This may include other councils (for example, for the delivery of services relating to out of area social care).

Why might this be useful? If we aggregate the payments data from different councils, we can set up a database that allows us to look at all payments from different councils to a particular organisation (which may also be a particular council, which is obliged to publish its transaction data, as well as a private company, which currently isn’t). (See Using Aggregated Local Council Spending Data for Reverse Spending (Payments to) Lookups for an example of this. I think startup Spend Network are aggregating this data, but they don’t seem to be offering any useful open or free services, or data collections, off the back of it. OpenSpending has some data, but it’s scattergun in what’s there and what isn’t, depending as it does on volunteer data collectors and curators.)

The payments incoming to a public body from other public bodies are therefore available as open data, but not in a generally, or conveniently, concentrated way. The fan in of public payments is given by the number of public bodies that have made a payment to a particular body (which may itself be a public body or may be a private company). If the fan in is large, it can be a major chore searching through the payments data of all the other public bodies trying to track down payments to the body of interest.
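To make the two counts concrete, here is a small sketch that assumes the payments data from several councils has already been aggregated into simple (payer, payee, amount) rows. The rows, names and function names are invented for illustration:

```python
from collections import defaultdict

# Invented (payer, payee, amount) rows standing in for aggregated spending data.
payments = [
    ("Council A", "Acme Ltd", 1200.0),
    ("Council A", "Council B", 800.0),
    ("Council B", "Acme Ltd", 450.0),
    ("Council C", "Acme Ltd", 300.0),
    ("Council A", "Acme Ltd", 50.0),
]


def fan_out(rows):
    """Number of distinct bodies each payer has made a payment to."""
    payees = defaultdict(set)
    for payer, payee, _amount in rows:
        payees[payer].add(payee)
    return {payer: len(found) for payer, found in payees.items()}


def fan_in(rows):
    """Number of distinct bodies that have made a payment to each payee."""
    payers = defaultdict(set)
    for payer, payee, _amount in rows:
        payers[payee].add(payer)
    return {payee: len(found) for payee, found in payers.items()}


print(fan_out(payments))
print(fan_in(payments))
```

The fan out for a council needs only that council's own file. The fan in for "Acme Ltd" can only be computed once every council's file has been loaded into the same table, which is exactly the chore described above.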

Whilst I can easily discover fan out payments from a public body, I can’t easily discover the originators of fan in public payments to a body, public or otherwise. Except that I could possibly FOI a public body for this information (“please send me a list of payments you have received from these bodies…”).

As more and more public services get outsourced to private contractors, I wonder if those private contractors will start to buy services off the public providers? I may be able to FOI the public providers for their receipts data (any examples of this, successful or otherwise?), but I wouldn’t be able to find any publicly disclosed payments data from the private provider to the public provider.

The transparency matrix thus looks something like this:

  • payment from public body to public body: payment disclosed as public data, receipts available from analysis of all public body payment data (and receipts FOIable from receiver?)
  • payment from public body to private body: payment disclosed as public data; total public payments to private body can be ascertained by inspecting payments data of all public bodies. Effective fan-in can be increased by setting up different companies to receive payments and make it harder to aggregate total public monies incoming to a corporate group. (Would be useful if private companies had to disclose: a) total amount of public monies received from any public source, exceeding some threshold; b) individual payments above a certain value from a public body)
  • payment from private body to public body: receipt FOIable from public body? No disclosure requirement on private body? Private body can effectively reduce fan out (that is, easily identified concentration of outgoing payments) by setting up different companies through which payments are made.
  • payment from private body to private body: no disclosure requirements.

I have of course already wondered Do We Need Open Receipts Data as Well as Open Spending Data?. My current take on this would perhaps argue in favour of requiring all bodies, public or private, that receive more than £25,000, for example, in total per financial year from a particular corporate group* to declare all the transactions (over £500, say) from that body. A step on the road towards that would be to require bodies that receive more than a certain amount of receipts summed from across all public bodies to be subject to FOI at least in respect of payments data received from public bodies.

* We would need to define a corporate group somehow, to get round companies setting up EvilCo Public Money Receiving Company No. 1, EvilCo Public Money Receiving Company No. 2354 Ltd, etc, each of which only ever invoices up to £24,999. There would also have to be a way of identifying payments from the same public body but made through different accounts (for example, different local council directorates).
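As a rough sketch of how such a rule might be checked, assuming we somehow had lookup tables mapping company names to corporate groups and payment accounts to public bodies (both tables, and all the rows, are hypothetical, and everything is assumed to fall within one financial year):

```python
from collections import defaultdict

TOTAL_THRESHOLD = 25_000  # example yearly total, in pounds
ITEM_THRESHOLD = 500      # example per-transaction floor, in pounds

# Hypothetical lookups: which group a company belongs to, and which
# public body a paying account belongs to.
GROUP_OF = {
    "EvilCo Receiving Co No. 1": "EvilCo",
    "EvilCo Receiving Co No. 2": "EvilCo",
}
BODY_OF = {
    "Council A (Adult Services)": "Council A",
    "Council A (Highways)": "Council A",
}

# Invented (paying account, receiving company, amount) rows.
payments = [
    ("Council A (Adult Services)", "EvilCo Receiving Co No. 1", 24_999),
    ("Council A (Highways)", "EvilCo Receiving Co No. 2", 24_999),
    ("Council A (Highways)", "EvilCo Receiving Co No. 2", 120),
    ("Council A (Highways)", "Small Trader Ltd", 900),
]


def key_for(payer, payee):
    """Normalise a row to (public body, corporate group)."""
    return (BODY_OF.get(payer, payer), GROUP_OF.get(payee, payee))


def disclosable(rows):
    """Rows over the item threshold whose body/group yearly total exceeds the total threshold."""
    totals = defaultdict(float)
    for payer, payee, amount in rows:
        totals[key_for(payer, payee)] += amount
    return [
        (payer, payee, amount)
        for payer, payee, amount in rows
        if totals[key_for(payer, payee)] > TOTAL_THRESHOLD
        and amount > ITEM_THRESHOLD
    ]


for row in disclosable(payments):
    print(row)
```

Neither EvilCo company crosses the total threshold on its own, but the group does once the two are summed, so both large payments would be caught. The hard part, of course, is the lookup tables, not the arithmetic.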

Whilst this would place a burden on all bodies, it would also start to level out the asymmetry between public body reporting and private body reporting in the matter of publicly funded transactions. At the moment, private company overheads for delivering subcontracted public services are lower than public body overheads for delivering the same services, for example because of transparency disclosures, which places the public body at a disadvantage compared to the private body. (Note that things may be changing, at least in FOI stakes… See for example the latter part of Some Notes on Extending FOI.)

One might almost think the government was promoting transparency of public services, gleeful in the expectation that, as their privatisation agenda moves on, a decreasing proportion of service providers will actually have to make public disclosures. Again, this asymmetry would make for unfair comparisons between service providers based on publicly available data if only data from public body providers of public services, rather than private providers of tendered public services, had to be disclosed.

So the take home, which has got diluted somewhat, is the proposal that the joint notions of fan in and fan out, when it comes to payment/receipts data, may be useful when it comes to helping us think about how easy it is to concentrate data/information about payments to, or from, a particular body, and how policy can be defined to shine light where it needs shining.


