OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Anything you want’ Category

More Tukey Gems

Via a half quote by Adam Cooper in his SoLAR flare talk today, elucidated in his blog post Exploratory Data Analysis, I am led to a talk by John Tukey – The Technical Tools of Statistics – read at the 125th Anniversary Meeting of the American Statistical Association, Boston, November 1964.

As ever (see, for example, Quoting Tukey on Visual Storytelling with Data), it contains some gems… The following is a spoiler of the joy of reading the paper itself. I suggest you do that instead – you’ll more than likely find your own gems in the text: The Technical Tools of Statistics.

If you’re too lazy to click away, here are some of the quotes and phrases I particularly enjoyed.

To start with, the quote referenced by Adam:

Some of my friends felt that I should be very explicit in warning you of how much time and money can be wasted on computing, how much clarity and insight can be lost in great stacks of computer output. In fact, I ask you to remember only two points:

  • The tool that is so dull that you cannot cut yourself on it is not likely to be sharp enough to be either useful or helpful.
  • Most uses of the classical tools of statistics have been, are, and will be, made by those who know not what they do.

And here’s one I’m going to use when talking about writing diagrams:

Hand-drawing of graphs, except perhaps for reproduction in books and in some journals, is now economically wasteful, slow, and on the way out.

(It strikes me that using a spreadsheet wizard to create charts in a research or production setting, where we are working in a reproducible, document generation context, is akin to the “hand-drawing of graphs” of yesteryear?)

“I know of no person or group that is taking nearly adequate advantage of the graphical potentialities of the computer.”

Nothing’s changed?!

[W]e are going to reach a position we should have reached long ago. We are going, if I have to build it myself, to have a programming system — a “language” if you like — with all that that implies, suited to the needs of data analysis. This will be planned to handle numbers in organized patterns of very different shapes, to apply a wide variety of data-analytical operations to make new patterns from old, to carry out the oddest sequences of apparently unrelated operations, to provide a wide variety of outputs, to automatically store all time-expensive intermediate results “on disk” until the user decides whether or not he will want to do something else with them, and to do all this and much more easily.

Since I’ve started playing with pandas, my ability to have written conversations with data has improved. Returning to R after a few months away, I’m also finding that easier to write as well (the tabular data models, and elements of the syntax, are broadly similar across the two).

Most of the technical tools of the future statistician will bear the stamp of computer manufacture, and will be used in a computer. We will be remiss in our duty to our students if we do not see that they learn to use the computer more easily, flexibly, and thoroughly than we ever have; we will be remiss in our duties to ourselves if we do not try to improve and broaden our own uses.

This does not mean that we shall have to continue to teach our students the elements of computer programming; most of the class of ’70 is going to learn that as freshmen or sophomores. Nor does it mean that each student will write his own program for analysis of variance or for seasonal adjustment, this would be a waste. … It must mean learning to put together, effectively and easily — on a program-self-modifying computer and by means of the most helpful software then available — data analytical steps appropriate to the need, whether this is to uncover an anticipated specific appearance or to explore some broad area for unanticipated, illuminating appearances, or, as is more likely, to do both.

Interesting to note that in the UK, “text-based programming” has made it into the curriculum. (Related: Text Based Programming, One Line at a Time (short course pitch).)

Tukey also talks about how computing will offer flexibility and fluidity. Flexibility includes the “freedom to introduce new approaches; freedom, in a word, to be a journeyman carpenter of data-analytical tools”. Fluidity “means that we are prepared to use structures of analysis that can flow rather freely … to fit the apparent desires of the data”.

As the computer revolution finally penetrates into the technical tools of statistics, it will not change the essential characteristics of these tools, no matter how much it changes their appearance, scope, appositeness and economy. We can only look for:

  • more of the essential erector-set character of data analysis techniques, in which a kit of pieces are available for assembly into any of a multitude of analytical schemes,
  • an increasing swing toward a greater emphasis on graphicality and informality of inference,
  • a greater and greater role for, graphical techniques as aids to exploration and incisiveness,
  • steadily increasing emphasis on flexibility and on fluidity,
  • wider and deeper use of empirical inquiry, of actual trials on potentially interesting data, as a way to discover new analytic techniques,
  • greater emphasis on parsimony of representation and inquiry, on the focussing, in each individual analysis, of most of our attention on relatively specific questions, usually in combination with a broader spreading of the remainder of our attention to the exploration of more diverse possibilities.

In order that our tools, and their uses, develop effectively … we shall have to give still more attention to doing the approximately right, rather than the exactly wrong, …

All quotes from John Tukey, The Technical Tools of Statistics, 1964.


Written by Tony Hirst

October 24, 2014 at 3:18 pm

Posted in Anything you want

A Loss of Sovereignty?

Over the course of the weekend, rummaging through old boxes of books as part of a loft clearout, I came across more than a few OU textbooks and course books. Way back when, OU course materials were largely distributed in the form of print items and hard media – audio and video cassettes, CD- and DVD-ROMs and so on. Copies of the course materials could be found in college and university libraries that acted as OU study centres, via the second hand market, or in some cases purchased from the OU via OU Worldwide.

Via an OU press release out today, I notice that “[c]ourse books from The Open University (OU) have been donated to an educational sponsorship charity in Kenya, giving old course books a new use for the local communities.” Good stuff…

..but it highlights an issue about the accessibility of our materials as they increasingly move to digital form. More and more courses deliver more and more content to students via the VLE. Students retain access to online course materials and course environments for a period of time after a module finishes, but open access is not available.

True, many courses now release some content onto OpenLearn, the OU’s free open learning platform. And the OU also offers courses on the FutureLearn platform (an Open University owned company that made some share allotments earlier this year).

But access to the electronic form is not tangible – the materials are not persistent, the course materials not tradeable. They can’t really be owned.

I’m reminded of a noticing I had earlier this week about our Now TV box that lets us watch BBC iPlayer, 4oD, YouTube and so on via the telly. The UI is based around a “My subscriptions” model which shows the channels (or apps) you subscribe to. Only, there are some channels in there that I didn’t subscribe to, and that – unlike the channels I did subscribe to – I can’t delete from my subscriptions. Sky – I’m looking at you. (Now TV is a Sky/BSkyB product.)

In a similar vein, Apple and U2 recently teamed up to dump a version of U2’s latest album into folks’ iTunes accounts, “giving away music before it can flop, in an effort to stay huge” as Iggy Pop put it in his John Peel Lecture [on BBC iPlayer], and demonstrating once again that our “personal” areas on these commercial services are no such thing. We do not have sovereignty over them. Apple is no Sir Gawain. We do not own the things that are in our collections on these services and nor do we own the collection: I doubt you hold a database right in any collection you curate on YouTube or in iTunes, even if you do expend considerable time, effort and skill in putting that collection together; and I fully imagine that the value of those collections as databases is exploited by the recommendation engine mining tools the platform services operate.

And just as platform operators can add things to our collections, so too can they take them away. Take Amazon, for example, who complement their model of selling books with one of renting you limited access to ebooks via their Kindle platform. As history shows – Amazon wipes customer’s Kindle and deletes account with no explanation or The original Big Brother is watching you on Amazon Kindle – Amazon is often well within its rights, and it is well within its capacity, to remove books from your device whenever it likes.

In the same way that corporate IT can remotely manage “your” work devices using enterprise mobile device management (Blackberry: MDM and beyond, Google Apps: mobile management overview, Apple: iOS and the new IT, for example), so too can platform operators of devices – and services – reach into your devices – or service clients – and poke around inside them. Unless we’ve reclaimed it as our own, we’re all users of enterprise technology masked as consumer offerings and have ceded control over our services and devices to the providers of them.

The loss of sovereignty also extends to the way in which devices and services are packaged so that we can’t look inside them, need special tools to access them, can’t take ownership of them in order to appropriate them for other purposes. We are users in a pejorative sense; and we are used by service and platform providers as part of their business models.

Written by Tony Hirst

October 21, 2014 at 10:27 am

Posted in Anything you want, Paranoia

Isle of Wight Ferries – Adjournment Debate

Island MP Andrew Turner (Con) secured an adjournment debate last night on the Isle of Wight Ferries. As with airlines, Wightlink (I’m not sure about Red Funnel?) appear to operate dynamic pricing (their strapline: “flexi-Pricing… matching demand with capacity”), upping the cost of ferry tickets to match demand. Residents’ multilink tickets (books of tickets bought in advance at a discounted price – currently, a return trip, off a book of 10 returns by car, costs me about £43 on the boat) don’t guarantee a sailing: residents’ places appear to be subject to quota.

The ferry companies are leveraged by private debt, which acts as a brake on investment and an inflator of ticket prices. In recent years, the number of sailings has reduced – making convenient travel difficult at times, more so when resident ticket quotas are applied to sailings – presumably in order to reduce operating costs.

The unreliability (from my experience) of rail connections provided between London and Portsmouth, along with reduced late night sailings, means that day trips to London require a very early evening departure from London in order to guarantee making a passenger boat. Start of the day London meetings require a very early start; important early start meetings require travel up to London the day before.

Both the cost and inconvenience of sailing (not only limited sailings: a one-way crossing of the Solent by car ferry takes about an hour when booking in, loading, crossing, and disembarkation are taken into account) factor into personal decisions I now make about leaving and returning to the Island in a detrimental way on many levels.

Written by Tony Hirst

October 14, 2014 at 8:41 am

Posted in Anything you want

List Brokerage – Putting Data About You to Work…

Never having worked in the marketing world, whenever I do stumble across the presumably everyday dealings of marketers or advertisers I am reminded of how incredibly naive I am about it all.

So for example, today I came across Media-Arrow, a direct marketing and list brokering company. Here’s an example of some of the lists they advertise access to:

  • Adults Only: almost 50,000 buyers over the last year of “a wide range for adult products including toys, DVD’s, videos, magazines, clothing, etc. via mail order or through the company’s shops”. So they buy the data from a particular company? And over 50,000 “Active Enquirers” (“catalogue and product information requesters, web based or coupon based requests”) over the last year, with verified names and addresses.
  • Affluent Grey Britain: over 650,000 “affluent over 55’s” and almost 50,000 age selectable e-mail addresses. The database “has been specifically built with affluence in mind and the profiles used provide clear targeting to this lucrative, wealthy and financially astute market. These consumers have high disposable income, good credit rating and live in identifiable high-valuable properties – identified by list owner’s Property Watcher data. Exclusively homeowners…” So someone’s keeping track of house prices and ownership as part of the value-add associated with this list?
  • Award Productions: “over 80,000 mail order buyers, typically ex-servicemen and women medal holders and buyers of related commemorative products”. This list looks like it could be the customer list of http://www.awardmedals.com/ ?
  • Big Book Default: over 700,000 contacts with dates of birth and selectable names. “This specialised credit file identifies prospects who demonstrate an active willingness to take out additional borrowing. All have made some payments to major catalogue companies, credit cards or utilities but have subsequently defaulted. … The entire list is deliberately overlaid with a home-owner tag, ensuring that secured lending can be effectively marketed.” So folk who are happy to take on debt, and then default, and may have something to secure it against? Wonga fodder?
  • Britains Movers: “homeowner movers is drawn from a high volume data pool. The file is updated weekly from Land Registry and Utility Company data”. Why do you think corporates keep lobbying for open public data?
  • Charity Superstore: “over 900,000 charity donators that have been sourced through transactional data systems which capture the details of live supporters of various charitable causes”. Because giving isn’t enough; and selling you on doesn’t really cost you anything more, does it?
  • Cotswold Collections: “a much sought after mail order catalogue”, apparently, that also sells almost 15,000 customer records on?
  • Cottage Garden: “a ‘fast-growing’ file of Mail Order Buyers of gardening & gift offers for the keen amateur gardener. … Buyers are recruited from National newspaper adverts, Offers & Inserts”. So the next time you fall for a mail-order ad via your favourite newspaper, remember that the price is so low because the product is actually you…
  • Credit Seekers: “Built specifically from mail order catalogue buyers data, this segment accurately targets lower income households experiencing some cash flow issues.” Because they don’t know any better and you can rip them off some more…
  • Director Select: “select file of Directors at home, built primarily from Companies House data, has 900,000+ named company directors at home address.” Why do you think corporates keep lobbying for open public data?
  • Educating Britain: “The file include the names of nearly 500,000 people who have bought or are buying distance learning courses in the last 12 months”. Hmm…
  • Pet ID: “the Pet-ID file includes people who have had their pets micro-chipped in case of loss or theft”. Because a dog’s not just for Christmas, it’s also for data.
  • Pet ID – Horse Owners: “one of UK’s official Horse Passport Issuers … UK legislation, now requires all horse owners to obtain ‘passport’ papers for their horses.” A handy UK gov spinoff: driving the data economy.

And here are the rest of the lists, by name…: Book Buyers, Communication Avenue, Dukeshill, Empty Nest High Fliers, Executive Suite, Family Britain, Fast & Furious, Financial Britain, Gambling Britain, Home Improvers, Industrial Claims File, Krystal Communication, Mail Order Superstore, Monied Ladies, Mont Rose of Guernsey, Older & Wiser, Over 65′s, Pashmina Bazaar, PDSA Lottery, Pet House, Pet People, Pet Solutions, Prize Magazines Responders, Prosperity File, Prudent Savers, Retail Therapy, Salesfeed, Six Channels – B2B File, SixChannels – B2B file Worldwide, SixChannels – B2C File, SixChannels – Consumer-Business Selects, TDS Insurance File, The Pottery File, The Rich List, Totally Professional, Wealthy Database.

I’m not sure how the list brokerage actually works? I assume the purchaser doesn’t get the list, they just get access to the list, and provide the broker with the thing they want mailing out? But does the broker have access to the lists, and are they data controllers of their contents? If so, I should be able to make a Data Protection Act subject access request of them to find out which lists I’m on and what information each of them has about me?

See also: Demographically Classed, which lists the segments used in the ACORN and MOSAIC geodemographic segmentation schemes.

Written by Tony Hirst

October 10, 2014 at 6:16 pm

Posted in Anything you want

Summary Notes of Data Conversations Around PFI Data

As I briefly mentioned in a previous post, a few weeks ago I came across a spreadsheet summarising awarded PFI contracts as of 2013 (private finance initiative projects, 2013 summary data). At the time, I put together a couple of quick notebooks exploring the data. This post is a summary/note to self about what’s in those notebooks.

As background to what PFI actually is, the Commons Treasury Select committee published this report on the Private Finance Initiative in July, 2011. It describes PFI as follows:

In a typical PFI project, the private sector party is constituted as a Special Purpose Vehicle (SPV), which manages and finances the design, build and operation of a new facility. The financing of the initial capital investment (i.e. the capital required to pay transaction costs, buy land and build the infrastructure) is provided by a combination of share capital and loan stock from the owners of the SPV, together with senior debt from banks or bond-holders. The return on both equity and debt capital is sourced from the periodic “unitary charge”, which is paid by the public authority from the point at which the contracted facility is available for use. The unitary charge may be reduced (to a limited degree) in certain circumstances: e.g. if there is a delay in construction, if the contracted facility is not fully operational, or if services fail to meet contracted standards. Thus, the PFI structure is designed to transfer project risks from the public to the private sector.

The document A new approach to public private partnerships, HM Treasury, December 2012 clarifies that “the public sector does not pay for the project’s capital costs over the construction period. Once the project is operational and is performing to the required standard, the public sector pays a unitary charge which includes payments for ongoing maintenance of the asset, as well as repayment of, and interest on, debt used to finance the capital costs. The unitary charge, therefore, represents the whole life cost associated with the asset.”

A brief critique of PFI in the context of the health service can be found in The private finance initiative PFI in the NHS — is there an economic case? by Declan Gaffney, Allyson M Pollock, David Price and Jean Shaoul.

The PFI summary data table itemises historical unitary charge payments associated with a particular project on a financial year basis (eg ‘Unitary charge payment 1992-93 (£m)’) as well as projected unitary charges (eg ‘Estimated unitary charge payment 2015-16 (£m)’). An amount is also given for the Capital Value (£m) of the project.

The first notebook – Quick Look at UK PFI Contracts Data – identifies all the columns available in the spreadsheet.

For example, separate columns identify whether a project is ‘On / Off balance sheet under IFRS’, ‘On / Off balance sheet under ESA 95′, or ‘On / Off balance sheet under UK GAAP’. According to the New Approach document:

Departments have separate budgets for resource and capital spending. Resource spending (RDEL) includes current expenditure such as pay or procurement. Capital spending (CDEL) includes new investment. The scoring of the project in departmental budgets depends on whether the project is classified as on or off balance sheet under ESA95.

1.12 If a central government project is deemed to be on balance sheet under ESA95, then the capital value of the project (i.e. the debt required to undertake the project) is recorded as CDEL in the first year of operation; and the interest, service and depreciation are recorded as RDEL each year unitary charges are paid.

1.13 If a central government project is deemed to be off balance sheet under ESA95, then there is no impact on the department’s CDEL in the first year of operation. The full unitary charge (including interest, service and debt repayment) does, however, score in RDEL each year. Around 85 per cent of past PFI projects have been considered off-balance sheet under ESA95.

The notebook includes summary calculations such as the total capital value of projects by sector, as well as time series plots showing the value of unitary charge payments over time (both in total and for particular procuring authorities or departments).
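As a rough sketch of the sort of summary calculation described above (not the notebook code itself), something like the following pandas fragment would do. The rows are invented toy data, and the "Project" and "Sector" column names are assumptions; the capital value and unitary charge column names follow the pattern quoted earlier in this post.

```python
import pandas as pd

# Toy rows standing in for the PFI spreadsheet; all values are invented.
# "Project" and "Sector" are assumed column names.
df = pd.DataFrame({
    "Project": ["A", "B", "C"],
    "Sector": ["Schools", "Hospitals", "Schools"],
    "Capital Value (£m)": [10.0, 50.0, 20.0],
    "Unitary charge payment 2012-13 (£m)": [1.0, 5.0, 2.0],
    "Estimated unitary charge payment 2015-16 (£m)": [1.5, 5.5, 2.5],
})

# Total capital value of projects by sector.
capital_by_sector = df.groupby("Sector")["Capital Value (£m)"].sum()

# Pick out every unitary charge column, historical or estimated,
# and total each one across all projects.
charge_cols = [c for c in df.columns if "unitary charge payment" in c.lower()]
charges_by_year = df[charge_cols].sum()

# Relabel each total with just its financial year, e.g. "2015-16".
charges_by_year.index = [c.split("payment ")[1].split(" (")[0] for c in charge_cols]

print(capital_by_sector)
print(charges_by_year)
```

Filtering `df` on a procuring authority or department column before the sum would give the per-authority series; `charges_by_year.plot()` would then chart it.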


The spreadsheet also contains information about equity partners. We can use this to report on the projects that a particular company is involved with.


We can also review the unitary payments going to a particular group over time:


The second notebook – A Quick Thread Pull Around PFI Special Purpose Vehicles – digs around the PFI project SPVs (special purpose vehicles) a little more, using data from OpenCorporates.

One question explored in the notebook is whether or not the set of directors for a particular SPV also act as the set of directors for any other companies. So for example, for the SPV that is Pyramid Schools (Hadley) Ltd, we find several other companies sharing all the same directors:


For Island Roads, we see that several companies appear to have been set up associated with the project. In addition, there are several directors from the Island Roads director list associated with other companies, for example HOUNSLOW HIGHWAYS SERVICES LIMITED or PARTNERS 4 LIFT.


A search of PFI SPVs identifies Hounslow Highway Services Limited as another PFI company, so the director linkage suggests that one of the partners for the Island Roads project is also a partner of the Hounslow Roads project. In this case, the linkage can also be identified through the equity partners:


There is possibly more that could be done to look through the linkage between the PFI SPVs and equity partners, eg on the basis of similarities between directors, or registered addresses. There might also be some mileage in looking at directors who are also directors of companies that make large political donations, for example.
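The director comparisons described above come down to set operations. Here is a minimal sketch, assuming the director lists have already been pulled into a dictionary keyed by company name; the company and director names are invented for illustration, and the helper functions are hypothetical rather than taken from the notebook or from any OpenCorporates client.

```python
# Hypothetical company -> directors mapping; all names are invented
# for illustration and are not taken from OpenCorporates.
directors_by_company = {
    "SPV ONE LTD": {"A. Smith", "B. Jones"},
    "SPV ONE HOLDINGS LTD": {"A. Smith", "B. Jones"},
    "ROADS PROJECT LTD": {"B. Jones", "C. Patel"},
    "UNRELATED LTD": {"D. Brown"},
}

def same_board(company, mapping):
    """Other companies whose set of directors matches this company's exactly."""
    board = mapping[company]
    return sorted(c for c, d in mapping.items() if c != company and d == board)

def shared_directors(company, mapping):
    """Other companies sharing at least one director, with the shared names."""
    board = mapping[company]
    links = {}
    for other, others_board in mapping.items():
        overlap = board & others_board
        if other != company and overlap:
            links[other] = sorted(overlap)
    return links

print(same_board("SPV ONE LTD", directors_by_company))
print(shared_directors("SPV ONE LTD", directors_by_company))
```

Exact matching on names is the weak point: the same director can appear under slightly different spellings, so real data would probably need some normalisation first.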

Written by Tony Hirst

October 10, 2014 at 3:03 pm

Posted in Anything you want

Participatory Surveillance

This is an evocative phrase, I think – “participatory surveillance” – though the definition of it is lacking from the source in which I came across it (Online Social Networking as Participatory Surveillance, Anders Albrechtslund, First Monday, Volume 13, Number 3 – 3 March 2008).

A more recent and perhaps related article – Cohen, Julie E., The Surveillance-Innovation Complex: The Irony of the Participatory Turn (June 19, 2014). In Darin Barney, Gabriella Coleman, Christine Ross, Jonathan Sterne & Tamar Tembeck, eds., The Participatory Condition (University of Minnesota Press, 2015, Forthcoming) – notes how “[c]ontemporary networked surveillance practices implicate multiple forms of participation, many of which are highly organized and strategic”, and include the “crowd-sourcing of commercial surveillance”. It’s a paper I need to read and digest properly…

One example from the last week or two of a technology that supports participatory surveillance comes from Buzzfeed’s misleading story relating how Hundreds Of Devices [Are] Hidden Inside New York City Phone Booths that “can push you ads – and help track your every move” (the story resulted in the beacons being removed). My understanding of beacons is that they are a Bluetooth push technology that emit a unique location code, or a marketing message, within a limited range. A listening device can detect the beacon message and do something with it. The user thus needs to participate in any surveillance activity that makes use of the beacon by listening out for a beacon, capturing any message it hears, and then doing something with that message (such as phoning home with the beacon message).

The technology described in the Buzzfeed story is developed by Gimbal, who offer an API, so it should be possible to get a feel from that what is actually possible. From a quick skim of the documentation, I don’t get the impression that the beacon device itself listens out for and tracks/logs devices that come into range of it? (See also Postscapes – Bluetooth Beacon Handbook.)

Of course, participating in beacon mediated transactions could be done unwittingly or surreptitiously. Again, my understanding is that Android devices require you to install an app and grant permissions to it that let it listen out for, and act on, beacon messages, whereas iOS devices have iBeacon listening built into iOS Location Services*, and you then grant apps permission to use messages that have been detected? This suggests that Apple can hear any beacon you pass within range of?

* Apparently, [i]f [Apple] Location Services is on, your device will periodically send the geo-tagged locations of nearby Wi-Fi hotspots and cell towers in an anonymous and encrypted form to Apple to augment Apple’s crowd-sourced database of Wi-Fi hotspot and cell tower locations. In addition, if you’re traveling (for example, in a car) and Location Services is on, a GPS-enabled iOS device will also periodically send GPS locations and travel speed information in an anonymous and encrypted form to Apple to be used for building up Apple’s crowd-sourced road traffic database. The crowd-sourced location data gathered by Apple doesn’t personally identify you. Apple don’t pay you for that information of course, though they might argue you get a return in kind in the form of better location awareness for your device.

There is also the possibility with any of those apps that you install one for a specific purpose, grant it permissions to use beacons, then the company that developed it gets taken over by someone you wouldn’t consciously give the same privileges to… (Whenever you hear about Facebook or Google or Experian or whoever buying a company, it’s always worth considering what data, and what granted permissions, they have just bought ownership of…)

See also: “participatory sensing” – Four Billion Little Brothers? Privacy, mobile phones, and ubiquitous data collection, Katie Shilton, University of California, Los Angeles, ACM Queue, 7(7), August 2009 – which “tries to avoid surveillance or coercive sensing by emphasizing individuals’ participation in the sensing process”.

Written by Tony Hirst

October 10, 2014 at 10:22 am

Posted in Anything you want

Data Reporting, not Data Journalism?

As a technology optimist, I don’t tend to go in so much for writing up long critical pieces about tech (i.e. I don’t do the “proper” academic thing), instead preferring to spend time trying to work out how to make use of it, either on its own or in tandem with other technologies. (My criticality is often limited to quickly ruling out things I don’t want to waste my time on because they lack interestingness, or to spending time working out how to appropriate one thing to do something it perhaps wasn’t intended to do originally.)

I also fall into the camp of lacking in confidence about things other people think I know about, generally assuming there are probably “proper” ways of doing things for someone properly knowledgeable in a tradition, although I don’t know what they are. (Quite where this situates me on a scale of incompetence, I’m not sure… e.g. Why the Unskilled Are Unaware: Further Explorations of (Absent) Self-Insight Among the Incompetent h/t @Downes).

A couple of days ago, whilst doodling with a data-to-text notebook (the latest incarnation of which can be found here) I idly asked @fantasticlife whether it was an example of data journalism (we’ve been swapping links about bot-generated news reports for some time), placing it also in the context of The Changing Task Composition of the US Labor Market: An Update of Autor, Levy, and Murnane (2003), David H. Autor & Brendan Price, June 21, 2013, and in particular this chart about the decline in work related “routine cognitive tasks”:


His response? “No, that’s data reporting”.

So now I’m wondering: how do data reporting and data journalism differ? And to what extent is writing some code along the lines of:

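import decimal

def pc(amount,rounding=''):
    # Format a proportion (e.g. 0.019) as a percentage string to one decimal place (e.g. '1.9%')
    if rounding=='down': rounding=decimal.ROUND_DOWN
    elif rounding=='up': rounding=decimal.ROUND_UP
    else: rounding=decimal.ROUND_HALF_UP

    ramount=float(decimal.Decimal(100 * amount).quantize(decimal.Decimal('.1'), rounding=rounding))
    return '{0:.1f}%'.format(ramount)

def otwMoreLess(now,then):
    # Describe in words how one value compares with another
    delta=now-then
    if delta>0:
        txt='more'
    elif delta<0:
        txt='less'
    else:
        txt='the same'  # assumed wording for the no-change case
    return txt

def otwPCmoreLess(this,that):
    # Size and direction of the difference, e.g. '0.5% more'
    delta=this-that
    return '{delta} {diff}'.format(delta=pc(abs(delta)),diff=otwMoreLess(this,that))

# Sentence template; most of the keyword arguments passed to .format() are not shown here
jsa_txt='''That means {localrate} of the resident {localarea} population {poptype} are {claim} \
– {regiondiff} than the rest of the {region} ({regionrate}), \
and {ukdiff} than the whole of the UK ({ukrate}).'''
# One of those arguments, as it appears in the notebook:
#   poptype=decase(get16_64Population(localcode)['CELL_NAME'].iloc[0].split('(')[1].split(' -')[0]),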

to produce text from nomis data of the form:

That means 1.9% of the resident Isle of Wight population aged 16-64 are persons claiming JSA – 0.5% more than the rest of the South East (1.4%), and 0.5% less than the whole of the UK (2.4%).

an output that is data reporting, an example of journalistic practice? For example, is locating the data source a journalistic act? Is writing a script to parse the data into words through the medium of code a journalistic act?

Does it become “more journalistic” if the text generating procedure comments on the magnitude of the changes? For example, using something like:

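import random
import inflect  # assumed to be the source of p.a(), which prefixes "a"/"an"

p = inflect.engine()

def _txt6(filler=','):
    # mostRecent and yeardelta are expected to be defined in the enclosing scope
    def _magnitude(term):
        #A heuristic here
        if propDelta<0.05:
            mod=random.choice(['slight','small'])  # assumed wording; not shown in the original
        elif propDelta<0.10:
            mod=random.choice(['significant','notable'])  # assumed wording; not shown in the original
        else:
            mod=random.choice(['considerable','large' ])
        term=' '.join([mod,term])
        return p.a(term)

    txt6=''  # the opening clause of the sentence is not shown in the original
    propDelta= abs(yeardelta)/mostRecent['Total']
    if yeardelta==0:
        txt6+=', exactly the same amount.'
    else:
        if yeardelta <0: direction=_magnitude(random.choice([ 'decrease','fall']))
        else: direction=_magnitude(random.choice(['increase','rise']))
        txt6+='{_filler} {0} of {1} since then.'.format(p.a(direction), abs(yeardelta),
                                                         _filler= filler )
    return txt6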


to produce further qualified text (in terms of commenting on the magnitude of the amount) that can take a variety of slightly different forms?

- The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502.
- This compares with a figure of 2720 from a year ago (August 2013), a large fall of 1218 since then.
- This compares with a figure of 2720 from a year ago (August 2013), a considerable fall of 1218 since then.

At what point does the interpretation we can bake into the text generator become “journalism”, if at all? Can the algorithm “do journalism”? Is the crafting of the algorithm “journalism”? Or is it just computer assisted reporting?

In the context of routine cognitive tasks – such as the task of reporting the latest JSA figures in this town or that city – is this sort of automation likely? Is it replacing one routine cognitive task with another, or is it replacing it with a “non-routine analytical task”? Might it be a Good Thing, allowing the ‘reporting’ to be done automatically, at least in draft form, and freeing up journalists to do other tasks? Or a Bad Thing, replacing a routine, semi-skilled task by automation? To what extent might the algorithms also embed more intelligence and criticism, for example, automatically flagging up figures that are “interesting” in some way? (Would that be algorithmic journalism? Or would it be more akin to an algorithmic stringer?)
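As a very rough sketch of what flagging “interesting” figures might look like, the following picks out areas where the year-on-year change is large relative to the earlier figure. The function name, the 10% threshold and the second area are all assumptions made up for illustration, and the change is measured against the previous figure here, which is a different choice from the heuristic in the snippet above; the Isle of Wight numbers are the ones quoted earlier.

```python
def flag_interesting(figures, threshold=0.10):
    """Return (area, previous, latest, proportional_change) for large changes.

    `figures` maps an area name to a (previous, latest) pair of counts.
    The 10% threshold is an arbitrary assumption, not an editorial standard.
    """
    flagged = []
    for area, (previous, latest) in figures.items():
        if previous == 0:
            continue  # proportional change is undefined; skip rather than guess
        change = (latest - previous) / previous
        if abs(change) >= threshold:
            flagged.append((area, previous, latest, change))
    # Largest movements first, so the most striking ones appear at the top.
    return sorted(flagged, key=lambda row: abs(row[3]), reverse=True)

# Isle of Wight figures are those quoted above; "Exampletown" is invented.
figures = {
    "Isle of Wight": (2720, 1502),
    "Exampletown": (1000, 980),
}
for area, previous, latest, change in flag_interesting(figures):
    print("{}: {} -> {} ({:+.0%})".format(area, previous, latest, change))
```

Whether a threshold like this counts as editorial judgement, or just another routine task waiting to be automated, is of course exactly the question.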

PS Twitter discussion thread around this post from 29/9/14

Written by Tony Hirst

September 29, 2014 at 12:39 pm

