Category: Tinkering

Some Random Upstart Debugging Notes…

…just so I don’t lose them…

dmesg spews out messages the kernel has been issuing…

/var/log/upstart/SERVICE.log has log messages from trying to start a service SERVICE.

/etc/init.d should contain what looks like a generic sort of file, filename SERVICE; the actual config file that contains the command used to start the service lives in /etc/init, in a file SERVICE.conf.
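For reference, a minimal SERVICE.conf might look something like this (a sketch only – the description, runlevels and exec path are placeholders rather than anything I’m actually running):

#/etc/init/SERVICE.conf
description "SERVICE daemon"

#Start and stop the service on the standard runlevel changes
start on runlevel [2345]
stop on runlevel [016]

#Restart the service if it falls over
respawn

#The command that actually starts the service (placeholder path)
exec /usr/local/bin/SERVICE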

To generate the files that will have a go at auto-running the service, run the command update-rc.d SERVICE defaults.

Start a service with service SERVICE start, stop it with service SERVICE stop, and restart (stop if started, then start) with service SERVICE restart. Find out what’s going on with it using service SERVICE status.


Confusing Chart? Seaborn Jointplot

I’ve just been doodling with some data and the seaborn graphics library and managed to confuse myself a couple of times when quickly glancing at some “jointplots” that add marginal histograms to a scatter plot.

The code is easy enough:

import seaborn as sns

#df is assumed to be a pandas dataframe containing the two rank columns
sns.jointplot(x='Indoors Sub-domain Rank (where 1 is most deprived)',
              y='Outdoors Sub-domain Rank (where 1 is most deprived)',
              data=df)

which gives charts of the form:


but how do you intuitively see the histograms to compare them?


My at-a-glance (tired, past midnight…) reaction keeps “seeing” the top comparison, representing a 90-degree counterclockwise rotation about the top left corner of the y-axis chart. But if you think about it for more than a glance, that obviously puts the large y-values at low x, and the low y-values at large x.

The correct way is the bottom one; to make the comparison you need to flip the y-axis chart, folding its bottom right corner towards the top-left corner of the x-axis chart. This is a rotation and a reflection. Which is hard.

But if we do the flip to try to help (can we do this using seaborn???):


my eye now feels as if it wants to help out, keeping the near corners of the various charts (top right of the y-axis chart, bottom left of the x-axis one, and top left of the main panel) all close together, and flipping the bottom left corner of the y-axis chart up towards the top right corner of the x-axis chart (i.e. in this configuration it wants to do the flip?).
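(As to whether seaborn can do the flip: jointplot returns a JointGrid object that exposes the marginal chart axes, so one way might be to invert the x-axis of the y-margin chart – a sketch, assuming df is the dataframe used above:)

import seaborn as sns

g = sns.jointplot(x='Indoors Sub-domain Rank (where 1 is most deprived)',
                  y='Outdoors Sub-domain Rank (where 1 is most deprived)',
                  data=df)
#Flip the y-axis marginal histogram so its bars grow leftwards
g.ax_marg_y.invert_xaxis()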

The distribution of bars in the marginal charts may also be complicating matters, encouraging the eye to match large with large (which in this case is wrong…).


Tinkering With Data Consuming WordPress Plugins

Now that I’ve got my own webspace on Reclaim Hosting and set up a handful of my own self-hosted WordPress blogs, I started having a tinker with some custom plugins. I’ve started off with something simple, a couple of shortcode plugins that I can use to embed a simple leaflet map into a post or a page.


So far, I’ve tried a couple of different approaches…

The first one has been pared down to just a default-accepting shortcode:


The shortcode calls a bit of code that makes a cached request (at most three times a day) to a scraper that returns a list of planning applications currently open for consultation with the Isle of Wight Council. (That said, I can pass in a few shortcode parameters relating to the size of the map (which is a fixed size, unfortunately – I haven’t worked out how to make it flexible…) and the default location and zoom.) The scraper itself runs once daily.


My feeling is that this sort of code would work best in the context of a WordPress page, acting as a destination that allows folk to just check on currently open applications.

The second plugin embeds a map that displays markers for recent house sales (as recorded by the Land Registry prices paid dataset). This dataset is published as a monthly set of data a month or two after the fact and is downloaded to my local desktop. A python script then reads in the data, creates a new WordPress post containing the shortcode with the data baked in, and uploads the post to WordPress (where it appears in the draft queue).


In this shortcode, the marker data is currently encoded using the PHP serialise format (via the python phpserialize dumps method) and embedded in the post as the value of a shortcode attribute.

[MultiMarkerLeafletMap zoom=11 lat=50.675 lon=-1.32 width=800 height=500 markers='a:112:{i:0;a:5:{s:3:"lat";d:50.699382843;s:4:"date";s:10:"2015-07-10";s:3:"lon";d:-1.29297620442;s:8:"location";s:22:"SAVOY COURT, TOWN...]
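The post generation step itself is simple enough in outline. Here’s a rough sketch of the sort of thing the script does – not the actual script: it assumes the python-wordpress-xmlrpc and phpserialize packages, and uses placeholder credentials and a single dummy marker:

from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost
from phpserialize import dumps

#Dummy marker data (the real script reads this from the Land Registry data)
markers = [{'lat': 50.699382843, 'lon': -1.29297620442,
            'date': '2015-07-10', 'location': 'SAVOY COURT'}]

#Bake the serialised marker data into the shortcode
shortcode = ("[MultiMarkerLeafletMap zoom=11 lat=50.675 lon=-1.32 "
             "width=800 height=500 markers='{0}']").format(
                 dumps(markers).decode('utf-8'))

post = WordPressPost()
post.title = 'Recent house sales'
post.content = shortcode
post.post_status = 'draft'  #leave the post in the draft queue for checking

client = Client('https://example.com/xmlrpc.php', 'USER', 'PASSWORD')
client.call(NewPost(post))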

In this case, with the marker data baked into the shortcode, there’s a good argument for rendering the map within a timestamped post as a fixed map (at least, ‘fixed’ in the sense that the data is unchanging).

The PHP un/serialize route isn’t ideal because I think it raises security issues? I originally tried to pass the data as serialised JSON, but the data is in the form of a list and it seems the ] breaks things. I guess what I should really do is see if I can pass the data in as serialised JSON between the shortcode tags rather than pass it in as an attribute?

Another approach might be to define just a simple map embed shortcode, and then support additional shortcodes that add markers to the map?

My PHP coding is also a bit scrappy (and I keep forgetting the ;’s… :-( ). I think one thing I do need to do is pop the simple plugin code functions into a class to keep them safe, and also patch in a hack or two (as seems to be required?) so that the leaflet map libraries are only loaded into post and page headers for posts/pages that actually contain one of the map shortcodes.

I’m also thinking now I need to find a way to support boundary lines and shape colouring?



First Steps in a Conversational Slackbot interface to CQC Inspection Data

A few months ago, I started to have a play with ratings data from the CQC – the Care Quality Commission. I don’t seem to have posted the scraper tools I ended up with anywhere, but I have started playing with them again over the last couple of weeks in the context of my slackbot explorations.

In particular, I’ve started working up a conversational workflow for keeping track of inspections in a particular local area. At the current time, the CQC website only seems to support alerts relating to new reports at the level of a particular location, although it is possible to get faceted search results relating to reports published over the course of the last week or last month. For the local journalist wanting to keep tabs on reports associated with local providers, this means setting up a data beat that includes checking the CQC website on a regular basis, firstly to see whether there are any new reports, and secondly to see what sort of reports they are.

And as a report from OnTheWight today shows (Beacon Health Centre rated ‘Good’ by CQC), this can include good news stories as well as bad news ones.


So what’s the thing I’ve been exploring in the slack context? A time saver, hopefully. In the first case, it provides a quick way of checking up on reports from the local area released over the previous week or previous month:

To begin with, we can ask for a summary report of recent inspections:


The report does a bit of counting – to provide some element of context – and provides a highlights statement regarding the overall result from each of the reports. (At the moment, I don’t sort the order in which reports are presented. There are opportunities here for prioritising which results to show. I also need to think about whether results should be provided as a single slackbot response, as is currently the case, or using a separate (and hence, “star-able”) response for each report.)

After briefly skimming over the recent results, we can tunnel down into a slightly more detailed summary of the report by passing in a location ID:


As part of the report, this returns a link to the CQC website so we can inspect the full result at source. I’ve also got a scraper that pulls the full reports from the CQC site, but at the moment it’s not returning the result to slack (I think there’s a message size limit which I’m overflowing, so I need to find what that limit is and split up the response to get round it). That said, slack is perhaps not the best place to return long reports? Maybe a better way would be to submit quoted report components into a draft WordPress blog post?
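(One way round the size limit might be to split a long report on paragraph boundaries and post it as several messages; a quick sketch, in which the 3500 character limit is a guessed value on my part rather than a documented one:)

def chunk_response(text, limit=3500):
    #Accumulate paragraphs into chunks no longer than the limit
    chunks, current = [], ''
    for para in text.split('\n\n'):
        if current and len(current) + len(para) + 2 > limit:
            chunks.append(current)
            current = para
        else:
            current = current + '\n\n' + para if current else para
    if current:
        chunks.append(current)
    return chunks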

We can also pull back administrative information regarding a location from its location ID.


This report also includes information about other locations operated by the same provider. (I think I do have a report somewhere that summarises the report ratings over all the locations associated with a given provider, so we can look to see how well other establishments operated by the same provider are rated, but I haven’t wired that into the Slack bot yet.)
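In terms of plumbing, the conversation is driven by some simple command parsing along the lines of the python-rtmbot pattern described in the Slackbot Data Wire post below. A minimal sketch of the dispatch logic – the command phrases and the cqc_*() report functions are illustrative placeholders (the scraper-backed report generation code isn’t shown here):

outputs = []

def process_message(data):
    text = data['text']
    if text.startswith('cqc recent'):
        #Summary report of recent inspections (previous week or month)
        period = 'month' if 'month' in text else 'week'
        outputs.append([data['channel'], cqc_recent(period)])
    elif text.startswith('cqc report'):
        #More detailed summary of the report for a given location ID
        location_id = text.split('cqc report')[1].strip()
        outputs.append([data['channel'], cqc_location_report(location_id)])
    elif text.startswith('cqc admin'):
        #Administrative information about a location, from its location ID
        location_id = text.split('cqc admin')[1].strip()
        outputs.append([data['channel'], cqc_location_admin(location_id)])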

There are several other ways we can develop this conversation…

Company number and charity number information is available for some providers, which means it should be trivial to return company registration information and company directors information from Companies House or OpenCorporates, and perhaps even data from the Charities Commission.

Rather more scruffily, perhaps, we could use location name and postcode to try a search on the Food Standards Agency website to see if we can find food ratings data for establishments of interest.

There might also be opportunities for linking in items from local spending data to try to capture local authority spend with a particular location or provider. This would be simplified if council payments to CQC rated establishments or providers included the CQC location or provider ID, but that’s perhaps too much to ask for.

If nothing else, however, this demonstrates a casual conversational way in which a local journalist might be able to use slack as part of a local data beat to run regular, periodic checks over recent reports published by the CQC relating to local care establishments.

Slackbot Data Wire, Initial Sketch

Via a round-up post from Matt Jukes/@jukesie (Interesting elsewhere: Bring on the Bots), I was prompted to look again at Slack. OnTheWight’s Simon Perry originally tried to hook me in to Slack, but I didn’t need another place to go to check messages. Simon had also mentioned, in passing, how it would be nice to be able to get data alerts into Slack, but I’d not really followed it through, until the weekend, when I read again @jukesie’s comment that “what I love most about it [Slack] is the way it makes building simple, but useful (or at least funny), bots a breeze.”

After a couple of aborted attempts, I found a couple of python libraries to wrap the Slack API: pyslack and python-rtmbot (the latter also requires python-slackclient).

Using pyslack to send a message to Slack was pretty much a one-liner:

#Create API token at
#!pip install pyslack
import slack
slack.api_token = token
slack.chat.post_message('#general', 'Hello world', username='testbot')


I was quite keen to see how easy it would be to reuse one or more of my data2text sketches as the basis for an autoresponder that could accept a local data request from a Slack user and provide a localised data response using data from a national dataset.

I opted for a JSA (Jobseekers Allowance) textualiser (as used by OnTheWight and reported here: Isle of Wight innovates in a new area of Journalism and also in this piece: How On The Wight is experimenting with automation in news) that I seem to have bundled up into a small module, which would let me request JSA figures for a council based on a council identifier. My JSA textualiser module has a couple of demos hardwired into it (one for the Isle of Wight, one for the UK) so I could easily call on those.

To put together an autoresponder, I used the python-rtmbot, putting the botcode folder into a plugins folder in the python-rtmbot code directory.

The code for the bot is simple enough:

from nomis import *
import nomis_textualiser as nt
import pandas as pd


import time
crontable = []
outputs = []

def process_message(data):

	text = data["text"]
	if text.startswith("JSA report"):
		if 'IW' in text: outputs.append([data['channel'], nt.otw_rep1(nt.iwCode)])
		elif 'UK' in text: outputs.append([data['channel'], nt.otw_rep1(nt.ukCode)])
	if text.startswith("JSA rate"):
		if 'IW' in text: outputs.append([data['channel'], nt.rateGetter(nt.iwCode)])
		elif 'UK' in text: outputs.append([data['channel'], nt.rateGetter(nt.ukCode)])


Rooting around, I also found a demo I’d put together for automatically looking up a council code from a Johnston Press newspaper title using a lookup table I’d put together at some point (I don’t remember how!).

Which meant that by using just a tiny dab of glue I could extend the bot further to include a lookup of JSA figures for a particular council based on the local rag JP covering that council. And the glue is this, added to the process_message() function definition:

	def getCodeForTitle(title):
		#Look up the council code for a newspaper title (lookup table omitted here)
		return code

	if text.startswith("JSA JP"):
		title=text.split('JSA JP')[1].strip()
		code=getCodeForTitle(title)

		outputs.append([data['channel'], nt.otw_rep1(code)])
		outputs.append([data['channel'], nt.rateGetter(code)])


This is quite an attractive route, I think, for national newsgroups: anyone in the group can create a bot to generate press release style copy at a local level from a national dataset, and then make it available to reporters from other titles in the group – who can simply key in their own news title.

But it could work equally well for a community network of hyperlocals, or councils – organisations that are locally based and individually do the same work over and over again on national datasets.

The general flow is something a bit like this:


which has a couple of very obvious pain points:


Firstly, finding the local data from the national data, cleaning the data, etc etc. Secondly, making some sort of sense of the data, and then doing some proper journalistic work: writing a story on the interesting bits, putting them into context and explaining them, rather than just relaying the figures.

What the automation route does is to remove some of the pain, and allow the journalist to work up the story from the facts, presented informatively.


This is a model I’m currently trying to work up with OnTheWight and one I’ll be talking about briefly at the What next for community journalism? event in Cardiff on Wednesday [slides].

PS Hmm.. this just in, The Future of the BBC 2015 [PDF] [announcement].

Local Accountability Reporting Service

Under this proposal, the BBC would allocate licence fee funding to invest in a service that reports on councils, courts and public services in towns and cities across the UK. The aim is to put in place a network of 100 public service reporters across the country.

Reporting would be available to the BBC but also, critically, to all reputable news organisations. In addition, while it would have to be impartial and would be run by the BBC, any news organisation — news agency, independent news provider, local paper as well as the BBC itself—could compete to win the contract to provide the reporting team for each area.

A shared data journalism centre

Recent years have seen an explosion in data journalism. New stories are being found daily in government data, corporate data, data obtained under the Freedom of Information Act and increasing volumes of aggregated personalised data. This data offers new means of sourcing stories and of holding public services, politicians and powerful organisations to account.

We propose to create a new hub for data journalism, which serves both the BBC and makes available data analysis for news organisations across the UK. It will look to partner a university in the UK, as the BBC seeks to build a world-class data journalism facility that informs local, national and global news coverage.

A News Bank to syndicate content

The BBC will make available its regional video and local audio pieces for immediate use on the internet services of local and regional news organisations across the UK.

Video can be time-consuming and resource-intensive to produce. The News Bank would make available all pieces of BBC video content produced by the BBC’s regional and local news teams to other media providers. Subject to rights and further discussion with the industry we would also look to share longer versions of content not broadcast, such as sports interviews and press conferences.

Content would be easily searchable by other news organisations, making relevant material available to be downloaded or delivered by the outlets themselves, or for them to simply embed within their own websites. Sharing of content would ensure licence fee payers get maximum value from their investment in local journalism, but it would also provide additional content to allow news organisations to strengthen their offer to audiences without additional costs. We would also continue to enhance linking out from BBC Online, building on the work of Local Live.

Hmm… Share content – or share “pre-content”. Use BBC expertise to open up the data to more palatable forms, forms that the BBC’s own journalists can work with, but also share those intermediate forms with the regionals, locals and hyperlocals?

Authoring Multiple Docs from a Single IPython Notebook

It’s my not-OU today, and whilst I should really be sacrificing it to work on some content for a FutureLearn course, I thought instead I’d tinker with a workflow tool related to the production process we’re using.

The course will be presented as a set of HTML docs on FutureLearn, supported by a set of IPython notebooks that learners will download and execute themselves.

The handover resources will be something like:

– a set of IPython notebooks;
– a Word document for each week containing the content to appear online (this document will be used as the basis for multiple pages on the course website; the content is entered into the FutureLearn system by someone else as markdown (though I’m not sure what flavour?));
– for each video asset, a Word document containing the script;
– ?separate image files (the images will also be in the Word doc).

Separate webpages provide teaching that leads into a linked-to IPython notebook. (Learners will be running IPython via Anaconda on their own desktops – which means tablet/netbook users won’t be able to do the interactive activities as currently delivered; we looked at using Wakari, but didn’t go with it; offering our own hosted solution or tmpnb server was considered out of scope.)

The way I have authored my week is to create a single IPython document that proceeds in a linear fashion, with “FutureLearn webpage” content authored as markdown, as well as incorporating executed code cells, followed by “IPython notebook” activity content relating to the previous “webpage”. The “IPython notebook” sections are preceded by a markdown cell containing a START NOTEBOOK statement, and closed with a markdown cell containing an END NOTEBOOK statement.

I then run a simple script that:

  • generates one IPython notebook per “IPython notebook” section;
  • creates a monolithic notebook containing all, but just, the “FutureLearn webpage” content;
  • generates a markdown version of that monolithic notebook;
  • uses pandoc to convert the monolithic markdown doc to a Microsoft Word/docx file.


Note that it would be easy enough to render each “FutureLearn webpage” doc as markdown directly from the original notebook source, into its own file that could presumably be added directly to FutureLearn, but that was seen as being overly complex compared to the original “copy rendered markdown from notebook into Word and then somehow generate markdown to put into FutureLearn editor” route.

import io, sys
import IPython.nbformat as nb
import IPython.nbformat.v4.nbbase as nb4

#Are we in a notebook segment?
innb = False

#Quick and dirty count of notebooks
count = 0

#The monolithic notebook is the content ex of the separate notebook content
monolith = nb4.new_notebook()
test = None

#Load the original doc in
mynb = nb.read('ORIGINAL.ipynb', nb.NO_CONVERT)

#For each cell in the original doc:
#(The branch bodies were elided in the original post; what follows is a
#minimal reconstruction of the routing logic described in the comments.)
for i in mynb['cells']:
    if (i['cell_type']=='markdown'):
        #See if we can spot a standalone notebook delimiter
        if ('START NOTEBOOK' in i['source']):
            #At the start of a block, create a new notebook
            innb = True
            count += 1
            test = nb4.new_notebook()
        elif ('END NOTEBOOK' in i['source']):
            #At the end of the block, save the code to a new standalone notebook file
            innb = False
            nb.write(test, 'notebook_{0}.ipynb'.format(count))
        elif (innb):
            test['cells'].append(i)
        else:
            monolith['cells'].append(i)
    elif (i['cell_type']=='code'):
        #For the code cells, preserve any output text
        outs = []
        for o in i['outputs']:
            if 'text' in o:
                outs.append(o)
        i['outputs'] = outs
        #Route the code cell as required...
        if (innb):
            test['cells'].append(i)
        else:
            monolith['cells'].append(i)

#Save the monolithic notebook
nb.write(monolith, 'monolith.ipynb')

#Convert it to markdown
!ipython nbconvert --to markdown monolith.ipynb

##On a Mac, I got pandoc via:
#brew install pandoc

#Generate a Microsoft .docx file from the markdown
!pandoc -o monolith.docx -f markdown -t docx monolith.md

What this means is that I can author a multiple chapter, multiple notebook minicourse within a single IPython notebook, then segment it into a variety of different standalone files using a variety of document types.

Of course, what I really should have been doing was working on the course material… but then again, it was supposed to be my not-OU today…;-)

PS The actual workflow, of course, turned out to be more traditional. Content for the FutureLearn website was copied from the notebooks into a Word document, edited there, and then somehow converted to markdown for entry into FutureLearn. (I haven’t seen what the FutureLearn content entry forms look like – anyone got a user guide or screenshots they could share?) Which caused all sorts of fun with the tables and code styling…

Data Driven Press Releases From HSCIC Data – Diabetes Prescribing

By chance, I saw a tweet from the HSCIC yesterday announcing ‘Prescribing for Diabetes, England – 2005/06 to 2014/15’ #hscicstats.

The data comes via a couple of spreadsheets, broken down at the CCG level.

As an experiment, I thought I’d see how quickly I could come up with a story form and template for generating a “data driven press release” that localises the data, and presents it in a textual form, for a particular CCG.

It took a couple of hours, and at the moment my recipe is hard coded to the Isle of Wight, but it should be easily generalisable to other CCGs (the blocker at the moment is identifying regional codes from CCG codes; the spreadsheets in the release don’t provide that linkage, so another source for that data is required).

Anyway, here’s what I came up with:


Figures recently published by the HSCIC show that for the reporting period Financial 2014/2015, the total Net Ingredient Costs (NIC) for prescribed diabetes drugs was £2,450,628.59, representing 9.90% of overall Net Ingredient Costs. The ISLE OF WIGHT CCG prescribed 136,169 diabetes drugs items, representing 4.17% of all prescribed items. The average net ingredient cost (NIC) was £18.00 per item. This compares to 4.02% of items (9.85% of total NIC) in the Wessex (Q70) region and 4.45% of items (9.98% of total NIC) in England.

Of the total diabetes drugs prescribed, Insulins accounted for 21,170 items at a total NIC of £1,013,676.82 (£47.88 per item (on average), 0.65% of overall prescriptions, 4.10% of total NIC) and Antidiabetic Drugs accounted for 93,660 items at a total NIC of £825,682.54 (£8.82 per item (on average), 2.87% of overall prescriptions, 3.34% of total NIC).

For the NHS ISLE OF WIGHT CCG, the NIC in 2014/15 per patient on the QOF diabetes register in 2013/14 was £321.53. The QOF prevalence of diabetes, aged 17+, for the NHS ISLE OF WIGHT CCG in 2013/14 was 6.43%. This compares to a prevalence rate of 6.20% in Wessex and 5.70% across England.

All the text generator requires me to do is pass in the name of the CCG and the area code, and it does the rest. You can find the notebook that contains the code here: diabetes prescribing textualiser.
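For what it’s worth, the heart of the generator is just templated string substitution; a much simplified sketch, with the figures hardwired from the example above rather than read from the spreadsheets:

template = ('Figures recently published by the HSCIC show that for the '
            'reporting period {period}, the total Net Ingredient Costs (NIC) '
            'for prescribed diabetes drugs was £{nic:,.2f}, representing '
            '{nic_pc:.2f}% of overall Net Ingredient Costs. The {ccg} '
            'prescribed {items:,} diabetes drugs items, representing '
            '{items_pc:.2f}% of all prescribed items.')

figures = {'period': 'Financial 2014/2015', 'nic': 2450628.59,
           'nic_pc': 9.90, 'ccg': 'ISLE OF WIGHT CCG',
           'items': 136169, 'items_pc': 4.17}

print(template.format(**figures))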