Validating Local Spending Data

In passing, I noticed on the Local Government Association (LGA) website a validator for checking the format of documents used to publish local council spending data, as well as various other data releases (contracts, planning applications, toilet locations, land holdings etc): LGA OpenData Schema Validator.


I wonder how many councils are publishing new releases that actually validate, and how many have “back-published” historical data releases using a format that validates?! When officers publish data files, I wonder how many of them even try to download and open the data files they have just published (to check the links work, the documents open as advertised, and also appear to contain what’s expected), let alone run either the uploaded or downloaded files through the validator (it often makes sense to do both: check the file validates before you publish it, then download it and check the downloaded version, just in case the publishing process has somehow mangled the file…)
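As a rough sketch of the "download it and check it again" step, comparing checksums of the file you uploaded and the file you then downloaded is one quick way to spot a mangled release. The function names here are my own, and a mismatch doesn't necessarily mean the file is broken (a publishing platform might legitimately re-encode it), only that the downloaded copy needs running through the validator too.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large spending files don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(uploaded_path, downloaded_path):
    """True if the two files are byte-for-byte identical."""
    return sha256_of(uploaded_path) == sha256_of(downloaded_path)
```

If `same_file_contents()` returns False, something changed between upload and download, and that is the version the public actually gets.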

Guidance for the spending data releases can be found here: Local government open data schemas: Spend

Documentation regarding the release of procurement and spending information (v. 1.1 dated 14/12/2014) can be found here: Local transparency guidance – publishing spending and procurement information.

I’ve still no real idea how to make interesting use of this data, or how DCLG expect folk to make use of it?!;-)

January 7, 2015 at 2:42 pm

Pondering Local Spending Data, Again…

Last night I saw a mention of a budget review consultation being held by the Milton Keynes Council. I’ve idly wondered before about whether spending data could be used to inform these consultations, for example by roleplaying what the effects of a cut to a particular spending area might be at a transactional level. (For what it’s worth, I’ve bundled together the Milton Keynes spending data into a single (but uncleaned) CSV file here and posted the first couple of lines of a data conversation with it here. One of the things I realised is that I still don’t know how to visualise data by financial year, so I guess I need to spend some time looking at pandas timeseries support).
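On the financial year question, one minimal approach (a sketch, not necessarily the neatest way) is to label each payment date with a financial year and then total by that label. I'm assuming here that the council's financial year runs from 1 April to 31 March; the function names and the `(date, amount)` pair layout are made up for illustration. The same labelling function could be applied to a date column in pandas before grouping.

```python
from collections import defaultdict
from datetime import date


def financial_year(d):
    """Label a date with its financial year, assumed to run 1 April to 31 March."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def totals_by_financial_year(payments):
    """Sum amounts per financial year from an iterable of (date, amount) pairs."""
    totals = defaultdict(float)
    for paid_on, amount in payments:
        totals[financial_year(paid_on)] += amount
    return dict(totals)
```

Once every transaction carries a financial year label, charting by financial year is just charting by that label.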

Another transparency/spending data story that caught my eye over the break was news of how Keighley Town Council had been chastised for its behaviour around various transparency issues (see for example the Audit Commission Report in the public interest on Keighley Town Council). Among other things, it seems that the council had “entered into a number of transactions with family members of Councillors and employees” (which makes me think that an earlier experiment I dabbled with that tried to reconcile councillor names with: a) directors of companies in general; b) directors of companies that trade with a council may be a useful tool to work up a bit further). They had also been lax in ensuring “appropriate arrangements were in place to deal with non-routine transactions such as the recovery of overpayments made to consultants”. I’ve noted before that where a council publishes all its spending data, not just amounts over £500, including negative payments, there may be interesting things to learn (eg Negative Payments in Local Spending Data).
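For the name reconciliation idea, a very crude first pass might look something like the following. This is a sketch under heavy assumptions: the list of titles to strip, the function names and the exact-match-after-normalisation rule are all mine, and any hits are only candidate matches that would need checking by hand (plenty of unrelated people share a name).

```python
import re

# Honorifics to ignore when comparing names (an illustrative, incomplete list).
TITLES = {"cllr", "councillor", "mr", "mrs", "ms", "miss", "dr"}


def normalise_name(name):
    """Lowercase, drop titles and punctuation, and sort the remaining name parts."""
    tokens = re.findall(r"[a-z']+", name.lower())
    return " ".join(sorted(t for t in tokens if t not in TITLES))


def candidate_matches(councillors, directors):
    """Map each councillor name to director names that normalise to the same key."""
    by_key = {}
    for director in directors:
        by_key.setdefault(normalise_name(director), []).append(director)
    matches = {}
    for councillor in councillors:
        key = normalise_name(councillor)
        if key in by_key:
            matches[councillor] = by_key[key]
    return matches
```

Sorting the name parts means "SMITH, Jane" and "Cllr Jane Smith" end up with the same key, at the cost of a few more false positives.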

It seems that the Audit Commission report was conducted in response to a request from a local campaigner (Keighley investigation: How a grandmother blew whistle on town council [Yorkshire Post, 20/12/14]). As you do, I wondered whether the spending data might have sent up any useful signals about any of the affairs the auditors – and local campaigners – took issue with. The Keighley Town Council website doesn’t make it obvious where the spending data can be found – the path you need to follow is Committees, then Finance and Audit, then Schedule of payments over £500 – and even then I can’t seem to find any data for the current financial year.

The data itself is published using an old Microsoft Office .doc format:


The extent of the data that is published is not brilliant… In terms of usefulness, this is pretty low quality stuff…


Getting the data, such as it is, into a canonical form is complicated by the crappy document format, though it’s not hard to imagine how such a thing came to be generated (council clerk sat using an old Pentium powered desktop and Windows 95, etc etc ;-). Thanks to a tip off from Alex Dutton, unoconv can convert the docs into a more usable format (apt-get update ; apt-get install -y libreoffice ; apt-get install -y unoconv); so for example, unoconv -f html 2014_04.doc converts the specified .doc file to an HTML document. (I also had a look at getting convertit, an http serverised version of unoconv, working in a docker container, but it wouldn’t build properly for me? Hopefully a tested version will appear on dockerhub at some point…:-)
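To run that conversion over a whole folder of monthly .doc files, a small wrapper along these lines might do. It's a sketch only: it assumes unoconv is installed and on the PATH, it uses the same `unoconv -f html` invocation as above, and it assumes the converted file is written alongside the original with the new extension. The function names are made up.

```python
import shutil
import subprocess
from pathlib import Path


def unoconv_command(doc_path, fmt="html"):
    """Build the unoconv command line for converting one document."""
    return ["unoconv", "-f", fmt, str(doc_path)]


def convert_folder(folder, fmt="html"):
    """Convert every .doc file in `folder`; return the expected output paths."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv not found on PATH")
    converted = []
    for doc in sorted(Path(folder).glob("*.doc")):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(unoconv_command(doc, fmt), check=True)
        # Assumption: output is written next to the input file.
        converted.append(doc.with_suffix("." + fmt))
    return converted
```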

This data still requires scraping of course… but I’m bored already…
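For what it's worth, a first stab at the scraping step might look like the sketch below, which just pulls the text out of table rows using the standard library HTML parser. It assumes the payments end up in ordinary table rows in the converted HTML, which I haven't verified for these particular documents; the class and function names are my own.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace left over from the document layout.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if any(self._row):  # skip rows whose cells are all empty
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The rows that come back are still just strings, so amounts and dates would need parsing (and the inevitable inconsistencies cleaning up) before the data is in anything like a canonical form.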

PS I’m wondering if it would be useful to skim through some of the Audit Commission’s public interest reports to fish for ideas about interesting things to look for in the spending data?

January 6, 2015 at 12:58 pm

Confused Fragments About Open Data Economics…

Some fragments…

the public paid for it so the public has a right to it: the public presumably paid for it through their taxes. But if companies that use open public data don’t fully and fairly participate in the tax regime of the country that produced the data, then they didn’t pay their fair share for access to it.

data quality will improve: with open license conditions that allow users to take open (public) data and do what they want with it without the requirement to make derived data available in a bulk form under an open data license, how does the closed bit of the feedback loop work? I’ve looked at a lot of open public data releases on council and government websites and seen some companies making use of that data in presumably a cleaned form (if it hasn’t been cleaned, then they’re working with a lot of noise…) But if they have cleaned and normalised the data, have they provided this back in an open form to the public body that gifted them access to it? Is there an open data quality improvement cycle working there? Erm… no… I suspect if anything, the open data users would try to sell the improved quality data back to the publisher. This may be their sole business model, or it may be a spin-off as a result of using the (cleaned and normalised) data for some other commercial purpose.

September 9, 2014 at 1:32 pm

Corporate Groupings in Care Provision – Finding the Data for GP Practices, Prequel…

For some time I’ve been pondering the best way of trying to map the growth in corporate GP care provision – the number of GP practices owned by Virgin Care, Care UK and so on. Listings about GP practices from the various HSCIC datasets don’t appear to identify corporate owners, so the stopgap solution I’d identified was to scrape lists of practices from the various corporate websites and then try to reconcile them against GP practice codes from the HSCIC as some sort of check.

However, today I stumbled across a dataset released by the Care Quality Commission (CQC) that provides a “complete directory of places where CQC regulated care is provided in England” [CQC information and data]. Two data files are provided – a simple register of locations, and “a second file … which contains details of registered managers and care home bed numbers. It also allows you to easily filter by the regulated activities, service types or service user bands.”

Both files contain fields that allow you to identify GP practices, but the second one also provides information about the actual provider (parent company owner) and any brand name associated with the service. Useful…:-)

What this means is it should be easy enough to pull the data into a report that identifies the practices associated with a particular brand or corporate group… (I’ll have a go at that as soon as I get a chance…)
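A minimal sketch of that sort of report, working over rows read from the CSV file as dicts, might be something like this. The column names ("Brand Name", "Location Name") are placeholders I've made up rather than the actual headers in the CQC file, so they would need swapping for whatever the file really uses.

```python
from collections import defaultdict


def locations_by_brand(rows, brand_field="Brand Name", name_field="Location Name"):
    """Group location names by brand, skipping rows with no brand recorded."""
    groups = defaultdict(list)
    for row in rows:
        brand = (row.get(brand_field) or "").strip()
        if brand:
            groups[brand].append(row[name_field].strip())
    return dict(groups)
```

Rows of the right shape can be had by passing an open file to `csv.DictReader`; filtering down to GP practices first would need whichever field identifies the service type.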

Another thing that could be useful to do would be to match (that is, link) the location identifiers used by the CQC with the practice codes used by the HSCIC. [First attempt here…. Looks like work needs to be done…:-(] Then we could easily start to aggregate and analyse quality stats, referring and prescribing behaviour data, and so on, for the different corporate groupings and look to see if we can spot any meaningful differences between them (for example, signals that there might be corporate group level policies or behaviours being applied). We could probably also start to link in drug trial data, at least for trials that are registered, and that we can associate with a particular practice (eg Sketching Sponsor Partners Running UK Clinical Trials).
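One crude way of starting on that linkage, sketched here under the assumption that both datasets carry a postcode for each practice, is to link records that share a postcode and set aside anything ambiguous for a closer look. The field names (`location_id`, `practice_code`, `postcode`) are hypothetical, and postcode matching alone will trip up on health centres that house several practices.

```python
def normalise_postcode(postcode):
    """Strip all whitespace and uppercase, so 'mk9 3ej' and 'MK9  3EJ' compare equal."""
    return "".join(postcode.split()).upper()


def link_by_postcode(cqc_locations, hscic_practices):
    """Link CQC locations to HSCIC practice codes where a postcode matches uniquely.

    Returns (linked, unresolved): `linked` maps location id to practice code,
    `unresolved` maps location id to the list of candidate codes (possibly empty).
    """
    by_postcode = {}
    for practice in hscic_practices:
        key = normalise_postcode(practice["postcode"])
        by_postcode.setdefault(key, []).append(practice["practice_code"])

    linked, unresolved = {}, {}
    for location in cqc_locations:
        codes = by_postcode.get(normalise_postcode(location["postcode"]), [])
        if len(codes) == 1:
            linked[location["location_id"]] = codes[0]
        else:
            unresolved[location["location_id"]] = codes
    return linked, unresolved
```

The unresolved pile is where the real work is: those are the cases that would need name matching, or a manual check, to sort out.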

Finally, it’d possibly also be useful to reconcile companies against company registrations on Companies House, and perhaps charity registrations with the Charity Commission (cf. this quick data conversation with the 360 Giving Grant Navigator data).

PS more possible linkage:
– company names to company IDs on OpenCorporates (and from that we can look for additional linkage around registered company addresses, common directors etc)
– payments from local gov and NHS to the companies (from open spending data/transactions data)
– food hygiene inspection ratings (eg for care homes)

September 9, 2014 at 12:12 pm

