Local Council Spending Data – Time Series Charts

In the post What Role, If Any, Does Spending Data Have to Play in Local Council Budget Consultations?, I started wondering about the extent to which local spending transparency data might support consultation around new budgets.

As a first pass, I’ve popped up a quick application at http://glimmer.rstudio.com/psychemedia/iwspend2013_14/ [if that’s broken, try this one] (shiny code here) that demonstrates various ways of looking at open spending data from the Isle of Wight council. You can pass form items in via the URL (except to set the Directorate, oops!), and you can also search using regular expressions. NOTE: there’s a little bug. At the moment you still need to hit the Search button to get the app to show any data. Also note that selecting All directorates with no filter terms can be a bit slow to display anything…

Examples:

http://glimmer.rstudio.com/psychemedia/iwspend2013_14/?expensesType=(oil)|(gas)|(electricity)

http://glimmer.rstudio.com/psychemedia/iwspend2013_14/?serviceArea=mainland

http://glimmer.rstudio.com/psychemedia/iwspend2013_14/?supplierName=capita
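The app reads its form values from the query string, so deep links like these can also be generated programmatically. Below is a minimal sketch. It assumes the parameter names seen in the example URLs (expensesType, serviceArea, supplierName) and that the app decodes standard percent-encoding. The filter_rows helper and the sample rows are hypothetical. They illustrate how a regular expression such as (oil)|(gas)|(electricity) selects records and are not the app's own code.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://glimmer.rstudio.com/psychemedia/iwspend2013_14/"


def build_query_url(**params):
    """Build a deep link that pre-fills the app's form fields via the query string."""
    # urlencode percent-encodes characters such as ( ) | in the regular expression
    return BASE_URL + "?" + urlencode(params)


def filter_rows(rows, field, pattern):
    """Keep rows whose `field` contains a match for the regular expression.

    Hypothetical helper: matching is case-insensitive here, which may differ from the app.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows if rx.search(row.get(field, ""))]


url = build_query_url(expensesType="(oil)|(gas)|(electricity)")
print(url)
# Decoding the query string recovers the original regular expression
print(parse_qs(urlparse(url).query)["expensesType"][0])

# Made-up rows, purely to illustrate the filter
rows = [
    {"expensesType": "Electricity", "amount": 1200.0},
    {"expensesType": "Gas", "amount": 800.0},
    {"expensesType": "Stationery", "amount": 50.0},
]
print(filter_rows(rows, "expensesType", "(oil)|(gas)|(electricity)"))
```

Unanchored patterns need some care. For example, (oil) also matches inside words such as "boiler", so \b word boundaries may be worth adding to search terms.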

I’ve started exploring various views over the data, but these need thinking through properly (in particular, working out which views may actually be useful!)

IW spend music expenses type

Hmm… did the budget change directorate?!

IW spend - music service area

IW spend music suppliers

Some more views over the suppliers data: I also started experimenting with some tabular views in the suppliers tab…

IW spend music suppliers table 1

IW spend music suppliers table 2
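The tabular supplier views are essentially crosstabs of spend by supplier and period. Below is a rough sketch of the underlying aggregation. The field names supplier, date and amount are hypothetical, as are the sample transactions; the real spending files use their own column names. The sketch sums spend per supplier per month, which is also the shape of data needed for a per-supplier time series chart.

```python
from collections import defaultdict
from datetime import date


def supplier_month_table(transactions):
    """Sum spend per supplier per calendar month (YYYY-MM).

    Returns {supplier: {month: total}}, i.e. one time series per supplier.
    """
    table = defaultdict(lambda: defaultdict(float))
    for t in transactions:
        month = t["date"].strftime("%Y-%m")
        table[t["supplier"]][month] += t["amount"]
    # Convert the nested defaultdicts to plain, month-sorted dicts for display
    return {s: dict(sorted(months.items())) for s, months in table.items()}


# Hypothetical transactions, for illustration only
transactions = [
    {"supplier": "Supplier A", "date": date(2013, 4, 3), "amount": 1500.0},
    {"supplier": "Supplier A", "date": date(2013, 4, 20), "amount": 700.0},
    {"supplier": "Supplier A", "date": date(2013, 5, 2), "amount": 900.0},
    {"supplier": "Supplier B", "date": date(2013, 5, 15), "amount": 2500.0},
]

for supplier, series in supplier_month_table(transactions).items():
    print(supplier, series)
```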

This is all very “shiny” of course, but is it useful? From these early glimpses over the data, can you think of any ways that a look at the spending data might help support budget consultations? What views over the data, in particular, might support such an aim, and what sort of stories might we be able to tell around this sort of data?

Public Sector Transparency – Do We Need Open Receipts Data as Well as Open Spending Data?

Some time ago, in the post Using Aggregated Local Council Spending Data for Reverse Spending (Payments to) Lookups, I described a way of looking at local council spending data based on how much different councils spent with each other.

This technique generalises within and across sectors, so for example we could look at how hospitals spend money with each other, or how police authorities spend money with each other. In this way, we can get a picture of how public bodies buy and sell services from each other. The mappings don’t have to relate to spend, either: we could equally well use this sort of model to see how hospitals transfer patients to one another, or how mental health or social care services offer out-of-area cover to each other, or how councils and housing trusts manage transfers between each other.

The insight that lets us produce this sort of view is that we have entities of a particular sort (hospitals, for example, or local councils) entering into transactions with other entities of the same sort. If these entities all operate under the same transparency rules (for example, a requirement to publish outgoing, or spend, transactions), then we can recreate the incoming (receipt) transactions for each entity from the spend data published by the others. For example, if local councils are required to publish details of spend over £x, then we can also learn how much each council received from other local councils by means of transactions over £x.
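The inversion itself is simple. Below is a minimal sketch. It assumes that each published spend record can be reduced to a (payer, payee, amount) triple, and that payee names have already been reconciled with the names of the publishing bodies, which in practice is the hard part. The council names and amounts are made up.

```python
from collections import defaultdict


def receipts_from_spend(spend_records, publishers):
    """Reconstruct receipts for publishing bodies from each other's spend data.

    spend_records: iterable of (payer, payee, amount) triples, as published by payers.
    publishers: set of bodies subject to the same transparency rules.
    Returns {payee: {payer: total received}} for payees that are themselves publishers.
    """
    receipts = defaultdict(lambda: defaultdict(float))
    for payer, payee, amount in spend_records:
        # Only payments between bodies of the same sort are recoverable this way
        if payer in publishers and payee in publishers:
            receipts[payee][payer] += amount
    return {payee: dict(payers) for payee, payers in receipts.items()}


publishers = {"Council A", "Council B", "Council C"}
spend = [
    ("Council A", "Council B", 12000.0),
    ("Council A", "Council B", 3000.0),
    ("Council C", "Council B", 8000.0),
    ("Council B", "Council A", 5000.0),
    ("Council A", "Widget Ltd", 40000.0),  # private supplier: not a publisher
]
print(receipts_from_spend(spend, publishers))
```

Only receipts from transactions over the publishing threshold (£x) can be recovered this way, because smaller payments never appear in the spend data.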

As the UK Government, at least, seems hell-bent on establishing markets in the delivery of public services (markets that can include private companies), we are faced with a possible asymmetry in transparency information.

UK Gov Policy: Making local councils more transparent and accountable to local people

The public should be able to hold local councils to account about the services they provide. To do this, people need information about what decisions local councils are taking, and how local councils are spending public money.

And from the NHS:

NHS – Transparency of Spend

As part of the government’s commitment to greater transparency, there is a requirement to publish online each NHS organisation’s expenditure over £25,000. In accordance with the requirement NHS Direct publish this on the basis of payments made in each calendar month.

For example, if hospital A buys significant services from hospital B, and must report that spend under transparency legislation, we can build up a picture not only of A’s spend but also of B’s sale of services. A’s data on its spend with B is openly available, which means that B’s receipts from A are also available. (In this example, if purchases can be itemised at less than £25k per item, then this form of reporting under the transparency guidelines is not required.)

If hospital A now buys services from company C, then we can look up spend from hospital A to get a picture of how much public money is flowing out to the private sector and into company C. That is, we can get an idea of company C’s receipts from openly published hospital spending data. (Of course, games could be played with itemisation: 10 treatments at £3k a treatment would result in a ‘must declare’ spend of £30k on the course of treatment, but an undeclarable £3k per treatment if billing is organised that way.)
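The itemisation point is easy to make concrete. The sketch below assumes a simple rule: a payment must be published if it is over the £25,000 threshold quoted above. Reading "over" as strictly greater than is an assumption.

```python
THRESHOLD = 25_000  # £, the NHS spend threshold quoted above


def declarable(payments, threshold=THRESHOLD):
    """Return the payments that would have to be published (those over the threshold)."""
    return [p for p in payments if p > threshold]


course_billed_once = [30_000]          # one invoice for the whole course of treatment
course_billed_per_item = [3_000] * 10  # ten invoices, one per treatment

print(sum(course_billed_once), declarable(course_billed_once))
print(sum(course_billed_per_item), declarable(course_billed_per_item))
```

The same £30k produces two very different transparency footprints, depending only on how the billing is organised.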

But what if company C buys services from hospital B (maybe even subcontracting services it was contracted to deliver by hospital A)? If company C’s spend data is not subject to transparency requirements, and hospital B’s receipts data is not publicly available, we lose sight of how money is being spent within and across the public service.

Whilst private companies may balk at being required to publish details of their own spending, might we still be able to recreate a picture of their spend with public services by requiring public bodies to publish receipts data as well as the spend data they are already required to publish?
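The sketch below shows what published receipts data would add, using the hospital A / hospital B / company C example. All names and amounts are hypothetical. It also assumes that receipts would be published as (payee, payer, amount) records, since no such format exists. The sketch tracks only which payer-to-payee relationships are visible, not amounts, so a payment disclosed from both sides is not double counted. The flow from company C to hospital B only shows up once B's receipts are available.

```python
def visible_flows(published_spend, published_receipts):
    """Return the set of (payer, payee) relationships visible from the disclosures.

    published_spend: (payer, payee, amount) records published by paying bodies.
    published_receipts: (payee, payer, amount) records published by receiving bodies.
    """
    flows = {(payer, payee) for payer, payee, _ in published_spend}
    flows |= {(payer, payee) for payee, payer, _ in published_receipts}
    return flows


# Hypothetical disclosures: public bodies publish spend; company C publishes nothing
spend = [
    ("Hospital A", "Hospital B", 50_000),
    ("Hospital A", "Company C", 80_000),
]
# What hospital B might publish if receipts data were required too
receipts_b = [
    ("Hospital B", "Hospital A", 50_000),
    ("Hospital B", "Company C", 30_000),
]

print(sorted(visible_flows(spend, [])))
print(sorted(visible_flows(spend, receipts_b)))
```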

Wherefore Art Thou, Research Sector Transparency / Research Transparency Sector Board?

On June 28th, 2012, the open data policy white paper Unleashing the Potential was published by the Cabinet Office. In the section on “Opening Up Access to Research”, one particular paragraph runs as follows:

2.66 To further develop government policy on access to research, we are also establishing a Research Transparency Sector Board, chaired by the Minister for Universities and Science, which will consider ways in which transparency in the area of research can be a driver for innovation. Recognising that research data is different to other PSI [Public Sector Information, presumably? – ed.], the Board will consider how to implement transparency measures relating to research in a manner which protects the integrity of the research and associated intellectual property, while ensuring access to research for those SME entrepreneurs vital for driving growth. This will help to realise the full benefits for society as a whole. The Research Transparency Sector Board will consist of government departments, funding agencies and representatives from universities and other stakeholders, and among the first of its tasks will be to consider how to act on the recommendations of the Royal Society report.

The announcement of the board (referred to as the Research Sector Transparency Board – which makes more sense…) was welcomed by the Royal Society in a guest blog post on the data.gov.uk website dated 27th June 2012 (the day before the embargo lifted? I’m not sure when the blog post actually became public): An intelligently open enterprise.

The minutes of a Regular meeting of the ICO Higher Education sector panel on FOI and DP (24.09.2012), dated 16/10/12, note the following:

Research data caused much concern. VA reminded delegates that she does need input from Research Councils and BIS in this area, as stated in the draft DD [HE definition document]. Definitions of “publicly funded” and “key outputs” may need clarification. It was noted that the Engineering and Physical Sciences Research Panel had to produce this type of data to an agreed timetable by 2015. It was also mentioned that the Open Data White Paper announced the formation of a new Research Sector Transparency Board and it was suggested that HEI research data could be linked to that format – it is not yet ready for use but might be worth noting in the new DD that this is a future aim.

Correspondence from the House of Lords European Union Select Committee includes a letter from David Willetts MP, dated 25 October 2012, that refers to his anticipated chairing of the Board:

On the question of Open Access (OA), I was pleased to note your expressed support for Open Data (OD) for which the UK is again identified as a good example. We have made excellent progress through the Finch Report on expanded access to research publications and the Government’s response to it. OD is at a relatively early stage. Some initiatives are already in train under Government’s Transparency Agenda, as detailed in the Cabinet Office White Paper, Open Data: Unleashing the Potential. This includes establishment of the Research Sector Transparency Board, which I shall be chairing. The Board will want to examine the complex issues around increasing the sharing of research data. The Research Councils’ published Open Access policy makes appropriate reference to research data, and the recent Royal Society report has informed the discussion, but work is needed on deciding further measures and implementing these appropriately, with the right terms and conditions and timing for disclosure.

We cannot be complacent and we will want to consider how best to monitor the take-up of Gold OA both here in the UK and overseas. The HEFCE-funded Joint Infrastructure Systems Committee (JISC), OAIG, and the Research Innovation Network (RIN) are already active in monitoring OA trends generally. HEFCE also envisages a possible role for JISC in monitoring the effectiveness – and effects – of Government OA policy. I expect that the Research Sector Transparency Board will also take an interest in OA policy implementation.

The 2012 BIS Annual Innovation Report from November 2012 referred to the announcement of the Board, making me wonder how many other Annual Reports celebrate the announcement of vapourware entities?

10.3 Open data and transparency
We have continued to work to harness the potential and collaborative opportunities offered by wider use of open data.

In June 2012 the Government announced in its Open Data White Paper that we would set up a Research Sector Transparency Board. The Board will consider how transparency in research can be a driver for innovation and discovery while furthering the UK’s recognised excellence in science. It will advise Government transparency issues relating to the national research effort, and improved access for small and medium businesses to the research base. Amongst its first tasks will be to consider and address the recommendations of the Royal Society report, Science as an Open Enterprise, into the sharing and disclosing of research data.

We also established the Administrative Data Taskforce, in December 2011. It will publish proposals for new mechanisms and collaborative agreements to enable and promote the wider use of administrative data for research and policy purposes, before the end of the year.

(I’m not sure I’d picked up on the Administrative Data Taskforce before? It reported in December 2012: The UK Administrative Data Research Network: Improving Access for Research and Policy. This report looks like it could be worth reading: a quick skim reveals several sections on legal and ethical issues related to linking administrative data to other datasets.)

A Written Answer to the House of Lords from 12 Dec 2012, reported in Hansard (Column WA241), from The Parliamentary Under-Secretary of State, Department for Business, Innovation and Skills (Lord Marland), responding to questions about open access to research data records:

Any further opening up of access to data, in the context of the wider open data agenda, would be the subject of future discussions with the research councils and other parties including the Data Strategy Board and representative university bodies. These policy issues would also be considered as appropriate by the Research Sector Transparency Board which is chaired by David Willetts. There are no proposals to change the research councils’ policy on access to data at this time.

The Russell Group response to the House of Lords Science and Technology Committee’s inquiry on open access publishing, dated 24 January 2013, makes the following reference to the board:

1.3 The Russell Group has been monitoring the development of open access (OA) policy for some time. We followed the ‘Finch Review’ and Royal Society work on science as an open enterprise with interest and the Russell Group is now represented on the Research Sector Transparency Board which will be covering OA, open data and other issues over the coming year. We have recently had a number of meetings with Research Councils UK (RCUK) to discuss implementation of OA policy.

This suggests that membership of the board has been decided upon, at least partially?

A HEFCE letter on Open access and submissions to the REF post-2014 dated 25/2/13 refers to the board in the following terms:

25. With the Research Councils and the Research Transparency Sector Board, we are giving consideration to the issues involved in increasing access to research data. We are committed to working in dialogue with the sector to develop fair and balanced mechanisms to achieve this aim.

Again, this suggests that the Board has been convened.

So I wonder:

  • What is the actual name of the board: Research Transparency Sector Board or Research Sector Transparency Board ;-)? (Other sectors have Transparency Boards….)
  • What is the membership of the board and has it convened yet?
  • What are the terms of reference for the board?
  • If it has convened, where are the minutes?

By the by, I note the emergence of the Research Councils UK – Gateway to Research, which provides a single point of access to “[k]ey data from the seven UK Research Councils in one location.”

RCUK - Gateway to Research

This site appears to collate information about research grants, grantees, and publications by grant, across the Research Councils. (I’m not sure if an #opendata dump is available, though. If there is one, I won’t need to scrape across all the sites using Scraperwiki any more?!;-)

PS it seems a tweet about the first meeting appeared whilst I was writing this post:

No linkage that I can see yet, though?

Practical Data Scraping – UK Government Transparency Data (Minister’s Meetings)

Earlier this week, I came across the Number 10 website’s transparency data area, which among other things has a section on who Ministers are meeting.

Needless to say, the Who’s Lobbying website has started collating this data and making it searchable, but I thought I’d have a look at the original data to see what it would take to aggregate the data myself using Scraperwiki.

The Number 10 transparency site provides a directory to Ministers’ meetings by government department on a single web page:

Number 10 transparency - ministers meetings

The links in the Ministers’ meetings, Ministers’ hospitality, Ministers’ gifts and Ministers’ overseas travel columns all point directly to CSV files. From inspecting a couple of the Ministers’ meetings CSV files, it looks as if they may be being published in a standardised way, using common column headings presented in the same order:

Ministers' meetings transparency data - csv format

Except that: some of the CSV files appeared to have a blank row between the header and the data rows, and at least one table had a blank row immediately after the data rows, followed by some notes in cells whose content did not match the semantics of the corresponding column headers. Inspecting the data, we also see that once a minister is identified, the first (Minister) column is left blank in the rows that follow, so we must presumably assume that those rows relate to meetings that minister had. When the data moves on to another minister, that Minister’s name/position is identified in the first column, once again followed by blank “same as above” cells.

Getting the data into Scraperwiki means doing two things. First, we extract meeting data from a CSV document and get it into a form that can be saved to the Scraperwiki database. Second, we scrape the Number 10 Ministers’ meetings webpage to get a list of the URLs that point to the CSV files for each department. (It might also be worth scraping the name of the department, and adding that as additional metadata to each record pulled out from the CSV docs.)
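Below is a minimal sketch of the second step, using Python's standard library html.parser. It assumes the directory page links directly to files whose URLs end in .csv. The sample HTML and the page URL http://www.example.org/transparency/ are invented for illustration. They are not copied from the Number 10 site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CSVLinkParser(HTMLParser):
    """Collect (link text, absolute URL) pairs for links that point at .csv files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # absolute URL of the <a> currently open, if it is a CSV link
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".csv"):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace in the link text
            self.links.append((" ".join("".join(self._text).split()), self._href))
            self._href = None


# Invented page fragment, for illustration only
sample_html = """
<table>
  <tr><td>Cabinet Office</td>
      <td><a href="co-ministers-meetings.csv">Ministers' meetings</a></td>
      <td><a href="co-ministers-gifts.csv">Ministers' gifts</a></td></tr>
</table>
"""

parser = CSVLinkParser("http://www.example.org/transparency/")
parser.feed(sample_html)
for text, url in parser.links:
    print(text, url)
```

The parser only looks at link targets. Capturing the department name from the first cell of each table row would also require tracking the contents of the td elements.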

Here’s the Scraperwiki code I used to scrape the data. I tried to comment it, so it’s worth reading through even if you don’t speak Python, because I’m not going to provide any more description here…;-)

import csv
import hashlib
import io
import urllib.request

import scraperwiki


url = "http://download.cabinetoffice.gov.uk/transparency/co-ministers-meetings.csv"
# I have started just looking at data from one source.
# I am assuming, (dangerously), that the column headings are:
#   a) the same, and
#   b) in the same order
# for different departments

# Some of the original files contain non-UTF-8 characters (such as a right single quote,
#   rather than an apostrophe) that make things fall over unless we handle them.
# Decoding as Windows-1252 (cp1252), which includes the right single quote, is an assumption
#   about how the files were exported; errors='replace' stops stray bytes crashing the scrape.
# restval='' gives short rows empty strings (rather than None) for their missing cells.
response = urllib.request.urlopen(url)
text = io.TextIOWrapper(response, encoding='cp1252', errors='replace', newline='')
data = csv.DictReader(text, restval='')

# Fudge to cope with possibility of blank rows between header and first data row
started = False

# Inspection of the data file suggests that once a Minister's first appointment is listed,
#   the Minister cell is left blank in the following rows to mean "same as above".
# If we want to put the Minister's name into each row, we need to watch for that.
minister = ''

for d in data:
    if not started and d['Minister'] == '':
        # Skip blank lines between header and data rows
        continue
    elif d['Minister'] != '':
        # A new Minister is identified, so this becomes the current Minister of interest
        started = True
        minister = d['Minister']
    elif d['Date'] == '' and d['Purpose of meeting'] == '' and d['Name of External Organisation'] == '':
        # Inspection of the original data file suggests that there may be notes at the end of the CSV file...
        # One convention appears to be that notes are separated from data rows by at least one blank row
        # If we detect a blank row within the dataset, then we assume we're at data's end
        # Of course, if there are legitimate blank rows within the data, we won't scrape any of the following data
        # We probably shouldn't discount the notes, but how would we handle them?!
        break
    date = d['Date']
    purpose = d['Purpose of meeting']
    lobbyist = d['Name of External Organisation']
    print(minister, date, purpose, lobbyist)
    # Hashing the meeting's fields with MD5 gives a repeatable, (practically) unique ID for the meeting,
    #   so re-running the scraper overwrites existing records rather than duplicating them
    key = '::'.join([minister, date, purpose, lobbyist])
    meeting_id = hashlib.md5(key.encode('utf-8')).hexdigest()
    record = {'id': meeting_id, 'Minister': minister, 'date': date, 'purpose': purpose, 'lobbyist': lobbyist}
    # Note that in some cases there may be multiple lobbyists, separated by a comma, in the same record.
    # It might make sense to generate a meeting MD5 id using the original record data, but actually store
    #   a separate record for each lobbyist in the meeting (i.e. have lobbyists and lobbyist columns) by separating on ','
    # That said, there are also records where a comma separates part of the title or affiliation of an individual lobbyist.
    # A robust convention for separating different lobbyists in the same meeting (e.g. ';' rather than ',') would help

    # Save to the scraperwiki library's SQLite store, keyed on the meeting id
    scraperwiki.sqlite.save(unique_keys=['id'], data=record)

# We may have broken out of the loop early, so close the connection explicitly
response.close()

Here’s a preview of what the scraped data looks like:

Ministers' meetings datascrape - scraperwiki

Here’s the scraper itself, on Scraperwiki: UK Government Transparency Data – Minister’s Meetings Scratchpad

Assuming that the other CSV files are all structured the same way as the one I tested the above scraper on, we should be able to scrape meeting data from other departmental spreadsheets using the same script. (Note that I did try to be defensive in the handling of arbitrary blank lines between the first header row and the data.)
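If the files really do share a layout, the cleaning rules in the script can be pulled out into a function. That function can then be applied to each departmental file in turn, adding the department name as extra metadata as suggested earlier. Below is a sketch under those assumptions. It operates on already-parsed rows, so it can be tried without a network connection. The department name and the sample rows are invented.

```python
import csv
import io

FIELDS = ("Date", "Purpose of meeting", "Name of External Organisation")


def clean_meetings(rows, department):
    """Apply the scraper's cleaning rules to csv.DictReader rows from one departmental file.

    - skip blank rows before the first Minister is named
    - fill a blank Minister cell down from the row above ("same as above")
    - stop at the first fully blank row after the data (assumed to precede notes)
    """
    minister = ""
    for row in rows:
        name = (row.get("Minister") or "").strip()
        if name:
            minister = name
        elif not minister:
            continue  # blank row(s) between header and data
        elif not any((row.get(f) or "").strip() for f in FIELDS):
            break  # blank row after the data: assume notes follow
        yield {"department": department, "minister": minister,
               **{f: (row.get(f) or "").strip() for f in FIELDS}}


sample = """Minister,Date,Purpose of meeting,Name of External Organisation
,,,
Minister X,June 2010,Roundtable,Org One
,July 2010,Introductory meeting,Org Two
,,,
Note: some explanatory text,,,
"""
rows = list(clean_meetings(csv.DictReader(io.StringIO(sample)), "Department Y"))
for r in rows:
    print(r)
```

The trailing note row is never reached, because the blank row before it stops the scan. This reproduces the script's behaviour of discarding notes.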

One problem arises in the context of meetings with more than one person. Ideally, I think there should be a separate row for each person attending. For example, the Roundtable in June 2010 between Parliamentary Secretary (Minister for Civil Society), Nick Hurd MP and National Voices, MENCAP, National Council of Voluntary Organisations, St Christopher’s Hospice, Diabetes UK, Place 2 Be, Terrence Higgins Trust, British Heart Foundation, Princess Royal Trust for Carers, Clic Sargent might be mapped to separate data rows for each organisation present. If we take this approach, it might also make sense to ensure that each row carries a meeting ID, so that we can group all the rows relating to a particular meeting (one for each group in the meeting) on that ID.

However, there is an issue in identifying multiple attendee meetings. In the above example, we can simply separate the groups by splitting the attendees lists at each comma; but using this approach would then mean that the meeting with Secretary General, Organisation of the Islamic Conference, Ekmelledin Ihsanoglu would be mapped onto three rows for that meeting: one with Secretary General as an attendee, one with Organisation of the Islamic Conference as an attendee, and finally one with Ekmelledin Ihsanoglu identified as an attendee…

What this suggests to me is that it would be really handy (in data terms) if a convention were used in the attendees column that separated representatives of different organisations with a semicolon, “;”. We can then worry about how to identify numerous individuals from the same organisation (e.g. J Smith, P Brown, Widget Lobbying group), how to pull out roles from organisations (Chief Lobbyist, Evil Empire Allegiance), names and roles from organisations (J Smith, Chief Lobbyist, UN Owen, Head Wrangler, Evil Empire Allegiance), and so on…
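With a semicolon convention in place, splitting a meeting into one row per organisation, each carrying a shared meeting ID, becomes straightforward. Below is a sketch. It assumes the proposed ';' separator, which the published files do not currently use, and an MD5 meeting ID computed from the original record, as in the scraper. The attendee strings come from the two meetings described above, with semicolons added by hand. "Minister X" and the short dates and purposes are placeholders.

```python
import hashlib


def attendee_rows(minister, date, purpose, attendees, sep=";"):
    """Split one meeting record into one row per attendee group, sharing a meeting ID.

    Assumes the proposed convention: different organisations separated by ';',
    so commas inside a single attendee's title or affiliation are left alone.
    """
    key = "::".join([minister, date, purpose, attendees])
    meeting_id = hashlib.md5(key.encode("utf-8")).hexdigest()
    return [
        {"meeting_id": meeting_id, "minister": minister, "date": date,
         "purpose": purpose, "attendee": part.strip()}
        for part in attendees.split(sep)
        if part.strip()
    ]


single = attendee_rows(
    "Minister X", "2010", "Meeting",
    "Secretary General, Organisation of the Islamic Conference, Ekmelledin Ihsanoglu")
multiple = attendee_rows(
    "Nick Hurd MP", "June 2010", "Roundtable",
    "National Voices; MENCAP; National Council of Voluntary Organisations")

print(len(single), single[0]["attendee"])
print([r["attendee"] for r in multiple])
```

The single-organisation meeting stays as one row, commas and all, while the roundtable splits into one row per organisation that can be grouped back together on meeting_id.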

And I know, I know… the Linked Data folk would be able to model that easily… but I’m talking about quick and dirty typographical conventions for simple CSV docs, which more folk are comfortable with than complex, explicitly structured data…;-)

PS I’ll describe how to scrape the CSV URLs from the Number 10 web page, and then loop through them all to generate a comprehensive “Ministers’ meetings” database, in a later post…

PPS a really informative post on the Who’s Lobbying blog goes into further detail about some of the “pragmatic reuse” problems associated with the “Ministers’ meetings” data released to date: Is this transparency? No consistent format for 500 more UK ministerial meetings.