A Quick Peek at Some Charities Data…

As part of the Maximising Your Data Impact launch event of the Civil Society Data Network (reviewed in part by the Guardian here: Open data and the charity sector: a perfect fit), a series of mini-data expeditions provided participants with the opportunity to explore a range of data related questions relevant to the third sector. This was my first data exploration, and rather than deep diving into the data (which I probably should have done!) I was more interested in getting a feel for how potential data users actually discussed data related questions.

The topic my group was exploring was how to decide whether it made more sense to make a donation to a small or a large charity. Unpacking this question a little forces a consideration of what the donor perceives to be a good use of their donation. To try to make the question more tractable, we also focussed on a particular scenario: how might a donor wishing to contribute to a charity related in some way to the hospice movement draw comparisons between such charities in order to inform their decision?

A scan of the Charity Commission website reveals that information is available for many charities relating to the size of the organisation as given by the number of staff or number of volunteers, as well as a certain amount of financial reporting:

charity commission comparison

Unfortunately, there doesn’t seem to be a way of comparing charities across a sector, nor does the data appear to be available as such. However, OpenCharities does make the data available in a rather more convenient form (that is, in a machine readable form) at web addresses/URLs of the form http://opencharities.org/charities/OPENCHARITIESID.json.

opencharities json
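As a quick way in, a couple of lines of Python are enough to pull back the record for a single charity and see what it contains (the charity number used here is just a placeholder; swap in a real one from the register):

#Peek at the detailed JSON record for a single charity on OpenCharities
#(the charity number used here is a placeholder; substitute a real one)
import json, urllib2

url = 'http://opencharities.org/charities/123456.json'
data = json.load(urllib2.urlopen(url))
#The interesting detail sits under the 'charity' key
print data['charity'].keys()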

OpenCharities also publish a CSV file (a simple text based tabular data file) that contains crude summary information about UK registered charities, including such things as the charity name, its OpenCharities ID number, a summary description of each charity’s major activities, its registered address, and its social media profile information. If we grab this data, and pull from it the charities that are of interest to us, we can then use the OpenCharities IDs to create URLs from which we can pull the more detailed data.

Grabbing the CSV data file as charities.csv, we can pull out the rows relating to hospices. From a Linux/Mac terminal command line, we can use the grep tool to grab rows that mention “hospice” somewhere in the line, and then create a new file by appending those rows to the column header row from the original file, putting the result into the file hospice_charities.csv:

grep hospice charities.csv > tmp.csv
head -1 charities.csv | cat - tmp.csv > hospice_charities.csv

and then upload the resulting filtered file to a Google spreadsheet.

Alternatively, we could load the whole OpenCharities CSV file into a tool such as OpenRefine and then use a text filter on an appropriate column to select just the rows mentioning “hospice”. Using the Custom Tabular Export, we could then upload the data directly to a Google spreadsheet.
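As a third route, for anyone happier at a Python prompt than with grep or OpenRefine, a few lines of script will do the same filtering job. This is just a sketch, assuming the downloaded file is saved as charities.csv and includes a header row:

#Keep the header row plus any row mentioning "hospice" (case-insensitive)
import csv

with open('charities.csv') as infile, open('hospice_charities.csv', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    writer.writerow(next(reader))  #copy the header row across
    for row in reader:
        if any('hospice' in cell.lower() for cell in row):
            writer.writerow(row)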

Having got a list of hospice related charities into a Google Spreadsheet, we can Share the spreadsheet and also publish it (from the File menu). Publishing Google Spreadsheets makes the data available in a variety of formats, such as CSV data, via a web location/URL.

Our recipe so far is as follows:

– get CSV file of charities on OpenCharities;
– filter the file to get charities associated with hospices;
– upload the filtered file to Google Spreadsheets;
– publish the spreadsheet so we have it available as CSV at known URL.

The next step is to pull the more detailed data down from OpenCharities, using the OpenCharities ID to construct the web address where we can find that data for each hospice. We could use a tool such as OpenRefine to do this, but instead I’m going to write a short program on Scraperwiki (OpenCharities Scraper) in the Python programming language to do the task.

#We need to load in some program libraries to make life easier
import scraperwiki,csv,urllib2,json

#Here's where we load in the spreadsheet data published on Google Spreadsheets as a CSV file
charities = csv.DictReader((urllib2.urlopen('https://docs.google.com/spreadsheet/pub?key=0AirrQecc6H_vdFVlV0pyd3RVTktuR0xmTTlKY1gwZ3c&single=true&gid=1&output=csv')))

#This function will grab data about a charity from OpenCharities given its OpenCharities ID
def opencharitiesLookup(id):
    url = 'http://opencharities.org/charities/'+id+'.json'
    jsondata = json.load(urllib2.urlopen(url))
    return jsondata

#This routine cycles through each row of data/charity pulled from the spreadsheet
for charity in charities:
    #print charity

    #For each charity, I'm going to pull out several data fields that we'll then save to a database of our own
    #We start with some info from the spreadsheet - charity ID, name and activity summary
    data={'cid':charity['charity_number']}
    for tmp in ['title','activities']:
        data[tmp]=charity[tmp]

    #Here's where we pull in the more detailed data for each charity
    jdata = opencharitiesLookup(charity['charity_number'])

    #Then we start to go hunting for the data...
    chdata = jdata['charity']
    fdata = chdata['financial_breakdown']

    #The data will include employee and volunteer numbers..
    for tmp in ['volunteers','employees']:
        data[tmp] = chdata[tmp]

    #...as well as financial information
    for tmp in ['assets','spending','income']:
        if fdata is not None and tmp in fdata:
            for tmp2 in fdata[tmp]:
                data[tmp+'_'+tmp2] = fdata[tmp][tmp2]
    #print data

    #Here's where we save all the data for a charity to a database
    scraperwiki.sqlite.save(unique_keys=['cid'], table_name='hospices', data=data)

When we run the scraper, it looks up the hospice data in the Google Spreadsheet, gets the richer data from OpenCharities, and pops it into a local database on Scraperwiki from where we can download the data:

scraperwiki opencharities

We can now download this enriched data as a CSV file and then, from Google Drive:

open a fusion table from drive

upload it to Google Fusion Tables.

The recipe has now been extended as follows:

– pull the list of hospices into Scraperwiki from the Google Spreadsheet
– for each hospice, create an OpenCharities URL that points to the detailed data for that charity
– for each hospice, grab the corresponding data from that OpenCharities URL
– for each hospice, pull out the data elements we want and pop them into a Scraperwiki database
– download the database as a CSV file that now contains detailed data for each charity
– upload the detailed data CSV file to Google Fusion Tables

With the data in Google Fusion Tables, we can now start to analyse it and see what stories it might have to tell:

import csv to fusion table

Here it is…

data in fusion tables

We can then start to build charts around the data:

fusion tables add chart

Using the chart builder, you can specify which quantities to plot against each axis:

building up a fusion chart

(I don’t know how to associate the title column (name of each charity) with points on the chart, so that when we hover over a point we can see which charity it relates to?)

We can also define our own calculations, much as you would in a spreadsheet.

fusion table add formula

For example, we might be interested in knowing the ratio of the amount spent soliciting voluntary contributions to the income received from those contributions. If this ratio is greater than 1, more money is spent soliciting these contributions than is received as a result. If the ratio is small (close to zero), then either the money spent soliciting voluntary contributions is effective, or voluntary income is being generated by other means…

fusion table calc formula
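The same calculation is easy enough to sanity check outside Fusion Tables. Here's a minimal sketch run against the CSV file downloaded from Scraperwiki (saved here as hospices.csv); note that the spending_voluntary and income_voluntary column names are my guess at how the financial_breakdown keys come through, so check them against the actual header row:

#Quick check of the voluntary spend vs voluntary income ratio
#NOTE: spending_voluntary and income_voluntary are assumed column names;
#check them against the header row of the CSV downloaded from Scraperwiki
import csv

for row in csv.DictReader(open('hospices.csv')):
    spend = float(row.get('spending_voluntary') or 0)
    income = float(row.get('income_voluntary') or 0)
    if income > 0:
        print row['title'], round(spend / income, 2)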

We can then use one or more filters to explore which hospices meet different criteria. For example, how about organisations that seem to get a good return of voluntary income relative to what they spend on generating it, with the proviso that the voluntary income is less than £250,000:

build up filter

Along similar lines, we might want to calculate the ratio of volunteers to employees, and then view the data for organisations with a certain number of employees (10 to 50, for example), or a certain number of volunteers (less than 500, say).
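That sort of filter can also be sketched in a few lines of Python against the downloaded CSV; the volunteers and employees columns come straight from the scraper, though the values may be blank for some charities (treated as zero here):

#Filter for organisations with 10 to 50 employees and show the
#volunteer to employee ratio; blank values are treated as zero
import csv

for row in csv.DictReader(open('hospices.csv')):
    employees = float(row.get('employees') or 0)
    volunteers = float(row.get('volunteers') or 0)
    if 10 <= employees <= 50:
        print row['title'], round(volunteers / employees, 1)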

This is now the point at which story making – story generation, and data story telling – can start to kick in, based on the questions we want to ask of the data, or the stories we want to try to get it to tell us…

Organisations Providing Benefits to All-Party Parliamentary Groups, Part 1

Via a tweet from the author, I came across Rob Fenwick’s post on APPGs – the next Westminster scandal? (APPG = All Party Parliamentary Groups):

APPGs are entitled to a Secretariat. Set aside any images you have of a sensibly dressed person of a certain age mildly taking dictation, the provision of an APPG Secretariat is one of the main routes used by public affairs agencies, charities, and businesses to cosey up to MPs and Peers. These “secretaries” often came up with the idea of setting up the group in the first place, to advance the interests of a client or cause.

The post describes some of the organisations that provide secretariat services to APPGs, and in a couple of cases also takes the next step: “Take the APPG on the Aluminium Industry, the secretarial services of which are provided by Aluminium Federation Ltd which is “a not-for-profit organisation.” That sounds suitably reassuring – if the organisation is not-for-profit what chance can there be of big business buying favoured access? It’s only when you look at the Federation’s website, and examine each of its nine sub-associations in turn, that it becomes clear that this not-for-profit organisation is a membership umbrella for private business. This is above board, within the rules, published, and transparent. Transparent if you’re prepared to invest the time to look, of course.”

It’s worth reading the post in full… Go on…

… Done that? Go on.

Right. Here’s the list of registered All Party Groups. Here’s an example of what’s recorded:

APG form

Conveniently, there’s a Scraperwiki scraper (David Jones / All Party Groups) grabbing this information, so I thought I’d have a play with it.

Looking at the benefits, there is a fair bit of convention in the way benefits are described. For example, we see recurring things of the form:

  • 5000 from CAFOD, 6000 from Christian Aid – that is, [AMOUNT] from [ORGANISATION]
  • Age UK (a charity) acts as the groups secretariat. – that is, [ORGANISATION] {(OPTIONAL_TYPE)} acts as the groups secretariat.

We could parse these things out directly (on the to do list!) but as a short cut, I thought I’d try a couple of content analysis/entity extraction services to see if they could pull out the names of companies and charities from the benefits list. You can find the scraper I used to enhance David Jones’ APPG scraper here: APG Enhancer.
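For what it's worth, here's the sort of thing the direct parsing route might look like: a minimal sketch using regular expressions against the two patterns noted above (the test strings are the examples from the list, and a real pass over the register would need a longer set of patterns):

#A first stab at parsing benefit entries directly with regular expressions
import re

#[AMOUNT] from [ORGANISATION], possibly repeated within one entry
amount_from = re.compile(r'(\d[\d,]*)\s+from\s+([^,.]+)')
#[ORGANISATION] {(OPTIONAL_TYPE)} acts as the groups secretariat
secretariat = re.compile(r'^(.*?)\s*(?:\(.*?\))?\s*acts as the groups secretariat')

benefit = '5000 from CAFOD, 6000 from Christian Aid'
for amount, org in amount_from.findall(benefit):
    print amount, org.strip()

benefit = 'Age UK (a charity) acts as the groups secretariat.'
match = secretariat.match(benefit)
if match:
    print match.group(1)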

Here are a couple of sample reports from my scraper:

This gives a first-pass attempt at extracting organisation (company and charity) names from the APPG register, and in passing provides a partial directory for looking up companies by APG (partial because the entity extractors aren’t perfect and don’t manage to identify every company, charity, association or other recognised group).

A more thorough way to look up particular companies is to do a site’n’path limited web search: eg
aviva site:http://www.publications.parliament.uk/pa/cm/cmallparty/register/

How might we go further, though? One way would be to look up companies on OpenCorporates, and pull down a list of directors:

opencorposrates - look up directors

And then we can start to walk through the database of directors, looking for other companies that appear to have the same director:

opencorporates - director lookup

(Note: we need to watch out for false positives, whereby one director has the same name as another person who is also a company director. There may be false negatives too, where we don’t find a directorship held by a specific person because a slightly different variation of their name was used on a registration document.)
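As a sketch of what that graph walking might look like in code, here's a minimal pass using the public OpenCorporates v0.4 search API (the officer name is a placeholder, the response key names are my reading of the API docs rather than tested against live output, and anything beyond light use needs an API key):

#A rough sketch of the company -> directors -> other companies walk using
#the OpenCorporates v0.4 search API; key names are assumed, so check them
#against a live response
import json, urllib, urllib2

def oc_search(kind, name):
    url = 'https://api.opencorporates.com/v0.4/' + kind + '/search?q=' + urllib.quote(name)
    return json.load(urllib2.urlopen(url))

#Look up a company by name...
results = oc_search('companies', 'Aluminium Federation')
for item in results['results']['companies']:
    company = item['company']
    print company['name'], company['jurisdiction_code'], company['company_number']

#...then look for other appointments held by a named officer/director
results = oc_search('officers', 'A N Example')
for item in results['results']['officers']:
    officer = item['officer']
    print officer['name'], officer['position'], officer['company']['name']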

We can also look up charities on OpenCharities to find charity trustees:

OpenCharities charity lookup

If we’re graph walking, we might then look up the trustees on OpenCorporates to see whether or not the trustees are directors of any companies with possible interests in the area, and as a result identify companies who may be trying to influence Parliamentarians through APPGs that benefit from the direct support of a charity, via that charity.

In this way, we can start to build out a wider direct interest graph around a Parliamentary group. I’m not sure how useful or even how meaningful any of this actually is, but it’s increasingly possible, and once the scripted patterns are there, increasingly easy to deploy in other contexts (for example, wherever there is a list of company names, charity names, or names of people who may be directors). I guess a trustee search on OpenCharities may also be available at some point? From a graph linking point of view, I also wonder if any charities share registered addresses with companies, etc…

PS by the by, here’s a guest post I just wrote on the OpenCorporates blog: Data Sketching With the OpenCorporates API.