By chance, a couple of days ago I stumbled across a spreadsheet summarising awarded PFI contracts as of 2013 (private finance initiative projects, 2013 summary data).
The spreadsheet has 800 or so rows, and a multitude of columns, some of which are essentially grouped (although multi-level, hierarchical headings are not explicitly used) – columns relating to (estimated) spend in different tax years, for example, or the equity partners involved in a particular project.
As part of the OU course we’re currently developing on “data”, we’re exploring an assessment framework based on students running small data investigations, documented using IPython notebooks, in which we will expect students to demonstrate a range of technical and analytical skills, as well as some understanding of data structures and shapes, data management technology and policy choices, and so on. In support of this, I’ve been trying to put together one or two notebooks a week over the course of a few hours to see what sorts of thing might be possible, tractable and appropriate for inclusion in such a data investigation report within the time constraints allowed.
To this end, the Quick Look at UK PFI Contracts Data notebook demonstrates some of the ways in which we might learn something about the data contained within the PFI summary data spreadsheet. The first thing to note is that it’s not really a data investigation: there is no specific question I set out to ask, and no specific theme I set out to explore. It’s more exploratory (that is, rambling!) than that. It has more of the form of what I’ve started referring to as a conversation with data.
In a conversation with data, we develop an understanding of what the data may have to say, along with a set of tools (that is, recipes, or even functions) that allow us to talk to it more easily. These tools and recipes are reusable within the context of the data set or datasets selected for the conversation, and may be transferable to other data conversations. For example, one recipe might be how to filter and summarise a dataset in a particular way, or generate a particular view or reshaping of it that makes it easy to ask a particular sort of question, or generate a particular sort of chart (creating a chart is like asking a question where the surface answer is returned in a graphical form).
If we are going to use IPython notebook documented conversations with data as part of an assessment process, we need to tighten up a little more how we might expect to see them structured and what we might expect to see contained within them.
Quickly skimming through the PFI conversation, elements of the following skills are demonstrated:
- downloading a dataset from a remote location;
- loading it into an appropriate data representation;
- cleaning (and type casting) the data;
- spotting possibly erroneous data;
- subsetting/filtering the data by row and column, including the use of partial string matching;
- generating multiple views over the data;
- reshaping the data;
- using grouping as the basis for the generation of summary data;
- sorting the data;
- joining different data views;
- generating graphical charts from derived data using “native” matplotlib charts and a separate charting library (ggplot);
- generating text based interpretations of the data (“data2text” or “data textualisation”).
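A few of the steps listed above (cleaning and type casting, spotting possibly erroneous data, partial string matching) can be sketched in a handful of lines. This is only a rough illustration, written in plain Python rather than pandas so that it runs without any dependencies; the column names (`project`, `capital_value`) and the sample rows are invented for the example and are not taken from the PFI spreadsheet.

```python
import csv
import io

def to_number(text):
    """Cast a messy numeric string such as '£1,250.5' to a float (None if unparseable)."""
    cleaned = text.strip().replace("£", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None

def load_rows(csv_text, numeric_cols):
    """Read CSV text into a list of dicts, casting the named columns to numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in numeric_cols:
            row[col] = to_number(row[col])
        rows.append(row)
    return rows

def filter_contains(rows, col, fragment):
    """Keep rows whose column contains the fragment (case-insensitive partial match)."""
    return [r for r in rows if fragment.lower() in r[col].lower()]

def suspect_rows(rows, col):
    """Flag rows where the value is missing or negative, as candidates for a closer look."""
    return [r for r in rows if r[col] is None or r[col] < 0]

# Invented sample data (hypothetical column names and values)
SAMPLE = """project,capital_value
Example Hospital Scheme,"£1,250.5"
Example School Scheme,n/a
Another Hospital Scheme,-3
"""

rows = load_rows(SAMPLE, ["capital_value"])
hospitals = filter_contains(rows, "project", "hospital")
suspects = suspect_rows(rows, "capital_value")
```

What counts as "possibly erroneous" is a judgement call for each dataset; missing and negative values are just an easy first check.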
The notebook also demonstrates some elements of reflection about how the data might be used. For example, it might be used in association with other datasets to help people keep track of the performance of facilities funded under PFI contracts.
Note: the notebook I link to above does not include any database management system elements. As such, it represents elements that might be appropriate for inclusion in a report from the first third of our data course – which covers basic principles of data wrangling using pandas as the dominant toolkit. Conversations held later in the course should also demonstrate how to get data into and out of an appropriately chosen database management system, for example.
In Holding Companies to Account – Open Data Consolidation, I noted a couple of different ways in which we could use opendata to consolidate something of what we know about companies that provide services to or on behalf of public bodies, or otherwise receive monies from public services:
1) structural consolidation, in the sense of identifying companies that are part of the same corporate group;
2) financial consolidation, in the sense of identifying spend made to the same company from across different public bodies, and/or spend to different companies from the same corporate group from one or more public bodies.
In respect of the second notion, see also Open Data, Transparency, Fan-In and Fan-Out which describes how we can also start to consolidate connections and payments made between public bodies (also Public Sector Transparency – Do We Need Open Receipts Data as Well as Open Spending Data?).
I’ve previously doodled thoughts on whether there is a need for companies receiving public money to disclose those receipts (eg Spending & Receipts Transparency as a Consequence of Accepting Public Money?) – but whilst they may have no obligation to do so, the availability of open transactions data (and increasingly, open contracts data, eg The Local Government (Transparency) (Descriptions of Information) (England) Order 2014, h/t @owenboswarva) means that we can start to aggregate and publish this information, on their behalf, as part of a corporate watch activity :-)
So here’s a what if… What if there was a way we could set up “open public data reflector” sites that would collect data about a particular company or corporate group, aggregate it, and reflect it back? As a start, we could simply flip requirements put onto public bodies (eg publication of spend over £25,000 for large departments or services) to complementary views on the private corporate side: publication of all receipts over £25,000 from large public bodies, and publication of summed receipts over £25,000 from local councils (who have a lower spending amount disclosure threshold). Of course, in the latter case, we’d need to aggregate the smaller amounts in order to calculate the sums. By aggregating contract information, (additional) spend against contracts could also be tracked.
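As a rough sketch of the "flipped" disclosure rule, the following plain Python shows one way the two views might be computed for a single company. The tuple layout, the `'large'`/`'council'` labels and the sample payments are all assumptions made up for illustration; real spending data would need cleaning and company name reconciliation first.

```python
from collections import defaultdict

THRESHOLD = 25_000  # disclosure threshold mentioned in the text (GBP)

def reflect_receipts(payments, threshold=THRESHOLD):
    """Summarise payments received by one company.

    payments: iterable of (payer, payer_type, amount) tuples, where payer_type
    is 'large' or 'council' (hypothetical labels).
    Returns (individual, summed):
      individual - single receipts from large bodies that exceed the threshold
      summed     - per-council totals that exceed the threshold once added up
    """
    individual = []
    council_totals = defaultdict(float)
    for payer, payer_type, amount in payments:
        if payer_type == "large":
            if amount > threshold:
                individual.append((payer, amount))
        else:
            # Councils publish smaller items, so total them before thresholding
            council_totals[payer] += amount
    summed = {p: t for p, t in council_totals.items() if t > threshold}
    return individual, summed

# Invented example payments to a single (unnamed) company
payments = [
    ("Dept A", "large", 40_000),
    ("Dept A", "large", 10_000),
    ("Council B", "council", 15_000),
    ("Council B", "council", 12_000),
    ("Council C", "council", 5_000),
]
individual, summed = reflect_receipts(payments)
```

Here the two Council B payments only cross the threshold once they are summed, which is the aggregation step the council case needs.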
In this respect, I could imagine someone like SpendNetwork setting up a white label site that would allow civil society activists to fire up a ‘corporate watch’ website that reflects back open public data that refers to a particular company (something a little more sophisticated than their current raw data listings). If they made their data a little easier to access, I may be tempted to play with it…
Alongside the open public data reflector, it might be useful to have a “what do they know about you?” reflector that describes the sort of information the company holds about you that could be accessed via a Data Protection Act subject access request. (I’m not sure how we could find that out? Get several people to put requests in, extract the field names/metadata elements and publish those?) Thinks: wouldn’t it be nice if there was a request that could be made of data-controllers that forced them to disclose the fields and descriptive metadata for any data that they would inspect when putting together a subject access request response?!;-) A “meta” subject access request, in other words?
PS Examples of outputs relating to aggregated spend with particular companies:
- Centre for Entrepreneurs/Spend Network report: Spend Small – Local Authority Spend Index.
If you know of any more, please let me have a link via the comments.
An article in the Observer declares Academy chain accused of ‘privatisation by stealth’ over plan to outsource jobs, a report on how the Academies Enterprise Trust (AET) has “selected” PricewaterhouseCoopers (PwC) to partner with it in establishing an LLP “to take over responsibility for providing school business managers, IT staff, secretarial staff, and financial expertise, along with speech and language therapy provision, education psychology, education welfare, curriculum development and professional development”.
Here’s a quick chart of academy groupings with at least 15 academies in the group:
As you might expect, when talking about their performance, the group in part assumes you must be referring to their financial rather than just educational performance:
(By the by, they really need to talk to their webteam about the domain mapping…)
As well as outsourcing everything, it looks like the AET is of a size where they can start raking it in as a provider of teacher training courses:
(I don’t know if this means they can use their own academies for placements, getting cheap labour and perhaps a training grant bung, as well as the more general student fees, for each placement?)
School Direct places are offered by the AET National Teaching School Alliance in partnership with an good or outstanding Ofsted accredited teacher training provider.
You will spend the majority of your training in the classroom, building your confidence in an academy environment whilst receiving excellent mentoring support.
You will complete placements in academies who are experienced in initial teacher training and who excel at training and developing their staff.
All School Direct places will lead to Qualified Teacher Status (QTS). In addition, some providers also accredit a Postgraduate Certificate in Education (PGCE)
Participating AET academies recruit and select their own trainees with the expectation that they will then go on to work within the Academies Enterprise Trust, although there is no absolute guarantee of employment.
You may apply directly to the Academies Enterprise Trust National Teaching School Alliance or directly to the AET academy that you wish to be trained in or contact the AET National Teaching School Alliance.
Scale-up, plug-in, live-off public money and private student debt, partner with “trusted auditors” to disburse funds to private providers (because they previously got rumbled when making disbursements to themselves…?) – is that how it works?
By the by, the following schools are part of the AET group:
- primary: Westerings Primary Academy (Rayleigh and Wickford), Plumberow Primary School (Rayleigh and Wickford), Ashingdon School (Rayleigh and Wickford), Weston Academy (Isle of Wight), Hamford Primary Academy (Clacton), Oaks Academy (Faversham and Mid Kent), St James the Great Academy (Tonbridge and Malling), Tree Tops Academy (Faversham and Mid Kent), Langer Primary Academy (Suffolk Coastal), Molehill Copse Primary School (Faversham and Mid Kent), Percy Shurmer Academy (Birmingham, Hall Green), Brockworth Primary Academy (Tewkesbury), Offa’s Mead Academy (Forest of Dean), Severn View Academy (Stroud), Noel Park Primary School (Hornsey and Wood Green), Trinity Primary Academy (Hornsey and Wood Green), Hall Road Academy (Kingston upon Hull North), Newington Academy (Kingston upon Hull West and Hessle), The Green Way Academy (Kingston upon Hull North), Charles Warren Academy (Milton Keynes South), Barton Hill Academy (Torbay), North Ormesby Primary Academy (Middlesbrough), Montgomery Primary Academy (Birmingham, Hall Green), Feversham Primary Academy (Bradford East), Shafton Primary Academy (Barnsley East), St Helen’s Primary School (Barnsley Central), Lea Forest Academy (Birmingham, Hodge Hill), Cottingley Primary Academy (Leeds Central), Beacon Academy (Loughborough), Anglesey Primary Academy (Burton), Four Dwellings Primary Academy (Birmingham, Edgbaston), Caldicotes Primary Academy (Middlesbrough), Meadstead Primary Academy (Barnsley Central), Hazelwood Academy (South Swindon), North Thoresby Primary School (Louth and Horncastle), The Utterby Primary School (Louth and Horncastle);
- secondary: Unity City Academy (Middlesbrough), New Rickstones Academy (Witham), Greensward Academy (Rayleigh and Wickford), Maltings Academy (Witham), Clacton Coastal Academy (Clacton), Aylward Academy (Edmonton), Nightingale Academy (Edmonton), Richmond Park Academy (Richmond Park), Tendring Technology College (Clacton), Bexleyheath Academy (Bexleyheath and Crayford), Everest Community Academy (Basingstoke), Ryde Academy (Isle of Wight), Sandown Bay Academy (Isle of Wight), East Point Academy (Waveney), Felixstowe Academy (Suffolk Coastal), Millbrook Academy (Tewkesbury), The Duston School (South Northamptonshire), The Rawlett School (An AET Academy) (Tamworth), The New Forest Academy (New Forest East), Childwall Sports & Science Academy (Liverpool, Wavertree), Sir Herbert Leon Academy (Milton Keynes South), Tamworth Enterprise College and AET Academy (Tamworth), Winton Community Academy (North West Hampshire), Broadlands Academy (North East Somerset), Greenwood Academy (Birmingham, Erdington), Cordeaux Academy (Louth and Horncastle), Four Dwellings Academy (Birmingham, Edgbaston), Kingsley Academy (Brentford and Isleworth), Kingswood Academy (Kingston upon Hull North), Swallow Hill Community College (Leeds West), Firth Park Academy (Sheffield, Brightside and Hillsborough), Hillsview Academy (Redcar);
- special: Columbus School and College (Chelmsford), The Pioneer School (Basildon and Billericay), Wishmore Cross Academy (Surrey Heath), Greenfield Academy (Stroud), Peak Academy (Stroud), The Ridge Academy (Cheltenham), Newlands School (Camberwell and Peckham).
Just doodling, looking for sources of data to try to map out the evolution of corporate groups that are taking over the operation of public services (another few possible pieces in the holding companies to account jigsaw…)
Health and Social Care
The CQC publishes a spreadsheet detailing all the locations it inspects which includes a group/brand identifier that allows us to track individual companies that operate as part of a group or under a common brand [CQC data].
Justice and detention
I haven’t found a nice dataset that identifies the operators of prison and detention facilities. The Ministry of Justice publish Prison and probation trusts performance statistics that includes management information detailing the names of prisons and trusts, but not their operators. There is an unstructured list of contracted out prison operators but no data file? I can’t find a list of operators of immigration removal/detention centres? (Serco, GEO and MITIE may all be operators?)
SLAs and general contract details here – MOJ – Prison Service level agreements and contracts – though no mention of private contractor names? Other possible sources: inspection reports from HM Inspectorate of Prisons. Reports are presented as PDF documents, and appear to include a fact page that includes a statement of Escort contractors, Health service providers and Learning and skills providers. I haven’t found a simple datafile that clearly states the providers of these service types by prison?
As far as probation service delivery goes, I’m not sure what sort of private contracts the probation trusts operate – one way in might be to trawl through the spending data (eg Probation trusts spend over £25,000; note: that page says “We do not publish everything and HMT lists transactions not to be included in their guidance on data inclusions and redactions”, although no link is given to that guidance document…).
The Department for Education publishes a spreadsheet detailing Open academies and academy projects in development that includes the name of the sponsoring organisation.
The Homes and Communities Agency publish results from “an annual online survey completed by all English private registered providers of social housing (PRPs)” – Statistical Data Return (SDR) (see particularly the statistical return full data sheet; amongst other things, this document includes a sheet that associates subsidiary companies with their parent housing trusts).
I don’t really know much about PFI (I don’t really know much about anything linked to in this post!) except that PFI contracts often have large numbers of 000s associated with them. Private Finance Initiative Projects summary data is perhaps one quick way into exploring that whole can of worms further?
So what other areas, and what other data sources, am I missing?
PS writing about a collaboration between OpenOil and OpenCorporates to map the structure of the BP corporate sprawl, Glyn Moody notes the realisation by OpenOil that “[t]he entire play in the way multinationals operate is in the interplay between the group as a co-ordinated whole…”:
…[T]his unified strategy is played out across over a thousand affiliate companies who each exist as a separate legal “person”. The company naturally seeks to maximise advantage across jurisdictions by combining these different legal persons in the most profitable and least liable way for any given business problem. But even if the group does act with one mind, the price of being able to maintain the affiliate structure as separate legal persons is a bare minimum of autonomous reporting by each of them.
It was as if the … group is a superorganism and its affiliates were the constituent organisms included in the whole, like individual ants or coral. None of those companies had any purpose or would even survive without being integrated into the colony. Nevertheless, each of them has a unique footprint and what we were doing was studying the traces of their uniqueness, their “genetic code”, to see if significant information was stored there which could tell us something about the internal functioning of the colony.
As soon as you start working with data in a networked model (where you look for links and relationships that can aggregate data from multiple sources) you realise that apparently meaningful or coherent gross level patterns or structures can emerge from the simple interactions and behaviours of individual components. (Supervenience is the term used to describe how the properties of the higher level are derived from (or supervene on) the lower level properties or behaviours. However, in the case of a corporate network, we might imagine that a goal state created at the higher level by actors with a view over the whole system is what is actually driving the behaviour of subservient lower level actors. An ecological view of the corporate network might look to elements of downward causation to try to explain the behaviour of the lower level parts.)
PPS It’s probably 20 years since I last read heavily, and thought a lot about, parts, whole, levels, supervenience and emergence. And I seem to have forgotten most of it:-( I wonder if the box loads of photocopied papers are still “archived” in my office somewhere… Hmmm…
According to Wikipedia:
In business, consolidation or amalgamation is the merger and acquisition of many smaller companies into much larger ones. In the context of financial accounting, consolidation refers to the aggregation of financial statements of a group company as consolidated financial statements.
I’ve been pondering the use of open data for holding companies to account again (see also here and here, for example) and a couple of ways forward seem to be crystallising out for me, at least in the way(s) I’ve been hacking some data sketches around. These ways loosely map on to the two senses of consolidation described above, I think?
In the first case, using open data sources to map out corporate groupings, or look at how companies start to consolidate into corporate groupings. The OpenCorporates folk are looking at doing this properly – based on share ownership of one company by another – but I’m looking for other signals and sources of data that allow us to associate company names within a wider corporate sprawl. For example, CQC data lists all the locations inspected by that body, along with the group or brand name (if any) under which a particular location operates. We can then use this information to identify all the locations associated with a particular brand or group.
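By way of a minimal sketch, grouping locations under the brand they operate as might look something like the following. The field names (`location`, `brand`) and the sample records are hypothetical stand-ins; the actual CQC spreadsheet uses its own column headings, which would need substituting in.

```python
from collections import defaultdict

def locations_by_brand(records, brand_key="brand", name_key="location"):
    """Index location names by brand/group, skipping locations with no brand."""
    index = defaultdict(list)
    for rec in records:
        brand = (rec.get(brand_key) or "").strip()
        if brand:
            index[brand].append(rec[name_key])
    return dict(index)

# Invented records standing in for rows of the inspected locations spreadsheet
records = [
    {"location": "Home A", "brand": "Brand X"},
    {"location": "Home B", "brand": ""},
    {"location": "Home C", "brand": "Brand X"},
]
by_brand = locations_by_brand(records)
```

Locations without a brand are dropped here rather than lumped together, since an empty brand field says nothing about common ownership.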
Whilst doing this in the context of sponsoring organisations for school academies, it struck me that once several independent locations have been established or aggregated together as part of a group, if those groups are driven by “growth” strategies, we will presumably start to see merger and acquisition behaviours? [See also other possible courses of action that larger groupings may take: School Chain Locks Out Public Service Values?.] By using open data sources, we may be able to track the first – and then possibly second – phase of this sort of consolidating activity?
In the second case, part of the rationale for identifying corporate groupings is so that we can start to consolidate information about payments made to, and quality or evaluation reports relating to, the members of a particular group. That is, we can start to think about a form of consolidated accounting. For example, we can start to total up all the payments made by the public sector (across both national and local government) to a particular corporate grouping, possibly across several spending areas; or we can look at the quality reports relating to different contracts raised by a particular corporate group as a whole and make a judgement about the service levels delivered by that operator in general. This consolidated quality and/or financial reporting also provides us with a way of looking at the gross behaviour of a company grouping, and comparing it, in accountability terms, with national public services, for example.
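The financial side of this might be sketched as below: payments to individual companies are rolled up to their corporate group using a lookup table, and totalled per paying body. This is an illustration under assumptions, not a worked method: the lookup table is exactly the thing that the structural consolidation step would have to produce, and all the names and amounts here are invented.

```python
from collections import defaultdict

def consolidate_spend(payments, company_to_group):
    """Total payments per corporate group, broken down by paying body.

    payments: iterable of (public_body, company, amount) tuples
    company_to_group: dict mapping company name -> group name; a company
    missing from the lookup is treated as a group of one.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for body, company, amount in payments:
        group = company_to_group.get(company, company)
        totals[group][body] += amount
    return {g: dict(by_body) for g, by_body in totals.items()}

def group_total(consolidated, group):
    """Sum everything paid to a group across all public bodies."""
    return sum(consolidated.get(group, {}).values())

# Invented example data
payments = [
    ("Council A", "Firm One Ltd", 100.0),
    ("Dept B", "Firm Two Ltd", 50.0),
    ("Council A", "Solo Ltd", 10.0),
]
lookup = {"Firm One Ltd": "Group G", "Firm Two Ltd": "Group G"}
consolidated = consolidate_spend(payments, lookup)
```

Keeping the per-body breakdown as well as the overall total means the same structure can answer both "who pays this group?" and "how much does this group receive in all?".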
I’ve long since been confused about what open data may or may not be good for in accountability or transparency terms, but now I feel as if it’s starting to make sense to me: as a way of shining some light onto the behaviour of private companies operating in the public sector, and also as a way of demonstrating just how much public money is sunk into some of them compared to funding made available to public bodies, for example.
If we could also get tax positions of companies and corporate groups more clearly illuminated as accessible data sources, along with information about their employment and payment practices (so we could, for example, run models on the extent to which the state is also likely to subsidise these companies’ operations through tax, housing and welfare benefits/payments made to their employees compared to those made to public sector employees), we could start to get a better idea about the way public money is actually being spent system wide?
Via one of my feeds from TheyWorkForYou, I noticed this written answer to a question my MP, Andrew Turner, asked of the Secretary of State for International Development:
I’ve been playing around with development data lately, trying to sketch together some pieces for an OpenLearn course on data visualisation for development (hopefully!), so I thought this would be a good test of how quickly I could find the data and confirm the results.
Working backwards, GDP data (in various adjusted forms) is available from the World Bank API, which I’ve been accessing via the remote data interface calls in pandas (for example, Easy Access to World Bank and UN Development Data from IPython Notebooks).
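For anyone without the pandas remote data tools to hand, here is a rough standard-library sketch of calling the World Bank API directly. It assumes the v2 endpoint layout and the `[metadata, records]` JSON response shape as I understand them; the function names are my own, and the indicator codes in the comments (`NY.GDP.MKTP.CD` for GDP and `NY.GDP.PCAP.CD` for GDP per capita, both in current US$) should be checked against the World Bank indicator listings.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.worldbank.org/v2"

def indicator_url(countries, indicator, year):
    """Build a request URL for one indicator, one year, several countries."""
    codes = ";".join(countries)  # multiple country codes are separated by semicolons
    query = urlencode({"format": "json", "date": str(year), "per_page": 500})
    return f"{API_ROOT}/country/{codes}/indicator/{indicator}?{query}"

def parse_indicator(payload):
    """Turn the API's [metadata, records] JSON into {country name: value}."""
    if len(payload) < 2 or not payload[1]:
        return {}  # error message or no data returned
    return {rec["country"]["value"]: rec["value"] for rec in payload[1]}

def fetch_indicator(countries, indicator, year):
    """Fetch and parse an indicator (needs network access)."""
    with urlopen(indicator_url(countries, indicator, year)) as resp:
        return parse_indicator(json.load(resp))

# Example call (not run here because it needs a network connection):
# gdp_per_capita = fetch_indicator(["IND", "ETH"], "NY.GDP.PCAP.CD", 2012)
```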
So where do we get the aid ranking from?
There are two ways of doing this – one to look for local UK sources (eg from DFID perhaps), the other to look for international sources of data. The advantage of the former is that these are presumably the sources that whoever answered the question went to. The advantage of the latter is that we should be able to generalise the question to query similar rankings for aid distributed by other countries.
“Official Development Assistance” seems to be a key phrase, with a quick websearch for that phrase and the term “data” turning up this Aid statistics – charts, tables and databases resource page, which in turn points to a whole raft of datatables as Excel files detailing statistics on resource flows to developing countries; the International Development Statistics (IDS) online databases page links to several more general online databases. (There’s also a beta data.oecd.org site.)
Forsaking the raw data files for a minute, the site claims that “the Query Wizard for International Development Statistics [QWIDS] is the easiest way to search our database as it automatically extracts the most appropriate dataset from OECD.Stat to match your search” – so let’s try that… QWIDS.
Nice and simple then…?!
A bit of tinkering (setting the donor, unticking recipients so only countries – rather than countries and groupings – are included) gives what I think is the data for the aid disbursements from the UK to other countries, data I could export as a CSV file; but there are no tools onsite to help me look at the top 10.
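Once the data is exported as a CSV file, pulling out a top 10 is only a few lines of code. The sketch below assumes hypothetical column names (`Recipient`, `Amount`) and an optional set of grouping names to exclude; the real export will have its own headers, so treat these as placeholders.

```python
import csv
import io

def top_recipients(csv_text, n=10, exclude=(), name_col="Recipient", value_col="Amount"):
    """Return the n largest (recipient, amount) pairs from CSV text.

    exclude: names of regional or income groupings to leave out, so that
    only individual countries are ranked.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row[name_col].strip()
        if name in exclude:
            continue
        try:
            amount = float(row[value_col])
        except (TypeError, ValueError):
            continue  # skip blanks or placeholder markers such as '..'
        rows.append((name, amount))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

# Invented sample standing in for the exported file
SAMPLE = """Recipient,Amount
Country A,120.5
Country B,300
Region Total,900
Country C,..
"""
top = top_recipients(SAMPLE, n=2, exclude={"Region Total"})
```

With a real export saved to disk, the file contents can be read into a string and passed in the same way.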
Poking around, it looks like the data’s also there to allow us to look at disbursements (or perhaps just allocations) by donor country and sector into a particular country? Maybe?! This would then let us see how aid was being allocated from the UK to the top 10 recipients, broken down by sector, which might be more illuminating? I also wonder if there are any relationships between aid paid by donors into a particular sector, and imports into the recipient country from the donor country within the same sectors? For this, we need trade data breakdowns. (We can get total flows between countries (I think?!) but I’m not sure how to find the data broken down by sector?)
The stats.oecd.org site does let us sort, but I couldn’t find an easy or clean way to limit results to countries, and exclude groupings:
The ordering of aid disbursements from the UK in 2012 matches the rank order given in the response to my MP’s question.
For the GDP and GDP per capita data, we can go to the World Bank:
Note a couple of things – units tend to be given in US dollars rather than Sterling; there are all sorts of US dollars… (see for example Accounting for Inflation – Deflators, or “What Does ‘Prices in Real Terms’ Actually Mean?”).
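To make the "all sorts of US dollars" point a little more concrete, the usual way of moving between current (nominal) prices and constant (real) prices is to scale by the ratio of a price index, or deflator, between the two years: real = nominal × deflator_base / deflator_year. A tiny sketch, with made-up index values and a function name of my own:

```python
def to_real_terms(nominal, deflator_year, deflator_base):
    """Re-express a nominal amount in the prices of the base year.

    deflator_year: price index for the year the amount relates to
    deflator_base: price index for the year whose prices we want to use
    """
    return nominal * deflator_base / deflator_year

# Made-up numbers: 110 units spent in a year when the index stood at 110,
# expressed in the prices of a base year when the index stood at 100
real_value = to_real_terms(110, 110, 100)
```

Which deflator series is appropriate depends on what is being compared, so the choice of index is itself an assumption worth stating alongside any figures.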
Hmm… maybe it would have been easier to find the data on the DFID site instead…
PS Indeed it was – Statistics on International Development 2013 – Tables has a link to a dataset that contains the league table: “Table 4: Top Twenty Recipients UK Net Bilateral ODA 2010 – 2012”.
The front page of this week’s Isle of Wight County Press describes a tragic incident relating to a particular care home on the Island earlier this year:
(Unfortunately, the story doesn’t seem to appear on the County Press’ website? Must be part of a “divide the news into print and online, and never the twain shall meet” strategy?!)
As I recently started pottering around the CQC website and various datasets they publish, I thought I’d jot down a few notes about what I could find. The clues from the IWCP article were the name of the care home – Waxham House, High Park Road, Ryde – and the proprietor – Sanjay Ramdany.
Using the “CQC Care Directory – With Filters” from the CQC data and information page, I found a couple of homes registered to that provider.
1-120578256, 19/01/2011, Waxham House, 1 High Park Road, Ryde, Isle of Wight, PO33 1BP
1-120578313, 19/01/2011, Cornelia Heights, 93 George Street, Ryde, Isle of Wight, PO33 2JE
1-101701588, Mr Sanjay Prakashsingh Ramdany & Mrs Sandhya Kumari Ramdany
Looking up “Waxham House” on the CQC website gives us a copy of the latest report outcome:
Looking at the breadcrumb navigation, it seems we can directly get a list of other homes operated by the same proprietors:
I wonder if we can search the site by proprietor name too?
Looks like it…
So how did their other home fare?
By the by, according to the Food Standards Agency, how’s the food?
And how much money is the local council paying these homes?
[Click through on the image to see the app - hit Search to remove the error message and load the data!]
Why the refunds?
A check on OpenCorporates for director names turned up nothing.
I’m not trying to offer any story here about the actual case reported by the County Press, more a partial story about how we can start to look for data around a story to see if there may be more to the story we can find from open data sources.