Archive for the ‘Open Data’ Category
I haven’t done a round up of open data news for a bit, so here’s a quick skim through some of my current open browser tabs on the subject.
First up, the rather concerning news that DCLG [are] to withhold funding from district over transparency “failings”. As well as noting a failure to publish information about a termination of employment pay settlement, the letter to Rother District Council observed that it “appears from your website that your council has not published data in a number of important areas, for example, contracts over £5,000, land and assets, senior salaries, an organisation chart, trade union facility time, parking revenues, grants to the voluntary sector and the like”, in contravention of the Local Government Transparency Code 2014.
From running several open data training days for representatives of local councils and government departments on behalf of Open Knowledge recently, I know from looking at council websites that finding “transparency” data on official websites is not always that simple. (The “transparency” keyword sometimes(?!) works, “open data” frequently doesn’t…; often using a web search engine with a site: search limit on the council website works better than the local search.) I’m still largely at a loss as to what can usefully be done with things like spending data, though I do have a couple of ideas about how we might feed some of that data into some small investigations…
However, I’m not convinced that punishing councils by withholding funding is the best approach to promoting open data publication. On the other hand, promoting effective data workflows that naturally support open data publishing (and hopefully, from that, “an increase in reuse within the organisation as data availability/awareness/accessibility improves”) and encouraging effective transparency through explaining how decisions were made in the context of available data (whilst at the same time making data available so that the data basis of those decisions can be checked) would both seem to be more useful approaches?
The funding being withheld from Rother Council seems to be the new burdens funding, which presumably helps cover the costs of publishing transparency data. Something that’s bugged me over the years (eg Some Sketchnotes on a Few of My Concerns About #opendata) is how privatisation of contracts is associated with several asymmetries in the public vs private provision of services. On the one hand, public bodies have transparency and freedom of information “burdens” placed on them, which means: 1) they take a financial hit, needing to cover the costs of meeting those burdens; 2) they are accountable, in that the public has access to certain sorts of information about their activities. Private contractors are not subject to the same terms, so not only can they competitively bid less than public bodies for service delivery, they also get to avoid the same public disclosure requirements about their activities and are potentially less accountable than their public body counterparts, who remain overall accountable as commissioners of the services but presumably have to cover the costs of that accountability, as well as the administrative overheads of managing the private contracts.
Now it seems that the ICO is call[ing] for greater transparency around government outsourcing, publishing a report and a roadmap on the subject that recommend a “transparency by design” approach in which councils should:
– Make arrangements to publish as much information as possible, including the contract and regular performance information, in open formats with a licence permitting re-use.
– When drawing up the contract, think about any types of information that the contractor will hold on their behalf eg information that a public authority would reasonably need to see to monitor performance. Describe this in an annex to the contract. This is itself potentially in scope of a FOIA request.
– Set out in the contract the responsibilities of both parties when dealing with FOIA requests. Look at standard contract terms (eg the Model Services Contract) for guidance.
Around about the same time, a Cabinet Office policy document on Transparency of suppliers and government to the public also appeared. Whilst on the one hand “Strategic Suppliers to Government will supply data on a contract basis that is then aggregated to the departmental level and aggregated again to the Government level” means it should be easier to see how much particular companies receive from government (assuming they don’t operate as BigCo Red Ltd, BigCo Blue Ltd, BigCo Purple Ltd, using a different company for each contract) it does mean that data can presumably also be aggregated to a point of meaninglessness.
(Just by the by, I tried looking through various NHS Commissioning Group and NHS Trust spending datasets looking to see how much was going to Virgin Care and other private providers. Whilst I could see from news reports and corporate websites that those operators were providing services in particular areas, I couldn’t find any spend items associated with them. Presumably I was looking in the wrong place… but if so, it suggests that even if you do have a question about spend in a particular context with a particular provider, it doesn’t necessarily follow that even if you think you know how to drive this open transparency data stuff, you’ll get anywhere…)
When looking at affordability of contracts, and retaining private vs public contractors, it would seem only fair that any additional costs associated with the contracting body having to meet transparency requirements “on behalf of” the private body should be considered part of the cost of the contract. If private bodies complain that this gives an unfair advantage to public bodies competing for service provision, they can perhaps opt-in to FOI regulations and transparency codes and cover the costs of disclosure of information themselves to level the playing field that way?
Another by the by… in appointments land, Mike Bracken has been appointed the UK’s first Chief Data Officer (CDO), suggesting that we should talk about “data as a public asset”. In this regard, the National Information Infrastructure still appears to be a thing, for the moment at least. An implementation document was published in March that has some words in it (sic…!)…
As purdah approached, there was a sudden flurry of posts on the data.gov.uk blog. Four challenges for the future of Open Data identified the following as major issues:
- Pushing Open Data where it is not fully embraced
- Achieving genuine (Open) Data by default; (this actually seems to be more about encouraging open data workflows/open data (re)use – “a general move to adopt data practice into the way public services are run”)
- Improving public confidence in Open Data
- Improving (infra)structure around Open Data
The question is – how to best address them? I think that Open Knowledge has delivered all the open data training sessions it was due to deliver under the open data voucher scheme, which means my occasional encounters with folk tasked with open data delivery from councils and government departments may have come to an end via that route; which is a shame, because I felt we never really got a chance to start building on those introductory sessions…
The Cabinet Office also made a state of the nation announcement to finish off the parliamentary session by announcing the Local authorities setting standards as Open Data Champions. A quick skim down the list seems to suggest that the champions are typically councils that have started their own datastore…
In passing, I noticed on the Local Government Association (LGA) website a validator for checking the format of documents used to publish local council spending data, as well as various other data releases (contracts, planning applications, toilet locations, land holdings etc): LGA OpenData Schema Validator.
I wonder how many councils are publishing new releases that actually validate, and how many have “back-published” historical data releases using a format that validates?! When officers publish data files, I wonder how many of them even try to download and open the data files they have just published (to check the links work, the documents open as advertised, and also appear to contain what’s expected), let alone run either the uploaded or downloaded files through the validator (it often makes sense to do both: check the file validates before you publish it, then download it and check the downloaded version, just in case the publishing process has somehow mangled the file…)
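The "download what you just published and check it" step could be sketched roughly as below. This is only an illustration using the Python standard library: the function names are made up for the example, the URL is whatever link the council page advertises, and it assumes the file has already been run through the validator separately.

```python
import hashlib
from urllib.request import urlopen


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def published_copy_matches(local_bytes: bytes, downloaded_bytes: bytes) -> bool:
    """True if the downloaded file is byte-for-byte what was uploaded."""
    return sha256_of_bytes(local_bytes) == sha256_of_bytes(downloaded_bytes)


def check_published_file(local_path: str, url: str) -> bool:
    """Fetch the published file and compare it with the local original.

    `url` is a hypothetical download link; fetching it also shows that
    the link actually resolves.
    """
    with open(local_path, "rb") as f:
        local_bytes = f.read()
    with urlopen(url) as response:
        downloaded_bytes = response.read()
    return published_copy_matches(local_bytes, downloaded_bytes)
```

If the two digests differ (changed line endings or a re-encoded file would be enough), the downloaded copy is the one worth running through the validator again.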
Guidance for the spending data releases can be found here: Local government open data schemas: Spend
Documentation regarding the release of procurement and spending information (v. 1.1 dated 14/12/2014) can be found here: Local transparency guidance – publishing spending and procurement information.
I’ve still no real idea how to make interesting use of this data, or how DCLG expect folk to make use of it?!;-)
Last night I saw a mention of a budget review consultation being held by the Milton Keynes Council. I’ve idly wondered before about whether spending data could be used to inform these consultations, for example by roleplaying what the effects of a cut to a particular spending area might be at a transactional level. (For what it’s worth, I’ve bundled together the Milton Keynes spending data into a single (but uncleaned) CSV file here and posted the first couple of lines of a data conversation with it here. One of the things I realised is that I still don’t know how to visualise data by financial year, so I guess I need to spend some time looking at pandas timeseries support).
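On the financial year question, one possible approach is to label each payment date with its financial year and then total by label. The sketch below assumes the public sector financial year running from 1 April to 31 March, and assumes payments arrive as (date, amount) pairs; the function names are invented for the example and it uses plain Python rather than pandas.

```python
from collections import defaultdict
from datetime import date


def financial_year(d: date) -> str:
    """Label a date with its financial year, assumed to run 1 April to 31 March."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def total_by_financial_year(payments):
    """Sum (date, amount) pairs by financial year label."""
    totals = defaultdict(float)
    for paid_on, amount in payments:
        totals[financial_year(paid_on)] += amount
    return dict(totals)
```

In pandas, the same labelling function could presumably be mapped over a parsed date column and the result used as a grouping key.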
Another transparency/spending data story that caught my eye over the break was news of how Keighley Town Council had been chastised for its behaviour around various transparency issues (see for example the Audit Commission Report in the public interest on Keighley Town Council). Among other things, it seems that the council had “entered into a number of transactions with family members of Councillors and employees” (which makes me think that an earlier experiment I dabbled with that tried to reconcile councillor names with: a) directors of companies in general; b) directors of companies that trade with a council may be a useful tool to work up a bit further). They had also been lax in ensuring “appropriate arrangements were in place to deal with non-routine transactions such as the recovery of overpayments made to consultants”. I’ve noted before that where a council publishes all its spending data, not just amounts over £500, including negative payments, there may be interesting things to learn (eg Negative Payments in Local Spending Data).
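A crude version of the councillor/director reconciliation idea might look something like the following. It is a sketch only: the list of titles, the matching rule (same set of name parts once normalised) and the input shapes are all assumptions, and any hit is just a prompt for checking by hand, since shared names are common.

```python
import re


def normalise_name(name: str) -> str:
    """Lower-case a personal name, drop titles and punctuation, sort the parts."""
    titles = {"cllr", "councillor", "mr", "mrs", "ms", "miss", "dr"}
    parts = re.sub(r"[^a-z\s]", " ", name.lower()).split()
    return " ".join(sorted(p for p in parts if p not in titles))


def possible_matches(councillors, directors_by_company):
    """Return sorted (councillor, company) pairs whose normalised names coincide.

    councillors: list of names as published by the council
    directors_by_company: dict of company name -> list of director names
    """
    lookup = {normalise_name(c): c for c in councillors}
    hits = []
    for company, directors in directors_by_company.items():
        for director in directors:
            key = normalise_name(director)
            if key in lookup:
                hits.append((lookup[key], company))
    return sorted(hits)
```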
It seems that the Audit Commission report was conducted in response to a request from a local campaigner (Keighley investigation: How a grandmother blew whistle on town council [Yorkshire Post, 20/12/14]). As you do, I wondered whether the spending data might have sent up any useful signals about any of the affairs the auditors – and local campaigners – took issue with. The Keighley Town Council website doesn’t make it obvious where the spending data can be found – the path you need to follow is Committees, then Finance and Audit, then Schedule of payments over £500 – and even then I can’t seem to find any data for the current financial year.
The data itself is published using an old Microsoft Office .doc format:
Getting the data, such as it is, into a canonical form is complicated by the crappy document format, though it’s not hard to imagine how such a thing came to be generated (council clerk sat using an old Pentium powered desktop and Windows 95, etc etc ;-). Thanks to a tip off from Alex Dutton, unoconv can convert the docs into a more usable format (apt-get update ; apt-get install -y libreoffice ; apt-get install -y unoconv); so for example, unoconv -f html 2014_04.doc converts the specified .doc file to an HTML document. (I also had a look at getting convertit, an http serverised version of unoconv, working in a docker container, but it wouldn’t build properly for me? Hopefully a tested version will appear on dockerhub at some point…:-)
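To run the same conversion over a folder of monthly .doc files, the unoconv call quoted above could be wrapped along these lines. The function names are hypothetical, and the sketch assumes unoconv is installed and on the path.

```python
import subprocess
from pathlib import Path


def unoconv_command(doc_path: str, fmt: str = "html") -> list:
    """Build the same call as `unoconv -f html 2014_04.doc`."""
    return ["unoconv", "-f", fmt, doc_path]


def convert_all(folder: str, fmt: str = "html") -> None:
    """Convert every .doc file in a folder; requires unoconv to be installed."""
    for doc in sorted(Path(folder).glob("*.doc")):
        # check=True raises an error if unoconv fails on a file
        subprocess.run(unoconv_command(str(doc), fmt), check=True)
```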
This data still requires scraping of course… but I’m bored already…
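For what it’s worth, a first pass at the scraping step might be as simple as pulling the cell text out of any HTML tables. The sketch below uses only the standard library and assumes, without having checked the converted files, that the payments end up in ordinary table rows; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html: str):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The rows would still need cleaning (dates, amounts, header rows repeated on each page and so on) before they could be treated as data.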
PS I’m wondering if it would be useful to skim through some of the Audit Commission’s public interest reports to fish for ideas about interesting things to look for in the spending data?
In Holding Companies to Account – Open Data Consolidation, I noted a couple of different ways in which we could use opendata to consolidate something of what we know about companies that provide services to or on behalf of public bodies, or otherwise receive monies from public services:
1) structural consolidation, in the sense of identifying companies that are part of the same corporate group;
2) financial consolidation, in the sense of identifying spend made to the same company from across different public bodies, and/or spend to different companies from the same corporate group from one or more public bodies.
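Given a lookup from company name to corporate group (the output of the structural step), the financial step reduces to a grouped sum. A minimal sketch, with an assumed input shape of (public body, company, amount) tuples and invented function and variable names:

```python
from collections import defaultdict


def consolidate_spend(payments, group_of):
    """Total spend per corporate group, and per (group, public body).

    payments: iterable of (public_body, company, amount)
    group_of: dict mapping company name -> corporate group name
    Companies missing from group_of are treated as their own group.
    """
    by_group = defaultdict(float)
    by_group_and_body = defaultdict(float)
    for body, company, amount in payments:
        group = group_of.get(company, company)
        by_group[group] += amount
        by_group_and_body[(group, body)] += amount
    return dict(by_group), dict(by_group_and_body)
```

The hard part, of course, is building and maintaining the company-to-group lookup, not the arithmetic.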
In respect of the second notion, see also Open Data, Transparency, Fan-In and Fan-Out which describes how we can also start to consolidate connections and payments made between public bodies (also Public Sector Transparency – Do We Need Open Receipts Data as Well as Open Spending Data?).
I’ve previously doodled thoughts on whether there is a need for companies receiving public money to disclose those receipts (eg Spending & Receipts Transparency as a Consequence of Accepting Public Money?) – but whilst they may have no obligation to do so, the availability of open transactions data (and increasingly, open contracts data (eg The Local Government (Transparency) (Descriptions of Information) (England) Order 2014, h/t @owenboswarva)) means that we can start to aggregate and publish this information, on their behalf, as part of a corporate watch activity:-)
So here’s a what if… What if there was a way we could set up “open public data reflector” sites that would aggregate data about a particular company or corporate group and reflect it back? As a start, we could simply flip requirements put onto public bodies (eg publication of spend over £25,000 for large departments or services) to complementary views on the private corporate side: publication of all receipts over £25,000 from large public bodies, and publication of summed receipts over £25,000 from local councils (who have a lower spending amount disclosure threshold). Of course, in the latter case, we’d need to aggregate the smaller amounts in order to calculate the sums. By aggregating contract information, (additional) spend against contracts could also be tracked.
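The "summed receipts" view could be sketched as follows: add up the smaller council payments per company and per paying body, then only reflect back the totals that pass the threshold. The £25,000 default is the figure quoted above; the input shape, the function name and the choice to sum over the whole dataset (rather than per month or per year) are assumptions made for the example.

```python
from collections import defaultdict

THRESHOLD = 25_000  # the £25,000 figure quoted above


def reflected_receipts(payments, threshold=THRESHOLD):
    """Return {(company, public_body): total} for totals over the threshold.

    payments: iterable of (public_body, company, amount), typically many
    small items that individually fall below the threshold.
    """
    totals = defaultdict(float)
    for body, company, amount in payments:
        totals[(company, body)] += amount
    return {key: total for key, total in totals.items() if total > threshold}
```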
In this respect, I could imagine someone like SpendNetwork setting up a white label site that would allow civil society activists to fire up a ‘corporate watch’ website that reflects back open public data that refers to a particular company (something a little more sophisticated than their current raw data listings). If they made their data a little easier to access, I may be tempted to play with it…
Alongside the open public data reflector, it might be useful to have a “what do they know about you?” reflector that describes the sort of information the company holds about you that could be accessed via a Data Protection Act subject access request. (I’m not sure how we could find that out? Get several people to put requests in, extract the field names/metadata elements and publish those?) Thinks: wouldn’t it be nice if there was a request that could be made of data-controllers that forced them to disclose the fields and descriptive metadata for any data that they would inspect when putting together a subject access request response?!;-) A “meta” subject access request, in other words?
PS Examples of outputs relating to aggregated spend with particular companies:
- Centre for Entrepreneurs/Spend Network report: Spend Small – Local Authority Spend Index.
If you know of any more, please let me have a link via the comments.
According to Wikipedia:
In business, consolidation or amalgamation is the merger and acquisition of many smaller companies into much larger ones. In the context of financial accounting, consolidation refers to the aggregation of financial statements of a group company as consolidated financial statements.
I’ve been pondering the use of open data for holding companies to account again (see also here and here, for example) and a couple of ways forward seem to be crystallising out for me, at least in the way(s) I’ve been hacking some data sketches around. These ways loosely map on to the two senses of consolidation described above, I think?
In the first case, using open data sources to map out corporate groupings, or look at how companies start to consolidate into corporate groupings. The OpenCorporates folk are looking at doing this properly – based on share ownership of one company by another – but I’m looking for other signals and sources of data that allow us to associate company names within a wider corporate sprawl. For example, CQC data lists all the locations inspected by that body, along with the group or brand name (if any) under which a particular location operates. We can then use this information to identify all the locations associated with a particular brand or group.
Whilst doing this in the context of sponsoring organisations for school academies, it struck me that once several independent locations have been established or aggregated together as part of a group, if those groups are driven by “growth” strategies, we will presumably start to see merger and acquisition behaviours? [See also other possible courses of action that larger groupings may take: School Chain Locks Out Public Service Values?.] By using open data sources, we may be able to track the first – and then possibly second – phase of this sort of consolidating activity?
In the second case, part of the rationale for identifying corporate groupings is so that we can start to consolidate information about payments made to, and quality or evaluation reports relating to, the members of a particular group. That is, we can start to think about a form of consolidated accounting. For example, we can start to total up all the payments made by the public sector (across both national and local government) to a particular corporate grouping, possibly across several spending areas; or we can look at the quality reports relating to different contracts raised by a particular corporate group as a whole and make a judgement about the service levels delivered by that operator in general. This consolidated quality and/or financial reporting also provides us with a way of looking at the gross behaviour of a company grouping, and comparing it, in accountability terms, with national public services, for example.
I’ve long since been confused about what open data may or may not be good for in accountability or transparency terms, but now I feel as if it’s starting to make sense to me: as a way of shining some light onto the behaviour of private companies operating in the public sector, and also as a way of demonstrating just how much public money is sunk into some of them compared to funding made available to public bodies, for example.
If we could also get tax positions of companies and corporate groups more clearly illuminated as accessible data sources, along with information about their employment and payment practices (so we could, for example, run models on the extent to which the state is also likely to subsidise these companies’ operations through tax, housing and welfare benefits/payments made to their employees compared to those made to public sector employees), we could start to get a better idea about the way public money is actually being spent system wide?
I’ve been in a ranty mood all day today, so to finish it off, here are some thoughts about how we can start to use #opendata to hold companies to account. The trigger was finding a dataset released by the Care Quality Commission (CQC) listing the locations of premises registered with the CQC, and the operating companies of those locations (early observations on that data here).
The information is useful because it provides a way of generating aggregated lists of companies that are part of the same corporate group (for example, locations operated by Virgin Care companies, or companies operated by Care UK). When we have these aggregation lists, it means we can start to run the numbers across all the companies in a corporate group, and get some data back about how the companies that are part of a group are operating in general. The aggregated lists thus provide a basis for looking at the gross behaviour of a particular company. We can then start to run league tables against these companies (folk love league tables, right? At least, they do when it comes to public sector bashing). So we can start to see how the corporate groupings compare against each other, and perhaps also against public providers. Of course, there is a chance that the private groups will be shown to be performing better than public sector bodies, but that could be a useful basis for a productive conversation about why…
So what sorts of aggregate lists can we start to construct? The CQC data allows us to get lists of locations associated with various sorts of care delivery (care home, GP services, dentistry, more specialist services) and identify locations that are part of the same corporate group. For example, I notice that filtering the CQC data to care homes, the following are significant operators (the number relates to the number of locations they operate):
- Voyage 1 Limited: 273
- HC-One Limited: 169
- Barchester Healthcare Homes Limited: 168
When it comes to “brands”, we have the following multiple operators:
- BRAND Four Seasons Group: 346
- BRAND Voyage: 279
- BRAND BUPA Group: 246
- BRAND Priory Group: 183
- BRAND HC-One Limited: 169
- BRAND Barchester Healthcare: 168
- BRAND Care UK: 130
- BRAND Caretech Community Services: 118
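Counts like the ones listed above can be produced by tallying the brand column of the locations file. The sketch below is illustrative only: the column name "Brand" is a guess rather than the actual CQC header, and the helper names are invented.

```python
import csv
from collections import Counter
from io import StringIO


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(StringIO(csv_text)))


def locations_per_brand(rows, brand_column="Brand"):
    """Count locations per brand, most common first; blank brands are skipped."""
    counts = Counter()
    for row in rows:
        brand = (row.get(brand_column) or "").strip()
        if brand:
            counts[brand] += 1
    return counts.most_common()
```

Filtering the rows to a particular service type (care homes, say) before counting would give the per-sector view.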
For these operators, we could start to scrape their most recent CQC reports and build up a picture of how well the group as a whole is operating. In the same way that “armchair auditors” (whatever they are?!) are supposed to be able to hold local councils to account, perhaps they can do the same for companies, and give the directors a helping hand… (I would love to see open data activists buying a share and going along to a company shareholder meeting to give some opendata powered grief ;-)
Other public quality data sites provide us with hints at ways of generating additional aggregations. For example, from the Food Standards Agency, we can search on ‘McDonalds’ as a restaurant to bootstrap a search into premises operated by that company (although we’d probably also need to add in searches across takeaways, and perhaps also look for things like ‘McDonalds Ltd’ to catch more of them?).
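One way of catching the name variants is to normalise business names before comparing them. A rough sketch follows; the suffix list and the "starts with the brand" rule are assumptions chosen for illustration, and a prefix match like this will inevitably produce some false positives.

```python
import re

SUFFIXES = {"ltd", "limited", "plc", "llp"}


def normalise_business_name(name: str) -> str:
    """Lower-case, drop apostrophes and punctuation, and strip company suffixes."""
    cleaned = re.sub(r"[’']", "", name.lower())
    words = re.sub(r"[^a-z0-9]+", " ", cleaned).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def matches_brand(name: str, brand: str) -> bool:
    """True if the normalised premises name starts with the normalised brand."""
    return normalise_business_name(name).startswith(normalise_business_name(brand))
```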
Note – the CQC data provides a possible steer here for how other data sets might be usefully extended in terms of the data they make available. For example, having a field for “operating company” or “brand” would make for more effective searches across branded or operated food establishments. Having company number (for limited companies and LLPs etc) provided would also be useful for disambiguation purposes.
Hmm, I wonder – would it make sense to start to identify the information that makes registers useful, and that we should start to keep tabs on? We could then perhaps start lobbying for companies to provide that data, and check that such data is being and continues to be collected? It may not be a register of beneficial ownership, but it would provide handy cribs for trying to establish what companies are part of a corporate grouping…
(By the by, picking up on Owen Boswarva’s post The UK National Information Infrastructure: It’s time for the private sector to release some open data too, these registers provide a proxy for the companies releasing certain sorts of data. For example, we can search for ‘Tesco’ as a supermarket on the FSA site. Of course, if companies were also obliged to publish information about their outlets as open data – something you could argue that as a public company they should be required to do, trading their limited liability for open information about where they might exert that right – we could start to run cross-checks (which is the sort of thing real auditors do, right?) and publish complete records of publicly accountable performance in terms of regulated quality inspections.)
The CQC and Food Standards Agency both operate quality inspection registers, so what other registers might we go to to build up a picture of how companies – particularly large corporate groupings – behave?
The Environment Agency publish several registers, including one detailing enforcement actions, which might be interesting to track, though I’m not sure how the data is licensed? The HSE (Health & Safety Executive) publish various notices by industry sector and subsector, but again, I’m not too clear on the licensing? The Chief Fire Officers Association (CFOA) publish a couple of enforcement registers which look as if they cover some of the same categories as the CQC data – though how easy it would be to reconcile the two registers, I don’t know (and again, I don’t know how the register is actually licensed). One thing to bear in mind is that where registers contain personally identifiable information, building any aggregations that incorporate such data (if we are licensed to build such things) means (I think) that we become data controllers for the purposes of the Data Protection Act (we are not the maintainers and publishers of the public register so we don’t benefit from the exemptions associated with that role).
Looking at the above, I’m starting to think it could be a really interesting exercise to pick some of the care home provider groups and have a go at aggregating any applicable quality scores and enforcement notices from the CQC, FSA, HSE and CFOA (and even the EA if any of their notices apply! Hmm… does any HSCIC data cover care homes at all too?) Coupled with this, a trawl of directors data to see how the separate companies in a group connect by virtue of directors (and what other companies may be indicated by common directors in a group?).
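The directors trawl amounts to inverting a company-to-directors lookup and seeing which companies share a name. A small sketch, assuming the directors data has already been pulled into a dict of company name to director names (all names in any example data being hypothetical):

```python
from collections import defaultdict
from itertools import combinations


def shared_directors(directors_by_company):
    """Map each pair of companies to the set of directors they have in common."""
    companies_by_director = defaultdict(set)
    for company, directors in directors_by_company.items():
        for director in directors:
            companies_by_director[director].add(company)
    links = defaultdict(set)
    for director, companies in companies_by_director.items():
        # every pair of companies this director sits on gets a link
        for pair in combinations(sorted(companies), 2):
            links[pair].add(director)
    return dict(links)
```

Companies that turn up linked to group members but are not themselves in the group list are the ones that might hint at a wider corporate sprawl.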
Other areas perhaps worth exploring – farms incorporated into agricultural groups? (Where would we find that data? One register that could be used to partially hold those locations to account may be the public register of pesticide enforcement notices as well as other EA notices?)
As well as registers, are there any other sources of information about companies we can add in to the mix? There’s lots: for limited companies we can pull down company registration details and lists of directors (and perhaps struck off directors) and some accounting information. Data about charities should be available from the Charity Commission. The HSCIC produces care quality indicators for a range of health providers, as well as prescribing data for individual GP practices. Data is also available about some of the medical trials that particular practices are involved in.
At a local council level, local councils maintain and publish a wide variety of registers, including registers of gaming machine licenses, licensed premises and so on. Where the premises are an outlet of a parent corporate group, we may be able to pick up the name of the parent group as the licensee. (Via @OwenBoswarva, it seems the Gambling Commission has a central list of operating license holders and licensed premises.)
Having identified influential corporate players, we might then look to see whether those same bodies are represented on lobbyist groups, such as the EU register of commission expert groups, or as benefactors of UK Parliamentary All Party groups, or as parties to meetings with Ministers etc.
We can also look across all those companies to see how much money the corporate groups are sinking from the public sector, by inspecting who payments are made to in the masses of transparency spending data that councils, government departments, and services such as the NHS publish. (For an example of this, see Spend Small Local Authority Spending Index; unfortunately, the bulk data you need to run this sort of analysis yourself is not openly available – you need to aggregate and clean it yourself.)
Once we start to get data that lists companies that are part of a group, we can start to aggregate open public data about all the companies in the group and look for patterns of behaviour within the groups, as well as across them. Lapses in one part of the group might suggest a weakness in high level management (useful for the financial analysts?), or act as a red flag for inspection and quality regimes.
Hmmm… methinks it’s time to start putting some of this open data to work; but put it to work by focussing on companies, rather than public bodies…
I think I also need to do a little bit of digging around how public registers are licensed? Should they all be licensed OGL by default? And what guidance, if any, is there around how we can make use of such data and not breach the Data Protection Act?
PS via @RDBinns, What do they know about me? Open data on how organisations use personal data, describing some of the things we can find from the data protection notifications published by the ICO [ICO data controller register].
– the public paid for it so public has a right to it: the public presumably paid for it through their taxes. If companies that use open public data don’t fully and fairly participate in the tax regime of the country that produced the data, then they didn’t pay their fair share for access to it.
– data quality will improve: with open license conditions that allow users to take open (public) data and do what they want with it without the requirement to make derived data available in a bulk form under an open data license, how does the closed bit of the feedback loop work? I’ve looked at a lot of open public data releases on council and government websites and seen some companies making use of that data in presumably a cleaned form (if it hasn’t been cleaned, then they’re working with a lot of noise…) But if they have cleaned and normalised the data, have they provided this back in an open form to the public body that gifted them access to it? Is there an open data quality improvement cycle working there? Erm… no… I suspect if anything, the open data users would try to sell the improved quality data back to the publisher. This may be their sole business model, or it may be a spin-off as a result of using the (cleaned and normalised) data for some other commercial purpose.