When Open Public Data Isn’t…?
This year was always going to be an exciting year for open data. The launch of data.gov.uk towards the end of last year, along with commitments from both sides of the political divide before the election that are continuing to be acted upon now, means data is starting to be opened up – scruffily at first, but that’s okay – and commercial enterprises are maybe starting to get interested too…
…which was always the plan…
…but how is it starting to play out?
The story so far…
A couple of weeks ago, the first meeting of the Public Sector Transparency Board was convened, which discussed – and opened up for further discussion – a set of draft public data principles. (Papers relating to the meeting can be found here.)
In a letter to the responsible Minister prior to the meeting (commentable extracts here), Professor Nigel Shadbolt suggested that:
4. … The economic analysis, and the views we regularly hear from the business community themselves, are unequivocal: data must be released for free re-use so that the private sector can add new value and develop innovative new business services from government information. …
8. Transparency principles need to be extended to those who operate public services on a franchised, regulated or subsidised basis. If the state is controlling a service to the public or is franchising or regulating its delivery the data about that activity should be treated as public data and made available. …
11. We need to support the development of licences and supporting policies to ensure that data released by all public bodies can be freely re-used and is interoperable with the internationally recognised Creative Commons model. …
12. A key Government objective is to realise significant economic benefits by enabling businesses and non-profit organisations to build innovative applications and websites using public data. …
The business imperative is further reinforced by the second of three reasons given by the Open Government Data tracking project in Why Open Government Data?:
Releasing social and commercial value. In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data much of which is created or held by government. By opening up data, government can help drive the creation of innovative business and services that deliver social and commercial value.
So how has business been getting involved? As several local councils start to pick up a request, contained in a letter from the Prime Minister published at the end of May, that they open up their financial data, Chris Taggart/@countculture, developer of OpenlyLocal, posted a piece, The open spending data that isn’t… this is not good, in which he described how apparently privileged access to financial data from several councils was being used to drive Spikes Cavell’s SpotlightOnSpend website (for a related open equivalent, see Adrian Short’s Armchair Auditor). Downstream use of the data was hampered by a “personal use only” licence, and by a CAPTCHA that requires a human in the loop in order to access the data. The Public Sector Transparency Board promptly responded to Chris’ post (Work on Local Spending Data), quoting the principle that:
“Public data will be released under the same open licence which enables free reuse, including commercial reuse – all data should be under the same easy to understand licence. Data released under the Freedom of Information Act or the new Right to Data should be automatically released under that licence.”
and further commenting: “We have already reminded those involved of this principle and the existing availability of the ‘data.gov.uk’ licence which meets its criteria, and we understand that urgent measures are already taking place to rectify the problems identified by Chris.”
Spikes Cavell chief executive Luke Spikes responded via an interview with Information Age (SpotlightOnSpend reacts to open criticism):
[SpotlightOnSpend] is first and foremost a spend analysis software and consultancy supplier, and that it publishes data through SpotlightOnSpend as a free, optional and supplementary service for its local government customers. The hope is that this might help the company to win business, he explains, but it is not a money-spinner in itself.
“The contribution we’re making to transparency is less about what the purists would like to see, it’s simply putting the information out there in a form that is useful for the audience for which it is intended [i.e. citizens and small businesses]” he said. “But there are a few things we haven’t done right, and we’ll fix that.”
Following the criticism, Cavell says that SpotlightOnSpend will make the data available for download in its raw form. “That’s what we thought was the most sensible solution to overcoming this obstacle,” he told me.
Adrian Short, developer of the open Armchair Auditor, then picked up the baton in a comment to the Information Age article:
There is room for Spikes Cavell to develop their applications and I doubt that anyone has any objection to them offering their services to councils commercially just like thousands of other businesses. But they do not have a monopoly of ideas, talent and resources to build great applications with public spending data. Nor does anyone else.
The concerns that @CountCulture raised were not that Spikes Cavell were trading with councils or trying to attract their business but that they are doing so in a way that precludes anyone else developing applications with this data. By legally and technically locking the data into the Spotlight on Spend platform, everyone else is excluded.
It’s understandable that most councils have no understanding of the culture, legalities or technicalities of open data. This is new territory for nearly all of them. Those councils that have put their data straight onto Spotlight on Spend, bypassing the part where it is made genuinely open — cannot be criticised for not complying with what to them must be a very unusual requirement. But that’s why @CountCulture and I and others want to be very clear that the end result of this process is having effective scrutiny of council finances through multiple websites and applications, not just Spotlight on Spend or any other single website or application. The way we get there is with open data.
And Chris Taggart’s response? (Update on the local spending data scandal… the empire strikes back):
Lest we forget, Spikes Cavell is not an agent for change here, not part of those pushing to open public data, but in fact has a business model which seems to be predicated on data being closed, and the maintenance of the existing procurement model which has served us so badly.
(For recommendations on how councils might publish financial data in an open way, see: Publishing itemised local authority expenditure – advice for comment (reiterated here: Open Government Data: Finances). The Office for National Statistics occasionally releases summary statistics (e.g. as republished in OpenlyLocal: Local spending data in OpenlyLocal, and some thoughts on standards) but at only a coarse resolution. As to how much it might cost to publish itemised data, some are claiming Cost of publishing ‘£500 and above’ council expenditure prohibitive.)
From my own perspective, I would also add that should consultants like Spikes Cavell create derived data works from open public data, there should be some transparency in declaring how the derived work was created (see for example: So Where Do the Numbers in Government Reports Come From? and Data is not Binary).
Another example of how once-open data is becoming “closed” behind a paywall comes from Paul Geraghty (“Closed Data Now” SOCITM does a “Times”):
If my memory serves me well the e-Gov Register (eGR) hosted by Brent has been showing every IT supplier sortable by product type, supplier, local government type and even on maps for about 6 or 7 years (some links below if you hurry up).
I am aware that there are problems with this data, in my own past employer I know that some of the data is out of date.
But it is there, it is useful and informative and it is OPEN to all, even SMEs like me researching on niche markets in local government.
The latest move by SOCITM (and presumably with the knowledge of the LGA and the IDeA) means all that data is going to go behind the SOCITM paywall.
And the response from Socitm, via a comment from Vicky Sargent:
First of all, I’d like to clear up some points of fact. No local authority or other public sector service provider that provides data to the Applications Register will have to pay for their subscription and for them, access to the data will be free, regardless of whether they subscribe to Socitm Insight (as 95% of local authorities do). Anyone who is employed in an organisation that is an Applications Register subscriber – f-o-c or paid, will be able to access the data.
Then there is who pays. Clearly an information service like this that adds value, has to cover the costs of development and delivery. Unlike government departments, LGA, IDeA and local councils, Socitm is not directly funded by the taxpayer, and needs to fund the services it delivers from money raised from fees, subscriptions, events and other services.
The business model we use for the Applications Register is that public bodies that contribute should not pay to use the service, but those that do not contribute pay in cash. Private sector bodies can only pay in cash.
Your article also suggests that Socitm’s support for the move towards open data is hypocritical, set against our business model for the Applications Register. I think this misunderstands the thinking behind ‘open data’, which is to get raw data out of government systems for transparency purposes, also so that it can be re-used. Socitm has been a long-term strong supporter of this.
The open data agenda explicitly acknowledges that ‘re-use’ includes adding value and selling on. If councils were to routinely publish the sort of data we will collect for the Applications Register, there would still be work to be done aggregating and manipulating and re-publishing the information to make it useful, and that is what we do, recovering our costs in the way described.
Adrian Short (can you see how it’s the same few players engaging in these debates?!;-) develops the “keep it free” argument in a further comment:
Your argument presupposes your conclusion, which is that Socitm is the best organisation to be managing/publishing the applications register. Because, as you correctly say, you don’t receive any direct funding from the taxpayer, you have to find other ways of paying for that work. Inevitably this means charging non-contributing users.
What you’re missing is that millions of pounds of public money is spent every year supporting businesses, helping to create new markets and generally oiling the parts of the economy that don’t easily oil themselves. That’s what BIS and the economic development departments of local authorities do. The public interest and private benefit aren’t easily distinguishable unless you contrive that private benefit for a small group to the exclusion of others. But as Paul rightly points out, the potential market for this information is enormous — essentially every business and individual that works for, supplies or wants to work for the public sector, from the individual IT worker to the massive global consultancies, manufacturers and software firms.
Currently it’s a small number of incumbent suppliers that benefit from this relatively inefficient market. Other businesses lose. Public sector buyers lose. The taxpayer loses.
Keeping this information free for everyone to use and enabling it to be used in future when combined with the enormous amount of data that will be released soon will be likely to produce economic benefits to the public through market efficiencies that outstrip its cost by several orders of magnitude. If Socitm can’t publish this data in the most useful, non-discriminatory way then it’s not the best organisation for the job. I can see no reason in principle or practice why it shouldn’t be fully funded by the taxpayer and free at the point of use for everyone. To do otherwise would be an extremely false economy.
(Note that “free vs. open” debates have also been played out in the open source software arena. Maybe it’s worth revisiting them…?)
The previously quoted comment from Vicky Sargent also contains what might be described as an example case study:
This brings me to Better Connected, the annual survey of council websites carried out by Socitm. You say:
Just about every council in the UK has little option but to pay SOCITM hundreds of pounds annually to join their club to find out the exact details of how their website is being ranked.
The data for Better connected only exists because Socitm has devised a methodology for evaluating websites, pays for a team of reviewers collect the data each year, and then analyses and publishes the results. No one has to subscribe, they choose to do so because the information is valuable to them.
Information about how we do the evaluation and ranking is freely available on our website, in our press releases and in our free-to-join Website Usage and Improvement community. The 2010 headline results for all councils are published on socitm.net as open data under a creative commons licence and are linked from data.gov.uk.
If the Better connected report has become a ‘must read’, that is because the investment Socitm has made in the product has led to it being a more cost-effective investment for councils than alternative sources of advice on improving their website. Many users have told us Better connected (cover price £415 for non-subscribers or free as part of the Socitm Insight subscription that starts at £670 pa for a small district council) is worth many days’ consultancy, even when that consultancy is purchased from lower cost SME providers.
As these examples show, the licence under which data is originally released can have significant consequences for its downstream use and commercialisation. The open source software community has known this for years, of course, which is why organisations like GNU have two different licences – GPL, which keeps software open by tainting other software that includes GPL libraries, and LGPL, which allows libraries to be used in closed/proprietary code. There is a good argument that by combining data from different open sources in a particular way valuable results may be created, but it should also be recognised that work may be expended doing this and a financial return may need to be generated (so maybe companies shouldn’t have to open up their aggregated datasets?). Just how we balance commercial exploitation with ongoing openness and access to raw public data is yet to be seen.
(The academic research area – which also has its own open data movement (e.g. Panton Principles) – also suggests a different sort of tension arising from the “potential value” of a data set or aggregated data set. For example, research groups analysing data in one particular way may be loath to release it to others because they want to analyse it in another, value-creating way at a later date.)
Getting the licensing right is particularly important if councils become obliged to use third-party services to publish their data. For example, the grand vision of the Public Sector Transparency Board is set out in this paragraph of Shadbolt’s letter to Maude:
13. We must promote and support the development and application of open, linked data standards for public data, including the development of appropriate skills in the public services. …
But as a recent report, again from Chris Taggart, on Publishing Local Open Data – Important Lessons from the Open Election Data project suggests, there are certain challenges associated with web-related development in local authorities, and in particular a significant lack of experience and expertise in dealing with Linked Data (which is not surprising – it is a relatively new, and so far arcane, technology). Here are the first four lessons, for example:
- There is a lack of ‘corporate’ awareness/understanding of open data issues, and this will inhibit take up of open, linked data publishing unless it is addressed
- There is a lack of even basic web skills at some councils
- Many councils lack web publishing resources, never mind the resources to implement open, linked data publishing
- The understanding of even the basics of linked data and the steps to publishing public data in this way is very, very limited
What this suggests to me is that, in the short term at least, the capability for publishing Linked Data is likely to reside in specialist third-party companies, possibly in only a few of them. As Paul Geraghty discovers from the eGovernment Register in If #localgovweb supplier says “RDF WTF?” Sack em #opendata #spending:
[I]t seems to me that of 450 or so local government organisations, 357 are listed as having a “Financials” supplier **.
There are only 18 suppliers listed, and of those there are 6 Big Ones.
Between them the 6 Big Ones supply “Financials” to 326 Councils.
Don’t you think that the first one of those 6 Big Ones who natively supports LOD [Linked Open Data] as an export option (or agrees to within, say, 8 months) really ought to be favoured when bidding for new business?
Lets go further, lets say that it should be mandated that all new contracts with “Financials” suppliers include an LOD clause.
Perhaps Mr Pickles could dispatch someone to have a chat with one or two of these suppliers, or that he should have someone check that future contracts for Financial products being sold to Local Government all contain the necessary wording to make this happen?
So instead of trying to train and cajole 450 councils to FTP assorted CSV files into localdata.gov.uk (FFS) all the way through to grokking RDF, namespaces, LOD et al – why does the government not get on and make a strategy to bully and coerce 6 suppliers instead – and potentially get 326 councils teed up to produce useful LOD a bit sharpish?
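To give a feel for the gap between “FTP a CSV file” and “export Linked Data”, here is a minimal sketch of the kind of export option a financials supplier might add. It is only an illustration under stated assumptions: the column names (id, date, supplier, amount), the example.org base URI and the vocabulary terms are all made up for the example, and a real export would want an agreed, shared vocabulary rather than invented property names. The output format is N-Triples, the simplest line-based RDF serialisation.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical URIs, invented for this sketch; not an agreed standard.
BASE = "http://example.org/spend/"
VOCAB = "http://example.org/vocab#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(text):
    """Escape backslashes, double quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_triples(council, row):
    """Turn one spending row (a dict keyed by column name) into three triples."""
    # Percent-encode the council name and payment id so they are safe in a URI.
    subject = "<" + BASE + quote(council, safe="") + "/" + quote(row["id"], safe="") + ">"
    supplier = escape_literal(row["supplier"])
    amount = escape_literal(row["amount"])
    date = escape_literal(row["date"])
    return [
        f'{subject} <{VOCAB}supplier> "{supplier}" .',
        f'{subject} <{VOCAB}amount> "{amount}"^^<{XSD}decimal> .',
        f'{subject} <{VOCAB}date> "{date}"^^<{XSD}date> .',
    ]


def csv_to_ntriples(council, csv_text):
    """Convert CSV text with id, date, supplier and amount columns to N-Triples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.extend(row_to_triples(council, row))
    return "\n".join(lines)


# Made-up sample data, purely for illustration.
SAMPLE = "id,date,supplier,amount\n1001,2010-06-01,Acme Supplies Ltd,742.50\n"

if __name__ == "__main__":
    print(csv_to_ntriples("examplecouncil", SAMPLE))
```

The mechanical conversion is the easy part. The harder work, and the part that arguably only needs doing once per supplier rather than once per council, is agreeing the identifiers and vocabulary so that one council’s payments can be linked to another’s.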
Another technology option is for councils to publish their own linked data to a commercially hosted datastore. At the moment, the two companies I know of that offer “datastore” services for publishing Linked Data, at scale, are Talis, and the Stationery Office (in partnership with Garlik). It is, of course, open knowledge that one Professor Nigel Shadbolt is a director of Garlik Limited.