OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Policy’ Category

Public Accounts Committee – Armchair Auditors, Dead as a Parrot…

leave a comment »

Seems like the Commons Committee of Public Accounts have just produced their report on “Local government funding: assurance to Parliament” [report PDF] which reads to me like they have no idea what’s going on?!

On the question of how open spending data might have a role to play, the conclusions and recommendations (point 6) of the report were as follows:

The quality and accessibility of information to enable residents and councillors to scrutinise local authorities’ decisions varies. If the local accountability system is to work effectively it is fundamentally important that residents can hold local authorities to account for their decisions. It is therefore vital that residents can access relevant and understandable financial and performance information. The Department’s Local Government Transparency Code requires local authorities to publish data, including details of all expenditure over £500, information on senior salaries and details on local authorities’ land holdings and building assets. However, often this data is presented in a way which makes easy and effective scrutiny by the public very difficult. We are also concerned that the public might be less engaged with decisions on services that are significant in terms of expenditure, but do not affect them directly, such as adult care and children’s services. The Department expects that greater transparency of information will empower “armchair auditors” to hold local authorities to account, but there is no evidence that this has actually happened.

Recommendation: The Department should ensure that local authorities conform to the new mandatory Transparency Code on the publication of data, and work with local authorities to improve performance where shortcomings are identified.

Recommendation: The Department should assess whether the data published under the Transparency Code helps residents to scrutinise the performance of local authorities, and if alternative data would be of more value.

Para 18 of the report describes their consideration of this point:

We considered the new system of accountability when it was set up in 2011. At that time, the Department expected data transparency to empower ‘armchair auditors’ to hold local authorities to account for their decisions. We asked the Department whether it had any evidence that these armchair auditors had actually emerged. The Department said it thought that they had, but acknowledged that it had not actually measured whether this was the case. The Department acknowledged that residents play less of a role in challenging decisions on certain services, such as for vulnerable adults and children. For these services, the Department places greater reliance on the role of inspectors, such as the Care Quality Commission, for its assurance.

It seems that where services are delivered “in partnership”, there may also be a few “issues” (para 19):

Government departments are increasingly funding local services through partnership arrangements which are not subject to the same safeguards of local accountability and transparency as local authorities. The accountability structures for this type of arrangement are unclear. Even where the local accountability system is working effectively, it will not be able to provide departments with assurance over funding they grant to partnerships.

Some partnerships, like Local Enterprise Partnerships (LEPs), operate across local authority boundaries. The NAO’s report found that lines of accountability between local authorities and their electorate could be blurred in LEPs, where one local authority makes decisions on behalf of another. The Department told us the benefit of these bodies is that they can tackle issues that go beyond the boundaries of one local authority area and create incentives for the key economic stakeholders to work together. The Department told us it would produce a separate accountability system statement for LEPs and local growth deals.

The quality of evidence provided about “armchair auditors” from Sir Bob Kerslake (as Permanent Secretary of DCLG?) also reads like something out of a Monty Python sketch…

Q55 Chair: Austin’s got a question, but first I want to pick up on some issues that have not been covered. When the new system of accountability was introduced, we talked about an army of armchair auditors. Have they emerged?

Sir Bob Kerslake: I think they have at a local level.

Q56 Chair: Have they? ["It's dead"]

Sir Bob Kerslake: I am sure that if you spoke to local councillors, they would talk to you about the extent to which local residents challenge what they do. ["It's not dead, it's resting..."] We heard about one local resident, Mr Jackson…

Q57 Chair: He’s the MP; you would expect him to challenge. ["I took the liberty of examining that parrot..."]

Sir Bob Kerslake: Well, he’s one of the armchair auditors, I guess, ["of course it was nailed there..."] and I suspect that there are many more at a local level, but it will vary across the country.

Q58 Chair: Have you got any evidence of that? It is one of those hope and pray things, so it would be nice to hear whether it has happened in reality.

Sir Bob Kerslake: We haven’t measured it, but we can, for example, measure the number of people who have put forward requests to take over community assets and things like that…

FWIW, I would love to see *any* member of @CommonsPAC show us what an “armchair auditor” could do with a clean and complete local spending dataset, let alone one that’s actually been released! Or perhaps Sir Bob would like a demonstration…? ;-)

Written by Tony Hirst

September 12, 2014 at 5:20 pm

Posted in Policy

Using Open Public Data to Hold Companies to Account

leave a comment »

I’ve been in a ranty mood all day today, so to finish it off, here are some thoughts about how we can start to use #opendata to hold companies to account. The trigger was finding a dataset released by the Care Quality Commission (CQC) listing the locations of premises registered with the CQC, and the operating companies of those locations (early observations on that data here).

The information is useful because it provides a way of generating aggregated lists of companies that are part of the same corporate group (for example, locations operated by Virgin Care companies, or companies operated by Care UK). When we have these aggregation lists, it means we can start to run the numbers across all the companies in a corporate group, and get some data back about how the companies that are part of a group are operating in general. The aggregated lists thus provide a basis for looking at the gross behaviour of a particular company. We can then start to run league tables against these companies (folk love league tables, right? At least, they do when it comes to public sector bashing). So we can start to see how the corporate groupings compare against each other, and perhaps also against public providers. Of course, there is a chance that the private groups will be shown to be performing better than public sector bodies, but that could be a useful basis for a productive conversation about why…

So what sorts of aggregate lists can we start to construct? The CQC data allows us to get lists of locations associated with various sorts of care delivery (care home, GP services, dentistry, more specialist services) and identify locations that are part of the same corporate group. For example, filtering the CQC data to care homes, I notice that the following are significant operators (the number is the number of locations they operate):

Voyage 1 Limited                          273
HC-One Limited                            169
Barchester Healthcare Homes Limited       168

When it comes to “brands”, we have the following multiple operators:

BRAND Four Seasons Group                   346
BRAND Voyage                               279
BRAND BUPA Group                           246
BRAND Priory Group                         183
BRAND HC-One Limited                       169
BRAND Barchester Healthcare                168
BRAND Care UK                              130
BRAND Caretech Community Services          118
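
As a rough sketch, counts like these can be pulled together from the CQC directory file with a few lines of pandas. The filename and column names below (“Care home?”, “Provider Name”, “Brand Name”) are assumptions based on the sorts of fields described above, so check them against the actual download:

```python
import pandas as pd

# CQC directory file; the filename and column names are assumptions to check
# against the actual CQC download.
locations = pd.read_csv("cqc_locations.csv")

# Filter to care home locations (assuming a Y/N "Care home?" flag column)
care_homes = locations[locations["Care home?"] == "Y"]

# League table of operating companies by number of registered care home locations
print(care_homes["Provider Name"].value_counts().head(10))

# ...and again at brand level
print(care_homes["Brand Name"].value_counts().head(10))
```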

For these operators, we could start to scrape their most recent CQC reports and build up a picture of how well the group as a whole is operating. In the same way that “armchair auditors” (whatever they are?!) are supposed to be able to hold local councils to account, perhaps they can do the same for companies, and give the directors a helping hand… (I would love to see open data activists buying a share and going along to a company shareholder meeting to give some opendata powered grief ;-)

Other public quality data sites provide us with hints at ways of generating additional aggregations. For example, from the Food Standards Agency, we can search on ‘McDonalds’ as a restaurant to bootstrap a search into premises operated by that company (although we’d probably also need to add in searches across takeaways, and perhaps also look for things like ‘McDonalds Ltd’ to catch more of them?).
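
The FSA publishes its food hygiene ratings through an API, so that sort of multi-variant search can be scripted. A sketch follows; the endpoint, header and parameter names are as I recall them from the FHRS API documentation, so treat them as assumptions to verify:

```python
import requests

# FSA food hygiene ratings (FHRS) API - endpoint and header as I recall them
# from the API docs; check against the current documentation.
API = "https://api.ratings.food.gov.uk/Establishments"
HEADERS = {"x-api-version": "2", "accept": "application/json"}

def search_establishments(name, page_size=100):
    """Search FHRS-rated establishments by (partial) business name."""
    resp = requests.get(API, headers=HEADERS,
                        params={"name": name, "pageSize": page_size})
    resp.raise_for_status()
    return resp.json().get("establishments", [])

# Try a few name variants to catch more of the branded outlets
for variant in ["McDonalds", "McDonald's", "McDonalds Ltd"]:
    print(variant, len(search_establishments(variant)))
```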

Note – the CQC data provides a possible steer here for how other data sets might be usefully extended in terms of the data they make available. For example, having a field for “operating company” or “brand” would make for more effective searches across branded or operated food establishments. Having company number (for limited companies and LLPs etc) provided would also be useful for disambiguation purposes.

Hmm, I wonder – would it make sense to start to identify the information that makes registers useful, and that we should start to keep tabs on? We could then perhaps start lobbying for companies to provide that data, and check that such data is being and continues to be collected? It may not be a register of beneficial ownership, but it would provide handy cribs for trying to establish what companies are part of a corporate grouping…

(By the by, picking up on Owen Boswarva’s post The UK National Information Infrastructure: It’s time for the private sector to release some open data too, these registers provide a proxy for the companies releasing certain sorts of data. For example, we can search for ‘Tesco’ as a supermarket on the FSA site. Of course, if companies were also obliged to publish information about their outlets as open data – something you could argue that as a public company they should be required to do, trading their limited liability for open information about where they might exert that right – we could start to run cross-checks (which is the sort of thing real auditors do, right?) and publish complete records of publicly accountable performance in terms of regulated quality inspections.)

The CQC and Food Standards Agency both operate quality inspection registers, so what other registers might we go to to build up a picture of how companies – particularly large corporate groupings – behave?

The Environment Agency publish several registers, including one detailing enforcement actions, which might be interesting to track, though I’m not sure how the data is licensed? The HSE (Health & Safety Executive) publish various notices by industry sector and subsector, but again, I’m not too clear on the licensing? The Chief Fire Officers Association (CFOA) publish a couple of enforcement registers which look as if they cover some of the same categories as the CQC data – though how easy it would be to reconcile the two registers, I don’t know (and again, I don’t know how the register is actually licensed). One thing to bear in mind is that where registers contain personally identifiable information, any aggregations we build that incorporate such data (if we are licensed to build such things) mean (I think) that we become data controllers for the purposes of the Data Protection Act (we are not the maintainers and publishers of the public register so we don’t benefit from the exemptions associated with that role).

Looking at the above, I’m starting to think it could be a really interesting exercise to pick some of the care home provider groups and have a go at aggregating any applicable quality scores and enforcement notices from the CQC, FSA, HSE and CFOA (and even the EA if any of their notices apply! Hmm… does any HSCIC data cover care homes at all too?) Coupled with this, a trawl of directors data to see how the separate companies in a group connect by virtue of directors (and what other companies may be indicated by common directors in a group?).

Other areas perhaps worth exploring – farms incorporated into agricultural groups? (Where would we find that data? One register that could be used to partially hold those locations to account may be the public register of pesticide enforcement notices as well as other EA notices?)

As well as registers, are there any other sources of information about companies we can add into the mix? There’s lots: for limited companies we can pull down company registration details and lists of directors (and perhaps struck off directors) and some accounting information. Data about charities should be available from the Charities Commission. The HSCIC produces care quality indicators for a range of health providers, as well as prescribing data for individual GP practices. Data is also available about some of the medical trials that particular practices are involved in.
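
For the limited company side of that, here’s a sketch of pulling registration details and officer lists via the Companies House API. The endpoints are as I understand them from the Companies House developer documentation (you need to register for an API key, passed as the basic-auth username), so treat the paths as assumptions to check:

```python
import requests

API_KEY = "YOUR_COMPANIES_HOUSE_API_KEY"  # placeholder - register for a key
BASE = "https://api.company-information.service.gov.uk"

def ch_get(path, **params):
    # Companies House takes the API key as a basic-auth username with a blank password
    resp = requests.get(BASE + path, params=params, auth=(API_KEY, ""))
    resp.raise_for_status()
    return resp.json()

# Find candidate company records for a name, then pull each company's officers
hits = ch_get("/search/companies", q="Virgin Care").get("items", [])
for company in hits[:5]:
    number = company["company_number"]
    officers = ch_get("/company/" + number + "/officers").get("items", [])
    print(company["title"], number, [o.get("name") for o in officers])
```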

At a local council level, local councils maintain and publish a wide variety of registers, including registers of gaming machine licenses, licensed premises and so on. Where the premises are an outlet of a parent corporate group, we may be able to pick up the name of the parent group as the licensee. (Via @OwenBoswarva, it seems the Gambling Commission has a central list of operating license holders and licensed premises.)

Having identified influential corporate players, we might then look to see whether those same bodies are represented on lobbiest groups, such as the EU register of commission expert groups, or as benefactors of UK Parliamentary All Party groups, or as parties to meetings with Ministers etc.

We can also look across all those companies to see how much money the corporate groups are drawing from the public sector, by inspecting who payments are made to in the masses of transparency spending data that councils, government departments, and services such as the NHS publish. (For an example of this, see Spend Small Local Authority Spending Index; unfortunately, the bulk data you need to run this sort of analysis yourself is not openly available – you need to aggregate and clean it yourself.)
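
Even without a curated service, a crude version of that cross-body supplier aggregation only needs a folder of “spend over £500” CSV downloads and some patience. A sketch, with the caveat that column names (and encodings) vary wildly between publishers, so the renaming step below is very much a per-file assumption:

```python
import glob
import pandas as pd

frames = []
for path in glob.glob("spending_data/*.csv"):
    df = pd.read_csv(path, encoding="latin-1")  # encodings vary - an assumption
    # Column names differ between councils; this mapping is illustrative only
    df = df.rename(columns={"Supplier Name": "supplier", "Amount": "amount"})
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    frames.append(df[["supplier", "amount"]])

spend = pd.concat(frames, ignore_index=True)

# Crude normalisation of supplier names before totalling payments to each
spend["supplier"] = spend["supplier"].str.upper().str.strip()
print(spend.groupby("supplier")["amount"].sum().sort_values(ascending=False).head(20))
```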

Once we start to get data that lists companies that are part of a group, we can start to aggregate open public data about all the companies in the group and look for patterns of behaviour within the groups, as well as across them. Lapses in one part of the group might suggest a weakness in high level management (useful for the financial analysts?), or act as a red flag for inspection and quality regimes.

Hmmm… methinks it’s time to start putting some of this open data to work; but put it to work by focussing on companies, rather than public bodies…

I think I also need to do a little bit of digging around how public registers are licensed? Should they all be licensed OGL by default? And what guidance, if any, is there around how we can make use of such data and not breach the Data Protection Act?

PS via @RDBinns, What do they know about me? Open data on how organisations use personal data, describing some of the things we can find from the data protection notifications published by the ICO [ICO data controller register].

Written by Tony Hirst

September 9, 2014 at 9:39 pm

Confused Fragments About Open Data Economics…

with 4 comments

Some fragments…

- the public paid for it so the public has a right to it: the public presumably paid for it through their taxes. Companies that use open public data but don’t fully and fairly participate in the tax regime of the country that produced the data haven’t paid their fair share for access to it.

- data quality will improve: with open license conditions that allow users to take open (public) data and do what they want with it without the requirement to make derived data available in a bulk form under an open data license, how does the closed bit of the feedback loop work? I’ve looked at a lot of open public data releases on council and government websites and seen some companies making use of that data in presumably a cleaned form (if it hasn’t been cleaned, then they’re working with a lot of noise…) But if they have cleaned and normalised the data, have they provided this back in an open form to the public body that gifted them access to it? Is there an open data quality improvement cycle working there? Erm… no… I suspect if anything, the open data users would try to sell the improved quality data back to the publisher. This may be their sole business model, or it may be a spin-off as a result of using the (cleaned and normalised) data for some other commercial purpose.

Written by Tony Hirst

September 9, 2014 at 1:32 pm

Posted in oh_ffs, Open Data, Policy

Tagged with

Corporate Groupings in Care Provision – Finding the Data for GP Practices, Prequel…

leave a comment »

For some time I’ve been pondering the best way of trying to map the growth in the corporate GP care provision – the number of GP practices owned by Virgin Care, Care UK and so on. Listings about GP practices from the various HSCIC datasets don’t appear to identify corporate owners, so the stop gap solution I’d identified was to scrape lists of practices from the various corporate websites and then try to reconcile them against GP practice codes from the HSCIC as some sort of check.

However, today I stumbled across a dataset released by the Care Quality Commission (CQC) that provides a “complete directory of places where CQC regulated care is provided in England” [CQC information and data]. Two data files are provided – a simple register of locations, and “a second file … which contains details of registered managers and care home bed numbers. It also allows you to easily filter by the regulated activities, service types or service user bands.”

Both files contain fields that allow you to identify GP practices, but the second one also provides information about the actual provider (parent company owner) and any brand name associated with the service. Useful…:-)

What this means is it should be easy enough to pull the data into a report that identifies the practices associated with a particular brand or corporate group… (I’ll have a go at that as soon as I get a chance…)
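
In the meantime, here’s a rough sketch of what such a report might look like using pandas over the second CQC file. The column names (the GP practice service-type flag, “Brand Name”, “Provider Name”, “Location Name”) are assumptions to check against the actual file:

```python
import pandas as pd

# Second CQC file (the one with provider and brand details); filename and
# column names are assumptions.
df = pd.read_csv("cqc_locations_plus.csv")

# Filter to GP practice locations (assuming a Y/N service-type flag column)
gp = df[df["Service type - GP practice"] == "Y"]

# Practices per brand, then the locations for one corporate group of interest
print(gp["Brand Name"].value_counts().head(10))
print(gp.loc[gp["Brand Name"] == "Virgin Care",
             ["Location Name", "Provider Name", "Location Postal Code"]])
```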

Another thing that could be useful to do would be to match (that is, link) the location identifiers used by the CQC with the practice codes used by the HSCIC. [First attempt here.... Looks like work needs to be done...:-(] Then we could easily start to aggregate and analyse quality stats, referring and prescribing behaviour data, and so on, for the different corporate groupings and look to see if we can spot any meaningful differences between them (for example, signals that there might be corporate group level policies or behaviours being applied). We could probably also start to link in drug trial data, at least for trials that are registered, and that we can associate with a particular practice (eg Sketching Sponsor Partners Running UK Clinical Trials).
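
A first pass at that linkage could just match on a normalised practice name plus postcode key. The file and column names below are assumptions (the HSCIC practice register referred to is the epraccur extract), so expect to adapt them:

```python
import pandas as pd

def norm(s):
    """Crude normalisation for matching: uppercase and collapse whitespace."""
    return s.fillna("").str.upper().str.replace(r"\s+", " ", regex=True).str.strip()

cqc = pd.read_csv("cqc_locations.csv")           # CQC directory - assumed filename
hscic = pd.read_csv("hscic_gp_practices.csv")    # HSCIC practice list - assumed filename

cqc["key"] = norm(cqc["Location Name"]) + "|" + norm(cqc["Location Postal Code"])
hscic["key"] = norm(hscic["Practice Name"]) + "|" + norm(hscic["Postcode"])

matched = cqc.merge(hscic[["key", "Practice Code"]], on="key", how="left")
print("Matched:", matched["Practice Code"].notna().sum(), "of", len(matched))
```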

Finally, it’d possibly also be useful to reconcile companies against company registrations on Companies House, and perhaps charity registrations with the Charities Commission (cf. this quick data conversation with the 360 Giving Grant Navigator data).

PS more possible linkage:
– company names to company IDs on OpenCorporates (and from that we can look for additional linkage around registered company addresses, common directors etc)
– payments from local gov and NHS to the companies (from open spending data/transactions data)
– food hygiene inspection ratings (eg for care homes)

Written by Tony Hirst

September 9, 2014 at 12:12 pm

Posted in Open Data, Policy

More OpenData Published – So What?

with 8 comments

Whenever a new open data dataset is released, the #opendata wires hum a little more. More open data is a Good Thing, right? Why? Haven’t we got enough already?

In a blog post a few weeks ago, Alan Levine, aka @cogdog, set about Stalking the Mythical OER Reuse: Seeking Non-Blurry Videos. OERs are open educational resources, openly licensed materials produced by educators and released to the world so others could make use of them. Funding was put into developing and releasing them and then, … what?

OERs. People build them. People house them in repositories. People do journal articles, conference presentations, research on them. I doubt never their existence.

But the ultimate thing they are supposed to support, maybe their raison d’être – the re use by other educators, what do we have to show for that except whispered stories, innuendo, and blurry photos in the forest?
Alan Levine

Alan went in search of the OER reuse in his own inimitable way…

… but came back without much success. He then used the rest of the post to put out a call for stories about how OERs have actually been used in the world… Not just mythical stories, not coulds and mights: real examples.

So what about opendata – is there much use, or reuse, going on there?

It seems as if more datasets get opened every day, but is there more use every day: first day use of newly released datasets, incremental reuse of the datasets that are already out, linkage between the new datasets and the previously released ones?

Yesterday, I spotted via @owenboswarva the release of a dataset that aggregated and normalised data relating to charitable grant awards: A big day for charity data. Interesting… The supporting website – 360 Giving – (self-admittedly in its early days) allows you to search by funder, recipient or key word. You have to search using the right keywords, though, and the right capitalisation of keywords…

[Screenshot: 360 Giving search results for “University of Oxford”]

And you may have to add in white space.. so *University of Oxford * as well as *University of Oxford*.

I don’t want to knock the site, but I am really interested to know how this data might be used. Really. Genuinely. I am properly interested. How would someone working in the charitable sector use that website to help them do something? What thing? How would it support them? My imagination may be able to go off on crazy flights of fancy in certain areas, but my lack of sector knowledge or a current headful of summer cold leaves me struggling to work out what this website would tangibly help someone to do. (I tried to ask a similar question around charities data before, giving the example of Charities Commission data grabbed from OpenCharities, but drew a blank then.) Like @cogdog in his search for real OER use case stories, I’d love to hear examples of real questions – no matter how trivial – that the 360 Giving site could help answer.
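
Just to make that concrete, the kind of normalisation a more forgiving search might apply before matching is tiny; a sketch (not what the site actually does):

```python
def norm(term):
    # Collapse internal whitespace, strip the ends, and ignore case
    return " ".join(term.split()).casefold()

def matches(query, recipient):
    return norm(query) in norm(recipient)

print(matches("university of oxford", "University of Oxford "))     # True
print(matches("University Of Oxford", "The University of Oxford"))  # True
```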

As well as the website, 360 Giving folk provide a data download as a CSV file containing getting on for a quarter of a million records. The date stamp on the file I grabbed is 5th June 2014. Skimming through the data quickly – my own opening conversation with it can be found here: 360 Giving Grant Navigator – Initial Data Conversation – I noticed through comparison with the data on the website some gaps…

  • this item doesn’t seem to appear in the CSV download, perhaps because it doesn’t appear to have a funder?
  • this item on the website has an address for the recipient organisation, but the CSV document doesn’t have any address fields. In fact, on close inspection, the record relates to a grant by the Northern Rock Foundation, and I see no records from that body in the CSV file?
  • Although there is a project title field in the CSV document, no project titles are supplied. Looking through a sample of grants on the website, it’s not clear that any titles are provided there either?
  • The website lists the following funders:

    Arts Council England
    Arts Council Wales
    Big Lottery
    Creative Scotland
    DSDNI
    Heritage Lottery Fund
    Indigo Trust
    Nesta
    Nominet Trust
    Northern Rock Foundation
    Paul Hamlyn Foundation
    Sport England
    Sport Northern Ireland
    Sport Wales
    TSB
    Wellcome Trust

    The CSV file has data from these funders:

    Arts Council England
    Arts Council Wales
    Big Lottery
    Creative Scotland
    DSDNI
    Nesta
    Nominet Trust
    Sport England
    Sport Northern Ireland
    Sport Wales
    TSB
    Wellcome Trust

    That is, the CSV contains a subset of the data on the website; data from Heritage Lottery Fund, Indigo Trust, Northern Rock Foundation, Paul Hamlyn Foundation doesn’t seem to have made it into the data download? I also note that data from the Research Councils’ Gateway to Research (aside from the TSB data) doesn’t seem to have made it into either dataset. For anyone researching grants to universities, this could be useful information. (Could?! Why?!;-)

  • No company numbers or Charity Numbers are given. Using opendata from Companies House, a quick join on recipient names and company names from the Companies House register (without any attempts at normalising out things like LTD and LIMITED – that is, purely looking for an exact match) gives me just over 15,000 matched company names (which means I now have their address, company number, etc. too). And presumably if I try to match on names from the OpenCharities data, I’ll be able to match some charity numbers. Now both these annotations will be far from complete, but they’d be more than we have at the moment. A question to then ask is – is this better or worse? Does the dataset only have value if it is in some way complete? One of the clarion calls for open data initiatives has been to ‘just get the data out there’ so that it can be started to be worked on, and improved on. So presumably having some company numbers or charity numbers matched is a plus? (A sketch of this sort of name join appears after this list.)

    Now I know there is a risk to this. Funders may want to not release details about the addresses of the charities they are funding because that data may be used to plot maps to say “this is where the money’s going” when it isn’t. The charity may have a Kensington address and have received funding for an initiative in Oswaldtwistle, but the map might see all the money sinking into Kensington; which would be wrong. But that’s where you have to start educating the data users. Or releasing data fields like “address of charity” and “postcode area of point of use”, or whatever, even if the latter is empty. As it is, if you give me a charity or company name, I can look up its address. And its company or charity number if it has one.
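
For what it’s worth, the exact-match name join mentioned above is only a few lines of pandas. The filenames and column names here (the Companies House “basic company data” bulk download and the 360 Giving recipient name field) are assumptions to check against the actual files:

```python
import pandas as pd

grants = pd.read_csv("360giving_grants.csv")
companies = pd.read_csv("BasicCompanyData.csv",
                        usecols=["CompanyName", "CompanyNumber", "RegAddress.PostCode"])

# Purely exact matching on an uppercased, trimmed name - no LTD/LIMITED normalisation
grants["name_key"] = grants["Recipient Org Name"].str.upper().str.strip()
companies["name_key"] = companies["CompanyName"].str.upper().str.strip()

annotated = grants.merge(companies, on="name_key", how="left")
print("Recipients matched to a company number:", annotated["CompanyNumber"].notna().sum())
```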

As I mentioned, I don’t want to knock the work 360 Giving have done, but I’m keen to understand what it is they have done, what they haven’t done, and what the opendata they have aggregated and re-presented could – practically, tractably, tangibly – be used for. Really used for.

Time to pack my bags and head out into the wood, maybe…

Written by Tony Hirst

August 15, 2014 at 9:56 am

Posted in Open Data, Policy

Tagged with

Using Open Data to Hold Companies to Account?

leave a comment »

Some rambling but possibly associated thoughts… I suggest you put Alice’s Restaurant on…

For some time now, I’ve had an uncomfortable feeling about the asymmetries that exist in the open data world as well as total confusion about the notion of transparency.

Part of the nub of the problem (for me) lies with the asymmetric disclosure requirements of public and private services. Public bodies have disclosure requirements (eg Local Government Transparency Code), private companies don’t. Public bodies disclose metrics and spend data, data that can be used in public contract tendering processes by private bodies against public ones tendering for the same service. The private body uses this information – and prices in a discount associated with not having to carry the cost of public reporting – when putting together its bid. The next time the contract is tendered, the public body won’t have access to the (previously publicly disclosed) information that the private body originally had when making its bid. Possibly. I don’t know how tendering works. But from the outside, that’s how it appears to me. (Maybe there needs to be more transparency about the process?)

Open data is possibly a Big Thing. Who knows? Maybe it isn’t. Certainly the big consulting firms are calling it as something worth squillionty billionty of pounds. I’m not sure how they cost it. Maybe I need to dig through the references and footnotes in their reports (Cap Gemini’s Open Data Economy: Unlocking Economic Value by Opening Government and Public Data, Deloitte’s Open growth: Stimulating demand for open data in the UK or McKinsey’s Open data: Unlocking innovation and performance with liquid information). I don’t know how much those companies have received in fees for producing those reports, or how much they have received in consultancy fees associated with public open data initiatives – somehow, that spend data doesn’t seem to have been curated in a convenient way, or as a #opendatadata bundle? – but I have to assume they’re not doing it to fleece the public bodies and tee up benefits for their other private corporate clients.

Reminds me – I need to read Owen Boswarva’s Who supports the privatisation of Land Registry? and ODUG benefits case for open data release of an authoritative GP dataset again… And remind myself of who sits on the Open Data User Group (ODUG), and other UK gov departmental transparency boards…

And read the FTC’s report Data Brokers: A Call For Transparency and Accountability

Just by the by, one thing I’ve noticed about a lot of opendata releases is that, along with many other sorts of data, they are most useful when aggregated over time or space, and/or combined with other data sets. Looking at the month on month reports of local spending data from my local council is all very well, but it gets more interesting when viewed over several months or years. Looking at the month on month reports of local spending data from my local council is all very well, but it gets more interesting when looking at spend across councils, as for example in the case of looking at spend to particular companies.

Aggregating public data is one of the business models that helps create some of the GDP figure that contributes to the claimed, anticipated squillionty billionty pounds of financial benefit that will arise from open data – companies like opencorporates aggregating company data, or Spend Network aggregating UK public spending data who hope to start making money selling products off the back of public open data they have curated. Yes – I know a lot of work goes in to cleaning and normalising that data, and that exploiting the data collection as a whole is what their business models are about – and why they don’t offer downloads of their complete datasets, though maybe licenses require they do make links to, or downloads of, the original (“partial”) datasets available?

But you know where I think the real value of those companies lies? In being bought out. By Experian, or Acxiom (if there’s even a hint of personally identifiable data through reverse engineering in the mix), or whoever… A weak, cheap, cop out business model. Just like this: Farmers up in arms over potential misuse of data. (In case you missed it, Climate Corporation was one of the OpenData500 that aggregated shed loads of open data – according to Andrew Stott’s Open Data for Economic Growth report for the World Bank, Climate Corp “uses 60 years of detailed crop yield data, weather observations from one million locations in the United States and 14 terabytes of soil quality data – all free from the US Government – to provide applications that help farmers improve their profits by making better informed operating and financing decisions”. It was also recently acquired by Monsanto – Monsanto – for just under a billion US $. That’s part of the squillionty billionties I guess. Good ol’ open data. Monsanto.)

Sort of related to this – that is, companies buying others to asset strip them for their data – you know all that data of yours locked up in Facebook and Google? Remember MySpace? Remember NNTP? According to the Sophos blog, Just Because You Don’t Give Your Personal Data to Google Doesn’t Mean They Can’t Acquire It. Or that someone else might buy it.

And as another aside – Google – remember Google? They don’t really “read” your email, at least, people don’t, they just let algorithms process it so the algorithms can privately just use that data to send you ads, but no-one will ever know what the content of the email was to trigger you getting that ad (‘cos the cookie tracking, cookie matching services can’t unpick ad bids, ad displays, click thrus, surely, can they?!), well – maybe there are side effects: Google tips off cops after spotting child abuse images in email (for some reason, after initially being able to read that article, my browser can’t load it atm. Server fatigue?). Of course, if Google reads your ads for blind business purposes and ad serving is part of that blind process you accept it. But how does the law enforcement ‘because we can even though you didn’t warrant us to?’ angle work? Does the Post Office look inside the envelope? Is surveillance actually part of Google’s business model?

If you want to up the paranoia stakes, this (from Ray Corrigan, in particular: “Without going through the process of matching each government assurance with contradictory evidence, something I suspect would be of little interest, I would like to draw your attention to one important misunderstanding. It seems increasingly to be the belief amongst MPs that blanket data collection and retention is acceptable in law and that the only concern should be the subsequent access to that data. Assertions to this effect are simply wrong.”) + that. Because one day, one day, they may just find your name on an envelope of some sort under a tonne of garbage. Or an algorithm might… Kid.

But that’s not what this post is about – what this post is about is… Way back when, so very long ago, not so very long ago, there was a license called GPL. GPL. And GPL was a tainting license. findlaw describes the consequences of reusing GPL licensed code as follows: Kid, ‘if a user of GPL code decides to distribute or publish a work that “in whole or in part contains or is derived from the [open source] or any part thereof,” it must make the source code available and license the work as a whole at no charge to third parties under the terms of the GPL (thereby allowing further modification and redistribution).

‘In other words, this can be a trap for the unwary: a company can unwittingly lose valuable rights to its proprietary code.’

Now, friends, GPL scared people so much that another license called LGPL was created, and LGPL allowed you to use LGPL licensed code without fear of tainting your own code with the requirement to open up your own code as GPL would require of it. ‘Cos licenses can be used against you.

And when it comes to open data licenses, they seem to be like LGPL. You can take open public data and aggregate it, and combine it, and mix it and mash it and do what you like with it and that’s fine… And then someone can come along and buy that good work you’ve done and do what they want with it. Even Monsanto. Even Experian. And that’s good and right, right? Wrong. The ODUG. Remember the ODUG? The ODUG is the Open Data User Group that lobbies government for what datasets to open up next. And who’s on the ODUG? Who’s there, sitting there, on the ODUG bench, right there, right next to you?

Kid… you wanna be the all-open, all-liberal open data advocate? You wanna see open data used for innovation and exploitation and transparency and all the Good Things (big G, big T) that open data might be used for? Or you wanna sit down on the ODUG bench? With Deloitte, and Experian, and so on…

And if you think that using a tainting open data license so anyone who uses that data has to share it likewise, aggregated, congregated, conjugated, disaggregated, mixed, matched, joined, summarised or just otherwise and anyways processed, is a Good Thing…? Then kid… they’ll all move away from you on the bench there…

Because when they come to buy you, they won’t want your data to be tainted in any way that means they’ll have to give up the commercial advantage they’ll have from buying up your work on that open data…

But this post? That’s not what this post is about. This post is about holding companies to account. Open data used to hold companies to account. There’s a story to be told that’s not been told about Dr Foster, and open NHS data and fear-mongering and the privatisation of the NHS and that’s one thing…

But another thing is how government might use data to help us protect ourselves. Because government can’t protect us. Government can’t make companies pay taxes and behave responsibly and not rip off consumers. Government needs our help to do that. But can government help us do that too? Protect and Survive.

There’s a thing that DECC – the Department of Energy and Climate Change – do, and that’s publish domestic energy price statistics and industrial energy price statistics and road fuel and other petroleum product price statistics, and they’re all meaningless. Because they bear little resemblance to spot prices paid when consumers pay their domestic energy bills and road fuel and other petroleum product bills.

To find out what those prices are you have to buy the data from someone like Experian, from something like Experian’s Catalist fuel price data – daily site retail fuel prices – data product. You may be able to calculate the DECC statistics from that data (or you may not) but you certainly can’t go the other way, from the DECC statistics to anything like the Experian data.

But can you go into your local library and ask to look at a copy of the Experian data? A copy of the data that may or may not be used to generate the DECC road fuel and other petroleum product price statistics (how do they generate those statistics anyway? What raw data do they use to generate those statistics?)

Can you imagine ant-eye-ant-eye-consumer data sets being published by your local council or your county council or your national government that can be used to help you hold companies to account and help you tell them that you know they’re ripping you off and your council off and your government off and that together, you’re not going to stand for it?

Can you imagine your local council publishing the forecourt fuel prices for one petrol station, just one petrol station, in your local council area every day? And how about if they do it for two petrol stations, two petrol stations, each day? And if they do it for three forecourts, three, can you imagine if they do it for three petrol stations…? And can you, can you imagine prices for 50 petrol stations a day being published by your local council, your council helping you inform yourself about how you’re being manipulated, can you imagine…? (It may not be so hard – food hygiene ratings are published for food retail environments across England, Northern Ireland and Wales…)

So let’s hear it for open data, and how open data can be used to hold corporates to account, and how public bodies can use open data to help you make better decisions (which is a good neoliberal position to take and one which the other folk on the bench tell you that that’s what you want and that markets work, though they also fall short of telling you that the models say that markets work with full information but you don’t have the information, and even if you did, you wouldn’t understand it, because you don’t really know how to make a good decision, but at the end of the day you don’t want a decision, you just want a good service fairly delivered, but they don’t tell you that it’s all right to just want that…)

And let’s hear it for public bodies making data available whether it’s open or not, making it available by paying for it if they have to and making it available via library services so that we can start using it to start holding companies to account and start helping our public services, and ourselves, protect ourselves from the attacks being mounted on us by companies, and their national government supporters, who take on debt, and who allow them to take on debt, to make dividend payouts but not capital investment and subsidise the temporary driving down of prices (which is NOT a capital investment) through debt subsidised loss leading designed to crush competition in a last man standing contest that will allow monopolistic last man standing price hikes at the end of it…

And just remember, if there’s anything you want, you know where you can get it… At Alice’s… or the library… only they’re shutting them down, aren’t they…? So that leaves what..? Google?

Written by Tony Hirst

August 3, 2014 at 12:10 am

Confused About “The Right to be Forgotten”. Was: Is Web Search Getting Harder?

with one comment

[Seems I forgot to post this, though I started drafting it on May 19th... Anyway, things seem to have moved on a bit...]

A search related story in the news last week reported on a ruling by the European Union Court of Justice that got wide billing as a “right to be forgotten” (eg BBC News: EU court backs ‘right to be forgotten’ in Google case).

Here’s another example of “censorship”? WordPress not allowing me to link to a URL because it insists on rewriting the & characters in it – here’s the actual link:

http://curia.europa.eu/juris/document/document.jsf?text=&docid=152065&pageIndex=0&doclang=EN&mode=req&dir=&occ=first&part=1&cid=250828


For stories like this, I try to look at the original ruling but also tend to turn to law blogs such as my colleague Ray Corrigan’s B2fxxx (though he hasn’t posted on this particular story yet?) or Pinsent Mason’s Out-law (eg Out-law: Google data protection ruling has implications for multi-faceted global businesses) to find out what was actually said.

Here’s the gist of the rulings:

  • “the activity of a search engine consisting in finding information published or placed on the internet by third parties, indexing it automatically, storing it temporarily and, finally, making it available to internet users according to a particular order of preference must be classified as ‘processing of personal data’ within the meaning of Article 2(b) when that information contains personal data and, second, the operator of the search engine must be regarded as the ‘controller’ in respect of that processing, within the meaning of Article 2(d).” So what? A person has a right to object to a data controller about the way their data is processed and can obtain “the rectification, erasure or blocking of data” because of its “incomplete or inaccurate nature”.
  • “processing of personal data is carried out in the context of the activities of an establishment of the controller on the territory of a Member State, within the meaning of that provision, when the operator of a search engine sets up in a Member State a branch or subsidiary which is intended to promote and sell advertising space offered by that engine and which orientates its activity towards the inhabitants of that Member State.” So Google was found to be “established” in EU member state territories. Are there any implications from that ruling as regards the tax situation, I wonder?
  • Insofar as the processing of personal data that has been subject to a successful objection goes, “the operator of a search engine is obliged to remove from the list of results displayed following a search made on the basis of a person’s name links to web pages, published by third parties and containing information relating to that person, also in a case where that name or information is not erased beforehand or simultaneously from those web pages, and even, as the case may be, when its publication in itself on those pages is lawful.” Note there are limits on this in the case of legitimate general public interest.
  • The final ruling seems to at least admit the possibility that folk can request data be taken down without them having to demonstrate that it is prejudicial to them? “when appraising the conditions for the application of those provisions, it should inter alia be examined whether the data subject has a right that the information in question relating to him personally should, at this point in time, no longer be linked to his name by a list of results displayed following a search made on the basis of his name, without it being necessary in order to find such a right that the inclusion of the information in question in that list causes prejudice to the data subject. As the data subject may, in the light of his fundamental rights under Articles 7 and 8 of the Charter, request that the information in question no longer be made available to the general public on account of its inclusion in such a list of results, those rights override, as a rule, not only the economic interest of the operator of the search engine but also the interest of the general public in having access to that information upon a search relating to the data subject’s name. “However, that would not be the case if it appeared, for particular reasons, such as the role played by the data subject in public life, that the interference with his fundamental rights is justified by the preponderant interest of the general public in having, on account of its inclusion in the list of results, access to the information in question.”.

[Update, July 2nd, 2014]
It seems as if things have moved on – Google is publishing notices in the google.co.uk territory at least to the effect that “Some results may have been removed under data protection law in Europe” [my emphasis].

[Screenshot: Google search for “Esmerelda Bobbins” showing the removal notice]

The FAQ describes what’s happening thus:

When you search for a name, you may see a notice that says that results may have been modified in accordance with data protection law in Europe. We’re showing this notice in Europe when a user searches for most names, not just pages that have been affected by a removal.

The media are getting uppity about it of course, eg Peston completely misses the point, as well as getting it wrong?

[Screenshot: BBC News “Why has Google cast me into oblivion?” article and a “Stan O’Neal site:bbc.co.uk” Google search]

In fact, it seems as if the BBC themselves are doing a much better job of obliviating Peston from their own search results…

[Screenshot: BBC site search results for the Peston article]

What all the hype this time around seems to be missing – as with the reporting around the original ruling – is the interpretation that the court ruled on about the behaviour of the search engines insofar as they are deemed to be processors of “personal data”. (Of course, these companies also process personal data as part of the business of operating user accounts, but the business functions – of managing user accounts versus operating a search index and returning search results from public queries applied to it – are presumably sandboxed as far as data protection legislation goes.)

If Google is deemed to be a data controller of personal data that is processed as a result of the way it operates its search index, it presumably means that I can make a subject access request about the data that the Google search index holds about me (as well as the subject access requests I can make to the bit of Google that operates the various Google accounts that I have registered).

As far as the loss of the “right to discover” that the hacks are banging on about as a consequence of “the right to be forgotten”, does this mean that Google is the start and end point of their research activity? (And also putting aside the point that most folk: a) don’t look past the first few results; b) are rubbish at searching. As far as search engine ranking algorithms go – erm, what sort of “truth” do you think they reveal? How do you think Google ranks results? And how do you think it comparatively ranks content generated years ago (when links were more persistent than a brief appearance in Twitter feeds and Facebook streams) to content generated more recently (that doesn’t set up persistent link structures)?)

Don’t they use things like Nexis UK?

Or if anything other than Google is too hard, they can just edit the URL to use google.com rather than google.co.uk

This is where it probably also starts to make sense to look back to the original ruling and spend some time reading it more closely. Is LexisNexis a data controller, subject to data protection legislation, based on its index of news media content? Are the indices it operates around court cases similarly covered?

Written by Tony Hirst

July 3, 2014 at 1:35 pm

Posted in Policy
