– the public paid for it so public has a right to it: the public presumably paid for it through their taxes. Companies that use open public data that don’t fully and fairly participate in the tax regime of the country that produced the data then they didn’t pay their fair share for access to it.
– data quality will improve: with open license conditions that allow users to take open (public) data and do what they want with it without the requirement to make derived data available in a bulk form under an open data license, how does the closed bit of the feedback loop work? I’ve looked at a lot of open public data releases on council and government websites and seen some companies making use of that data in presumably a cleaned form (if it hasn’t been cleaned, then they’re working with a lot of noise…) But if they have cleaned and normalised the data, have they provided this back ion an open form to the public body that gifted them access to it? Is there an open data quality improvement cycle working there? Erm… no… I suspect if anything, the open data users would try to sell the improved quality data back to the publisher. This may be their sole business model, or it may be a spin-off as a result of using the (cleaned and normalised) data fro some other commercial purpose.
I’ve no idea… Because there aren’t any, apparently: Poor data quality hindering government open data programme. And as I try to make sense of that article, it seems there aren’t any because of UTF-8, I think? Erm…
For my own council, the local hyperlocal, OnTheWight, publish a version of Adrian Short’s Armchair Auditor app at armchairauditor.onthewight.com. OnTheWight have turned a few stories from this data, I think, so they obviously have a strategy for making use of the data.
My own quirky skillset, such as it is, meant that it wasn’t too hard for me to start working with the original council published data to build an app showing spend in different areas, by company etc – Local Council Spending Data – Time Series Charts – although the actual application appears to have rotted (pound signs are not liked by the new shiny library and I can’t remember how to log in to the glimmer site:-(
I also tried to make sense of the data by trying to match it up to council budget areas, but that wasn’t too successful: What Role, If Any, Does Spending Data Have to Play in Local Council Budget Consultations?
But I still don’t know what questions to ask, what scripts to run? Some time ago, Ben Worthy asked Where are the Armchair Auditors? but I’m more interested in the question: what would they actually do? and what sort of question or series of questions might they usefully ask, and why?
Just having access to data is not really that very interesting. It’s the questions you ask for it, and the sorts of stories you look for in it, that count. So what stories might Armchair Auditors go looking for, what odd things might they seek out, what questions might they ask of the data?
Longtime readers will know that every so often I publish a posts whose title starts “Confused About”. The point of these posts is to put a marker down in this public notebook about words, phrases, ideas or stories that seem to make sense to everyone else but that I really don’t get.
They’re me putting my hand up and asking the possibly stupid question, then trying to explain my confusion and the ways in which I’m trying to make sense of the idea.
As educators, we’re forever telling learners not to be afraid of asking the question (“if you have a question, ask it: someone less comfortable even that than you with asking questions probably has the same question too, so you’ll be doing everyone a favour”), not to be afraid of volunteering an answer.
Of course, as academics, we can’t ever be seen to be wrong or to not know the answer, which means we can’t be expected to admit to being confused or not understanding something. Which is b*****ks of course.
We also can’t be seen to write down anything that might be wrong, because stuff that’s written down takes on the mantle of some sort of eternal truthiness. Which is also b*****ks. (This blog is a searchable, persistent, though mutable by edits, notebook of stuff that I was thinking at the time it was written. As the disclaimer could go, it does not necessarily represent even my own ideas or beliefs…)
It’s easy enough to take countermeasures to avoid citation of course – never publish in formal literature; if it’s a presentation that’s being recorded, put some music in it whose rights owners are litigious, or some pictures of Disney characters. Or swear…. Then people won’t link to you.
Anyway, back to being confused… I think that’s why I post these posts…
I also like to think they’re an example of open practice…
[Thinkses in progress – riffing around the idea that transparency is not reporting. This is all a bit confused atm…]
UK Health Secretary Jeremy Hunt was on BBC Radio 4’s Today programme today talking about a new “open and honest reporting culture” for UK hospitals. Transparency, it seems, is about publishing open data, or at least, putting crappy league tables onto websites. I think: not….
The fact that a hospital has “a number” of mistakes may or may not be interesting. As with most statistics, there is little actual information in a single number. As the refrain on the OU/BBC co-produced numbers programme More or Less goes, ‘is it a big number or a small number?’. The information typically lies in the comparison with other numbers, either across time or across different entities (for example, comparing figures across hospitals). But comparisons may also be loaded. For a fair comparison we need to normalise numbers – that is, we need to put them on the same footing.
[A tweet from @kdnuggets comments: ‘The question to ask is not – “is it a big number or a small number?”, but how it compares with other numbers’. The sense of the above is that such a comparison is always essential. A score of 9.5 in a test is a large number when the marks are out of ten, a small one when out of one hundred. Hence the need for normalisation, or some other basis for generating a comparison.]
The above cartoon from web comic XKCD demonstrates this with a comment about how reporting raw numbers on a map often tends to just produce a population map illustrates this well. If 1% of town A with population 1 million has causal incidence [I made that phrase up: I mean, the town somehow causes the incidence of X at that rate] of some horrible X (that is, 10,000 people get it as a result of living in town A), and town B with a population of 50,000 (that is, 5,000 people get X) has a causal incidence of 10%, a simple numbers map would make you fearful of living in town A, but you’d be likely worse off moving to town B.
Sometimes a single number may appear to be meaningful. I have £2.73 in my pocket so I have £2.73 to spend when I go to the beach. But again, there is a need for comparison here. £2.73 needs to be compared against the price of things it can purchase to inform my purchasing decisions.
In the opendata world, it seems that just publishing numbers is taken as transparency. But that’s largely meaningless. Even being able to compare numbers year on year, or month on month, or hospital on hospital, is largely meaningless, even if those comparisons can be suitably normalised. It’s largely meaningless because it doesn’t help me make sense of the “so what?” question.
Transparency comes from seeing how those numbers are used to support decision making. Transparency comes from seeing how this number was used to inform that decision, and why it influenced the decision in that way.
Transparency comes from unpacking the decisions that are “evidenced” by the opendata, or other data not open, or no data at all, just whim (or bad policy).
Suppose a local council spends £x thousands on an out-of area placement several hundred miles away. This may or may not be expensive. We can perhaps look at other placement spends and see that the one hundred of miles away appears to offer good value for money (it looks cheap compared to other placements; which maybe begs the question why those other placements are bing used if pure cost is a factor). The transparency comes from knowing how the open data contributed to the decision. In many cases, it will be impossible to be fully transparent (i.e. to fully justify a decision based on opendata) because there will be other factors involved, such as a consideration of sensitive personal data (clinical decisions based around medical factors, for example).
So what that there are z mistakes in a hospital, for league table purposes – although one thing I might care about is how z is normalised to provide a basis of comparison with other hospitals in a league table. Because league tables, sort orders, and normalisation make the data political. On the other hand – maybe I absolutely do want to know the number z – and why is it that number? (Why is it not z/2 or 2*z? By what process did z come into being? (We have to accept, unfortunately, that systems tend to incur errors. Unless we introduce self-correcting processes. I absolutely loved the idea of error-correcting codes when I was first introduced to them!) And knowing z, how does that inform the decision making of the hospital? What happens as a result of z? Would the same response be prompted if the number was z-1, or z/2? Would a different response be in order if the number was z+1, or would nothing change until it hit z*2? In this case the “comparison” comes from comparing the different decisions that would result from the number being different, or the different decisions that can be made given a particular number. The meaning of the number then becomes aligned to the different decisions that are taken for different values of that number. The number becomes meaningful in relation to the threshold values that the variable corresponding to that number are set at when it comes to triggering decisions.)
Transparency comes not from publishing open data, but from being open about decision making processes and possibly the threshold values or rates of change in indicators that prompt decisions. In many cases the detail of the decision may not be fully open for very good reason, in which case we need to trust the process. Which means understanding the factors involved in the process. Which may in part be “evidenced” through open data.
Going back to the out of area placement – the site hundreds of miles away may have been decided on by a local consideration, such as the “spot price” of the service provision. If financial considerations play a part in the decision making process behind making that placement, that’s useful to know. It might be unpalatable, but that’s the way the system works. But it begs the question – does the cost of servicing that placement (for example, local staff having to make round trips to that location, opportunity cost associated with not servicing more local needs incurred by the loss of time in meeting that requirement) also form part of the financial consideration made during the taking of that decision? The unit cost of attending a remote location for an intervention will inevitably be higher than attending a closer one.
If financial considerations are part of a decision, how “total” is the consideration of the costs?
That is very real part of the transparency consideration. To a certain extent, I don’t care that it costs £x for spot provision y. But I do want to know that finance plays a part in the decision. And I also want to know how the finance consideration is put together. That’s where the transparency comes in. £50 quid for an iPhone? Brilliant. Dead cheap. Contract £50 per month for two years. OK – £50 quid. Brilliant. Or maybe £400 for an iPhone and a £10 monthly contract for a year. £400? You must be joking. £1250 or £520 total cost of ownership? What do you think? £50? Bargain. #ffs
Transparency comes from knowing the factors involved in a decision. Transparency comes from knowing what data is available to support those decisions, and how the data is used to inform those decisions. In certain cases, we may be able to see some opendata to work through whether or not the evidence supports the decision based on the criteria that are claimed to be used as the basis for the decision making process. That’s just marking. That’s just checking the working.
The transparency bit comes from understanding the decision making process and the extent to which the data is being used to support it. Not the publication of the number 7 or the amount £43,125.26.
Reporting is not transparency. Transparency is knowing the process by which the reporting informs and influences decision making.
I’m not sure that “openness” of throughput is a good thing either. I’m not even sure that openness of process is a Good Thing (because then it can be gamed, and turned against the public sector by private enterprise). I’m not sure at all how transparency and openness relate? Or what “openness” actually applies to? The openness agenda creeps (as I guess I am proposing here in the context of “openness” around decision making) and I’m not sure that’s a good thing. I don’t think we have thought openness through and I’m not sure that it necessarily is such a Good Thing after all…
What I do think we need is more openness within organisations. Maybe that’s where self-correction can start to kick in, when the members of an organisation have access to its internal decision making procedures. Certainly this was one reason I favoured openness of OU content (eg Innovating from the Inside, Outside) – not for it to be open, per se, but because it meant I could actually discover it and make use of it, rather than it being siloed and hidden away from me in another part of the organisation, preventing me from using it elsewhere in the organisation.
Not content with selling off public services, is the government doing all it can to monetise us by means other than taxation by looking for ways of selling off aggregated data harvested from our interaction as users of public services?
For example, “Better information means better care” (door drop/junk mail flyer) goes the slogan that masks the notice that informs you of the right to opt out [how to opt out] of a system in which your care data may be sold on to commercial third parties, in a suitably anonymised form of course… (as per this, perhaps?).
The intention is presumably laudable – better health research? – but when you sell to one person you tend to sell to another… So when I saw this story – Data Broker Was Selling Lists Of Rape Victims, Alcoholics, and ‘Erectile Dysfunction Sufferers’ – I wondered whether care.data could end up going the same way?
Despite all the stories about the care.data release, I have no idea which bit of legislation covers it (thanks, reporters…not); so even if I could make sense of the legalese, I don’t actually know where to read what the legislation says the HSCIC (presumably) can do in relation to sale of care data, how much it can charge, any limits on what the data can be used for etc.
I did think there might be a clause or two in the Health and Social Care Act 2012, but if there is it didn’t jump out at me. (What am I supposed to do next? Ask a volunteer librarian? Ask my MP to help me find out which bit of law applies, and then how to interpret it, as well as game it a little to see how far the letter if not the spirit of the law could be pushed in commercially exploiting the data? Could the data make it as far as Experian, or Wonga, for example, and if so, how might it in principle be used there? Or how about in ad exchanges?)
A little more digging around the HSCIC Data flows transition model turned up some block diagrams showing how data used for commissioning could flow around, but I couldn’t find anything similar as far as sale of care.data to arbitrary third parties goes.
(That’s another reason to check the legislation – there may be a list of what sorts of company is allowed to access care.data for now, but the legislation may also use Henry VIII’th clauses or other schedule devices to define by what ministerial whim additional recipients or classes of recipient can be added to the list…)
What else? Over on the Open Knowledge Foundation blog (disclaimer: I work for the Open Knowledge Foundation’s School of Data for 1 day a week), I see a guest post from Scraperwiki’s Francis Irving/@frabcus about the UK Government Performance Platform (The best data opens itself on UK Gov’s Performance Platform). The platform reports the number of applications for tax discs over time, for example, or the claims for carer’s allowance. But these headline reports make me think: there is presumably much finer grained data below the level of these reports, presumably tied (for digital channel uptake of this services at least) to Government Gateway IDs. And to what extent is this aggregated personal data sellable? Is the release of this data any different in kind to the release of the other national statistics or personal information containing registers (such as the electoral roll) that the government publish either freely or commercially?
Time was when putting together a jigsaw of the bits and pieces of information you could find out about a person meant doing a big jigsaw with little pieces. Are we heading towards a smaller jigsaw with much bigger pieces – Google, Facebook, your mobile operator, your broadband provider, your supermarket, your government, your health service?
PS related, in the selling off stakes? Sale of mortgage style student loan book completed. Or this ill thought out (by me) post – Confused by Government Spending, Indirectly… – around government encouraging home owners to take out shared ownership deals with UK gov so it can sell that loan book off at a later date?