Opening Research Data on Your Behalf…

I note a pre-announced intention from the Justice Data Lab that they will publish “[t]ailored reports pertaining to the re-offending outcomes of services or interventions delivered by organisations who have requested information through the Justice Data Lab. Each report will be an Official Statistic.”

If you haven’t been keeping up, the Justice Data Lab is a currently free, one-year pilot scheme (started April 2013) in which “a small team from Analytical Services within the Ministry of Justice (the Justice Data Lab team) will support organisations that provide offender services by allowing them easy access to aggregate re-offending data specific to the group of people they have worked with” [User Journey].

Here’s how the user journey doc describes the process…

[Figure: Justice Data Lab process, from the User Journey document]

…and the methodology:

[Figure: Justice Data Lab methodology]

which is also described in the pre-announcement doc as follows:

Participating organisations supply the Justice Data Lab with details of the offenders who they have worked with, and information about the services they have provided. As standard the Justice Data Lab will supply aggregate one year proven re-offending rates for that group, and that of a matched control group of similar offenders. The re-offending rates for the organisation’s group and the matched control group will be compared using statistical testing to assess the impact of the organisation’s work on reducing re-offending. The results will then be returned to the organisation in a clear and easy to understand format, with explanations of the key metrics, and any caveats and limitations necessary for interpretation of the results.
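The pre-announcement doesn't spell out the statistics, although the comment thread below quotes the MoJ as saying the matching uses propensity scores. As a rough sketch of the general "match, then compare" idea, here is a minimal Python version that assumes propensity scores have already been estimated, pairs each treated offender with the nearest unused control within a caliper, and compares one-year re-offending rates with a two-proportion z-test. The `Offender` record, the helper functions and the `caliper=0.05` default are all hypothetical; the Justice Data Lab's actual matching and testing procedure may well differ (for matched pairs, a paired test such as McNemar's would arguably be more appropriate).

```python
import math
from dataclasses import dataclass


@dataclass
class Offender:
    # Hypothetical record: an ID, a propensity score assumed to be already
    # estimated from MoJ-held characteristics, and whether a proven
    # re-offence occurred within one year.
    offender_id: str
    propensity: float
    reoffended: bool


def match_controls(treated, pool, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity score, without replacement.

    Treated records with no unused control within `caliper` are dropped.
    """
    available = list(pool)  # copy so the caller's pool is not mutated
    pairs = []
    for t in treated:
        if not available:
            break
        best_i = min(range(len(available)),
                     key=lambda i: abs(available[i].propensity - t.propensity))
        if abs(available[best_i].propensity - t.propensity) <= caliper:
            pairs.append((t, available.pop(best_i)))
    return pairs


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between the proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("all or none re-offended; the z-test is undefined")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p1, p2, z, p_value


def compare_groups(treated, pool, caliper=0.05):
    """Match treated offenders to controls, then test the re-offending rates."""
    pairs = match_controls(treated, pool, caliper)
    n = len(pairs)
    if n == 0:
        raise ValueError("no treated offender could be matched")
    x_treated = sum(t.reoffended for t, _ in pairs)
    x_control = sum(c.reoffended for _, c in pairs)
    return two_proportion_z_test(x_treated, n, x_control, n)
```

With made-up counts of 30 re-offenders out of 100 in the organisation's group against 40 out of 100 in the matched group, `two_proportion_z_test(30, 100, 40, 100)` gives z of roughly -1.48 and a p-value of roughly 0.14. In other words, a ten-point difference on groups that small would not usually be read as statistically significant.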

The pre-announcement suggests that not only will participating organisations receive a copy of the report, but so will the public… The rationale:

The Justice Data Lab pilot is free at the point of service, paid for through the Ministry of Justice budget. The Ministry of Justice therefore has a duty to act transparently and openly about the outcomes of this initiative. It is anticipated that by making this information available in the public domain, organisations that work with offenders will have a greater evidence base about what works to rehabilitate offenders, and ultimately cut crime.

(Nice to see the MoJ believes in transparency. Shame that doesn’t go as far as timely spending data transparency, but I guess we can’t have it all…)

I think it’s worth taking notice of this pre-announcement for a few reasons:

– are such data release mechanisms the result of lobbying pressure? Other government departments have datalabs, such as the HMRC datalab. HMRC recently ran a consultation on the release of VAT registration information as open data (although concerns have been raised that this may just be a shortcut way of releasing company VAT registration data to credit rating agencies and their ilk…?), so it seems as if they are looking at what data they may be able to open up, and how, maybe in response to lobbying requests from corporate players who don’t want to have to (pay to) collect the data themselves…? Who might have lobbied the MoJ for the results of MoJ datalab requests to be opened up as public data, I wonder?

– are the results gameable, or might they be used as a tool to “attack” a group that is the basis of a research request? For example, can third parties request that the MoJ datalab runs an analysis on the effectiveness of a programme carried out by another party, such as, I dunno, G4S?

– the ESRC is in the process of a multi-stage funding round that will establish a range of research data centres. The first round, to establish a series of Administrative Data Research Centres, has now closed (who won?!), and the second, for Business and Local Government Data Research Centres, is currently open. (Phase three will focus on “Third Sector data and social media data”…wtf?!) To what extent might any of the funded research data centres require that summaries of analyses run using datasets they control access to are released as public open data?

Just by the by, I note here the RCUK Common Principles on Data Policy:

Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.

Institutional and project specific data management policies and plans should be in accordance with relevant standards and community best practice. Data with acknowledged long-term value should be preserved and remain accessible and usable for future research.

To enable research data to be discoverable and effectively re-used by others, sufficient metadata should be recorded and made openly available to enable other researchers to understand the research and re-use potential of the data. Published results should always include information on how to access the supporting data.

RCUK recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these are considered at all stages in the research process.

To ensure that research teams get appropriate recognition for the effort involved in collecting and analysing data, those who undertake Research Council funded work may be entitled to a limited period of privileged use of the data they have collected to enable them to publish the results of their research. The length of this period varies by research discipline and, where appropriate, is discussed further in the published policies of individual Research Councils.

In order to recognise the intellectual contributions of researchers who generate, preserve and share key research datasets, all users of research data should acknowledge the sources of their data and abide by the terms and conditions under which they are accessed.

It is appropriate to use public funds to support the management and sharing of publicly-funded research data. To maximise the research benefit which can be gained from limited budgets, the mechanisms for these activities should be both efficient and cost-effective in the use of public funds.

See also: Some Sketchnotes on a Few of My Concerns About #opendata

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

4 thoughts on “Opening Research Data on Your Behalf…”

  1. Interesting blog, enjoyed reading it, thanks.

    I’m a data analyst with the probation service and I might be able to answer some of your questions. The Justice Data Lab was set up quickly by Chris Grayling when he took over from Ken Clarke at the MoJ last year and shouldn’t be viewed outside the context of his urgent “Transforming Rehabilitation” programme of reform.

    The opening up of the market for providing probation services will be considered a risk by some and seen as an opportunity by others. With part of the new contracts intended to be “payment by results” (in this case reduced reoffending rates) potential bidders ought to be interested in finding out exactly which interventions work.

    Lots of voluntary and community sector organisations will be hoping to win sub-contracting work at least (if not able financially to bid outright for contracts), and the JDL gives them a way of showing the G4Ss and Sercos of this world that they can do a good job.

    If anyone was lobbying the MoJ for a Justice Data Lab, it was probably New Philanthropy Capital who I know were pretty instrumental in getting it set up. If anything, they represent the voluntary sector providers and social investors rather than raw corporate interests.

    On third parties making submissions, this would be impossible unless they had access to individual-level data with personal identifiers (eg name and DoB or PNC number) as the MoJ analysts need this information to match records to the Police National Computer.

    Of course you are right to point out that a ministry will be transparent when it suits them, but perhaps not when it doesn’t. They clearly don’t want to tell you their spending arrangements. To date, they have refused to publish the risk register associated with the break-up of public sector Probation Trusts. I imagine a sea of red.

    As a reoffending data analyst and sometime open data champion, I couldn’t really criticise the JDL without picking holes, but as I’m picking holes, here’s one. The results will show that people who had a particular intervention reoffended more or less than their matched comparison group. It won’t tell you anything about the other things they had going on, their homelessness, drug intake, relationships etc, nor that of the control group. Desistance from crime is a complex issue, and probably rarely down to one specific thing.

    Having said that, the publication of statistically sound “evidence” of successful interventions is a good thing. It’s not an area that has been sufficiently explored, in my opinion. So on the face of it, the JDL looks like a good thing.

    My fear is it might just be a red herring. The whole concept relies on the major probation providers paying attention to evidence and investing in the right services. And that’s the key – investment. Whichever way you look at it, the most successful interventions are not likely to be cheap. For private providers to make a profit, they will need to be sure that the investment will pay off, and be paid big enough bonuses on their contracts.

    The thing is, there really isn’t any profit in rehabilitation. It’s an inherently public service, one that should be provided by the state, for the benefit of society.

    The great irony is that for all these years, the MoJ, with its rooms full of analysts and big-foreheaded statisticians, has never really investigated what works, despite having full access to all the data they need. They’ve just published a document essentially admitting as much.

    And as they prepare to sell the service to the highest bidders, and open up the data at the same time, the likelihood is that the future providers will be more interested in cutting costs and securing a guaranteed profit, rather than investing in the services that could see reoffending fall.

    I hope I’m wrong but I expect I’m not.

  2. Hi Jason,

    Thanks for your comments – a couple in particular made me start to wonder:

    “The results will show that people who had a particular intervention reoffended more or less than their matched comparison group. It won’t tell you anything about the other things they had going on, their homelessness, drug intake, relationships etc, nor that of the control group.”

    I guess a large part of it depends on what data is used to do the matching, and on the extent to which additional factors could be linked to the identities. For example, the NHS Data Linkage and Extract Service [ http://www.hscic.gov.uk/dles ] “provide extracts from a range of individual and linked data sets [from within the NHS dataspace] and can add significant value to individual sets of data by combining and matching them at individual record level in a secure environment”.

    Various recommendations are also in place from DWP regarding local authority sharing of data “to help local authorities decide whether they can use customer data, obtained for the purpose of administering social security benefits, to help improve delivery of other locally managed services and benefits” [ eg https://www.gov.uk/government/publications/data-sharing-guidance-for-local-authorities ]; these docs are also referenced from DfE [ http://www.education.gov.uk/childrenandyoungpeople/families/childpoverty/b0066347/child-poverty-data/child-poverty-data-sharing-data-effectively ]. See also https://blog.ouseful.info/2012/12/10/all-i-did-was-go-to-a-carol-service/ for a search around the extent to which DWP can work with local authorities in fraud investigations.

    The DfE runs a National Pupil Database [ http://www.education.gov.uk/researchandstatistics/national-pupil-database ] by means of which the “Secretary of State has specific powers to share pupil data from the NPD with named bodies and third parties who require access to the data to undertake research into the educational achievements of pupils under strict terms and conditions”.

    The HMRC datalab currently only seems to “allow approved academics to access anonymised HM Revenue & Customs (HMRC) data”, but who knows where their open data strategy [ http://www.hmrc.gov.uk/transparency/implementation-plan.htm ] may take them! (For example, “As part of the Department’s Open Data Strategy, HMRC are planning to include more information within the Datalab based on requests. The VOA [Valuation Office Agency] will explore with HMRC and other stakeholders the potential for making its data sets available through this.”)

    And so on. Might we imagine a time when the various datalabs (“metadatalabs”?!) link with each other to provide a complete study? Would this be a Good Thing or a Bad Thing?

    “And as they prepare to sell the service to the highest bidders, and open up the data at the same time, the likelihood is that the future providers will be more interested in cutting costs and securing a guaranteed profit, rather than investing in the services that could see reoffending fall.”

    This also made me twitch… is the release of open data actually an invitation for business to come up with crafty analyses by which they can claim to be able to offer a more cost-effective service if only the service could be privately, rather than publicly, offered?

  3. Hi, Tony,

    Thanks for your reply. Your thoughts about linked data are particularly interesting. With reference to the Justice Data Lab, the MoJ say “a major limitation of the method is that propensity score matching can only account for offender characteristics captured by the Ministry of Justice”.

    This suggests they may not have considered a linked data approach, which might strengthen the case for any findings. For example, matching health records for drug or alcohol use could produce a more robust counterfactual. Very interesting area and room for a bit more research.

    On your second point, hopefully any analysis of public data will be subject to scrutiny, both by government and the public. As far as my area – criminal justice – is concerned, I need to be convinced the private sector cares about any kind of analysis enough to invest in it.

    If the contracts are mainly fixed service fee contracts for long periods, I imagine they will cut the service to its bare bones to lock in the safest route to profit. If, conversely, there is a large payment-by-results (PbR) slice to the contracts, based on reducing reoffending rates, then there may be an incentive to experiment and explore.

    This is the rationale you hear from government for PbR (it stimulates innovation) but in reality, the private contractors have to be happy with the contract to put in a bid, and if they don’t like the risk involved with their capacity to affect reoffending rates, they will get cold feet and look for a massive fixed fee.

    For all sorts of reasons (which is probably a blog post in its own right!), the more they look, the less they will like the idea of gambling on reoffending rates, and I would guess that the PbR component of any future contracts will be very small. Then they’ll take it if they get it, but probably won’t be “wasting” loads of time and money finding out what works.

    Bit cynical, maybe. On an upbeat note, I think you’re on to something with the linked data.

