Some Random Noticings About Data Linkage

Via a post on the blog of my colleague, and info law watchdog, Ray Corrigan – Alas medical confidentiality in the UK, we knew it well… – I note he has some concerns about the way in which the NHS data linkage service may be able to up its game as a result of the creation of the HSCIC – the Health and Social Care Information Centre – and its increasing access to data (including personal medical records?) held by GPs via the General Practice Extraction Service (GPES). (The HSCIC itself was established via legislation: Part 9 Chapter 2 of the Health and Social Care Act 2012. As I commented in The Business of Open Public Data Rolls On…, I think we need to keep a careful eye on (proposed) legislation that allows for “information of a prescribed description” to be made available to a “prescribed person” or “a person falling within a prescribed category”, where those prescriptions are left to the whim of the Minister responsible.) (Also via Ray, medConfidential has an interesting review of the HSCIC/GPES story so far.)

Something I hadn’t spotted before was the price list for data extraction and linkage services – just as interesting as the prices are the categories of service:

[Image: HSCIC data linkage services]

Here are the actual prices:

[Image: HSCIC data linkage price list]

Complexity based on time to process:

3. A request is classed as ‘simple’ if specification, production and checking are expected to take less than 5 hours.
4. A request is classed as ‘medium’ if specification, production and checking are expected to take less than 7 hours but more than 5.
5. A request is classed as ‘complex’ if specification, production and checking are expected to take less than 12 hours but more than 7.
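As a quick sketch of how those bands might be applied, here is a minimal Python function. The function name `classify_request` is my own invention, not anything HSCIC publishes. Note that the quoted definitions don't say what happens at exactly 5 or 7 hours, or at 12 hours and beyond, so the sketch returns `None` for those cases rather than guessing.

```python
def classify_request(hours):
    """Map an expected effort (in hours) to a complexity band.

    Follows the quoted definitions literally. Boundary values
    (exactly 5 or 7 hours) and anything of 12 hours or more are
    not covered by those definitions, so None is returned.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    if hours < 5:
        return "simple"
    if 5 < hours < 7:
        return "medium"
    if 7 < hours < 12:
        return "complex"
    return None  # not covered by the quoted definitions


for effort in (3, 6, 10, 5, 12):
    print(effort, classify_request(effort))
```

The gaps at the boundaries are presumably just loose wording in the price list, but they would need pinning down before anyone priced a request this way.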

Doing a little search around the notion of “data linkage”, I stumbled across what looks to be quite a major data linkage initiative going on in Scotland – the Scotland Wide data linkage framework. There seems to have been a significant consultation exercise in 2012 prior to the establishment of this framework earlier this year: Data Linkage Framework Consultation [closed] [see for example the Consultation paper on “Aims and Guiding Principles” or the Technical Consultation on the Design of the Data Sharing and Linking Service [closed]]. Perhaps mindful of the fact that there may have been, and may yet be, public concerns around the notion of data linkage, an engagement exercise and report on Public Acceptability of Cross-Sectoral Data Linkage was also carried out (August 2012). A further round of engagement looks set to occur during November 2013.

I’m not sure what the current state of the framework, or its implementation, is (maybe this FOI request on Members and minutes of meetings of Data Linkage Framework Operations Group would give some pointers?) but one component of it at least looks to be the Electronic Data Research and Innovation Service (eDRIS), a “one-stop shop for health informatics research”, apparently… Here’s the elevator pitch:

[Image: eDRIS elevator pitch]

Some examples of collaborative work are also provided:

– Linking data from NHS24 and Scottish Ambulance Service with emergency admissions and deaths data to understand unscheduled care patient pathways.
– Working with NHS Lothian to provide linked health data to support EuroHOPE – European Healthcare Outcomes, Performance and Efficiency Project
– Epidemiology, disease burden and outcomes of diverticular disease in Scotland
– Infant feeding in Scotland: Exploring the factors that influence infant feeding choices (within Glasgow) and the potential health and economic benefits of breastfeeding on child health

This got me wondering about what sorts of data linkage project things like HSCIC or the MoJ data lab (as reviewed here) might get up to. Several examples seem to be provided by the ESRC Administrative Data Liaison Service (ADLS): Summary of administrative data linkage. (For more information about the ADLS, see the Administrative Data Taskforce report Improving Access for Research and Policy.)

The ADLS itself was created as part of a three phase funding programme by the ESRC, which is currently calling for second phase bids for Business and Local Government Data Research Centres. I wonder if offering data linkage services will play a role in their remit? If so, I wonder if they will offer services along the lines of the ADLS Trusted Third Party Service (TTPS), which “provides researchers and data holding organisations a mechanism to enable the combining and enhancing of data for research to which may not have otherwise been possible because of data privacy and security concerns”? Apparently,

The [ADLS TTPS] facility is housed within a secure room within the Centre for Census and Survey Research (CCSR) at the University of Manchester, and has been audited by the Office for National Statistics. The room is only used to carry out disclosure risk assessment work and other work that requires access to identifiable data.

Another example of a secure environment for data analysis is provided by the HMRC Datalab. One thing I notice about that facility is that they don’t appear to expect researchers to use R (the software list identifies STATA 9/10/11, SAS 9.3, Microsoft Excel, Microsoft Word, SPSS Clementine 8.1/9.0/10.1/11.1/12)?

Why’s this important? Because little L, little D, linked data can provide a much richer picture than distinct data sets…
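To make that concrete, here is a toy sketch of linkage as a simple join on a shared identifier. Everything in it is made up for illustration: the two datasets, the `person_id` key and the `link_on_key` helper are my own assumptions, and real linkage services tend to rely on more careful (often probabilistic or pseudonymised) matching than an exact key match.

```python
# Two toy datasets, entirely invented for illustration.
ambulance_calls = [
    {"person_id": "p1", "call_date": "2013-03-01"},
    {"person_id": "p2", "call_date": "2013-03-04"},
]
admissions = [
    {"person_id": "p1", "admitted": "2013-03-01", "ward": "A&E"},
    {"person_id": "p3", "admitted": "2013-03-02", "ward": "Cardiology"},
]


def link_on_key(left, right, key):
    """Exact-match inner join of two lists of dicts on a shared key."""
    # Index the right-hand records by key so each lookup is cheap.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    linked = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge the two records into one richer record.
            linked.append({**row, **match})
    return linked


linked = link_on_key(ambulance_calls, admissions, "person_id")
for record in linked:
    print(record)
```

Neither dataset on its own says that the person who called an ambulance on a given day also ended up in a particular ward; the linked record does, which is both the appeal and the worry.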

PS see also mosaic theory

PPS reminded by @wilm, here’s a “nice” example of data linkage from the New York Times… N.S.A. Gathers Data on Social Connections of U.S. Citizens.

PPPS and from the midata Innovation Lab, I notice this claim:

On the 4th of July 2013 we opened the midata Innovation Lab (mIL), on what we call “UK Consumer Independence Day”. So what is it? It’s the UK Government, leading UK companies and authoritative bodies collaborating on data services innovation and consumer protection for a data-driven future. We’ve put together the world’s fastest-built data innovation lab, creating the world’s most interesting and varied datasets, for the UK’s best brands and developers to work with.

The mIL is an accelerator for business to use a rich dataset to create new services for consumers. Designed in conjunction with innovative “Founding Partner” businesses, it also has oversight from authoritative bodies so we can create the world’s best consumer protection in the emerging personal data ecosystem.

The unique value of the lab is its ability to offer a unique dataset and consumer insight that it would be difficult for any one organization to collate. With expert input from authoritative consumer protection bodies, we can test and learn how to both empower and protect consumers in the emerging personal data ecosystem.

And this: “The personal data that we have asked for is focused on a few key areas: personal information including vehicle and property data, transactional banking and credit records, mobile, phone, broadband and TV billing information and utility bills.” It seems that data was collected from 50 individuals to start with.


