Opening Research Data on Your Behalf…

I note a pre-announced intention from the Justice Data Lab that they will publish “[t]ailored reports pertaining to the re-offending outcomes of services or interventions delivered by organisations who have requested information through the Justice Data Lab. Each report will be an Official Statistic.”

If you haven’t been keeping up, the Justice Data Lab is a currently free, one-year pilot scheme (started April 2013) in which “a small team from Analytical Services within the Ministry of Justice (the Justice Data Lab team) will support organisations that provide offender services by allowing them easy access to aggregate re-offending data specific to the group of people they have worked with” [User Journey].

Here’s how the user journey doc describes the process…

[Figure: data lab process]

…and the methodology:

[Figure: datalab methodology]

which is also described in the pre-announcement doc as follows:

Participating organisations supply the Justice Data Lab with details of the offenders who they have worked with, and information about the services they have provided. As standard the Justice Data Lab will supply aggregate one year proven re-offending rates for that group, and that of a matched control group of similar offenders. The re-offending rates for the organisation’s group and the matched control group will be compared using statistical testing to assess the impact of the organisation’s work on reducing re-offending. The results will then be returned to the organisation in a clear and easy to understand format, with explanations of the key metrics, and any caveats and limitations necessary for interpretation of the results.
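
As a rough illustration of the kind of comparison described there: the pre-announcement does not say which statistical test the Justice Data Lab uses, so the sketch below assumes a simple two-proportion z-test, which is one common way of comparing two rates. The function name and the example counts are made up for illustration and are not Justice Data Lab figures.

```python
import math


def compare_reoffending_rates(treated_reoffenders, treated_total,
                              control_reoffenders, control_total):
    """Two-proportion z-test (an assumed method, not necessarily the MoJ's)."""
    treated_rate = treated_reoffenders / treated_total
    control_rate = control_reoffenders / control_total

    # Pooled rate under the null hypothesis that both groups re-offend equally
    pooled = (treated_reoffenders + control_reoffenders) / (treated_total + control_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / treated_total + 1 / control_total))

    z = (treated_rate - control_rate) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return treated_rate, control_rate, z, p_value


# Hypothetical counts, purely for illustration
treated_rate, control_rate, z, p_value = compare_reoffending_rates(30, 100, 40, 100)
print(f"treated: {treated_rate:.1%}, control: {control_rate:.1%}, z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value would suggest the difference between the two groups is unlikely to be down to chance alone, though it says nothing about how well the control group was matched in the first place.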

The pre-announcement suggests that participating organisations will not only receive a copy of the report, but so will the public… The rationale:

The Justice Data Lab pilot is free at the point of service, paid for through the Ministry of Justice budget. The Ministry of Justice therefore has a duty to act transparently and openly about the outcomes of this initiative. It is anticipated that by making this information available in the public domain, organisations that work with offenders will have a greater evidence base about what works to rehabilitate offenders, and ultimately cut crime.

(Nice to see the MoJ believes in transparency. Shame that doesn’t go as far as timely spending data transparency, but I guess we can’t have it all…)

I think it’s worth taking notice of this pre-announcement for a few reasons:

– are such data release mechanisms the result of lobbying pressure? Other government departments have datalabs, such as the HMRC datalab. HMRC recently ran a consultation on the release of VAT registration information as opendata (although concerns have been raised that this may just be a shortcut way of releasing company VAT registration data to credit rating agencies and their ilk…?), so it seems as if they are looking at what data they may be able to open up, and how, maybe in response to lobbying requests from corporate players who don’t want to have to (pay to) collect the data themselves…? Who might have lobbied the MoJ for the results of MoJ datalab requests to be opened up as public data, I wonder?

– are the results gameable, or might they be used as a tool to “attack” a group that is the basis of a research request? For example, can third parties request that the MoJ datalab runs an analysis on the effectiveness of a programme carried out by another party, such as, I dunno, G4S?

– the ESRC is in the process of a multi-stage funding round that will establish a range of research data centres. The first round, to establish a series of Administrative Data Research Centres has now closed (who won?!) and the second – for Business and Local Government Data Research Centres – is currently open. (Phase three will focus on “Third Sector data and social media data”…wtf?!) To what extent might any of the funded research data centres require that summaries of analyses run using datasets they control access to are released as public open data?

Just by the by, I note here the RCUK Common Principles on Data Policy:

Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.

Institutional and project specific data management policies and plans should be in accordance with relevant standards and community best practice. Data with acknowledged long-term value should be preserved and remain accessible and usable for future research.

To enable research data to be discoverable and effectively re-used by others, sufficient metadata should be recorded and made openly available to enable other researchers to understand the research and re-use potential of the data. Published results should always include information on how to access the supporting data.

RCUK recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these are considered at all stages in the research process.

To ensure that research teams get appropriate recognition for the effort involved in collecting and analysing data, those who undertake Research Council funded work may be entitled to a limited period of privileged use of the data they have collected to enable them to publish the results of their research. The length of this period varies by research discipline and, where appropriate, is discussed further in the published policies of individual Research Councils.

In order to recognise the intellectual contributions of researchers who generate, preserve and share key research datasets, all users of research data should acknowledge the sources of their data and abide by the terms and conditions under which they are accessed.

It is appropriate to use public funds to support the management and sharing of publicly-funded research data. To maximise the research benefit which can be gained from limited budgets, the mechanisms for these activities should be both efficient and cost-effective in the use of public funds.

See also: Some Sketchnotes on a Few of My Concerns About #opendata

The Business of Open Public Data Rolls On…

A few more bits and pieces around the possible distribution and application of open public data (that is, openly licensed data released by public bodies):

  • Bills before Parliament – Education (Information Sharing) Bill 2013-14: although this is a private member’s bill, explanatory notes have been prepared by the Department for Education. The bill allows for “student information of a prescribed description” to be made available to a “prescribed person” or “a person falling within a prescribed category”. If the bill goes through, keeping tabs on these prescriptions will be key to seeing how this might play out.

    As mentioned in my Rambling Round-Up of Some Recent #OpenData Notices from August, the HMRC is consulting on opening up access to VAT records. And through the post this week, I received a letter from the NHS regarding the sharing of data within the NHS via Summary Care Records, although this appears to be more to do with data sharing within the NHS on a case-by-case basis, rather than sharing of bulk datasets for analysis/research and/or business development. So outbreaks of planned sharing are appearing all over the place. I’m not sure what the best way of tracking such initiatives is though?

    I haven’t really been tracking private members’ bills either (except the Supermarket Pricing Information Bill 2012-13 that never went anywhere!), and I’m not really sure what they signal, but some of them do make me a bit twitchy. Like the currently proposed Collection of Nationality Data Bill that will “require the collection and publication of information relating to the nationality of those in receipt of benefits and of those to whom national insurance numbers are issued.” Or the Face Coverings (Prohibition) Bill 2013-14, whereby “a person wearing a garment or other object intended by the wearer as its primary purpose to obscure the face in a public place shall be guilty of an offence.” As discussions regarding privacy and anonymity on the web ebb and flow, it’s interesting to see how they’re tracked “IRL”. If a space is public, do you have any right to privacy or anonymity?

  • ESRC Pre-call: Business and Local Government Data Research Centres – Big Data Network Phase 2:

    The ESRC’s Big Data Network will support the development of a network of innovative investments which will strengthen the UK’s competitive advantage in Big Data. The core aim of this network is to facilitate access to different types of data and thereby stimulate innovative research and develop new methods to undertake that research. This network has been divided into three phases.

    • In Phase 1 of the Big Data Network the ESRC has invested in the development of the Administrative Data Research Network (ADRN) which will provide access to de-identified administrative data collected by government departments for research use
    • Phase 2, which is the focus of this pre-announcement, will focus primarily on business data and local government data
    • Phase 3, further details of which will be released in the Autumn, will focus primarily on third sector data and social media data
  • Progress continues on the smart meter roll out program, with huge chunks of money being lined up for a few lucky companies (Government Selects Favourites For The Smart Meter Roll-Out). See also the Energy and Climate Change Select Committee inquiry – “Smart meter roll-out” and their Smart meter roll out report. Whilst the drivers are presumably supposedly related to more efficient energy management, there are plenty of surveillance opportunities arising! Whilst not public data, as such, the availability (and sharing with data aggregators) of smart meter data does form part of the government’s #midata programme (around which the current strategy appears to be “the less said the better”…)
  • Maybe of interest to hardcore openspending data geeks, Local Audit and Accountability Bill 2013-14 has made its way from the Lords into the Commons. Schedule 9 introduces regulations around data matching, described as “an exercise involving the comparison of sets of data to determine how far they match (including the identification of any patterns and trends)”, although “data matching exercise[s] may not be used to identify patterns and trends in an individual’s characteristics or behaviour which suggest nothing more than the individual’s potential to commit fraud in the future”. A code of practice is also required. The power “is exercisable for the purpose of assisting in the prevention and detection of fraud” although the schedule may be amended in order to assist: “a) in the prevention and detection of crime (other than fraud), (b) in the apprehension and prosecution of offenders, and (c) in the recovery of debt owing to public bodies”.

    Schedule 11 covers the Disclosure of Information. Where an auditor obtains information from a public body “[a] local auditor, or a person acting on the auditor’s behalf, may also disclose information to which this Schedule applies except where the disclosure would, or would be likely to, prejudice the effective performance of a function imposed or conferred on the auditor by or under an enactment”. I’m not sure to what extent such information might be requestable from the local auditor though?

I have to admit, I’m losing track of all these data and information related laws. And I guess I should also admit that I don’t really understand what any of them actually mean, either…!;-)

Nudging Up the Price of Parking

When faced with a car parking charge of £1.90 and a “no change” ticket machine, how much do we actually end up paying?

A recent report on English Local Authority Parking Finances by the RAC Foundation reviews the surpluses made by local councils when comparing the revenue they generate from local parking and traffic enforcement notice charges and the costs associated with providing those services. Across all the English councils, it seems to amount to £565 million for the most recently reported on period, the financial year 2011-2012. From the reported figures, income of £1,371 million is generated with costs of £806 million and a surplus of £565 million, a gross margin of 41.2%.

The gross margin is calculated by dividing the difference between income and costs – that is, the surplus – by the total income. The gross margin is essentially the percentage of the income that we can consider to be “profit”. In this case, gross margin = (1371 - 806)/1371 * 100% = 41.2%

Presumably in an attempt to make a better story for unwary journalists making back of the envelope percentage calculations, the report describes how councils “collect around £1.4 billion [rounding up from £1,371 million] from parking tickets, permits and penalties, spend around £0.8 billion [rounding down, slightly, from £806 million] and make a surplus of £0.6 billion [rounding up from £565 million]”. The gross margin calculation using these numbers is 0.6/1.4 * 100% = 42.86%, which we might typically round up to 43%, compared to the proper rounding of the original amount, which would be 41%.
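
The effect of rounding before dividing is easy enough to check. The snippet below is just a sketch using the figures quoted above; the function name is my own.

```python
def gross_margin_pct(income, costs):
    """Surplus (income minus costs) as a percentage of income."""
    return (income - costs) / income * 100


# Figures as reported, in £ millions
exact = gross_margin_pct(1371, 806)

# Headline figures after rounding to the nearest £0.1 billion:
# the quoted surplus of 0.6 divided by the quoted income of 1.4
rounded = 0.6 / 1.4 * 100

print(f"unrounded: {exact:.1f}%  rounded: {rounded:.2f}%")
```

Rounding the inputs first adds a little over one and a half percentage points to the margin.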

41% is still a great rate of return, of course! But is it fair? In written evidence to the current House of Commons Transport Select Committee on local authority parking enforcement, the RAC Foundation noted that “There is evidence that official guidance to TMA 2004 [Operational Guidance to Local Authorities: Parking Policy and Enforcement] on parking charges is not strictly adhered to, and that councils set parking charges with the likelihood of them realising a surplus. It should be clear to all local authorities that they have no legal powers to set parking charges at a higher level than that needed to achieve the objective of relieving or preventing congestion of traffic.”

Referring to the guidance itself, we see that setting the price of parking is something of a dark art that can use consumer psychology to influence behaviour in support of a particular transport policy.

4.8 When setting charges, authorities should consider the following factors:

  • parking charges can help to curb unnecessary car use where there is adequate public transport or walking or cycling are realistic alternatives, for example, in town centres;
  • charges can reflect the value of kerb-space, encouraging all but short-term parking to take place in nearby off-street car parks where available. This implies a hierarchy of charges within a local authority area, so that charges at a prime parking space in a busy town centre would normally be higher than those either at nearby off-street car parks or at designated places in more distant residential areas. Such hierarchies should be as simple as practicable and applied consistently so that charge levels are readily understandable and acceptable to both regular and occasional users;
  • charges should be set at levels that encourage compliance with parking restrictions. If charges are set too high they could encourage drivers to risk non-compliance or to park in unsuitable areas, possibly in contravention of parking restrictions. In certain cases they could encourage motorists to park in a neighbouring local authority area which may not have the capacity to handle the extra vehicles. In commercial districts this may have a negative impact on business in the area; and
  • if on-street charges are set too low, they could attract higher levels of traffic than are desirable. They could discourage the use of off-street car parks and cause the demand for parking spaces to exceed supply, so that drivers have to spend longer finding a vacant space.

Balancing these policy objectives against claims that the level of surplus being generated is unfair is something that each council needs to justify to its own constituents. When making such a justification, it would seem likely that representation could be made on several different levels – by considering overall revenues, costs and surpluses; by looking at the occupancy volumes or rates for different car parking spaces; or at the level of actual car parking tariffs (that is, how much it costs to park for an hour in a particular location).

Most of us feel the pain at the everyday level, of course, when it actually comes to finding and paying for car parking. But are we paying more than we need to, nudged into contributing to additional surpluses over and above what a quick calculation based on parking volumes and tariffs (that is, charges for parking) might suggest is the “planned” surplus? I thought I’d put my data sleuth hat on to try and find out how much extra money could be made by not providing change…

Take my local council, for example, on the Isle of Wight. The main civic car park in the charming harbour town of Yarmouth has a range of ticket prices, including a £1.90 rate for stays between one and two hours, and a £3.40 rate for durations between two and four hours. The two ticket machines are both cash based and don’t offer change. Many retailers know that pricing goods at £something.99 helps encourage sales, although how psychological pricing tricks like this actually work is still open to debate. (For more on the psychology of pricing, see the OFT commissioned report on Pricing Practices: Their Effects on Consumer Behaviour and Welfare.) In a “no change” payment setting, might we use related psychological tricks in association with the value of our coinage (1p, 2p, 5p, 10p, 20p, 50p, £1) to apparently set one price, which we must defend, whilst on average expecting the payment of a larger amount? That is, might we choose a £1.90 price point in the expectation that we might actually make £2 on many of the transactions?

Using data acquired via a Freedom of Information request, I asked the Isle of Wight council for the number of tickets issued within each price band for the Yarmouth town car park during 2012/13, along with the revenue generated by each of the two ticket machines. Using this information, we can calculate how much additional revenue is generated for each price band based on overpayments:


In the grander scheme of things, this doesn’t amount to a huge sum of money (the total overpayments come to £2272.15, or 1.7% of the total revenue), though it must be remembered that this refers to just a single car park in a single local council area.

If we look at the raw data that details the actual payment made for each ticket issued by the ticket machines at the £1.90 tariff level, we can see how many people actually overpay:

Actual Payment (£)    Count
1.90                  10237
1.95                  19
2.00                  7734
2.05                  6
2.10                  16
2.15                  1
2.20                  16
2.30                  7
2.35                  1
2.40                  22
2.50                  39
2.55                  1
2.60                  7
2.70                  7
2.75                  1
2.80                  2
2.90                  11
2.95                  1
3.00                  134
3.05                  2
3.10                  3
3.20                  18
3.30                  10
3.35                  3

Of the 18,298 tickets issued at the £1.90 level for the Yarmouth town car park during financial year 2012/13, it would appear that over 40% of the tickets issued generated £2 in revenue, presumably because drivers didn’t have the exact change to hand.
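
For anyone wanting to repeat the sums, here is a minimal sketch that works from the payment counts in the table above. Amounts are held in pence to avoid floating point rounding, and the variable names are my own.

```python
TARIFF_PENCE = 190

# Actual payment in pence -> number of tickets, copied from the table above
payment_counts = {
    190: 10237, 195: 19, 200: 7734, 205: 6, 210: 16, 215: 1,
    220: 16, 230: 7, 235: 1, 240: 22, 250: 39, 255: 1,
    260: 7, 270: 7, 275: 1, 280: 2, 290: 11, 295: 1,
    300: 134, 305: 2, 310: 3, 320: 18, 330: 10, 335: 3,
}

total_tickets = sum(payment_counts.values())
overpaid_tickets = sum(n for paid, n in payment_counts.items() if paid > TARIFF_PENCE)
overpayment_pence = sum((paid - TARIFF_PENCE) * n for paid, n in payment_counts.items())
share_paying_two_pounds = payment_counts[200] / total_tickets

print(f"tickets issued: {total_tickets}")
print(f"tickets overpaid: {overpaid_tickets} ({overpaid_tickets / total_tickets:.1%})")
print(f"paid exactly £2: {share_paying_two_pounds:.1%}")
print(f"overpayment at this tariff: £{overpayment_pence / 100:.2f}")
```

The same approach would work for any other tariff band, given the per-payment counts for that band.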

Whilst it would be easy enough to exclaim “We can only guess at how much extra money English councils raise in this way”, that’s not strictly true. We could find out exactly by making FOI requests to them all…

Investigations such as this often raise more questions than they answer. For example: what parking tariff bands does your local council use? How much overpayment are you “happy” to make for your car parking ticket? If there were increases in charges from an amount such as £1.40 to £1.60, what might that have done for actual revenues raised within that price band? If you start exploring this topic in your local area, please let me know via the comments:-)

PS see also this Telegraph article on Academic finds link between parking tickets and wardens’ overtime.

Some Sketchnotes on a Few of My Concerns About #opendata

With my growing unease about just what the agenda driving open government/public data is, I think I’m going to have to find some time away to walk the dog lots, and mull over what pieces might be part of the jigsaw, as well as having a go at trying to put some of them together…

Near the top of the list is a concern about information asymmetry and how open data may be used by private concerns to provide a one-off advantage for them when it comes to poaching services from the public sector. How so? My gut reaction thinking is this: if, as part of the procurement process, the private sector can use open public data to help it secure a contract in competition with a public sector provider, then when contracts come to renewal the public sector may know less when it comes to bid than the private sector company was able to learn when it first tendered. The question here is: does open public data put private sector companies at an advantage when bidding for public service contracts against an incumbent public provider, compared to a public body bidding to recapture a service from an incumbent private provider, given that the private provider may not be required to open up information (for example, through FOI requests, transparency or public reporting obligations) in the same way that a public body is?

Another take on a similar theme is the extent to which there may be a loss of transparency when a service goes from a public to a private provider. If we think there is some benefit to be had from transparency in general terms, then private providers of public services should have the same openness requirements placed on them as the public body. If private companies can claim revealing information is against their commercial interest, can public bodies make the same claims on exactly the same terms under FOI exemption rules, for example (eg MoJ Freedom of information guidance: Exemptions guidance – Section 43: Commercial interests)?

Taking the NHS as a case example, here are a few things on my reading list:

  • Monitor report from March 2013 on A fair playing field for the benefit of NHS patients [actual report]. For example, the report identified the following distortions:

    1. Participation distortions. Some providers are directly or indirectly excluded from offering their services to NHS patients for reasons other than quality or efficiency. Restrictions on participation disadvantage providers seeking to expand into new services or new areas, regardless of whether the providers are public, charitable or private. Participation distortions disadvantage nonincumbent providers of every type.
    2. Cost distortions. Some types of provider face externally imposed costs that do not fall on other providers. On balance, cost distortions mostly disadvantage charitable and private health care providers compared to public providers.
    3. Flexibility distortions. Some providers’ ability to adapt their services to the changing needs of patients and commissioners is constrained by factors outside their control. These flexibility distortions mostly disadvantage public sector providers compared to other types.

    I’m not sure to what extent, if any, the report reviews distortions and asymmetries arising from open data issues.
    A search of the report for mentions of FOI turns up:

    Provider transparency
    Historically, public providers have faced higher levels of scrutiny than other providers, including requests for information under the Freedom of Information Act. This degree of scrutiny can improve accountability to patients and promote good practice. Freedom of Information requirements have been extended through the standard NHS contract to private and charitable providers. However, it is not clear that this is operating effectively as yet, and other aspects of transparency do not apply across all types of provider.
    29. The Government and commissioners should ensure that transparency, including Freedom of Information requirements, is implemented across all types of provider of NHS services on a consistent basis.

    As I said, it’s on the reading list…

  • A terrifying post on the Computer Weekly/Public Sector IT blog – NHS watchdog commandeers data in bid to stimulate privatization and an earlier one on the naive take on hospital mortality data: Data regime makes merciless start on NHS privatization. Are there any reports or strategy documents from the Care Quality Commission (CQC) I need to add to my reading list?
  • Something academic… such as this piece from the Proceedings of the 21st European Conference on Information Systems on The Generative Mechanisms of Open Government Data, much of which I suspect is summarised by these two figures taken (without permission) from the paper:

    [Figure: generative mechanisms open gov data]


  • Opening up data (particularly data held by public bodies) around private companies is another area I can’t quite get my head round, particularly when it comes to comparing information about the machinations of private companies as compared to public bodies. To what extent should data held by public bodies about companies that are public and limited liability be openly available, for example? Maybe related to this is a currently open BIS consultation: Company ownership: transparency and trust discussion paper as well as the HMRC consultation on Sharing and publishing data for public benefit (press release) that I linked to from yesterday’s Rambling Round-Up of Some Recent #OpenData Notices. (OpenCorporate’s Chris Taggart posts some interesting thoughts on the sheen given to the proposed release of VAT registration data to credit agencies that the consultation is in part based around: Open tax data, or just VAT ‘open wash’.) A recent edition of File on 4 (h/t @onthewight) on charity based tax fraud – Faith, Hope and… Tax Avoidance – also got me wondering further about what information is openly available about charities’ activities (eg A Quick Peek at Some Charities Data…)?
  • One to dig for… via Lexology, a post on Freedom of information in the private sector? which claims that “The Confederation of British Industry (“CBI“) has revealed that it is developing ‘transparency guidelines’ that will apply to private companies that provide services to the NHS.” Have these appeared yet, even in draft or consultation form?

A few other things on my to-do list in this area: map out the lobbyists and board/panel members around open data; use disclosure logs to search for companies putting in FOI requests in different sectors; see who’s pitching ideas in to ODUG; map out who’s funding NGOs and activities in the opendata space.

Sigh… no time…;-)

PS Not sure if there is a full paper version of this…? Bates, J. 2013, Information policy in the crises of neoliberalism: the case of Open Government Data in the UK at International Association of Media and Communications Researchers Conference, Dublin, June 2013: “Whilst open data releases by the UK government have received substantial support within UK civil society, often being interpreted as a creative and innovative response to a range of social issues, and, for some, a radical challenge to key components of neoliberal capitalism, this paper argues that deeper analysis of the OGD initiative suggests that it is being shaped by the UK government and corporate interests in an attempt to leverage a distinctly neoliberal agenda. The adoption and development of the OGD agenda as core to the policy response adopted by the UK Government to conditions of political economic crisis, suggests that information policy is being implemented as a key, yet often opaque, element of the neoliberal policy toolbox.” See also an earlier paper, “This is what modern deregulation looks like”: Co-optation and contestation in the shaping of the UK’s Open Government Data Initiative (“whilst OGD [Open Government Data] might potentially support modes of transparent and democratic governance, the current ‘transparency agenda’ should be recognised as an initiative that also aims to enable the marketisation of public services, and this is something that is not readily apparent to the general observer.”) and a statement of Jo Bates’ current research project in the area: The politics of Open Government Data in the UK

PPS Supporting the idea of symmetry in reporting between public services and private companies delivering public services, Richard Murphy on Making public services accountable. And some excellent writings critiquing computational thinking and the teaching of code by Ben Williamson.

Rambling Round-Up of Some Recent #OpenData Notices

It’s been some time since I (b)logged recent reports and announcements relating to the ongoing evolution of the open data thang in the UK, so here’s a quick round-up of some of the things I have floating in my open tabs…

  • Secretary of State’s Code of Practice (datasets) on the discharge of public authorities’ functions under Part 1 of the Freedom of Information Act – I guess this is the big one, the latest code of practice relating to the release of datasets under FOI. (Owen Boswarva has also compared it to the consulted upon draft.) The ICO give a quick overview as well as a specialist guidance on Datasets (FOI sections 11, 19 & 45). A sceptic might say it looks like FOIable bodies have also been given the wherewithal to set up their own data trading funds, enabled by The Freedom of Information (Release of Datasets for Re-use) (Fees) Regulations 2013. In passing, this looks like a handy place to catch up on FOI round-ups.
  • Via Out-Law, HMRC consults on plans to release anonymised tax datasets, I notice that HMRC has a new consultation out: Sharing and publishing data for public benefit. Apparently, [t]his consultation brings forward three options:
    • wider sharing of aggregated and anonymised tax data, for example, for the purposes of research or policy development;
    • release of basic non-financial VAT registration data as public data; and
    • sharing more detailed VAT registration data on a more restricted and controlled basis for specific purposes, such as credit referencing.

    At first glance, section 2.4 read to me like the hatchets are out on the NHS and the marshalling of resources to drive its privatisation continues ever onwards. From the consultation, I noticed that a Tax Sector Transparency Board was set up at the end of last year, which brings the number of sector transparency/data boards to about 437, I think? (Try searching for site:gov.uk inurl:sector-transparency-board.)

    In passing, I also note this response to an FOI request from 2010 in relation to accessing company VAT number data:

    I believe that disclosure of a complete list of VAT numbers currently in use would be likely to prejudice the prevention or detection of crime and the assessment or collection of VAT. I have reached this conclusion as I believe that the requested information could be used by opportunistic individuals and fraudsters to hijack genuine VAT numbers in order to fraudulently present themselves to HMRC, to other traders or to prospective customers as VAT registered. VAT is charged when a VAT-registered business sells to either another business or to a non-business customer. When VAT-registered businesses buy goods or services they can generally reclaim the VAT they have paid. If fraudsters are able to charge or reclaim VAT when they are not entitled to do so, then this will result in loss to the Public Purse and to members of the public who fall victim to such fraud.

    Section 31 is a qualified exemption which means that, if it applies, I must consider whether it is the public interest to override the exemption and release the information. I have very carefully considered this but have decided that on balance it is not in the public interest to release this information.

  • The ICO is also in consulting mode, running a Consultation on the “Conducting privacy impact assessments” code of practice. The consultation isn’t posted on a page with a sensible URL though, it’s linked via the “current consultations” page, so if you’re reading this a month or two after the time of writing, you’ll probably need to look in the closed consultations area of the site. Go figure…
  • I had a little play with FOI myself recently. G4S was in the news again in a minor scandal about overcharging the Ministry of Justice on tagging contracts. I thought I’d have a peek at the MoJ spending data with respect to G4S, but they’ve been slipping in their transparency duties, so I felt obliged to FOI the spending data. No reply as yet – and I’m not sure if the data has gone up via their transparency pages yet, either?
  • I’ve recently started picking up on the creation of research panels and government department data labs. For example, the HMRC datalab and the more recent Justice data lab, which looks like an interesting resource for charities and other agencies working in the justice sector who need to demonstrate impact… HEFCE are also trying to open up access, sort of, to student survey data by means of the National Student Survey research panel. I suspect that the NHS (and the DfE, eg via the National Pupil Database) have data access initiatives, as well as data linkage services? For example, the Linked Hospital Episode Statistics and Mental Health Minimum Data Set. In the academic health research area, see also the Expert Advisory Group on Data Access.
  • A handful of recent reports on how open data is perceived and being used: from Sciencewise, a June 2013 report on Public views on open data; from the Department for Work and Pensions (DWP), a couple of brief reports on “how DWP uses transparency and open data to improve public services and accountability”. Alternatively, for a few thousand dollars, you can get a Forrester Research report on Getting The Most Out Of Open Data. One of the opendata “success” stories I heard championed most recently was by Sir Nigel Shadbolt at the Guardian Activate Summit. Apparently, the release of spending data has resulted in the “success” of private companies selling procurement advice based on an analysis of the data back to the public bodies, though I don’t think any specifics were mentioned. Are there any papers out there looking at how open data is being used to drive privatisation and destroy public services, I wonder?
  • More research centre initiatives, from another report that I missed when it came out in December 2012 – The UK Administrative Data Research Network: Improving Access for Research and Policy. It would be interesting to see how the models proposed in this report compare to the structures used by the government datalabs?

And finally, an even older report I’d not picked up on before. From the Audit Commission in March 2010, a discussion paper: The Truth is Out There – Transparency in an Information Age. I keep meaning to do a history of UK open government data over the last few years, so checkpoints like this are interesting when it comes to logging the hopes and aspirations, as well as the claims that were being made in support of developing policy, from back in the day. Also on the to do list is a post about how I’m increasingly uncomfortable with the whole open data thing, and what motivations are actually driving it at the policy (and lobbyist) level…

Publishing Stats for Analytic Reuse – FAOStat Website and R Package

How can stats and data publishers, from NGOs and (inter)national statistics agencies to scientific researchers, publish their data in a way that supports its analysis directly, as well as in combination with other datasets?

Here’s one approach I learned about from Michael Kao of the UN Food and Agriculture Organisation statistics division, FAOStat.

At first glimpse, the FAOStat website offers a rich website that supports data downloads, previews and simple analysis tools around a wide variety of international food related datasets:

FAOStat website

FAOstat - graphical tools

faostat - inline data preview

FAOStat - data analysis

One problem with having so many controls and fields available is that it can be hard to know where (or how) to get started – a bit like the problem of being presented with an empty SPARQL query box…

It would be quite handy to be able to set – and save with meaningful labels – preference sets about the countries you’re interested in, so you don’t have to keep scrolling through long country lists looking for the countries you want to generate reports for. (Support for “standard” groupings of countries might also be useful?) Being able to share URLs to predefined reports might also be handy? But this would possibly make the site even more complex to use!

One easier way of working with FAOStat data, particularly if you access the FAO datasets regularly, might be to take a programmatic route using the FAOStat R package. Making datasets available in ways that bring that data directly into a desktop analysis environment where they can be worked on without requiring cleaning or other forms of tidying up (which is often the case when data is made available via Excel spreadsheets or CSV files) is a trend I hope we see more of. (That is not to say that data shouldn’t also be published in “generic” document formats…). If you are using a reproducible research strategy, queries to original datasources provide implicit, self-describing metadata about the data source and the query used to return a particular dataset, metadata that is all too easy to lose, or otherwise detach from a dataset, when working with downloaded files.
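
To make the point about self-describing metadata a little more concrete, here’s a minimal Python sketch of keeping a record of the query alongside the rows it returned. This is not how the FAOStat R package works, just an illustration of the idea: the with_provenance function, the example.org URL, the query fields and the hand-typed rows are all my own made-up placeholders.

```python
import hashlib
import json
from datetime import datetime, timezone


def with_provenance(source, query, rows):
    """Bundle returned rows with a record of the query that produced them."""
    # Hash a canonical JSON form so identical data always gives the same digest.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return {
        "provenance": {
            "source": source,
            "query": query,
            "retrieved_at": datetime.now(timezone.utc).isoformat(),
            "sha256": hashlib.sha256(payload).hexdigest(),
        },
        "rows": rows,
    }


# Hypothetical query and hand-typed rows standing in for an API response.
query = {"indicator": "wheat_production", "countries": ["FRA", "KEN"], "year": 2010}
rows = [
    {"country": "FRA", "year": 2010, "value": 100},
    {"country": "KEN", "year": 2010, "value": 5},
]

bundle = with_provenance("https://stats.example.org/api", query, rows)
print(json.dumps(bundle["provenance"], indent=2))
```

If the bundle is what gets saved, rather than a bare CSV file, the query travels with the data, and the hash gives a cheap way of checking whether a later re-run of the same query returned the same thing.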

I haven’t had a chance to play with this package yet – it’s still in testing anyway, I think? – but it looks quite handy at a first glance (I need to do a proper review…). As well as providing a way of running data grab queries over the FAO FAOSTAT and World Bank WDI APIs, it seems to provide support for “linkage”. As the draft vignette suggests, “Merge is a typical data manipulation step in daily work yet a non-trivial exercise especially when working with different data sources. The built in mergeSYB function enables one to merge data from different sources as long as the country coding system is identified. … Data from any source with [a] classification [supported by the package] can be supplied to mergeSYB in order to obtain a single merged data. (sic)”. Supported formats currently include: United Nations M49 country standard [UN_CODE]; FAO country code scheme [FAOST_CODE]; FAO Global Administrative Unit Layers (GAUL) [ADM0_CODE]; ISO 3166-1 alpha-2 [ISO2_CODE]; ISO 3166-1 alpha-2 (World Bank) [ISO2_WB_CODE]; ISO 3166-1 alpha-3 [ISO3_CODE]; ISO 3166-1 alpha-3 (World Bank) [ISO3_WB_CODE].
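
To get a feel for what this sort of “linkage” involves, here’s a minimal Python sketch of joining two tables that identify countries using different coding schemes. It is not the package’s mergeSYB function, nor a guess at how that function is implemented. The merge_by_country helper and the lookup table are my own assumptions, the column names just echo the labels listed above, the data values are made up, and the codes should be checked against the official lists before being relied on.

```python
# Hypothetical lookup between two country coding schemes (illustrative codes).
COUNTRY_CODES = [
    {"FAOST_CODE": 68, "ISO3_CODE": "FRA"},
    {"FAOST_CODE": 79, "ISO3_CODE": "DEU"},
    {"FAOST_CODE": 114, "ISO3_CODE": "KEN"},
]


def merge_by_country(left, left_scheme, right, right_scheme, lookup=COUNTRY_CODES):
    """Inner-join two lists of row dicts keyed by different country code schemes."""
    # Translate codes in the right-hand scheme into the left-hand scheme.
    translate = {row[right_scheme]: row[left_scheme] for row in lookup}
    right_by_code = {}
    for row in right:
        code = translate.get(row[right_scheme])
        if code is not None:
            # Drop the right-hand code column; the left-hand one is kept.
            right_by_code[code] = {k: v for k, v in row.items() if k != right_scheme}
    merged = []
    for row in left:
        match = right_by_code.get(row[left_scheme])
        if match is not None:
            merged.append({**row, **match})
    return merged


# Made-up values, purely for illustration.
fao_rows = [
    {"FAOST_CODE": 68, "wheat_t": 100},
    {"FAOST_CODE": 114, "wheat_t": 5},
]
wb_rows = [
    {"ISO3_CODE": "FRA", "gdp": 10},
    {"ISO3_CODE": "DEU", "gdp": 20},
]

merged = merge_by_country(fao_rows, "FAOST_CODE", wb_rows, "ISO3_CODE")
print(merged)
```

Only rows whose country appears in both tables (and in the lookup) survive the join, which is one reason why knowing which coding scheme each source uses matters so much: a mismatched or missing code silently drops a country.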

By releasing an “official” R package to access the FAOStat API, it occurs to me that this makes it much easier to start building sector specific Shiny applications around particular datasets? I wonder whether the FAOstat folk have considered whether there is a possibility of developing a small Shiny app or custom client ecosystem around their data, even if it just takes the form of a curated set of gists that can be downloaded directly into RStudio, for example, using runGist?

I don’t know whether the Eurostat EC Statistics database has an associated R package too? (If so, it could be quite interesting trying to tie them together?!) I do note, however, that Eurostat data is available for download (though I haven’t read the terms/license conditions…).

I also note that a Linked Data/SPARQL way in to Eurostat data appears to be available? Eurostat Linked Data.

[Man flu, hence the brevity of the post… skulks back off to sick bed…]

PS By the by, I notice that the NHS are experimenting with making some data releases available via Google Public Data Explorer [scroll down…]

PPS See also this package – Smarter Poland – which provides an API to the Eurostat database.

Wherefore Art Thou, Research Sector Transparency / Research Transparency Sector Board?

On June 28th, 2012, the open data policy white paper Unleashing the Potential was published by the Cabinet Office. In the section on “Opening Up Access to Research”, one particular paragraph runs as follows:

2.66 To further develop government policy on access to research, we are also establishing a Research Transparency Sector Board, chaired by the Minister for Universities and Science, which will consider ways in which transparency in the area of research can be a driver for innovation. Recognising that research data is different to other PSI [Public Sector Information, presumably? – ed.], the Board will consider how to implement transparency measures relating to research in a manner which protects the integrity of the research and associated intellectual property, while ensuring access to research for those SME entrepreneurs vital for driving growth. This will help to realise the full benefits for society as a whole. The Research Transparency Sector Board will consist of government departments, funding agencies and representatives from universities and other stakeholders, and among the first of its tasks will be to consider how to act on the recommendations of the Royal Society report.

The announcement of the board (referred to as the Research Sector Transparency Board – which makes more sense…) was welcomed by the Royal Society in a guest blog post on the data.gov.uk website dated 27th June 2012 (the day before the embargo lifted? I’m not sure when the blog post actually became public): An intelligently open enterprise.

The minutes of a Regular meeting of the ICO Higher Education sector panel on FOI and DP (24.09.2012), dated 16/10/12, note the following:

Research data caused much concern. VA reminded delegates that she does need input from Research Councils and BIS in this area, as stated in the draft DD [HE definition document]. Definitions of “publicly funded” and “key outputs” may need clarification. It was noted that the Engineering and Physical Sciences Research Panel had to produce this type of data to an agreed timetable by 2015. It was also mentioned that the Open Data White Paper announced the formation of a new Research Sector Transparency Board and it was suggested that HEI research data could be linked to that format – it is not yet ready for use but might be worth noting in the new DD that this is a future aim.

Correspondence from House of Lords European Union Select Committee includes a letter from David Willetts MP dated 25 October 2012 that refers to his anticipated chairing of the Board:

On the question of Open Access (OA), I was pleased to note your expressed support for Open Data (OD) for which the UK is again identified as a good example. We have made excellent progress through the Finch Report on expanded access to research publications and the Government’s response to it. OD is at a relatively early stage. Some initiatives are already in train under Government’s Transparency Agenda, as detailed in the Cabinet Office White Paper, Open Data: Unleashing the Potential. This includes establishment of the Research Sector Transparency Board, which I shall be chairing. The Board will want to examine the complex issues around increasing the sharing of research data. The Research Councils’ published Open Access policy makes appropriate reference to research data, and the recent Royal Society report has informed the discussion, but work is needed on deciding further measures and implementing these appropriately, with the right terms and conditions and timing for disclosure.

We cannot be complacent and we will want to consider how best to monitor the take-up of Gold OA both here in the UK and overseas. The HEFCE-funded Joint Infrastructure Systems Committee (JISC), OAIG, and the Research Innovation Network (RIN) are already active in monitoring OA trends generally. HEFCE also envisages a possible role for JISC in monitoring the effectiveness – and effects – of Government OA policy. I expect that the Research Sector Transparency Board will also take an interest in OA policy implementation.

The 2012 BIS Annual Innovation Report from November 2012 referred to the announcement of the Board, making me wonder how many other Annual Reports celebrate the announcement of vapourware entities?

10.3 Open data and transparency
We have continued to work to harness the potential and collaborative opportunities offered by wider use of open data.

In June 2012 the Government announced in its Open Data White Paper that we would set up a Research Sector Transparency Board. The Board will consider how transparency in research can be a driver for innovation and discovery while furthering the UK’s recognised excellence in science. It will advise Government transparency issues relating to the national research effort, and improved access for small and medium businesses to the research base. Amongst its first tasks will be to consider and address the recommendations of the Royal Society report, Science as an Open Enterprise, into the sharing and disclosing of research data.

We also established the Administrative Data Taskforce, in December 2011. It will publish proposals for new mechanisms and collaborative agreements to enable and promote the wider use of administrative data for research and policy purposes, before the end of the year.

(I’m not sure I’d picked up on the Administrative Data Taskforce before? It reported in December 2012: The UK Administrative Data Research Network: Improving Access for Research and Policy. This report looks like it could be worth reading – a quick skim reveals several sections on legal and ethical issues related to linking administrative data to other datasets.)

A Hansard-reported Written Answer to the House of Lords from 12 Dec 2012 (Column WA241), given by the Parliamentary Under-Secretary of State, Department for Business, Innovation and Skills (Lord Marland) in response to questions about open access to research data records, also mentions the board:

Any further opening up of access to data, in the context of the wider open data agenda, would be the subject of future discussions with the research councils and other parties including the Data Strategy Board and representative university bodies. These policy issues would also be considered as appropriate by the Research Sector Transparency Board which is chaired by David Willetts. There are no proposals to change the research councils’ policy on access to data at this time.

The Russell Group response to the House of Lords Science and Technology Committee’s inquiry on open access publishing, dated 24 January 2013, makes the following reference to the board:

1.3 The Russell Group has been monitoring the development of open access (OA) policy for some time. We followed the ‘Finch Review’ and Royal Society work on science as an open enterprise with interest and the Russell Group is now represented on the Research Sector Transparency Board which will be covering OA, open data and other issues over the coming year. We have recently had a number of meetings with Research Councils UK (RCUK) to discuss implementation of OA policy.

This suggests that membership of the board has been decided upon, at least partially?

A HEFCE letter on Open access and submissions to the REF post-2014 dated 25/2/13 refers to the board in the following terms:

25. With the Research Councils and the Research Transparency Sector Board, we are giving consideration to the issues involved in increasing access to research data. We are committed to working in dialogue with the sector to develop fair and balanced mechanisms to achieve this aim.

Again, this suggests that the Board has been convened.

So I wonder:

  • What is the actual name of the board – Research Transparency Sector Board or Research Sector Transparency Board ;-)? (Other sectors have Transparency Boards….)
  • What is the membership of the board and has it convened yet?
  • What are the terms of reference for the board?
  • If it has convened, where are the minutes?

By the by, I note the emergence of the Research Councils UK – Gateway to Research, which provides a single point of access to “[k]ey data from the seven UK Research Councils in one location.”

RCUK - Gateway to Research

This site appears to collate information about research grants, grantees, and publications by grant, across the Research Councils (I’m not sure if an #opendata dump is available though, which would mean I don’t need to scrape across all the sites using Scraperwiki any more?!;-)

PS it seems a tweet about the first meeting appeared whilst I was writing this post:

No linkage that I can see yet, though?