Communicating Data – Different Takes

A couple of recent articles on bias in the justice system caught my eye. They show different models of engagement around data analysis in a particular topic area:

  • Hester, Rhys, and Todd K. Hartman. “Conditional Race Disparities in Criminal Sentencing: A Test of the Liberation Hypothesis from a Non-Guidelines State”, Journal of Quantitative Criminology, pp. 1-24: an academically published, peer-reviewed article that will cost you £30 to look at.
  • “Uncovering Big Bias with Big Data”, by David Colarusso, May 31st, 2016, The Lawyerist blog: a recreational data blog post.

The blog post comes complete with links to a GitHub repo containing a Jupyter notebook describing the analysis. The data itself is not provided, for data protection/privacy compliance reasons, although a link to the original source of the data, and a brief description of it, are provided (but not a deep link to the actual data?). I’m not sure if any data associated with the academic paper is publicly or openly available, or whether any of the analysis scripts are (though see below: analysis scripts are available).

The blog post is open to comments (there are a few) and the author has responded to some of them. The authors of the academic paper made themselves available via a Reddit AMA (good on them :-): Racial Bias AMA (h/t @gravityvictims/Alastair McCloskey for the tip).

The Reddit AMA points to “an ungated (i.e., not behind paywall) version of our research at the SSRN or Dr. Hartman’s website”. The official publication is marked First online: 29 February 2016. The SSRN version is dated Date posted: November 6, 2014; Last revised: January 4, 2016. The SSRN version includes a small amount of Stata code at the end (the “official” version of the paper doesn’t?), but I’m not sure what data it applies to or whether it’s linked to from the data (I only skimmed the paper). Todd Hartman’s website includes a copy of the published paper and a link to the replication files (7z compressed, so how many folk will be able to easily open that, I wonder?!).


So Stata, R and data files. Good stuff. But from just the paper homepage on the Springer Journal site, I wouldn’t have got that?

Of course, the Springer paper reference gets academic brownie points.

PS by the by, in the UK the Ministry of Justice’s Justice Data Lab publish regular reports on who’s using their data. For example: Justice Data Lab statistics: June 2016.