A recent provisional data release from the Ministry of Justice contains sentencing data from English(?) courts, at the offence level, for the period July 2010-June 2011: “Published for the first time every sentence handed down at each court in the country between July 2010 and June 2011, along with the age and ethnicity of each offender.” Criminal Justice Statistics in England and Wales [data]
In this post, I’ll describe a couple of ways of working with the data to produce some simple graphical summaries of the data using Google Fusion Tables and R…
…but first, a couple of observations:
– the web page subheading is “Quarterly update of statistics on criminal offences dealt with by the criminal justice system in England and Wales.”, but the sidebar includes the link to the 12 month set of sentencing data;
– the URL of the sentencing data is http://www.justice.gov.uk/downloads/publications/statistics-and-data/criminal-justice-stats/recordlevel.zip, which does not contain a time reference, although the data is time bound. What URL will be used if data for the period 7/11-6/12 is released in the same way next year?
The data is presented as a zipped CSV file, 5.4MB in the zipped form, and 134.1MB in the unzipped form.
The unzipped CSV file is too large to upload to a Google Spreadsheet or a Google Fusion Table, which are two of the tools I use for treating large CSV files as a database, so here are a couple of ways of getting in to the data using tools I have to hand…
Unix Command Line Tools
I’m on a Mac, so like Linux users I have ready access to a Console and several common unix commandline tools that are ideally suited to wrangling text files (on Windows, I suspect you need to install something like Cygwin; a search for windows unix utilities should turn up other alternatives too).
In Playing With Large (ish) CSV Files, and Using Them as a Database from the Command Line: EDINA OpenURL Logs and Postcards from a Text Processing Excursion I give a couple of examples of how to get started with some of the Unix utilities, which we can crib from in this case. So for example, after unzipping the recordlevel.csv document I can look at the first 10 rows by opening a console window, changing directory to the directory the file is in, and running the following command:
Or I can pull out rows that contain a reference to the Isle of Wight using something like this command:
grep -i wight recordlevel.csv > recordsContainingWight.csv
(The -i reads: “ignoring case”; grep is a command that identifies rows contain the search term (wight in this case). The > recordsContainingWight.csv says “send the result to the file recordsContainingWight.csv” )
Having extracted rows that contain a reference to the Isle of Wight into a new file, I can upload this smaller file to a Google Spreadsheet, or as Google Fusion Table such as this one: Isle of Wight Sentencing Fusion table.
Once in the fusion table, we can start to explore the data. So for example, we can aggregate the data around different values in a given column and then visualise the result (aggregate and filter options are available from the View menu; visualisation types are available from the Visualize menu):
We can also introduce filters to allow use to explore subsets of the data. For example, here are the offences committed by females aged 35+:
Looking at data from a single court may be of passing local interest, but the real data journalism is more likely to be focussed around finding mismatches between sentencing behaviour across different courts. (Hmm, unless we can get data on who passed sentences at a local level, and look to see if there are differences there?) That said, at a local level we could try to look for outliers maybe? As far as making comparisons go, we do have Court and Force columns, so it would be possible to compare Force against force and within a Force area, Court with Court?
If you really want to start working the data, then R may be the way to go… I use RStudio to work with R, so it’s a simple matter to just import the whole of the reportlevel.csv dataset.
Once the data is loaded in, I can use a regular expression to pull out the subset of the data corresponding once again to sentencing on the Isle of Wight (i apply the regular expression to the contents of the court column:
recordlevel <- read.csv("~/data/recordlevel.csv")
We can then start to produce simple statistical charts based on the data. For example, a bar plot of the sentencing numbers by age group:
barplot(age, main="IW: Sentencing by Age", xlab="Age Range")
We can also start to look at combinations of factors. For example, how do offence types vary with age?
barplot(ageOffence,beside=T,las=3,cex.names=0.5,main="Isle of Wight Sentences", xlab=NULL, legend = rownames(ageOffence))
If we remove the beside=T argument, we can produce a stacked bar chart:
barplot(ageOffence,las=3,cex.names=0.5,main="Isle of Wight Sentences", xlab=NULL, legend = rownames(ageOffence))
If we import the ggplot2 library, we have even more flexibility over the presentation of the graph, as well as what we can do with this sort of chart type. So for example, here’s a simple plot of the number of offences per offence type:
#You may need to install ggplot2 as a library if it isn't already installed
ggplot(iw, aes(factor(Offence_type)))+ geom_bar() + opts(axis.text.x=theme_text(angle=-90))+xlab('Offence Type')
Alternatively, we can break down offence types by age:
ggplot(iw, aes(AGE))+ geom_bar() +facet_wrap(~Offence_type)
We can bring a bit of colour into a stacked plot that also displays the gender split on each offence:
ggplot(iw, aes(AGE,fill=sex))+geom_bar() +facet_wrap(~Offence_type)
One thing I’m not sure how to do is rip the data apart in a ggplot context so that we can display percentage breakdowns, so we could compare the percentage breakdown by offence type on sentences awarded to males vs. females, for example? If you do know how to do that, please post a comment below ;-)
PS HEre’s an easy way of getting started with ggplot… use the online hosted version at http://www.yeroon.net/ggplot2/ using this data set: wightCrimRecords.csv; download the file to your computer then upload it as shown below:
PPS I got a little way towards identifying percentage breakdowns using a crib from here. The following command:
generates a (multidimensional) array for the responseVar (Offence) about the groupVar (sex). I don’t know how to generate a single data frame from this, but we can create separate ones for each sex as follows:
We can then plot these percentages using constructions of the form:
What I haven’t worked out how to do is elegantly map from the multidimensional array to a single data.frame? If you know how, please add a comment below…(I also posted a question on Cross Validated, the stats bit of Stack Exchange…)
With more and more councils doing as they were told and opening up their spending data in the name of transparency, it’s maybe worth a quick review of how the data is currently being made available.
To start with, I’m going to consider the Isle of Wight Council’s data, which was opened up earlier this week. The first data release can be found (though not easily?!) as a pair of Excel spreadsheets, both of which are just over 1 MB large, at http://www.iwight.com/council/transparency/ (This URL reminds me that it might be time to review my post on “Top Level” URL Conventions in Local Council Open Data Websites!)
The data has also been released via Spikes Cavell at Spotlight on Spend: Isle of Wight.
The Spotlight on Spend site offers a hierarchical table based view of the data; value add comes from the ability to compare spend with national averages and that of other councils. Links are also provided to monthly datasets available as a CSV download.
Uploading these datasets to Google Fusion tables shows the following columns are included in the CSV files available from Spotlight on Spend (click through the image to see the data):
Note that the Expense Area column appears to be empty, and “clumped” transaction dates use? Also note that each row, column and cell is commentable upon…
The Excel spreadsheets on the Isle of Wight Council website are a little more complete – here’s the data in Google Fusion tables again (click through the image to see the data):
(It would maybe worth comparing these columns with those identified as Mandatory or Desirable in the Local Spending Data Guidance? A comparison with the format the esd use for their Linked Data cross-council local spending data demo might also be interesting?)
Note that because the Excel files on the Isle of Wight Council were larger than the 1MB size limit on XLS spreadsheet uploads to Google Fusion Tables, I had to open the spreadsheets in Excel and then export them as CSV documents. (Google Fusion Tables accepts CSV uploads for files up to 100MB.) So if you’re writing an open data sabotage manual, this maybe something worth bearing in mind (i.e. publish data in very large Excel spreadsheets)!
It’s also worth noting that if different councils use similar column headings and CSV file formats, and include a column stating the name of the council, it should be trivial to upload all their data to a common Google Fusion Table allowing comparisons to be made across councils, contractors with similar names to be identified across councils, and so on… (i.e. Google Fusion tables would probably let you do as much as Spotlight on Spend, though in a rather clunkier interface… but then again, I think there is a fusion table API…?;-)
Although the data hasn’t appeared there yet, I’m sure it won’t be long before it’s made available on OpenlyLocal:
So that’s a round up of where the data is, and how it’s presented. If I get a chance, the next step is to:
– compare the offerings with each other in more detail, e.g. the columns each view provides;
– compare the offerings with the guidance on release of council spending data;
– see what interesting Google Fusion table views we can come up with as “top level” reports on the Isle of Wight data;
– explore the extent to which Google Fusion Tables can be used to aggregate and compare data from across different councils.
PPS for a list of local councils and the data they have released, see Guardian datastore: Local council spending over £500, OpenlyLocal Council Spending Dashboard
It’s all been quite geeky here on OUseful.info for a bit, so I thought I’d step back to the infoskills domain for a post that isn’t intended to be political in any way, but that maybe provides an interesting case study for an info skills activity.
So we all know that the Daily Telegraph bought some juicy info on MPs expenses, and they’ve been publishing stories for weeks. Many people also know that the Telegraph has been getting in touch with MPs the day before any story is written about them offering them a chance to comment. So what do those letters look like, and do the MPs stand any chance of defending themselves?
A couple of days ago, it was the turn of my local MP to be highlighted. Since we only have a weekly, rather than daily, local newspaper (the Isle of Wight County Press), I suspect that many people on the Island actually picked up the story from our local new media news blog, the VentnorBlog (which despite it’s name covers Island-wide news as well as hyperlocal news from Ventnor): Telegraph MP Expenses Spotlight Turns on Andrew Turner. What’s interesting about that post is that it republishes the letter that the Daily telegraph sent to Andrew Turner, and his responses to each of the questions. The actual Daily Telegraph story can be found here: MPs’ expenses: Andrew Turner claimed for ‘life coaching’ classes for his parliamentary assistant girlfriend. The Isle of Wight County Press (the local newspaper) reported the story on their website here: MP defends expenses claims (the print edition of this weekly paper is out tomorrow).
So how do the stories compare?
VentnorBlog: Telegraph MP Expenses Spotlight Turns on Andrew Turner
And from the quoted replied to letter:
3. In April 2005 your partner and parliamentary assistant Carole Dennett
Daily Telegraph: MPs’ expenses: Andrew Turner claimed for ‘life coaching’ classes for his parliamentary assistant girlfriend
IWCP: MP defends expenses claims
VentnorBlog(from the quoted replied to letter): “1. In 2004 you claimed £579 for occupational health treatment for Colin Hedgley, who later won an unfair dismissal claim against you. Please could you explain in what way you felt this was an appropriate use of public money.
“The welfare and efficiency of my staff is important to me and I was advised to pay for occupational health treatment as the member of staff concerned was suffering from stress, which he said was caused at work. This course of action was agreed in advance by the Personnel Advice Service at the House of Commons who also advised that this represented good employment practice.”
2. The following year you claimed £2,327 for solicitors’ fees following the employment tribunal that found against you. Although the file is unclear, it also appears you claimed back the cost of the £10,250 compensation you were ordered to pay. Please could you explain in what way you felt this was an appropriate use of public money.
“The tribunal case was in relation to the employment of a member of staff who was employed on Parliamentary and constituency business. It was widely reported at the time.
“All costs relating to Industrial Tribunals can be legitimately claimed against Parliamentary expenses. All costs were fully agreed with the Fees Office.
“In January 2006 the fees office paid me £6,471 in relation to the compensatory award of £10,250 which was the balance left in the allowances budget for the year 2004/5.”
Daily Telegraph: In 2004, he was ordered to pay £10,250 to Colin Hedgley, his former office manager, after a tribunal found he had been unfairly dismissed. He had been sacked at a disciplinary interview that he could not attend because he was off work through stress.
Mr Hedgley told the tribunal of a “poisonous” atmosphere in the office, …
After the tribunal chairman ruled that Mr Turner supported his girlfriend ahead of staff – even though her treatment of other workers was “far below acceptable” – Mr Turner used his incidental expenses provision, used to cover the costs of running a constituency office, to pay £6,471 of the compensation bill.
He had also claimed £940 for “HR advice and support” during the dispute with his former employee.
Mr Turner said in a statement: “All costs relating to industrial tribunals can be legitimately claimed against parliamentary expenses. All costs were fully agreed with the fees office.”
IWCP: ISLAND MP Andrew Turner has defended expenses claims published in yesterday’s (Tuesday) Daily Telegraph. They include £6,471 towards compensation after a former staff member won a case for unfair dismissal, £2,327 for solicitors’ fees incurred by the case, cash for wrapping paper and radios for the office.
In 2004 Mr Turner lost an employment tribunal to former office manager Colin Hedgley and was ordered to pay £10,250 compensation, of which he claimed £6,471 from parliamentary allowances and paid the rest from personal funds.
VentnorBlog (from the quoted replied to letter): 3. In April 2005 your partner and parliamentary assistant Carole Dennett wrote, in an email to the fees office concerning your bank account details: “Look forward to receiving the money – I shall then be able to spend it on lots of booze so that the forthcoming election goes in an alcoholic blur.” Please could you explain whether you feel this is appropriate.
“In the cold light of day this could be judged as inappropriate but it was a private joke between two people who were in regular contact.”
Daily Telegraph: The month before the general election in 2005, Miss Dennett, in an email to the Commons fees office after they asked for bank account details, wrote: “Look forward to receiving the money – I shall then be able to spend it on lots of booze so that the forthcoming election goes in an alcoholic blur.
“What do you think?”
Asked about the “booze” email, he replied: “In the cold light of day this could be judged as inappropriate but it was a private joke between two people who were in regular contact.”
IWCP: Mr Turner also admitted that an e-mail, sent in 2005 by his partner and parliamentary assistant Carole Dennett to the fees office saying she would use expenses cash to buy alcohol, was an inappropriate joke.
A 2005 e-mail sent by Miss Dennett to the fees office stated: “Look forward to receiving the money – I shall be able to spend it on lots of booze so that the forthcoming election goes in an alcoholic blur.”
Regarding the e-mail, Mr Turner said: “In the cold light of day this could be judged as inappropriate, but it was a private joke between two people in regular contact.”
VentnorBlog (from the quoted replied to letter): 6. The same year you successfully claimed £16 for “robin and wreath” wrapping paper purchased on Christmas Eve. Please can you explain in what way this was necessary for your work as an MP, and why you felt this was an appropriate use of public money.
“I suffered a stroke in December 2006 and a number of people kindly sent flowers. Some came to my home – others to the office and £10 was spent in Matalan on a vase for the office. The wrapping paper (3 x £2) should clearly have been crossed off the claim.”
Daily Telegraph: The following year, Mr Turner submitted a bundle of receipts for office goods that included £6 worth of wrapping paper bought from Matalan on Christmas Eve. The fees office did not question the claims.
He admitted the wrapping paper “should clearly have been crossed off the claim”.
IWCP: He admitted a £6 claim for wrapping paper bought on Christmas Eve, 2006 — shortly after he suffered a stroke — was a mistake. He said a further claim of £10 was made for a vase for the office after a well-wisher sent him flowers there.
VentnorBlog (from the quoted replied to letter): 7. You also claimed £240 for a member of your staff to study GCSE maths. Please could you explain in what way you felt this was an appropriate use of public money.
“In accordance with good employment practice I believe in developing the skills and confidence of my employees. Two are currently studying for an NVQ3 in Customer Service.”
Daily Telegraph: He was also reimbursed £240 for a member of staff to study GCSE maths,
IWCP: Another claim, of £240, was made for a member of staff to study GCSE maths.
“In accordance with good employment practice I believe in developing the skills and confidence of my employees. Two are currently studying for an NVQ3 in customer service,” said Mr Turner.
VentnorBlog (from the quoted replied to letter): 5. In 2006-7 you attempted to claim £424 for a Hitachi camcorder and accessories but it was turned down by the fees office. Please could you explain in what way you felt this was an appropriate use of public money.
“I was planning to put video clips on my website of the work I do in my constituency. I understand that the rules have subsequently changed and expenditure for this purpose is now claimable against Parliamentary allowances.”
Daily Telegraph: but had a £424 claim for a camcorder … rejected
VentnorBlog (from the quoted replied to letter): 9. In the same year you claimed £20 for House of Commons cufflinks. Please could you explain in what way you felt this was an appropriate use of public money.
“It was a leaving gift for an intern who had worked as a volunteer for several months in my Commons office. He made a valuable contribution towards the running of my office.”
Daily Telegraph: Mr Turner also submitted a £20 receipt for House of Commons cufflinks.
[?There is also this: “and £20 for a leaving gift rejected”. More cufflinks?]
VentnorBlog (from the quoted replied to letter): 8. In July 2007 you submitted a £160 claim for “life coaching (4 sessions)” for Miss Dennett. Please could you explain in what way you felt this was an appropriate use of public money. Please could you also explain whether these classes were related to her treatment of employees being described by the chairman of the employment tribunal as “far below an acceptable form of behaviour”.
“The treatment was not for Miss Dennett but for another member of staff who was suffering from emotional problems related to family illness which were affecting her work. This was agreed in advance with the House of Commons Personnel Advice service. Relationships between staff in my office are good.”
Daily Telegraph: That summer Mr Turner put through a £160 invoice for a member of staff to have “life coaching (4 sessions)”.
He said the life coaching was agreed in advance with the Commons. “Relationships between staff in my office are good,” he added.
IWCP: Mr Turner said a £160 claim for life-coaching was for a member of staff who was suffering emotional problems related to a family illness which were affecting their work.
VentnorBlog (from the quoted replied to letter): 10. In 2006-7 you claimed £199 for a radio, and then £139 for another one the following year. Please could you explain in what way you felt this was an appropriate use of public money.
“I listen to the local and national news in my office. When the original radio broke down it was replaced. It was not realized until too late that the original one was still under guarantee.”
Daily Telegraph: and £139 for a digital radio, even though he had bought another for £199 the previous year.
IWCP: Claims were made in successive years for radios, one for £199 and another for £139.
“I listen to the local and national news in my office. When the original radio broke down it was replaced. It was not realised until it was too late the original was still under guarantee,” he said.
VentnorBlog (from the quoted replied to letter): 11. You claim the Additional Costs Allowance for the house in Newport that you jointly own with Miss Dennett. This is the house on which you both appear on the electoral roll. Neither of you appear to have any links to homes elsewhere. Please could you explain what you consider your main home to be, and what proportion of your time you spend there.
“We have a flat in London on which I also have a mortgage. The House of Commons Authorities are fully aware of both addresses. At the time I made the nomination I spent more nights at my London address than the Island one so it was nominated as my ‘main’ home in accordance with the rules.”
Daily Telegraph: The couple designate their jointly owned house on the Isle of Wight as their second home, on which they claim about £1,100 a month in mortgage interest payments.
VentnorBlog (from the quoted replied to letter): 4. In the same year you claimed £3.48 for “Vax for pets” carpet cleaner. Please can you explain in what way this was necessary for your work as an MP.
“To clean the carpets in my office which is used by staff and constituents and had become grubby over time.”
VentnorBlog (from the quoted replied to letter): Please could we receive your comments by 5pm today so that they can be given due weight in our inquiries and properly reflected in any article we decide to publish. Please could you also inform us if you do not wish to comment.
Daily Telegraph: Mr Turner said in a statement:
Asked about …
PS Just for the record, the VentnotBlog also stated: “VB has previously asked for the details of the expenses, but Andrew Turner’s office told us that they preferred to wait for them to be published by the Conservative Party House of Commons Authorities.” That said, a breakdown of Andrew Turner’s expenses were also published on the VentnorBlog some time ago: Andrew Turner’s MP Expenses: Detail: Updated.