Lib Dems in Government have allocated £300,000 to fund the M20 Junctions 6 to 7 improvement, Maidstone, helping to reduce journey times and create 10,400 new jobs. Really? 10,400 new jobs?
In Critiquing Data Stories: Working LibDems Job Creation Data Map with OpenRefine I had a little poke around some of the data that was used to power a map on a Lib Dems’ website, A Million Jobs:
Liberal Democrats have helped businesses create over 1 million new private sector jobs. Click on the map below to find out what we’ve done where you live.
And then there was the map…
One thing we might take away from this as an assumption is that the markers correspond to locations or environs where jobs were created, and that by adding up the number of jobs created at those locations, we would get to a number over a million.
Whilst I was poking through the data that powers the map, I started to think this might be an unwarranted assumption. I also started to wonder about how the “a million jobs” figure was actually calculated?
Using a recipe described in the Critiquing Data Stories post, I pulled out marker descriptions containing the phrase “helping to reduce journey” along with the number of jobs created (?!) associated with those claims, where a number was specified.
Claims were along the lines of:
Summary: Lib Dems in Government have allocated £2,600,000 to fund the A38 Markeaton improvements , helping to reduce journey times and create 12,300 new jobs. The project will also help build 3,300 new homes.
Note that as well as claims about jobs, we can also pull out claims about homes.
If we use OpenRefine’s Custom Tabular Exporter to upload the data to a Google spreadsheet (here) we can use the Google Spreadsheet-as-a-database query tool (as described in Asking Questions of Data – Garment Factories Data Expedition) to sum the total number of jobs “created” by road improvements (from the OpenRefine treatment, I had observed the rows were all distinct – the count of each text facet was 1).
The sum of jobs “created”? 468, 184. A corresponding sum for the number of homes gives 203,976.
Looking at the refrain through the descriptions, we also notice that the claim is along the lines of: “Lib Dems in Government have allocated £X to fund [road improvement] helping to reduce journey times and create Y new jobs. The project will also help build Z new homes.” Has allocated. So it’s not been spent yet? [T]o create X new jobs. So they haven’t been created yet? And if those jobs are the result of other schemes made possible by road improvements, numbers will be double counted? [W]ill also help build So the home haven’t been built yet, but may well be being claimed as achievements elsewhere?
Note that the numbers I calculated are lower bounds, based on scheme descriptions that contained the specified search phrase and (“helping to reduce journey”) and a job numbers specified according to the pattern detected by the following Jython regular expression:
tmp=re.sub(r'.* creat(e|ing) ([0-9,\.]*) new jobs.*',r'\2',tmp)
In addition, the housing numbers were extracted only from rows where a number of jobs was identified by that regular expression, and where they were described in a way that could be extracted using the following the Jython regular expression re.sub(r'.* The project will also help build ([0-9,\.]*) new homes.*',r'\1',tmp)
PS I’m reading The Smartest Guys in the Room at the moment, learning about the double counting and accounting creativity employed by Enron, and how confusing publicly reported figures often went unchallenged…
It also makes me wonder about phrases like “up to” providing numbers that are then used when calculating totals?
So there’s another phrase to look for, maybe? have agreed a new ‘City Deal’ with …