Fragment, Part 1 – Estimating Populations in Small Areas

Something I hadn’t picked up on before – the deadline for comments for which is today – are proposed boundary changes to wards on the Isle of Wight: Review of Isle of Wight council ward boundaries.

More formal guidance can be found in the *Local Government Boundary Commission for England’ Electoral reviews: Technical guidance document.

An interactive tool allows submissions to be made for newly suggested boundaries:

However, this doesn’t include population estimates within and drawn / suggested boundaries.

Compare that with the Constituency Boundaries tool from the House of Commons Library’s Oli Hawkins.

This interactive tool allowed users to select newly suggested ward areas, for which population estimates were also available, in order to come up with new constituency areas.

Which made me think – what would a boundary explorer look like for ward level boundary changes?

In terms of geographies / data, current ward boundaries can be found as part of the Ordnance Survey Boundary Line product, as well as from the ONS (ONS – Wards – Boundaries). The ONS boundaries come as shapefiles or KML. GeoJSON boundaries are available from martinjc/UK-GeoJSON (one thing I think that could be really useful would be to have a datasette enabled version of that repo?)

The lowest level geography for which population data (as recorded at the last census) is available are Output Areas (OAs). The ONS Census geography documentation describes them in the following terms:

[OAs] were designed to have similar population sizes and be as socially homogenous as possible based on tenure of household and dwelling type (homogeneity was not used as a factor in Scotland).

Urban/rural mixes were avoided where possible; OAs preferably consisted entirely of urban postcodes or entirely of rural postcodes.

They had approximately regular shapes and tended to be constrained by obvious boundaries such as major roads.

OAs were required to have a specified minimum size to ensure the confidentiality of data.

OA boundaries are available as shapefiles as well as population weighted centroids.

The ONS also publish lookup tables from OAs to wards, as well as population estimates at OA level. (You can also get hold of the 2011 census population estimates for OA level.)

According to the ONS Boundary Dataset Guidance (h/t @ONSgeography for the link), here’s a quick summary of the differences between boundary line types:

Full: As originally supplied to ONS, the highest resolution data available. Use ‘Full’ datasets for advanced GIS analysis (such as point-in-polygon allocation). Full datasets should not be used for general mapping purposes if an intermediate or simple version is available.

Intermediate/Generalised (20m): Intermediate datasets are designed for high quality mapping, preserving much of the original detail from the full dataset, but typically 10% of the file size. They are also suitable for non-demanding GIS analyses (such as buffering). Intermediate datasets are a good compromise between detail and small file size

Boundary sets can be prepared to “extent of the realm” and “clipped to the coastline”.

Extent of the realm boundary sets typically extend to Mean Low Water, although they can extend to islands off the coast e.g. Avonmouth ward in the City of Bristol extends to the islands of Flat Holm and Steep Holm in the Bristol Channel.

Clipped to the coastline boundary sets, derived from the extent of realm boundaries, show boundaries to Mean High Water. Usually prepared for visualisation of data such boundaries more closely represent map users expectations of how a coastal boundary should look. Whereas extent of the realm boundaries adjacent to an inlet or estuary may join at a point midway across the water, clipped to coastline boundaries permit the more precise identification of the waterside.

The guidance also provides a handy summary of ESRI shapefile components:

  • .shp  – the file that stores the feature geometry.
  • .shx – the file that stores the index of the feature geometry.
  • .dbf – the dBASE file that stores the attribute information of features.
  • .prj – the file that stores the projection of the feature geometry.
  • .sbx – a spatial index file
  • .sbn – a spatial index file

So… what I’m wondering is: how easy would it be to convert Oli’s Parliamentary constituency boundaries app to allow folk to work at a local level to combine OA level population estimates to sketch out suggested new ward boundaries.

By the by, I wonder about the extent to which recent population estimates are derived from projections of earlier Census data demographics (births/deaths predictions or statistics?)*, and the extent to which they accommodate things like new build housing estates (which presumably have the potential to change OA level populations significantly?) In turn, this makes me think that any Island Plan projections for new housing build areas should be added as an overlay to any consultation tool under the expectation that changed boundaries will be in place for at least a decade and it would be useful to know where population changes are being future-planned to occur? [* also internal migration from GP registration data (h/t OH)]

One of the things I note about OAs is that they were planned to be as socially homogenous as possible based on tenure of household and dwelling type. If we can colour code OAs according to this sort of information – and / or other demographic data – it would also allow us to get a feeling for the character of current and any proposed new wards based on its demographics. (It would also allow us to see if they were homogenous or mixed demographic.) I think the Output Area Classifications data is the one to use for this (data)?

For example, downoloading the 2011 OAC Clusters and Names csv (1.1 Mb ZIP), unzipping, renaming the the CSV file to oac.csv then using textql (as per Seven Ways of Making use of SQLite with the command:

textql -header -sql 'SELECT DISTINCT [Supergroup Name],[Group Name], [Subgroup Name] FROM oac WHERE [Local Authority Name] LIKE "%Wight%";' oac.csv

(the square brackets are used to escape the column names that contain spaces) gives the following unique categories for OAs on the Island:

Rural Residents,Ageing Rural Dwellers,Renting Rural Retirement
Rural Residents,Farming Communities,Agricultural Communities
Urbanites,Ageing Urban Living,Self-Sufficient Retirement
Hard-Pressed Living,Industrious Communities,Industrious Transitions
Suburbanites,Semi-Detached Suburbia,Older Workers and Retirement
Hard-Pressed Living,Hard-Pressed Ageing Workers,Renting Hard-Pressed Workers
Rural Residents,Rural Tenants,Rural Life
Hard-Pressed Living,Hard-Pressed Ageing Workers,Ageing Industrious Workers
Urbanites,Ageing Urban Living,Delayed Retirement
Urbanites,Ageing Urban Living,Communal Retirement
Suburbanites,Suburban Achievers,Ageing in Suburbia
Suburbanites,Suburban Achievers,Detached Retirement Living
Rural Residents,Farming Communities,Older Farming Communities
Hard-Pressed Living,Hard-Pressed Ageing Workers,Ageing Rural Industry Workers
Rural Residents,Ageing Rural Dwellers,Detached Rural Retirement
Rural Residents,Ageing Rural Dwellers,Rural Employment and Retirees
Rural Residents,Rural Tenants,Ageing Rural Flat Tenants
Suburbanites,Semi-Detached Suburbia,Semi-Detached Ageing
Urbanites,Urban Professionals and Families,White Professionals
Suburbanites,Semi-Detached Suburbia,White Suburban Communities
Rural Residents,Farming Communities,Established Farming Communities
Rural Residents,Rural Tenants,Rural White-Collar Workers
Constrained City Dwellers,Ageing City Dwellers,Retired Communal City Dwellers
Urbanites,Urban Professionals and Families,Families in Terraces and Flats 
Constrained City Dwellers,Challenged Diversity,Hampered Aspiration
Hard-Pressed Living,Industrious Communities,Industrious Hardship
Hard-Pressed Living,Challenged Terraced Workers,Deprived Blue-Collar Terraces
Constrained City Dwellers,White Communities,Outer City Hardship
Constrained City Dwellers,Ageing City Dwellers,Retired Independent City Dwellers
Hard-Pressed Living,Migration and Churn,Young Hard-Pressed Families
Constrained City Dwellers,Ageing City Dwellers,Ageing Communities and Families
Constrained City Dwellers,White Communities,Challenged Transitionaries
Constrained City Dwellers,White Communities,Constrained Young Families
Hard-Pressed Living,Migration and Churn,Hard-Pressed Ethnic Mix
Constrained City Dwellers,Challenged Diversity,Multi-Ethnic Hardship
Constrained City Dwellers,Challenged Diversity,Transitional Eastern European Neighbourhoods
Urbanites,Urban Professionals and Families,Multi-Ethnic Professionals with Families

In passing, here’s that block of text in a word cloud (via):


And here it is if I remove the DISTINCT constraint from the query and generate the cloud from descriptors of each OA on the Island:


(That query returned 466 rows, compared to the 40 council wards. So each ward seems to be made up from about 10 OAs.)

One thing that might be interesting in urban areas is to see whether newly proposed boundaries are drawn so as to try to split up and disenfranchise particular groups at local level (under the argument that wards should be dominated by majority white / elderly / conservative voting populations) or group them together so that wards can be ghettoised and sacrificed to other parties by the conservative (you can big-C that if you like…) majority.

Remember: all data is political, and all data can be used for political purposes…

Another thing that might be handy is a look-up from postcode to output area, perhaps then reporting on the classification given to the output area and the surrounding ones. To help with that, the ONS do a postcode to OA lookup.

I can’t think this through properly at the moment, but I wonder if its sensible to find the average of two or more neighbouring weighted centroid locations to find an “average” centroid for them that could be used as the basis of a Voronoi diagram boundary estimator? (So for example, select however many neighbouring OA centroids for each newly proposed ward, find the mean location of them, then create Voronoi diagram boundaries around those mean centroids, at least as a first estimate of a boundary. Then compare these with the merged OA boundaries? Is this a meaningful thing to do, and if so, would this tell us anything interesting?

Okay, so that’s some resources found. Next thing is to pull them into a datasette to support this post and figure out some questions to ask. Not sure I’ll have chance to do anything before the consultation finishes though (particularly given the day job is calling for the rest of the day…)

Thanks to Oli Hawkins for pointers into some of the datasets and info about estimate production…

PS I also notice that the O/S Boundary Line product has a dataset called polling_districts_England_region. I wonder if this is something that can be used to map catchment areas around polling locations? I also wonder how this boundary reflects wards and whether changes to these boundaries necessarily follow changes to ward boundaries?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: