Fragment, Part 1 – Estimating Populations in Small Areas

Something I hadn’t picked up on before (and the deadline for comments is today) is the set of proposed boundary changes to wards on the Isle of Wight: Review of Isle of Wight council ward boundaries.

More formal guidance can be found in the Local Government Boundary Commission for England’s Electoral reviews: Technical guidance document.

An interactive tool allows submissions to be made for newly suggested boundaries:

However, this doesn’t include population estimates within any drawn / suggested boundaries.

Compare that with the Constituency Boundaries tool from the House of Commons Library’s Oli Hawkins.

This interactive tool allowed users to select newly suggested ward areas, for which population estimates were also available, in order to come up with new constituency areas.

Which made me think – what would a boundary explorer look like for ward level boundary changes?

In terms of geographies / data, current ward boundaries can be found as part of the Ordnance Survey Boundary Line product, as well as from the ONS (ONS – Wards – Boundaries). The ONS boundaries come as shapefiles or KML. GeoJSON boundaries are available from martinjc/UK-GeoJSON. (One thing I think could be really useful would be a datasette enabled version of that repo.)

The lowest level geography for which population data (as recorded at the last census) is available is the Output Area (OA). The ONS Census geography documentation describes OAs in the following terms:

[OAs] were designed to have similar population sizes and be as socially homogenous as possible based on tenure of household and dwelling type (homogeneity was not used as a factor in Scotland).

Urban/rural mixes were avoided where possible; OAs preferably consisted entirely of urban postcodes or entirely of rural postcodes.

They had approximately regular shapes and tended to be constrained by obvious boundaries such as major roads.

OAs were required to have a specified minimum size to ensure the confidentiality of data.

OA boundaries are available as shapefiles, and population weighted centroids for OAs are also available.

The ONS also publish lookup tables from OAs to wards, as well as population estimates at OA level. (You can also get hold of the 2011 census population estimates for OA level.)
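As a rough illustration of how the lookup and the estimates fit together, here’s a minimal Python sketch that sums OA-level population estimates up to ward level. The OA codes, ward names and counts are made-up placeholders; in practice they would be read from the ONS OA-to-ward lookup and OA population estimates files, and I haven’t assumed anything about those files’ column names.

```python
from collections import defaultdict

# Hypothetical sample data: real values would come from the ONS
# OA-to-ward lookup and the OA-level population estimates.
oa_to_ward = {"OA_A": "Ward 1", "OA_B": "Ward 1", "OA_C": "Ward 2"}
oa_population = {"OA_A": 310, "OA_B": 295, "OA_C": 402}


def ward_populations(oa_to_ward, oa_population):
    """Sum OA-level population estimates up to ward level."""
    totals = defaultdict(int)
    for oa, ward in oa_to_ward.items():
        # An OA missing from the estimates table contributes 0
        totals[ward] += oa_population.get(oa, 0)
    return dict(totals)


print(ward_populations(oa_to_ward, oa_population))
```

On the sample data this gives {'Ward 1': 605, 'Ward 2': 402}.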

According to the ONS Boundary Dataset Guidance (h/t @ONSgeography for the link), here’s a quick summary of the differences between boundary line types:

Full: As originally supplied to ONS, the highest resolution data available. Use ‘Full’ datasets for advanced GIS analysis (such as point-in-polygon allocation). Full datasets should not be used for general mapping purposes if an intermediate or simple version is available.

Intermediate/Generalised (20m): Intermediate datasets are designed for high quality mapping, preserving much of the original detail from the full dataset, but typically 10% of the file size. They are also suitable for non-demanding GIS analyses (such as buffering). Intermediate datasets are a good compromise between detail and small file size.

Boundary sets can be prepared to “extent of the realm” and “clipped to the coastline”.

Extent of the realm boundary sets typically extend to Mean Low Water, although they can extend to islands off the coast e.g. Avonmouth ward in the City of Bristol extends to the islands of Flat Holm and Steep Holm in the Bristol Channel.

Clipped to the coastline boundary sets, derived from the extent of realm boundaries, show boundaries to Mean High Water. Usually prepared for visualisation of data such boundaries more closely represent map users expectations of how a coastal boundary should look. Whereas extent of the realm boundaries adjacent to an inlet or estuary may join at a point midway across the water, clipped to coastline boundaries permit the more precise identification of the waterside.

The guidance also provides a handy summary of ESRI shapefile components:

  • .shp  – the file that stores the feature geometry.
  • .shx – the file that stores the index of the feature geometry.
  • .dbf – the dBASE file that stores the attribute information of features.
  • .prj – the file that stores the projection of the feature geometry.
  • .sbx – a spatial index file
  • .sbn – a spatial index file
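To make the split between essential and optional components concrete, here’s a small Python sketch that checks which parts of a given shapefile are present in a list of filenames. The filenames are made up; the rule that .shp, .shx and .dbf are required (and the rest optional) comes from ESRI’s shapefile specification rather than the ONS guidance.

```python
from pathlib import Path

# The ESRI shapefile spec makes .shp, .shx and .dbf mandatory;
# the other components in the list above are optional extras.
REQUIRED = {".shp", ".shx", ".dbf"}
OPTIONAL = {".prj", ".sbn", ".sbx"}


def shapefile_parts(filenames, stem):
    """Report which components of the shapefile called `stem` are present."""
    suffixes = {
        Path(name).suffix.lower()
        for name in filenames
        if Path(name).stem == stem
    }
    return {
        "missing_required": sorted(REQUIRED - suffixes),
        "optional_present": sorted(OPTIONAL & suffixes),
    }


files = ["wards.shp", "wards.dbf", "wards.prj", "README.txt"]
print(shapefile_parts(files, "wards"))
```

Here the .shx index is flagged as missing, so most tools would refuse to open the file.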

So… what I’m wondering is: how easy would it be to convert Oli’s Parliamentary constituency boundaries app so that folk could work at a local level, combining OA level population estimates to sketch out suggested new ward boundaries?
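For a local ward sketching tool, the core calculation is small. A minimal sketch, assuming a user has assigned each OA to a proposed ward (all the codes and numbers below are hypothetical): total each proposed ward and report its percentage deviation from the mean ward size. As I understand it, the Boundary Commission actually works in terms of electors per councillor rather than raw population, so treat this as a rough proxy.

```python
# Hypothetical inputs: OA population estimates, and a user's sketch
# assigning each OA to a proposed ward.
oa_population = {"OA_A": 300, "OA_B": 300, "OA_C": 250, "OA_D": 250, "OA_E": 400}
proposal = {"OA_A": "North", "OA_B": "North", "OA_C": "South",
            "OA_D": "South", "OA_E": "East"}


def proposal_summary(oa_population, proposal):
    """Return {ward: (population, % deviation from the mean ward)}."""
    totals = {}
    for oa, ward in proposal.items():
        totals[ward] = totals.get(ward, 0) + oa_population[oa]
    mean = sum(totals.values()) / len(totals)
    return {
        ward: (pop, round(100 * (pop - mean) / mean, 1))
        for ward, pop in totals.items()
    }


for ward, (pop, dev) in proposal_summary(oa_population, proposal).items():
    print(f"{ward}: {pop} ({dev:+}%)")
```

With these numbers the mean ward size is 500, so North comes out 20% over and East 20% under.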

By the by, I wonder to what extent recent population estimates are derived from projections of earlier Census data demographics (births/deaths predictions or statistics?)*, and to what extent they accommodate things like new build housing estates (which presumably have the potential to change OA level populations significantly?). In turn, this makes me think that any Island Plan projections for new housing build areas should be added as an overlay to any consultation tool: changed boundaries are likely to be in place for at least a decade, so it would be useful to know where future population changes are being planned. [* also internal migration from GP registration data (h/t OH)]

One of the things I note about OAs is that they were planned to be as socially homogenous as possible based on tenure of household and dwelling type. If we can colour code OAs according to this sort of information (and / or other demographic data), it would also give us a feeling for the character of current and any proposed new wards based on their demographics. (It would also let us see whether a ward is homogenous or demographically mixed.) I think the Output Area Classifications data is the one to use for this (data)?
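One way of putting a number on “homogenous or mixed” (my choice of measure, not something taken from the OAC documentation) is to take the OAC supergroup of each OA in a ward and compute the share of the most common category plus the Shannon entropy of the category mix. A minimal sketch with made-up wards:

```python
import math
from collections import Counter


def mix_score(categories):
    """Summarise how mixed a ward's OA classifications are.

    Returns the most common category, its share of the ward's OAs, and the
    Shannon entropy in bits (0 = every OA in the same category).
    """
    counts = Counter(categories)
    n = len(categories)
    top, top_n = counts.most_common(1)[0]
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return top, top_n / n, entropy


# Hypothetical wards, each listed as the OAC supergroups of its OAs
homogeneous = ["Rural Residents"] * 4
mixed = ["Urbanites", "Suburbanites", "Urbanites", "Hard-Pressed Living"]
print(mix_score(homogeneous))
print(mix_score(mixed))
```

The first ward scores an entropy of 0.0 (entirely Rural Residents); the second scores 1.5 bits, with Urbanites making up half its OAs.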

For example, downloading the 2011 OAC Clusters and Names CSV (1.1 Mb ZIP), unzipping it, renaming the CSV file to oac.csv, then using textql (as per Seven Ways of Making use of SQLite) with the command:

textql -header -sql 'SELECT DISTINCT [Supergroup Name],[Group Name], [Subgroup Name] FROM oac WHERE [Local Authority Name] LIKE "%Wight%";' oac.csv

(the square brackets are used to escape the column names that contain spaces) gives the following unique categories for OAs on the Island:

Rural Residents,Ageing Rural Dwellers,Renting Rural Retirement
Rural Residents,Farming Communities,Agricultural Communities
Urbanites,Ageing Urban Living,Self-Sufficient Retirement
Hard-Pressed Living,Industrious Communities,Industrious Transitions
Suburbanites,Semi-Detached Suburbia,Older Workers and Retirement
Hard-Pressed Living,Hard-Pressed Ageing Workers,Renting Hard-Pressed Workers
Rural Residents,Rural Tenants,Rural Life
Hard-Pressed Living,Hard-Pressed Ageing Workers,Ageing Industrious Workers
Urbanites,Ageing Urban Living,Delayed Retirement
Urbanites,Ageing Urban Living,Communal Retirement
Suburbanites,Suburban Achievers,Ageing in Suburbia
Suburbanites,Suburban Achievers,Detached Retirement Living
Rural Residents,Farming Communities,Older Farming Communities
Hard-Pressed Living,Hard-Pressed Ageing Workers,Ageing Rural Industry Workers
Rural Residents,Ageing Rural Dwellers,Detached Rural Retirement
Rural Residents,Ageing Rural Dwellers,Rural Employment and Retirees
Rural Residents,Rural Tenants,Ageing Rural Flat Tenants
Suburbanites,Semi-Detached Suburbia,Semi-Detached Ageing
Urbanites,Urban Professionals and Families,White Professionals
Suburbanites,Semi-Detached Suburbia,White Suburban Communities
Rural Residents,Farming Communities,Established Farming Communities
Rural Residents,Rural Tenants,Rural White-Collar Workers
Constrained City Dwellers,Ageing City Dwellers,Retired Communal City Dwellers
Urbanites,Urban Professionals and Families,Families in Terraces and Flats 
Constrained City Dwellers,Challenged Diversity,Hampered Aspiration
Hard-Pressed Living,Industrious Communities,Industrious Hardship
Hard-Pressed Living,Challenged Terraced Workers,Deprived Blue-Collar Terraces
Constrained City Dwellers,White Communities,Outer City Hardship
Constrained City Dwellers,Ageing City Dwellers,Retired Independent City Dwellers
Hard-Pressed Living,Migration and Churn,Young Hard-Pressed Families
Constrained City Dwellers,Ageing City Dwellers,Ageing Communities and Families
Constrained City Dwellers,White Communities,Challenged Transitionaries
Constrained City Dwellers,White Communities,Constrained Young Families
Hard-Pressed Living,Migration and Churn,Hard-Pressed Ethnic Mix
Constrained City Dwellers,Challenged Diversity,Multi-Ethnic Hardship
Constrained City Dwellers,Challenged Diversity,Transitional Eastern European Neighbourhoods
Urbanites,Urban Professionals and Families,Multi-Ethnic Professionals with Families
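Since textql essentially loads the CSV into SQLite and runs the query, the same thing can be sketched in Python using just the standard library. The column names are taken from the query above; the sample rows (and the OA Code column) are made up, and I’m assuming the real file’s header row uses those names.

```python
import csv
import io
import sqlite3


def distinct_oac_categories(csv_file, la_pattern="%Wight%"):
    """Load an OAC CSV into in-memory SQLite and list distinct categories."""
    reader = csv.reader(csv_file)
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    # Square brackets quote column names that contain spaces, as in textql
    columns = ", ".join(f"[{name}]" for name in header)
    conn.execute(f"CREATE TABLE oac ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO oac VALUES ({placeholders})", reader)
    rows = conn.execute(
        "SELECT DISTINCT [Supergroup Name], [Group Name], [Subgroup Name] "
        "FROM oac WHERE [Local Authority Name] LIKE ?",
        (la_pattern,),
    ).fetchall()
    conn.close()
    return rows


# Tiny made-up sample standing in for oac.csv
SAMPLE = """OA Code,Local Authority Name,Supergroup Name,Group Name,Subgroup Name
OA_1,Isle of Wight,Rural Residents,Farming Communities,Agricultural Communities
OA_2,Isle of Wight,Rural Residents,Farming Communities,Agricultural Communities
OA_3,Southampton,Urbanites,Ageing Urban Living,Delayed Retirement
"""
print(distinct_oac_categories(io.StringIO(SAMPLE)))
```

With the real file, passing in a handle from open("oac.csv", newline="") should work the same way, though I haven’t checked the file’s encoding.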

In passing, here’s that block of text in a word cloud (via):


And here it is if I remove the DISTINCT constraint from the query and generate the cloud from descriptors of each OA on the Island:


(That query returned 466 rows, compared to the 40 council wards. So each ward seems to be made up of roughly 11 or 12 OAs on average.)

One thing that might be interesting, particularly in urban areas, is to see whether newly proposed boundaries are drawn so as to split up, and so disenfranchise, particular groups at local level (on the argument that wards should be dominated by majority white / elderly / conservative voting populations), or to group them together so that wards can be ghettoised and sacrificed to other parties by the conservative (you can big-C that if you like…) majority.

Remember: all data is political, and all data can be used for political purposes…

Another thing that might be handy is a look-up from postcode to output area, perhaps then reporting on the classification given to the output area and the surrounding ones. To help with that, the ONS do a postcode to OA lookup.
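A minimal sketch of that lookup, assuming the ONS postcode to OA file has been loaded into a simple postcode to OA mapping and the OAC data into an OA to classification mapping. All the codes are hypothetical, and the OA neighbours table is an assumption too: it isn’t part of the postcode lookup and would need deriving from the OA boundaries (for example, OAs that share an edge).

```python
def normalise_postcode(postcode):
    """Strip spaces and uppercase, so 'po99 1zz' matches 'PO99 1ZZ'."""
    return "".join(postcode.split()).upper()


def describe_postcode(postcode, postcode_to_oa, oa_class, oa_neighbours):
    """Look up a postcode's OA, its classification and its neighbours'."""
    oa = postcode_to_oa.get(normalise_postcode(postcode))
    if oa is None:
        return None
    return {
        "oa": oa,
        "class": oa_class.get(oa),
        "neighbours": {n: oa_class.get(n) for n in oa_neighbours.get(oa, [])},
    }


# Hypothetical lookup extracts; keys are normalised when the lookup is loaded
postcode_to_oa = {normalise_postcode("PO99 1ZZ"): "OA_A"}
oa_class = {"OA_A": "Suburbanites", "OA_B": "Rural Residents"}
oa_neighbours = {"OA_A": ["OA_B"]}  # would need deriving from OA boundaries

print(describe_postcode("po99 1zz", postcode_to_oa, oa_class, oa_neighbours))
```

Normalising on both sides means users can type postcodes with or without the space.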

I can’t think this through properly at the moment, but I wonder if it’s sensible to average two or more neighbouring population weighted centroid locations to find an “average” centroid for them that could be used as the basis of a Voronoi diagram boundary estimator. (So, for example, select however many neighbouring OA centroids make up each newly proposed ward, find their mean location, then create Voronoi diagram boundaries around those mean centroids, at least as a first estimate of a boundary. Then compare these with the merged OA boundaries?) Is this a meaningful thing to do, and if so, would it tell us anything interesting?
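One cheap way of trying this out without a GIS: assigning each OA to whichever proposed ward has the nearest mean centroid gives exactly the Voronoi cell membership, just at OA resolution rather than as drawn lines. A minimal sketch, assuming centroids on a projected grid in metres (such as British National Grid eastings/northings) so that straight-line distance is reasonable; the OA names, coordinates and ward assignments are hypothetical. OAs whose nearest seed isn’t the ward they were proposed for get flagged, which is one way of making the comparison with the merged OA boundaries.

```python
import math


def mean_centroid(points):
    """Mean (x, y) of a list of points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def voronoi_assign(oa_centroids, ward_members):
    """Assign every OA to the proposed ward with the nearest mean centroid.

    Nearest-seed assignment is the rule that defines Voronoi cells, so this
    is an OA-level stand-in for drawing the Voronoi boundaries themselves.
    """
    seeds = {
        ward: mean_centroid([oa_centroids[oa] for oa in oas])
        for ward, oas in ward_members.items()
    }
    assignment = {}
    for oa, (x, y) in oa_centroids.items():
        assignment[oa] = min(
            seeds, key=lambda w: math.hypot(x - seeds[w][0], y - seeds[w][1])
        )
    return seeds, assignment


def mismatches(assignment, ward_members):
    """OAs whose nearest seed is not the ward they were proposed for."""
    proposed = {oa: w for w, oas in ward_members.items() for oa in oas}
    return sorted(oa for oa, w in assignment.items() if proposed.get(oa) != w)


# Hypothetical centroids on a projected (metre) grid
oa_centroids = {"A": (0, 0), "B": (2, 0), "C": (10, 0), "D": (12, 0), "E": (4, 0)}
ward_members = {"W1": ["A", "B"], "W2": ["C", "D", "E"]}
seeds, assignment = voronoi_assign(oa_centroids, ward_members)
print(mismatches(assignment, ward_members))
```

Here E was proposed for W2 but sits closer to W1’s mean centroid, so it is the one OA flagged.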

Okay, so that’s some resources found. Next thing is to pull them into a datasette to support this post and figure out some questions to ask. Not sure I’ll have chance to do anything before the consultation finishes though (particularly given the day job is calling for the rest of the day…)

Thanks to Oli Hawkins for pointers into some of the datasets and info about estimate production…

PS I also notice that the OS Boundary Line product has a dataset called polling_districts_England_region. I wonder if this is something that can be used to map catchment areas around polling locations? I also wonder how these boundaries relate to wards, and whether changes to them necessarily follow changes to ward boundaries?

Working With OpenStreetMap Roads Data Using osmnx

A couple of days ago, I came across the incredible looking osmnx Python package, originally created by Geoff Boeing at UC Berkeley in support of his PhD, via one of his blog posts: OSMnx: Python for Street Networks (there is a citeable paper, but that’s not what I originally found…) There are also some great example notebooks: gboeing/osmnx-examples.

I spent a chunk of today having a play with it, and have posted a notebook walkthrough here.

It’s quite incredible…

Pretty much a one-liner lets you pull back all the roads in a particular area, searched for by name:

The osmnx package represents routes as a networkx graph – so we can do graphy things with it, like finding the shortest distance between two points, aka route planning:
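Because osmnx hands back a networkx graph, in practice you’d presumably just call networkx’s shortest_path with weight="length" (osmnx stores each edge’s length in metres in a length attribute). To show what’s going on underneath, here’s a stdlib-only sketch of Dijkstra’s algorithm over a toy graph; the junction names and lengths are made up.

```python
import heapq
import math


def shortest_route(graph, source, target):
    """Dijkstra's algorithm over graph[u][v] = edge length (e.g. metres).

    Returns (list of nodes on the route, total length), or (None, inf)
    if the target can't be reached.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    done = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, length in graph.get(u, {}).items():
            candidate = d + length
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(queue, (candidate, v))
    if target not in dist:
        return None, math.inf
    route = [target]
    while route[-1] != source:
        route.append(prev[route[-1]])
    return route[::-1], dist[target]


# Toy directed road graph with made-up junction names and lengths
roads = {
    "A": {"B": 100, "C": 400},
    "B": {"C": 150, "D": 500},
    "C": {"D": 100},
    "D": {},
}
print(shortest_route(roads, "A", "D"))
```

The direct-looking A to C road (400 m) loses out to going via B (250 m), giving a 350 m route A, B, C, D.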

The route can also be plotted on an interactive map. An option lets you hover on a route and display a context sensitive tooltip, in this case the name of the road:

Retrieving routes by area name is handy, but we can also pull back routes within a certain distance of a specified location, or routes that are contained within a particular region specified by a shapefile.
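For the “within a certain distance of a location” case, osmnx’s graph_from_point takes a centre point and a distance, and as far as I can tell it starts from a bounding box around that point. A rough sketch of computing such a box, using a simple spherical Earth approximation (my assumption, not necessarily how osmnx does it); the example coordinates are just an arbitrary point on the Island.

```python
import math

# Assumed mean Earth radius in metres (spherical approximation)
EARTH_RADIUS_M = 6_371_009


def bbox_from_point(lat, lon, dist_m):
    """Return (north, south, east, west) edges of a box dist_m around a point."""
    dlat = math.degrees(dist_m / EARTH_RADIUS_M)
    # A degree of longitude shrinks with latitude, so widen the box accordingly
    dlon = math.degrees(dist_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return lat + dlat, lat - dlat, lon + dlon, lon - dlon


# e.g. a box extending 1 km each way around an arbitrary point
print(bbox_from_point(50.65, -1.2, 1000))
```

At the Island’s latitude the box comes out wider in degrees of longitude than of latitude, which is the cos(latitude) correction at work.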

In a previous post (Trying Out Spatialite Support in Datasette) I showed how to put Ordnance Survey BoundaryLine data into a SpatiaLite database and then access the geojson boundary files from a datasette API. We can use that approach again, here wrapped up in a dockerised context:

Using the retrieved boundary shapefile, we can then use osmnx to just grab the roads contained within that region, in this case my local parish:

Once again, we can use an interactive map to display the results:

If we overlay the parish boundary, we see that the routes returned correspond to the graph between nodes that lie within the boundary. Some roads pass straight through the boundary, while others appear to lie just outside it.

However, it looks like we can tweak various parameters to get the full extent of the roads within the region:
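The behaviour seen above is what you’d expect if the graph is truncated to the nodes inside the polygon and only edges with both ends inside are kept. Here’s a stdlib-only sketch of that idea (ray-casting point-in-polygon plus two edge-keeping rules) on a made-up square boundary and toy nodes. The by_edge=True rule, which also keeps edges with just one end inside, is my guess at the kind of thing osmnx’s truncate_by_edge option controls; check the osmnx docs rather than relying on this.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a ray running right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def edges_within(nodes, edges, polygon, by_edge=False):
    """Keep edges with both ends inside, or with either end if by_edge."""
    inside = {n for n, (x, y) in nodes.items() if point_in_polygon(x, y, polygon)}
    if by_edge:
        return [(u, v) for u, v in edges if u in inside or v in inside]
    return [(u, v) for u, v in edges if u in inside and v in inside]


square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical boundary
nodes = {"a": (2, 2), "b": (8, 8), "c": (15, 5)}
edges = [("a", "b"), ("b", "c")]
print(edges_within(nodes, edges, square))
print(edges_within(nodes, edges, square, by_edge=True))
```

The strict rule drops the b to c road that crosses the boundary; the looser rule keeps it.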


As well as routes, we can also get building footprints from OpenStreetMap:

If you know where to look, you can see our house!

Building footprints can also be overlaid on routes:

If we generate a distance to time mapping, the graph representation means we can also colour nodes according to how far they are from a particular location in, for example, walking time:

We can also overlay routes on isochrone areas to show travel times along routes – so where’s within half an hour’s walk of the Pointer Inn in Newchurch?
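The isochrone idea boils down to converting each edge length into a walking time and then finding every node reachable within a time budget. Here’s a stdlib-only sketch, assuming a walking speed of 4.5 km/h and a toy graph whose node names and lengths are hypothetical (“pub” stands in for a starting point such as the Pointer Inn). If I remember the osmnx example notebooks correctly, they do something similar with networkx’s ego_graph and a per-edge time attribute, but treat that as an assumption.

```python
import heapq

WALK_SPEED_KMH = 4.5  # assumed average walking speed


def walking_minutes(graph, source, speed_kmh=WALK_SPEED_KMH, cutoff_min=None):
    """Walking time in minutes from source to every reachable node.

    graph[u][v] is an edge length in metres; nodes further than cutoff_min
    minutes away are left out.
    """
    metres_per_min = speed_kmh * 1000 / 60
    times = {}
    queue = [(0.0, source)]
    while queue:
        t, u = heapq.heappop(queue)
        if u in times:
            continue  # already settled via a quicker route
        times[u] = t
        for v, length in graph.get(u, {}).items():
            t_v = t + length / metres_per_min
            if v not in times and (cutoff_min is None or t_v <= cutoff_min):
                heapq.heappush(queue, (t_v, v))
    return times


def time_band(minutes, bands=(10, 20, 30)):
    """Colour bucket for a node: the first band its walking time fits in."""
    for limit in bands:
        if minutes <= limit:
            return limit
    return None


# Hypothetical footpath graph; "pub" stands in for the starting point
paths = {"pub": {"a": 750}, "a": {"pub": 750, "b": 750}, "b": {"c": 1500}}
within_half_hour = walking_minutes(paths, "pub", cutoff_min=30)
print({node: time_band(t) for node, t in within_half_hour.items()})
```

At 4.5 km/h each 750 m stretch takes 10 minutes, so node c (40 minutes away) falls outside the half hour isochrone.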

Living on a holiday island, I wonder if there’s any mileage (!) in offering a simple service for campsites, hotels etc that would let them print off isochrone walking maps centred on the campsite, hotel etc with various points of interest, and estimated walking times, highlighted?

I’m also wondering how much work would be required in order to add additional support to the osmnx package so that it could use Ordnance Survey, rather than OSM, data?

And finally, one other thing I’d like to explore is how to generate tulip diagrams from route graphs… If anyone has any ideas about that, please let me know via the comments…

PS for building outlines in the UK derived from Ordnance Survey data, see for example Alasdair Rae’s OS OpenMap Local – All Buildings in Great Britain.

PPS And building footprints for the US, courtesy of Microsoft: https://github.com/Microsoft/USBuildingFootprints

Geofenced Audio Tours and Geo-Privacy

Whilst on holiday a couple of weeks ago, we took an audio tour on an open top bus. I missed a significant chunk of the tour because my headphones broke after the first couple of minutes (the cable was only a tiny, easily snapped strand of copper thick… I guess the cost of copper played a part in that…?!) but it got me thinking (again) about geofenced audio tours.

If the term is new to you, geofencing refers to a technique where you put a notional boundary round a location and capture the GPS coordinates of the boundary; if the boundary is a regular shape, you can calculate the boundary co-ordinates via a simple formula. For example, if you have a circular region 1 km wide about a point, the geofence is defined by the circumference (or as Terry Pratchett would say, the circumfence) of the circle, centred on the point of interest and with a radius of 0.5km. The geofenced region then lies within this circle.
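For a circular geofence you don’t actually need the boundary co-ordinates at all: a location is inside the fence if its distance from the centre is no more than the radius. Here’s a minimal sketch using the haversine great-circle distance (assuming a spherical Earth), plus the “play the commentary as you enter” logic, which only fires on the transition from outside to inside. The fence and GPS track are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # spherical Earth approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_fence(lat, lon, fence):
    """True if the point lies inside a circular geofence."""
    return haversine_m(lat, lon, fence["lat"], fence["lon"]) <= fence["radius_m"]


def entry_events(track, fences):
    """Return (track index, fence name) each time the track enters a fence.

    Only the transition from outside to inside counts, so a commentary
    would play once per visit rather than at every GPS fix.
    """
    inside, events = set(), []
    for i, (lat, lon) in enumerate(track):
        now = {f["name"] for f in fences if in_fence(lat, lon, f)}
        events.extend((i, name) for name in sorted(now - inside))
        inside = now
    return events


# Hypothetical 1 km wide fence (0.5 km radius) and GPS track
fences = [{"name": "landmark", "lat": 0.0, "lon": 0.0, "radius_m": 500}]
track = [(0.0, 0.01), (0.0, 0.004), (0.0, 0.0), (0.0, 0.006), (0.0, 0.001)]
print(entry_events(track, fences))
```

The track enters the fence twice (at fixes 1 and 4), so the commentary would be triggered twice, not at every fix spent inside.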

For audio tours, what this means is that you can go for a walk, or a ride, and as and when you are within the confines of the geofence around an object, a commentary about that object can play out for you. I had a quick look around for apps that might support the creation of such a tour, but there didn’t seem to be much out there. Here’s what I did find though:

Hackney Hear – An interactive GPS-triggered audio tour of Hackney; creative arts project to develop an audio tour app around the London Borough of Hackney; due out in January 2012. Looks exciting (will they open the code, I wonder?)
NoTours – Augmented Aurality for Android: this site includes an editor for creating your own geofenced audio tours, and a demo app that lets you play a tour with up to 10(?) audio locations marked. The site espouses open source principles but I couldn’t see a link to the code anywhere? (The site name actually reminded me of the idea of a misguided walk;-)
Geovative geotours: a commercial offering, though the free plan suggests you can create up to three tours.

Unfortunately, I’ve not had a chance to play with any of them yet…

(If you know of any others, particularly open source apps and tour creators, please let me know via the comments.)

One thing I’d quite like to do one day is create an app that I can listen to on the motorway that will play out stories about the places I’m passing. (I wonder if I could build such a thing in Android App Inventor?) As a first pass, I could imagine querying something like Wikipedia (e.g. using this new wikilocation api) to pull back articles relating to points of interest near my current location, and then using text to speech to read a selected article out.

(Note to self: I think Audioboo allows geotagged audio uploads. Does it also support geo-based searches? This could provide a secondary source of commentaries…?)

It might also be fun to try to create tours for bus routes, e.g. as identified from the wonderful new MySociety site, FixMyTransport (though adding such functionality to that site directly would be the sort of feature creep that I think MySociety sites always try to avoid: How to create sustainable open data projects with purpose).

And finally, whilst on the question of geofences, and what actually brought them back to mind today: flickr has just opened up geo-privacy fences: Introducing geofences on Flickr!:

Geofences are special locations that deserve their own geo privacy settings. For example, you might want to create a geofence around the your “home” or “school” that only allows “Friends and Family” to see the location of the photos you geotag in that area. So the next time you upload a photo with a geotag in the radius of a geofence, it will follow the default geo privacy you’ve designated for that hotspot.

Clever ;-)

PS here’s an alternative take on “geofences” (h/t @AidanBaker for the reminder about this art piece:-)

The artists involved created a rod that showed the wifi strength at a particular location, then, using time lapsed photography, took it for a walk: Immaterials: Light painting WiFi

PPS fwiw, you can run spatial queries over geo-data hosted in Google Fusion Tables: Search your geo data using spatial queries from Fusion Tables. Spatial queries can be run “via the WHERE clause” using ST_INTERSECTS, which takes a column in your table that contains location data and a geometry to test it against. I haven’t tried it yet, but this approach looks amenable to geofence style query activity within regularly bounded regions.
A weaker spatial query form (which cannot be combined with ST_INTERSECTS conditions) uses ORDER BY based on distance, via ST_DISTANCE, which takes a location column from your table and a reference location; listing the location column as a column spec is optional when using ORDER BY ST_DISTANCE.

PPPS and finally, just in case, here’s a link to the code repository for the Google MyTracks Android app; it may contain useful code snippets for any homebrew native Android app…