Fragment, Part 1 – Estimating Populations in Small Areas

Something I hadn’t picked up on before – the deadline for comments for which is today – are proposed boundary changes to wards on the Isle of Wight: Review of Isle of Wight council ward boundaries.

More formal guidance can be found in the *Local Government Boundary Commission for England’ Electoral reviews: Technical guidance document.

An interactive tool allows submissions to be made for newly suggested boundaries:

However, this doesn’t include population estimates within and drawn / suggested boundaries.

Compare that with the Constituency Boundaries tool from the House of Commons Library’s Oli Hawkins.

This interactive tool allowed users to select newly suggested ward areas, for which population estimates were also available, in order to come up with new constituency areas.

Which made me think – what would a boundary explorer look like for ward level boundary changes?

In terms of geographies / data, current ward boundaries can be found as part of the Ordnance Survey Boundary Line product, as well as from the ONS (ONS – Wards – Boundaries). The ONS boundaries come as shapefiles or KML. GeoJSON boundaries are available from martinjc/UK-GeoJSON (one thing I think that could be really useful would be to have a datasette enabled version of that repo?)

The lowest level geography for which population data (as recorded at the last census) is available are Output Areas (OAs). The ONS Census geography documentation describes them in the following terms:

[OAs] were designed to have similar population sizes and be as socially homogenous as possible based on tenure of household and dwelling type (homogeneity was not used as a factor in Scotland).

Urban/rural mixes were avoided where possible; OAs preferably consisted entirely of urban postcodes or entirely of rural postcodes.

They had approximately regular shapes and tended to be constrained by obvious boundaries such as major roads.

OAs were required to have a specified minimum size to ensure the confidentiality of data.

OA boundaries are available as shapefiles as well as population weighted centroids.

The ONS also publish lookup tables from OAs to wards, as well as population estimates at OA level. (You can also get hold of the 2011 census population estimates for OA level.)

According to the ONS Boundary Dataset Guidance (h/t @ONSgeography for the link), here’s a quick summary of the differences between boundary line types:

Full: As originally supplied to ONS, the highest resolution data available. Use ‘Full’ datasets for advanced GIS analysis (such as point-in-polygon allocation). Full datasets should not be used for general mapping purposes if an intermediate or simple version is available.

Intermediate/Generalised (20m): Intermediate datasets are designed for high quality mapping, preserving much of the original detail from the full dataset, but typically 10% of the file size. They are also suitable for non-demanding GIS analyses (such as buffering). Intermediate datasets are a good compromise between detail and small file size

Boundary sets can be prepared to “extent of the realm” and “clipped to the coastline”.

Extent of the realm boundary sets typically extend to Mean Low Water, although they can extend to islands off the coast e.g. Avonmouth ward in the City of Bristol extends to the islands of Flat Holm and Steep Holm in the Bristol Channel.

Clipped to the coastline boundary sets, derived from the extent of realm boundaries, show boundaries to Mean High Water. Usually prepared for visualisation of data such boundaries more closely represent map users expectations of how a coastal boundary should look. Whereas extent of the realm boundaries adjacent to an inlet or estuary may join at a point midway across the water, clipped to coastline boundaries permit the more precise identification of the waterside.

The guidance also provides a handy summary of ESRI shapefile components:

  • .shp  – the file that stores the feature geometry.
  • .shx – the file that stores the index of the feature geometry.
  • .dbf – the dBASE file that stores the attribute information of features.
  • .prj – the file that stores the projection of the feature geometry.
  • .sbx – a spatial index file
  • .sbn – a spatial index file

So… what I’m wondering is: how easy would it be to convert Oli’s Parliamentary constituency boundaries app to allow folk to work at a local level to combine OA level population estimates to sketch out suggested new ward boundaries.

By the by, I wonder about the extent to which recent population estimates are derived from projections of earlier Census data demographics (births/deaths predictions or statistics?)*, and the extent to which they accommodate things like new build housing estates (which presumably have the potential to change OA level populations significantly?) In turn, this makes me think that any Island Plan projections for new housing build areas should be added as an overlay to any consultation tool under the expectation that changed boundaries will be in place for at least a decade and it would be useful to know where population changes are being future-planned to occur? [* also internal migration from GP registration data (h/t OH)]

One of the things I note about OAs is that they were planned to be as socially homogenous as possible based on tenure of household and dwelling type. If we can colour code OAs according to this sort of information – and / or other demographic data – it would also allow us to get a feeling for the character of current and any proposed new wards based on its demographics. (It would also allow us to see if they were homogenous or mixed demographic.) I think the Output Area Classifications data is the one to use for this (data)?

For example, downoloading the 2011 OAC Clusters and Names csv (1.1 Mb ZIP), unzipping, renaming the the CSV file to oac.csv then using textql (as per Seven Ways of Making use of SQLite with the command:

textql -header -sql 'SELECT DISTINCT [Supergroup Name],[Group Name], [Subgroup Name] FROM oac WHERE [Local Authority Name] LIKE "%Wight%";' oac.csv

(the square brackets are used to escape the column names that contain spaces) gives the following unique categories for OAs on the Island:

Rural Residents,Ageing Rural Dwellers,Renting Rural Retirement
Rural Residents,Farming Communities,Agricultural Communities
Urbanites,Ageing Urban Living,Self-Sufficient Retirement
Hard-Pressed Living,Industrious Communities,Industrious Transitions
Suburbanites,Semi-Detached Suburbia,Older Workers and Retirement
Hard-Pressed Living,Hard-Pressed Ageing Workers,Renting Hard-Pressed Workers
Rural Residents,Rural Tenants,Rural Life
Hard-Pressed Living,Hard-Pressed Ageing Workers,Ageing Industrious Workers
Urbanites,Ageing Urban Living,Delayed Retirement
Urbanites,Ageing Urban Living,Communal Retirement
Suburbanites,Suburban Achievers,Ageing in Suburbia
Suburbanites,Suburban Achievers,Detached Retirement Living
Rural Residents,Farming Communities,Older Farming Communities
Hard-Pressed Living,Hard-Pressed Ageing Workers,Ageing Rural Industry Workers
Rural Residents,Ageing Rural Dwellers,Detached Rural Retirement
Rural Residents,Ageing Rural Dwellers,Rural Employment and Retirees
Rural Residents,Rural Tenants,Ageing Rural Flat Tenants
Suburbanites,Semi-Detached Suburbia,Semi-Detached Ageing
Urbanites,Urban Professionals and Families,White Professionals
Suburbanites,Semi-Detached Suburbia,White Suburban Communities
Rural Residents,Farming Communities,Established Farming Communities
Rural Residents,Rural Tenants,Rural White-Collar Workers
Constrained City Dwellers,Ageing City Dwellers,Retired Communal City Dwellers
Urbanites,Urban Professionals and Families,Families in Terraces and Flats 
Constrained City Dwellers,Challenged Diversity,Hampered Aspiration
Hard-Pressed Living,Industrious Communities,Industrious Hardship
Hard-Pressed Living,Challenged Terraced Workers,Deprived Blue-Collar Terraces
Constrained City Dwellers,White Communities,Outer City Hardship
Constrained City Dwellers,Ageing City Dwellers,Retired Independent City Dwellers
Hard-Pressed Living,Migration and Churn,Young Hard-Pressed Families
Constrained City Dwellers,Ageing City Dwellers,Ageing Communities and Families
Constrained City Dwellers,White Communities,Challenged Transitionaries
Constrained City Dwellers,White Communities,Constrained Young Families
Hard-Pressed Living,Migration and Churn,Hard-Pressed Ethnic Mix
Constrained City Dwellers,Challenged Diversity,Multi-Ethnic Hardship
Constrained City Dwellers,Challenged Diversity,Transitional Eastern European Neighbourhoods
Urbanites,Urban Professionals and Families,Multi-Ethnic Professionals with Families

In passing, here’s that block of text in a word cloud (via):

Word_Cloud_Generator.png

And here it is if I remove the DISTINCT constraint from the query and generate the cloud from descriptors of each OA on the Island:

Word_Cloud_Generator2.png

(That query returned 466 rows, compared to the 40 council wards. So each ward seems to be made up from about 10 OAs.)

One thing that might be interesting in urban areas is to see whether newly proposed boundaries are drawn so as to try to split up and disenfranchise particular groups at local level (under the argument that wards should be dominated by majority white / elderly / conservative voting populations) or group them together so that wards can be ghettoised and sacrificed to other parties by the conservative (you can big-C that if you like…) majority.

Remember: all data is political, and all data can be used for political purposes…

Another thing that might be handy is a look-up from postcode to output area, perhaps then reporting on the classification given to the output area and the surrounding ones. To help with that, the ONS do a postcode to OA lookup.

I can’t think this through properly at the moment, but I wonder if its sensible to find the average of two or more neighbouring weighted centroid locations to find an “average” centroid for them that could be used as the basis of a Voronoi diagram boundary estimator? (So for example, select however many neighbouring OA centroids for each newly proposed ward, find the mean location of them, then create Voronoi diagram boundaries around those mean centroids, at least as a first estimate of a boundary. Then compare these with the merged OA boundaries? Is this a meaningful thing to do, and if so, would this tell us anything interesting?

Okay, so that’s some resources found. Next thing is to pull them into a datasette to support this post and figure out some questions to ask. Not sure I’ll have chance to do anything before the consultation finishes though (particularly given the day job is calling for the rest of the day…)

Thanks to Oli Hawkins for pointers into some of the datasets and info about estimate production…

PS I also notice that the O/S Boundary Line product has a dataset called polling_districts_England_region. I wonder if this is something that can be used to map catchment areas around polling locations? I also wonder how this boundary reflects wards and whether changes to these boundaries necessarily follow changes to ward boundaries?

Video Surveillance Fragments

Several things that appeared on my various feeds over the last few days…

I also saw an ad / review / report on video doorbells earlier this week where one of the illustrative photos showed the view from a house/doorbell showing cars in the road outside and the doors /windows of the houses opposite… Turning a privacy bug into a Crimestoppers approved feature, I guess…

I’m not sure how laws of evidence hold when members of the public submit things like this, becuase it’s getting increasingly easy to tinker with photos nowadays…. For example, changing the weather in a photo…

I wonder if the Surveillance Camera Commissioner is going to start adding affiliate links to their website leading to their top picks…?

Participatory video surveillance… sigh…

First Class R Support in Binder / Binderhub – Shiny Apps As Well as R-Kernels and RStudio

I notice from the binder-examples/r repo that Binderhub now appears to offer all sorts of R goodness out of the can, if you specify a particular R build.

From the same repo root, you can get:

And from previously, here’s a workaround for displaying R/HTMLwidgets in a Jupyter notebook.

OpenRefine is also available from a simple URL – https://mybinder.org/v2/gh/betatim/openrefineder/master?urlpath=openrefine – courtesy of betatim/openrefineder:

Perhaps it’s time for me to try to get my head round what the Jupyter notebook proxy handlers are doing…

PS see also Scripted Forms for a simple markdown script way of building interactive Jupyter widget powered UIs.

Running Spatial Queries on Food Standards Agency Data Using Ordnance Survey Shapefiles in SpatiaLite via Datasette

In Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data I showed how the datasette cluster map plugin could be used to display a map with clustered markers showing the location of results on an interactive map from a datasette query that returned latitude and longitude columns.

Since then, datasette has added support for SpatiaLite, a SQLite extension that supports spatial queries. In the previous posts on this topic, (Trying Out Spatialite Support in Datasette and Working With OpenStreetMap Roads Data Using osmnx), I’ve started trying to get my head round how to make use of this more powerful tool for managing geo-related data, in particular grabbing Ordnance Survey shapefiles that can be used to filter queries about the UK road network made to OpenStreetMap using the incredibly power osmnx Python package.

It’s been a hard slog so far – I’m not really up to speed on geo-tools, what they can do, and what representations work with what – but I’m slowly starting to get there. (Maybe I should have read a book, or done a course, rather than just trying to learn by doing!)

The notebook that contains the code described in this post can be found here. In particular, I’ve been looking at how to run SpatiaLite within() queries to find points within an area, as well as looking a bit more at some the the utility functions osmnx exposes.

To start with, let’s grab some data. I’m going to use the Food Standards Agency food hygiene ratings again:

One thing I should probably do is look at having a variant of that datagrabber that grabs the data into a SpatiaLite database. The original data contains latitude and longitude co-ordinates. We can create a suitable geometry column in the same table and then add a WKT description of the location. (I think the columns should actually be in the order ‘Latitude’ || ‘Longitude’?)

The locations are given using the lat/long (WGS84 / EPSG 4326) projection. However, the Ordnance Survey boundaries I obtained via the OS BoundaryLine dataset are in the OSGB (EPSG 27700) projection. But being a spatial database, SpatiaLite can help when it comes to applying the transformation between those projections:

As I’m trying to build workflows/tools around the datasette API to the database, here’s a first attempt at a crude wrapper for it:

Here’s how part of the FSA data table looks now:

One of the issues I noticed with running queries searching for locations within an area delimited by a shapefile is that they can take a long time to run. However, we can create an index on a shapefile that describes a bounding box for the shapefile which we can use as a first pass to find points broadly within an area:

We can then use a query on the index / bounding box as a subselect before running the slower query to check that a point actually lies within the possibly ragged region:

We can now use the data in the dataframe as the basis for a map display, in this case using marker clusters (the screenshot shows an exploded cluster):

The FSA data seems to use postcode centroids as the geolocation co-ordinates (at least, I’m guessing so from the overlaps). The folium/leaflet package I’m using to display maps overlays markers that are colcated so you only see the last one. There is a leaflet plugin to handle this, but it’s not currently available via folium, so for now, let’s just randomly perturb markers a bit so that we can see colocated ones:

To check that the markers are located inside the desired boundary, we can also plot the boundaryline on the map:

One of the things that I noticed from OpenStreetMap is that it actually geocodes the locations of points of interest (POIs) more accurately. The osmnx package includes a handy utility for geocoding an address, so I should be able to use this to improve the geolocation data of the FSA rated establishments?

The osmnx package also lets us search for POIs within boundary area, presented as a Shapely geometry object. We can generate such an object directly from a GeoJSON shapefile:

We can then use this as the basis for a query onto OSM by amenity tag:

There are lots of different amenity tags supported, so this could provide quite a useful lookup tool. For example, I wondering about what sorts of amenity it might be handy for parish councillors to be able to lookup via a hyperlocal / parish council geo-service?

Automating the Battlefield

Another year, another robot heavy military exercise. Eighteen months or so ago, it was Unmanned Warrior, a marine based large scale demonstration of unmanned and autonomous systems led by the Royal Navy  (previously blogged as Drone War Exercises).

Next up is Autonomous Warrior, the 2018 Army Warfighting Experiment. Apparently:

Autonomous Warrior will test a range of prototype unmanned aerial and ground cargo vehicles which aim to reduce the danger to troops during combat.

The British Army is set to launch the four-week exercise on November 12, with a Battlegroup from 1 Armed Infantry brigade providing the exercising troops and taking responsibility of command and control.

I wonder if the exercise is building on early results from the autonomous last mile resupply that was launched by the Defence and Security Accelerator last year with the following timeline:

In passing, the testing / performance criteria for competition entries provides a handy checklist of key considerations:

  • Non stop range
  • System lift capacity (mass)
  • Payload size and shape (volume)
  • Speed
  • Turn around time and effort
  • Terrain types
  • Operator control requirements / autonomy levels (eg Automatic route planning, obstacle avoidance, tasking)
  • Contested environment (eg no GPS)
  • Physical Environmental conditions (eg wind)
  • Supportability and Interchangeability

Spend on military robots for deployment  also seems to be increasing, in the US at least. For example, Bloomberg reported recently (The U.S. Army Is Turning to Robot Soldiers) that:

In April, the [US] Army awarded a $429.1 million contract to two Massachusetts companies, Endeavor Robotics of Chelmsford and Waltham-based QinetiQ North America, for small bots weighing fewer than 25 pounds. This spring, Endeavor also landed two contracts worth $34 million from the Marine Corps for small and midsized robots. … In October, the Army awarded Endeavor $158.5 million for a class of more than 1,200 medium robots, called the Man-Transportable Robotic System, Increment II, weighing less than 165 pounds.

In the air, UK use of aerial drones is in accordance with the the 84 page Joint Doctrine Publication Unmanned aircraft systems (JDP 0-30.2); for a full list of joint doctrine publications see the Joint Doctrine Publication (JDP) collection.

I’m not sure if there’s a reserve force that specifically recruits volunteers with skills in robotics, but I notice there is one for techies interested in matters relating to cyber-warfare/cyber-defence: Joint Cyber Reserve Force (CRF). You should probably read the JDP Cyber Primer first, though…

My Isle of Wight Festival 2018 Roundup

A week on, what are the bands I rated – and saw – at this year’s Isle of Wight Festival?

As with previous years, we’ve camped, which means we also get in on the Thursday. Opening things up in the Kashmir Cafe, operated by the local Quay Arts Centre, staffed by volunteers, and running the best bar on the site, were local reggae favourites The Ohms kicking off the positivity that lasted through the weekend:

They’ve got a new album out which we picked up on the day – well worth a listen…

A couple of of bands later were Rews doing a White Stripes sort of thing. It would be great to see them playing the local Strings venue, which had its first birthday yesterday…

Friday saw the main stage open up, although I tend to avoid many of the acts there preferring to seeing the smaller bands. I was however surprised to be impressed by Rita Ora, not being really sure what a Rita Ora is…

Erm… anyway… That was actually on the way to see Southampton favourites The Novatones on the Hard Rock Stage:

Walking on from there, I was tipped off to 77:78, made up of a couple of ex-Bees, playing in the Kashmir. Psychedelia through a warm pass filter, with the positivity turned up to 11. Absolutely bloody gorgeous…

A couple of bands on, and Rusty Shackle made a welcome return to the Island:

Saturday started with a quick visit to the main stage to see Slydigs, a band who reminded me muchly of the Stereophonics with a dash of added Jet, fast songs and slow songs alike, albeit with a Merseyside accent:

A bit of a break then until another returning band who I saw a couple of times last year as Germein Sisters, and who came back this year as Germein:

They’re just about to tour as support to Little Mix, whatever a Little Mix is. Erm…

Next up, Blossoms, who seem to have got to early evening main stage really quickly. As a band, you see them for the songs, not the banal, characterless performance (shut your eyes and dance along)…

On the other hand, Sisteray in the This Feeling tent were engaging both musically and performance wise; punk rock politics in the style of Green Day, with a political attitude relevant to the internet age. Not sure I’ve ever heard a song with algorithm in the title before… Toptastic.

A quick diversion on the Kashmir Cafe for another pint of Yachtsman’s Ale, where I caught the end of island bluesmiths Confred Fred teamed up with Jojo and the Woods; powerful blues guitar and one hell of a voice from female vocalist Jojo (I presume…).

Then the rockin’ continued with Liam Gallagher on the main stage. At one point he apologised to folk griping he was playing Oasis songs. Erm. The only reason I was there was for a bit of a singalong to Oasis songs, which was fun enough. But it rather reminded me a bit of a covers band; who had no real ownership of the songs or how to get the original feeling out of them. So no video prize there…

After hanging around for the start of Depeche Mode – yawn, the sound was of its time and I wasn’t prepeared to wait an hour to be reminded of the songs I played on vinyl repeatedly over 30 years ago – I caught the always awful Bullybones (get p****d, make a racket in the way I think they imagined The Stooges used to) and then a band I caught the end of at Beautiful Days Festival last year – Noble Jacks:

Sunday started with an idle listen to Sheryl Crow

…cut short in favour of seeing festival regular Suzanne Vega:

Then a break and a rest under the trees by Cirque De La Quirk before regular island visitors Tankus the Henge did the liveliest set of the weekend with perfectly in keeping added circus performers, lasers and confetti cannons:

And finally, a three way clash between The Correspondents, Ska’d For Life and Travis. In the end I opted to see the start of Travis, but stayed to the end… The singalong was incredible and Fran Healy is just the nicest front man, wrapping up the weekend with wry observations and a lot of positivity… And the best singalong…

One band I missed several times was Reminders but I hope to catch them at Rhythm Tree Festival in a couple of weeks along with The Ohmz, 77:78 and Tankus the Henge… I’m pretty sure The Correspondents are there too, so I’ll get to see a bit more of what I missed at the Isle of Wight Festival…

PS early bird tickets for the Isle of Wight Festival 2019 are already on sale…

Working With OpenStreetMap Roads Data Using osmnx

A couple of days ago, I came across the incredible looking osmnx Python package, originally created by Geoff Boeing at UC Berkeley in support of his PhD, via one of his blog posts: OSMnx: Python for Street Networks (there is a citeable paper, but that’s not what I originally found…) There are also some great example notebooks: gboeing/osmnx-examples.

I spent a chunk of today having a play with is, and have posted a notebook walkthrough here.

It’s quite incredible…

Pretty much a one-liner lets you pull back all the roads in a particular area, searched for by name:

The osmnx package represents routes as a networkx graph – so we can do graphy things with it, like finding the shortest distance between two points, aka route planning:

The route can also be plotted on an interactive map. An option lets you hover on a route and display a context sensitive tooltip, in this case the name of the road:

Retrieving routes by area name is handy, but we can also pull back routes within a certain distance of a specified location, or routes that are contained within a particular region specified by a shapefile.

In a previous post (Trying Out Spatialite Support in Datasette) I showed how to put Ordnance Survey BoundaryLine data into a SpatiaLite database and then access the geojson boundary files from a datasette API. We can use that approach again, here wrapped up in a dockerised context:

Using the retrieved boundary shapefile, we can then use osmnx to just grab the roads contained within that region, in this case my local parish:

Once again, we can use an interactive map to display the results:

If we overlay the parish boundary, we see that the routes returned correspond to the graph between nodes that lay within the boundary. Some roads pass straight through the boundary, others appear to lay just outside the boundary.osmnx_Demo11

However, it looks like we can tweak various parameters to get the full extent of the roads within the region:

osmnx_Demo12

As well as routes, we can also get building footprints from OpenStreetMap:

If you know where to look, you can see our house!

Building footprints can also be overlaid on routes:

If we generate a distance to time mapping, the graph representation means we can also colouring nodes according to how far in walking time, for example, they are from a particular location:

We can also overlay routes on isochrone areas to show travel times along routes – so where’s within half an hour’s walk of the Pointer Inn in Newchurch?

Living on a holiday island, I wonder if there’s any mileage (!) in offering a simple service for campsites, hotels etc that would let them print off isochrone walking maps centred on the campsite, hotel etc with various points of interest, and estimated walking times, highlighted?

I’m also wondering how much work would be required in order to add additional support to the osmnx package so that it could use Ordnance Survey, rather than OSM, data?

And finally, one other thing I’d like to explore is how to generate tulip diagrams from route graphs… If anyone has any ideas about that, please let me know via the comments…

PS for building outlines in the UK derived from Ordnance Survey data, see for example Alasdair Rae’s OS OpenMap Local – All Buildings in Great Britain.

PPS And building footprints for the US, courtesy of Microsoft: https://github.com/Microsoft/USBuildingFootprints