Category: Anything you want

Fragment, Part 1 – Estimating Populations in Small Areas

Something I hadn’t picked up on before – the deadline for comments for which is today – is a set of proposed boundary changes to wards on the Isle of Wight: Review of Isle of Wight council ward boundaries.

More formal guidance can be found in the Local Government Boundary Commission for England’s Electoral reviews: Technical guidance document.

An interactive tool allows submissions to be made for newly suggested boundaries:

However, this doesn’t include population estimates within any drawn / suggested boundaries.

Compare that with the Constituency Boundaries tool from the House of Commons Library’s Oli Hawkins.

This interactive tool allowed users to select newly suggested ward areas, for which population estimates were also available, in order to come up with new constituency areas.

Which made me think – what would a boundary explorer look like for ward level boundary changes?

In terms of geographies / data, current ward boundaries can be found as part of the Ordnance Survey Boundary Line product, as well as from the ONS (ONS – Wards – Boundaries). The ONS boundaries come as shapefiles or KML. GeoJSON boundaries are available from martinjc/UK-GeoJSON (one thing I think that could be really useful would be to have a datasette enabled version of that repo?)

The lowest level geography for which population data (as recorded at the last census) is available is the Output Area (OA). The ONS Census geography documentation describes OAs in the following terms:

[OAs] were designed to have similar population sizes and be as socially homogenous as possible based on tenure of household and dwelling type (homogeneity was not used as a factor in Scotland).

Urban/rural mixes were avoided where possible; OAs preferably consisted entirely of urban postcodes or entirely of rural postcodes.

They had approximately regular shapes and tended to be constrained by obvious boundaries such as major roads.

OAs were required to have a specified minimum size to ensure the confidentiality of data.

OA boundaries are available as shapefiles as well as population weighted centroids.

The ONS also publish lookup tables from OAs to wards, as well as population estimates at OA level. (You can also get hold of the 2011 census population estimates for OA level.)
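Combining those two datasets is essentially a grouped join: attach the ward code to each OA via the lookup table, then sum populations by ward. As a minimal sketch (the OA codes, ward codes and population figures below are all invented for illustration):

```python
# Sketch: aggregate (made-up) OA-level population estimates to ward level
# using an OA-to-ward lookup. Real data would come from the ONS lookup
# and OA population estimate files; all codes and figures here are invented.

oa_to_ward = {
    "E00086000": "E05001234",
    "E00086001": "E05001234",
    "E00086002": "E05005678",
}

oa_population = {
    "E00086000": 310,
    "E00086001": 295,
    "E00086002": 330,
}

# Sum OA populations into their parent ward
ward_population = {}
for oa, ward in oa_to_ward.items():
    ward_population[ward] = ward_population.get(ward, 0) + oa_population[oa]

print(ward_population)
```

The same pattern would apply to any proposed new ward: swap in a candidate OA-to-ward assignment and the population totals fall out directly.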

According to the ONS Boundary Dataset Guidance (h/t @ONSgeography for the link), here’s a quick summary of the differences between boundary line types:

Full: As originally supplied to ONS, the highest resolution data available. Use ‘Full’ datasets for advanced GIS analysis (such as point-in-polygon allocation). Full datasets should not be used for general mapping purposes if an intermediate or simple version is available.

Intermediate/Generalised (20m): Intermediate datasets are designed for high quality mapping, preserving much of the original detail from the full dataset, but typically 10% of the file size. They are also suitable for non-demanding GIS analyses (such as buffering). Intermediate datasets are a good compromise between detail and small file size.

Boundary sets can be prepared to “extent of the realm” and “clipped to the coastline”.

Extent of the realm boundary sets typically extend to Mean Low Water, although they can extend to islands off the coast e.g. Avonmouth ward in the City of Bristol extends to the islands of Flat Holm and Steep Holm in the Bristol Channel.

Clipped to the coastline boundary sets, derived from the extent of the realm boundaries, show boundaries to Mean High Water. Usually prepared for the visualisation of data, such boundaries more closely represent map users’ expectations of how a coastal boundary should look. Whereas extent of the realm boundaries adjacent to an inlet or estuary may join at a point midway across the water, clipped to the coastline boundaries permit the more precise identification of the waterside.

The guidance also provides a handy summary of ESRI shapefile components:

  • .shp  – the file that stores the feature geometry.
  • .shx – the file that stores the index of the feature geometry.
  • .dbf – the dBASE file that stores the attribute information of features.
  • .prj – the file that stores the projection of the feature geometry.
  • .sbx – a spatial index file
  • .sbn – a spatial index file

So… what I’m wondering is: how easy would it be to convert Oli’s Parliamentary constituency boundaries app to allow folk to work at a local level, combining OA level population estimates to sketch out suggested new ward boundaries?

By the by, I wonder about the extent to which recent population estimates are derived from projections of earlier Census data demographics (births/deaths predictions or statistics?)*, and the extent to which they accommodate things like new build housing estates (which presumably have the potential to change OA level populations significantly?) In turn, this makes me think that any Island Plan projections for new housing build areas should be added as an overlay to any consultation tool under the expectation that changed boundaries will be in place for at least a decade and it would be useful to know where population changes are being future-planned to occur? [* also internal migration from GP registration data (h/t OH)]

One of the things I note about OAs is that they were planned to be as socially homogenous as possible based on tenure of household and dwelling type. If we can colour code OAs according to this sort of information – and / or other demographic data – it would also allow us to get a feeling for the character of current and any proposed new wards based on their demographics. (It would also allow us to see whether they were demographically homogenous or mixed.) I think the Output Area Classifications data is the one to use for this (data)?
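By way of a sketch of that colour coding idea, each OA could be styled on a map according to its OAC supergroup. The supergroup-to-colour mapping and the OA assignments below are invented for illustration:

```python
# Sketch: assign a display colour to each OA based on its 2011 OAC
# supergroup, e.g. for styling OA boundary polygons on a map.
# Colour choices are arbitrary; the OA/supergroup pairings are invented.

supergroup_colour = {
    "Rural Residents": "#2ca02c",
    "Urbanites": "#1f77b4",
    "Suburbanites": "#9467bd",
    "Hard-Pressed Living": "#d62728",
    "Constrained City Dwellers": "#ff7f0e",
}

oa_supergroup = {
    "E00086000": "Rural Residents",
    "E00086001": "Urbanites",
}

def oa_colour(oa_code, default="#999999"):
    """Return the map colour for an OA; grey if unclassified."""
    return supergroup_colour.get(oa_supergroup.get(oa_code), default)

print(oa_colour("E00086000"))
```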

For example, downloading the 2011 OAC Clusters and Names CSV (1.1 MB ZIP), unzipping it, renaming the CSV file to oac.csv, and then using textql (as per Seven Ways of Making use of SQLite) with the command:

textql -header -sql 'SELECT DISTINCT [Supergroup Name],[Group Name], [Subgroup Name] FROM oac WHERE [Local Authority Name] LIKE "%Wight%";' oac.csv

(the square brackets are used to escape the column names that contain spaces) gives the following unique categories for OAs on the Island:

Rural Residents,Ageing Rural Dwellers,Renting Rural Retirement
Rural Residents,Farming Communities,Agricultural Communities
Urbanites,Ageing Urban Living,Self-Sufficient Retirement
Hard-Pressed Living,Industrious Communities,Industrious Transitions
Suburbanites,Semi-Detached Suburbia,Older Workers and Retirement
Hard-Pressed Living,Hard-Pressed Ageing Workers,Renting Hard-Pressed Workers
Rural Residents,Rural Tenants,Rural Life
Hard-Pressed Living,Hard-Pressed Ageing Workers,Ageing Industrious Workers
Urbanites,Ageing Urban Living,Delayed Retirement
Urbanites,Ageing Urban Living,Communal Retirement
Suburbanites,Suburban Achievers,Ageing in Suburbia
Suburbanites,Suburban Achievers,Detached Retirement Living
Rural Residents,Farming Communities,Older Farming Communities
Hard-Pressed Living,Hard-Pressed Ageing Workers,Ageing Rural Industry Workers
Rural Residents,Ageing Rural Dwellers,Detached Rural Retirement
Rural Residents,Ageing Rural Dwellers,Rural Employment and Retirees
Rural Residents,Rural Tenants,Ageing Rural Flat Tenants
Suburbanites,Semi-Detached Suburbia,Semi-Detached Ageing
Urbanites,Urban Professionals and Families,White Professionals
Suburbanites,Semi-Detached Suburbia,White Suburban Communities
Rural Residents,Farming Communities,Established Farming Communities
Rural Residents,Rural Tenants,Rural White-Collar Workers
Constrained City Dwellers,Ageing City Dwellers,Retired Communal City Dwellers
Urbanites,Urban Professionals and Families,Families in Terraces and Flats 
Constrained City Dwellers,Challenged Diversity,Hampered Aspiration
Hard-Pressed Living,Industrious Communities,Industrious Hardship
Hard-Pressed Living,Challenged Terraced Workers,Deprived Blue-Collar Terraces
Constrained City Dwellers,White Communities,Outer City Hardship
Constrained City Dwellers,Ageing City Dwellers,Retired Independent City Dwellers
Hard-Pressed Living,Migration and Churn,Young Hard-Pressed Families
Constrained City Dwellers,Ageing City Dwellers,Ageing Communities and Families
Constrained City Dwellers,White Communities,Challenged Transitionaries
Constrained City Dwellers,White Communities,Constrained Young Families
Hard-Pressed Living,Migration and Churn,Hard-Pressed Ethnic Mix
Constrained City Dwellers,Challenged Diversity,Multi-Ethnic Hardship
Constrained City Dwellers,Challenged Diversity,Transitional Eastern European Neighbourhoods
Urbanites,Urban Professionals and Families,Multi-Ethnic Professionals with Families
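The same DISTINCT query could also be run without textql, using Python’s built-in sqlite3 module. A couple of invented rows stand in here for the real oac.csv contents:

```python
# Sketch: run the textql query above via Python's stdlib sqlite3 module,
# against a small invented stand-in for the oac.csv data. SQLite accepts
# double-quoted identifiers for the column names that contain spaces.
import sqlite3

rows = [
    ("Isle of Wight", "Rural Residents", "Ageing Rural Dwellers", "Renting Rural Retirement"),
    ("Isle of Wight", "Rural Residents", "Ageing Rural Dwellers", "Renting Rural Retirement"),
    ("Isle of Wight", "Urbanites", "Ageing Urban Living", "Self-Sufficient Retirement"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE oac ("Local Authority Name" TEXT,
    "Supergroup Name" TEXT, "Group Name" TEXT, "Subgroup Name" TEXT)""")
conn.executemany("INSERT INTO oac VALUES (?,?,?,?)", rows)

result = conn.execute("""SELECT DISTINCT "Supergroup Name", "Group Name",
    "Subgroup Name" FROM oac
    WHERE "Local Authority Name" LIKE '%Wight%'""").fetchall()
print(result)
```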

In passing, here’s that block of text in a word cloud (via):

Word_Cloud_Generator.png

And here it is if I remove the DISTINCT constraint from the query and generate the cloud from descriptors of each OA on the Island:

Word_Cloud_Generator2.png

(That query returned 466 rows, compared to the 40 council wards. So each ward seems to be made up of about a dozen OAs, on average.)

One thing that might be interesting in urban areas is to see whether newly proposed boundaries are drawn so as to try to split up and disenfranchise particular groups at local level (under the argument that wards should be dominated by majority white / elderly / conservative voting populations) or group them together so that wards can be ghettoised and sacrificed to other parties by the conservative (you can big-C that if you like…) majority.

Remember: all data is political, and all data can be used for political purposes…

Another thing that might be handy is a look-up from postcode to output area, perhaps then reporting on the classification given to the output area and the surrounding ones. To help with that, the ONS do a postcode to OA lookup.
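A sketch of what that lookup service might do once the ONS lookup table is loaded: resolve the postcode to its OA, then report the OA’s classification. The postcodes, OA codes and classifications below are invented placeholders:

```python
# Sketch: postcode -> output area -> OAC classification lookup.
# In practice both dicts would be loaded from the ONS postcode-to-OA
# lookup and the OAC clusters file; these entries are invented.

postcode_to_oa = {
    "PO30 1AA": "E00086000",
    "PO30 1AB": "E00086000",
    "PO31 7XX": "E00086001",
}

oa_classification = {
    "E00086000": ("Rural Residents", "Ageing Rural Dwellers"),
    "E00086001": ("Urbanites", "Ageing Urban Living"),
}

def classify_postcode(postcode):
    """Return (OA code, supergroup, group) for a postcode, or None."""
    oa = postcode_to_oa.get(postcode.upper())
    if oa is None:
        return None
    return (oa,) + oa_classification[oa]

print(classify_postcode("PO30 1AA"))
```

Reporting on the surrounding OAs would additionally need an OA adjacency list, which could be derived from the OA boundary shapefiles.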

I can’t think this through properly at the moment, but I wonder if it’s sensible to find the average of two or more neighbouring weighted centroid locations to find an “average” centroid for them that could be used as the basis of a Voronoi diagram boundary estimator? (So, for example, select however many neighbouring OA centroids for each newly proposed ward, find their mean location, then create Voronoi diagram boundaries around those mean centroids, at least as a first estimate of a boundary, and then compare these with the merged OA boundaries.) Is this a meaningful thing to do, and if so, would it tell us anything interesting?
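The first step of that idea – averaging the population weighted centroids of the OAs assigned to each proposed ward – needs nothing more than stdlib Python; the resulting mean points could then be fed to a Voronoi tessellation (e.g. scipy.spatial.Voronoi) as the seed sites. The ward names and centroid coordinates below are invented:

```python
# Sketch: compute a mean "seed" centroid per proposed ward from the
# population weighted centroids of its member OAs. Coordinates are
# invented OSGB easting/northing pairs. The seeds could subsequently
# be passed to scipy.spatial.Voronoi to generate first-pass boundaries.

def mean_centroid(points):
    """Mean of a list of (easting, northing) centroid locations."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

proposed_ward_oas = {
    "ward_A": [(455000.0, 85000.0), (455400.0, 85600.0)],
    "ward_B": [(458000.0, 88000.0), (458200.0, 88400.0), (458100.0, 88100.0)],
}

seeds = {ward: mean_centroid(pts) for ward, pts in proposed_ward_oas.items()}
print(seeds)
```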

Okay, so that’s some resources found. Next thing is to pull them into a datasette to support this post and figure out some questions to ask. Not sure I’ll have chance to do anything before the consultation finishes though (particularly given the day job is calling for the rest of the day…)

Thanks to Oli Hawkins for pointers into some of the datasets and info about estimate production…

PS I also notice that the O/S Boundary Line product has a dataset called polling_districts_England_region. I wonder if this is something that can be used to map catchment areas around polling locations? I also wonder how this boundary reflects wards and whether changes to these boundaries necessarily follow changes to ward boundaries?

Video Surveillance Fragments

Several things that appeared on my various feeds over the last few days…

I also saw an ad / review / report on video doorbells earlier this week where one of the illustrative photos showed the view from a house/doorbell showing cars in the road outside and the doors /windows of the houses opposite… Turning a privacy bug into a Crimestoppers approved feature, I guess…

I’m not sure how laws of evidence hold when members of the public submit things like this, because it’s getting increasingly easy to tinker with photos nowadays… For example, changing the weather in a photo…

I wonder if the Surveillance Camera Commissioner is going to start adding affiliate links to their website leading to their top picks…?

Participatory video surveillance… sigh…

Running Spatial Queries on Food Standards Agency Data Using Ordnance Survey Shapefiles in SpatiaLite via Datasette

In Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data I showed how the datasette cluster map plugin could be used to display a map with clustered markers showing the location of results on an interactive map from a datasette query that returned latitude and longitude columns.

Since then, datasette has added support for SpatiaLite, a SQLite extension that supports spatial queries. In the previous posts on this topic (Trying Out Spatialite Support in Datasette and Working With OpenStreetMap Roads Data Using osmnx), I’ve started trying to get my head round how to make use of this more powerful tool for managing geo-related data, in particular grabbing Ordnance Survey shapefiles that can be used to filter queries about the UK road network made to OpenStreetMap using the incredibly powerful osmnx Python package.

It’s been a hard slog so far – I’m not really up to speed on geo-tools, what they can do, and what representations work with what – but I’m slowly starting to get there. (Maybe I should have read a book, or done a course, rather than just trying to learn by doing!)

The notebook that contains the code described in this post can be found here. In particular, I’ve been looking at how to run SpatiaLite within() queries to find points within an area, as well as looking a bit more at some of the utility functions osmnx exposes.

To start with, let’s grab some data. I’m going to use the Food Standards Agency food hygiene ratings again:

One thing I should probably do is look at having a variant of that datagrabber that grabs the data into a SpatiaLite database. The original data contains latitude and longitude co-ordinates. We can create a suitable geometry column in the same table and then add a WKT description of the location. (I think the columns should actually be in the order ‘Latitude’ || ‘Longitude’?)
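On the column order question: WKT writes coordinates as X Y, i.e. longitude then latitude. A tiny helper makes the ordering explicit (in SpatiaLite the resulting string could then, I believe, be passed to GeomFromText() with the appropriate SRID):

```python
# Sketch: build a WKT point description from the Latitude / Longitude
# columns in the FSA data. WKT coordinate order is X Y, so longitude
# comes first. The sample coordinates are arbitrary.

def wkt_point(latitude, longitude):
    """Return a WKT POINT string in (X Y) = (longitude latitude) order."""
    return f"POINT({longitude} {latitude})"

print(wkt_point(50.693, -1.304))
```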

The locations are given using the lat/long (WGS84 / EPSG 4326) projection. However, the Ordnance Survey boundaries I obtained via the OS BoundaryLine dataset are in the OSGB (EPSG 27700) projection. But being a spatial database, SpatiaLite can help when it comes to applying the transformation between those projections:

As I’m trying to build workflows/tools around the datasette API to the database, here’s a first attempt at a crude wrapper for it:

Here’s how part of the FSA data table looks now:

One of the issues I noticed with running queries searching for locations within an area delimited by a shapefile is that they can take a long time to run. However, we can create an index describing a bounding box for each shape in a shapefile, which we can use as a first pass to find points broadly within an area:

We can then use a query on the index / bounding box as a subselect before running the slower query to check that a point actually lies within the possibly ragged region:
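A sketch of that two-stage pattern in SpatiaLite SQL, with assumed table and column names (an fsa points table and the adminboundaries table loaded earlier): a fast bounding-box pass against the spatial index, then the exact, slower Within() test on the surviving candidates.

```sql
-- Sketch: bounding-box prefilter via SpatiaLite's SpatialIndex virtual
-- table, followed by the exact point-in-polygon test. Table and column
-- names (fsa, adminboundaries, Geometry, NAME) are assumptions.
SELECT f.*
FROM fsa AS f
WHERE f.ROWID IN (
    -- first pass: rows whose bounding box intersects the region's
    SELECT ROWID
    FROM SpatialIndex
    WHERE f_table_name = 'fsa'
      AND search_frame = (SELECT Geometry FROM adminboundaries
                          WHERE NAME = 'Isle of Wight')
)
AND Within(f.Geometry,
           (SELECT Geometry FROM adminboundaries
            WHERE NAME = 'Isle of Wight'));
```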

We can now use the data in the dataframe as the basis for a map display, in this case using marker clusters (the screenshot shows an exploded cluster):

The FSA data seems to use postcode centroids as the geolocation co-ordinates (at least, I’m guessing so from the overlaps). The folium/leaflet package I’m using to display maps overlays markers that are colocated, so you only see the last one. There is a leaflet plugin to handle this, but it’s not currently available via folium, so for now let’s just randomly perturb the markers a bit so that we can see colocated ones:
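A minimal jitter function along those lines (the offset scale is a judgement call; 1e-4 degrees is very roughly 10m of latitude):

```python
# Sketch: randomly perturb colocated marker positions by a small amount
# so that overlapping markers become individually visible on the map.
# The scale (about 10m) is an arbitrary choice for illustration.
import random

def jitter(lat, lon, scale=1e-4, rng=random):
    """Return (lat, lon) each nudged by a uniform offset in [-scale, scale]."""
    return (lat + rng.uniform(-scale, scale),
            lon + rng.uniform(-scale, scale))

lat, lon = jitter(50.693, -1.304)
print(lat, lon)
```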

To check that the markers are located inside the desired boundary, we can also plot the boundaryline on the map:

One of the things that I noticed from OpenStreetMap is that it actually geocodes the locations of points of interest (POIs) more accurately. The osmnx package includes a handy utility for geocoding an address, so I should be able to use this to improve the geolocation data of the FSA rated establishments?

The osmnx package also lets us search for POIs within boundary area, presented as a Shapely geometry object. We can generate such an object directly from a GeoJSON shapefile:

We can then use this as the basis for a query onto OSM by amenity tag:

There are lots of different amenity tags supported, so this could provide quite a useful lookup tool. For example, I wonder about what sorts of amenities it might be handy for parish councillors to be able to look up via a hyperlocal / parish council geo-service?

Automating the Battlefield

Another year, another robot heavy military exercise. Eighteen months or so ago, it was Unmanned Warrior, a marine based large scale demonstration of unmanned and autonomous systems led by the Royal Navy  (previously blogged as Drone War Exercises).

Next up is Autonomous Warrior, the 2018 Army Warfighting Experiment. Apparently:

Autonomous Warrior will test a range of prototype unmanned aerial and ground cargo vehicles which aim to reduce the danger to troops during combat.

The British Army is set to launch the four-week exercise on November 12, with a Battlegroup from 1 Armed Infantry brigade providing the exercising troops and taking responsibility of command and control.

I wonder if the exercise is building on early results from the autonomous last mile resupply that was launched by the Defence and Security Accelerator last year with the following timeline:

In passing, the testing / performance criteria for competition entries provides a handy checklist of key considerations:

  • Non stop range
  • System lift capacity (mass)
  • Payload size and shape (volume)
  • Speed
  • Turn around time and effort
  • Terrain types
  • Operator control requirements / autonomy levels (eg Automatic route planning, obstacle avoidance, tasking)
  • Contested environment (eg no GPS)
  • Physical Environmental conditions (eg wind)
  • Supportability and Interchangeability

Spend on military robots for deployment  also seems to be increasing, in the US at least. For example, Bloomberg reported recently (The U.S. Army Is Turning to Robot Soldiers) that:

In April, the [US] Army awarded a $429.1 million contract to two Massachusetts companies, Endeavor Robotics of Chelmsford and Waltham-based QinetiQ North America, for small bots weighing fewer than 25 pounds. This spring, Endeavor also landed two contracts worth $34 million from the Marine Corps for small and midsized robots. … In October, the Army awarded Endeavor $158.5 million for a class of more than 1,200 medium robots, called the Man-Transportable Robotic System, Increment II, weighing less than 165 pounds.

In the air, UK use of aerial drones is in accordance with the the 84 page Joint Doctrine Publication Unmanned aircraft systems (JDP 0-30.2); for a full list of joint doctrine publications see the Joint Doctrine Publication (JDP) collection.

I’m not sure if there’s a reserve force that specifically recruits volunteers with skills in robotics, but I notice there is one for techies interested in matters relating to cyber-warfare/cyber-defence: Joint Cyber Reserve Force (CRF). You should probably read the JDP Cyber Primer first, though…

My Isle of Wight Festival 2018 Roundup

A week on, what are the bands I rated – and saw – at this year’s Isle of Wight Festival?

As with previous years, we’ve camped, which means we also get in on the Thursday. Opening things up in the Kashmir Cafe, operated by the local Quay Arts Centre, staffed by volunteers, and running the best bar on the site, were local reggae favourites The Ohms kicking off the positivity that lasted through the weekend:

They’ve got a new album out which we picked up on the day – well worth a listen…

A couple of bands later were Rews doing a White Stripes sort of thing. It would be great to see them playing the local Strings venue, which had its first birthday yesterday…

Friday saw the main stage open up, although I tend to avoid many of the acts there, preferring to see the smaller bands. I was however surprised to be impressed by Rita Ora, not being really sure what a Rita Ora is…

Erm… anyway… That was actually on the way to see Southampton favourites The Novatones on the Hard Rock Stage:

Walking on from there, I was tipped off to 77:78, made up of a couple of ex-Bees, playing in the Kashmir. Psychedelia through a warm pass filter, with the positivity turned up to 11. Absolutely bloody gorgeous…

A couple of bands on, and Rusty Shackle made a welcome return to the Island:

Saturday started with a quick visit to the main stage to see Slydigs, a band who reminded me muchly of the Stereophonics with a dash of added Jet, fast songs and slow songs alike, albeit with a Merseyside accent:

A bit of a break then until another returning band who I saw a couple of times last year as Germein Sisters, and who came back this year as Germein:

They’re just about to tour as support to Little Mix, whatever a Little Mix is. Erm…

Next up, Blossoms, who seem to have got to early evening main stage really quickly. As a band, you see them for the songs, not the banal, characterless performance (shut your eyes and dance along)…

On the other hand, Sisteray in the This Feeling tent were engaging both musically and performance wise; punk rock politics in the style of Green Day, with a political attitude relevant to the internet age. Not sure I’ve ever heard a song with algorithm in the title before… Toptastic.

A quick diversion to the Kashmir Cafe for another pint of Yachtsman’s Ale, where I caught the end of island bluesmiths Confred Fred teamed up with Jojo and the Woods; powerful blues guitar and one hell of a voice from female vocalist Jojo (I presume…).

Then the rockin’ continued with Liam Gallagher on the main stage. At one point he apologised to folk griping he was playing Oasis songs. Erm. The only reason I was there was for a bit of a singalong to Oasis songs, which was fun enough. But it rather reminded me a bit of a covers band; who had no real ownership of the songs or how to get the original feeling out of them. So no video prize there…

After hanging around for the start of Depeche Mode – yawn, the sound was of its time and I wasn’t prepared to wait an hour to be reminded of the songs I played on vinyl repeatedly over 30 years ago – I caught the always awful Bullybones (get p****d, make a racket in the way I think they imagined The Stooges used to) and then a band I caught the end of at Beautiful Days Festival last year – Noble Jacks:

Sunday started with an idle listen to Sheryl Crow

…cut short in favour of seeing festival regular Suzanne Vega:

Then a break and a rest under the trees by Cirque De La Quirk before regular island visitors Tankus the Henge did the liveliest set of the weekend with perfectly in keeping added circus performers, lasers and confetti cannons:

And finally, a three way clash between The Correspondents, Ska’d For Life and Travis. In the end I opted to see the start of Travis, but stayed to the end… The singalong was incredible and Fran Healy is just the nicest front man, wrapping up the weekend with wry observations and a lot of positivity… And the best singalong…

One band I missed several times was Reminders but I hope to catch them at Rhythm Tree Festival in a couple of weeks along with The Ohmz, 77:78 and Tankus the Henge… I’m pretty sure The Correspondents are there too, so I’ll get to see a bit more of what I missed at the Isle of Wight Festival…

PS early bird tickets for the Isle of Wight Festival 2019 are already on sale…

Working With OpenStreetMap Roads Data Using osmnx

A couple of days ago, I came across the incredible looking osmnx Python package, originally created by Geoff Boeing at UC Berkeley in support of his PhD, via one of his blog posts: OSMnx: Python for Street Networks (there is a citeable paper, but that’s not what I originally found…) There are also some great example notebooks: gboeing/osmnx-examples.

I spent a chunk of today having a play with it, and have posted a notebook walkthrough here.

It’s quite incredible…

Pretty much a one-liner lets you pull back all the roads in a particular area, searched for by name:

The osmnx package represents routes as a networkx graph – so we can do graphy things with it, like finding the shortest distance between two points, aka route planning:
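Under the hood that route planning is a weighted shortest-path search over the graph. As a self-contained illustration of what the search is doing – using a made-up toy network rather than OSM data – here’s Dijkstra’s algorithm in stdlib Python, with edge weights standing in for road lengths in metres:

```python
# Toy illustration of the shortest-path search run over an osmnx road
# graph, here as plain Dijkstra with heapq over an invented network.
# Edge weights stand in for road lengths in metres.
import heapq

graph = {
    "A": [("B", 400), ("C", 900)],
    "B": [("C", 300), ("D", 700)],
    "C": [("D", 200)],
    "D": [],
}

def shortest_path(graph, start, goal):
    """Return (total length, node list) of the shortest path, or None."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, weight in graph[node]:
            if nbr not in seen:
                heapq.heappush(queue, (dist + weight, nbr, path + [nbr]))
    return None

print(shortest_path(graph, "A", "D"))
```

With a real osmnx graph, networkx provides this search ready-made, weighted by the length attribute on each edge.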

The route can also be plotted on an interactive map. An option lets you hover on a route and display a context sensitive tooltip, in this case the name of the road:

Retrieving routes by area name is handy, but we can also pull back routes within a certain distance of a specified location, or routes that are contained within a particular region specified by a shapefile.

In a previous post (Trying Out Spatialite Support in Datasette) I showed how to put Ordnance Survey BoundaryLine data into a SpatiaLite database and then access the geojson boundary files from a datasette API. We can use that approach again, here wrapped up in a dockerised context:

Using the retrieved boundary shapefile, we can then use osmnx to just grab the roads contained within that region, in this case my local parish:

Once again, we can use an interactive map to display the results:

If we overlay the parish boundary, we see that the routes returned correspond to the graph between nodes that lie within the boundary. Some roads pass straight through the boundary; others appear to lie just outside it.

osmnx_Demo11

However, it looks like we can tweak various parameters to get the full extent of the roads within the region:

osmnx_Demo12

As well as routes, we can also get building footprints from OpenStreetMap:

If you know where to look, you can see our house!

Building footprints can also be overlaid on routes:

If we generate a distance to time mapping, the graph representation means we can also colour nodes according to how far, in walking time for example, they are from a particular location:
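The distance to time mapping itself is simple arithmetic: divide each node’s network distance from the origin by an assumed walking speed, then bin nodes into time bands for colouring. The node names, distances and the 5 km/h speed below are invented for illustration:

```python
# Sketch: convert network distances (metres from an origin node) into
# walking times, then bin nodes into isochrone bands for colouring.
# Assumes an average walking speed of 5 km/h; node distances invented.

WALK_M_PER_MIN = 5000 / 60  # 5 km/h expressed in metres per minute

node_distance = {"pub": 0, "church": 400, "beach": 1600, "castle": 3000}

def isochrone_band(metres, bands=(5, 15, 30)):
    """Return the smallest band (minutes) the node falls inside, else None."""
    minutes = metres / WALK_M_PER_MIN
    for band in bands:
        if minutes <= band:
            return band
    return None

banded = {node: isochrone_band(d) for node, d in node_distance.items()}
print(banded)
```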

We can also overlay routes on isochrone areas to show travel times along routes – so where’s within half an hour’s walk of the Pointer Inn in Newchurch?

Living on a holiday island, I wonder if there’s any mileage (!) in offering a simple service for campsites, hotels etc that would let them print off isochrone walking maps centred on the campsite, hotel etc with various points of interest, and estimated walking times, highlighted?

I’m also wondering how much work would be required in order to add additional support to the osmnx package so that it could use Ordnance Survey, rather than OSM, data?

And finally, one other thing I’d like to explore is how to generate tulip diagrams from route graphs… If anyone has any ideas about that, please let me know via the comments…

PS for building outlines in the UK derived from Ordnance Survey data, see for example Alasdair Rae’s OS OpenMap Local – All Buildings in Great Britain.

PPS And building footprints for the US, courtesy of Microsoft: https://github.com/Microsoft/USBuildingFootprints

Trying Out Spatialite Support in Datasette

A recent datasette release included improved support for the SpatiaLite geo-extensions for SQLite, so to complement some other bits of geo-tinkering I’ve been doing lately – and which I still need to blog – I thought I’d give it a spin.

The write up is in a Jupyter notebook which you can find here and which is embedded in markdown export format below:

SpatiaLite Datasette Demo

This notebook provides a quick walkthrough of getting started with a SpatiaLite geo-database and using it with Datasette.

Get the Data

The SpatiaLite database can be used to store, index and perform various geo-related query operations on various geographical objects including points and shapefiles.

To help me get up to speed, I’m going to try to load in a shapefile delimiting various bits of the UK into a SpatiaLite database, publish it as a datasette, retrieve a boundary for a particular region from the datasette API and then plot the boundary on a map.

The shapefile I’m going to use is of UK administrative areas described by the Ordnance Survey Boundary-Line product.

You can get a copy of the data from https://www.ordnancesurvey.co.uk/opendatadownload/products.html by selecting the Boundary-Line product and providing contact details, at which point you should be emailed a download link.

Download the data and unzip it. This should create a folder named bdline_essh_gb, or something similar. (The paths used in this notebook assume that it is running inside that directory.)

Inside the bdline_essh_gb directory is a Data subdirectory, and inside that is a GB subdirectory containing a variety of unpacked shapefiles, including but not limited to:

district_borough_unitary_region.dbf
district_borough_unitary_region.prj
district_borough_unitary_region.shp
district_borough_unitary_region.shx
district_borough_unitary_ward_region.cpg
district_borough_unitary_ward_region.dbf
district_borough_unitary_ward_region.prj
district_borough_unitary_ward_region.sbn
district_borough_unitary_ward_region.sbx
district_borough_unitary_ward_region.shp
district_borough_unitary_ward_region.shx
parish_region.shp
parish_region.shx
scotland_and_wales_const_region.dbf
scotland_and_wales_const_region.prj
scotland_and_wales_const_region.shp
scotland_and_wales_const_region.shx
scotland_and_wales_region.dbf
scotland_and_wales_region.prj
scotland_and_wales_region.shp
scotland_and_wales_region.shx

Loading Shapefile Data into SQLite

The datasette docs currently suggest creating a SpatiaLite database using a command of the form spatialite DATABASE.db and then loading shapefiles into it using a SpatiaLite commandline command of the form: .loadshp SHAPEFILE TABLENAME CP1252 23032.

That bit of voodoo actually unpacks a bit further as:

.loadshp PATH_AND_SHAPEFILE_NAME TABLENAME FILE_ENCODING [PROJECTION]

The CP1252 file encoding is for the default Microsoft Windows Latin-1 encoding, so I’m guessing that if your shapefile uses another encoding you need to change that. (Internally, SpatiaLite uses UTF-8. UTF-8 is also acceptable as a file encoding value in the above commandline command; a full list of acceptable values can be found by trying to load a shapefile using spatialite_gui.) The file encoding is a required argument, as are the path to the shapefile name and the table name. The projection is optional.

The projection relates to the projection used within the shapefile. Setting this correctly allows you to transform the data to other projections, such as the WGS84 (aka EPSG:4326, GPS, latitude / longitude) projection.

The projection is identified using its SRID (Spatial Reference System Identifier) code (lookup). If no code is provided, no projection is identified with the shapefile geodata in the database (it’s given a -1 code). (I had expected the projection to be identified from the .prj (projection) file, which contains a WKT description of the projection used in each .shp shapefile.)

I didn’t meet with much success looking for a Python library to identify the required SRID from the Ordnance Survey shapefiles, but did find an online API that seemed to do the trick:

#Load in the shapefile projection details
with open('./Data/GB/district_borough_unitary_region.prj', 'r') as myfile:
  prj=myfile.read()
prj

'PROJCS["unnamed",GEOGCS["unnamed",DATUM["D_OSGB_1936",SPHEROID["Airy – 1848",6377563,299.319997677743]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.999601272],PARAMETER["false_easting",400000],PARAMETER["false_northing",-100000],UNIT["METER",1]]'

#Look up the corresponding SRID
import requests

#http://prj2epsg.org/apidocs.html
params = {'mode': 'wkt', 'terms':prj}
r = requests.get('http://prj2epsg.org/search.json', params=params)

#Use the first hit as the most likely SRID
srid = r.json()['codes'][0]
srid
{'name': 'OSGB 1936 / British National Grid',
'code': '27700',
'url': 'http://prj2epsg.org/epsg/27700.json' }
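(Note that the first hit is itself a dict; the numeric identifier we actually need is the value of its code key:)

```python
# A copy of the first search hit, as returned above
srid = {'name': 'OSGB 1936 / British National Grid',
        'code': '27700',
        'url': 'http://prj2epsg.org/epsg/27700.json'}

# The SRID code itself is the value of the 'code' key (as a string)
srid_code = srid['code']
srid_code
# '27700'
```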

This tells us that the co-ordinates used in the shapefile are given as OSGB Northings and Eastings.

We can use this code value (27700) to load our shapefile with the correct projection code:

# Load the data into a new SpatiaLite database
! spatialite adminboundaries.db ".loadshp ./Data/GB/district_borough_unitary_region adminboundaries UTF-8 27700"
SpatiaLite version ..: 4.3.0a Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualXL' [direct XLS access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualSpatialIndex' [R*Tree metahandler]
- 'VirtualElementary' [ElemGeoms metahandler]
- 'VirtualXPath' [XML Path Language - XPath]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'VirtualGPKG' [OGC GeoPackage interoperability]
- 'VirtualBBox' [BoundingBox tables]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 5.1.0, June 1st, 2018
GEOS version ........: 3.6.2-CAPI-1.10.2 4d2925d6
TARGET CPU ..........: x86_64-apple-darwin17.3.0
the SPATIAL_REF_SYS table already contains some row(s)
========
Loading shapefile at './Data/GB/district_borough_unitary_region' into SQLite table 'adminboundaries'

BEGIN;
CREATE TABLE "adminboundaries" (
"PK_UID" INTEGER PRIMARY KEY AUTOINCREMENT,
"NAME" TEXT,
"AREA_CODE" TEXT,
"DESCRIPTIO" TEXT,
"FILE_NAME" TEXT,
"NUMBER" INTEGER,
"NUMBER0" INTEGER,
"POLYGON_ID" INTEGER,
"UNIT_ID" INTEGER,
"CODE" TEXT,
"HECTARES" DOUBLE,
"AREA" DOUBLE,
"TYPE_CODE" TEXT,
"DESCRIPT0" TEXT,
"TYPE_COD0" TEXT,
"DESCRIPT1" TEXT);
SELECT AddGeometryColumn('adminboundaries', 'Geometry', 27700, 'MULTIPOLYGON', 'XY');
COMMIT;

Inserted 380 rows into 'adminboundaries' from SHAPEFILE
========

The Northings/Eastings projection used by the OS shapefile is not directly plottable using many simple interactive web maps. Instead, they need GPS style latitude/longitude co-ordinates. Given that we know the projection of the original shapefile, we can create a new set of transformed co-ordinates using the required lat/long (WGS84) projection.

Let’s create a simple text file containing a SQL script to handle that transformation for us:

%%bash
echo '''
BEGIN;
ALTER TABLE adminboundaries ADD COLUMN wgs84 BLOB;
UPDATE adminboundaries SET wgs84 = Transform(Geometry, 4326);
COMMIT;
''' > project2wsg84.sql
! cat project2wsg84.sql
BEGIN;
ALTER TABLE adminboundaries ADD COLUMN wgs84 BLOB;
UPDATE adminboundaries SET wgs84 = Transform(Geometry, 4326);
COMMIT;

We can now read and execute that script against our database (once again, a file encoding value appears to be required):

! spatialite adminboundaries.db ".read project2wsg84.sql utf-8"
SpatiaLite version ..: 4.3.0a Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualXL' [direct XLS access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualSpatialIndex' [R*Tree metahandler]
- 'VirtualElementary' [ElemGeoms metahandler]
- 'VirtualXPath' [XML Path Language - XPath]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'VirtualGPKG' [OGC GeoPackage interoperability]
- 'VirtualBBox' [BoundingBox tables]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 5.1.0, June 1st, 2018
GEOS version ........: 3.6.2-CAPI-1.10.2 4d2925d6
TARGET CPU ..........: x86_64-apple-darwin17.3.0
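(An alternative to shelling out to the spatialite CLI would be to run the script from Python’s built-in sqlite3 module, loading the SpatiaLite extension first. A minimal sketch, assuming mod_spatialite is installed somewhere on the library path; the spatial Transform() call will only work if the extension actually loads, so here it falls back gracefully and demonstrates the script-running part with a plain table:)

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # or 'adminboundaries.db'

# Loadable-extension support may be compiled out of some Python builds,
# and the extension name/path is platform specific (.dylib on a Mac, .so on Linux)
try:
    conn.enable_load_extension(True)
    conn.load_extension('mod_spatialite')
    have_spatialite = True
except (AttributeError, sqlite3.OperationalError):
    have_spatialite = False

# executescript() runs a multi-statement SQL script in one go,
# e.g. conn.executescript(open('project2wsg84.sql').read())
conn.executescript('''
BEGIN;
CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO demo (name) VALUES ('example');
COMMIT;
''')
rows = conn.execute('SELECT name FROM demo').fetchall()
rows
# [('example',)]
```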

Accessing the SpatiaLite Database Using Datasette

Having created our database and generated co-ordinates in a projection appropriate for plotting using an interactive web map, let’s publish it as a datasette and then see if we can access the data from the datasette API.

Here’s a trick I just learned via a tweet from @minrk for running a simple web service from within a Jupyter notebook code cell without blocking the notebook execution:

# Example of running datasette server from inside a code cell
# via https://nbviewer.jupyter.org/gist/minrk/622080cf8af787734805d12bec4ae302
from threading import Thread

def app_in_thread():
    ! datasette adminboundaries.db --load-extension=/usr/local/lib/mod_spatialite.dylib --config sql_time_limit_ms:10000 --cors

t = Thread(target=app_in_thread)
t.start()

# Alternative multitasking package
# https://github.com/micahscopes/nbmultitask
Serve! files=('adminboundaries.db',) on port 8001
[2018-06-28 18:08:04 +0100] [91760] [INFO] Goin' Fast @ http://127.0.0.1:8001
[2018-06-28 18:08:04 +0100] [91760] [INFO] Starting worker [91760]
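(The same non-blocking pattern can be demonstrated using just the standard library, with a stdlib HTTP server standing in for datasette: serve from a background thread while the main thread carries on making requests:)

```python
import threading
import socketserver
import http.server
import urllib.request

# Bind to port 0 so the OS picks a free port for us
httpd = socketserver.TCPServer(('127.0.0.1', 0),
                               http.server.SimpleHTTPRequestHandler)
port = httpd.server_address[1]

# Serve in a daemon thread so the main thread (the notebook) isn't blocked
t = threading.Thread(target=httpd.serve_forever, daemon=True)
t.start()

# The main thread is free to make requests against the running server
status = urllib.request.urlopen('http://127.0.0.1:{}/'.format(port)).status
httpd.shutdown()
status
# 200
```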

We should now be able to access the datasette API.

#What region shall we search for a boundary for?
#How about the diamond isle just off the south coast...
region='wight'
import requests

params = {'sql': 'SELECT AsGeoJSON(wgs84) FROM adminboundaries WHERE name LIKE "%{}%"'.format(region)}
json_url = "http://localhost:8001/adminboundaries.json"

r = requests.get(json_url, params=params)
results = r.json()

[2018-06-28 18:30:29 +0100] – (sanic.access)[INFO][1:2]: GET http://localhost:8001/adminboundaries-f771041.json?sql=SELECT+AsGeoJSON%28wgs84%29+FROM+adminboundaries+WHERE+name+LIKE+%22%25wight%25%22 302 0
[2018-06-28 18:30:29 +0100] – (sanic.access)[INFO][1:2]: GET http://localhost:8001/adminboundaries-a912e25.json?sql=SELECT+AsGeoJSON%28wgs84%29+FROM+adminboundaries+WHERE+name+LIKE+%22%25wight%25%22 200 398206
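(As an aside, datasette also supports named parameters in the SQL, filled in from matching querystring arguments, which avoids splicing the search term into the query by string formatting; a sketch of the same request built that way:)

```python
region = 'wight'

# datasette binds :pattern from a querystring argument of the same name,
# so the search term never needs formatting into the SQL string itself
params = {'sql': 'SELECT AsGeoJSON(wgs84) FROM adminboundaries WHERE name LIKE :pattern',
          'pattern': '%{}%'.format(region)}
# r = requests.get(json_url, params=params)
```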

# Get geojson feed served from datasette - the result is actually a string
# so convert the string to actual json, qua py dict
import json
geojson=json.loads(results['rows'][0][0])

Now let’s see if we can render that region as a map.

#folium is a handy package I've used for ages for displaying maps in Jupyter notebooks
import folium
# GeoJSON coordinates are (lng, lat) pairs - grab an example point to help center the map
lng, lat = tuple(geojson['coordinates'][0][0][0])
#Render the map (folium expects (lat, lng) ordered co-ordinates)
m = folium.Map(location=(lat, lng), zoom_start=10)
m.choropleth(geo_data=geojson, line_color='blue', line_weight=3)
m
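(Centering on an arbitrary first point is a bit crude; a more robust approach might be to centre on the middle of the geometry’s bounding box. A quick helper, walking over the arbitrarily nested GeoJSON coordinate arrays:)

```python
def bbox_centre(geometry):
    '''Return a (lat, lng) pair at the centre of a GeoJSON geometry's bounding box.'''
    def walk(coords):
        # Recursively yield (lng, lat) pairs from nested coordinate arrays
        if isinstance(coords[0], (int, float)):
            yield coords[0], coords[1]
        else:
            for c in coords:
                yield from walk(c)
    lngs, lats = zip(*walk(geometry['coordinates']))
    return (min(lats) + max(lats)) / 2, (min(lngs) + max(lngs)) / 2

# Quick check with a toy single-square MultiPolygon
toy = {'type': 'MultiPolygon',
       'coordinates': [[[[-1.5, 50.5], [-1.0, 50.5], [-1.0, 50.8],
                         [-1.5, 50.8], [-1.5, 50.5]]]]}
centre = bbox_centre(toy)
```

The returned pair can then be passed straight to folium.Map() as the location value.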

So that all seems to work… :-)

Next up, I’ll see if I can load some lat/lng points, cast them as Point datatypes, then run a query to pull out just locations within a specific region…