Running Spatial Queries on Food Standards Agency Data Using Ordnance Survey Shapefiles in SpatiaLite via Datasette

In Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data I showed how the datasette cluster map plugin could be used to display a map with clustered markers showing the location of results on an interactive map from a datasette query that returned latitude and longitude columns.

Since then, datasette has added support for SpatiaLite, a SQLite extension that supports spatial queries. In the previous posts on this topic, (Trying Out Spatialite Support in Datasette and Working With OpenStreetMap Roads Data Using osmnx), I’ve started trying to get my head round how to make use of this more powerful tool for managing geo-related data, in particular grabbing Ordnance Survey shapefiles that can be used to filter queries about the UK road network made to OpenStreetMap using the incredibly power osmnx Python package.

It’s been a hard slog so far – I’m not really up to speed on geo-tools, what they can do, and what representations work with what – but I’m slowly starting to get there. (Maybe I should have read a book, or done a course, rather than just trying to learn by doing!)

The notebook that contains the code described in this post can be found here. In particular, I’ve been looking at how to run SpatiaLite within() queries to find points within an area, as well as looking a bit more at some the the utility functions osmnx exposes.

To start with, let’s grab some data. I’m going to use the Food Standards Agency food hygiene ratings again:

One thing I should probably do is look at having a variant of that datagrabber that grabs the data into a SpatiaLite database. The original data contains latitude and longitude co-ordinates. We can create a suitable geometry column in the same table and then add a WKT description of the location. (I think the columns should actually be in the order ‘Latitude’ || ‘Longitude’?)

The locations are given using the lat/long (WGS84 / EPSG 4326) projection. However, the Ordnance Survey boundaries I obtained via the OS BoundaryLine dataset are in the OSGB (EPSG 27700) projection. But being a spatial database, SpatiaLite can help when it comes to applying the transformation between those projections:

As I’m trying to build workflows/tools around the datasette API to the database, here’s a first attempt at a crude wrapper for it:

Here’s how part of the FSA data table looks now:

One of the issues I noticed with running queries searching for locations within an area delimited by a shapefile is that they can take a long time to run. However, we can create an index on a shapefile that describes a bounding box for the shapefile which we can use as a first pass to find points broadly within an area:

We can then use a query on the index / bounding box as a subselect before running the slower query to check that a point actually lies within the possibly ragged region:

We can now use the data in the dataframe as the basis for a map display, in this case using marker clusters (the screenshot shows an exploded cluster):

The FSA data seems to use postcode centroids as the geolocation co-ordinates (at least, I’m guessing so from the overlaps). The folium/leaflet package I’m using to display maps overlays markers that are colcated so you only see the last one. There is a leaflet plugin to handle this, but it’s not currently available via folium, so for now, let’s just randomly perturb markers a bit so that we can see colocated ones:

To check that the markers are located inside the desired boundary, we can also plot the boundaryline on the map:

One of the things that I noticed from OpenStreetMap is that it actually geocodes the locations of points of interest (POIs) more accurately. The osmnx package includes a handy utility for geocoding an address, so I should be able to use this to improve the geolocation data of the FSA rated establishments?

The osmnx package also lets us search for POIs within boundary area, presented as a Shapely geometry object. We can generate such an object directly from a GeoJSON shapefile:

We can then use this as the basis for a query onto OSM by amenity tag:

There are lots of different amenity tags supported, so this could provide quite a useful lookup tool. For example, I wondering about what sorts of amenity it might be handy for parish councillors to be able to lookup via a hyperlocal / parish council geo-service?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: