Trying Out Spatialite Support in Datasette

A recent datasette release included improved support for the SpatiaLite geo-extensions for SQLite, so to complement some other bits of geo-tinkering I’ve been doing lately – and which I still need to blog – I thought I’d give it a spin.

The write up is in a Jupyter notebook which you can find here and which is embedded in markdown export format below:

SpatiaLite Datasette Demo

This notebook provides a quick walkthrough of getting started with a SpatiaLite geo-database and using it with Datasette.

Get the Data

A SpatiaLite database can store, index and run a range of geo-related queries over geographical objects, including points and the boundary polygons contained in shapefiles.

To help me get up to speed, I’m going to try to load in a shapefile delimiting various bits of the UK into a SpatiaLite database, publish it as a datasette, retrieve a boundary for a particular region from the datasette API and then plot the boundary on a map.

The shapefile I’m going to use is of UK administrative areas described by the Ordnance Survey Boundary-Line product.

You can get a copy of the data from https://www.ordnancesurvey.co.uk/opendatadownload/products.html by selecting the Boundary-Line product and providing contact details, at which point you should be emailed a download link.

Download the data and unzip it. This should create a folder named bdline_essh_gb, or something similar. (The paths used in this notebook assume that it is running inside that directory.)

Inside the bdline_essh_gb directory is a Data subdirectory, and inside that is a GB subdirectory containing a variety of unpacked shapefiles, including but not limited to:

district_borough_unitary_region.dbf parish_region.shp
district_borough_unitary_region.prj parish_region.shx
district_borough_unitary_region.shp scotland_and_wales_const_region.dbf
district_borough_unitary_region.shx scotland_and_wales_const_region.prj
district_borough_unitary_ward_region.cpg scotland_and_wales_const_region.shp
district_borough_unitary_ward_region.dbf scotland_and_wales_const_region.shx
district_borough_unitary_ward_region.prj scotland_and_wales_region.dbf
district_borough_unitary_ward_region.sbn scotland_and_wales_region.prj
district_borough_unitary_ward_region.sbx scotland_and_wales_region.shp
district_borough_unitary_ward_region.shp scotland_and_wales_region.shx
district_borough_unitary_ward_region.shx

Loading Shapefile Data into SQLite

The datasette docs currently suggest creating a SpatiaLite database using a command of the form spatialite DATABASE.db and then loading shapefiles into it using a SpatiaLite commandline command of the form: .loadshp SHAPEFILE TABLENAME CP1252 23032.

That bit of voodoo actually unpacks a bit further as:

.loadshp PATH_AND_SHAPEFILE_NAME TABLENAME FILE_ENCODING [PROJECTION]

The CP1252 file encoding is the default Microsoft Windows Latin-1 encoding, so I’m guessing that if your shapefile uses another encoding you need to change that. (Internally, SpatiaLite uses UTF-8. UTF-8 is also acceptable as a file encoding value in the above commandline command; a full list of acceptable values can be found by trying to load a shapefile using spatialite_gui.) The file encoding is a required argument, as are the path to the shapefile and the table name. The projection is optional.

The projection relates to the projection used within the shapefile. Setting this correctly allows you to transform the data to other projections, such as the WGS84 (aka EPSG:4326, GPS, latitude / longitude) projection.

The projection is identified using its SRID (Spatial Reference System Identifier) code (lookup). If no code is provided, no projection is associated with the shapefile geodata in the database (it’s given an SRID of -1). (I had expected the projection to be identified from the .prj (projection) file, which contains a WKT description of the projection used in each .shp shapefile.)

I didn’t meet with much success looking for a Python library to identify the required SRID from the Ordnance Survey shapefiles, but did find an online API that seemed to do the trick:

#Load in the shapefile projection details
with open('./Data/GB/district_borough_unitary_region.prj', 'r') as myfile:
  prj=myfile.read()
prj

'PROJCS["unnamed",GEOGCS["unnamed",DATUM["D_OSGB_1936",SPHEROID["Airy - 1848",6377563,299.319997677743]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.999601272],PARAMETER["false_easting",400000],PARAMETER["false_northing",-100000],UNIT["METER",1]]'
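As an aside, the WKT in the .prj file is a nested KEYWORD["name", …] structure. As a quick illustration (not a real WKT parser), a simple regular expression over an abbreviated copy of the string above can list the keywords it uses:

```python
# Quick-and-dirty listing of the keywords used in a WKT projection string;
# a real WKT parser would respect the nesting, but this is enough for a peek.
import re

# Abbreviated copy of the .prj contents shown above (PARAMETER entries dropped)
wkt = ('PROJCS["unnamed",GEOGCS["unnamed",DATUM["D_OSGB_1936",'
       'SPHEROID["Airy - 1848",6377563,299.319997677743]],'
       'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],'
       'PROJECTION["Transverse_Mercator"]]')

keywords = re.findall(r'([A-Z_0-9]+)\[', wkt)
print(keywords)
# ['PROJCS', 'GEOGCS', 'DATUM', 'SPHEROID', 'PRIMEM', 'UNIT', 'PROJECTION']
```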

#Look up the corresponding SRID
import requests

#http://prj2epsg.org/apidocs.html
params = {'mode': 'wkt', 'terms':prj}
r = requests.get('http://prj2epsg.org/search.json', params=params)

#Use the first hit as the most likely SRID
srid = r.json()['codes'][0]
srid
{'name': 'OSGB 1936 / British National Grid',
'code': '27700',
'url': 'http://prj2epsg.org/epsg/27700.json' }

This tells us that the co-ordinates used in the shapefile are given as OSGB Northings and Eastings.

We can use this code value (27700) to load our shapefile in with the correct projection code:

# Load the data into a new SpatiaLite database
! spatialite adminboundaries.db ".loadshp ./Data/GB/district_borough_unitary_region adminboundaries UTF-8 27700"
SpatiaLite version ..: 4.3.0a Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualXL' [direct XLS access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualSpatialIndex' [R*Tree metahandler]
- 'VirtualElementary' [ElemGeoms metahandler]
- 'VirtualXPath' [XML Path Language - XPath]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'VirtualGPKG' [OGC GeoPackage interoperability]
- 'VirtualBBox' [BoundingBox tables]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 5.1.0, June 1st, 2018
GEOS version ........: 3.6.2-CAPI-1.10.2 4d2925d6
TARGET CPU ..........: x86_64-apple-darwin17.3.0
the SPATIAL_REF_SYS table already contains some row(s)
========
Loading shapefile at './Data/GB/district_borough_unitary_region' into SQLite table 'adminboundaries'

BEGIN;
CREATE TABLE "adminboundaries" (
"PK_UID" INTEGER PRIMARY KEY AUTOINCREMENT,
"NAME" TEXT,
"AREA_CODE" TEXT,
"DESCRIPTIO" TEXT,
"FILE_NAME" TEXT,
"NUMBER" INTEGER,
"NUMBER0" INTEGER,
"POLYGON_ID" INTEGER,
"UNIT_ID" INTEGER,
"CODE" TEXT,
"HECTARES" DOUBLE,
"AREA" DOUBLE,
"TYPE_CODE" TEXT,
"DESCRIPT0" TEXT,
"TYPE_COD0" TEXT,
"DESCRIPT1" TEXT);
SELECT AddGeometryColumn('adminboundaries', 'Geometry', 27700, 'MULTIPOLYGON', 'XY');
COMMIT;

Inserted 380 rows into 'adminboundaries' from SHAPEFILE
========

The Northings/Eastings projection used by the OS shapefile is not directly plottable by many simple interactive web maps, which instead expect GPS-style latitude/longitude co-ordinates. Given that we know the projection of the original shapefile, we can add a new set of co-ordinates transformed into the required lat/long (WGS84) projection.

Let’s create a simple text file containing a SQL script to handle that transformation for us:

%%bash
echo '''
BEGIN;
ALTER TABLE adminboundaries ADD COLUMN wgs84 BLOB;
UPDATE adminboundaries SET wgs84 = Transform(Geometry, 4326);
COMMIT;
''' > project2wsg84.sql
! cat project2wsg84.sql
BEGIN;
ALTER TABLE adminboundaries ADD COLUMN wgs84 BLOB;
UPDATE adminboundaries SET wgs84 = Transform(Geometry, 4326);
COMMIT;

We can now read and execute that script against our database (once again, the file encoding argument appears to be required):

! spatialite adminboundaries.db ".read project2wsg84.sql utf-8"
SpatiaLite version ..: 4.3.0a Supported Extensions:
- 'VirtualShape' [direct Shapefile access]
- 'VirtualDbf' [direct DBF access]
- 'VirtualXL' [direct XLS access]
- 'VirtualText' [direct CSV/TXT access]
- 'VirtualNetwork' [Dijkstra shortest path]
- 'RTree' [Spatial Index - R*Tree]
- 'MbrCache' [Spatial Index - MBR cache]
- 'VirtualSpatialIndex' [R*Tree metahandler]
- 'VirtualElementary' [ElemGeoms metahandler]
- 'VirtualXPath' [XML Path Language - XPath]
- 'VirtualFDO' [FDO-OGR interoperability]
- 'VirtualGPKG' [OGC GeoPackage interoperability]
- 'VirtualBBox' [BoundingBox tables]
- 'SpatiaLite' [Spatial SQL - OGC]
PROJ.4 version ......: Rel. 5.1.0, June 1st, 2018
GEOS version ........: 3.6.2-CAPI-1.10.2 4d2925d6
TARGET CPU ..........: x86_64-apple-darwin17.3.0

Accessing the SpatiaLite database Using Datasette

Having created our database and generated a projection appropriate for plotting with an interactive web map, let’s publish it as a datasette and then see if we can access the data from the datasette API.

Here’s a trick I just learned via a tweet from @minrk for running a simple web service from within a Jupyter notebook code cell without blocking the notebook execution:

# Example of running datasette server from inside a code cell
# via https://nbviewer.jupyter.org/gist/minrk/622080cf8af787734805d12bec4ae302
from threading import Thread

def app_in_thread():
    ! datasette adminboundaries.db --load-extension=/usr/local/lib/mod_spatialite.dylib --config sql_time_limit_ms:10000 --cors

t = Thread(target=app_in_thread)
t.start()

# Alternative multitasking package
# https://github.com/micahscopes/nbmultitask
Serve! files=('adminboundaries.db',) on port 8001
[2018-06-28 18:08:04 +0100] [91760] [INFO] Goin' Fast @ http://127.0.0.1:8001
[2018-06-28 18:08:04 +0100] [91760] [INFO] Starting worker [91760]
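(Another way to get the same non-blocking behaviour is to launch the server as a child process with subprocess.Popen. The sketch below uses Python’s built-in http.server as a stand-in for the datasette command, which could be launched the same way.)

```python
# Launch a long-running server without blocking the notebook; http.server
# stands in here for the datasette command line shown above.
import subprocess
import sys
import time

proc = subprocess.Popen([sys.executable, '-m', 'http.server', '8765'])
time.sleep(1)          # give the server a moment to start
print(proc.poll())     # None => the server is still running
proc.terminate()       # shut it down when we're finished
proc.wait()
```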

We should now be able to access the datasette API.

#What region shall we search for a boundary for?
#How about the diamond isle just off the south coast...
region='wight'
import requests

params = {'sql': 'SELECT AsGeoJSON(wgs84) FROM adminboundaries WHERE name LIKE "%{}%"'.format(region)}
json_url = "http://localhost:8001/adminboundaries.json"

r = requests.get(json_url, params=params)
results = r.json()

[2018-06-28 18:30:29 +0100] – (sanic.access)[INFO][1:2]: GET http://localhost:8001/adminboundaries-f771041.json?sql=SELECT+AsGeoJSON%28wgs84%29+FROM+adminboundaries+WHERE+name+LIKE+%22%25wight%25%22 302 0
[2018-06-28 18:30:29 +0100] – (sanic.access)[INFO][1:2]: GET http://localhost:8001/adminboundaries-a912e25.json?sql=SELECT+AsGeoJSON%28wgs84%29+FROM+adminboundaries+WHERE+name+LIKE+%22%25wight%25%22 200 398206
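The requests in the log lines above are just the SQL statement URL-encoded into the sql parameter; we can reproduce the encoding with the standard library:

```python
# Reconstruct the datasette query URL by hand to see how the SQL statement
# gets URL-encoded into the sql parameter.
from urllib.parse import urlencode

sql = 'SELECT AsGeoJSON(wgs84) FROM adminboundaries WHERE name LIKE "%wight%"'
query = urlencode({'sql': sql})
print(query)
# sql=SELECT+AsGeoJSON%28wgs84%29+FROM+adminboundaries+WHERE+name+LIKE+%22%25wight%25%22
```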

# Get geojson feed served from datasette - the result is actually a string
# so convert the string to actual json, qua py dict
import json
geojson=json.loads(results['rows'][0][0])
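For reference, a GeoJSON MultiPolygon is a dict with a type and a nested coordinates list: a list of polygons, each a list of rings, each a list of (longitude, latitude) pairs. A minimal hand-made example (not the real boundary data!) shows the indexing used below:

```python
# Minimal MultiPolygon illustrating the nesting behind
# geojson['coordinates'][0][0][0].
import json

doc = json.loads('''{
  "type": "MultiPolygon",
  "coordinates": [
    [
      [[-1.3, 50.7], [-1.1, 50.7], [-1.1, 50.6], [-1.3, 50.7]]
    ]
  ]
}''')

first_vertex = doc['coordinates'][0][0][0]  # first polygon, outer ring, first point
print(first_vertex)  # [-1.3, 50.7] -- note the (longitude, latitude) order
```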

Now let’s see if we can render that region as a map.

#folium is a handy package I've used for ages for displaying maps in Jupyter notebooks
import folium
# Get an example vertex from the geojson to help center the map
# (GeoJSON positions are (longitude, latitude) pairs)
lng, lat = tuple(geojson['coordinates'][0][0][0])
#Render the map (folium expects (latitude, longitude) locations)
m = folium.Map(location=(lat, lng), zoom_start=10)
m.choropleth(geo_data=geojson, line_color='blue',line_weight=3)
m
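Centring the map on a single boundary vertex works, but for an oddly shaped region the middle of the bounding box may be a better bet. A stdlib sketch (a rough approximation, not a true geometric centroid):

```python
# Rough map centre: midpoint of the bounding box over all vertices in a
# GeoJSON MultiPolygon (positions are (lng, lat) pairs).
def bbox_centre(multipolygon):
    points = [pt for polygon in multipolygon['coordinates']
                 for ring in polygon
                 for pt in ring]
    lngs = [p[0] for p in points]
    lats = [p[1] for p in points]
    # Return (lat, lng), the order folium expects for map locations
    return ((min(lats) + max(lats)) / 2, (min(lngs) + max(lngs)) / 2)

# Hand-made demo geometry, not real boundary data
demo = {'type': 'MultiPolygon',
        'coordinates': [[[[-1.5, 50.5], [-1.0, 50.5], [-1.0, 51.5], [-1.5, 50.5]]]]}
print(bbox_centre(demo))  # (51.0, -1.25)
```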

So that all seems to work… :-)

Next up, I’ll see if I can load some lat/lng points, cast them as Point datatypes, then run a query to pull out just locations within a specific region…

Author: Tony Hirst

I'm a lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...