Making a Simple Database to Act as a Lookup for the ONS Register of Geographic Codes

Via a GSS (Government Statistical Service) blog post yesterday – Why Do We Need Another Register? – announcing the Statistical Geography Register, which contains “the codes for each type of statistical geography within the UK”, I came across mention of the ONS Register of Geographic Codes.

This register, maintained by the Office of National Statistics, is released as a zip file containing an Excel spreadsheet. Separate worksheets in the spreadsheet list codes for various geographies in England and Wales (but not Scotland; that data is available elsewhere).

To make a rather more reproducible component for accessing the data, I hacked together a simple notebook to pull the data out of the spreadsheet and pop it into a simple SQLite3 database as a set of separate tables, one per sheet.

One thing we need to do to reconcile items in the metadata sheet and the sheetnames by joining a couple of the columns together with a subscript:

xl['RGC']["codeAbbrv"] = xl['RGC']["Entity code"].map(str) + '_' + xl['RGC']["Entity abbreviation"].map(str)

The bulk of the script is quite simple (see the full notebook here):

import sqlite3
con = sqlite3.connect("onsgeocodes.sqlite")

cols=['GEOGCD','GEOGNM','GEOGNMW','OPER_DATE','TERM_DATE','STATUS']

bigcodes=pd.DataFrame(columns=['sheet']+cols)
bigcodes.to_sql(con=con, name='codelist', index=False, if_exists='replace')

sheets= list(xl.keys())
sheets.remove('For_Scotland')
for sheet in sheets[2:]:
xl[sheet].to_sql(con=con, name=sheet, index=False, if_exists='replace')
xl[sheet]['sheet']=sheet
#Reorder the columns
xl[sheet][['sheet']+cols].to_sql(con=con, name='codelist', index=False, if_exists='append')

You may also notice that it creates a “big” table (codelist) that contains all the codes – which means we can look up the provenance of a particular code:

q='''
SELECT sheet, GEOGCD, GEOGNM, GEOGNMW, codelist.STATUS, "Entity name"
FROM codelist JOIN metadata WHERE "GEOGCD"="{code}" AND codeAbbrv=sheet
'''.format(code='W40000004')
pd.read_sql_query(q, con)
sheet GEOGCD GEOGNM GEOGNMW STATUS Entity name
0 W40_CMLAD W40000004 Denbighshire Sir Ddinbych live Census Merged Local Authority Districts

We can also look to see what (current) geographies might be associated with a particular name:

q='''
SELECT DISTINCT "Entity name", sheet FROM codelist JOIN metadata
WHERE "GEOGNM" LIKE "%{name}%" AND codeAbbrv=sheet AND codelist.STATUS="live"
'''.format(name='Isle of Wight')
pd.read_sql_query(q, con)
Entity name sheet
0 Super Output Areas, Lower Layer E01_LSOA
1 Super Output Areas, Middle Layer E02_MSOA
2 Unitary Authorities E06_UA
3 Westminster Parliamentary Constituencies E14_WPC
4 Community Safety Partnerships E22_CSP
5 Registration Districts E28_REGD
6 Registration Sub-district E29_REGSD
7 Travel to Work Areas E30_TTWA
8 Fire and Rescue Authorities E31_FRA
9 Built-up Areas E34_BUA
10 Clinical Commissioning Groups E38_CCG
11 Census Merged Local Authority Districts E41_CMLAD
12 Local Resilience Forums E48_LRF
13 Sustainability and Transformation Partnerships E54_STP

What I’m wondering now is – can I crib from the way the ergast API is put together to create a simple API that takes a code, or a name, and returns geography register information related to it?

The same approach could also be applied to the registers I pull down from NHS Digital (here) – which makes me think I should generate a big codelist table for those codes too…

PS this in part reminds me of a conversation years ago with Richard et al from @cottagelabs who were mooting, at the time, a service that would take an arbitrary code and try to pattern match the coding scheme it was part of and then look it up in that coding scheme.

PPS hmm, also thinks: maybe names associated with coding schemes could be added to a simple document tagger.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: