Via a GSS (Government Statistical Service) blog post yesterday – Why Do We Need Another Register? – announcing the Statistical Geography Register, which contains “the codes for each type of statistical geography within the UK”, I came across mention of the ONS Register of Geographic Codes.
This register, maintained by the Office of National Statistics, is released as a zip file containing an Excel spreadsheet. Separate worksheets in the spreadsheet list codes for various geographies in England and Wales (but not Scotland; that data is available elsewhere).
To make a rather more reproducible component for accessing the data, I hacked together a simple notebook to pull the data out of the spreadsheet and pop it into a simple SQLite3 database as a set of separate tables, one per sheet.
One thing we need to do to reconcile items in the metadata sheet and the sheetnames by joining a couple of the columns together with a subscript:
xl['RGC']["codeAbbrv"] = xl['RGC']["Entity code"].map(str) + '_' + xl['RGC']["Entity abbreviation"].map(str)
The bulk of the script is quite simple (see the full notebook here):
import sqlite3 con = sqlite3.connect("onsgeocodes.sqlite") cols=['GEOGCD','GEOGNM','GEOGNMW','OPER_DATE','TERM_DATE','STATUS'] bigcodes=pd.DataFrame(columns=['sheet']+cols) bigcodes.to_sql(con=con, name='codelist', index=False, if_exists='replace') sheets= list(xl.keys()) sheets.remove('For_Scotland') for sheet in sheets[2:]: xl[sheet].to_sql(con=con, name=sheet, index=False, if_exists='replace') xl[sheet]['sheet']=sheet #Reorder the columns xl[sheet][['sheet']+cols].to_sql(con=con, name='codelist', index=False, if_exists='append')
You may also notice that it creates a “big” table (codelist
) that contains all the codes – which means we can look up the provenance of a particular code:
q=''' SELECT sheet, GEOGCD, GEOGNM, GEOGNMW, codelist.STATUS, "Entity name" FROM codelist JOIN metadata WHERE "GEOGCD"="{code}" AND codeAbbrv=sheet '''.format(code='W40000004') pd.read_sql_query(q, con)
sheet | GEOGCD | GEOGNM | GEOGNMW | STATUS | Entity name | |
---|---|---|---|---|---|---|
0 | W40_CMLAD | W40000004 | Denbighshire | Sir Ddinbych | live | Census Merged Local Authority Districts |
We can also look to see what (current) geographies might be associated with a particular name:
q=''' SELECT DISTINCT "Entity name", sheet FROM codelist JOIN metadata WHERE "GEOGNM" LIKE "%{name}%" AND codeAbbrv=sheet AND codelist.STATUS="live" '''.format(name='Isle of Wight') pd.read_sql_query(q, con)
Entity name | sheet | |
---|---|---|
0 | Super Output Areas, Lower Layer | E01_LSOA |
1 | Super Output Areas, Middle Layer | E02_MSOA |
2 | Unitary Authorities | E06_UA |
3 | Westminster Parliamentary Constituencies | E14_WPC |
4 | Community Safety Partnerships | E22_CSP |
5 | Registration Districts | E28_REGD |
6 | Registration Sub-district | E29_REGSD |
7 | Travel to Work Areas | E30_TTWA |
8 | Fire and Rescue Authorities | E31_FRA |
9 | Built-up Areas | E34_BUA |
10 | Clinical Commissioning Groups | E38_CCG |
11 | Census Merged Local Authority Districts | E41_CMLAD |
12 | Local Resilience Forums | E48_LRF |
13 | Sustainability and Transformation Partnerships | E54_STP |
What I’m wondering now is – can I crib from the way the ergast API is put together to create a simple API that takes a code, or a name, and returns geography register information related to it?
The same approach could also be applied to the registers I pull down from NHS Digital (here) – which makes me think I should generate a big codelist table for those codes too…
PS this in part reminds me of a conversation years ago with Richard et al from @cottagelabs who were mooting, at the time, a service that would take an arbitrary code and try to pattern match the coding scheme it was part of and then look it up in that coding scheme.
PPS hmm, also thinks: maybe names associated with coding schemes could be added to a simple document tagger.