Trying to Get Hold of UK Air Quality Data Via a Python API

It’s that time of year again for prepping the end of course assessment material for our TM351 Data Management and Analysis course (not that I typically have much to do with preparing such things…!).

The end of course assessment is typically framed as a data project that requires students linking several datasets and finding interesting to say about them. This final project is set up via a continuous assessment activity that introduces one of the datasets and gets students started working with it – exploring what the dataset looks like, getting it into a database, generating some basis charts from it and starting to formulate some questions around it.

As with many of the data activities, my preference is for ones that makes use of national datasets with local relevance. This can add variety — if students compare data from three local authorities selected from across the UK, there’s a good chance they might select different locations — and it also provides them with the opportunity to carry out a data investigations for their local area using data that they may not have been aware even existed…

This year’s topic is likely to be bootstrapped around air quality data. Sites such as the London Air Quality network make data available for London boroughs, but it’d nice to be able to offer data fro a more national scope.

Looking at Defra’s UK Air website, data does seem to be available for sites across the UK, but the download form is horrible, hugely restrictive on the amount of data you can download, and not obviously open in the creation of URLs that can be machine generated and used to programmatically download data.

Which is not ideal…

However, it does seem that an API exists for R users in the form of David Carslaw’s openair package. So how does that work, then???

Poking around in the code, it seems that sampling site metadata as well air quality sample data is available via .Rdata packages.

Hmm… a bit more poking in the code turns up some URL patterns, and a quick search turns up for Python packages that can read .Rdata packages without the need to install R turns up pyreadr.

So here’s a quick first attempt at a Python downloader for UK air quality data:


Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

The location IDs look a bit ad hoc/made up, but there is lat/long data, so it should be easy enough to call something like postcodes.io to find some rather more standardised administrative codes.

With a couple of tiny functions, it should be easy enough to grab data from the metadata dataframe to generate a simple ipywidgets powered UI that lets you select a local authority by name, perhaps pre-filtered to LAs within a particular selected region, and download just the data for that authority.

But that, as they say, is an exercise left for the reader…

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

%d bloggers like this: