Each new year, one of my half-made resolutions is to pull together an opendata hub for my local area. Because the area is an island, its clear boundary also provides opportunities for monitoring inflows and outflows that may be harder to identify in other areas.
And each year, it never happens, although each year I do dabble a bit with local data gleaned from national datasets.
So this year, I’m going to try to start pulling recipes into one place, and try to get into the habit of adding new datasets as tables in the same database. The database model will probably be hit and miss, but I’ll try to work out sensible primary and secondary keys where I can. A Linked Data model would probably make most sense, but having to spend time getting the modelling right is too much of an overhead and a distraction.
A couple of things I want to try to achieve:
- producing recipes that scale out; my interest is in grabbing and representing data from national datasets at a local level. The recipes should be easily keyed with identifiers that allow views to be rendered as easily for other areas as for the Isle of Wight;
- producing some sort of consistent workflow and coherent architecture where I can easily add more data views whenever I get to grips with a new dataset.
A great example of this, and one I really should take a couple of days’ holiday to work through, comes in the form of the Trafford Data Lab Ward profiler.
This is just packed with goodness:
- the ability to select different data topics and subtopics within those;
- choropleth maps and associated, sorted lollipop charts (the only chart type that looks to be supported at the moment);
- the ability to download the data;
- the ability to see the R code used to grab the data.
The source for the site / app is available as traffordDataLab/ward_profiler and the data and data grabbers can be seen at traffordDataLab/ward_data/.
The ward_data repo is such a lovely, simple idea: topic areas with separate R code files to grab the data for a particular subtopic and dump it as a CSV.
Columns are normalised to:
| area_code | area_name | indicator | period | measure | unit | value |
|---|---|---|---|---|---|---|
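As a rough illustration of what normalising to those seven columns involves, here is a minimal Python sketch (the Trafford grabbers themselves are written in R, and this is not their code). The raw field names (`code`, `name`, `claimants`), the sample rows and the indicator labels are all made up for the example.

```python
import csv
import io

# The seven normalised column names listed above.
COLUMNS = ["area_code", "area_name", "indicator", "period", "measure", "unit", "value"]


def normalise(rows, indicator, period, measure, unit, value_field):
    """Map raw rows (dicts) onto the seven normalised columns.

    The raw keys "code" and "name" are assumptions about the input shape.
    """
    return [
        {
            "area_code": row["code"],
            "area_name": row["name"],
            "indicator": indicator,
            "period": period,
            "measure": measure,
            "unit": unit,
            "value": row[value_field],
        }
        for row in rows
    ]


def to_csv(records):
    """Serialise normalised records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample data: the codes, names and values are placeholders.
raw = [
    {"code": "X00000001", "name": "Example Ward A", "claimants": 120},
    {"code": "X00000002", "name": "Example Ward B", "claimants": 85},
]
records = normalise(raw, "Example claimant count", "2019-01", "Count", "Persons", "claimants")
csv_text = to_csv(records)
```

Keeping every indicator in the same long format means a single viewer can handle any topic without needing to know about dataset-specific columns.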
The R code often seems to pull local data from national datasets using literal, embedded codes, but it wouldn’t be too much overhead to recast all these as arguments and pop them at the top of each file where they can be easily changed, or to provide a means of passing them into the file from a parameter file.
The next step might then be to automate the production of the parameter file from a text look-up (eg “Isle of Wight”, “Trafford”) against various code lookup tables.
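A minimal sketch of that look-up step, in Python rather than R, might look something like the following. The inline lookup table stands in for a real code lookup file, the function names are hypothetical, and the two area codes are what I believe the ONS codes to be, so check them against an official lookup before relying on them.

```python
import csv
import io
import json

# Stand-in for a national code lookup table; in practice this would be a
# downloaded CSV. The codes below are assumptions and should be verified.
LOOKUP_CSV = """area_code,area_name
E06000046,Isle of Wight
E08000009,Trafford
"""


def find_area_code(name, lookup_text=LOOKUP_CSV):
    """Return the area code whose name matches `name` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(lookup_text))
    wanted = name.strip().lower()
    for row in reader:
        if row["area_name"].strip().lower() == wanted:
            return row["area_code"]
    raise KeyError(f"No area code found for {name!r}")


def build_params(name):
    """Build the parameter set a data grabber could read."""
    return {"area_name": name, "area_code": find_area_code(name)}


# The JSON text could be written out as a parameter file for the grabbers.
params_json = json.dumps(build_params("Isle of Wight"), indent=2)
```

Different datasets key on different geographies, so a fuller version would probably need one lookup per code type (local authority, ward, constituency and so on).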
My first impression was that it could also make sense to extract mutables, such as URLs (data URLs often change with updated datasets), into a separate file too, or at least parameterise them and move them to the top of the R files. But new versions of the datafiles for a particular dataset sometimes require tweaks to parsers, even if the datasets are nominally the same. This issue of consistency is one of the downsides of data released as files (CSV etc) rather than accessed via an API. (The advantage is that you can often get the bulk data more easily as a file…) So it would make more sense (using the file based data grab route) to version each data grabber file, creating an ongoing series of forks for each new release, rather than changing the most recent version directly. This would also ensure that history keeps working and would let you compare datasets over several releases. It would still make sense to pull codes and URLs etc out into parameters at the top of the file: the codes at the very top (a user area), to allow folk to easily grab data for another area; then the URLs in a maintenance area below, which could be readily changed when updating the file for a different release of the same data.
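To make that layout concrete, here is a hedged Python sketch of what one versioned grabber file might look like, with a user area and a maintenance area at the top. The URL, column names and release label are placeholders, not taken from any real dataset, and the idea is that each new release gets its own copy of the file with the maintenance area and parser tweaked as needed.

```python
import csv
import io
from urllib.request import urlopen

# --- User area: change these to grab data for another area ---
AREA_CODE = "E06000046"  # assumed code for the Isle of Wight
AREA_NAME = "Isle of Wight"

# --- Maintenance area: change these when forking the file for a new release ---
RELEASE = "2019"  # placeholder release label
DATA_URL = "https://example.org/data/example_dataset_2019.csv"  # placeholder URL
CODE_COLUMN = "la_code"  # assumed name of the area code column
VALUE_COLUMN = "value"  # assumed name of the value column


def fetch(url):
    """Download the raw CSV text for this release."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse(csv_text, area_code=AREA_CODE):
    """Release-specific parser: keep only the rows for the chosen area."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"area_code": row[CODE_COLUMN], "release": RELEASE, "value": row[VALUE_COLUMN]}
        for row in reader
        if row[CODE_COLUMN] == area_code
    ]


def grab(fetcher=fetch):
    """Fetch and parse this release; `fetcher` can be swapped out for testing."""
    return parse(fetcher(DATA_URL))
```

Passing a stand-in fetcher, for example `grab(lambda url: "la_code,value\nE06000046,10\n")`, lets the parser be checked without hitting the network.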
On the other hand, moving to API datagrabs may make more sense still. The R statxplorer package from the House of Commons Library’s Oli Hawkins could help there, at least when pulling data from the DWP Stat-Xplore API (Stat-Xplore website).
The way the Trafford Data Lab have arranged their spatial datasets is also interesting: a repo containing boundary files for different geographies, traffordDataLab/spatial_data, which look like they might well have been pulled as shapefiles from the ONS geoportal boundaries collection and then converted to geojson. (One thing that I think could be useful would be a set of recipes for generating the different boundary sets identified by the Trafford data for arbitrary local authority areas, i.e. allowing users to extract data for just their local area from a national dataset. Related to this, I have a crude sketchbook here showing how to grab some of the boundaries into a simple spatialite database.)
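One simple form such a recipe could take is filtering a national GeoJSON file down to the features for a single area. The Python sketch below assumes each feature carries the code of its parent local authority in a property called `lad_code`; that property name, the sample ward names and the sample geometry are all hypothetical, and real boundary files will use their own field names.

```python
import json


def extract_local_features(national_geojson, code_property, local_codes):
    """Return a FeatureCollection holding only features whose
    `code_property` value is in `local_codes`."""
    wanted = set(local_codes)
    features = [
        feature
        for feature in national_geojson["features"]
        if feature["properties"].get(code_property) in wanted
    ]
    return {"type": "FeatureCollection", "features": features}


# Tiny made-up national dataset with point geometries standing in for boundaries.
national = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"ward_name": "Example Ward A", "lad_code": "E06000046"},
            "geometry": {"type": "Point", "coordinates": [-1.3, 50.7]},
        },
        {
            "type": "Feature",
            "properties": {"ward_name": "Example Ward B", "lad_code": "E08000009"},
            "geometry": {"type": "Point", "coordinates": [-2.3, 53.4]},
        },
    ],
}

local = extract_local_features(national, "lad_code", ["E06000046"])
local_text = json.dumps(local)  # ready to save as a local .geojson file
```

Where the boundary file does not include a parent area code, a lookup table from ward codes to local authority codes would be needed to build the list of codes to keep.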
As well as the ward profiler, the Trafford Data Lab have also produced a range of other single web page apps that are generated from code that extracts local data from national datasets and constructs a static index.html file that defines the app.
For example, this road traffic accident app is generated from a single Rmd file that can be found in this repo.
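The Trafford apps are built from Rmd files, but the general pattern of baking just the local data into a static page can be sketched in a few lines of Python. This is only an illustration of the idea under my own assumptions: the page template, the `localData` variable name and the sample record are all invented here.

```python
import json
from html import escape
from string import Template

# Minimal page template; a real app would add mapping and charting code
# that reads from the embedded localData variable.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<script>
const localData = $data;
</script>
</body>
</html>
""")


def build_page(title, records):
    """Return the text of a static index.html with the records embedded as JSON."""
    # Escape "</" so that data values cannot close the script element early.
    data = json.dumps(records).replace("</", "<\\/")
    return PAGE.substitute(title=escape(title), data=data)


# Made-up record in the normalised column style described earlier.
page_html = build_page(
    "Example local data app",
    [{"area_code": "X00000001", "indicator": "Example indicator", "value": 3}],
)
```

Writing `page_html` out as index.html gives a page that can be served from any static host, with no database or API needed at view time.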
The Trafford Data Lab approach is elegant in so many ways: standardised data tables; scripts for pulling data into standardised data tables at a local level from national datasets; a ward data viewer that scales easily just by adding another item to a form in the single page web app that defines it; bespoke single web page app generators that build static single page web apps containing just the local data they need extracted from national datasets.
I’m not sure if a similar approach is being followed by folk over at the House of Commons Library, but they also seem to have got onto a roll lately with their Constituency Dashboards.
These dashboards scale views at a constituency level (with displays often rendered at ward level, or at another appropriate smaller geographical area within the constituency).
The apps are built on top of Microsoft Power BI, and while the code doesn’t appear to be available, you can grab the constituency level data from the app.
Some of the apps are also paged, though it seems that in some cases that’s not working so well… I don’t like the social media trackers much either…
PS To do: it’s been a long time since I had a look around the open data space. On the to do list: check out what ODI Leeds have been up to recently (repos). For example, West Yorkshire Mapping looks interesting… or at least, it would be if recipes for generating those different geodata slices were available too… Bath Hacked is another group whose recent projects I need to catch up with (repos, though they look to have stalled somewhat?). Data Plymouth / DataPlay is another, though I always used to find it hard to find any actual code outputs from their events…?