Category: Open Data

Grabbing Screenshots of folium Produced Choropleth Leaflet Maps from Python Code Using Selenium

I had a quick play with the latest updates to the folium python package today, generating a few choropleth maps around some of today’s Gov.UK data releases.

The problem I had was that folium generates an interactive Leaflet map as an HTML5 document (eg something like an interactive Google map), but I wanted a static image of it – a png file. So here’s a quick recipe showing how I did that, using a python function to automatically capture a screengrab of the map…

First up, a simple recipe to get a rough centre for the extent of some boundaries in a geoJSON boundary file containing the boundaries for LSOAs in the Isle of Wight:

#GeoJSON from
import fiona

geojson_filename = 'iw_lsoa.json' #Hypothetical filename for the boundary file
#fiona reports the extent of the boundaries as (min_lon, min_lat, max_lon, max_lat)
with fiona.open(geojson_filename) as boundaries:
    bounds = boundaries.bounds
#Take the midpoint of the bounding box as a rough centre
centre_lon, centre_lat = ((bounds[0]+bounds[2])/2, (bounds[1]+bounds[3])/2)

Now we can get some data – I’m going to use the average travel time to a GP from today’s Journey times to key services by lower super output area data release and limit it to the Isle of Wight data.

import pandas as pd

#Hypothetical filename and column names for the journey times data release
df = pd.read_csv('jts0501.csv')
#Limit the data to the Isle of Wight LSOAs (local authority code E06000046)
df = df[df['LA_Code']=='E06000046']
The next thing to do is generate the map – folium makes this quite easy to do: all I need to do is point to the geoJSON file (geo_path), declare where to find the labels I’m using to identify each shape in that file (key_on), include my pandas dataframe (data), and state which columns include the shape/area identifiers and the values I want to visualise (columns=[ID_COL, VAL_COL]).

import folium
m = folium.Map([centre_lat, centre_lon], zoom_start=11)
#The key_on path and column names are assumptions based on the file in question
m.choropleth(
    geo_path=geojson_filename,
    data=df,
    columns=['LSOA_code', 'GPPTt'],
    key_on='feature.properties.LSOA11CD',
    fill_color='PuBuGn', fill_opacity=1.0
)

The map object is included in the variable m. If I save the map file, I can then use the selenium testing package to open a browser window that displays the map, generate a screen grab of it and save the image, and then close the browser. Note that I found I had to add in a slight delay because the map tiles occasionally took some time to load.

#!pip install selenium

import os
import time
from selenium import webdriver

#Save the map as an HTML file
fn = 'testmap.html'
tmpurl = 'file://{path}/{mapfile}'.format(path=os.getcwd(), mapfile=fn)
m.save(fn) #(older folium versions use m.create_map(path=fn) instead)

#Open a browser window...
browser = webdriver.Firefox()
#..that displays the map...
browser.get(tmpurl)
#Give the map tiles some time to load
time.sleep(5)
#Grab the screenshot
browser.save_screenshot('map.png')
#Close the browser
browser.quit()

Here’s the image of the map that was captured:


I can now upload the image to WordPress and include it in an automatically produced blog post:-)

PS before I hit on the Selenium route, I dabbled with a less useful, but perhaps still handy library for taking screenshots: pyscreenshot.

#!pip install pyscreenshot
import pyscreenshot as ImageGrab

im = ImageGrab.grab(bbox=(157, 200, 1154, 800)) # X1, Y1, X2, Y2
#To grab the whole screen, omit the bbox parameter
im.save('screenAreGrab.png', format='png')

The downside was I had to find the co-ordinates of the area of the screen I wanted to grab by hand, which I couldn’t find a way of automating… Still, could be handy…

A Quick Look Around the Open Data Landscape

I haven’t done a round up of open data news for a bit, so here’s a quick skim through some of my current open browser tabs on the subject.

First up, the rather concerning news that DCLG [are] to withhold funding from district over transparency “failings”. As well as flagging a failure to publish information about a termination of employment pay settlement, a letter to Rother District Council noted that it “appears from your website that your council has not published data in a number of important areas, for example, contracts over £5,000, land and assets, senior salaries, an organisation chart, trade union facility time, parking revenues, grants to the voluntary sector and the like”, in contravention of the Local Government Transparency Code 2014.

From running several open data training days recently for representatives of local councils and government departments on behalf of Open Knowledge, I know that finding “transparency” data on official council websites is not always that simple. (The “transparency” keyword sometimes(?!) works, “open data” frequently doesn’t…; often, using a web search engine with a site: search limit on the council website works better than the local site search.) I’m still largely at a loss as to what can usefully be done with things like spending data, though I do have a couple of ideas about how we might feed some of that data into some small investigations…

However, I’m not convinced that punishing councils by withholding funding is the best approach to promoting open data publication. On the other hand, promoting effective data workflows that naturally support open data publishing (and hopefully, from that, “an increase in reuse within the organisation as data availability/awareness/accessibility improves”) and encouraging effective transparency through explaining how decisions were made in the context of available data (whilst at the same time making data available so that the data basis of those decisions can be checked) would both seem to be more useful approaches?

The funding being withheld from Rother Council seems to be the new burdens funding, which presumably helps cover the costs of publishing transparency data. Something that’s bugged me over the years (eg Some Sketchnotes on a Few of My Concerns About #opendata) is how the privatisation of contracts is associated with several asymmetries in the public vs private provision of services. On the one hand, public bodies have transparency and freedom of information “burdens” placed on them, which means: 1) they take a financial hit, needing to cover the costs of meeting those burdens; 2) they are accountable, in that the public has access to certain sorts of information about their activities. Private contractors are not subject to the same terms, so not only can they competitively bid less than public bodies for service delivery, they can also avoid the same public disclosure requirements about their activities and are potentially less accountable than their public body counterparts, who remain overall accountable as commissioners of the services but presumably have to cover the costs of that accountability, as well as the administrative overheads of managing the private contracts.

Now it seems that the ICO is call[ing] for greater transparency around government outsourcing, publishing a report and a roadmap on the subject that recommend a “transparency by design” approach in which councils should:

– Make arrangements to publish as much information as possible, including the contract and regular performance information, in open formats with a licence permitting re-use.

– When drawing up the contract, think about any types of information that the contractor will hold on their behalf eg information that a public authority would reasonably need to see to monitor performance. Describe this in an annex to the contract. This is itself potentially in scope of a FOIA request.

– Set out in the contract the responsibilities of both parties when dealing with FOIA requests. Look at standard contract terms (eg the Model Services Contract ) for guidance.

Around about the same time, a Cabinet Office policy document on Transparency of suppliers and government to the public also appeared. Whilst on the one hand “Strategic Suppliers to Government will supply data on a contract basis that is then aggregated to the departmental level and aggregated again to the Government level” means it should be easier to see how much particular companies receive from government (assuming they don’t operate as BigCo Red Ltd, BigCo Blue Ltd, BigCo Purple Ltd, using a different company for each contract) it does mean that data can presumably also be aggregated to a point of meaninglessness.

(Just by the by, I tried looking through various NHS Commissioning Group and NHS Trust spending datasets looking to see how much was going to Virgin Care and other private providers. Whilst I could see from news reports and corporate websites that those operators were providing services in particular areas, I couldn’t find any spend items associated with them. Presumably I was looking in the wrong place… but if so, it suggests that even if you do have a question about spend in a particular context with a particular provider, it doesn’t necessarily follow that even if you think you know how to drive this open transparency data stuff, you’ll get anywhere…)

When looking at affordability of contracts, and retaining private vs public contractors, it would seem only fair that any additional costs associated with the contracting body having to meet transparency requirements “on behalf of” the private body should be considered part of the cost of the contract. If private bodies complain that this gives an unfair advantage to public bodies competing for service provision, they can perhaps opt-in to FOI regulations and transparency codes and cover the costs of disclosure of information themselves to level the playing field that way?

Another by the by… in appointments land, Mike Bracken has been appointed the UK’s first Chief Data Officer (CDO), suggesting that we should talk about “data as a public asset”. In this regard, the National Information Infrastructure still appears to be a thing, for the moment at least. An implementation document was published in March that has some words in it (sic…!)…

As purdah approached, there was a sudden flurry of posts on the blog. Four challenges for the future of Open Data identified the following as major issues:

  • Pushing Open Data where it is not fully embraced
  • Achieving genuine (Open) Data by default; (this actually seems to be more about encouraging open data workflows/open data (re)use – “a general move to adopt data practice into the way public services are run”)
  • Improving public confidence in Open Data
  • Improving (infra)structure around Open Data

The question is – how to best address them? I think that Open Knowledge has delivered all the open data training sessions it was due to deliver under the open data voucher scheme, which means my occasional encounters with folk tasked with open data delivery from councils and government departments may have come to an end via that route; which is a shame, because I felt we never really got a chance to start building on those introductory sessions…

The Cabinet Office also finished off the parliamentary session with a state of the nation announcement naming the Local authorities setting standards as Open Data Champions. A quick skim down the list seems to suggest that the champions are typically councils that have started their own datastore…

Validating Local Spending Data

In passing, I noticed on the Local Government Association (LGA) website a validator for checking the format of documents used to publish local council spending data, as well as various other data releases (contracts, planning applications, toilet locations, land holdings etc): LGA OpenData Schema Validator.


I wonder how many councils are publishing new releases that actually validate, and how many have “back-published” historical data releases using a format that validates?! When officers publish data files, I wonder how many of them even try to download and open the files they have just published (to check the links work, the documents open as advertised, and appear to contain what’s expected), let alone run either the uploaded or downloaded files through the validator. (It often makes sense to do both: check the file validates before you publish it, then download it and check the downloaded version, just in case the publishing process has somehow mangled the file…)
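The download-and-recheck part of that round trip is easy enough to script. Here’s a minimal sketch (the URL handling and sanity checks are illustrative assumptions; the LGA validator itself would still need to be run separately):

```python
import urllib.request

import pandas as pd

def check_published_csv(url):
    #Download the file that was just published...
    local_fn, _ = urllib.request.urlretrieve(url)
    #...and check it still opens as a CSV with some rows in it
    df = pd.read_csv(local_fn)
    assert len(df), 'Downloaded file contains no rows'
    #Return the column names and row count for eyeballing against what was uploaded
    return list(df.columns), len(df)
```

Comparing the returned column names and row count against the file you uploaded at least catches the grosser sorts of publishing mangle.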

Guidance for the spending data releases can be found here: Local government open data schemas: Spend

Documentation regarding the release of procurement and spending information (v. 1.1 dated 14/12/2014) can be found here: Local transparency guidance – publishing spending and procurement information.

I’ve still no real idea how to make interesting use of this data, or how DCLG expect folk to make use of it?!;-)

Pondering Local Spending Data, Again…

Last night I saw a mention of a budget review consultation being held by the Milton Keynes Council. I’ve idly wondered before about whether spending data could be used to inform these consultations, for example by roleplaying what the effects of a cut to a particular spending area might be at a transactional level. (For what it’s worth, I’ve bundled together the Milton Keynes spending data into a single (but uncleaned) CSV file here and posted the first couple of lines of a data conversation with it here. One of the things I realised is that I still don’t know how to visualise data by financial year, so I guess I need to spend some time looking at pandas timeseries support).
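As a starting point on the financial year question, a simple mapping from transaction date to UK financial year (1 April to 31 March) is enough to group the spending data; the column names and figures below are made up for the sketch:

```python
import pandas as pd

def financial_year(d):
    #UK financial year runs 1 April to 31 March; label it by its starting year
    return d.year if d.month >= 4 else d.year - 1

#Hypothetical spending data with a transaction date and an amount column
df = pd.DataFrame({
    'Date': pd.to_datetime(['2014-03-31', '2014-04-01', '2015-02-10']),
    'Amount': [100.0, 250.0, 75.0],
})
df['FY'] = df['Date'].apply(financial_year)
#Total spend per financial year
totals = df.groupby('FY')['Amount'].sum()
#totals.plot(kind='bar') would then give a per-financial-year chart
```

(pandas’ own timeseries support can do something similar via annual periods anchored on March, but the explicit function makes the financial year boundary obvious.)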

Another transparency/spending data story that caught my eye over the break was news of how Keighley Town Council had been chastised for its behaviour around various transparency issues (see for example the Audit Commission Report in the public interest on Keighley Town Council). Among other things, it seems that the council had “entered into a number of transactions with family members of Councillors and employees” (which makes me think that an earlier experiment I dabbled with that tried to reconcile councillor names with: a) directors of companies in general; b) directors of companies that trade with a council may be a useful tool to work up a bit further). They had also been lax in ensuring “appropriate arrangements were in place to deal with non-routine transactions such as the recovery of overpayments made to consultants”. I’ve noted before that where a council publishes all its spending data, not just amounts over £500, including negative payments, there may be interesting things to learn (eg Negative Payments in Local Spending Data).
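A first pass at that councillor/company-director reconciliation idea needs nothing more than a fuzzy string match over the two name lists; the names and threshold below are purely illustrative:

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.85):
    #Normalise case and whitespace, then compare on the ratio of matching characters
    a = ' '.join(a.lower().split())
    b = ' '.join(b.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold

#Hypothetical lists: councillor names vs directors of companies a council trades with
councillors = ['John A. Smith', 'Patricia Jones']
directors = ['JOHN A SMITH', 'David Brown']

#Candidate matches for a human to follow up on - not proof of anything by itself
matches = [(c, d) for c in councillors for d in directors if similar(c, d)]
```

Anything this flags would of course only be a candidate for further checking (common names throw up plenty of false positives), but it’s the sort of signal a local campaigner might find worth chasing.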

It seems that the Audit Commission report was conducted in response to a request from a local campaigner (Keighley investigation: How a grandmother blew whistle on town council [Yorkshire Post, 20/12/14]). As you do, I wondered whether the spending data might have sent up any useful signals about the affairs the auditors – and local campaigners – took issue with. The Keighley Town Council website doesn’t make it obvious where the spending data can be found – the path you need to follow is Committees, then Finance and Audit, then Schedule of payments over £500 – and even then I can’t seem to find any data for the current financial year.

The data itself is published using an old Microsoft Office .doc format:


The extent of the data that is published is not brilliant… In terms of usefulness, this is pretty low quality stuff…


Getting the data, such as it is, into a canonical form is complicated by the crappy document format, though it’s not hard to imagine how such a thing came to be generated (council clerk sat using an old Pentium powered desktop and Windows 95, etc etc ;-). Thanks to a tip off from Alex Dutton, unoconv can convert the docs into a more usable format (apt-get update ; apt-get install -y libreoffice ; apt-get install -y unoconv); so for example, unoconv -f html 2014_04.doc converts the specified .doc file to an HTML document. (I also had a look at getting convertit, an http serverised version of unoconv, working in a docker container, but it wouldn’t build properly for me? Hopefully a tested version will appear on dockerhub at some point…:-)
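To batch the conversion from Python rather than the shell, something like the following works (assuming unoconv is installed as above; the runner argument just lets you swap in a different command runner):

```python
import glob
import subprocess

def convert_docs(pattern='*.doc', fmt='html', runner=subprocess.call):
    #Run unoconv over every matching .doc file, eg unoconv -f html 2014_04.doc
    converted = []
    for doc in sorted(glob.glob(pattern)):
        runner(['unoconv', '-f', fmt, doc])
        converted.append(doc)
    return converted
```

The tables in the converted HTML files can then be pulled straight into pandas with pd.read_html('2014_04.html'), which returns any tables it finds as a list of dataframes.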

This data still requires scraping of course… but I’m bored already…

PS I’m wondering if it would be useful to skim through some of the Audit Commission’s public interest reports to fish for ideas about interesting things to look for in the spending data?