The launch (or official opening, or whatever it was) of the Open Data Institute this week provided another chance to grab a snapshot of notable folk in the community, as demonstrated, for example, by the people commonly followed by users of the #ODIlaunch hashtag on Twitter. The PR campaign also brought some open data related use cases to light, such as a report in the Economist about an analysis by MastodonC and Prescribing Analytics mapping prescription charges (R code available), with a view to highlighting where prescriptions for branded, as opposed to the recommended generic, drugs are being issued at wasteful expense to the NHS. (See Exploring GP Practice Level Prescribing Data for some of my entry level doodlings with prescription data.)
Quite by chance, I’ve been looking at some other health data recently (Quick Shiny Demo – Exploring NHS Winter Sit Rep Data), which has been a real bundle of laughs. Across a range of health related datasets, data seems to be published at a variety of aggregation levels: individual practices and hospitals, Primary Care Trusts (PCTs), Strategic Health Authorities (SHAs) and the new Clinical Commissioning Groups (CCGs). Some of these map onto geographical regions, which can then be coloured according to the value of a particular measure associated with each area.
I’ve previously experimented with rendering shapefiles and choropleth maps (Amateur Mapmaking: Getting Started With Shapefiles), so I know R provides one possible environment for generating these maps. I thought I’d try to pull together a recipe or two for creating thematic maps based on health related geographical regions.
A quick trawl for PCT shapefiles turned up nothing useful. @jenit suggested @mastodonc, and @paulbradshaw pointed me to a dataset on Google Fusion Tables, discovered through the Fusion Tables search engine, that included PCT geometry data. So no shapefiles, but there is exportable KML data from Fusion Tables.
At this point I should have followed Paul Bradshaw’s advice and just uploaded my own data (I was going to start out by mapping per capita uptake of dental services by PCT) to Fusion Tables, merged it with the other dataset, and generated my thematic maps that way.
But that wasn’t quite the point: this was actually meant as an exercise in pulling together an R based recipe for generating these maps…
Anyway, I’ve made a start, and here’s the code I have to date:
```r
##Example KML: https://dl.dropbox.com/u/1156404/nhs_pct.kml
##Example data: https://dl.dropbox.com/u/1156404/nhs_dent_stat_pct.csv

#One-off install of the OGR bindings used to read the KML
install.packages("rgdal")
library(rgdal)
library(ggplot2)

#The KML data downloaded from Google Fusion Tables
fn='nhs_pct.kml'

#Look up the list of layers
ogrListLayers(fn)

#There's only one layer...but we still need to identify it by name
kml=readOGR(fn,layer='Fusiontables folder')

#This seems to work for plotting boundaries:
plot(kml)

#And this:
kk=fortify(kml)
ggplot(kk, aes(x=long, y=lat, group=group)) + geom_polygon()

#Add some data into the mix
#I had to grab a specific sheet from the original spreadsheet and then tidy the data a little...
nhs <- read.csv("nhs_dent_stat_pct.csv")

#merge() re-sorts rows on the key, which would misalign the data with the
#polygons, so use match() to line the rows up in the original polygon order
kml@data=cbind(kml@data, nhs[match(kml@data$Name, nhs$PCT.ONS.CODE), ])

#I think I can plot against this data using plot()?
plot(kml,col=gray(kml@data$A.30.Sep.2012/100))
#But is that actually doing what I think it's doing?!
#And if so, how can I experiment using other colour palettes?

#But the real question is: HOW DO I DO COLOUR PLOTS USING ggplot?
ggplot(kk, aes(x=long, y=lat, group=group)) #+ ????
```
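The merge step deserves a closer look. By default, merge() sorts its result on the key column, so writing its output straight back into kml@data can leave the attribute rows out of step with the polygons they are supposed to describe. Here's a minimal base R sketch of the problem and one way round it, using made-up codes and values (PCT_A etc. are hypothetical, not real ONS codes):

```r
#Hypothetical stand-ins for kml@data and the CSV; the polygon rows are
#deliberately not in sorted order
polys <- data.frame(Name = c("PCT_B", "PCT_A", "PCT_C"),
                    stringsAsFactors = FALSE)
nhs <- data.frame(PCT.ONS.CODE = c("PCT_A", "PCT_B", "PCT_C"),
                  A.30.Sep.2012 = c(52.1, 61.4, 48.9),
                  stringsAsFactors = FALSE)

#merge() sorts on the key column by default (sort = TRUE), so row i of the
#result no longer necessarily describes polygon i
merged <- merge(polys, nhs, by.x = "Name", by.y = "PCT.ONS.CODE")
print(merged$Name)

#match() gives, for each polygon in its original order, the row of nhs
#holding the same code, so indexing by it keeps the polygon order intact
idx <- match(polys$Name, nhs$PCT.ONS.CODE)
aligned <- polys
aligned$A.30.Sep.2012 <- nhs$A.30.Sep.2012[idx]
print(aligned)
```

With the real objects, the equivalent would be something along the lines of kml@data$A.30.Sep.2012 <- nhs$A.30.Sep.2012[match(kml@data$Name, nhs$PCT.ONS.CODE)], with any code missing from the CSV coming through as NA.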
Here’s what an example of the raw plot looks like:
And the greyscale plot, using one of the dental services uptake columns:
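The greyscale colouring relies on gray(), which takes levels between 0 (black) and 1 (white); dividing a percentage by 100 therefore renders higher uptake as lighter grey, assuming the column really does hold numeric percentages. To experiment with other palettes, one option is to bin the values with cut() and index into a palette vector. A base R sketch with made-up uptake values (the break points and colours are arbitrary choices):

```r
#Hypothetical uptake percentages for five areas
uptake <- c(12, 35, 50, 71, 98)

#gray() maps 0 -> black and 1 -> white, so higher uptake comes out lighter
grey_cols <- gray(uptake / 100)

#Bin the values into five equal-width bands...
breaks <- seq(0, 100, by = 20)
bins <- cut(uptake, breaks = breaks, include.lowest = TRUE)

#...and pick one colour per band from a white-to-dark-blue ramp
pal <- colorRampPalette(c("white", "darkblue"))(nlevels(bins))
blue_cols <- pal[as.integer(bins)]

#Built-in palettes can be swapped in the same way
heat_cols <- heat.colors(nlevels(bins))[as.integer(bins)]

print(data.frame(uptake, bins, grey_cols, blue_cols, heat_cols))
```

With the real data, the resulting vector would be passed in the same way as before, e.g. plot(kml, col=blue_cols) once blue_cols has been built from the uptake column.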
Here’s the base ggplot() view:
However, I don’t know how to go from here to actually colouring the different areas according to the data. (Oh, might this help? CRAN Task View: Analysis of Spatial Data.)
If you know how to do the colouring, or ggplotting, please leave a comment, or alternatively, chip in an answer to a related question I posted on StackOverflow: Plotting Thematic Maps from KML Data Using ggplot2.
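For reference, one pattern commonly suggested for this kind of problem is to attach each area's value to every vertex row of the fortified data frame and then map that column to fill. The sketch below is hedged: it uses a tiny hand-made table shaped like fortify() output (two unit squares with hypothetical ids PCT_A and PCT_B) rather than the real KML, and the values are invented.

```r
library(ggplot2)

#Hypothetical fortify()-style table: one row per polygon vertex, with an
#'id' column naming the region each vertex belongs to
kk <- data.frame(
  long  = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  order = 1:8,
  id    = rep(c("PCT_A", "PCT_B"), each = 4),
  stringsAsFactors = FALSE
)
kk$group <- kk$id

#Hypothetical measure per region, keyed on the same id values
vals <- data.frame(id = c("PCT_B", "PCT_A"), uptake = c(61.4, 52.1),
                   stringsAsFactors = FALSE)

#Copy each region's value onto all of its vertices; match() leaves the
#vertex order untouched, which geom_polygon() needs to draw the outlines
kk$uptake <- vals$uptake[match(kk$id, vals$id)]

#Map the measure to fill, with white borders between areas
p <- ggplot(kk, aes(x = long, y = lat, group = group, fill = uptake)) +
  geom_polygon(colour = "white") +
  scale_fill_gradient(low = "white", high = "darkblue") +
  coord_equal()
print(p)
```

With the KML data, the assumption would be that fortify(kml, region='Name') produces an id column holding the PCT codes, which could then be matched against nhs$PCT.ONS.CODE in the same way.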
PS The recent Chief Medical Officer’s Report makes widespread use of a whole range of graphical devices and charts, including cartograms:
Is there R support for cartograms yet, I wonder?! (Hmmm… maybe?)
PPS On the public-facing national statistics front, I spotted this job ad yesterday for a Head of Rich Content Development at the ONS:
The postholder is responsible for inspiring and leading development of innovative rich content outputs for the ONS website and other channels, which anticipate and meet user needs and expectations, including those of the Citizen User. The role holder has an important part to play in helping ONS to realise its vision “for official statistics to achieve greater impact on key decisions affecting the UK and to encourage broader use across the country”.
1. Inspires, builds, leads and develops a multi-disciplinary team of designers, developers, data analysts and communications experts to produce innovative new outputs for the ONS website and other channels.
2. Keeps abreast of emerging trends and identifies new opportunities for the use of rich web content with ONS outputs.
3. Identifies new opportunities, proposes new directions and developments and gains buy in and commitment to these from Senior Executives and colleagues in other ONS business areas.
4. Works closely with business areas to identify, assess and commission new rich-content projects.
5. Provides vision, guidance and editorial approval for new projects based on a continual understanding of user needs and expectations.
6. Develops and manages an ongoing portfolio of innovative content, maximising impact and value for money.
7. Builds effective partnerships with media to increase outreach and engagement with ONS content.
8. Establishes best practice in creation of rich content for the web and other channels, and works to improve practice and capability throughout ONS.