Tinkering With Data-Consuming WordPress Plugins

Now that I’ve got my own webspace on Reclaim Hosting and set up a handful of my own self-hosted WordPress blogs, I’ve started tinkering with some custom plugins, starting off with something simple: a couple of shortcode plugins that I can use to embed a simple Leaflet map into a post or a page.

datawire – test blog: Just seeing what the bots can do…

So far, I’ve tried a couple of different approaches…

The first one has been pared down to just a default-accepting shortcode:

[IWPlanningLeafletMap]

The shortcode calls a bit of code that makes a cached request (at most three times a day) to a morph.io scraper, which returns a list of planning applications to the Isle of Wight Council that are currently open for consultation. (That said, I can pass in a few shortcode parameters relating to the size of the map – which is a fixed size, unfortunately: I haven’t worked out how to make it flexible… – and the default location and zoom.) The scraper itself runs once daily.

scraperplugin

My feeling is that this sort of code would work best in the context of a WordPress page, acting as a destination that allows folk to just check on currently open applications.

The second plugin embeds a map that displays markers for recent house sales (as recorded by the Land Registry price paid dataset). This dataset is published monthly, a month or two after the fact, and is downloaded to my local desktop. A Python script then reads in the data, creates a new WordPress post containing the shortcode with the data baked in, and uploads the post to WordPress (where it appears in the draft queue).


In this shortcode, the marker data is currently encoded using the PHP serialise format (via the Python phpserialize dumps method) and embedded in the post as the value of a shortcode attribute.

[MultiMarkerLeafletMap zoom=11 lat=50.675 lon=-1.32 width=800 height=500 markers='a:112:{i:0;a:5:{s:3:"lat";d:50.699382843;s:4:"date";s:10:"2015-07-10";s:3:"lon";d:-1.29297620442;s:8:"location";s:22:"SAVOY COURT, TOWN...]

In this case, with the marker data baked into the shortcode, there’s a good argument for rendering the map within a timestamped post as a fixed map (at least, ‘fixed’ in the sense that the data is unchanging).

The PHP un/serialize route isn’t ideal because I think it raises security issues. I originally tried to pass the data as serialised JSON, but the data is in the form of a list, and it seems the closing ] breaks the shortcode parsing. I guess what I should really do is see if I can pass the serialised JSON in between the shortcode tags rather than passing it in as an attribute?

Another approach might be to define just a simple map embed shortcode, and then support additional shortcodes that add markers to the map?

My PHP coding is also a bit scrappy (and I keep forgetting the ;’s… :-(). One thing I do need to do is pop the simple plugin functions into a class to keep them safe, and also patch in a hack or two (as seems to be required?) so that the Leaflet map libraries are only loaded into post and page headers for posts/pages that actually contain one of the map shortcodes.

I’m also thinking now I need to find a way to support boundary lines and shape colouring?

boundarymaplines

Hmm…

Experimenting With R – Point to Point Mapping With Great Circles

I’ve started doodling again… This time, around maps, looking for recipes that make life easier plotting lines to connect points on maps. The most attractive maps seem to use great circles to connect one point with another, these providing the shortest path between two points when you consider the Earth as a sphere.
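
For example, the core trick looks something like this (a minimal sketch; the lon/lat coordinates are illustrative values for London and New York):

#Draw the great circle route between two (lon, lat) points
library(maps)
library(geosphere)
map("world")
inter <- gcIntermediate(c(-0.13, 51.51), c(-74.01, 40.71), n=100, addStartEnd=TRUE)
lines(inter, col="red")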

Here’s one quick experiment (based on the Flowing Data blog post How to map connections with great circles): an R/Shiny app that allows you to upload a CSV file containing (at least) a couple of location columns and an optional “amount” column, and then draws lines between the points on each row.

greatcircle map demo

The app requires us to solve several problems, including:

  • how to geocode the locations
  • how to plot the lines as great circles
  • how to upload the CSV file
  • how to select the from and to columns from the CSV file
  • how to optionally select a valid numerical column for setting line thickness

Let’s start with the geocoder. For convenience, I’m going to use the Google geocoder via the geocode() function from the ggmap library.

#Locations are in two columns, *fr* and *to* in the *dummy* dataframe
#If locations are duplicated in from/to columns, dedupe so we don't geocode same location more than once
locs=data.frame(place=unique(c(as.vector(dummy[[fr]]),as.vector(dummy[[to]]))),stringsAsFactors=F)
#Run the geocoder against each location, then transpose and bind the results into a dataframe
cbind(locs, t(sapply(locs$place,geocode, USE.NAMES=F))) 

The locs data frame contains a single column of unique locations:

                    place
1              London, UK
2            Cambridge,UK
3            Paris,France
4       Sydney, Australia
5           Paris, France
6             New York,US
7 Cape Town, South Africa

The sapply(locs$place, geocode, USE.NAMES=F) call returns data that looks like:

    [,1]       [,2]     [,3]     [,4]      [,5]     [,6]      [,7]     
lon -0.1254872 0.121817 2.352222 151.207   2.352222 -74.00597 18.42406 
lat 51.50852   52.20534 48.85661 -33.86749 48.85661 40.71435  -33.92487

The transpose, t(), gives us:

     lon        lat      
[1,] -0.1254872 51.50852 
[2,] 0.121817   52.20534 
[3,] 2.352222   48.85661 
[4,] 151.207    -33.86749
[5,] 2.352222   48.85661 
[6,] -74.00597  40.71435 
[7,] 18.42406   -33.92487

The cbind() binds each location with its lat and lon value:

                    place        lon       lat
1              London, UK -0.1254872  51.50852
2            Cambridge,UK   0.121817  52.20534
3            Paris,France   2.352222  48.85661
4       Sydney, Australia    151.207 -33.86749
5           Paris, France   2.352222  48.85661
6             New York,US  -74.00597  40.71435
7 Cape Town, South Africa   18.42406 -33.92487

Code that provides a minimal example for uploading the data from a CSV file on the desktop to the Shiny app, then creating dynamic drop-down lists containing column names, can be found here: Simple file geocoder (R/shiny app).

The following snippet may be generally useful for getting a list of column names from a data frame that correspond to numerical columns:

#Get a list of column names for numerical columns in data frame df
nums <- sapply(df, is.numeric)
names(nums[nums])
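
For example, given a toy data frame:

df <- data.frame(x=1:3, y=c("a","b","c"), z=runif(3))
nums <- sapply(df, is.numeric)
names(nums[nums])
#[1] "x" "z"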

The code for the full application can be found as a runnable gist in RStudio here: R/Shiny app – great circle mapping. [In RStudio: install.packages("shiny"); library(shiny); runGist(9690079). The gist contains a dummy data file if you want to download it to try it out…]

Here’s the code explicitly…

The global.R file loads the necessary packages, installing them if they are missing:

#global.R

##This should detect and install missing packages before loading them - hopefully!
list.of.packages <- c("shiny", "ggmap","maps","geosphere")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages,function(x){library(x,character.only=TRUE)}) 

The ui.R file builds the Shiny app’s user interface. The drop-down column selector lists are populated dynamically with the names of the columns in the data file once it is uploaded. An optional Amount column can be selected – the corresponding list only displays the names of numerical columns. (The lists of location columns to be geocoded should really be limited to non-numerical columns.) The action button prevents the geocoding routines firing until the user is ready – select the columns appropriately before geocoding (error conditions are not handled very nicely ;-)

#ui.R
shinyUI(pageWithSidebar(
  headerPanel("Great Circle Map demo"),
  
  sidebarPanel(
    #Provide a dialogue to upload a file
    fileInput('datafile', 'Choose CSV file',
              accept=c('text/csv', 'text/comma-separated-values,text/plain')),
    #Define some dynamic UI elements - these will be lists containing file column names
    uiOutput("fromCol"),
    uiOutput("toCol"),
    #Do we want to make use of an amount column to tweak line properties?
    uiOutput("amountflag"),
    #If we do, we need more options...
    conditionalPanel(
      condition="input.amountflag==true",
      uiOutput("amountCol")
    ),
    conditionalPanel(
      condition="input.amountflag==true",
      uiOutput("lineSelector")
    ),
    #We don't want the geocoder firing until we're ready...
    actionButton("getgeo", "Get geodata")
    
  ),
  mainPanel(
    tableOutput("filetable"),
    tableOutput("geotable"),
    plotOutput("geoplot")
  )
))

The server.R file contains the server logic for the app. One thing to note is the way we isolate some of the variables in the geocoder reactive function. Reactive functions fire when one of the external variables they contain changes; to prevent the function firing when a particular variable changes, we need to isolate it. (See the docs for more: for example, Shiny Lesson 7: Reactive outputs or Isolation: avoiding dependency.)

#server.R

shinyServer(function(input, output) {

  #Handle the file upload
  filedata <- reactive({
    infile <- input$datafile
    if (is.null(infile)) {
      # User has not uploaded a file yet
      return(NULL)
    }
    read.csv(infile$datapath)
  })

  #Populate the list boxes in the UI with column names from the uploaded file  
  output$toCol <- renderUI({
    df <-filedata()
    if (is.null(df)) return(NULL)
    
    items=names(df)
    names(items)=items
    selectInput("to", "To:",items)
  })
  
  output$fromCol <- renderUI({
    df <-filedata()
    if (is.null(df)) return(NULL)
    
    items=names(df)
    names(items)=items
    selectInput("from", "From:",items)
  })
  
  #If we want to make use of an amount column, we need to be able to say so...
  output$amountflag <- renderUI({
    df <-filedata()
    if (is.null(df)) return(NULL)
    
    checkboxInput("amountflag", "Use values?", FALSE)
  })

  output$amountCol <- renderUI({
    df <-filedata()
    if (is.null(df)) return(NULL)
    #Let's only show numeric columns
    nums <- sapply(df, is.numeric)
    items=names(nums[nums])
    names(items)=items
    selectInput("amount", "Amount:",items)
  })
  
  #Allow different line styles to be selected
  output$lineSelector <- renderUI({
    radioButtons("lineselector", "Line type:",
                 c("Uniform" = "uniform",
                   "Thickness proportional" = "thickprop",
                   "Colour proportional" = "colprop"))
  })
  
  #Display the data table - handy for debugging; if the file is large, need to limit the data displayed [TO DO]
  output$filetable <- renderTable({
    filedata()
  })
  
  #The geocoding bit... Isolate variables so we don't keep firing this...
  geodata <- reactive({
    if (input$getgeo == 0) return(NULL)
    df=filedata()
    if (is.null(df)) return(NULL)
    
    isolate({
      dummy=filedata()
      fr=input$from
      to=input$to
      locs=data.frame(place=unique(c(as.vector(dummy[[fr]]),as.vector(dummy[[to]]))),stringsAsFactors=F)      
      cbind(locs, t(sapply(locs$place,geocode, USE.NAMES=F))) 
    })
  })

  #Weave the geocoded data into the data frame we made from the CSV file
  geodata2 <- reactive({
    if (input$getgeo == 0) return(NULL)
    df=filedata()
    if (input$amountflag != 0) {
      maxval=max(df[input$amount],na.rm=T)
      minval=min(df[input$amount],na.rm=T)
      df$b8g43bds=10*df[input$amount]/maxval
    }
    gf=geodata()
    df=merge(df,gf,by.x=input$from,by.y='place')
    merge(df,gf,by.x=input$to,by.y='place')
  })
  
  #Preview the geocoded data
  output$geotable <- renderTable({
    if (input$getgeo == 0) return(NULL)
    geodata2()
  })
  
  #Plot the data on a map...
  output$geoplot<- renderPlot({
    if (input$getgeo == 0) return(map("world"))
    #Method pinched from: http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/
    map("world")
    df=geodata2()
    
    pal <- colorRampPalette(c("blue", "red"))
    colors <- pal(100)
    
    for (j in 1:nrow(df)){
      inter <- gcIntermediate(c(df[j,]$lon.x[[1]], df[j,]$lat.x[[1]]), c(df[j,]$lon.y[[1]], df[j,]$lat.y[[1]]), n=100, addStartEnd=TRUE)

      #We could possibly do more styling based on user preferences?
      if (input$amountflag == 0) lines(inter, col="red", lwd=0.8)
      else {
        if (input$lineselector == 'colprop') {
          colindex <- round( (df[j,]$b8g43bds[[1]]/10) * length(colors) )
          lines(inter, col=colors[colindex], lwd=0.8)
        } else if (input$lineselector == 'thickprop') {
          lines(inter, col="red", lwd=df[j,]$b8g43bds[[1]])
        } else lines(inter, col="red", lwd=0.8)
      } 
    } 
  })

})

So that’s the start of it… This app could be further developed in several ways: for example, allowing the user to filter or colour displayed lines according to factor values in a further column (commodity type, say), or producing a lattice of maps based on facet values in a column.
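
For instance, here’s a rough, untested sketch of the lattice idea; it assumes a hypothetical commodity column in the uploaded file, and would need to live inside the renderPlot() call so the geocoded data is to hand:

#Sketch only: draw one world map per level of a hypothetical 'commodity' column
df=geodata2()
levs=unique(df$commodity)
par(mfrow=c(ceiling(length(levs)/2), 2))
for (l in levs) {
  map("world")
  title(l)
  sub=df[df$commodity==l,]
  for (j in 1:nrow(sub)) {
    inter <- gcIntermediate(c(sub[j,]$lon.x[[1]], sub[j,]$lat.x[[1]]), c(sub[j,]$lon.y[[1]], sub[j,]$lat.y[[1]]), n=100, addStartEnd=TRUE)
    lines(inter, col="red", lwd=0.8)
  }
}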

I also need to figure out how to save maps, and maybe produce zoomable ones. If the geocoded points all lie within a bounding box covering a particular geographical area, scaling the map view to show just that area might be useful.
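
Something along these lines might do for the bounding box case (an untested sketch, reusing the lon.x/lat.x and lon.y/lat.y columns from the merged geodata2() data frame):

#Sketch only: fit the map to the bounding box of the geocoded points,
#with a few degrees of margin, rather than always drawing the whole world
df=geodata2()
lons=as.numeric(unlist(c(df$lon.x, df$lon.y)))
lats=as.numeric(unlist(c(df$lat.x, df$lat.y)))
map("world", xlim=range(lons)+c(-5,5), ylim=pmax(pmin(range(lats)+c(-5,5), 90), -90))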

Other techniques might include using proportional symbols (circles) at line landing points to show the sum of values incoming to that point, or the sum of values outgoing, or the difference between the two (maybe use green for incoming > outgoing, then size by the absolute difference?).
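
Here’s a hedged sketch of the proportional symbols idea (it assumes a plain numeric amount column in the merged data frame, and would be drawn on top of the already plotted map):

#Sketch only: proportional circles at destination points, sized by the
#sum of the amounts arriving at each point
df=geodata2()
totals=aggregate(list(amount=df$amount),
                 by=list(lon=as.numeric(unlist(df$lon.y)), lat=as.numeric(unlist(df$lat.y))),
                 FUN=sum)
#Scale the symbol area, rather than the radius, so visual weight tracks the value
points(totals$lon, totals$lat, pch=21, bg="green", cex=3*sqrt(totals$amount/max(totals$amount)))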

Amateur Mapmaking: Getting Started With Shapefiles

One of the great things about (software) code is that people build on it and out from it… Which means that as well as producing ever more complex bits of software, tools also get produced over time that make it easier to do things that were once hard to do, or required expensive commercial software tools.

Producing maps is a fine example of this. Not so very long ago, producing your own annotated maps was a hard thing to do. Then, in June 2005 or so, the Google Maps API came along and suddenly you could create your own maps (or at least, put markers onto a map if you had latitude and longitude co-ordinates available). Since then, things have just got easier. If you want to put markers on a map given just their addresses, it’s easy (see for example Mapping the New Year Honours List – Where Did the Honours Go?). You can make use of Ordnance Survey maps if you want to, or recolour and style maps so they look just the way you want.

Sometimes, though, when using maps to visualise numerical data sets, just putting markers onto a map, even when they are symbols sized proportionally in accordance with your data, doesn’t quite achieve the effect you want. Sometimes you just have to have a thematic, choropleth map:

OS thematic map example

The example above is taken from an Ordnance Survey OpenSpace tutorial, which walks you through the creation of thematic maps using the OS API.

But what do you do if the boundaries/shapes you want to plot aren’t supported by the OS API?

One of the common ways of publishing boundary data is in the form of shapefiles (suffix .shp, though they are often combined with several other files in a .zip package). So here’s a quick first attempt at plotting shapefiles and colouring them according to an appropriately defined data set.

The example is based on a couple of data sets – shapefiles of the English Government Office Regions (GORs), and a dataset from the Ministry of Justice relating to insolvencies that, amongst other things, describes numbers of insolvencies per time period by GOR.

The language I’m using is R, within the RStudio environment. Here’s the code:

#Download English Government Office Network Regions (GOR) from:
#http://www.sharegeo.ac.uk/handle/10672/50
##tmpdir/share geo loader courtesy of http://stackoverflow.com/users/1033808/paul-hiemstra
tmp_dir = tempdir()
url_data = "http://www.sharegeo.ac.uk/download/10672/50/English%20Government%20Office%20Network%20Regions%20(GOR).zip"
zip_file = sprintf("%s/shpfile.zip", tmp_dir)
download.file(url_data, zip_file)
unzip(zip_file, exdir = tmp_dir)

library(maptools)

#Load in the data file (could this be done from the downloaded zip file directly?)
gor=readShapeSpatial(sprintf('%s/Regions.shp', tmp_dir))

#I can plot the shapefile okay...
plot(gor)

Here’s what it looks like:

Thematic maps for UK Government Office Regions in R

#I can use these commands to get a feel for the data contained in the shapefile...
summary(gor)
attributes(gor@data)
gor@data$NAME
#[1] North East               North West              
#[3] Greater London Authority West Midlands           
#[5] Yorkshire and The Humber South West              
#[7] East Midlands            South East              
#[9] East of England         
#9 Levels: East Midlands East of England ... Yorkshire and The Humber

#download data from http://www.justice.gov.uk/downloads/publications/statistics-and-data/courts-and-sentencing/csq-q3-2011-insolvency-tables.csv
insolvency<- read.csv("http://www.justice.gov.uk/downloads/publications/statistics-and-data/courts-and-sentencing/csq-q3-2011-insolvency-tables.csv")
#Grab a subset of the data, specifically Q3 2011 numbers aggregated by GOR
insolvencygor.2011Q3=subset(insolvency,Time.Period=='2011 Q3' & Geography.Type=='Government office region')

#tidy the data - you may need to download and install the gdata package first
#The subsetting step doesn't remove extraneous original factor levels, so I will.
require(gdata)
insolvencygor.2011Q3=drop.levels(insolvencygor.2011Q3)

names(insolvencygor.2011Q3)
#[1] "Time.Period"                 "Geography"                  
#[3] "Geography.Type"              "Company.Winding.up.Petition"
#[5] "Creditors.Petition"          "Debtors.Petition"  

levels(insolvencygor.2011Q3$Geography)
#[1] "East"                     "East Midlands"           
#[3] "London"                   "North East"              
#[5] "North West"               "South East"              
#[7] "South West"               "Wales"                   
#[9] "West Midlands"            "Yorkshire and the Humber"
#Note that these names for the GORs don't quite match the ones used in the shapefile, though how they relate one to another is obvious to us...

#So what next? [That was the original question...!]

#Here's the answer I came up with...
#Convert factors to numeric [ http://stackoverflow.com/questions/4798343/convert-factor-to-integer ]
#There's probably a much better formulaic way of doing this/automating this?
insolvencygor.2011Q3$Creditors.Petition=as.numeric(levels(insolvencygor.2011Q3$Creditors.Petition))[insolvencygor.2011Q3$Creditors.Petition]
insolvencygor.2011Q3$Company.Winding.up.Petition=as.numeric(levels(insolvencygor.2011Q3$Company.Winding.up.Petition))[insolvencygor.2011Q3$Company.Winding.up.Petition]
insolvencygor.2011Q3$Debtors.Petition=as.numeric(levels(insolvencygor.2011Q3$Debtors.Petition))[insolvencygor.2011Q3$Debtors.Petition]
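
##A more formulaic alternative (untested sketch): apply the same conversion
##across all three columns in one pass
#numcols=c("Creditors.Petition","Company.Winding.up.Petition","Debtors.Petition")
#insolvencygor.2011Q3[numcols]=lapply(insolvencygor.2011Q3[numcols],function(f) as.numeric(levels(f))[f])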

#Tweak the levels so they match exactly (really should do this via a lookup table of some sort?)
i2=insolvencygor.2011Q3
i2c=c('East of England','East Midlands','Greater London Authority','North East','North West','South East','South West','Wales','West Midlands','Yorkshire and The Humber')
i2$Geography=factor(i2$Geography,labels=i2c)
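
##One way to do the lookup table version (untested sketch): recode just the
##MoJ names that differ from the shapefile's NAME values
#lookup=c('East'='East of England','London'='Greater London Authority','Yorkshire and the Humber'='Yorkshire and The Humber')
#g=as.character(i2$Geography)
#i2$Geography=ifelse(g %in% names(lookup),lookup[g],g)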

#Merge the data with the shapefile
gor@data=merge(gor@data,i2,by.x='NAME',by.y='Geography')

#Plot the data using a greyscale
plot(gor,col=gray(gor@data$Creditors.Petition/max(gor@data$Creditors.Petition)))

And here’s the result:

Thematic map via augmented shapefile in R

Okay – so it’s maybe not the most informative of maps: it needs a scale, the London data is skewed, etc. etc. But it shows that the recipe seems to work…
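
As a quick stab at the missing scale, something like this might work (a hedged sketch: a crude base-graphics key for the creditors’ petition counts):

#Sketch only: add a rough greyscale legend to the plot
vals=gor@data$Creditors.Petition
breaks=round(seq(min(vals), max(vals), length.out=5))
legend("topleft", legend=breaks, fill=gray(breaks/max(vals)), title="Creditors' petitions")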

(Here’s a glimpse of how I worked my way to this example via a question to Stack Overflow: Plotting Thematic Maps in R Using Shapefiles and Data Files from Different Sources. Note that better solutions may have since been posted to that question, which may improve on the recipe provided in this post…)

PS If the R thing is just too scary, here’s a recipe for plotting data using shapefiles in Google Fusion Tables [PDF] (alternative example) that makes use of the ShpEscape service for importing shapefiles into Fusion Tables (note that shpescape can be a bit slow converting an uploaded file and may appear to be doing nothing much at all for 10-20 minutes…). See also: Quantum GIS

Innovations in Campus Mapping

A post on the ever-interesting Google Maps Mania blog alerted me to a new interactive campus map produced by the Marketing folk at Loughborough University, built on top of Google Maps (I wonder if this is partly driven by Loughborough’s seemingly close ties with the Google education folk? If so, I wonder if they’re doing anything interesting relating to search in education too?)

Loughborough campus map maps.lboro.ac.uk

The layers are reminiscent of the layers in the Southampton campus map (e.g. Open Data Powered Location Based Services in UK Higher Education).

A simpler approach is taken by Liverpool University, who have bookmarked a few markers onto a map:

Liverpool campus map

Bristol University’s map appears to fall somewhere between the two in terms of complexity:

Bristol U campus map

Have any other UK HEIs rolled out interactive maps built on top of a third-party API such as Google’s or maybe OpenStreetMap?

I know that Lincoln and Southampton are also currently battling it out over who can do the best 3D models of university buildings in Google Earth (e.g. Lincoln’s KML file – h/t @alexbilbie;-). Any other UK HEIs with 2.5/3D building model layers? [h/t re: 2.5D to @gothwin] (Why’s this useful? Because if you can identify buildings placed as 3D models in Google Earth, you also have the lat/long data for those building profiles;-) Of course, just having building markers on a Google Map would also give you that geo-location data.

Lincoln U 3d layers/kml in google earth

I wonder whether any universities have Google Street View (or equivalent) style navigation? Where campuses are town-centre based and built on public roads, I guess they may do? I also wonder how many universities have started marking up locations as Google Places, or as places on other location-based services (I notice the Lincoln KML layer links through to a venue/place definition on Foursquare). This sort of detail would then support services based around check-ins. For a homebrew example of what this might look like, see the OU’s check-in app: wayOU – mobile location tracking app using linked data.

A crude alternative, for promo purposes at least, is the campus tour, of course. Here’s one from several years ago looking at the OU’s Walton Hall campus:

Any others out there? Or any other map/location based initiatives being carried out by UK HEIs, whether similar to the above mentioned ones, or maybe even something completely different?!;-)

UPDATE: I’m guessing interactive maps are tending towards the norm by now, but as and when I come across them, I’ll try to add links here:
Uni of York interactive campus map (Google map; bootstrapped from an SU app? Makes me think of Student as Producer and the LNCD initiative…;-)