OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Experimenting With R – Point to Point Mapping With Great Circles

I’ve started doodling again… This time, around maps, looking for recipes that make life easier plotting lines to connect points on maps. The most attractive maps seem to use great circles to connect one point with another, these providing the shortest path between two points when you consider the Earth as a sphere.

Here’s one quick experiment (based on the Flowing Data blog post How to map connections with great circles), for an R/Shiny app that allows you to upload a CSV file containing a couple of location columns (at least) and an optional “amount” column, and it’ll then draw lines between the points on each row.

greatcircle map demo

The app requires us to solve several problems, including:

  • how to geocode the locations
  • how to plot the lines as great circles
  • how to upload the CSV file
  • how to select the from and two columns from the CSV file
  • how to optionally select a valid numerical column for setting line thickness

Let’s start with the geocoder. For convenience, I’m going to use the Google geocoder via the geocode() function from the ggmap library.

#Locations are in two columns, *fr* and *to* in the *dummy* dataframe
#If locations are duplicated in from/to columns, dedupe so we don't geocode same location more than once
locs=data.frame(place=unique(c(as.vector(dummy[[fr]]),as.vector(dummy[[to]]))),stringsAsFactors=F)
#Run the geocoder against each location, then transpose and bind the results into a dataframe
cbind(locs, t(sapply(locs$place,geocode, USE.NAMES=F))) 

The locs data is a vector of locations:

                    place
1              London, UK
2            Cambridge,UK
3            Paris,France
4       Sydney, Australia
5           Paris, France
6             New York,US
7 Cape Town, South Africa

The sapply(locs$place,geocode, USE.NAMES=F) function returns data that looks like:

    [,1]       [,2]     [,3]     [,4]      [,5]     [,6]      [,7]     
lon -0.1254872 0.121817 2.352222 151.207   2.352222 -74.00597 18.42406 
lat 51.50852   52.20534 48.85661 -33.86749 48.85661 40.71435  -33.92487

The transpose (t() gives us:

     lon        lat      
[1,] -0.1254872 51.50852 
[2,] 0.121817   52.20534 
[3,] 2.352222   48.85661 
[4,] 151.207    -33.86749
[5,] 2.352222   48.85661 
[6,] -74.00597  40.71435 
[7,] 18.42406   -33.92487

The cbind() binds each location with its lat and lon value:

                    place        lon       lat
1              London, UK -0.1254872  51.50852
2            Cambridge,UK   0.121817  52.20534
3            Paris,France   2.352222  48.85661
4       Sydney, Australia    151.207 -33.86749
5           Paris, France   2.352222  48.85661
6             New York,US  -74.00597  40.71435
7 Cape Town, South Africa   18.42406 -33.92487

Code that provides a minimal example for uploading the data from a CSV file on the desktop to the Shiny app, then creating dynamic drop lists containing column names, can be found here: Simple file geocoder (R/shiny app).

The following snippet may be generally useful for getting a list of column names from a data frame that correspond to numerical columns:

#Get a list of column names for numerical columns in data frame df
nums <- sapply(df, is.numeric)
names(nums[nums])

The code for the full application can be found as a runnable gist in RStudio from here: R/Shiny app – great circle mapping. [In RStudio, install.packages("shiny"); library(shiny); runGist(9690079). The gist contains a dummy data file if you want to download it to try it out...]

Here’s the code explicitly…

The global.R file loads the necessary packages, installing them if they are missing:

#global.R

##This should detect and install missing packages before loading them - hopefully!
list.of.packages <- c("shiny", "ggmap","maps","geosphere")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages,function(x){library(x,character.only=TRUE)}) 

The ui.R file builds the Shiny app’s user interface. The drop down column selector lists are populated dynamically with the names of the columns in the data file once it is uploaded. An optional Amount column can be selected – the corresponding list only displays the names of numerical columns. (The lists of location columns to be geocoded should really be limited to non-numerical columns.) The action button prevents the geocoding routines firing until the user is ready – select the columns appropriately before geocoding (error messages are not handled very nicely;-)

#ui.R
shinyUI(pageWithSidebar(
  headerPanel("Great Circle Map demo"),
  
  sidebarPanel(
    #Provide a dialogue to upload a file
    fileInput('datafile', 'Choose CSV file',
              accept=c('text/csv', 'text/comma-separated-values,text/plain')),
    #Define some dynamic UI elements - these will be lists containing file column names
    uiOutput("fromCol"),
    uiOutput("toCol"),
    #Do we want to make use of an amount column to tweak line properties?
    uiOutput("amountflag"),
    #If we do, we need more options...
    conditionalPanel(
      condition="input.amountflag==true",
      uiOutput("amountCol")
    ),
    conditionalPanel(
      condition="input.amountflag==true",
      uiOutput("lineSelector")
    ),
    #We don't want the geocoder firing until we're ready...
    actionButton("getgeo", "Get geodata")
    
  ),
  mainPanel(
    tableOutput("filetable"),
    tableOutput("geotable"),
    plotOutput("geoplot")
  )
))

The server.R file contains the server logic for the app. One thing to note is the way we isolate some of the variables in the geocoder reactive function. (Reactive functions fire when one of the external variables they contain changes. To prevent the function firing when a variable it contains changes, we need to isolate it. (See the docs for me; for example, Shiny Lesson 7: Reactive outputs or Isolation: avoiding dependency.)

#server.R

shinyServer(function(input, output) {

  #Handle the file upload
  filedata <- reactive({
    infile <- input$datafile
    if (is.null(infile)) {
      # User has not uploaded a file yet
      return(NULL)
    }
    read.csv(infile$datapath)
  })

  #Populate the list boxes in the UI with column names from the uploaded file  
  output$toCol <- renderUI({
    df <-filedata()
    if (is.null(df)) return(NULL)
    
    items=names(df)
    names(items)=items
    selectInput("to", "To:",items)
  })
  
  output$fromCol <- renderUI({
    df <-filedata()
    if (is.null(df)) return(NULL)
    
    items=names(df)
    names(items)=items
    selectInput("from", "From:",items)
  })
  
  #If we want to make use of an amount column, we need to be able to say so...
  output$amountflag <- renderUI({
    df <-filedata()
    if (is.null(df)) return(NULL)
    
    checkboxInput("amountflag", "Use values?", FALSE)
  })

  output$amountCol <- renderUI({
    df <-filedata()
    if (is.null(df)) return(NULL)
    #Let's only show numeric columns
    nums <- sapply(df, is.numeric)
    items=names(nums[nums])
    names(items)=items
    selectInput("amount", "Amount:",items)
  })
  
  #Allow different line styles to be selected
  output$lineSelector <- renderUI({
    radioButtons("lineselector", "Line type:",
                 c("Uniform" = "uniform",
                   "Thickness proportional" = "thickprop",
                   "Colour proportional" = "colprop"))
  })
  
  #Display the data table - handy for debugging; if the file is large, need to limit the data displayed [TO DO]
  output$filetable <- renderTable({
    filedata()
  })
  
  #The geocoding bit... Isolate variables so we don't keep firing this...
  geodata <- reactive({
    if (input$getgeo == 0) return(NULL)
    df=filedata()
    if (is.null(df)) return(NULL)
    
    isolate({
      dummy=filedata()
      fr=input$from
      to=input$to
      locs=data.frame(place=unique(c(as.vector(dummy[[fr]]),as.vector(dummy[[to]]))),stringsAsFactors=F)      
      cbind(locs, t(sapply(locs$place,geocode, USE.NAMES=F))) 
    })
  })

  #Weave the goecoded data into the data frame we made from the CSV file
  geodata2 <- reactive({
    if (input$getgeo == 0) return(NULL)
    df=filedata()
    if (input$amountflag != 0) {
      maxval=max(df[input$amount],na.rm=T)
      minval=min(df[input$amount],na.rm=T)
      df$b8g43bds=10*df[input$amount]/maxval
    }
    gf=geodata()
    df=merge(df,gf,by.x=input$from,by.y='place')
    merge(df,gf,by.x=input$to,by.y='place')
  })
  
  #Preview the geocoded data
  output$geotable <- renderTable({
    if (input$getgeo == 0) return(NULL)
    geodata2()
  })
  
  #Plot the data on a map...
  output$geoplot<- renderPlot({
    if (input$getgeo == 0) return(map("world"))
    #Method pinched from: http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/
    map("world")
    df=geodata2()
    
    pal <- colorRampPalette(c("blue", "red"))
    colors <- pal(100)
    
    for (j in 1:nrow(df)){
      inter <- gcIntermediate(c(df[j,]$lon.x[[1]], df[j,]$lat.x[[1]]), c(df[j,]$lon.y[[1]], df[j,]$lat.y[[1]]), n=100, addStartEnd=TRUE)

      #We could possibly do more styling based on user preferences?
      if (input$amountflag == 0) lines(inter, col="red", lwd=0.8)
      else {
        if (input$lineselector == 'colprop') {
          colindex <- round( (df[j,]$b8g43bds[[1]]/10) * length(colors) )
          lines(inter, col=colors[colindex], lwd=0.8)
        } else if (input$lineselector == 'thickprop') {
          lines(inter, col="red", lwd=df[j,]$b8g43bds[[1]])
        } else lines(inter, col="red", lwd=0.8)
      } 
    } 
  })

})

So that’s the start of it… this app could be further developed in several ways, for example allowing the user to filter or colour displayed lines according to factor values in a further column (commodity type, for example), or produce a lattice of maps based on facet values in a column.

I also need to figure how to to save maps, and maybe produce zoomable ones. If geocoded points all lay within a blinding box limited to a particular geographical area, scaling the map view to show just that area might be useful.

Other techniques might include using proportional symbols (circles) at line landing points to show the sum of values incoming to that point, or some of values outgoing, or the difference between the two; (maybe use green for incoming outgoing, then size by the absolute difference?)

Written by Tony Hirst

March 24, 2014 at 11:17 am

5 Responses

Subscribe to comments with RSS.

  1. Cool stuff!

    One minor point I didn’t get. Why all the mucking about checking for packages in global.R? My understanding (which might be wrong, hence asking!) is that simply doing library() – or require() if you’re in a function – will do all this for you.

    The manual page (http://stat.ethz.ch/R-manual/R-devel/library/base/html/library.html) says ‘Both functions check and update the list of currently attached packages and do not reload a namespace which is already loaded.’ So my take here is that you’re unnecessarily duplicating this check with your code.

    Or have I missed something here?

    dougclow

    March 24, 2014 at 11:56 am

    • If you don’t have a package installed (eg it’s not part of the base distro and you haven’t pulled it down from eg CRAN yourself), then I don’t think library() will work?

      I was trying to code something that would bring in any dependencies that were missing without the user having to do anything other than be online.. (So I guess I should also download the CSV file automagically for demo purposes.. hmmm;-)

      Tony Hirst

      March 24, 2014 at 1:55 pm

      • to download a csv file from a url (e.g. form your public/data/ folder on cRunch, you can do the following:

        w = getURI(“http://crunch.kmi.open.ac.uk/people/~thirst/data/example.csv”, .encoding =”UTF-8″,.mapUnicode=F)

        data = read.csv(textConnection(w), header=T)

        or just directly:

        data = read.csv(url(“http://crunch.kmi.open.ac.uk/people/~thirst/data/example.csv”))

        Fridolin Wild

        March 27, 2014 at 12:19 pm

  2. […] I’ve started doodling again… This time, around maps, looking for recipes that make life easier plotting lines to connect points on maps.  […]

  3. […] article was first published on OUseful.Info, the blog… » Rstats, and kindly contributed to […]


Comments are closed.

Follow

Get every new post delivered to your Inbox.

Join 800 other followers

%d bloggers like this: