## Experimenting With R – Point to Point Mapping With Great Circles

I’ve started doodling again… This time, around maps, looking for recipes that make life easier plotting lines to connect points on maps. The most attractive maps seem to use great circles to connect one point with another, these providing the shortest path between two points when you consider the Earth as a sphere.

Here’s one quick experiment (based on the Flowing Data blog post How to map connections with great circles), for an R/Shiny app that allows you to upload a CSV file containing a couple of location columns (at least) and an optional “amount” column, and it’ll then draw lines between the points on each row. The app requires us to solve several problems, including:

• how to geocode the locations
• how to plot the lines as great circles
• how to upload the CSV file
• how to select the from and two columns from the CSV file
• how to optionally select a valid numerical column for setting line thickness

Let’s start with the geocoder. For convenience, I’m going to use the Google geocoder via the geocode() function from the ggmap library.

```#Locations are in two columns, *fr* and *to* in the *dummy* dataframe
#If locations are duplicated in from/to columns, dedupe so we don't geocode same location more than once
locs=data.frame(place=unique(c(as.vector(dummy[[fr]]),as.vector(dummy[[to]]))),stringsAsFactors=F)
#Run the geocoder against each location, then transpose and bind the results into a dataframe
cbind(locs, t(sapply(locs\$place,geocode, USE.NAMES=F))) ```

The locs data is a vector of locations:

```                    place
1              London, UK
2            Cambridge,UK
3            Paris,France
4       Sydney, Australia
5           Paris, France
6             New York,US
7 Cape Town, South Africa```

The sapply(locs\$place,geocode, USE.NAMES=F) function returns data that looks like:

```    [,1]       [,2]     [,3]     [,4]      [,5]     [,6]      [,7]
lon -0.1254872 0.121817 2.352222 151.207   2.352222 -74.00597 18.42406
lat 51.50852   52.20534 48.85661 -33.86749 48.85661 40.71435  -33.92487```

The transpose (t() gives us:

```     lon        lat
[1,] -0.1254872 51.50852
[2,] 0.121817   52.20534
[3,] 2.352222   48.85661
[4,] 151.207    -33.86749
[5,] 2.352222   48.85661
[6,] -74.00597  40.71435
[7,] 18.42406   -33.92487```

The cbind() binds each location with its lat and lon value:

```                    place        lon       lat
1              London, UK -0.1254872  51.50852
2            Cambridge,UK   0.121817  52.20534
3            Paris,France   2.352222  48.85661
4       Sydney, Australia    151.207 -33.86749
5           Paris, France   2.352222  48.85661
6             New York,US  -74.00597  40.71435
7 Cape Town, South Africa   18.42406 -33.92487```

Code that provides a minimal example for uploading the data from a CSV file on the desktop to the Shiny app, then creating dynamic drop lists containing column names, can be found here: Simple file geocoder (R/shiny app).

The following snippet may be generally useful for getting a list of column names from a data frame that correspond to numerical columns:

```#Get a list of column names for numerical columns in data frame df
nums <- sapply(df, is.numeric)
names(nums[nums])```

The code for the full application can be found as a runnable gist in RStudio from here: R/Shiny app – great circle mapping. [In RStudio, install.packages(“shiny”); library(shiny); runGist(9690079). The gist contains a dummy data file if you want to download it to try it out…]

Here’s the code explicitly…

The global.R file loads the necessary packages, installing them if they are missing:

```#global.R

##This should detect and install missing packages before loading them - hopefully!
list.of.packages <- c("shiny", "ggmap","maps","geosphere")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages,function(x){library(x,character.only=TRUE)}) ```

The ui.R file builds the Shiny app’s user interface. The drop down column selector lists are populated dynamically with the names of the columns in the data file once it is uploaded. An optional Amount column can be selected – the corresponding list only displays the names of numerical columns. (The lists of location columns to be geocoded should really be limited to non-numerical columns.) The action button prevents the geocoding routines firing until the user is ready – select the columns appropriately before geocoding (error messages are not handled very nicely;-)

```#ui.R
shinyUI(pageWithSidebar(

sidebarPanel(
#Provide a dialogue to upload a file
fileInput('datafile', 'Choose CSV file',
accept=c('text/csv', 'text/comma-separated-values,text/plain')),
#Define some dynamic UI elements - these will be lists containing file column names
uiOutput("fromCol"),
uiOutput("toCol"),
#Do we want to make use of an amount column to tweak line properties?
uiOutput("amountflag"),
#If we do, we need more options...
conditionalPanel(
condition="input.amountflag==true",
uiOutput("amountCol")
),
conditionalPanel(
condition="input.amountflag==true",
uiOutput("lineSelector")
),
#We don't want the geocoder firing until we're ready...
actionButton("getgeo", "Get geodata")

),
mainPanel(
tableOutput("filetable"),
tableOutput("geotable"),
plotOutput("geoplot")
)
))```

The server.R file contains the server logic for the app. One thing to note is the way we isolate some of the variables in the geocoder reactive function. (Reactive functions fire when one of the external variables they contain changes. To prevent the function firing when a variable it contains changes, we need to isolate it. (See the docs for me; for example, Shiny Lesson 7: Reactive outputs or Isolation: avoiding dependency.)

```#server.R

shinyServer(function(input, output) {

filedata <- reactive({
infile <- input\$datafile
if (is.null(infile)) {
# User has not uploaded a file yet
return(NULL)
}
})

#Populate the list boxes in the UI with column names from the uploaded file
output\$toCol <- renderUI({
df <-filedata()
if (is.null(df)) return(NULL)

items=names(df)
names(items)=items
selectInput("to", "To:",items)
})

output\$fromCol <- renderUI({
df <-filedata()
if (is.null(df)) return(NULL)

items=names(df)
names(items)=items
selectInput("from", "From:",items)
})

#If we want to make use of an amount column, we need to be able to say so...
output\$amountflag <- renderUI({
df <-filedata()
if (is.null(df)) return(NULL)

checkboxInput("amountflag", "Use values?", FALSE)
})

output\$amountCol <- renderUI({
df <-filedata()
if (is.null(df)) return(NULL)
#Let's only show numeric columns
nums <- sapply(df, is.numeric)
items=names(nums[nums])
names(items)=items
selectInput("amount", "Amount:",items)
})

#Allow different line styles to be selected
output\$lineSelector <- renderUI({
c("Uniform" = "uniform",
"Thickness proportional" = "thickprop",
"Colour proportional" = "colprop"))
})

#Display the data table - handy for debugging; if the file is large, need to limit the data displayed [TO DO]
output\$filetable <- renderTable({
filedata()
})

#The geocoding bit... Isolate variables so we don't keep firing this...
geodata <- reactive({
if (input\$getgeo == 0) return(NULL)
df=filedata()
if (is.null(df)) return(NULL)

isolate({
dummy=filedata()
fr=input\$from
to=input\$to
locs=data.frame(place=unique(c(as.vector(dummy[[fr]]),as.vector(dummy[[to]]))),stringsAsFactors=F)
cbind(locs, t(sapply(locs\$place,geocode, USE.NAMES=F)))
})
})

#Weave the goecoded data into the data frame we made from the CSV file
geodata2 <- reactive({
if (input\$getgeo == 0) return(NULL)
df=filedata()
if (input\$amountflag != 0) {
maxval=max(df[input\$amount],na.rm=T)
minval=min(df[input\$amount],na.rm=T)
df\$b8g43bds=10*df[input\$amount]/maxval
}
gf=geodata()
df=merge(df,gf,by.x=input\$from,by.y='place')
merge(df,gf,by.x=input\$to,by.y='place')
})

#Preview the geocoded data
output\$geotable <- renderTable({
if (input\$getgeo == 0) return(NULL)
geodata2()
})

#Plot the data on a map...
output\$geoplot<- renderPlot({
if (input\$getgeo == 0) return(map("world"))
#Method pinched from: http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/
map("world")
df=geodata2()

pal <- colorRampPalette(c("blue", "red"))
colors <- pal(100)

for (j in 1:nrow(df)){
inter <- gcIntermediate(c(df[j,]\$lon.x[], df[j,]\$lat.x[]), c(df[j,]\$lon.y[], df[j,]\$lat.y[]), n=100, addStartEnd=TRUE)

#We could possibly do more styling based on user preferences?
if (input\$amountflag == 0) lines(inter, col="red", lwd=0.8)
else {
if (input\$lineselector == 'colprop') {
colindex <- round( (df[j,]\$b8g43bds[]/10) * length(colors) )
lines(inter, col=colors[colindex], lwd=0.8)
} else if (input\$lineselector == 'thickprop') {
lines(inter, col="red", lwd=df[j,]\$b8g43bds[])
} else lines(inter, col="red", lwd=0.8)
}
}
})

})```

So that’s the start of it… this app could be further developed in several ways, for example allowing the user to filter or colour displayed lines according to factor values in a further column (commodity type, for example), or produce a lattice of maps based on facet values in a column.

I also need to figure how to to save maps, and maybe produce zoomable ones. If geocoded points all lay within a blinding box limited to a particular geographical area, scaling the map view to show just that area might be useful.

Other techniques might include using proportional symbols (circles) at line landing points to show the sum of values incoming to that point, or some of values outgoing, or the difference between the two; (maybe use green for incoming outgoing, then size by the absolute difference?)