Using R To Calculate A Simple Speed Model For Rally Stage Routes

A few months ago, I started hacking together an online e-book on Visualising WRC Rally Stages With rayshader and R. One of the sections (Estimating speeds) described the construction of a simple speed model based around the curvature of the stage route.

As part of another sprint into some rally data tinkering, this time focusing on Visualising WRC Telemetry Data With R, I’ve extracted just the essential code for creating the speed model and split it into a more self-contained extract: Creating a Route Speed Model. The intention is that I can use this speed model to help improve interpolation within a sparse telemetry time series.

Also on the to do list is to see if I can validate – or not! – the speed model using actual telemetry.

The recipe for building the model builds up from the a boundary convexity tool (bct()) that can be found in the rLFT processing linear features R package. This tool provides a handy routine for modeling the curvature along each point of a route in the form, a process that also returns the co-ordinates of a center of curvature for each sement. A separate function inspired by the pracma::circlefit() function, then finds the radius.

Because I don’t know how to write vectorised functions properly, I use the base::Vectorize() function to do the lifting for me around a simpler, non-vectorised function.

library(devtools)

# The curvature function takes an arc defined over
# x and y coordinate lists

#circlefit, from pracma::
circlefit = function (xp, yp, fast = TRUE) 
{
    if (!is.vector(xp, mode = "numeric") || !is.vector(yp, mode = "numeric")) 
        stop("Arguments 'xp' and 'yp' must be numeric vectors.")
    if (length(xp) != length(yp)) 
        stop("Vectors 'xp' and 'yp' must be of the same length.")
    if (!fast) 
        warning("Option 'fast' is deprecated and will not be used!", 
            call. = FALSE, immediate. = TRUE)
    n <- length(xp)
    p <- qr.solve(cbind(xp, yp, 1), matrix(xp^2 + yp^2, ncol = 1))
    v <- c(p[1]/2, p[2]/2, sqrt((p[1]^2 + p[2]^2)/4 + p[3]))
    rms <- sqrt(sum((sqrt((xp - v[1])^2 + (yp - v[2])^2) - v[3])^2)/n)
    #cat("RMS error:", rms, "\n")
    return(v)
}

curvature = function(x,y){
  #729181.8, 729186.1, 729190.4
  #4957667 , 4957676, 4957685
  tryCatch({
      # circlefit gives an error if we pass a straight line
      # Also hide the print statement in circlefit
      # circlefit() returns the x and y coords of the circle center
      # as well as the radius of curvature
      # We could then also calculate the angle and arc length
      circlefit(x,y)[3]
    },
    error = function(err) { 
      # For a straight, return the first co-ord and Inf diameter
      # Alternatively, pass zero diameter?
      c(x[1], y[1], Inf)[3]})
}

curvature2 = function(x1, x2, x3, y1, y2, y3){
  curvature(c(x1, x2, x3), c(y1, y2, y3))
}

# The base::Vectorize function provides a lazy way of 
# vectorising a non-vectorised function
curvatures = Vectorize(curvature2)

# The Midpoint values are calculated by rLFT::bct()
route_convexity$radius = curvatures(lag(route_convexity$Midpoint_X), 
                                    route_convexity$Midpoint_X,
                                    lead(route_convexity$Midpoint_X),
                                    lag(route_convexity$Midpoint_Y),
                                    route_convexity$Midpoint_Y,
                                    lead(route_convexity$Midpoint_Y)

A corner speed model than bins each segment into a corner type. This is inspired by the To See The Invisible rally pacenotes tutorial series by David Nafría which uses a simple numerical value to categorise the severity of each corner as well as identifying a nominal target speed for each corner category.

corner_speed_model = function(route_convexity){
  invisible_bins = c(0, 10, 15, 20, 27.5, 35,
                    45, 60, 77.5, 100, 175, Inf)

  route_convexity$invisible_ci = cut(route_convexity$radius,
                                     breaks = invisible_bins,
                                     labels = 1:(length(invisible_bins)-1),
                                     ordered_result=TRUE)
  
  # Speeds in km/h
  invisible_speeds = c(10, 40, 50, 60, 70, 80,
                       95, 110, 120, 130, 145)
  
  
  route_convexity$invisible_sp = cut(route_convexity$radius,
                                     breaks = invisible_bins,
                                     labels = invisible_speeds,
                                     ordered_result=TRUE)
  
  # Cast speed as factor, via character, to integer
  route_convexity$invisible_sp = as.integer(as.character(route_convexity$invisible_sp))
  
  route_convexity
}

We can now build up the speed model for the route. At each step we accelerate towards a nominal sector target speed (the invisible_sp value). We can’t accelerate infinitely fast, so our actual target accumulated speed for the segment, acc_sp, is a simple function of the current speed and the notional target speed. We can then calculate the notional time to complete that segment, invisible_time.

acceleration_model = function(route_convexity, stepdist=10){
  # Acceleration model
  sp = route_convexity$invisible_sp
  # Nominal starting target speed
  # In we don't set this, we don't get started moving
  sp[1] = 30 

  # Crude acceleration / brake weights
  acc = 1
  dec = 1
  for (i in 2:(length(sp)-1)) {
    # Simple linear model - accumulated speed is based on
    # the current speed and the notional segment speed
    # Accelerate up
    if (sp[i-1]<=sp[i]) sp[i] = (sp[i-1] + acc * sp[i]) / (1+acc)

    # Decelerate down
    if (sp[i]>sp[i+1]) sp[i] = (dec * sp[i] + sp[i+1]) / (1+dec)
  }

  route_convexity$acc_sp = sp
  route_convexity$acc_sp[length(sp)] = route_convexity$invisible_sp[length(sp)]

  # New time model
  # Also get speed in m/s for time calculation
  meters = 1000
  seconds_per_hour = 3600 # 60 * 60
  kph_unit = meters / seconds_per_hour
  route_convexity = route_convexity %>% 
                      mutate(segment_sp = route_convexity$acc_sp * kph_unit,
                             invisible_time = dist/segment_sp,
                             acc_time = cumsum(invisible_time))


  # So now we need to generate kilometer marks
  route_convexity$kmsection = 1 + trunc(route_convexity$MidMeas/1000)
  # We can use this to help find the time over each km

  route_convexity
}

With the speed model, we can then generate a simple plot of the anticipated speed against distance into route:

We can also plot the accumulated time into the route:

Finally, a simple cumulative sum of the time taken to complete each segment gives us an estimate of the stage time:

anticipated_time = function(route_convexity) {
  anticipated_time = sum(route_convexity$invisible_time[1:nrow(route_convexity)-1])
  cat(paste0("Anticipated stage time: ", anticipated_time %/% 60,
           'm ', round(anticipated_time %% 60, 1), 's' ))
}

anticipated_time(route_convexity)

# Anticipated stage time: 8m 40.3s

Next on my to do list is to generate an “ideal” route from a collection of telemetry traces from different drivers on the same stage.

If we know the start and end of the route are nominally at the same location, we can normalise the route length of multiple routes, map equidistant points onto each other, and then take the average. For example, this solution: https://stackoverflow.com/a/65341730/454773 seems to offer a sensible way forward. See also https://en.wikipedia.org/wiki/Dynamic_time_warping and https://dynamictimewarping.github.io/r/ .

I was rather surprised, though, not to find a related funciton in one of the ecology / animal tracking R packages that would, for example, pull out a “mean” route based on a collection of locations from a tracked animal or group of animals following the same (ish) path over a period of time. Or maybe I just didnlt spot it? (If you know of just such a function I can reuse, please let me know via the comments…)

Fragment: Tools of Production – ggalt and encircling scatterplot points in R and Python

In passing, I note ggalt, an R package containing some handy off-the-shelf geoms for use with ggplot2.

Using geom_encircle() you can trivially encircle a set of points which could be really handing when demonstrating / highlighting grouping various sets of points in a scatterplot:

See the end of this post for a recipe for creating a similar effect in Python.

You can also encircle and fill by group:

A lollipop chart . The geom_lollipop() geom provides a clean alternative to the bar chart (although with a possible loss of resolution around the actual value being indicated):

A dumbbell chart provides a really handy way of comparing differences between pairs of values. Enter, the geom_dumbbell():

The geom_dumbbell() will also do dodging of duplicate treatment values, which could be really useful:

The geom_xspline() geom provides a good range of controls for generating splines drawn relative to control points: “for each control point, the line may pass through (interpolate) the control point or it may only approach (approximate) the control point”.

The geom_encircle() idea is really handy for annotating charts. I donlt think there’s a native Pyhton seaborn method for this, but there is a hack to it (via this StackOverflow answer) using the scipy.spatial.ConvexHull() function:

# Via: https://stackoverflow.com/a/44577682

import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
from scipy.spatial import ConvexHull

x1, y1 = np.random.normal(loc=5, scale=2, size=(2,15))
x2, y2 = np.random.normal(loc=8, scale=2.5, size=(2,13))

plt.scatter(x1, y1)
plt.scatter(x2, y2)

def encircle(x,y, ax=None, **kw):
    if not ax: ax=plt.gca()
    p = np.c_[x,y]
    hull = ConvexHull(p)
    poly = plt.Polygon(p[hull.vertices,:], **kw)
    ax.add_patch(poly)

encircle(x1, y1, ec="k", fc="gold", alpha=0.2)
encircle(x2, y2, ec="orange", fc="none")

plt.show()

It would be handy to add a buffer / margin region so the line encircles the points rather than going through the envelope loci? From this handy post on Drawing Boundaries in Python, one way of doing this is to cast the points defining the convex hull to a shapely shape (eg using boundary = shapely.geometry.MultiLineString(edge_points)) and then buffer it using a shapely shape buffer (boundary.buffer(1)). Alternatively, if the points are cast as shapely points using MultiPoint, then shapely also a convex hull function that returns and object that can be buffered from directly.

When Less is More: Data Tables That Make a Difference

In the previous post, From Visual Impressions to Visual Opinions, I gave various examples of charts that express opinions. In this post, I’ll share a few examples of how we can take a simple data table and derive multiple views from it that each provide a different take on the same story (or does that mean, tells different stories from the same set of "facts"?)

Here’s the original, base table, showing the recorded split times from a single rally stage. The time is the accumulated stage time to each split point (i.e. the elapsed stage time you see for a driver as they reach each split point):

From this, we immediately note the ordering (more on this in another post) which seems not useful. It is, in fact, the road order (i.e. the order in which each driver started the stage).

We also note that the final split is not the actual final stage time: the final split in this case was a kilometer or so before the stage end. So from the table, we can’t actually determine who won the stage.

Making a Difference

The times presented are the actual split times. But one thing we may be more interested in is the differences to see how far ahead or behind one driver another driver was at a particular point. We can subtract one driver’s time from anothers to find this difference. For example, how did the times at each split compare to first on road Ogier’s (OGI)?

Note that we can “rebase” the table relative to any driver by subtracting the required driver’s row from every other row in the original table.

From this “rebased” table, which has fewer digits (less ink) in it than the original, we can perhaps more easily see who was in the lead at each split, specifically, the person with the minimum relative time. The minimum value is trivially the most negative value in a column (i.e. at each split), or, if there are no negative values, the minimum zero value.

As well a subtracting one row from every other row to find the differences realative to a specified driver, we can also subtract the first column from the second, the second from the third etc to find the time it took to get from one split point to the next (we subtract 0 from the first split point time since the elapsed time into stage at the start of the stage is 0 seconds).

The above table shows the time taken to traverse the distance from one split point to the next; the extra split_N column is based on the final stage time. Once again, we could subtract one row from all the other rows to rebase these times relative to a particular driver to see the difference in time it took each driver to traverse a split section, relative to a specified driver.

As well as rebasing relative to an actual driver, we can also rebase relative to variously defined “ultimate” drivers. For example, if we find the minimum of each of the “split traverse” table columns, we create a dummy driver whose split section times represent the ultimate quickest times taken to get from one split to the next. We can then subtract this dumny row from every row of the split section times table:

In this case, the 0 in the first split tells us who got to the first split first, but then we lose information (withiut further calculation) about anything other than relative performance on each split section traverse. Zeroes in the other columns tell us who completed that particular split section traverse in the quickest time.

Another class of ultimate time dummy driver is the accumulated ultimate section time driver. That is, take the ultimate split sections then find the cumulative sum of them. These times then represent the dummy elapsed stage times of an ultimate driver who completed each split in the fastest split section time. If we rebase against that dummy driver:

In this case, there may be only a single 0, specifically at the first split.

A third possible ultimate dummy driver is the one who “as if” recorded the minimum actual elapsed time at each split. Again, we can rebase according to that driver:

In this case, will be at least one zero in each column (for the driver who recorded that particular elapsed time at each split).

Visualising the Difference

Viewing the above tables as purely numerical tables is fine as far as it goes, but we can also add visual cues to help us spot patterns, and different stories, more readily.

For example, looking at times rebased to the ultimate split section dummy driver, we get the following:

We see that SOL was flying from the second split onwards, getting from one split to another in pretty much the fastest time after a relatively poor start.

The variation in columns may also have something interesting to say. SOL somehow made time against pretty much every between split 4 and 5, but in the other sections (apart from the short last section to finish), there is quite a lot of variability. Checking this view against a split sectioned route map might help us understand whether there were particular features of the route that might explain these differences.

How about if we visualise the accumulated ultimate split section time dummy driver?

Here, we see that TAN was recording the best time compared the ultimate time as calculated against the sum of best split section times, but was still off the ultimate pace: it was his first split that made the difference.

How about if we rebase against the dummy driver that represents the driver with the fastest actual recorded accumulated time at each split:

Here, we see that TAN led the stage at each split point based on actual accumulated time.

Remember, all these stories were available in the original data table, but sometimes it takes a bit of differencing to see them clearly…

From Visual Impressions to Visual Opinions

In The Analytics Trap I scribbled some notes on how I like using data not as a source of "truth", but as a lens, or a perspective, from a particular viewpoint.

One idea I’ve increasingly noticed being talked about explcitly across various software projects I follow is the idea of opionated software and opionated design.

According to the Basecamp bible, Getting Real, [th]e best software takes sides. … [Apps should] have an attitude. This seems to lie at the heart of opinionated design.

A blog post from 2015, The Rise of Opinionated Software presents a widely shared definition: Opinionated Software is a software product that believes a certain way of approaching a business process is inherently better and provides software crafted around that approach. Other widely shared views relate to software design: opinonated software should have "a view" on how things are done and should enforce that view.

So this idea of opinion is perhaps one we can riff on.

I’ve been playing with data for years, and one of things I’ve believed, throughout, in my opinionated way, is that its an unreliable and opinionated witness.

In the liminal space between wake and sleep this morning, I started wondering about how visualisations in particular could range from providing visual impressions to visual opinions.

For example, here’s a view of a rally stage, overlaid onto a map:

This sort of thing is widely recongnisable to anyboy had use an online map, and anyone who has seen a printed map and drawn a route on it.

Example interactive map view

Here’s a visual impression of just the route:

View of route

Even this view is opinionated because the co-ordinates are projected to a particular co-ordinate system, albeit the one we are most familiar with when viewing online maps; but other projections are available.

Now here’s a more opinionated view of the route, with it cut into approximuately 1km segments:

Or the chart can express an opinion about where it things significant left and right hand corners are:

The following view has strong opinions about how to display each kilometer section: not only does it make claims about where it things significant right and left corners are, it also rotates each segment to so the start and end point of the section lay on the same horixontal line:

Another viewpoint brings in another dimension: elevation. It also transforms the flat 2D co-ordinates of each point along the route to a 1-D distance-along-route measure allowing us to plot the elevation against a 1-D representation of the route in a 2D (1D!) line chart.

Again, the chart expresses an opinion about where the significant right and left corners are. The chart also chooses not to be more helpful than it could be: if vertical grid lines corresponded to the start and end distace-into-stage values for the segmented plots, it would be easier to see how this chart relates to the 1km segmented sections.

At this point, you may say that the points are "facts" from the data, but again, they really aren’t. There are various ways of trying to define the intensity of a turn, and there may be various ways of calculating any particular measure that give slightly differnent results. Many definitions rely on particular parameter settings (for example, if you measure radius of curvature from three points on a route, how far should those points be apart? 1m? 10m? 20m? 50m?

The "result" is only a "fact" insofar as it represents the output of a particular calculation of a particular measure using a particular set of parameters, things that are typically not disclosed in chart labels, often aren’t mentioned in chart captions, and may or may not be disclosed in the surrounding text.

On the surface, the chart is simply expressing an opion about how tight any of the particular corners are. If we take it a face value, and trust its opinion is based on reasonable foundations, then we can accept (or not accept) the chart’s opinion aabout where the significant turns are.

If we were really motivated to understand the chart’s opinion further, if we had access to the code that generated it we could start to probe its definition of "significnant curvature" to see if we agree with the principles on which the chart has based its opinion. But in most cases, we don’t do that. We take the chart for what it is, typically accept it for what it appears to say, and ascribe some sort of truth to it.

But at the end of the day, it’s just an opinion.

The charts were generated using R based on ideas inspired by Visualising WRC Rally Stages With rayshader and R [repo].

Thinks Another: Using Spectrograms to Identify Stage Wiggliness?

Last night I started wondering about ways in which I might be able to use signal processing (Fourier analysis) or symbol dynamics (eg Thinks: Symbolic Dynamics for Categorising Rally Stage Wiggliness?) to help categorise the nature of rally stage twistiness.

Over a morning coffee break, I reminded myself of spectrograms, graphical devices that chunk a time series into a sequence of steps, and than display a frequency plot of each part. Which got me wondering: could I use a spectrogram to segment a stage route and analyse the spectrum of some signal taken along the route to identify wiggliness at that part of the stage?

If I’m reading it right [I wasn’t… the distances were wrong for a start: note to self – check the default parameter settings!], I think the following spectrogram does show some possible differences in wiggliness for different segments along the stage?

Image

The question then becomes: what signal (as a function of distance along line) to use? The above spectrogram is based on the perpendicular distance of the route from the straight line connecting the start and end points of the route.

# trj is a trajr route
straight = st_linestring(data.matrix(rbind(head(trj[,c('x','y')], 1),
                                           tail(trj[,c('x','y')], 1))))

straight_sf = st_sfc(straight,
                     crs=st_crs(utm_routes))

trj_d = TrajRediscretize(trj, 10)
utm_discretised = trj_d %>% 
                    sf::st_as_sf(coords = c("x","y")) %>% 
                    sf::st_set_crs(st_crs(utm_routes[route_index,]))

# Get the rectified distance from the midline
# Can we also get whether it's to left or right?
perp_distances = data.frame(d_ = st_distance(utm_discretised,
                                             straight_sf))
# Returned distance is given as units
perp_distances$d = as.integer(perp_distances$d_)

perp_distances$i = 10 * (1:nrow(perp_distances))
#perp_distances$i = units::set_units(10 * (1:nrow(perp_distances)), 'm')

We can then do something like a low pass filter:

library(signal)

# High pass filter
bf <- butter(2, 0.9, type="high")
perp_distances$d_hi <- filter(bf, perp_distances$d)

and generate the spectrogram show above:

# We could just plot this direct
spec = specgram(perp_distances$d_hi)

# Or make pretty
# Via:https://hansenjohnson.org/post/spectrograms-in-r/
library(oce)
# discard phase information
P = abs(spec$S)

# normalize
P = P/max(P)

# convert to dB
P = 10*log10(P)

# config time axis
t = spec$t

# plot spectrogram
imagep(x = t,
       y = spec$f,
       z = t(P),
       col = oce.colorsViridis,
       ylab = 'Frequency [Hz]',
       xlab = 'Time [s]',
       drawPalette = T,
       decimate = F
)

However, it would possibly make more sense to use something line the angle of turn, convexity index, or radius of curvature at each 10m step as the signal…

Hmmm…


Related: Rapid ipywidgets Prototyping Using Third Party Javascript Packages in Jupyter Notebooks With jp_proxy_widget (example of a waversurfer.js spectrogram js app widgetised for use in Jupyter notebooks).

If you listen to that track it’s really interesting seeing how the imagery maps onto the sound. Eg in the above image you can see a lag in an edge between right and left channels towards the end of the trace, which translates to hearing an effect in the left channel echoed a moment later in the right.

Which makes me think: could I use telemetry from two drivers as left and right stereo tracks and try to sonify the telemetry differences between them using distance along stage as the x axis value and some mapping of different telemetry channels onto frequency…? For example, brake on the bass, throttle at the top, and lateral acceleration in the mid-range?

Automatically Detecting Corners on Rally Stage Routes Using R

One of the things I’ve started pondering with my rally stage route metrics is the extent to which we might be able to generate stage descriptions of the sort you might find on the It Gets Faster Now blog. The idea wouldn’t necessarily be to create finished stage descriptions, more a set of notes that a journalist or fan could use as a prompt to create a more relevant description. (See these old Notes on Robot Churnalism, Part I – Robot Writers for a related discussion.)

So here’s some sketching related to that: identifying corners.

We can use the rLFT (Linear Feature Tools) R package to calculate a convexity measure at fixed sample points along a route (for a fascinating discussion of the curvature/convexity metric, see *Albeke, S.E. et al. Measuring boundary convexity at multiple spatial scales using a linear ‘moving window’ analysis: an application to coastal river otter habitat selection Landscape Ecology 25 (2010): 1575-1587).

By filtering on high absolute convexity sample points, we can do a little bit of reasoning around the curvature at each point to make an attempt at identifying the start of a corner:

library(rLFT)

stepdist = 10
window = 20
routeConvTable <- bct(utm_routes[1,],
                      # distance between measurements 
                      step = stepdist,
                      window = window, ridName = "Name")

head(routeConvTable)

We can then use the convexity index to highlight the sample points with a high convexity index:

corner_conv = 0.1

tight_corners = routeConvTable[abs(routeConvTable$ConvexityIndex)>corner_conv,]
tight_corners_zoom1 = tight_corners$Midpoint_Y>4964000 & tight_corners$Midpoint_Y<4965000

ggplot(data=trj[zoom1, ],
       aes(x=x, y=y)) + geom_path(color='grey') + coord_sf() +
  geom_text(data=tight_corners[tight_corners_zoom1,],
                           aes(label = ConvexityIndex,
                               x=Midpoint_X, y=Midpoint_Y),
                           size=2) +
  geom_point(data=tight_corners[tight_corners_zoom1,],
             aes(x=Midpoint_X, y=Midpoint_Y,
                 color= (ConvexityIndex>0) ), size=1) +
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45))
High convexity points along a route

We can now do a bit of reasoning to find the start of a corner (see Automatically Generating Stage Descriptions for more discussion about the rationale behind this):

cornerer = function (df, slight_conv=0.01, closeby=25){
  df %>%
    mutate(dirChange = sign(ConvexityIndex) != sign(lag(ConvexityIndex))) %>%
    mutate(straightish =  (abs(ConvexityIndex) < slight_conv)) %>%
    mutate(dist =  (lead(MidMeas)-MidMeas)) %>%
    mutate(nearby =  dist < closeby) %>%
    mutate(firstish = !straightish & 
                        ((nearby & !lag(straightish) & lag(dirChange)) |
                        # We don't want the previous node nearby
                        (!lag(nearby)) )  & !lag(nearby) )
}

tight_corners = cornerer(tight_corners)

Let’s see how it looks, labeling the points as we do so with the distance to the next sample point:

ggplot(data=trj[zoom1,],
       aes(x=x, y=y)) + geom_path(color='grey') + coord_sf() +
  ggrepel::geom_text_repel(data=tight_corners[tight_corners_zoom1,],
                           aes(label = dist,
                               x=Midpoint_X, y=Midpoint_Y),
                           size=3) +
  geom_point(data=tight_corners[tight_corners_zoom1,],
             aes(x=Midpoint_X, y=Midpoint_Y,
                 color= (firstish) ), size=1) +
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45))
Corner entry

In passing, we note we can identify the larg gap distances as "straights" (and then perhaps look for lower convexity index corners along the way we could label as "flowing" corners, perhaps).

Something else we might do is number the corners:

Numbered corners

There’s all sorts of fun to be had here, I think!

Personally Learning

Notes and reflections on a curiosity driven personal learning journey into geo and rasters and animal movement trajectory categorisation and all sorts of things that weren’t the point when I started…

Somewhen over the last month or so, I must have noticed a 3D map produced using the rayshader R package somewhere because I idly started wondering about whether I could use it to render a 3D rally stage map.

Just under three weeks ago, I started what was intended to be a half hour hack to give it a go, and it didn’t take too long to get something up and running…

Rally stage map rendered using rayshader

I then started tinkering a bit more and thinking about what else we might be able to do with linear geographies, such as generating elevation along route maps, for example, and also started keeping notes on various useful bits and bobs along the way: some notes on how geographic projections work, for example (which has been something of a blocker to me in the past) or how rasters work and how to process them.

I also had to try to get my head around R again (it’s been several years since I last used it) and started pondering about a useful way to structure my notes and then publish them somewhere: bookdown was the obvious candidate as I was working in RStudio (I seem to have developed a sick-in-the-stomach allergic reaction to Jupyter noteobooks, Python, VS Code and Javascript — they really are physically revolting / nausea inducing to me — after a work burn out over the last 9 months of last year).

I use code just a matter of course for all sorts of things, pretty much every day, and also use it recreationally, so R has provided a handy escape route for my code related urges (maybe I should pick up the opportunity to learn something new? The issue is, I tend to be task focussed when it comes to my personal learning, so I’d need to use a language that somehow made sense for a practical thing I want to achieve…)

Anyway, the rally route thing quickly turned into a curiosity driven learning journey: how could I render a raster in a ggplot, could I overlay tiles on a 3D rendered map:

Could I generate a ridge plot?

Ridge plot

Could I buffer a route and use it to crop a 3D model?

Could we convert an image to an elevation raster?

And so on..

When poking around looking for ideas about how to characterise how twisty or turny a route was, I stumbled across sinuosity as a metric, and from that idea quickly discovered a range of R packages that implements tools to characterise animal movement trajectories which we can easily apply to rally stage routes.

Enriching maps with data pulled in from OpenStreetMap also suggests how we might be able to use generate maps that might be useful in event planning (access roads, car parks, viewpoints, etc); and casting routes onto road networks (graph representations of road networks; think osmnx in Python, for example) made me wonder if I’d be able to generate road books and tulip maps (answer: not yet, at least…).

I’ve written my learning journey from the last 20 days or so up at RallyDataJunkie: Visualising Rally Stages; the original repo is here. A summary of topics is in the previous blog post: Visualising Rally Route Stages (with help from rayshader and some ecologists…).

Reflecting on what I’ve ended up with, the structure is partly reflective of the journey I followed, but it’s also a bit revisionist. The original motivation was the chapter on the rendering 3D stage maps; to do this I needed to get a sense of what I could do with 2D rayshader maps first (the 3D plot is just a change in the plot command from plot_map() to plot_3d()), and to do that properly I had to get better at working with route files and elevation matrices. Within the earlier chapters, I do try to follow the route I took learning about each topic, rather then presenting things in the order an academic treatment or traditional teaching route my follow: the point of the resource is not to “teach” linear geo hacking in a formal way, it’s a report of how I learned it, with some backdropped “really useful to know this” pointers added back to earlier stages as I realised I needed them for later things.

Something else you may note about the individual chapters is that there are chunks of repetition of code from earlier on: this is deliberate. The book is a personal reference manual for me, so when I refer back to it for how to do something in the future, there’s enough to get going (hopefully!) without having to keep referring explicitly to too many early chapters.

Another observation: I see this sort of personal learning as availing myself of (powerful) ideas or techniques that are out there that other people have already figured out, ideas or tools or techniques that can help me do a particular thing that I want to do, or make sense of a particular thing that I can’t get my head round (i.e. that help me (help myself) understand the how or the why of a particular thing). I don’t want to be taught. I want enough that I can use and learn from. In my personal learning journey, I’ll come to see why some things that were really handy or useful to help me get started may not be the best way of doing something as I get more skilled, but the more advanced idea would have hindered my learning journey if it had been forced on me. (When I see a new function with dozens of parameters, I stirp it down to what I think is all I need to get it to work, then start to perhaps add parameters back in…)

As teachers, we are often sort of experts, and follow a teaching line based on our expert knowledge, and what we know is good for folk to know foundationally, or that follows a canonical textbook line. But as a curiosity driven personal learner, I chase all manner of desire lines, sometimes having to go around an obstacle I can yet figure out, sometimes having to retrace my steps, sometimes having to go back to the beginning to collect something I didn’t realise I’d actually need.

I don’t care about the canon or the curriculum. I want to know how, then I want to know why, and at some point I may come to understand “oh yeah, it would have been really handy to to have known that at the start”. But whilst teaching is often about making sure everyone is prepared at each step for the step that comes next, learning for me is about heading out into the unknown and picking up stuff that’s useful as I find I need it. And that includes picking up on the theory.

For example, Finding the Racing Line collates a set of very mathematical references around finding optimal racing lines that I’ll perhaps pick into for nudges and examples and blind copying without understanding at times if it helps once I start to try to get my head round the lines rally drivers take round corners. Then I’ll go back to the pictures and equations and try to make sense of it once I’ve got to a position where things maybe work (eg visualised possible routes round a corner) but can I now figure out why and how, and can I make them work better. It may take years to understand the papers, if ever (I’ve been reading Racecar Engineering magazine for 15 years and most of it still doesn’t make much sense to me…), but I’ll pick the bits that look useful, and use the bits I can, and maybe go away to learn a bit more about something else that helps me then use a bit more of the papers, and so on. But doing a maths course, or a physics course wouldn’t help, becuase the teaching line would probably not be my curiosity driven learning line.

For me, playful curiosity is the driver that allows you stick at a problem till you figure it out — but why doesn’t it work? — or at least get into a frame of mind where you can just ignore it (for now) or park it until you figure something else out, or whatever… I’m not sure how the play relates to curiosity, or curiosity to play, but together they allow you to be creative and give you the persistence you need to figure stuff out enough to get stuff done…

Visualising Rally Route Stages (with help from rayshader and some ecologists…)

Inspired by some 3D map views generated using the rayshader and rgl R packages, I wondered how easy it would be to render some 3D maps of rally stages.

It didn’t take too long to get a quick example up and running but then I started wondering what else I could do with route and elevation data. And it turns out, quite a lot.

The result of my tinkerings to date is at rallydatajunkie.com/visualising-rally-stages. It concentrates soley on a "static analysis" of rally routes: no results, no telemetry, just the route.

Along the way, it covers the following topics:

  • using R spatial (sp) and simple features (sf) packages to represent routes;
  • using the leaflet, mapview and ggplot2 packages to render routes;
  • annotating and splitting routes / linestrings;
  • downloading elevation rasters using elevatr;
  • using the raster package to work with elevation rasters;
  • a review of map projections;
  • exploring various ways of rendering rasters and annotating them with derived terrain features;
  • rendering elevation rasters in 2D using rayshader;
  • an aside on converting images to elevation rasters;
  • rendering and cropping elevation rasters in 3D using rayshader;
  • rendering shadows for particular datetimes at specific locations (suncalc);
  • stage route analysis: using animal movement trajectory analysis tools (trajr, amt, rLFT) to characterise stage routes;
  • stage elevation visualisation and analysis (including elevation analysis using slopes);
  • adding OpenStreetMap data inclduing highways and buildings to maps (osmdata);
  • steps towards creating a road book / tulip map using by mapping stage routes onto OSM road networks (sfnetworks, dodgr).

Along the way, I had to learn various knitr tricks, eg for rendering images, HTML and movies in the output document.

The book itself was written uisng Rmd and then published via bookdown and Github Pages. The source repo is on Github at RallyDataJunkie/visualising-rally-stages.

Running R Projects in MyBinder – Dockerfile Creation With Holepunch

For those who don’t know it, MyBinder is a reproducible research automation tool that will take the contents of a Github repository, build a Docker container based on requirements files found inside the repo, and then present the user with a temporary, running container that can serve a Jupyter notebook, JupyterLab or RStudio environment to the user. All at the click of a button.

Although the primary, default, UI is the original Jupyter notebook interface, it is also possible to open a MyBinder environment into JupyterLab or, if the R packaging is install, RStudio.

For example, using the demo https://github.com/binder-examples/r repository, which contains a simple base R environment, with RStudio installed, we can use my Binder to launch RStudio running over the contents of that repository:

When we launch the binderised repo, we get — RStudio in the browser:

Part of the Binder magic is to install a set of required packages into the container, along with “content” documents (Jupyter notebooks, for example, or Rmd files), based on requirements identified in the repo. The build process is managed using a tool called repo2docker, and the way requirements / config files need to be defined can be found here.

To make building requirements files easier for R projects, the rather wonderful holepunch package will automatically parse the contents of an R project looking for package dependencies, and will then create a DESCRIPTION metadata file itemising the found R package dependencies. (holepunch can also be used to create install.R files.) Alongside it, a Dockerfile is created that references the DESCRIPTION file and allows Binderhub to build the container based on the project’s requirements.

For an example of how holepunch can be used in support of academic publishing, see this repo — rgayler/scorecal_CSCC_2019 — which contains the source documents for a recent presentation by Ross Gayler to the Credit Scoring & Credit Control XVI Conference. This repo contains the Rmd document required to generate the presentation PDF (via knitr) and Binder build files created by holepunch.

Clicking the repo’s MyBinder  button takes you, after a moment or two, to a running instance of RStudio, within which you can open, and edit, the presentation .Rmd file and knitr it to produce a presentation PDF.

In this particular case, the repository is also associated with a Zenodo DOI.

As well as launching Binderised repositories from the Github (or other repository) URL, MyBinder can also launch a container from a Zenodo DOI reference.

The screenshot actually uses the incorrect DOI…

For example, https://mybinder.org/v2/zenodo/10.5281/zenodo.3402938/?urlpath=rstudio.

Looking Up R / CRAN Package Maintainers With an ac.uk Affiliation

Trying to find an examiner for a particular PhD thesis relating to a rather interesting datastructure for wrangling messy datatables, I wondered whether we might find a likely suspect amongst the R package maintainer community.

We can get a list of R package maintainers here and a list of package name / short descriptions here.

FWIW, here’s the code fragment:

import pandas as pd

maintainers = pd.read_html('https://cran.r-project.org/web/checks/check_summary_by_maintainer.html')[0]
maintainers_email = maintainers.dropna(subset=[0])
maintainers_email[maintainers_email[0].str.contains('.ac.uk')][[0,1]]

packages = pd.read_html('https://cran.r-project.org/web/packages/available_packages_by_name.html')[0]
packages

maintainers_email_acuk = maintainers_email[maintainers_email[0].str.contains('.ac.uk')][[0,1]]
maintainers_email_acuk.merge(packages,left_on=1,right_on=0)

See also: What Do you Mean You Write Code EVERY DAY?, examples of which I’ve just turned into a new blog category: WDYMYWCED.