Tagged: WRC

Visualising WRC Rally Stages With Relive?

A few days ago, via Techcrunch, I came across the Relive application that visualises GPS traces of bike rides using 3D Google Earth style animations using a range of map data sources.

Data is uploaded using GPX, TCX, or FIT formatted data – all of which are new to me. Standard KML uploads don’t work – time stamps are required for each waypoint.

Along the route, photographic waypoints can be added to illustrate the journey, which got me thinking: this could be a really neat addition to the Rally-maps.com website, annotating stage maps after a race with:

  • photographs from various locations on the stage;
  • images at each split point showing the leaderboard and time splits from each stage;
  • pace info, showing the relative pace across each stage, perhaps captured from a reconnaissance vehicle or zero car.

Alternatively, it might be something that the WRC – or Red Bull TV, who are providing online and TV coverage of this year’s rallys – could publish?

And if they want to borrow some of my WRC chart styles for waypoint images, I’m sure something could be arranged:-)

Grouping Numbers that are Nearly the Same – Casual Clustering

A couple of reasons for tinkering with WRC rally data this year, over and the above the obvious of wanting to find a way to engage with motorsport at a data level, specifically, I wanted a context for thinking a bit more about ways of generating (commentary) text from timing data, as well as a “safe” environment in which I could look for ways of identifying features (or storypoints) in the data that might provide a basis for making interesting text comments.

One way in to finding features is to look at a visual representations of the data (that is, just look at charts) and see what jumps out… If anything does, then you can ponder ways of automating the detection or recognition of those visually compelling features, or things that correspond to them, or proxy for them, in some way. I’ll give an example of that in the next post in this series, but for now, let’s consider the following question:how can we group numbers that are nearly the same? For example, if I have a set of stage split times, how can I identify groups of drivers that have recorded exactly, or even just nearly, the same time?

Via StackOverflow, I found the following handy fragment:

def cluster(data, maxgap):
    '''Arrange data into groups where successive elements
       differ by no more than *maxgap*

        cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
        [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

        cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
        [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    '''
    data.sort()
    groups = [[data[0]]]
    for x in data[1:]:
        if abs(x - groups[-1][-1]) <= maxgap:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

print(cluster([2.1,7.4,3.9,4.6,2.5,2.4,2.52],0.35))
[[2.1, 2.4, 2.5, 2.52], [3.9], [4.6], [7.4]]

It struck me that a tweak to the code could limit the range of any grouping relative to a maximum distance between the first and the last number in any particular grouping – maybe I don’t want a group to have a range more than 0.41 for example (that is, strictly more than a dodgy floating point 0.4…):

def cluster2(data, maxgap, maxrange=None):
    data.sort()
    groups = [[data[0]]]
    for x in data[1:]:
        inmaxrange = True if maxrange is None else abs(x-groups[-1][0]) <=maxrange
        if abs(x - groups[-1][-1]) <= maxgap and inmaxrange:
            groups[-1].append(x)
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

print(cluster2([2.1,7.4,3.9,4.6,2.5,2.4,2.52],0.35,0.41))
[[2.1, 2.4, 2.5], [2.52], [3.9], [4.6], [7.4]]

A downside of this is we might argue we have mistakenly omitted a number that is very close to the last number in the previous group, when we should rightfully have included it, because it’s not really very far away from a number that is close to the group range threshold value…

In which case, we might pull back numbers into a group that are really close to the current last member in the group irrespective of whether we past the originally specified group range:

def cluster3(data, maxgap, maxrange=None, maxminrange=None):
    data.sort()
    groups = [[data[0]]]
    for x in data[1:]:
        inmaxrange = True if maxrange is None else abs(x-groups[-1][0])<=maxrange
        inmaxminrange = False if maxminrange is None else abs(x-groups[-1][-1])<=maxminrange
        if (abs(x - groups[-1][-1]) <= maxgap and inmaxrange) or inmaxminrange:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

print(cluster3([2.1,7.4,3.9,4.6,2.5,2.4,2.52],0.35,0.41,0.25))
[[2.1, 2.4, 2.5, 2.52], [3.9], [4.6], [7.4]]

With these simple fragments, I can now find groups of times that are reasonably close to each other.

I can also look for times that are close to other times:

trythis = [x for x in cluster3([2.1,7.4,3.9,4.6,2.5,2.4,2.52],0.35,0.41,0.25) if 2.4 in x]
trythis[0] if len(trythis) else ''
[2.1, 2.4, 2.5, 2.52]

PS I think the following vectorised pandas fragments assign group numbers to rows based on the near matches of numerics in a specified column:

def numclustergroup(x,col,maxgap):
    x=x.sort_values(col)
    x['cluster'] = (x[col].diff()>=maxgap).cumsum()
    return x

def numclustergroup2(x,col,maxgap,maxrange):
    x=x.sort_values(col)
    x['cluster'] = (x[col].diff()>=maxgap).cumsum()
    x['cdiff']=x.groupby('cluster')[col].diff()
    x['cluster'] = ((x.groupby('cluster')['cdiff'].cumsum()>maxrange) | (x[col].diff()>=maxgap)).cumsum()
    return x.drop('cdiff',1)

def numclustergroup3(x,col,maxgap,maxrange,maxminrange):
    x=x.sort_values(col)
    x['cluster'] = (x[col].diff()>=maxgap).cumsum()
    x['cdiff']=x.groupby('cluster')[col].diff()
    x['cluster'] = (((x.groupby('cluster')['cdiff'].cumsum()>maxrange) | (x[col].diff()>=maxgap)) & (x[col].diff()>maxminrange) ).cumsum()
    return x.drop('cdiff',1)

#Test
uu=pd.DataFrame({'x':list(range(0,8)),'y':[1.3,2.1,7.4,3.9,4.6,2.5,2.4,2.52]})
numclustergroup(uu,'y',0.35)
numclustergroup2(uu,'y',0.35,0.41)
numclustergroup3(uu,'y',0.35,0.41,0.25)

The basic idea is to generate logical tests that evaluate as True whenever you want to increase the group number.