
Rally Stage Sector Charts, Overlaid On Stage Progress Charts

One of the last rally charts I sketched last year was a dodged bar chart showing “sector” times within a stage for each driver, either as absolute times or rebased relative to a particular target driver. This represented an alternative to the “driver subseries” line charts e.g. as shown here.

Re: the naming of driver subseries charts – this is intended to be reminiscent of seasonal subseries charts.

The original split time data on the WRC site looks like this:

Taking the raw sector (split) times, we can rebase the times relative to a particular driver. In this case, I have rebased relative to OGI, so the sector times are as shown in the table above. The colour basis is the opposite to the basis used in the chart because I am trying to highlight to the target driver where they lost time, rather than where the others gained time. It may be that the chart makes more sense to professionals if I change the colour basis in the chart below, using green to show that the driver made up that amount of time on the target driver.
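As a minimal sketch of the rebasing step, here is one way it could be done in plain Python. The split times are invented for illustration (they are not the WRC figures), the `rebase` helper is a hypothetical name, and the sign convention (positive means the driver was slower than the target) is an assumption rather than the convention used in the charts.

```python
# Hypothetical cumulative split times in seconds; invented for illustration,
# not taken from the WRC results.
splits = {
    "OGI": [100.0, 205.5, 310.0],
    "MEE": [101.5, 206.0, 309.0],
    "NEU": [99.0, 204.5, 311.5],
}


def rebase(times_by_driver, target):
    """Return each driver's times minus the target driver's times.

    Assumed sign convention: positive means the driver was slower than the
    target at that split, negative means the driver was quicker.
    """
    base = times_by_driver[target]
    return {
        driver: [round(t - b, 1) for t, b in zip(times, base)]
        for driver, times in times_by_driver.items()
    }


rebased = rebase(splits, "OGI")
for driver, deltas in rebased.items():
    print(driver, deltas)
```

The target driver's own row comes out as all zeros, which is a handy sanity check that the rebasing has been applied to the right driver.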

The dodged bar charts are ordered with the first split time at the top of the set for each driver. The overall stage time is the lower bar in each driver group.

Here’s how it looks using the other colour basis:

Hmm… maybe that is better…

Note that the above charts also differ from the WRC split times results table in the ordering. The results table lists results in start order, whereas the above charts rank drivers by stage position.

To generate the “sector” times, we can find the difference between each split for a particular driver, and between the final (overall) stage time and the final split time. As before, we can then rebase these times relative to a specific target driver.
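The sector calculation can be sketched in Python along the following lines. The times are invented for illustration, `sector_times` is a hypothetical helper, and I am assuming that the first “sector” runs from the start line to the first split.

```python
# Hypothetical cumulative split times and overall stage times, in seconds.
# The numbers are invented for illustration and are not WRC data.
splits = {
    "OGI": [100.0, 205.5, 310.0],
    "MEE": [101.5, 206.0, 309.0],
}
stage_times = {"OGI": 400.0, "MEE": 398.5}


def sector_times(split_times, stage_time):
    """Turn cumulative times into per-sector times.

    Assumes the first sector runs from the start to the first split and the
    last sector runs from the final split to the finish.
    """
    points = [0.0] + list(split_times) + [stage_time]
    return [round(later - earlier, 1) for earlier, later in zip(points, points[1:])]


sectors = {d: sector_times(splits[d], stage_times[d]) for d in splits}

# Rebase against the target driver: positive means the driver was slower than
# the target on that sector, negative means the driver was quicker.
target = "OGI"
rebased = {
    d: [round(t - b, 1) for t, b in zip(times, sectors[target])]
    for d, times in sectors.items()
}

print(sectors)
print(rebased)
```

With three splits this gives four sectors per driver, the last one being the run from the final split to the finish.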

This chart shows how the target driver compared to each of the other drivers on each of the “sectors” between each split. So we see OGI dropped quite a lot of time relative to other drivers on the fourth “sector” of the stage, between splits 3 and 4. He made up time against MEE on the first and second parts of the stage.

As well as just showing the split times, we can find the total time delta between the target driver and each of the other drivers on the stage as the sum of the sector times. We can use a lower graphic layer to underplot this total beneath the dodged bars for each driver.

The grey bars show the total time gained / lost by the target driver on the stage relative to the other drivers.
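A small Python sketch of the totals calculation, again with invented numbers and an assumed sign convention (positive means the driver was slower than the target):

```python
# Hypothetical sector deltas in seconds relative to the target driver,
# invented for illustration: positive means the driver was slower than the target.
sector_deltas = {
    "MEE": [1.5, -1.0, -1.5, -0.5],
    "NEU": [-1.0, 0.0, 2.5, 0.5],
}

# The total stage delta for each driver is the sum of their sector deltas.
totals = {driver: round(sum(deltas), 1) for driver, deltas in sector_deltas.items()}

for driver, deltas in sector_deltas.items():
    print(f"{driver}: sectors {deltas}, total {totals[driver]:+.1f}s")
```

In a plotting library, the totals would be drawn first as the lower layer, with the dodged sector bars drawn over the top of them.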

In a similar way, we can also overplot the dodged bars on top of a stage progress chart, recolouring slightly.

This increases the information density of the stage progress chart even further, and provides additional “delta” signals in keeping with the deltascope inspiration / basis for that chart type.

Again, this sort of fits with the warped hydraulic model: the dodged bars can be stacked to give the length of the lighter coloured bar underneath them.

(I’m not sure I’ve ordered the drivers correctly in the above chart – it was generated from two discordantly arranged datasets. The final chart will be generated from a single, consistent dataframe.)

PS it strikes me that the dodged bars need to be transparent to show a solid bar underneath that doesn’t extend as far as the dodged bars. This might need some careful colour / transparency balancing.

Sports Data and R – Scope for a Thematic (Rather than Task) View? (Living Post)

Via my feeds, I noticed a package announcement today for cricketR!, a new package for analysing cricket performance data.

This got me wondering (again!) about what other sports related packages there might be out there, either in terms of functional thematic packages (to do with sport in general, or one sport in particular), or particular data packages that either bundle up sports related data sets or provide an API (that is, a wrapper for an official API, or a wrapper for a scraper that extracts data from one or more websites in a slightly scruffier way!)

This is just a first quick attempt, an unstructured listing that may also include data sets that are more generic than R-specific (eg CSV datafiles, or SQL database exports). I’ll try to keep this post updated as I find/hear about more packages, and also work a bit more on structuring it a little better. I really should post this as a wiki somewhere – or perhaps curate something on Github?

  • generic:
    • SportsAnalytics [CRAN]: “infrastructure for sports analysis. Anyway, currently it is a selection of data sets, functions to fetch sports data, examples, and demos”.
    • PlayerRatings [CRAN]: “schemes for estimating player or team skill based on dynamic updating. Implemented methods include Elo, Glicko and Stephenson” (via Twitter: @UTVilla)
  • athletics:
    • olympic {ade4} [Inside-R packages]: “performances of 33 men’s decathlon at the Olympic Games (1988)”.
    • decathlon {GDAdata} [CRAN]: “Top performances in the Decathlon from 1985 to 2006.” (via comments: Antony Unwin)
    • MexLJ {GDAdata} [CRAN]: “Data from the longjump final in the 1968 Mexico Olympics.” (via comments: Antony Unwin)
  • baseball:
  • basketball:
  • biathlon:
  • chess:
  • cricket:
  • darts:
    • darts [CRAN]: “Statistical Tools to Analyze Your Darts Game” (via comments: @MarchiMax)
  • football (American football):
  • football (soccer):
    • engsoccerdata [Github]: “a repository for complete soccer datasets, along with some built-in functions for analyzing parts of the data. Currently includes English League data, FA Cup data, Playoff data, some European leagues (Spain, Germany, Italy, Holland).”. Citation: James P. Curley (2015). engsoccerdata: English Soccer Data 1871-2015. R package version 0.1.4
    • UKSoccer {vcd} [Inside-R packages]: data “on the goals scored by Home and Away teams in the Premier Football League, 1995/6 season.”.
    • Soccer {PASWR} [Inside-R packages]: “how many goals were scored in the regulation 90 minute periods of World Cup soccer matches from 1990 to 2002”.
    • fbRanks [CRAN]: “Association Football (Soccer) Ranking via Poisson Regression: time dependent Poisson regression and a record of goals scored in matches to rank teams via estimated attack and defense strengths” (via comments: @MarchiMax)
  • golf:
  • gymnastics:
  • horse racing:
    • RcappeR [Github]: “tools to aid the analysis and handicapping of Thoroughbred Horse Racing” (via Twitter: @UTVilla)
    • rBloodstock [Github]: “datasets from Thoroughbred Bloodstock Sales, Tattersalls sales from 2010 to 2015 (incomplete)” (via Twitter: @UTVilla)
  • ice hockey:
    • nhlscrapr [CRAN]: “routines for extracting play-by-play game data for regular-season and playoff NHL games, particularly for analyses that depend on which players are on the ice”. [via comments – Triplethink]
    • hockey {gamlr} [Inside-R packages]: “information about play configuration and the players on ice (including goalies) for every goal from 2002-03 to 2012-13 NHL seasons” [via comments – Triplethink]
    • nhl-pbp [Github]: “code to parse and analyze NHL PBP data using R”.
    • ( liigadata (python) – utility for parsing Finnish ice hockey league game data from liiga.fi website)
  • motor sport:
  • skiing:
    • SpeedSki {GDAdata} [CRAN]: “World Speed Skiing Competition, Verbier 21st April, 2011.” (via comments: Antony Unwin)
  • sailing: I didn’t find any R packages, but I did find a sailing regatta results data interchange format: ISAF XML Regatta Reporting (XRR) Data Format
  • snooker:
  • swimming: I didn’t find any R packages, but I did find a swimming results data interchange format: Lenex; and a site that publishes data in that format: Omega Timing.
  • tennis:
    • tennis_MatchChartingProject: “The goal of the Match Charting Project (MCP) is to amass detailed records of professional matches.”.
    • servevolleyR [Github]: “R package for simulating tennis points:games:tiebreaks:sets:matches” (via Twitter: @UTVilla)
    • (Tennis Grand Slam Winners dataset: https://datascienceplus.com/visualizing-tennis-grand-slam-winners-performances/)

It would perhaps make more sense to try to collect rather more structured (meta)data for each package. For example: homepage, sport/discipline; analysis, data (package or API), or analysis and data; if data: year-range, source, data coverage (e.g. table column headings); if analysis, brief synopsis of tools available (e.g. chart generators).
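As a rough sketch of what such a record might look like, here is a hypothetical Python dataclass. The class name, field names and the `by_sport` helper are all illustrative assumptions. The two example entries only use details quoted in the listing above, and unknown fields are left empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PackageRecord:
    """Hypothetical metadata record for one sports package or data set."""
    name: str
    sport: str
    kind: str  # "analysis", "data" or "analysis and data"
    homepage: Optional[str] = None
    year_range: Optional[str] = None   # only meaningful for data packages
    source: Optional[str] = None       # where the data came from, if known
    columns: List[str] = field(default_factory=list)  # data coverage
    tools: List[str] = field(default_factory=list)    # e.g. chart generators


records = [
    PackageRecord(name="engsoccerdata", sport="football (soccer)",
                  kind="analysis and data", year_range="1871-2015"),
    PackageRecord(name="darts", sport="darts", kind="analysis"),
]


def by_sport(items):
    """Group package names under their sport, preserving listing order."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.sport, []).append(item.name)
    return grouped


print(by_sport(records))
```

Keeping the records in a structure like this would make it straightforward to regenerate the sport-by-sport listing, or to export it as a table, from a single source.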

If you know of any others, please let me know via the comments and I’ll try to keep this page updated with a reasonably current list.

As well as packages, here are some links to blog posts that look at sports data analysis using R:

Again, if you can recommend further posts, please let me know via the comments.

PS other sports data interchange formats: SportsML-G2