Via my feeds, I noticed a package announcement today for cricketR!, a new package for analysing cricket performance data.
This got me wondering (again!) about what other sports-related packages there might be out there, either functional thematic packages (to do with sport in general, or one sport in particular) or data packages that bundle up sports-related data sets or provide an API (that is, a wrapper for an official API, or a wrapper for a scraper that extracts data from one or more websites in a slightly scruffier way!).
This is just a first quick attempt, an unstructured listing that may also include data sets that are more generic than R-specific (eg CSV datafiles, or SQL database exports). I’ll try to keep this post updated as I find/hear about more packages, and also work a bit more on structuring it a little better. I really should post this as a wiki somewhere – or perhaps curate something on Github?
- generic:
- SportsAnalytics [CRAN]: “infrastructure for sports analysis. Anyway, currently it is a selection of data sets, functions to fetch sports data, examples, and demos”.
- PlayerRatings [CRAN]: “schemes for estimating player or team skill based on dynamic updating. Implemented methods include Elo, Glicko and Stephenson” (via Twitter: @UTVilla)
- athletics:
- olympic {ade4} [Inside-R packages]: “performances of 33 men’s decathlon at the Olympic Games (1988)”.
- decathlon {GDAdata} [CRAN]: “Top performances in the Decathlon from 1985 to 2006.” (via comments: Antony Unwin)
- MexLJ {GDAdata} [CRAN]: “Data from the longjump final in the 1968 Mexico Olympics.” (via comments: Antony Unwin)
- baseball:
- Lahman [R-Forge, CRAN]: Sean Lahman’s databases, “contains pitching, hitting, and fielding statistics for Major League Baseball from 1871 through 2013.” See also: Lahman: A New R Package for Baseball Stats.
- pitchRx [CRAN; Github: about, code]: “tools for collecting Major League Baseball (MLB) Gameday data”. pitchRx also “provides an easy and robust way to generate strike-zone plots using the ggplot2 package”. See also: Taming PITCHf/x Data with XML2R and pitchRx [R-Journal]
- retrosheets parser: “baseball runs created stats and play by play reader”
- openWAR [Github]: “an open-source system for computing Wins Above Replacement” (via comments: @MarchiMax)
- basketball:
- Going Under the Hood of the NCAA Tournament Visualization: “a guide [to] collect[ing], analyz[ing], and present[ing] the data”.
- bbscrapeR [Github]: “R package for collecting NBA play-by-play, shot location” (via comments: @MarchiMax)
- ballR [Github] (BallR: Interactive NBA Shot Charts with R and Shiny; uses player-level shot data from the NBA stats API)
- biathlon:
- chess:
- cricket:
- cricketr! [Github: about, code]: “statistics info available in ESPN Cricinfo Statsguru”. About: Introducing cricketr!: An R package to analyze performances of cricketers
- Sixer [about], Shiny app based on cricketr
- yorkR [about]: uses data from Cricsheet
- darts:
- darts [CRAN]: “Statistical Tools to Analyze Your Darts Game” (via comments: @MarchiMax)
- football (American football):
- FantasyFootballAnalyticsR [Github]: “R scripts and data files for conducting the analyses as described on fantasyfootballanalytics.net”.
- football (soccer):
- engsoccerdata [Github]: “a repository for complete soccer datasets, along with some built-in functions for analyzing parts of the data. Currently includes English League data, FA Cup data, Playoff data, some European leagues (Spain, Germany, Italy, Holland).” Citation: James P. Curley (2015). engsoccerdata: English Soccer Data 1871-2015. R package version 0.1.4
- UKSoccer {vcd} [Inside-R packages]: data “on the goals scored by Home and Away teams in the Premier Football League, 1995/6 season.”
- Soccer {PASWR} [Inside-R packages]: “how many goals were scored in the regulation 90 minute periods of World Cup soccer matches from 1990 to 2002”.
- fbRanks [CRAN]: “Association Football (Soccer) Ranking via Poisson Regression: time dependent Poisson regression and a record of goals scored in matches to rank teams via estimated attack and defense strengths” (via comments: @MarchiMax)
- golf:
- gymnastics:
- horse racing:
- ice hockey:
- nhlscrapr [CRAN]: “routines for extracting play-by-play game data for regular-season and playoff NHL games, particularly for analyses that depend on which players are on the ice” (via comments: Triplethink)
- hockey {gamlr} [Inside-R packages]: “information about play configuration and the players on ice (including goalies) for every goal from 2002-03 to 2012-13 NHL seasons” (via comments: Triplethink)
- nhl-pbp [Github]: “code to parse and analyze NHL PBP data using R”.
- (liigadata (python): utility for parsing Finnish ice hockey league game data from the liiga.fi website)
- motor sport:
- NASCAR Winston Cup Race Results for 1975-2003 [Journal of Statistics Education dataset]: data at the race and driver/race levels. (See also: NASCAR Data Analytics – python scraper.)
- ergastR (under construction) [code fragments]: my own fumblings at an R wrapper for the ergast motor racing database online API. (See also: ergast data download)
- Formula E: see ergast motor racing database
- skiing:
- SpeedSki {GDAdata} [CRAN]: “World Speed Skiing Competition, Verbier 21st April, 2011.” (via comments: Antony Unwin)
- sailing: I didn’t find any R packages, but I did find a sailing regatta results data interchange format: ISAF XML Regatta Reporting (XRR) Data Format
- snooker:
- swimming: I didn’t find any R packages, but I did find a swimming results data interchange format: Lenex; and a site that publishes data in that format: Omega Timing.
- tennis:
- tennis_MatchChartingProject: “The goal of the Match Charting Project (MCP) is to amass detailed records of professional matches.”
- servevolleyR [Github]: “R package for simulating tennis points:games:tiebreaks:sets:matches” (via Twitter: @UTVilla)
- [*Tennis Grand Slam Winners* dataset](https://datascienceplus.com/visualizing-tennis-grand-slam-winners-performances/)
It would perhaps make more sense to try to collect rather more structured (meta)data for each package. For example: homepage, sport/discipline; analysis, data (package or API), or analysis and data; if data: year-range, source, data coverage (e.g. table column headings); if analysis, brief synopsis of tools available (e.g. chart generators).
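As a rough sketch of what that might look like, here is one way of holding a few such records in an R data frame. The column names and the choice of fields are just my assumptions for illustration, not an established schema; the values are taken from the entries in the listing above.

```r
# Hypothetical metadata records for three packages from the listing above.
# The column names are illustrative assumptions, not an agreed schema.
packages <- data.frame(
  package    = c("Lahman", "PlayerRatings", "engsoccerdata"),
  sport      = c("baseball", "generic", "football (soccer)"),
  type       = c("data", "analysis", "analysis and data"),
  repository = c("CRAN", "CRAN", "Github"),
  year_from  = c(1871L, NA, 1871L),   # only meaningful for data packages
  year_to    = c(2013L, NA, 2015L),
  synopsis   = c(NA,
                 "Elo, Glicko and Stephenson rating schemes",
                 "built-in functions for analyzing parts of the data"),
  stringsAsFactors = FALSE
)

# Packages that bundle data for a given sport
baseball_data <- subset(packages, grepl("data", type) & sport == "baseball")
print(baseball_data)

# Number of listed packages per sport
print(table(packages$sport))
```

With records in this shape, the listing could be filtered by sport or by type (data, analysis, or both) rather than maintained by hand as a flat list.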
If you know of any others, please let me know via the comments and I’ll try to keep this page updated with a reasonably current list.
As well as packages, here are some links to blog posts that look at sports data analysis using R:
- Analyzing Baseball Data with R [book]; [supporting data/code]. See also Jim Albert [homepage].
- Exploring Baseball Data with R [blog]
- Wrangling F1 Data With R [Leanpub book] Disclaimer: I wrote this
- Scraping and Analyzing Baseball Data with R [blogpost]
- OUseful.info – F1 datajunkie [blog topic feed]. Disclaimer: my blog feed
- Revolutions blog – sports tag
Again, if you can recommend further posts, please let me know via the comments.
PS other sports data interchange formats: SportsML-G2
Hi.
Do you know any hockey packages?
I found only NHLscrapr ( http://cran.r-project.org/web/packages/nhlscrapr/index.html ).
THX
@triplethink – not seen any so far; thanks for that link, though:-) [Update: example posts from RBloggers – http://www.r-bloggers.com/?s=nhl ]
You’re welcome. THX for this article. :)
I tried nhlscrapr yesterday and it works fine.
And now I find another package – gamlr – http://www.cran.r-project.org/web/packages/gamlr/gamlr.pdf (page 8).
Basketball: https://github.com/cpsievert/bbscrapeR
Baseball: https://github.com/beanumber/openWAR
Darts: http://cran.r-project.org/web/packages/darts/index.html
Soccer: http://cran.r-project.org/web/packages/fbRanks/index.html
@MarchiMax Thanks.. the darts one surprised me:-)
You might also check out this website for fantasy football analysis: http://fantasyfootballanalytics.net/. It includes instructions for scraping and analyzing football data in R, and has free interactive tools based on R, Shiny, and OpenCPU.
Thanks – I have that already listed, I think?
Thanks!! I’ve got a Docker image with many of these at https://quay.io/repository/znmeb/osjourno-rde. I’ll be adding the others in the next release.
@znmeb_rfs Ooh – that looks interesting. There’s great scope, I think, for putting together distributions that make it easier for journalists to get started using these tools by removing the setup hassle, and effectively turning them into run-anywhere apps, eg launched by something like Kitematic?
I sketched one approach out around the ergast motor racing data that linked in a MySQL database: https://blog.ouseful.info/2015/01/17/connecting-rstudio-and-mysql-docker-containers-the-ergastdb/
FWIW, I started trying to pull together a list of various packages that support authoring Rmd/python across RStudio, IPython Notebooks, etc.: https://blog.ouseful.info/2015/06/06/ipython-markdown-opportunities/
Thanks for getting this thread started!
There is a larger decathlon dataset (almost 8000 results) in the package GDAdata. The package also includes two smaller sports datasets, one for the World Speed Skiing Competition in 2011 and one for the longjump final in the 1968 Mexico Olympics—for those of us who remember the shock of Bob Beamon’s performance.
@Antony Thanks for those; FWIW, I noticed that the GDAdata package isn’t indexed on Inside-R?
I thought in the past I’d done some decathlon treemaps, but the closest I can find are some crude heptathlon sketches: https://blog.ouseful.info/2012/08/05/at-a-glance-view-of-2012-heptathlon-points-by-event/
Please, could you move this page to github (readme.md or something) because it is very interesting indeed.
You may also try to talk to people from https://github.com/ropensci/
Thanks.
As you specifically mentioned cricketR, you may be interested in a dashboard I have created based around that package
https://mytinyshinys.shinyapps.io/cricket
I found a package for scraping cricket data not listed:
http://nickzani.github.io/cricinfo.html
https://github.com/nickzani/Cricinfo
The guy behind the rBloodstock and RcappeR also has a package for data from English Premier League fantasy football.
https://github.com/durtal/fantasysocceR