Sports Data and R – Scope for a Thematic (Rather than Task) View? (Living Post)

Via my feeds, I noticed a package announcement today for cricketR!, a new package for analysing cricket performance data.

This got me wondering (again!) about what other sports-related packages there might be out there, either in terms of functional thematic packages (to do with sport in general, or one sport in particular), or particular data packages that either bundle up sports-related data sets or provide an API (that is, a wrapper for an official API, or a wrapper for a scraper that extracts data from one or more websites in a slightly scruffier way!).

This is just a first quick attempt, an unstructured listing that may also include data sets that are more generic than R-specific (eg CSV datafiles, or SQL database exports). I’ll try to keep this post updated as I find/hear about more packages, and also work a bit more on structuring it a little better. I really should post this as a wiki somewhere – or perhaps curate something on Github?

  • generic:
    • SportsAnalytics [CRAN]: “infrastructure for sports analysis. Anyway, currently it is a selection of data sets, functions to fetch sports data, examples, and demos”.
    • PlayerRatings [CRAN]: “schemes for estimating player or team skill based on dynamic updating. Implemented methods include Elo, Glicko and Stephenson” (via Twitter: @UTVilla)
  • athletics:
    • olympic {ade4} [Inside-R packages]: “performances of 33 men’s decathlon at the Olympic Games (1988)”.
    • decathlon {GDAdata} [CRAN]: “Top performances in the Decathlon from 1985 to 2006.” (via comments: Antony Unwin)
    • MexLJ {GDAdata} [CRAN]: “Data from the longjump final in the 1968 Mexico Olympics.” (via comments: Antony Unwin)
  • baseball:
  • basketball:
  • biathlon:
  • chess:
  • cricket:
  • darts:
    • darts [CRAN]: “Statistical Tools to Analyze Your Darts Game” (via comments: @MarchiMax)
  • football (American football):
  • football (soccer):
    • engsoccerdata [Github]: “a repository for complete soccer datasets, along with some built-in functions for analyzing parts of the data. Currently includes English League data, FA Cup data, Playoff data, some European leagues (Spain, Germany, Italy, Holland).”. Citation: James P. Curley (2015). engsoccerdata: English Soccer Data 1871-2015. R package version 0.1.4
    • UKSoccer {vcd} [Inside-R packages]: data “on the goals scored by Home and Away teams in the Premier Football League, 1995/6 season.”.
    • Soccer {PASWR} [Inside-R packages]: “how many goals were scored in the regulation 90 minute periods of World Cup soccer matches from 1990 to 2002”.
    • fbRanks [CRAN]: “Association Football (Soccer) Ranking via Poisson Regression: time dependent Poisson regression and a record of goals scored in matches to rank teams via estimated attack and defense strengths” (via comments: @MarchiMax)
  • golf:
  • gymnastics:
  • horse racing:
    • RcappeR [Github]: “tools to aid the analysis and handicapping of Thoroughbred Horse Racing” (via Twitter: @UTVilla)
    • rBloodstock [Github]: “datasets from Thoroughbred Bloodstock Sales, Tattersalls sales from 2010 to 2015 (incomplete)” (via Twitter: @UTVilla)
  • ice hockey:
    • nhlscrapr [CRAN]: “routines for extracting play-by-play game data for regular-season and playoff NHL games, particularly for analyses that depend on which players are on the ice”. [via comments – Triplethink]
    • hockey {gamlr} [Inside-R packages]: “information about play configuration and the players on ice (including goalies) for every goal from 2002-03 to 2012-13 NHL seasons” [via comments – Triplethink]
    • nhl-pbp [Github]: “code to parse and analyze NHL PBP data using R”.
    • ( liigadata (python) – utility for parsing Finnish ice hockey league game data from website)
  • motor sport:
  • skiing:
    • SpeedSki {GDAdata} [CRAN]: “World Speed Skiing Competition, Verbier 21st April, 2011.” (via comments: Antony Unwin)
  • sailing: I didn’t find any R packages, but I did find a sailing regatta results data interchange format: ISAF XML Regatta Reporting (XRR) Data Format
  • snooker:
  • swimming: I didn’t find any R packages, but I did find a swimming results data interchange format: Lenex; and a site that publishes data in that format: Omega Timing.
  • tennis:

It would perhaps make more sense to try to collect rather more structured (meta)data for each package. For example: homepage, sport/discipline; analysis, data (package or API), or analysis and data; if data: year-range, source, data coverage (e.g. table column headings); if analysis, brief synopsis of tools available (e.g. chart generators).
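
As a rough idea of what such a record might look like, here is a minimal Python sketch. The class name, field names and the grouping helper are all my own assumptions rather than any agreed schema, and the three example entries just reuse details from the listing above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SportsPackage:
    """One entry in the listing; every field name here is illustrative."""
    name: str
    sport: str                       # e.g. "ice hockey", or "generic"
    source: str                      # where it is hosted, e.g. "CRAN" or "Github"
    kind: str                        # "analysis", "data", or "analysis and data"
    homepage: Optional[str] = None
    year_range: Optional[Tuple[int, int]] = None    # only meaningful for data
    data_columns: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)  # synopsis for analysis packages


def by_sport(packages):
    """Group package names by sport so the listing can be rebuilt from records."""
    grouped = {}
    for pkg in packages:
        grouped.setdefault(pkg.sport, []).append(pkg.name)
    return grouped


packages = [
    SportsPackage("engsoccerdata", "football (soccer)", "Github",
                  "analysis and data", year_range=(1871, 2015)),
    SportsPackage("fbRanks", "football (soccer)", "CRAN", "analysis"),
    SportsPackage("nhlscrapr", "ice hockey", "CRAN", "data"),
]
print(by_sport(packages))
```

Keeping the entries as records like this would mean the bulleted list by sport could be generated rather than maintained by hand.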

If you know of any others, please let me know via the comments and I’ll try to keep this page updated with a reasonably current list.

As well as packages, here are some links to blog posts that look at sports data analysis using R:

Again, if you can recommend further posts, please let me know via the comments.

PS other sports data interchange formats: SportsML-G2

What Happens If Yahoo! Pipes Dies?

News appeared recently that Yahoo’s video editing site Jumpcut has stopped accepting new uploads, and users are being encouraged to move over to flickr. (On the odd occasion I’ve played with online video suites, I’ve tended to use Jumpcut, so I’m not overjoyed about this. Just FYI, Jaycut or Photobucket (which uses Adobe Premiere Express) are my fallback positions…)

This news got me thinking – again – about what my fallback position would be if Yahoo! Pipes disappeared. (Regular readers – and anyone who’s seen me give a mashup related presentation lately – will know I’m a bit of a pipes junkie;-)

So here’s what I’ve been saying I’m going to do for a long time – and maybe by posting it I’ll provoke myself into doing something about it next year…

  1. Set up a wiki… Yahoo Pipes Code Bindings, or similar;
  2. for each block in Yahoo! Pipes, post the following:
    • an image of the block;
    • a code equivalent for that block (e.g. a fragment of Python, PHP, Javascript or Google Mashup Editor code that is functionally equivalent to the block);
  3. that’s it… or maybe show a minimal example pipe using the block, and an equivalent, working, PHP, Python, Javascript or Google Mashup Editor programme;
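
To give a flavour of what one wiki entry might hold, here is a minimal Python sketch of a code equivalent for a Filter-style block. It assumes feed items are plain dicts and that a rule is just a (field, substring) pair; the function name and its parameters are my own invention, not the block’s actual configuration.

```python
def filter_block(items, rules, mode="permit", match="any"):
    """Rough stand-in for a Pipes-style Filter block.

    items: list of dicts (feed items).
    rules: list of (field_name, substring) pairs, matched case-insensitively.
    mode:  "permit" keeps matching items, anything else blocks them.
    match: "any" or "all" of the rules must hold.
    All of these names are assumptions made for illustration.
    """
    combine = any if match == "any" else all

    def matches(item):
        return combine(
            text.lower() in str(item.get(field_name, "")).lower()
            for field_name, text in rules
        )

    if mode == "permit":
        return [item for item in items if matches(item)]
    return [item for item in items if not matches(item)]


feed = [
    {"title": "Yahoo Pipes tips", "link": "http://example.com/1"},
    {"title": "Cricket scores", "link": "http://example.com/2"},
]
kept = filter_block(feed, [("title", "pipes")])
print([item["title"] for item in kept])
```

Each block would get a small function like this, so that a pipe becomes a chain of function calls over a list of items.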

What this would mean is that a screenshot of a Yahoo pipe could act as a specification for a feed processing programme, and the bindings from blocks to code would allow a translation from the visual pipe description to some actual (working) code.

That would be okay for starters, and would at least mean I’d be able to ‘rescue’ large amounts of the functionality of pipes I’ve blogged about without having to rethink all the algorithms, or work out too much (if any) of the code. Cut and paste job from the code equivalents on the wiki… (err…?!)

As well as rescuing the functionality of the pipe, this approach also has the advantage of making Yahoo pipes acceptable as a rapid prototyping tool for quickly rustling up code that can be run on a server elsewhere.

How could the process be improved? Well, taking a cue from the AWSZone Scratchpads (which, err, appear to be down at the moment?), it’d be nice to be able to just generate the code from the actual pipe.

How might we be able to do that? I’m not sure, but I’d like to think the following would be possible:

  1. using a browser extension, or Greasemonkey script, capture a Javascript object representation of a pipe from the Edit view of that pipe;
  2. parse the Javascript representation of the pipe and translate each Pipe block to the appropriate code binding;
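
The second step might look something like the following Python sketch. It assumes the captured pipe is a dict with a list of modules (each with an id, a type and a conf) and a list of wires (src to tgt), and it only handles a simple linear pipe. That layout, the block type names, the templates and the `fetch_feed` function named in the generated code are all hypothetical.

```python
# Hypothetical code bindings: one Python template per block type.
BINDINGS = {
    "fetch": "items = fetch_feed({url!r})",
    "filter": "items = [i for i in items if {text!r} in i.get({field!r}, '')]",
    "sort": "items = sorted(items, key=lambda i: i.get({field!r}, ''))",
    "truncate": "items = items[:{count}]",
}


def order_blocks(pipe):
    """Follow the wires from the single block with no incoming wire."""
    next_block = {wire["src"]: wire["tgt"] for wire in pipe["wires"]}
    targets = set(next_block.values())
    starts = [m["id"] for m in pipe["modules"] if m["id"] not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting block")
    order, current = [], starts[0]
    while current is not None:
        if current in order:
            raise ValueError("wires form a loop")
        order.append(current)
        current = next_block.get(current)
    return order


def translate(pipe):
    """Emit one line of Python per block, in wire order."""
    modules = {m["id"]: m for m in pipe["modules"]}
    lines = []
    for block_id in order_blocks(pipe):
        module = modules[block_id]
        lines.append(BINDINGS[module["type"]].format(**module["conf"]))
    return "\n".join(lines)


pipe = {
    "modules": [
        {"id": "a", "type": "fetch", "conf": {"url": "http://example.com/feed"}},
        {"id": "b", "type": "filter", "conf": {"field": "title", "text": "pipes"}},
        {"id": "c", "type": "truncate", "conf": {"count": 5}},
    ],
    "wires": [{"src": "a", "tgt": "b"}, {"src": "b", "tgt": "c"}],
}
print(translate(pipe))
```

A real pipe can branch and merge, so a proper translator would need to walk a graph rather than a chain, but the lookup from block type to code binding would stay the same.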

So the vision here is you could edit a pipe, click a button, and generate the code equivalent of the pipe. (Of course, it’d be really nice if Pipes offered an “export pipe as code” option natively;-)

(After all, Zoho Creator Deploys to Google App Engine: “When you open an application in Zoho Creator in edit mode, you’ll see a new option ‘Deploy in App Engine’ under ‘More Actions’ menu (on the top). This option will let you generate and download the Python code (App Engine supports deployment of Python only apps) of your Zoho Creator application which you can then deploy to Google App Engine. … Zoho Creator essentially acts as an IDE for Google App Engine.” So why shouldn’t Pipes pipelines also deploy elsewhere too? Why shouldn’t “Yahoo pipes essentially act as an IDE for feed-powered pipelines in Python, PHP, Javascript and the Google Mashup Editor”?)

PS If anyone wants to create a wiki and start this process off, please be my guest (I’ll be largely offline over the Christmas period, so won’t be able to run with this idea until the New Year, if then…)