Another preparatory step before I start learning about stats in the context of Formula One… There are a couple of things I’m hoping to achieve when I actually start the journey: 1) finding ways of using stats to help to pull out patterns and events that are interesting from a storytelling or news perspective; 2) seeing if I can come up with any models that help forecast or predict race winners or performances over a race weekend.

There are a couple of problems I can foresee (?!) when it comes to the predictions: firstly, unlike horseracing, there aren’t that many F1 races each year to test the predictions against. Secondly, how do I even get a baseline start on the probabilities that driver X or team Y might end up on the podium?

It seems to me as if betting odds provide one publicly available “best guess” at the likelihood of any driver winning a race (a range of other bets are possible, of course, that give best guess predictions for other situations…) Having had a sheltered life, the world of betting is completely alien to me, so here’s what I think I’ve learned so far…

Odds are related to the anticipated likelihood of a particular event occurring and represent the winnings you get back (plus your stake) if a particular event happens. So 2/1 (2 to 1) fractional odds say: if the event happens, you’ll get 2 back for every 1 you placed, plus your stake back. If I bet 1 unit at 2/1 and win, I get 3 back: my original 1 plus 2 more. If I bet 3, I get 9 back: my original 3 plus 2 for every 1 I placed. Since I placed 3 1s, I get back 3 x 2 = 6 in winnings. Plus my original 3, which gives me 9 back on 3 staked, a profit of 6.
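The payout arithmetic above can be sketched in a couple of lines of R. The function name `fractional_return` is purely illustrative, not something from a package:

```r
# Total amount handed back (stake included) on a winning bet
# placed at fractional odds of n/m.
fractional_return <- function(stake, n, m = 1) {
  winnings <- stake * n / m  # profit on a win
  stake + winnings           # winnings plus the original stake
}

fractional_return(1, 2)  # 1 unit at 2/1 returns 3
fractional_return(3, 2)  # 3 units at 2/1 returns 9, a profit of 6
```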

Odds are related (loosely) to the likelihood of an event happening. 2/1 odds represent a likelihood (probability) that an event will happen (1/3) = 0.333… of the time (to cut down on confusion between fractional odds and fractional probabilities, I’ll try to remember to put the fractional probabilities in brackets; so 1/2 is fractional odds of 2 to 1 on, and (1/2) is a probability of one half). To see how this works, consider an evens bet, fractional odds of 1/1, such as someone might make for tossing a coin. The probability of getting heads on a single toss is (1/2); the probability of getting tails is also (1/2). If I’m giving an absolutely fair book based on these likelihoods, I’d offer you even odds that you get a head, for example, on a single toss. After all, it’s fifty/fifty (a fifty per cent chance either way) whether heads or tails will land face up. If there are three equally possible outcomes, (1/3) each, then I’d offer 2/1. After all, it’s twice as likely that something other than the single outcome you called would come up. If there are four equally possible outcomes, I’d offer 3/1, because it’s likely (if we played repeatedly) that three times out of four, you’d be wrong. So three times out of four you’d lose and I’d take your stake. And on the fourth go, when you get it right, I give you your stake back for that round plus three for winning, so overall we’d be back where we started.

Decimal odds are a way of describing the total return (winnings plus stake) you get on a unit stake. So for a 2/1 bet, the decimal odds are 3. For a 4/1 bet they’d be 5. For an N/1 bet they’d be 1+N. For a 1/2 (two to one on) bet they’d be 1.5, for a 1/10 bet they’d be 1.1. So for a 1/M bet, 1+1/M. Generally, for an N/M bet, decimal odds are 1+N/M.
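As a quick check on the 1+N/M rule, here is a small R sketch. `decimal_odds` is just a name made up for the illustration:

```r
# Convert fractional odds n/m to decimal odds (return on a unit stake).
decimal_odds <- function(n, m = 1) 1 + n / m

decimal_odds(2, 1)   # 2/1  gives 3
decimal_odds(4, 1)   # 4/1  gives 5
decimal_odds(1, 2)   # 1/2  gives 1.5
decimal_odds(1, 10)  # 1/10 gives 1.1
```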

Decimal odds give an easy way in to calculating the likelihood of an event. Decimal odds of 3 (that is, fractional odds of 2/1) describe an event that will happen (1/3) of the time in a fair game. That is, (1/(decimal odds)) of the time. For fractional odds of N/M, you expect the event to happen with probability (1/(1+N/M)) = (M/(N+M)).
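A hedged R sketch of going in both directions, from fractional odds to the implied probability and from a probability back to fair odds. Both function names are hypothetical, and the "fair" odds assume a book with no bookmaker margin:

```r
# Probability implied by fractional odds n/m in a fair game.
implied_prob <- function(n, m = 1) 1 / (1 + n / m)

# Fair fractional odds, expressed as N in "N/1", for an event of probability p.
fair_fractional <- function(p) (1 - p) / p

implied_prob(2, 1)      # 2/1 implies a probability of 1/3
implied_prob(1, 1)      # evens implies 1/2
fair_fractional(1 / 4)  # one outcome in four is fairly priced at 3/1
```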

In a completely fair book (?my phrase), the probabilities implied by the odds on all the possible outcomes should sum to 1. Bookmakers rig the odds in their favour though, so the summed probabilities on a book will add up to more than 1 – this represents the bookmaker’s margin. If you’re betting on the toss of a coin with a bookie, they may offer you 99/100 for heads, evens for tails. If you play 400 games and bet 200 heads and 200 tails, winning 100 of each, you’ll overall stake 400, win 100 (plus 100 back) on tails along with 99 (plus 100 original stake) on heads. That is, you’ll have staked 400 and got back 399. The bookie will be 1 up overall. The summed probabilities add up to more than 1, since (1/2) + (1/(1+99/100)) = (0.5 + ~0.5025) > 1.
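The coin-toss book above can be checked with a short R sketch. The variable names are illustrative, and rescaling the implied probabilities so that they sum to 1 is only one simple way of stripping out the margin (an assumption here, not something a bookmaker publishes):

```r
# Decimal odds for the two outcomes of the coin-toss book.
coin_book <- c(heads = 1 + 99 / 100, tails = 1 + 1 / 1)

implied <- 1 / coin_book   # probability implied by each price
book_total <- sum(implied) # a fair book would sum to exactly 1
margin <- book_total - 1   # the excess is the bookmaker's margin

# Simple normalisation: rescale so the probabilities sum to 1.
normalised <- implied / book_total

book_total  # a little over 1
margin      # roughly 0.0025
normalised
```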

One-off bets are no basis for a strategy. You need to bet regularly. One way of winning is to follow a *value betting* strategy where you place bets on outcomes that you predict are more likely than the odds you’re offered imply. This is counter to how the bookie works. If a bookie offers you fractional odds of 3/1 (expectation that the event will happen (1/4) of the time), and you have evidence that suggests it will happen (1/3) of the time (decimal odds of 3, fractional odds 2/1) then it’s worth your while repeatedly accepting the bet. After all, if you play 12 rounds, you’ll wager 12, and win on 12/3=4 occasions, getting 4 back (3 + your stake) each time, to give you a net return of 4 x 4 – 12 = 16 – 12 = +4. If the event had happened at the bookie’s predicted likelihood of (1/4) of the time, you would have got back ( 12/4 ) * 4 – 12 = +0 overall.
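The 12-round arithmetic is an expected-value calculation, which can be sketched in R as follows. `expected_profit` is a made-up helper name, and it assumes the same stake is placed every round:

```r
# Expected net profit per bet, given the true win probability and the
# decimal odds (price) actually offered.
expected_profit <- function(p_true, price, stake = 1) {
  win_part  <- p_true * stake * (price - 1)  # profit when the bet wins
  lose_part <- (1 - p_true) * stake          # stake lost otherwise
  win_part - lose_part
}

12 * expected_profit(1 / 3, 4)  # my estimate is right: +4 over 12 rounds
12 * expected_profit(1 / 4, 4)  # the bookie is right: 0 over 12 rounds
```

Whenever the probability you believe in is larger than 1/price, the expected profit per bet is positive, which is the whole of the value betting idea.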

I’ve tried to do an R script to explore this:

    #My vocabulary may be a bit confused herein
    #Corrections welcome in the comments from both statisticians and gamblers;-)

    #The offered odds
    price=4 #3/1 -> 3+1 That is, the decimal odds on fractional odds of 3/1
    odds=1/price #the offered odds expressed as an implied probability
    #The odds I've predicted (also as a probability)
    myodds=1/3 #2/1 -> 1/(2+1)

    #The number of repeated trials in the game
    trials=10000
    #The amount staked
    bet=1

    #The experiment that we'll run trials number of times
    expt=function(trials,odds,myodds,bet){
      #trial sets a uniform random number in range 0..1
      df=data.frame(trial=runif(trials))
      #The win condition happens at my predicted odds, ie if trial value is less than my odds
      #So if my odds are (1/4) = 0.25, a trial value in range 0..0.25 counts as a win
      # (df$trial<myodds) is TRUE if trial < myodds, which is cast by as.integer() to value 1
      # If (df$trial<myodds) is FALSE, as.integer() returns 0
      df$win=as.integer(df$trial<myodds)
      df$bet=bet
      #The winnings are calculated at the offered odds and are net of the stake
      #The df$win/odds = 1/odds = price (the decimal odds) on a win, else 0
      #The actual win is the product of the stake (bet) and the decimal odds
      #The winnings are the return net of the initial amount staked
      #Where there is no win, the winnings are a loss of the value of the bet
      df$winnings=df$bet*df$win/odds-df$bet
      df
    }

    df.e=expt(trials,odds,myodds,bet)
    #The overall net winnings
    sum(df.e$winnings)

    #If myodds > odds, then I'm likely to end up winning on a value betting strategy

    #A way of running the experiment several times
    #There are probably better R protocols for doing this?
    runs=10
    df.r=data.frame(s=numeric(),v=numeric())
    for (i in 1:runs){
      e=expt(trials,odds,myodds,bet)
      df.r=rbind(df.r,data.frame(s=sum(e$winnings),v=sd(e$winnings)))
    }

    #It would be nice to do some statistical graphics demonstrations of the different distributions of possible outcomes for different regimes. For example:
    ## different combinations of odds and myodds
    ## different numbers of trials
    ## different bet sizes
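The script asks whether there is a better R idiom for repeating the experiment. One possibility, offered as a sketch rather than as the right answer, is base R's `replicate()`, which collects the result of each run into a vector without the explicit loop and `rbind()`. The helper `run_once`, the seed and the number of runs are all assumptions made for the illustration:

```r
set.seed(42)  # arbitrary seed, only so that reruns give the same numbers

# One run of the game: `trials` bets at decimal odds `price`, where the
# event really happens with probability `p_true`. Returns net winnings.
run_once <- function(trials, price, p_true, bet = 1) {
  win <- runif(trials) < p_true  # TRUE where the bet came in
  sum(bet * win * price - bet)   # returns on wins, minus every stake
}

runs <- 200
net <- replicate(runs, run_once(trials = 10000, price = 4, p_true = 1 / 3))

summary(net)   # spread of net winnings across the runs
mean(net > 0)  # share of runs that ended in profit
# hist(net)    # uncomment for a quick look at the distribution
```

Changing `price`, `p_true`, `trials` or `bet` in the call gives the different regimes mentioned at the end of the script.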

There are apparently also “efficient” ways of working out what stake to place (the “staking strategy”). The value strategy gives you the edge to win over the long term; the staking strategy is how you maximise profits. See for example Horse Racing Staking and Betting: Horse racing basics part 2 or more mathematical treatments, such as The Kelly Criterion or Statistical Methodology for Profitable Sports Gambling. See also the notion of “betting rules”, eg A statistical development of fixed odds betting rules in soccer.
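As a taste of what a staking strategy looks like, here is a minimal R sketch of the standard Kelly fraction for a single bet with two outcomes: the share of your bankroll to stake is (p × d − 1)/(d − 1), where p is the probability you believe in and d is the decimal odds on offer. The function name is illustrative, and the sketch ignores practical matters such as uncertainty in p:

```r
# Kelly fraction of bankroll to stake on a simple win/lose bet.
# p:     your estimate of the probability of winning
# price: the decimal odds offered
kelly_fraction <- function(p, price) {
  f <- (p * price - 1) / (price - 1)
  max(f, 0)  # no edge (or a negative edge) means no bet
}

kelly_fraction(1 / 3, 4)  # the value bet from earlier: stake 1/9 of bankroll
kelly_fraction(1 / 4, 4)  # bookie's odds are fair: stake nothing
```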

There is possibly some mileage to be had in getting to grips with R modeling using staking strategy models as an opening exercise, along with statistical graphical demonstrations of the same, but that is perhaps a little off topic for now…

To recap then, what I think I’ve learned is that we can test predictions against the benchmark of offered odds. The offered odds in themselves give us a ballpark estimate of what the (expert) bookmakers, as influenced by the betting/prediction market, expect the outcome of an event to be. Note that the odds are rigged to give summed probabilities over a range of events happening to be greater than 1, to build in a profit margin (does it have a proper name?) for the bookmaker. If we have a prediction model that appears to offer better odds on an event than the odds that are actually offered, and we believe in our prediction, we can run a value betting strategy on that basis and hopefully come out, over the long term, with a profit. The size of the profit is in part an indicator of how much more accurate our model is as a predictive model than the expert knowledge and prediction market basis that is used to set the bookie’s odds.

PS Re: the bookie’s profit, seems that this is called the *overround* or *vigorish*. The paper Forecasting sports tournaments by ratings of (prob)abilities: A comparison for the EURO 2008 makes clear the relationship between the bookies’ cut and the odds:

One thing that immediately springs to mind is to look at what sort of overround applies to different bookmakers around different sorts of F1 bets, and whether this is related to the apparent forecast accuracy of the odds offered, at least in ranking terms? (See the comments for a couple of links to papers on forecast accuracy of sports betting odds.)

PPS FWIW, as and when I come across R libraries to access bookmaker APIs, I’ll add them here:

– Betfair R package – access Betfair API; another package (CRAN): Betfairly

You might also be interested in “proper scoring rules” as a better way of evaluating probabilistic forecasts once you have the outcomes. Betting scores you against the bookie; proper scoring rules score you against the outcomes.

Example of proper scoring rules used to evaluate predictions from US election pundits: http://appliedrationality.org/2012/11/09/was-nate-silver-the-most-accurate-2012-election-pundit/

@Doug Yes, you’re right of course… And I didn’t mention confidence limits on predictions either (I wonder if the margin, or whatever the term is, that the bookies build in to get summed probabilities > 1 is a reflection of their confidence in a reliable way?)

This whole journey is going to be very much baby steps for me… and I’m not totally sure what sorts of things I’ll find I need to cover to make sense of it all along the way (typical uncourse, eh?!)

Which is to say – all comments will be very much appreciated and could well guide what I look at and the way I wend my way through it all :-)

There’s an interesting model which i guess could be adapted to F1, in a paper titled “Probabilistic Modelling in Multi-Competitor Games” by CJ Farmer. Might be worth a look.

Anyway, i’m enjoying your posts on F1 and am hoping to learn some R skills (complete novice atm). gl.

Thanks for that link… will check it out. Hope to get the first proper post in the series out over the weekend…:-)

All ideas welcome:-)

I’ve always called the bookmaker’s margin the ‘overround’ – i can’t recall it being called anything else in my reading on gambling, except in America where they talk about the ‘vigorish’ (or ‘vig’) which i assume to be the same.

I would suggest the book Precision: Statistical and Mathematical Methods in Horse Racing by CX Wong.

I believe some of the statistical approaches Wong recommends would apply.

RD

Thanks – will check it out…

A thought on how bookies set odds (nothing to do with R).

The odds offered by a bookmaker are only vaguely related to the probability of an outcome. They are strongly correlated with the payout of an outcome. Basically the bookmaker keeps track of how much he has to pay for outcome A and adjusts his odds accordingly. He has to a) be able to pay all the bettors and b) still make a profit.

The amount of money bet on an outcome is correlated (roughly) with the number of people betting on the outcome. The “smart money” might be on outcome A, but if the sentimental favourite is B then B will have the short odds.

My point is: if you’re using the odds as a signal, expect a lot of noise as well.

Cheers,

Daggles.

@daggles Yes, thanks for that correction. I was packing a lot of assumptions in to the claim, particularly that the crowd was likely to be as good as anyone at forecasting a result and that the bookies would in turn reflect this in their pricing. One thing I could add to the list of things to do might be to see how effectively the odds (or ranking of odds) appear to reflect outcomes. For example, “Sports Forecasting: A Comparison of the Forecast Accuracy of Prediction Markets, Betting Odds and Tipsters” [preprint: http://www.ecm.bwl.uni-muenchen.de/publikationen/pdf/sports_forec.pdf final: http://test2.marketing.wiwi.uni-frankfurt.de/fileadmin/Publikationen/Spann-Skiera-2008-Journal-of-Forecasting_01.pdf ], “Issues in Sports Forecasting” [ http://www.gwu.edu/~forcpgm/2009-002.pdf ]

F1 further complicates the scene by allowing team orders, which casual betters might not factor in to their bets? So for example, if form on a particular weekend suggested that driver A were to just beat driver B in the same team, but driver B was in a championship race, or maybe had a home race where personal points tallies were not an issue, then form might not be the best guide of who would be allowed to take the higher position…?