<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>OUseful.Info, the blog... &#187; F1Stats &#8211; Correlations Between Qualifying, Grid and Race Classification</title>
	<atom:link href="http://blog.ouseful.info/2013/02/09/f1stats-correlations-between-qualifying-grid-and-race-classisification/feed/?withoutcomments=1" rel="self" type="application/rss+xml" />
	<link>http://blog.ouseful.info</link>
	<description>Trying to find useful things to do with emerging technologies in open education</description>
	<lastBuildDate>Wed, 19 Jun 2013 09:24:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.ouseful.info' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>OUseful.Info, the blog... &#187; F1Stats &#8211; Correlations Between Qualifying, Grid and Race Classification</title>
		<link>http://blog.ouseful.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.ouseful.info/osd.xml" title="OUseful.Info, the blog..." />
	<atom:link rel='hub' href='http://blog.ouseful.info/?pushpress=hub'/>
		<item>
		<title>F1Stats &#8211; Correlations Between Qualifying, Grid and Race Classification</title>
		<link>http://blog.ouseful.info/2013/02/09/f1stats-correlations-between-qualifying-grid-and-race-classisification/</link>
		<comments>http://blog.ouseful.info/2013/02/09/f1stats-correlations-between-qualifying-grid-and-race-classisification/#comments</comments>
		<pubDate>Sat, 09 Feb 2013 23:17:15 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Rstats]]></category>
		<category><![CDATA[f1datajunkie]]></category>
		<category><![CDATA[f1stats]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9825</guid>
		<description><![CDATA[Following directly on from F1Stats – Visually Comparing Qualifying and Grid Positions with Race Classification, and continuing in my attempt to replicate some of the methodology and results used in A Tale of Two Motorsports: A Graphical-Statistical Analysis of How Practice, Qualifying, and Past Success Relate to Finish Position in NASCAR and Formula One Racing, here&#8217;s [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Following directly on from <a href="http://blog.ouseful.info/2013/01/30/f1stats-visually-comparing-qualifying-and-grid-positions-with-race-classification/">F1Stats – Visually Comparing Qualifying and Grid Positions with Race Classification</a>, and continuing in my attempt to replicate some of the methodology and results used in <a href="http://newton.uor.edu/FacultyFolder/Silva/NASCARvF1.pdf">A Tale of Two Motorsports: A Graphical-Statistical Analysis of How Practice, Qualifying, and Past Success Relate to Finish Position in NASCAR and Formula One Racing</a>, here&#8217;s a quick look at the correlation scores between the final practice, qualifying and grid positions and the final race classification.</p>
<p>I&#8217;ve already done a brief review of what correlation means (sort of) in <a href="http://blog.ouseful.info/2013/01/25/f1stats-a-prequel-to-getting-started-with-rank-correlations/">F1Stats – A Prequel to Getting Started With Rank Correlations</a>, so I&#8217;m just going to dive straight in with the results, followed by the R code that shows how I set about trying to find the correlations between the different classifications.</p>
<p>Here&#8217;s the answer from the <s>back of the book</s> paper that we&#8217;re aiming for&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f1vnascarcorrelation.png"><img src="http://ouseful.files.wordpress.com/2013/02/f1vnascarcorrelation.png?w=700" alt="F1VNASCARcorrelation"   class="alignnone size-full wp-image-9826" /></a></p>
<p>Here&#8217;s what I got:</p>
<p><small>
<pre>&gt; corrs.df[order(corrs.df$V1),]
              V1   p3pos.int    qpos.int     grid.int racepos.raw    pval.grid    pval.qpos  pval.p3pos
2      AUSTRALIA  0.30075188  0.01503759  0.087218045           1 7.143421e-01 9.518408e-01 0.197072158
13      MALAYSIA  0.42706767  0.57293233  0.630075188           1 3.584362e-03 9.410805e-03 0.061725312
6          CHINA -0.26015038  0.57443609  0.514285714           1 2.183596e-02 9.193214e-03 0.266812583
3        BAHRAIN  0.13082707  0.73233083  0.739849624           1 2.900250e-04 3.601434e-04 0.581232598
16         SPAIN  0.25112782  0.80451128  0.804511278           1 2.179221e-05 2.179221e-05 0.284231482
14        MONACO  0.51578947  0.48120301  0.476691729           1 3.513870e-02 3.326706e-02 0.021403708
17        TURKEY  0.52330827  0.73082707  0.730827068           1 3.756531e-04 3.756531e-04 0.019344720
9  GREAT BRITAIN  0.65413534  0.83007519  0.830075188           1 8.921842e-07 8.921842e-07 0.002260234
8        GERMANY  0.32030075  0.46917293  0.452631579           1 4.657539e-02 3.844275e-02 0.168419054
10       HUNGARY  0.49649123  0.37017544  0.370175439           1 1.194050e-01 1.194050e-01 0.032293715
7         EUROPE  0.28120301  0.72030075  0.720300752           1 4.997719e-04 4.997719e-04 0.228898214
4        BELGIUM  0.06766917  0.62105263  0.621052632           1 4.222076e-03 4.222076e-03 0.777083014
11         ITALY  0.52932331  0.52481203  0.524812030           1 1.895282e-02 1.895282e-02 0.017815489
15     SINGAPORE  0.50526316  0.58796992  0.715789474           1 5.621214e-04 7.414170e-03 0.024579520
12         JAPAN  0.34912281  0.74561404  0.849122807           1 0.000000e+00 3.739715e-04 0.143204045
5         BRAZIL -0.51578947 -0.02105263 -0.007518797           1 9.771776e-01 9.316030e-01 0.021403708
1      ABU DHABI  0.42556391  0.66466165  0.628571429           1 3.684738e-03 1.824565e-03 0.062722332</pre>
<p></small></p>
<p>The paper mistakenly reports the grid values as the qualifying positions, so if we look down the <tt>grid.int</tt> column that I use to contain the correlation values between the <em>grid</em> and final classifications, we see they broadly match the values quoted in the paper. I also calculated the p-values; they seem to be a little bit off compared with the paper, but of the right order of magnitude.</p>
<p>And here&#8217;s the R-code I used to get those results&#8230; The first chunk is just the loader, a refinement of the code I have used previously:</p>
<pre class="brush: r; title: ; notranslate">require(RSQLite)
require(reshape)
#ddply() and mutate(), used in the tidying step below, come from plyr
require(plyr)

#Data downloaded from my f1com scraper on scraperwiki
f1 = dbConnect(SQLite(), dbname=&quot;f1com_megascraper.sqlite&quot;)

getRacesData.full=function(year='2012'){
  #Data query
  results.combined=dbGetQuery(f1,
                              paste('SELECT raceResults.year as year, qualiResults.pos as qpos, p3Results.pos as p3pos, raceResults.pos as racepos, raceResults.race as race, raceResults.grid as grid, raceResults.driverNum as driverNum, raceResults.raceNum as raceNum FROM raceResults, qualiResults, p3Results WHERE raceResults.year==',year,' and raceResults.year = qualiResults.year and raceResults.year = p3Results.year and raceResults.race = qualiResults.race and raceResults.race = p3Results.race and raceResults.driverNum = qualiResults.driverNum and raceResults.driverNum = p3Results.driverNum;',sep=''))
  
  #Data tidying
  results.combined=ddply(results.combined,.(race),mutate,racepos.raw=1:length(race))
  for (i in c('racepos','grid','qpos','p3pos','driverNum'))
    results.combined[[paste(i,'.int',sep='')]]=as.integer( as.character(results.combined[[i]]))
  results.combined$race=reorder(results.combined$race,results.combined$raceNum)
  
  results.combined
}

results.combined=getRacesData.full(2009)</pre>
<p>Here&#8217;s the actual correlation calculation &#8211; I use the <a href="http://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html"><tt>cor</tt> function</a>:</p>
<pre class="brush: r; title: ; notranslate">#The cor() function returns data that looks like:
#            p3pos.int   qpos.int   grid.int racepos.raw
#p3pos.int   1.0000000 0.31578947 0.28270677  0.30075188
#qpos.int    0.3157895 1.00000000 0.97744361  0.01503759
#grid.int    0.2827068 0.97744361 1.00000000  0.08721805
#racepos.raw 0.3007519 0.01503759 0.08721805  1.00000000
#Row/col 4 relates to the correlation with the race classification, so for now just return that

corr.rank.race=function(results.combined,cmethod='spearman'){
  ##Correlations
  corrs=NULL
  #Run through the races
  for (i in levels(factor(results.combined$race))){
    results.classified = subset( results.combined,
                                 race==i,
                                 select=c('p3pos.int','qpos.int','grid.int','racepos.raw'))
    #print(i)
    #print( results.classified)
    cp=cor(results.classified,method=cmethod,use=&quot;complete.obs&quot;)
    #print(cp[4,])
    corrs=rbind(corrs,c(i,cp[4,]))
  }
  corrs.df=as.data.frame(corrs)
  
  signif=data.frame()
  for (i in levels(factor(results.combined$race))){
    results.classified = subset( results.combined,
                                 race==i,
                                 select=c('p3pos.int','qpos.int','grid.int','racepos.raw'))
    #p.value
    pval.grid=cor.test(results.classified$racepos.raw,results.classified$grid.int,method=cmethod,alternative = &quot;two.sided&quot;)$p.value
    pval.qpos=cor.test(results.classified$racepos.raw,results.classified$qpos.int,method=cmethod,alternative = &quot;two.sided&quot;)$p.value
    pval.p3pos=cor.test(results.classified$racepos.raw,results.classified$p3pos.int,method=cmethod,alternative = &quot;two.sided&quot;)$p.value

    signif=rbind(signif,data.frame(race=i,pval.grid=pval.grid,pval.qpos=pval.qpos,pval.p3pos=pval.p3pos))
  }

  corrs.df$qpos.int=as.numeric(as.character(corrs.df$qpos.int))
  corrs.df$grid.int=as.numeric(as.character(corrs.df$grid.int))
  corrs.df$p3pos.int=as.numeric(as.character(corrs.df$p3pos.int))
  
  corrs.df=merge(corrs.df,signif,by.y='race',by.x='V1')
  corrs.df$V1=factor(corrs.df$V1,levels=levels(results.combined$race))
  corrs.df
}

corrs.df=corr.rank.race(results.combined)</pre>
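<p>As a sanity check on what <tt>cor</tt> and <tt>cor.test</tt> are doing with <tt>method='spearman'</tt>, here&#8217;s a minimal sketch on some toy data. The grid and race positions are made up for illustration and don&#8217;t come from the scraper database. With no tied ranks, Spearman&#8217;s rho should be the same as the Pearson correlation of the ranks, and the same as the value from the usual squared rank difference formula:</p>
<pre class="brush: r; title: ; notranslate">#Toy data (made up for illustration): grid and finishing positions for 8 cars
grid.pos=c(1,2,3,4,5,6,7,8)
race.pos=c(2,1,3,6,4,5,8,7)

#Spearman's rho from cor()...
rho=cor(grid.pos,race.pos,method='spearman')

#...is the Pearson correlation of the ranks
rho.ranks=cor(rank(grid.pos),rank(race.pos),method='pearson')

#With no ties, rho can also be found from the squared rank differences
n=length(grid.pos)
d=rank(grid.pos)-rank(race.pos)
rho.formula=1-6*sum(d^2)/(n*(n^2-1))

#cor.test() reports the same estimate, along with a p-value
ct=cor.test(grid.pos,race.pos,method='spearman',alternative='two.sided')

print(c(rho,rho.ranks,rho.formula,unname(ct$estimate)))
print(ct$p.value)</pre>
<p>The four correlation values printed should all agree.</p>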
<p>It&#8217;s then trivial to plot the result:</p>
<pre class="brush: r; title: ; notranslate">require(ggplot2)
xRot=function(g,s=5,lab=NULL) g+theme(axis.text.x=element_text(angle=-90,size=s))+xlab(lab)

g=ggplot(corrs.df)+geom_point(aes(x=V1,y=grid.int))
#NB ylim(0,1) drops any race with a negative correlation (Brazil, here) from the plot
g=xRot(g,6)+xlab(NULL)+ylab('Correlation')+ylim(0,1)
g=g+ggtitle('F1 2009 Correlation: grid and final classification')
g</pre>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12009gridfinalcorr.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12009gridfinalcorr.png?w=700" alt="f12009gridfinalcorr"   class="alignnone size-full wp-image-9829" /></a></p>
<p><a href="http://blog.ouseful.info/2013/01/25/f1stats-a-prequel-to-getting-started-with-rank-correlations/">Recalling that</a> there are different types of rank correlation function, specifically &#8220;Kendall’s τ (that is, Kendall’s Tau; this coefficient is based on concordance, which describes how the sign of the difference in rank between pairs of numbers in one data series is the same as the sign of the difference in rank between a corresponding pair in the other data series)&#8221;, I wondered whether it would make sense to look at the correlations under this measure too. If there are any obvious looking differences compared to Spearman&#8217;s rho, they might prompt us to look at the actual grid/race classifications to see which score appears to be more meaningful.</p>
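<p>To make the idea of concordance a bit more concrete, here&#8217;s a rough sketch that counts concordant and discordant pairs by hand and compares the result with <tt>cor(...,method='kendall')</tt>. The positions are made up for illustration, and because there are no tied ranks the simple (concordant - discordant)/(number of pairs) form of tau applies:</p>
<pre class="brush: r; title: ; notranslate">#Toy data (made up for illustration), no tied ranks
grid.pos=c(1,2,3,4,5,6)
race.pos=c(2,1,3,6,4,5)

#Every pair of cars, one pair per column
pair.idx=combn(length(grid.pos),2)

#Sign of the rank difference within each pair, for each data series
s.grid=sign(grid.pos[pair.idx[1,]]-grid.pos[pair.idx[2,]])
s.race=sign(race.pos[pair.idx[1,]]-race.pos[pair.idx[2,]])

#Concordant pairs agree in sign, discordant pairs disagree
concordant=sum(s.grid*s.race==1)
discordant=sum(s.grid*s.race==-1)

tau.byhand=(concordant-discordant)/ncol(pair.idx)
tau.cor=cor(grid.pos,race.pos,method='kendall')
print(c(concordant,discordant))
print(c(tau.byhand,tau.cor))</pre>
<p>Here 12 of the 15 pairs are concordant and 3 are discordant, so both routes give a tau of 0.6.</p>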
<p>The easiest way to spot the difference is probably graphically:</p>
<pre class="brush: r; title: ; notranslate">corrs.df2=corr.rank.race(results.combined,'kendall')
corrs.df2[order(corrs.df2$V1),]

g=ggplot(corrs.df)+geom_point(aes(x=V1,y=grid.int),col='red',size=4)
g=g+geom_point(data=corrs.df2, aes(x=V1,y=grid.int),col='blue')
#NB ylim(0,1) drops any race with a negative correlation (Brazil, here) from the plot
g=xRot(g,6)+xlab(NULL)+ylab('Correlation')+ylim(0,1)
g=g+ggtitle('F1 2009 Correlation: grid and final classification')
g</pre>
<p><small>
<pre>corrs.df2[order(corrs.df2$V1),]
              V1   p3pos.int    qpos.int    grid.int racepos.raw    pval.grid    pval.qpos  pval.p3pos
2      AUSTRALIA  0.17894737 -0.01052632  0.04210526           1 8.226829e-01 9.744669e-01 0.288378196
13      MALAYSIA  0.26315789  0.41052632  0.46315789           1 3.782665e-03 1.110136e-02 0.112604127
6          CHINA -0.20000000  0.41052632  0.35789474           1 2.832863e-02 1.110136e-02 0.233266557
3        BAHRAIN  0.07368421  0.51578947  0.52631579           1 8.408301e-04 1.099522e-03 0.677108239
16         SPAIN  0.17894737  0.64210526  0.64210526           1 2.506940e-05 2.506940e-05 0.288378196
14        MONACO  0.38947368  0.35789474  0.35789474           1 2.832863e-02 2.832863e-02 0.016406081
17        TURKEY  0.37894737  0.64210526  0.64210526           1 2.506940e-05 2.506940e-05 0.019784403
9  GREAT BRITAIN  0.46315789  0.63157895  0.63157895           1 3.622261e-05 3.622261e-05 0.003782665
8        GERMANY  0.23157895  0.31578947  0.30526316           1 6.380788e-02 5.475355e-02 0.164976406
10       HUNGARY  0.36842105  0.36842105  0.36842105           1 2.860214e-02 2.860214e-02 0.028602137
7         EUROPE  0.21052632  0.62105263  0.62105263           1 5.176962e-05 5.176962e-05 0.208628398
4        BELGIUM  0.02105263  0.46315789  0.46315789           1 3.782665e-03 3.782665e-03 0.923502331
11         ITALY  0.35789474  0.36842105  0.36842105           1 2.373450e-02 2.373450e-02 0.028328627
15     SINGAPORE  0.35789474  0.45263158  0.55789474           1 3.589956e-04 4.748310e-03 0.028328627
12         JAPAN  0.26315789  0.57894737  0.69590643           1 6.491222e-06 3.109641e-04 0.124796908
5         BRAZIL -0.37894737 -0.05263158 -0.04210526           1 8.226829e-01 7.732195e-01 0.019784403
1      ABU DHABI  0.34736842  0.61052632  0.55789474           1 3.589956e-04 7.321900e-05 0.033643947</pre>
<p></small></p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12009gridracecorrspearmanredvkendallblue.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12009gridracecorrspearmanredvkendallblue.png?w=700" alt="f12009gridracecorrspearmanredvkendallblue"   class="alignnone size-full wp-image-9831" /></a></p>
<p>Hmm.. Kendall gives lower values for all races, though in Hungary the two scores are almost identical (0.368 vs 0.370) &#8211; maybe put that on the &#8220;must look at Hungary compared to the other races&#8221; pile&#8230;;-)</p>
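<p>A quick way of checking the comparison numerically, rather than by eye, is to put the two sets of <tt>grid.int</tt> scores side by side and look at the gap between them. The sketch below just retypes the values from the two tables above; the <tt>cmp</tt> data frame and its <tt>gap</tt> column are names I&#8217;ve made up for the purpose:</p>
<pre class="brush: r; title: ; notranslate">#grid.int values copied from the Spearman and Kendall tables above (2009 season)
races=c('AUSTRALIA','MALAYSIA','CHINA','BAHRAIN','SPAIN','MONACO','TURKEY',
        'GREAT BRITAIN','GERMANY','HUNGARY','EUROPE','BELGIUM','ITALY',
        'SINGAPORE','JAPAN','BRAZIL','ABU DHABI')
spearman=c(0.087218045,0.630075188,0.514285714,0.739849624,0.804511278,
           0.476691729,0.730827068,0.830075188,0.452631579,0.370175439,
           0.720300752,0.621052632,0.524812030,0.715789474,0.849122807,
           -0.007518797,0.628571429)
kendall=c(0.04210526,0.46315789,0.35789474,0.52631579,0.64210526,
          0.35789474,0.64210526,0.63157895,0.30526316,0.36842105,
          0.62105263,0.46315789,0.36842105,0.55789474,0.69590643,
          -0.04210526,0.55789474)

cmp=data.frame(race=races,spearman=spearman,kendall=kendall)
#Positive gap means the Spearman score is the higher of the two
cmp$gap=cmp$spearman-cmp$kendall
#Smallest gap first
cmp[order(cmp$gap),]</pre>
<p>Sorting by the gap puts Hungary at the top of the list, with Bahrain and Great Britain showing the biggest differences.</p>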
<p>One thing that did occur to me was that I have access to race data from other years, so it shouldn&#8217;t be too hard to see how the correlations play out over the years at different circuits (do grid/race correlations tend to be higher at some circuits, for example?).</p>
<pre class="brush: r; title: ; notranslate">testYears=function(years=2009:2012){
  bd=NULL
  for (year in years) {
    d=getRacesData.full(year)
    corrs.df=corr.rank.race(d)
    bd=rbind(bd,cbind(year,corrs.df))
  }
  bd
}

a=testYears(2006:2012)
ggplot(a)+geom_point(aes(x=year,y=grid.int))+facet_wrap(~V1)+ylim(0,1)

g=ggplot(a)+geom_boxplot(aes(x=V1,y=grid.int))
g=xRot(g)
g
</pre>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f1cirr2006_12.png"><img src="http://ouseful.files.wordpress.com/2013/02/f1cirr2006_12.png?w=700" alt="f1cirr2006_12"   class="alignnone size-full wp-image-9832" /></a></p>
<p>So Spain and Turkey look like they tend towards the processional? Let&#8217;s see if a boxplot bears that out:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12006_12boxplotbycct.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12006_12boxplotbycct.png?w=700" alt="f12006_12boxplotbycct"   class="alignnone size-full wp-image-9835" /></a></p>
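<p>If we wanted numbers to go alongside the boxplot, one possible approach is to summarise the correlations for each circuit with <tt>aggregate</tt>. This is only a sketch: the <tt>toy</tt> data frame below is made up for illustration, in the same shape as the output of <tt>testYears()</tt>, and the circuit names and values in it are not real results:</p>
<pre class="brush: r; title: ; notranslate">#Toy data (made up for illustration) in the same shape as the output of testYears()
toy=data.frame(year=rep(2010:2012,times=3),
               V1=rep(c('CIRCUIT A','CIRCUIT B','CIRCUIT C'),each=3),
               grid.int=c(0.8,0.7,0.9, 0.2,0.5,0.3, 0.6,0.4,0.5))

#Median and interquartile range of the grid/race correlation for each circuit
med=aggregate(grid.int~V1,data=toy,FUN=median)
iqr=aggregate(grid.int~V1,data=toy,FUN=IQR)
summ=merge(med,iqr,by='V1',suffixes=c('.median','.iqr'))

#Highest median correlation (most processional?) first
summ[order(-summ$grid.int.median),]</pre>
<p>Swapping <tt>toy</tt> for the real multi-year data frame should give a table that can be read against the boxplot.</p>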
<p>How predictable have the years been, year on year?</p>
<pre class="brush: r; title: ; notranslate">g=ggplot(a)+geom_point(aes(x=V1,y=grid.int))+facet_wrap(~year)+ylim(0,1)
g=xRot(g)
g

ggplot(a)+geom_boxplot(aes(x=factor(year),y=grid.int))</pre>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12006_12corrbyyear.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12006_12corrbyyear.png?w=700" alt="f12006_12corrbyyear"   class="alignnone size-full wp-image-9833" /></a></p>
<p>And as a boxplot:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12006_12processional.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12006_12processional.png?w=700" alt="f12006_12processional"   class="alignnone size-full wp-image-9834" /></a></p>
<p>From a betting point of view (e.g. <a href="http://blog.ouseful.info/2013/01/28/getting-started-with-f1-betting-data/">Getting Started with F1 Betting Data</a> and <a href="http://blog.ouseful.info/2013/01/16/the-basics-of-betting-as-a-way-of-keeping-score/">The Basics of Betting as a Way of Keeping Score…</a>), it possibly also makes sense to look at the correlation between the P3 classification and the qualifying classification, to see whether there is a testable edge in the data when it comes to betting on quali.</p>
<p>I think I need to tweak my code slightly to make it easy to pull out correlations between specific columns, but that&#8217;ll have to wait for another day&#8230;</p>
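<p>As a first stab at what that tweak might look like, here&#8217;s a minimal sketch of a helper that returns the rank correlation between any two named columns, race by race. The <tt>corr.cols</tt> function is hypothetical (it isn&#8217;t part of the code above), and the <tt>toy</tt> data frame is made up for illustration, using the same column names as <tt>results.combined</tt>:</p>
<pre class="brush: r; title: ; notranslate">#Hypothetical helper: rank correlation between two named columns, race by race
corr.cols=function(df,col1,col2,cmethod='spearman'){
  out=NULL
  for (r in unique(as.character(df$race))){
    d=df[df$race==r,]
    corr=cor(d[[col1]],d[[col2]],method=cmethod,use='complete.obs')
    out=rbind(out,data.frame(race=r,corr=corr))
  }
  out
}

#Toy data (made up for illustration) using the same column names as results.combined
toy=data.frame(race=rep(c('RACE A','RACE B'),each=5),
               p3pos.int=c(1,2,3,4,5, 1,2,3,4,5),
               qpos.int=c(2,1,3,4,5, 5,4,3,2,1))

p3.quali=corr.cols(toy,'p3pos.int','qpos.int')
p3.quali</pre>
<p>For the real data, something like <tt>corr.cols(results.combined,'p3pos.int','qpos.int')</tt> would be the call to try.</p>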
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/02/09/f1stats-correlations-between-qualifying-grid-and-race-classisification/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f1vnascarcorrelation.png" medium="image">
			<media:title type="html">F1VNASCARcorrelation</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12009gridfinalcorr.png" medium="image">
			<media:title type="html">f12009gridfinalcorr</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12009gridracecorrspearmanredvkendallblue.png" medium="image">
			<media:title type="html">f12009gridracecorrspearmanredvkendallblue</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f1cirr2006_12.png" medium="image">
			<media:title type="html">f1cirr2006_12</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12006_12boxplotbycct.png" medium="image">
			<media:title type="html">f12006_12boxplotbycct</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12006_12corrbyyear.png" medium="image">
			<media:title type="html">f12006_12corrbyyear</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12006_12processional.png" medium="image">
			<media:title type="html">f12006_12processional</media:title>
		</media:content>
	</item>
	</channel>
</rss>
