<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>OUseful.Info, the blog... &#187; f1datajunkie</title>
	<atom:link href="http://blog.ouseful.info/tag/f1datajunkie/feed/?withoutcomments=1" rel="self" type="application/rss+xml" />
	<link>http://blog.ouseful.info</link>
	<description>Trying to find useful things to do with emerging technologies in open education</description>
	<lastBuildDate>Sun, 19 May 2013 11:35:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.ouseful.info' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>OUseful.Info, the blog... &#187; f1datajunkie</title>
		<link>http://blog.ouseful.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.ouseful.info/osd.xml" title="OUseful.Info, the blog..." />
	<atom:link rel='hub' href='http://blog.ouseful.info/?pushpress=hub'/>
		<item>
		<title>F1Stats &#8211; Correlations Between Qualifying, Grid and Race Classification</title>
		<link>http://blog.ouseful.info/2013/02/09/f1stats-correlations-between-qualifying-grid-and-race-classisification/</link>
		<comments>http://blog.ouseful.info/2013/02/09/f1stats-correlations-between-qualifying-grid-and-race-classisification/#comments</comments>
		<pubDate>Sat, 09 Feb 2013 23:17:15 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Rstats]]></category>
		<category><![CDATA[f1datajunkie]]></category>
		<category><![CDATA[f1stats]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9825</guid>
		<description><![CDATA[Following directly on from F1Stats – Visually Comparing Qualifying and Grid Positions with Race Classification, and continuing in my attempt to replicate some of the methodology and results used in A Tale of Two Motorsports: A Graphical-Statistical Analysis of How Practice, Qualifying, and Past Success Relate to Finish Position in NASCAR and Formula One Racing, here&#8217;s [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Following directly on from <a href="http://blog.ouseful.info/2013/01/30/f1stats-visually-comparing-qualifying-and-grid-positions-with-race-classification/">F1Stats – Visually Comparing Qualifying and Grid Positions with Race Classification</a>, and continuing in my attempt to replicate some of the methodology and results used in <a href="http://newton.uor.edu/FacultyFolder/Silva/NASCARvF1.pdf">A Tale of Two Motorsports: A Graphical-Statistical Analysis of How Practice, Qualifying, and Past Success Relate to Finish Position in NASCAR and Formula One Racing</a>, here&#8217;s a quick look at the correlation scores between the final practice, qualifying and grid positions and the final race classification.</p>
<p>I&#8217;ve already done a brief review of what correlation means (sort of) in <a href="http://blog.ouseful.info/2013/01/25/f1stats-a-prequel-to-getting-started-with-rank-correlations/">F1Stats – A Prequel to Getting Started With Rank Correlations</a>, so I&#8217;m going to dive straight in with the results and the R code I used to find the correlations between the different classifications.</p>
<p>Here&#8217;s the answer from the <s>back of the book</s> paper that we&#8217;re aiming for&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f1vnascarcorrelation.png"><img src="http://ouseful.files.wordpress.com/2013/02/f1vnascarcorrelation.png?w=700" alt="F1VNASCARcorrelation"   class="alignnone size-full wp-image-9826" /></a></p>
<p>Here&#8217;s what I got:</p>
<p><small>
<pre>&gt; corrs.df[order(corrs.df$V1),]
              V1   p3pos.int    qpos.int     grid.int racepos.raw    pval.grid    pval.qpos  pval.p3pos
2      AUSTRALIA  0.30075188  0.01503759  0.087218045           1 7.143421e-01 9.518408e-01 0.197072158
13      MALAYSIA  0.42706767  0.57293233  0.630075188           1 3.584362e-03 9.410805e-03 0.061725312
6          CHINA -0.26015038  0.57443609  0.514285714           1 2.183596e-02 9.193214e-03 0.266812583
3        BAHRAIN  0.13082707  0.73233083  0.739849624           1 2.900250e-04 3.601434e-04 0.581232598
16         SPAIN  0.25112782  0.80451128  0.804511278           1 2.179221e-05 2.179221e-05 0.284231482
14        MONACO  0.51578947  0.48120301  0.476691729           1 3.513870e-02 3.326706e-02 0.021403708
17        TURKEY  0.52330827  0.73082707  0.730827068           1 3.756531e-04 3.756531e-04 0.019344720
9  GREAT BRITAIN  0.65413534  0.83007519  0.830075188           1 8.921842e-07 8.921842e-07 0.002260234
8        GERMANY  0.32030075  0.46917293  0.452631579           1 4.657539e-02 3.844275e-02 0.168419054
10       HUNGARY  0.49649123  0.37017544  0.370175439           1 1.194050e-01 1.194050e-01 0.032293715
7         EUROPE  0.28120301  0.72030075  0.720300752           1 4.997719e-04 4.997719e-04 0.228898214
4        BELGIUM  0.06766917  0.62105263  0.621052632           1 4.222076e-03 4.222076e-03 0.777083014
11         ITALY  0.52932331  0.52481203  0.524812030           1 1.895282e-02 1.895282e-02 0.017815489
15     SINGAPORE  0.50526316  0.58796992  0.715789474           1 5.621214e-04 7.414170e-03 0.024579520
12         JAPAN  0.34912281  0.74561404  0.849122807           1 0.000000e+00 3.739715e-04 0.143204045
5         BRAZIL -0.51578947 -0.02105263 -0.007518797           1 9.771776e-01 9.316030e-01 0.021403708
1      ABU DHABI  0.42556391  0.66466165  0.628571429           1 3.684738e-03 1.824565e-03 0.062722332</pre>
<p></small></p>
<p>The paper mistakenly reports the grid correlations as the qualifying ones, so if we look down the <tt>grid.int</tt> column, which I use to hold the correlation values between the <em>grid</em> and final classifications, we see that they broadly match the values quoted in the paper. I also calculated the p-values; these seem to be a little off, but of the right order.</p>
<p>And here&#8217;s the R code I used to get those results&#8230; The first chunk is just the loader, a refinement of the code I have used previously:</p>
<pre class="brush: r; title: ; notranslate">require(RSQLite)
require(reshape)

#Data downloaded from my f1com scraper on scraperwiki
f1 = dbConnect(drv=&quot;SQLite&quot;, dbname=&quot;f1com_megascraper.sqlite&quot;)

getRacesData.full=function(year='2012'){
  #Data query
  results.combined=dbGetQuery(f1,
                              paste('SELECT raceResults.year as year, qualiResults.pos as qpos, p3Results.pos as p3pos, raceResults.pos as racepos, raceResults.race as race, raceResults.grid as grid, raceResults.driverNum as driverNum, raceResults.raceNum as raceNum FROM raceResults, qualiResults, p3Results WHERE raceResults.year==',year,' and raceResults.year = qualiResults.year and raceResults.year = p3Results.year and raceResults.race = qualiResults.race and raceResults.race = p3Results.race and raceResults.driverNum = qualiResults.driverNum and raceResults.driverNum = p3Results.driverNum;',sep=''))
  
  #Data tidying
  results.combined=ddply(results.combined,.(race),mutate,racepos.raw=1:length(race))
  for (i in c('racepos','grid','qpos','p3pos','driverNum'))
    results.combined[[paste(i,'.int',sep='')]]=as.integer( as.character(results.combined[[i]]))
  results.combined$race=reorder(results.combined$race,results.combined$raceNum)
  
  results.combined
}

results.combined=getRacesData.full(2009)</pre>
<p>Here&#8217;s the actual correlation calculation &#8211; I use the <a href="http://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html"><tt>cor</tt> function</a>:</p>
<pre class="brush: r; title: ; notranslate">#The cor() function returns data that looks like:
#            p3pos.int   qpos.int   grid.int racepos.raw
#p3pos.int   1.0000000 0.31578947 0.28270677  0.30075188
#qpos.int    0.3157895 1.00000000 0.97744361  0.01503759
#grid.int    0.2827068 0.97744361 1.00000000  0.08721805
#racepos.raw 0.3007519 0.01503759 0.08721805  1.00000000
#Row/col 4 relates to the correlation with the race classification, so for now just return that

corr.rank.race=function(results.combined,cmethod='spearman'){
  ##Correlations
  corrs=NULL
  #Run through the races
  for (i in levels(factor(results.combined$race))){
    results.classified = subset( results.combined,
                                 race==i,
                                 select=c('p3pos.int','qpos.int','grid.int','racepos.raw'))
    #print(i)
    #print( results.classified)
    cp=cor(results.classified,method=cmethod,use=&quot;complete.obs&quot;)
    #print(cp[4,])
    corrs=rbind(corrs,c(i,cp[4,]))
  }
  corrs.df=as.data.frame(corrs)
  
  signif=data.frame()
  for (i in levels(factor(results.combined$race))){
    results.classified = subset( results.combined,
                                 race==i,
                                 select=c('p3pos.int','qpos.int','grid.int','racepos.raw'))
    #p.value
    pval.grid=cor.test(results.classified$racepos.raw,results.classified$grid.int,method=cmethod,alternative = &quot;two.sided&quot;)$p.value
    pval.qpos=cor.test(results.classified$racepos.raw,results.classified$qpos.int,method=cmethod,alternative = &quot;two.sided&quot;)$p.value
    pval.p3pos=cor.test(results.classified$racepos.raw,results.classified$p3pos.int,method=cmethod,alternative = &quot;two.sided&quot;)$p.value

    signif=rbind(signif,data.frame(race=i,pval.grid=pval.grid,pval.qpos=pval.qpos,pval.p3pos=pval.p3pos))
  }

  corrs.df$qpos.int=as.numeric(as.character(corrs.df$qpos.int))
  corrs.df$grid.int=as.numeric(as.character(corrs.df$grid.int))
  corrs.df$p3pos.int=as.numeric(as.character(corrs.df$p3pos.int))
  
  corrs.df=merge(corrs.df,signif,by.y='race',by.x='V1')
  corrs.df$V1=factor(corrs.df$V1,levels=levels(results.combined$race))
  corrs.df
}

corrs.df=corr.rank.race(results.combined)
corrs.df[order(corrs.df$V1),]</pre>
<p>It&#8217;s then trivial to plot the result:</p>
<pre class="brush: r; title: ; notranslate">require(ggplot2)
xRot=function(g,s=5,lab=NULL) g+theme(axis.text.x=element_text(angle=-90,size=s))+xlab(lab)

g=ggplot(corrs.df)+geom_point(aes(x=V1,y=grid.int))
#Note: ylim(0,1) drops any negative correlations (e.g. Brazil) from the chart, with a warning
g=xRot(g,6)+xlab(NULL)+ylab('Correlation')+ylim(0,1)
g=g+ggtitle('F1 2009 Correlation: grid and final classification')
g</pre>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12009gridfinalcorr.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12009gridfinalcorr.png?w=700" alt="f12009gridfinalcorr"   class="alignnone size-full wp-image-9829" /></a></p>
<p><a href="http://blog.ouseful.info/2013/01/25/f1stats-a-prequel-to-getting-started-with-rank-correlations/">Recalling that</a> there are different types of rank correlation function, specifically &#8220;Kendall’s τ (that is, Kendall’s Tau; this coefficient is based on concordance, which describes how the sign of the difference in rank between pairs of numbers in one data series is the same as the sign of the difference in rank between a corresponding pair in the other data series&#8221;, I wondered whether it would make sense to look at correlations under this measure to see whether there were any obvious looking differences compared to Spearmans&#8217;s rho, that might prompt us to look at the actual grid/race classifications to see which score appears to be more meaningful.</p>
<p>The easiest way to spot the difference is probably graphically:</p>
<pre class="brush: r; title: ; notranslate">corrs.df2=corr.rank.race(results.combined,'kendall')
corrs.df2[order(corrs.df2$V1),]

g=ggplot(corrs.df)+geom_point(aes(x=V1,y=grid.int),col='red',size=4)
g=g+geom_point(data=corrs.df2, aes(x=V1,y=grid.int),col='blue')
g=xRot(g,6)+xlab(NULL)+ylab('Correlation')+ylim(0,1)
g=g+ggtitle('F1 2009 Correlation: grid and final classification')
g</pre>
<p><small>
<pre>corrs.df2[order(corrs.df2$V1),]
              V1   p3pos.int    qpos.int    grid.int racepos.raw    pval.grid    pval.qpos  pval.p3pos
2      AUSTRALIA  0.17894737 -0.01052632  0.04210526           1 8.226829e-01 9.744669e-01 0.288378196
13      MALAYSIA  0.26315789  0.41052632  0.46315789           1 3.782665e-03 1.110136e-02 0.112604127
6          CHINA -0.20000000  0.41052632  0.35789474           1 2.832863e-02 1.110136e-02 0.233266557
3        BAHRAIN  0.07368421  0.51578947  0.52631579           1 8.408301e-04 1.099522e-03 0.677108239
16         SPAIN  0.17894737  0.64210526  0.64210526           1 2.506940e-05 2.506940e-05 0.288378196
14        MONACO  0.38947368  0.35789474  0.35789474           1 2.832863e-02 2.832863e-02 0.016406081
17        TURKEY  0.37894737  0.64210526  0.64210526           1 2.506940e-05 2.506940e-05 0.019784403
9  GREAT BRITAIN  0.46315789  0.63157895  0.63157895           1 3.622261e-05 3.622261e-05 0.003782665
8        GERMANY  0.23157895  0.31578947  0.30526316           1 6.380788e-02 5.475355e-02 0.164976406
10       HUNGARY  0.36842105  0.36842105  0.36842105           1 2.860214e-02 2.860214e-02 0.028602137
7         EUROPE  0.21052632  0.62105263  0.62105263           1 5.176962e-05 5.176962e-05 0.208628398
4        BELGIUM  0.02105263  0.46315789  0.46315789           1 3.782665e-03 3.782665e-03 0.923502331
11         ITALY  0.35789474  0.36842105  0.36842105           1 2.373450e-02 2.373450e-02 0.028328627
15     SINGAPORE  0.35789474  0.45263158  0.55789474           1 3.589956e-04 4.748310e-03 0.028328627
12         JAPAN  0.26315789  0.57894737  0.69590643           1 6.491222e-06 3.109641e-04 0.124796908
5         BRAZIL -0.37894737 -0.05263158 -0.04210526           1 8.226829e-01 7.732195e-01 0.019784403
1      ABU DHABI  0.34736842  0.61052632  0.55789474           1 3.589956e-04 7.321900e-05 0.033643947</pre>
<p></small></p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12009gridracecorrspearmanredvkendallblue.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12009gridracecorrspearmanredvkendallblue.png?w=700" alt="f12009gridracecorrspearmanredvkendallblue"   class="alignnone size-full wp-image-9831" /></a></p>
<p>Hmm&#8230; Kendall gives lower values than Spearman for every race, although for Hungary the two are almost identical (0.370 vs 0.368) &#8211; maybe put that on the &#8220;must look at Hungary compared to the other races&#8221; pile&#8230;;-)</p>
<p>One thing that did occur to me was that I have access to race data from other years, so it shouldn&#8217;t be too hard to see how the correlations play out over the years at different circuits (do grid/race correlations tend to be higher at some circuits, for example?).</p>
<pre class="brush: r; title: ; notranslate">testYears=function(years=2009:2012){
  bd=NULL
  for (year in years) {
    d=getRacesData.full(year)
    corrs.df=corr.rank.race(d)
    bd=rbind(bd,cbind(year,corrs.df))
  }
  bd
}

a=testYears(2006:2012)
ggplot(a)+geom_point(aes(x=year,y=grid.int))+facet_wrap(~V1)+ylim(0,1)

g=ggplot(a)+geom_boxplot(aes(x=V1,y=grid.int))
g=xRot(g)
g
</pre>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f1cirr2006_12.png"><img src="http://ouseful.files.wordpress.com/2013/02/f1cirr2006_12.png?w=700" alt="f1cirr2006_12"   class="alignnone size-full wp-image-9832" /></a></p>
<p>So Spain and Turkey look as if they tend to be processional? Let&#8217;s see if a boxplot bears that out:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12006_12boxplotbycct.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12006_12boxplotbycct.png?w=700" alt="f12006_12boxplotbycct"   class="alignnone size-full wp-image-9835" /></a></p>
<p>How predictable have the years been, year on year?</p>
<pre class="brush: r; title: ; notranslate">g=ggplot(a)+geom_point(aes(x=V1,y=grid.int))+facet_wrap(~year)+ylim(0,1)
g=xRot(g)
g

ggplot(a)+geom_boxplot(aes(x=factor(year),y=grid.int))</pre>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12006_12corrbyyear.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12006_12corrbyyear.png?w=700" alt="f12006_12corrbyyear"   class="alignnone size-full wp-image-9833" /></a></p>
<p>And as a boxplot:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/f12006_12processional.png"><img src="http://ouseful.files.wordpress.com/2013/02/f12006_12processional.png?w=700" alt="f12006_12processional"   class="alignnone size-full wp-image-9834" /></a></p>
<p>From a betting point of view (e.g. <a href="http://blog.ouseful.info/2013/01/28/getting-started-with-f1-betting-data/">Getting Started with F1 Betting Data</a> and <a href="http://blog.ouseful.info/2013/01/16/the-basics-of-betting-as-a-way-of-keeping-score/">The Basics of Betting as a Way of Keeping Score…</a>), it might also make sense to look at the correlation between the P3 times and the qualifying classification, to see whether there is a testable edge in the data when it comes to betting on qualifying.</p>
<p>I think I need to tweak my code slightly to make it easy to pull out correlations between specific columns, but that&#8217;ll have to wait for another day&#8230;</p>
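<p>As a sketch of what that tweak might look like, here&#8217;s a hypothetical helper, <tt>corr.rank.cols()</tt> (not part of the code above), that takes the names of any two columns and returns the per-race rank correlation along with its <tt>cor.test()</tt> p-value. It&#8217;s demonstrated on a made-up two-race data frame; with the real data, something like <tt>corr.rank.cols(results.combined, 'p3pos.int', 'qpos.int')</tt> should give the P3/qualifying correlations mentioned above.</p>
<pre class="brush: r; title: ; notranslate">#Hypothetical helper: per-race rank correlation between any two named columns
corr.rank.cols = function(df, xcol, ycol, cmethod = 'spearman') {
  #drop = TRUE skips factor levels that have no rows
  by.race = split(df, df$race, drop = TRUE)
  rows = lapply(by.race, function(d) {
    #Only use rows where both columns have a value
    ok = complete.cases(d[, c(xcol, ycol)])
    #suppressWarnings() hides warnings about exact p-values when there are ties
    ct = suppressWarnings(cor.test(d[[xcol]][ok], d[[ycol]][ok], method = cmethod))
    data.frame(race = as.character(d$race[1]),
               estimate = unname(ct$estimate),
               p.value = ct$p.value)
  })
  do.call(rbind, rows)
}

#Made-up data for two races
toy = data.frame(
  race      = rep(c('A', 'B'), each = 5),
  p3pos.int = c(1, 2, 3, 4, 5, 2, 1, 3, 5, 4),
  qpos.int  = c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)
)
corr.rank.cols(toy, 'p3pos.int', 'qpos.int')</pre>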
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/ouseful.wordpress.com/9825/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/ouseful.wordpress.com/9825/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=9825&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/02/09/f1stats-correlations-between-qualifying-grid-and-race-classisification/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f1vnascarcorrelation.png" medium="image">
			<media:title type="html">F1VNASCARcorrelation</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12009gridfinalcorr.png" medium="image">
			<media:title type="html">f12009gridfinalcorr</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12009gridracecorrspearmanredvkendallblue.png" medium="image">
			<media:title type="html">f12009gridracecorrspearmanredvkendallblue</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f1cirr2006_12.png" medium="image">
			<media:title type="html">f1cirr2006_12</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12006_12boxplotbycct.png" medium="image">
			<media:title type="html">f12006_12boxplotbycct</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12006_12corrbyyear.png" medium="image">
			<media:title type="html">f12006_12corrbyyear</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/f12006_12processional.png" medium="image">
			<media:title type="html">f12006_12processional</media:title>
		</media:content>
	</item>
		<item>
		<title>Getting Started with F1 Betting Data</title>
		<link>http://blog.ouseful.info/2013/01/28/getting-started-with-f1-betting-data/</link>
		<comments>http://blog.ouseful.info/2013/01/28/getting-started-with-f1-betting-data/#comments</comments>
		<pubDate>Mon, 28 Jan 2013 19:06:15 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[f1stats]]></category>
		<category><![CDATA[Rstats]]></category>
		<category><![CDATA[Uncourse]]></category>
		<category><![CDATA[f1betting]]></category>
		<category><![CDATA[f1datajunkie]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9605</guid>
		<description><![CDATA[As part of my &#8220;learn about Formula One Stats&#8221; journey, one of the things I wanted to explore was how F1 betting odds change over the course of a race weekend, along with how well they predict race weekend outcomes. Courtesy of @flutterF1, I managed to get a peek at some betting data from one [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>As part of my &#8220;<a href="http://blog.ouseful.info/tag/f1stats/?order=asc">learn about Formula One Stats</a>&#8221; journey, one of the things I wanted to explore was how F1 betting odds change over the course of a race weekend, along with how well they predict race weekend outcomes.</p>
<p>Courtesy of @flutterF1, I managed to get a peek at some betting data from one of last year&#8217;s race weekends. In this preliminary post, I&#8217;ll describe some of the ways I started to explore the data, before looking in more detail at what it might be able to tell us in a future post.</p>
<p>(I&#8217;m guessing that it&#8217;s possible to buy historical data(?), as well as to collect it yourself for personal research purposes? e.g. Betfair have <a href="http://bdp.betfair.com/index.php?option=com_content&amp;task=view&amp;id=33&amp;Itemid=62">an API</a>, and there&#8217;s at least one R library to access it: <a href="http://cran.r-project.org/web/packages/betfairly/index.html">betfairly</a>.)</p>
<p>The application I&#8217;ll be using to explore the data is <a href="http://rstudio.org/">RStudio</a>, the cross-platform integrated development environment for the R programming language. Note that I will be making use of some R packages that are not part of the base install, so you will need to install and load them yourself. (I really need to find a robust safe loader that installs any required packages first if they have not already been installed.)</p>
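<p>One possible shape for such a loader is sketched below. <tt>ensure.packages()</tt> is a hypothetical name; it assumes you can reach a CRAN mirror (the <tt>repos</tt> URL is just a default you can change), and it installs any package that isn&#8217;t already available before attaching it.</p>
<pre class="brush: r; title: ; notranslate">#Hypothetical 'safe loader': install any missing packages, then attach them all
ensure.packages = function(pkgs, repos = 'https://cloud.r-project.org') {
  for (p in pkgs) {
    #requireNamespace() checks whether a package is installed without attaching it
    if (!requireNamespace(p, quietly = TRUE)) {
      install.packages(p, repos = repos)
    }
    library(p, character.only = TRUE)
  }
  #Return, invisibly, which of the requested packages are now attached
  invisible(pkgs %in% .packages())
}

#For example, for the packages used in this post:
#ensure.packages(c('gdata', 'ggplot2', 'directlabels', 'reshape'))</pre>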
<p>The data @flutterF1 showed me came in two spreadsheets. The first (filename convention <tt>RACE Betfair Odds Race Winner.xlsx</tt>) appears to contain a list of frequently sampled, timestamped odds for each driver, presumably from Betfair, recorded over the course of the weekend. The second (filename convention <tt>RACE Bookie Odds.xlsx</tt>) has multiple sheets that contain less frequently collected odds from different online bookmakers for each driver on a variety of bets &#8211; race winner, pole position, top 6 finisher, podium, fastest lap, first lap leader, winner of each practice session, and so on.</p>
<p>Both files were supplied as Excel spreadsheets. I guess that many folk who collect betting data store it as spreadsheets, so this recipe for loading spreadsheets into an R environment might be useful to them. The <tt>gdata</tt> library provides hooks for working with Excel documents, so I opted for that.</p>
<p>Let&#8217;s look at the Betfair prices spreadsheet first. The top line is junk, so we&#8217;ll skip it on load, and add in our own column names, based on John&#8217;s description of the data collected in this file:</p>
<blockquote><p>The US Betfair Odds Race Winner.xslx is a raw data collection with 5 columns&#8230;.<br />
1) The timestap (an annoying format but there is a reason for this albeit a pain to work with).<br />
2) The driver.<br />
3) The last price money was traded at.<br />
4) the total amount of money traded on that driver so far.<br />
5) If the race is in ‘In-Play’. True means the race has started – however this goes from the warm up lap, not the actual start.</p>
<p>To reduce the amount of data I only record it when the price traded changes or if the amount changes.</p></blockquote>
<p>Looking through the datafile, there appear to be some gotchas, so these need cleansing out:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/01/datafile-gotchas.png"><img src="http://ouseful.files.wordpress.com/2013/01/datafile-gotchas.png?w=700" alt="datafile gotchas"   class="alignnone size-full wp-image-9607" /></a></p>
<p>Here&#8217;s my initial loader script:</p>
<pre class="brush: r; title: ; notranslate">library(gdata)
xl=read.xls('US Betfair Odds Race Winner.xlsx',skip = 1)
colnames(xl)=c('dt','driver','odds','amount','racing')

#Cleansing pass
bf.odds=subset(xl,racing!='')

str(bf.odds)
#'data.frame':	10732 obs. of  5 variables:
# $ dt    : Factor w/ 2707 levels &quot;11/16/2012 12:24:52 AM&quot;,..: 15 15 15 15 15 15 15 15 15 15 ...
# $ driver: Factor w/ 34 levels &quot; Recoding Began&quot;,..: 19 11 20 16 18 29 26 10 31 17 ...
# $ odds  : num  3.9 7 17 16.5 24 140 120 180 270 550 ...
# $ amount: num  1340 557 120 118 195 ...
# $ racing: int  0 0 0 0 0 0 0 0 0 0 ...

#Generate a proper datetime field from the dt column
#This is a hacked way of doing it. How do I do it properly?
bf.odds$dtt=as.POSIXlt(gsub(&quot;T&quot;, &quot; &quot;, bf.odds$dt))

#If we rerun str(), we get the following extra line in the results:
# $ dtt   : POSIXlt, format: &quot;2012-11-11 11:00:08&quot; &quot;2012-11-11 11:00:08&quot; &quot;2012-11-11 11:00:08&quot; &quot;2012-11-11 11:00:08&quot; ...
</pre>
<p>Here&#8217;s what the raw data, as loaded, looks like to the eye:<br />
<a href="http://ouseful.files.wordpress.com/2013/01/betfair-spreadsheet.png"><img src="http://ouseful.files.wordpress.com/2013/01/betfair-spreadsheet.png?w=700" alt="Betfair spreadsheet"   class="alignnone size-full wp-image-9606" /></a></p>
<p>Having loaded the data, cleansed it, and cast a proper datetime column, it&#8217;s easy enough to generate a few plots:</p>
<pre class="brush: r; title: ; notranslate">#We're going to make use of the ggplot2 graphics library
library(ggplot2)

#Let's get a quick feel for bets around each driver
g=ggplot(bf.odds)+geom_point(aes(x=dtt,y=odds))+facet_wrap(~driver,scales=&quot;free_y&quot;)
g=g+theme(axis.text.x=element_text(angle=-90))
g

#Let's look in a little more detail around a particular driver within a particular time window
g=ggplot(subset(bf.odds,driver==&quot;Lewis Hamilton&quot;))+geom_point(aes(x=dtt,y=odds))+facet_wrap(~driver,scales=&quot;free_y&quot;)
g=g+theme(axis.text.x=element_text(angle=-90))
g=g+ scale_x_datetime(limits=c(as.POSIXct('2012/11/18 18:00:00'), as.POSIXct('2012/11/18 22:00:00')))
g</pre>
<p>Here are the charts (obviously lacking captions, tidy labels and so on).</p>
<p>Firstly, the odds by driver:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/01/odds-by-driver.png"><img src="http://ouseful.files.wordpress.com/2013/01/odds-by-driver.png?w=700" alt="odds by driver"   class="alignnone size-full wp-image-9608" /></a></p>
<p>Secondly, zooming in on a particular driver in a particular time window:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/01/timewindow.png"><img src="http://ouseful.files.wordpress.com/2013/01/timewindow.png?w=700" alt="timewindow"   class="alignnone size-full wp-image-9609" /></a></p>
<p>That all seems to work okay, so how about the other spreadsheet?</p>
<pre class="brush: r; title: ; notranslate">#There are several sheets to choose from, named as follows:
#Race,Pole,Podium,Points,SC,Fastest Lap, Top 6, Hattrick,Highest Scoring,FP1, ReachQ3,FirstLapLeader, FP2, FP3

#Load in data from a particular specified sheet
race.odds=read.xls('USA Bookie Odds.xlsx',sheet='Race')

#The datetime column appears to be in Excel datetime format, so cast it into something meaningful
race.odds$tTime=as.POSIXct((race.odds$Time-25569)*86400, tz=&quot;GMT&quot;,origin=as.Date(&quot;1970-1-1&quot;))
#Note that I am not checking for gotcha rows, though maybe I should...?

#Use the directlabels package to help tidy up the display a little
library(directlabels)

#Let's just check we've got something loaded - prune the display to rule out the longshots
g=ggplot(subset(race.odds,Bet365&lt;30),aes(x=tTime,y=Bet365,group=Runner,col=Runner,label=Runner))
g=g+geom_line()+theme_bw()+theme(legend.position = &quot;none&quot;)
g=g+geom_dl(method=list('top.bumpup',cex=0.6))
g=g+scale_x_datetime(expand=c(0.15,0))
g</pre>
<p>Here&#8217;s a view over the drivers&#8217; odds to win, with the longshots pruned out:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/01/example-race-odds-by-driver.png"><img src="http://ouseful.files.wordpress.com/2013/01/example-race-odds-by-driver.png?w=700" alt="example race odds by driver"   class="alignnone size-full wp-image-9611" /></a></p>
<p>With a little bit of fiddling, we can also look to see how the odds for a particular driver compare for different bookies:</p>
<pre class="brush: r; title: ; notranslate">#Let's see if we can also plot the odds by bookie
colnames(race.odds)
#[1] &quot;Time&quot; &quot;Runner&quot; &quot;Bet365&quot; &quot;SkyBet&quot; &quot;Totesport&quot; &quot;Boylesport&quot; &quot;Betfred&quot;     
# [8] &quot;SportingBet&quot; &quot;BetVictor&quot; &quot;BlueSQ&quot; &quot;Paddy.Power&quot; &quot;Stan.James&quot; &quot;X888Sport&quot; &quot;Bwin&quot;        
#[15] &quot;Ladbrokes&quot; &quot;X188Bet&quot; &quot;Coral&quot; &quot;William.Hill&quot; &quot;You.Win&quot; &quot;Pinnacle&quot; &quot;X32.Red&quot;     
#[22] &quot;Betfair&quot; &quot;WBX&quot; &quot;Betdaq&quot; &quot;Median&quot; &quot;Median..&quot; &quot;Min&quot; &quot;Max&quot;         
#[29] &quot;Range&quot; &quot;tTime&quot;   

#We can remove items from this list using something like this:
tmp=colnames(race.odds)
#tmp=tmp[tmp!='Range']
tmp=tmp[tmp!='Range' &amp; tmp!='Median' &amp; tmp!='Median..' &amp; tmp!='Min' &amp; tmp!= 'Max' &amp; tmp!= 'Time']
#Then we can create a subset of cols
race.odds.data=subset(race.odds,select=tmp)

#Melt the data
library(reshape)
race.odds.data.m=melt(race.odds.data,id=c('tTime','Runner'))

#head( race.odds.data.m)
#                tTime                 Runner variable value
#1 2012-11-11 19:07:01 Sebastian Vettel (Red)   Bet365  2.37
#2 2012-11-11 19:07:01   Lewis Hamilton (McL)   Bet365  3.25
#3 2012-11-11 19:07:01  Fernando Alonso (Fer)   Bet365  6.00
#...

#Now we can plot how the different bookies compare
g=ggplot(subset(race.odds.data.m,value&lt;30 &amp; Runner=='Sebastian Vettel (Red)'),aes(x=tTime,y=value,group=variable,col=variable,label=variable))
g=g+geom_line()+theme_bw()+theme(legend.position = &quot;none&quot;)
g=g+geom_dl(method=list('top.bumpup',cex=0.6))
g=g+scale_x_datetime(expand=c(0.15,0))
g</pre>
<p><a href="http://ouseful.files.wordpress.com/2013/01/bookies-odds.png"><img src="http://ouseful.files.wordpress.com/2013/01/bookies-odds.png?w=700" alt="bookies odds"   class="alignnone size-full wp-image-9610" /></a></p>
<p>Okay, so that all seems to work&#8230; Now I can start pondering what sensible questions to ask&#8230;</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/ouseful.wordpress.com/9605/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/ouseful.wordpress.com/9605/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=9605&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/01/28/getting-started-with-f1-betting-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/datafile-gotchas.png" medium="image">
			<media:title type="html">datafile gotchas</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/betfair-spreadsheet.png" medium="image">
			<media:title type="html">Betfair spreadsheet</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/odds-by-driver.png" medium="image">
			<media:title type="html">odds by driver</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/timewindow.png" medium="image">
			<media:title type="html">timewindow</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/example-race-odds-by-driver.png" medium="image">
			<media:title type="html">example race odds by driver</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/bookies-odds.png" medium="image">
			<media:title type="html">bookies odds</media:title>
		</media:content>
	</item>
		<item>
		<title>My Personal Intro to F1 Race Statistics</title>
		<link>http://blog.ouseful.info/2013/01/11/my-personal-intro-to-f1-race-statistics/</link>
		<comments>http://blog.ouseful.info/2013/01/11/my-personal-intro-to-f1-race-statistics/#comments</comments>
		<pubDate>Fri, 11 Jan 2013 00:07:06 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[f1stats]]></category>
		<category><![CDATA[Rstats]]></category>
		<category><![CDATA[Uncourse]]></category>
		<category><![CDATA[f1datajunkie]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9458</guid>
		<description><![CDATA[One of the many things I keep avoiding is statistics. I&#8217;ve never really been convinced about the 5% significance level thing; as far as I can tell, hardly anything that&#8217;s interesting is normally distributed; all the counting that&#8217;s involved just confuses me; and I never really got to grips with confidently combining probabilities. I find a [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>One of the many things I keep avoiding is statistics. I&#8217;ve <a href="http://stats.stackexchange.com/questions/10510/what-are-good-references-containing-arguments-against-null-hypothesis-significan">never really been convinced about the 5% significance level thing</a>; as far as I can tell, <a href="http://exploringdatablog.blogspot.co.uk/2011/09/long-tail-of-pareto-distribution.html">hardly anything that&#8217;s interesting is normally distributed</a>; all the counting that&#8217;s involved just confuses me; and I never really got to grips with confidently combining probabilities. I find a lot of <a href="http://www.open.edu/openlearn/science-maths-technology/mathematics-and-statistics/statistics/confusing-terms-statistics">statistics-related language impenetrable</a> too, with an obscure vocabulary and some very peculiar usage. (Regular readers of this blog know that&#8217;s true here, as well ;-)</p>
<p>So this year I&#8217;m going to try to do some stats, and use some stats, and see if I can find out from personal need and personal interest whether they lead me to any insights about, or stories hidden within, various data sets I keep playing with. So things like: looking for patterns or trends, looking for outliers, and comparing one thing with another. If I can find any statistics that appear to suggest particular betting odds look particularly favourable, that might be interesting too. (As Nate Silver suggests, betting, even fantasy betting, is a great way of keeping score&#8230;)</p>
<p>Note that what I hope will turn into a series of posts should not be viewed as tutorial notes &#8211; they&#8217;re far more likely to be akin to student notes on a problem set the student is trying to work through, without having attended any of the required courses, and without having taken the time to read through a proper tutorial on the subject. Nor do I intend to set out with a view to learning particular statistical techniques. Instead, I&#8217;ll be dipping into the world of stats looking for useful tools to see if they help me explore particular questions that come to mind, and then trying to apply them cut-and-paste fashion, which is how I approach most of my coding!</p>
<p>Bare naked learning, in other words.</p>
<p>So if you thought I had any great understanding about stats &#8211; in fact, any understanding at all &#8211; I&#8217;m afraid I&#8217;m going to disabuse you of that notion. As to my use of the R statistical programming language, that&#8217;s been pretty much focussed on using it for generating graphics in a hacky way. (I&#8217;ve also found it hard, in the past, plotting pixels on screen and page in a programmable way, but R graphics libraries such as ggplot2 make it really easy at a high level of abstraction&#8230;:-)</p>
<p>That&#8217;s the setting then&#8230; Now: #f1stats. What&#8217;s that all about?</p>
<p>Notwithstanding the above (that this isn&#8217;t about learning a particular set of stats methods defined in advance) I did do a quick trawl looking for &#8220;F1 stats tutorials&#8221; to see if there were any that I could crib from directly; but my search didn&#8217;t turn up much that was directly and immediately useful (if you know of anything that might be, please post a link in the comments). There were a few things that looked like they might be interesting, so here&#8217;s a quick dump of the relevant&#8230;</p>
<ul>
<li>First up, I&#8217;ve been reading Nate Silver&#8217;s <a href="http://www.amazon.co.uk/The-Signal-Noise-Science-Prediction/dp/1846147522/?tag=ouseful-21">The Signal and the Noise</a>, which mentions the aging stats and aging models for baseball players. I found a paper on <a href="http://www.csef.it/WP/wp226.pdf">The Age Productivity Gradient: Evidence from a sample of F1 Drivers</a>, which hasn&#8217;t got too many scary equations in, so I may try to replicate that and then bring the models up to date (the paper is dated 2009). It would have been so nice if the authors had published code equivalents in R that I could have played with directly, but I haven&#8217;t been able to find it if they did. I also found a paper on <a href="http://cowles.econ.yale.edu/P/cp/p12b/p1255.pdf">Estimated Age Effects in Baseball</a>, again with equations but no code, but it may provide additional clues. From a quick skim, I think there may be some mileage in trying to get my head round different ways of comparing <em>rankings</em>.</li>
<li><a href="http://newton.uor.edu/FacultyFolder/Silva/NASCARvF1.pdf">A Tale of Two Motorsports: A Graphical-Statistical Analysis of How Practice, Qualifying, and Past Success Relate to Finish Position in NASCAR and Formula One Racing</a> is perhaps an easier thing to try to copy for starters, though.</li>
<li>The article <a href="http://journal.sjdm.org/11/rh18/rh18.html">The wisdom of ignorant crowds: predicting sport outcomes by mere recognition</a> explores a simple strategy for predicting tournament winners based on how recognisable the names of the competitors are. (I guess social media metrics might be a proxy for recognition? Hmm&#8230; I suppose I could test that with reference to this paper?) What caught my eye were a couple of simple schemes for benchmarking different prediction models, which I might be able to draw on if I end up exploring prediction models.</li>
<li>NASCAR results have featured in several papers (I think there&#8217;s also a NASCAR dataset available in R?), so I&#8217;ll probably try dipping into them at some point to see if I can do similar things with F1 data. For example, an analysis of <a href="http://www.amstat.org/publications/jse/v14n3/datasets.winner.html">NASCAR Winston Cup Race Results for 1975-2003</a>; a couple of papers on <a href="http://madison.byu.edu/racing/racing.html">hierarchical modelling of auto-racing results</a>; and some <a href="http://www.stats.ox.ac.uk/~doucet/caron_doucet_bayesianbradleyterry.pdf">Bayesian</a> <a href="http://research.microsoft.com/pubs/81134/plackett.pdf">inference</a> <a href="http://sites.stat.psu.edu/~dhunter/papers/bt.pdf">stuff</a> that I guess is really beyond me for now, and for which I really, really could do with pre-built R libraries.</li>
<li>An MSc thesis I&#8217;ve referred to before on <a href="http://www.enm.bris.ac.uk/teaching/projects/2008_09/ih5137/">Prediction of Formula One Race Results Using Driver Characteristics</a> has some handy ideas that I might be able to draw on if I have a look at laptime data.</li>
<li>One of the things I&#8217;ve been pondering is how to rank drivers based on fast lap times (e.g. during qualifying vs. during the race). Although not about motor sport, or any sort of racing, <a href="http://www.thesportjournal.org/article/new-method-ranking-total-driving-performance-pga-tour">A New Method for Ranking Total Driving Performance on the PGA Tour</a> has a metric I may be able to bastardise in a Formula One context. The same periodical also has an article on <a href="http://www.thesportjournal.org/article/do-reliable-predictors-exist-outcomes-nascar-races">Do Reliable Predictors Exist for the Outcomes of NASCAR Races?</a>, the techniques of which might be applicable to F1? <a href="http://cluteonline.com/journals/index.php/JBER/article/viewArticle/2403">Predicting The Outcome Of NASCAR Races: The Role Of Driver Experience</a> looks to be in a similar vein too&#8230;</li>
<li>A paper on <a href="http://belkcollegeofbusiness.uncc.edu/jberkow1/BerkowitzJSE.pdf">Outcome Uncertainty in NASCAR</a> looks at how attendance and TV audience figures are influenced by race expectations, which might be something that could also be explored in the context of UK F1 TV audience figures. That said, the notion of &#8220;outcome uncertainty&#8221; itself, and related measures, might also be worth exploring in their own right?</li>
</ul>
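<p>Several of these papers come down to comparing rankings, for example grid position against finishing position. Two standard measures are Spearman's rank correlation and Kendall's tau. Below is a minimal, dependency-free sketch of both. It assumes that both rankings cover the same drivers and that there are no tied positions. The positions in the example are invented. In practice, <code>scipy.stats.spearmanr</code> and <code>scipy.stats.kendalltau</code> in Python, or <code>cor(x, y, method="spearman")</code> in R, handle ties properly.</p>

```python
from itertools import combinations
from typing import Dict


def spearman_rho(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Spearman's rho for two untied rankings: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d_squared = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_tau(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs."""
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Positive product: both rankings put x and y in the same order
        product = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if product > 0:
            concordant += 1
        elif product != 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Invented grid and finishing positions for four drivers
grid = {"VET": 1, "HAM": 2, "WEB": 3, "ALO": 4}
finish = {"VET": 2, "HAM": 1, "WEB": 4, "ALO": 3}
print(spearman_rho(grid, finish), kendall_tau(grid, finish))
```

<p>Both measures run from -1 (order completely reversed) to +1 (identical order). Kendall's tau counts pairwise swaps, so it is often the easier one to explain in terms of overtakes.</p>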
<p>If you know of any other relevant looking papers or articles, please post a link in the comments.</p>
<p>[MORE LINKS...<br />
- <a href="http://david.stadelmann-online.com/pdf/0015_formula1.pdf">Who is the Best Formula 1 Driver? An Econometric Analysis</a><br />
]</p>
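<p>The Bradley-Terry model mentioned in the Bayesian links above compares competitors pairwise. Its usual generalisation to full finishing orders is the Plackett-Luce model, which is easier to grasp from a worked example than from the equations. Each driver gets a positive strength. The winner is drawn from the whole field in proportion to strength, second place from whoever remains, and so on. This minimal sketch only evaluates the probability of an order for given, invented strengths. Fitting strengths from real race results is a separate step that is not shown here.</p>

```python
import math
from itertools import permutations
from typing import Dict, Sequence


def plackett_luce_log_prob(order: Sequence[str], strength: Dict[str, float]) -> float:
    """Log-probability of a complete finishing order under the Plackett-Luce model.

    Position by position, the next finisher is drawn from the drivers not yet
    placed, with probability proportional to their (positive) strength.
    """
    log_p = 0.0
    for i, driver in enumerate(order):
        still_running = sum(strength[d] for d in order[i:])
        log_p += math.log(strength[driver] / still_running)
    return log_p


# Invented strengths for a four-car field
strength = {"VET": 4.0, "ALO": 3.0, "HAM": 2.0, "BUT": 1.0}
p_order = math.exp(plackett_luce_log_prob(["VET", "ALO", "HAM", "BUT"], strength))
# Sanity check: probabilities over every possible finishing order should sum to 1
total = sum(math.exp(plackett_luce_log_prob(order, strength)) for order in permutations(strength))
print(p_order, total)
```

<p>Working with log-probabilities avoids numerical underflow when multiplying many small factors together for a full grid.</p>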
<p>I was hoping to finish this post with a couple of quick R hacks around some F1 datasets, but I&#8217;ve just noticed that today, as in yesterday, has become tomorrow, as in today, and this post is probably already long enough&#8230; So it&#8217;ll have to wait for another day&#8230;</p>
<p>PS FWIW, I also note the arrival of the <a href="http://analytics.theiegroup.com/sports-london?sf8427527=1">Sports Analytics Innovation Summit</a> in London in March&#8230; I doubt I have the impact required to make it as a <a href="http://analytics.theiegroup.com/sports-london/media-partners">media partner</a> though&#8230; Although maybe OpenLearn does&#8230;?!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/01/11/my-personal-intro-to-f1-race-statistics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>
	</item>
		<item>
		<title>Emergent Social Interest Mapping &#8211; Red Bull Racing Facebook Group</title>
		<link>http://blog.ouseful.info/2012/12/05/emergent-social-interest-mapping-red-bull-racing-facebook-group/</link>
		<comments>http://blog.ouseful.info/2012/12/05/emergent-social-interest-mapping-red-bull-racing-facebook-group/#comments</comments>
		<pubDate>Wed, 05 Dec 2012 23:29:13 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[ESP]]></category>
		<category><![CDATA[f1datajunkie]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9258</guid>
		<description><![CDATA[With the possibility that my effectively unlimited Twitter API key will die at some point in the Spring with the Twitter API upgrade, I&#8217;m starting to look around for alternative sources of interest signal (aka getting ready to say &#8220;bye, bye, Twitter interest mapping&#8221;). And Facebook groups look like they may offer one possibility&#8230; Some [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>With the possibility that my effectively unlimited Twitter API key <a href="https://dev.twitter.com/calendar">will die at some point in the Spring</a> with the Twitter API upgrade, I&#8217;m starting to look around for alternative sources of interest signal (aka getting ready to say &#8220;bye, bye, Twitter interest mapping&#8221;). And Facebook groups look like they may offer one possibility&#8230;</p>
<p>Some time ago, I did a demo of how to map the common <em>Facebook Likes</em> of my Facebook friends (<a href="http://blog.ouseful.info/2012/01/04/social-interest-positioning-visualising-facebook-friends-likes/">Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grabbed Using Google Refine</a>). In part inspired by a conversation today about profiling the interests of members of particular Facebook groups, I thought I&#8217;d have a quick peek at the Facebook API to see if it&#8217;s possible to grab the membership list of arbitrary, open Facebook groups, and then pull down the list of <em>Likes</em> made by the members of the group.</p>
<p>As with my other social positioning/social interest mapping experiments, the idea behind this approach is broadly this: users express interest through some sort of public action, such as following a particular Twitter account that can be associated with a particular interest. In this case, the signal I&#8217;m associating with an expression of interest is a Facebook Like. To locate something in interest space, we need to be able to detect a set of users associated with that thing, identify each of their interests, and then find interests they have in common. These shared interests (ideally over and above a &#8220;background level of shared interest&#8221;, aka the Stephen Fry effect, named after the way that, on Twitter, a large number of people in any set of people appear to follow Stephen Fry irrespective of any other, more pertinent shared interests peculiar to that set) are then assumed to be representative of the interests associated with the thing. In this case, the thing is a Facebook group, the users associated with the thing are the group members, and the interests associated with the thing are the things commonly liked by members of the group.</p>
<p>Simples.</p>
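<p>The counting at the heart of this approach can be sketched in a few lines of Python. For each liked page, count how many distinct group members like it, then keep only the pages liked by at least some minimum number of members. The input structure (a mapping from member ID to the set of page names they like), the <code>min_members</code> threshold and the example data are all hypothetical. The Python script in this post itself just writes out member-to-like edges for Gephi rather than doing this counting.</p>

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def common_interests(likes_by_member: Dict[str, Set[str]], min_members: int = 2) -> List[Tuple[str, int]]:
    """Pages liked by at least min_members distinct members, most shared first."""
    counts: Counter = Counter()
    for likes in likes_by_member.values():
        counts.update(set(likes))  # count each page at most once per member
    return [(page, n) for page, n in counts.most_common() if n >= min_members]


# Hypothetical group: member IDs and their public likes
members = {
    "u1": {"Red Bull Racing", "Formula 1", "Stephen Fry"},
    "u2": {"Formula 1", "Sebastian Vettel", "Stephen Fry"},
    "u3": {"Formula 1", "Red Bull Racing"},
    "u4": set(),  # no public likes, e.g. because of privacy settings
}
print(common_interests(members, min_members=2))
```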
<p>So for example, here is the social interest positioning of the <a href="https://www.facebook.com/groups/2311573955/">Red Bull Racing group</a> on Facebook, based on a sample of 3000 members of the group. Note that a significant number of these members returned no likes, either because they haven&#8217;t liked anything, or because their personal privacy settings are such that they do not publicly share their likes. </p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/rbr_fbgroup_commonlikes.png?w=700&#038;h=700" alt="rbr_fbGroup_commonLikes" width="700" height="700" class="alignnone size-full wp-image-9259" /></p>
<p>As we might expect, the members of this group also appear to have an interest in other Formula One related topics, from F1 in general, to various F1 teams and drivers, and to motorsport and motoring in general (top half of the map). We also find music preferences (the cluster to the left of the map) and TV programmes (centre bottom of the map) that are of common interest, though I have no idea yet whether these are background radiation interests (that is, the Facebook equivalent of the Stephen Fry effect on Twitter) or are peculiar to this group. I&#8217;m not sure whether the cluster of beverage related preferences at the bottom right corner of the map is notable either?</p>
<p>The map was drawn in Gephi, using data grabbed via the following Python script:</p>
<pre class="brush: python; title: ; notranslate">#This is a really simple script:
##Grab the list of members of a Facebook group (no paging as yet...)
###For each member, try to grab their Likes
#Note: this is Python 2 code, written against the 2012 Facebook Graph API

import urllib,simplejson,csv,argparse,sys

#Grab a copy of a current token from an example Facebook API call.
#Something a bit like this:
#AAAAAAITEghMBAOMYrWLBTYpf9ciZBLXaw56uOt2huS7C4cCiOiegEZBeiZB1N4ZCqHgQZDZD

parser = argparse.ArgumentParser(description='Generate social positioning map around a Facebook group')

parser.add_argument('-gid',default='2311573955',help='Facebook group ID')
#gid='2311573955'

parser.add_argument('-FBTOKEN',required=True,help='Facebook API token')

#Parse the command line so that gid and FBTOKEN are actually defined
args=parser.parse_args()
gid=args.gid
FBTOKEN=args.FBTOKEN

#Quick test - output file is simple 2 column CSV that we can render in Gephi
fn='fbgroupliketest_'+str(gid)+'.csv'
writer=csv.writer(open(fn,'wb'),quoting=csv.QUOTE_ALL)

#IDs of users whose likes we have already fetched
uids=set()

def getGroupMembers(gid):
	gurl='https://graph.facebook.com/'+str(gid)+'/members?limit=5000&amp;access_token='+FBTOKEN
	data=simplejson.load(urllib.urlopen(gurl))
	if &quot;error&quot; in data:
		print &quot;Something seems to be going wrong - check OAUTH key?&quot;
		print data['error']['message'],data['error']['code'],data['error']['type']
		sys.exit(-1)
	else:
		return data

#Grab the likes for a particular Facebook user by Facebook User ID
def getLikes(uid,gid):
	#Should probably implement at least a simple cache here
	lurl=&quot;https://graph.facebook.com/&quot;+str(uid)+&quot;/likes?access_token=&quot;+FBTOKEN
	ldata=simplejson.load(urllib.urlopen(lurl))
	#print ldata

	#Users with private likes (or an error response) may not return a 'data' list
	for i in ldata.get('data',[]):
		if 'name' in i:
			writer.writerow([str(uid),i['name'].encode('ascii','ignore')])
			#We could colour nodes based on category, etc, though would require richer output format.
			#In the past, I have used the networkx library to construct &quot;native&quot; graph based representations of interest networks.
			if 'category' in i:
				print str(uid),i['name'],i['category']

#For each user in the group membership list, get their likes
def parseGroupMembers(groupData,gid):
	#x counts the members processed so far; initialise it once, outside the loop
	x=0
	for user in groupData['data']:
		uid=user['id']
		writer.writerow([str(uid),str(gid)])
		#Prevent duplicate fetches
		if uid not in uids:
			getLikes(uid,gid)
			uids.add(uid)
			#Really crude progress reporting
			print x
			x=x+1
	#need to handle paging?
	#parse next page URL and recall this function


groupdata=getGroupMembers(gid)
parseGroupMembers(groupdata,gid)</pre>
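<p>On the paging question left open at the end of the script: as far as I know, Graph API list responses include a <code>paging</code> object whose <code>next</code> field, when present, is a ready-made URL for the following page. Below is a minimal Python 3 sketch of following those links. The <code>fetch_json</code> argument is a hypothetical stand-in for whatever performs the HTTP request and JSON decoding (for example <code>json.load(urllib.request.urlopen(url))</code>). It is passed in so that the loop can be exercised without a network connection or a token. The <code>max_pages</code> cap is also my own addition, as a guard against runaway loops.</p>

```python
from typing import Callable, Dict, Iterator, Optional


def iter_graph_items(first_url: str, fetch_json: Callable[[str], Dict],
                     max_pages: int = 100) -> Iterator[Dict]:
    """Yield each item in 'data' across successive pages, following paging['next'] links."""
    url: Optional[str] = first_url
    pages = 0
    while url and max_pages > pages:
        page = fetch_json(url)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "Graph API error"))
        yield from page.get("data", [])
        # Assumed layout: {"data": [...], "paging": {"next": next_page_url}}
        url = page.get("paging", {}).get("next")
        pages += 1


# Stand-in for HTTP + JSON decoding, so the loop can be exercised offline
fake_pages = {
    "page1": {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "3"}], "paging": {}},
}
print([member["id"] for member in iter_graph_items("page1", fake_pages.__getitem__)])
```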
<p>Note that I have no idea whether or not this is in breach of Facebook API terms and conditions, nor have I reflected on the ethical implications of running this sort of analysis, over and above remarking that it&#8217;s the same general approach I apply to mapping social interests on Twitter.</p>
<p>As to where next with this? It brings into focus again the question of identifying common interests pertinent to this particular group, compared to background popular interest that might be expressed by any random set of people. But having got a new set of data to play with, it will perhaps make it easier to test the generalisability of any model or technique I do come up with for filtering out, or normalising against, background interest.</p>
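<p>One simple way of normalising against background interest is to compare the share of group members who like a page with the share of a background sample of users who like it. This is offered as a sketch rather than a settled method. The resulting ratio (often called lift) sits near or below 1 for Stephen Fry-style interests that are popular everywhere, and well above 1 for interests peculiar to the group. Where the background sample comes from is left open, and the counts in the example are invented. The smoothing constant <code>alpha</code> is an assumption added to avoid dividing by zero for pages never seen in the background sample.</p>

```python
from typing import Dict, List, Tuple


def lift_scores(group_counts: Dict[str, int], group_size: int,
                background_counts: Dict[str, int], background_size: int,
                alpha: float = 1.0) -> List[Tuple[str, float]]:
    """Ratio of in-group like rate to (smoothed) background like rate, highest first."""
    scores = []
    for page, n in group_counts.items():
        group_rate = n / group_size
        # Additive smoothing so pages unseen in the background don't divide by zero
        background_rate = (background_counts.get(page, 0) + alpha) / (background_size + 2 * alpha)
        scores.append((page, group_rate / background_rate))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented counts: likes among 1000 group members vs 1000 background users
group = {"Formula 1": 300, "Red Bull Racing": 250, "Stephen Fry": 120}
background = {"Formula 1": 40, "Red Bull Racing": 20, "Stephen Fry": 300}
for page, score in lift_scores(group, 1000, background, 1000):
    print(f"{page}: {score:.2f}")
```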
<p>Other directions this could go? Using a single group to bootstrap a walk around the interest space? For example, in the above case, trying to identify groups associated with Sebastian Vettel, or F1, and then repeating the process? It might also make sense to look at the categories of the notable shared interests (from a quick browse, these include, for example, things like <em>Movie</em>, <em>Product/service</em>, <em>Public figure</em>, <em>Games/toys</em>, <em>Sports Company</em>, <em>Athlete</em>, <em>Interest</em>, <em>Sport</em>). Is there a full vocabulary available, I wonder? How might we use this information?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/12/05/emergent-social-interest-mapping-red-bull-racing-facebook-group/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/rbr_fbgroup_commonlikes.png" medium="image">
			<media:title type="html">rbr_fbGroup_commonLikes</media:title>
		</media:content>
	</item>
		<item>
		<title>More Shiny Goodness &#8211; Tinkering With the Ergast Motor Racing Data API</title>
		<link>http://blog.ouseful.info/2012/12/04/more-shiny-goodness-tinkering-with-the-ergast-motor-racing-data-api/</link>
		<comments>http://blog.ouseful.info/2012/12/04/more-shiny-goodness-tinkering-with-the-ergast-motor-racing-data-api/#comments</comments>
		<pubDate>Tue, 04 Dec 2012 14:14:40 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Rstats]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[ergastAPI]]></category>
		<category><![CDATA[f1datajunkie]]></category>
		<category><![CDATA[Rstudio]]></category>
		<category><![CDATA[Shiny]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9230</guid>
		<description><![CDATA[I had a bit of a play over the weekend with the Ergast Motor Racing Data API and the magical Shiny library for R, which makes building interactive, browser-based applications around R a breeze. As this is just a quick heads-up/review post, I&#8217;ll largely limit myself to a few screenshots. When I [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I had a bit of a play over the weekend with the <a href="http://ergast.com/mrd/">Ergast Motor Racing Data API</a> and the magical <a href="http://www.rstudio.com/shiny/">Shiny library for R</a>, which makes building interactive, browser-based applications around R a breeze.</p>
<p>As this is just a quick heads-up/review post, I&#8217;ll largely limit myself to a few screenshots. When I get a chance, I&#8217;ll try to do a bit more of a write-up, though this may actually just take the form of more elaborate documentation of the app, both within the code and in the form of explanatory text in the app itself.</p>
<p>If you want to try out the app, you can find an instance here: <a href="http://glimmer.rstudio.com/psychemedia/f1ergastdemo">F1 2012 Laptime Explorer</a>. The <a href="https://gist.github.com/4188912">code is also available</a>.</p>
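<p>Much of the effort went into parsing the data. As a rough guide, here is a minimal Python sketch of flattening lap timing data into (lap, driver, position, seconds) rows. It assumes that the Ergast laps endpoint returns JSON laid out, as far as I understand it, as <code>MRData.RaceTable.Races[].Laps[].Timings[]</code>, with lap times as <code>m:ss.sss</code> strings. Check this against a live response before relying on it. The sample data is invented, and the app itself does its parsing in R, so this is not the app's code.</p>

```python
from typing import Dict, List, Tuple


def laptime_to_seconds(text: str) -> float:
    """Convert a lap time string such as '1:39.264' (or '59.123') into seconds."""
    minutes, _, seconds = text.rpartition(":")
    return (int(minutes) * 60 if minutes else 0) + float(seconds)


def flatten_laps(ergast_json: Dict) -> List[Tuple[int, str, int, float]]:
    """Flatten (assumed) Ergast laps JSON into (lap, driverId, position, seconds) tuples."""
    rows = []
    for race in ergast_json["MRData"]["RaceTable"]["Races"]:
        for lap in race.get("Laps", []):
            for timing in lap["Timings"]:
                rows.append((int(lap["number"]), timing["driverId"],
                             int(timing["position"]), laptime_to_seconds(timing["time"])))
    return rows


# Invented sample in the assumed layout
sample = {"MRData": {"RaceTable": {"Races": [{"Laps": [
    {"number": "1", "Timings": [
        {"driverId": "button", "position": "1", "time": "1:39.264"},
        {"driverId": "hamilton", "position": "2", "time": "1:40.105"},
    ]},
]}]}}}
for row in flatten_laps(sample):
    print(row)
```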
<p>Here&#8217;s the initial view: the first race of the season is selected by default and its data loaded in. The driver list covers all the drivers who took part in the season.</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/f1-2012-shiny-ergast-explorer.png?w=700&#038;h=404" alt="f1 2012 shiny ergast explorer" width="700" height="404" class="alignnone size-full wp-image-9254" /></p>
<p>The driver selectors let us display traces for just the selected drivers.</p>
<p>The <em>Race History chart</em> is a classic results chart. For each driver, on each lap, it shows the difference between that driver&#8217;s race time to date and the winner&#8217;s average lap time multiplied by the lap number. (As such, this is an offline statistic: it can only be calculated once the winner&#8217;s overall average laptime is known.)</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/race-hisotry-classic-chart.png?w=700&#038;h=350" alt="race history - classic chart" width="700" height="350" class="alignnone size-full wp-image-9253" /></p>
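<p>In code, the Race History calculation might look something like this minimal sketch. It assumes complete per-lap times for each driver and takes the winner&#8217;s mean lap time as the reference pace. It uses the sign convention reference minus driver, so a driver running ahead of that pace sits above zero. The function name and the lap times are hypothetical.</p>

```python
from itertools import accumulate
from typing import Dict, List


def race_history(lap_times: Dict[str, List[float]], winner: str) -> Dict[str, List[float]]:
    """Per-lap gap between a reference pace (winner's mean lap) and each driver's race time.

    Value at lap n = n * winner_mean_lap - driver's cumulative time after n laps,
    so positive values mean the driver is ahead of the reference pace.
    """
    winner_laps = lap_times[winner]
    # The winner's average lap is only known once the race is over (an offline statistic)
    winner_mean_lap = sum(winner_laps) / len(winner_laps)
    return {
        driver: [n * winner_mean_lap - elapsed
                 for n, elapsed in enumerate(accumulate(laps), start=1)]
        for driver, laps in lap_times.items()
    }


# Invented three-lap race, lap times in seconds
laps = {"VET": [90, 92, 91], "ALO": [91, 91, 93]}
history = race_history(laps, winner="VET")
print(history)
```

<p>By construction, the winner&#8217;s trace always finishes at zero, and lapped drivers simply have shorter traces.</p>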
<p>Variants of the classic Race History chart are possible, for example using different baseline times, but I haven&#8217;t implemented any of them (or the necessary UI controls). Yet&#8230;</p>
<p>The <em>Lap Chart</em> is another classic:</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/lap-chart-another-classic.png?w=700&#038;h=356" alt="Lap chart - another classic" width="700" height="356" class="alignnone size-full wp-image-9252" /></p>
<p>Annotations for this chart are also supported, describing every driver whose final status was not &#8220;Finished&#8221;.</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/lap-chart-with-annotations.png?w=700&#038;h=344" alt="lap chart with annotations" width="700" height="344" class="alignnone size-full wp-image-9251" /></p>
<p>The <em>Lap Evolution chart</em> shows how each driver&#8217;s laptime evolved over the course of the race compared with the fastest overall recorded laptime.</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/lap-evolution.png?w=700&#038;h=348" alt="Lap evolution" width="700" height="348" class="alignnone size-full wp-image-9250" /></p>
<p>The <em>Personal Lap Evolution</em> chart shows how each driver&#8217;s laptime evolved over the course of the race compared with their personal fastest laptime.</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/personal-lap-evolution.png?w=700&#038;h=349" alt="Personal lap evolution" width="700" height="349" class="alignnone size-full wp-image-9249" /></p>
<p>The <em>Personal Deltas Chart</em> shows the difference between one laptime and the next for each driver.</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/personal-deltas.png?w=700&#038;h=366" alt="Personal deltas" width="700" height="366" class="alignnone size-full wp-image-9248" /></p>
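<p>The three lap-time charts above each boil down to simple per-lap arithmetic. Here is a minimal sketch, assuming that each driver&#8217;s lap times are held as a list of seconds in lap order (the data is invented). Lap evolution subtracts the fastest lap anyone recorded. Personal lap evolution subtracts the driver&#8217;s own fastest lap. Personal deltas subtract the driver&#8217;s previous lap.</p>

```python
from typing import Dict, List


def lap_evolution(lap_times: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Each lap minus the fastest lap recorded by any driver in the race."""
    overall_best = min(min(laps) for laps in lap_times.values())
    return {d: [t - overall_best for t in laps] for d, laps in lap_times.items()}


def personal_lap_evolution(lap_times: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Each lap minus that driver's own fastest lap."""
    return {d: [t - min(laps) for t in laps] for d, laps in lap_times.items()}


def personal_deltas(lap_times: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Difference between each lap and the driver's previous lap (from lap 2 on)."""
    return {d: [b - a for a, b in zip(laps, laps[1:])] for d, laps in lap_times.items()}


# Invented lap times in seconds
laps = {"VET": [95.0, 92.5, 92.0], "ALO": [96.0, 93.0, 93.5]}
print(lap_evolution(laps))
print(personal_lap_evolution(laps))
print(personal_deltas(laps))
```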
<p>The <em>Race Summary Chart</em> is a chart of my own design that tries to capture notable features relating to race position: the grid position (blue circle), the final classification (red circle) and the position at the end of the first lap (the + or horizontal bar). The violin plot shows the distribution of how many laps the driver spent in each race position; where the violin is wide, the driver spent a large number of laps in that position.</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/race-summary.png?w=700&#038;h=422" alt="race summary" width="700" height="422" class="alignnone size-full wp-image-9247" /></p>
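<p>The data behind the Race Summary chart can be pulled together in a few lines of Python. This sketch assumes a lap chart held as a mapping from each driver to their race position at the end of every lap, plus a separate grid mapping. The names and numbers are invented. It approximates the final classification by the position on the last completed lap, which is an assumption, because the official classification can differ for retirements. The per-position lap counts are what the violin plot summarises.</p>

```python
from collections import Counter
from typing import Dict, List


def race_summary(positions_by_lap: Dict[str, List[int]], grid: Dict[str, int]) -> Dict[str, Dict]:
    """Grid slot, first-lap position, last recorded position and laps spent in each position."""
    summary = {}
    for driver, positions in positions_by_lap.items():
        summary[driver] = {
            "grid": grid[driver],
            "first_lap": positions[0],
            # Position on the last completed lap (stand-in for the final classification)
            "final": positions[-1],
            # Laps spent in each position: the distribution the violin plot draws
            "laps_in_position": dict(Counter(positions)),
        }
    return summary


# Invented five-lap race for three drivers: race position at the end of each lap
positions_by_lap = {"VET": [3, 2, 2, 1, 1], "HAM": [1, 1, 1, 2, 2], "WEB": [2, 3, 3, 3, 3]}
grid = {"VET": 2, "HAM": 1, "WEB": 3}
for driver, row in race_summary(positions_by_lap, grid).items():
    print(driver, row)
```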
<p>The x-axis ordering pulls out different features about how the race progressed. I need to add in a control that lets the user select different orderings.</p>
<p>Finally, the <em>Fast Lap text scatterplot</em> shows the fastest laptime for each driver and the lap at which they recorded it.</p>
<p><img src="http://ouseful.files.wordpress.com/2012/12/fastlaps.png?w=700&#038;h=357" alt="fastlaps" width="700" height="357" class="alignnone size-full wp-image-9246" /></p>
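<p>The Fast Lap chart only needs each driver&#8217;s minimum lap time and the lap on which it was set. Here is a minimal sketch, assuming per-driver lists of lap times in seconds (the data is invented). If a driver sets the same best time twice, this version reports the earlier lap.</p>

```python
from typing import Dict, List, Tuple


def fastest_laps(lap_times: Dict[str, List[float]]) -> Dict[str, Tuple[int, float]]:
    """For each driver, (lap number, lap time) of their fastest lap; lap numbers start at 1."""
    result = {}
    for driver, laps in lap_times.items():
        if not laps:
            continue  # driver completed no laps
        # min() returns the first index among equal times, i.e. the earlier lap
        best_index = min(range(len(laps)), key=laps.__getitem__)
        result[driver] = (best_index + 1, laps[best_index])
    return result


# Invented lap times in seconds
laps = {"VET": [95.0, 92.5, 92.0], "ALO": [96.0, 93.0, 93.5]}
print(fastest_laps(laps))
```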
<p>So, that&#8217;s a quick review of the app. All in all it took maybe 3 hours getting my head round the data parsing, 2-3 hours figuring out what I wanted to do and learning how to do it in Shiny, and a couple of hours doing it and starting to document/annotate it. Next time, it&#8217;ll be much quicker&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/12/04/more-shiny-goodness-tinkering-with-the-ergast-motor-racing-data-api/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/f1-2012-shiny-ergast-explorer.png" medium="image">
			<media:title type="html">f1 2012 shiny ergast explorer</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/race-hisotry-classic-chart.png" medium="image">
			<media:title type="html">race hisotry - classic chart</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/lap-chart-another-classic.png" medium="image">
			<media:title type="html">Lap chart - another classic</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/lap-chart-with-annotations.png" medium="image">
			<media:title type="html">lap chart with annotations</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/lap-evolution.png" medium="image">
			<media:title type="html">Lap evolution</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/personal-lap-evolution.png" medium="image">
			<media:title type="html">Personal lap evolution</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/personal-deltas.png" medium="image">
			<media:title type="html">Personal deltas</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/race-summary.png" medium="image">
			<media:title type="html">race summary</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/12/fastlaps.png" medium="image">
			<media:title type="html">fastlaps</media:title>
		</media:content>
	</item>
		<item>
		<title>Interactive Scenarios With Shiny &#8211; The Race to the F1 2012 Drivers&#8217; Championship</title>
		<link>http://blog.ouseful.info/2012/11/18/interactive-scenarios-with-shiny-the-race-to-the-f1-2012-drivers-championship/</link>
		<comments>http://blog.ouseful.info/2012/11/18/interactive-scenarios-with-shiny-the-race-to-the-f1-2012-drivers-championship/#comments</comments>
		<pubDate>Sun, 18 Nov 2012 18:38:25 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Rstats]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[ddj]]></category>
		<category><![CDATA[f1datajunkie]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9019</guid>
		<description><![CDATA[In Paths to the F1 2012 Championship Based on How They Might Finish in the US Grand Prix I posted a quick hack to calculate the finishing positions that would determine the F1 2012 Drivers&#8217; Championship in today&#8217;s United States Grand Prix, leaving a tease dangling around the possibility of working out what combinations would [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>In <a href="http://blog.ouseful.info/2012/11/18/paths-to-the-f1-2012-championship-based-on-how-they-might-finish-in-the-us-grand-prix/">Paths to the F1 2012 Championship Based on How They Might Finish in the US Grand Prix</a> I posted a quick hack to calculate the finishing positions that would determine the F1 2012 Drivers&#8217; Championship in today&#8217;s United States Grand Prix, leaving a tease dangling around the possibility of working out which combinations would lead to a VET or ALO victory if the championship isn&#8217;t decided today. So in the hour before the race started, I began to doodle a quick&#8217;n'dirty interactive app that would let me keep track of the championship scenarios for the Brazil race, given the lap-by-lap placement of VET and ALO during the US Grand Prix. Given the prep I&#8217;d done in the aforementioned post, this meant figuring out how to code up a similar algorithm in R, and then working out how to make it interactive&#8230;</p>
<p>But before I show you how I did it, here&#8217;s the scenario for Brazil given how the US race finished:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/f1-drivers-championship-brazil-scenarios.png"><img src="http://ouseful.files.wordpress.com/2012/11/f1-drivers-championship-brazil-scenarios.png?w=700&#038;h=255" alt="" title="F1 Drivers&#039; championship - Brazil scenarios" width="700" height="255" class="alignnone size-full wp-image-9032" /></a></p>
<p>So how was this quick hack app done&#8230;?</p>
<p>Trying out the new <a href="http://www.rstudio.com/shiny/">Shiny interactive stats app builder</a> from the RStudio folk has been on my to-do list for some time, and it didn&#8217;t take long to realise that an interactive race scenario builder would provide an ideal context for trying it out. The model itself has essentially two steps, with a minor third one in the middle:</p>
<ol>
<li>work out the points difference between VET and ALO for all their possible points combinations in the US Grand Prix;</li>
<li>calculate the points difference going into the Brazilian Grand Prix;</li>
<li>calculate the possible outcomes depending on placements in the Brazilian Grand Prix (essentially, an application of the algorithm I did in the original post).</li>
</ol>
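<p>As a rough cross-check of those three steps, here&#8217;s a minimal Python sketch of the same calculation. It&#8217;s a translation of the idea rather than the code the app runs, and the function names are my own. It assumes the standings of 255 (VET) and 245 (ALO) going into the US race, treats position 11 as &#8220;11th or lower&#8221;, and assumes, as in the earlier post, that a points tie goes to VET on countback:</p>
<pre class="brush: python; title: ; notranslate"># Points for places 1-10; index 10 stands for 11th or lower (no points)
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1, 0]
VET_START, ALO_START = 255, 245  # standings going into the US Grand Prix


def lead_after_us(vet_pos, alo_pos):
    """Steps 1 and 2: VET's points lead going into Brazil for one pair of US finishes."""
    return (VET_START + POINTS[vet_pos - 1]) - (ALO_START + POINTS[alo_pos - 1])


def brazil_outcomes(lead):
    """Step 3: the champion for every (VET, ALO) finishing pair in Brazil.

    Returns a dict keyed by (vet_pos, alo_pos). The drivers can only share
    a position if both are out of the points (position 11). A points tie
    is assumed to go to VET on countback.
    """
    n = len(POINTS)
    grid = {}
    for v in range(1, n + 1):
        for a in range(1, n + 1):
            if v == a and v != n:
                continue  # two drivers cannot share a points-scoring place
            final_lead = lead + POINTS[v - 1] - POINTS[a - 1]
            grid[(v, a)] = "VET" if final_lead >= 0 else "ALO"
    return grid


# Example: VET 2nd and ALO 3rd in the US leaves VET 13 points ahead
outcomes = brazil_outcomes(lead_after_us(2, 3))
print(outcomes[(5, 1)])  # ALO</pre>
<p>Reading across <tt>outcomes</tt> for ALO in 1st, VET needs to be 4th or better in Brazil to hold on in that scenario, which ties in with the tables in the earlier post.</p>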
<p>The Shiny app requires two bits of code. The first is a UI, in file ui.R, which defines two sliders that let me set the actual (or anticipated, or possible;-) US race classifications for Vettel and Alonso:</p>
<pre class="brush: r; title: ; notranslate">library(shiny)

shinyUI(pageWithSidebar(
  
  # Application title
  headerPanel(&quot;F1 Driver Championship Scenarios&quot;),
  
  # Sidebar with sliders for each driver's US finishing position (11 means 11th or lower)
  sidebarPanel(
    sliderInput(&quot;alo&quot;, 
                &quot;ALO race pos in United States Grand Prix:&quot;, 
                min = 1, 
                max = 11, 
                value = 1),
    sliderInput(&quot;vet&quot;, 
                &quot;VET race pos in United States Grand Prix:&quot;, 
                min = 1, 
                max = 11, 
                value = 2)
  ),
  
  # Show a plot of the generated model
  mainPanel(
    plotOutput(&quot;distPlot&quot;)
  )
))</pre>
<p>And some logic, in file server.R (the original had errors; hopefully now bugfixed&#8230;). The original &#8220;Paths to the Championship&#8221; post unpicks elements of the algorithm in a little more detail, but basically I work out the points difference between VET and ALO from the difference at the start of the race plus the additional difference arising from the posited US finishing positions, and then generate a matrix of the difference in points awarded for each possible combination of finishes in Brazil, from which the champion in each case can be read off:</p>
<pre class="brush: r; title: ; notranslate">library(shiny)
library(ggplot2)
library(reshape)

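# Server logic: work out the championship outcomes in Brazil
# for the US finishing positions chosen with the sliders
shinyServer(function(input, output) {
  # Points for places 1-10; row 11 stands for 11th or lower (no points)
  points=data.frame(pos=1:11,val=c(25,18,15,12,10,8,6,4,2,1,0))
  # Championship points going into the US Grand Prix: a = ALO, v = VET
  a=245
  v=255
  
  # pp[i,j]: VET's lead after the US race if VET finishes i and ALO finishes j
  pospoints=function(a,v,pdiff,points){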
    pp=matrix(ncol = nrow(points), nrow = nrow(points))
    for (i in 1:nrow(points)){
      for (j in 1:nrow(points))
        pp[[i,j]]=v-a+pdiff[[i,j]]
    }
    pp
  }
  
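  # pdiff[i,j]: points for finishing i minus points for finishing j
  pdiff=matrix(ncol = nrow(points), nrow = nrow(points))
  for (i in 1:nrow(points)){
    for (j in 1:nrow(points))
      pdiff[[i,j]]=points[[i,2]]-points[[j,2]]
  }
  
  # VET's lead going into Brazil for every pair of US finishes
  ppx=pospoints(a,v,pdiff,points)
  
  # Champion for each Brazil finish (rows: VET, columns: ALO), given VET's
  # lead vadiff going into the race. A points tie goes to VET, assumed to win
  # on countback; only two drivers out of the points (row/column 11) can share a place
  winmdiff=function(vadiff,pdiff,points){
    win=matrix(ncol = nrow(points), nrow = nrow(points))
    for (i in 1:nrow(points)){
      for (j in 1:nrow(points))
        if (i==j &amp;&amp; i&lt;nrow(points)) win[[i,j]]=''
        else if ((vadiff+pdiff[[i,j]])&gt;=0) win[[i,j]]='VET'
        else win[[i,j]]='ALO'
    }
    win
  }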
  
  # Plot the Brazil outcome grid. renderPlot() makes the output &quot;reactive&quot;,
  # so it is re-executed automatically whenever a slider input changes.
  # (Older versions of Shiny called this reactivePlot().)
  output$distPlot &lt;- renderPlot({
    wmd=winmdiff(ppx[[input$vet,input$alo]],pdiff,points)
    wmdm=melt(wmd)
    g=ggplot(wmdm)+geom_text(aes(X1,X2,label=value,col=value))
    g=g+xlab('VET position in Brazil')+ ylab('ALO position in Brazil')
    g=g+labs(title=&quot;Championship outcomes in Brazil&quot;)
    g=g+ theme(legend.position=&quot;none&quot;)
    g=g+scale_x_continuous(breaks=seq(1, 11, 1))+scale_y_continuous(breaks=seq(1, 11, 1))
    print(g)
  })
})</pre>
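<p>The <tt>&gt;=0</tt> test in <tt>winmdiff</tt> hands any points tie to VET, on the assumption from the earlier post that VET would win on countback. Under the F1 countback rule, drivers level on points are separated by their number of wins, then second places, and so on. Here&#8217;s a minimal Python sketch of that tie-break; the function name and the way results are recorded (a list of finish counts by position) are my own assumptions, not anything the app uses:</p>
<pre class="brush: python; title: ; notranslate">def countback_winner(points_a, finishes_a, points_b, finishes_b):
    """Return "A" or "B" for the higher-placed driver, or None if still level.

    finishes_x[k] is the number of times driver x finished in position k+1.
    Points decide first; on equal points, compare the number of 1st places,
    then 2nd places, and so on.
    """
    if points_a != points_b:
        return "A" if points_a > points_b else "B"
    # Pad the shorter record with zeros so both cover the same positions
    depth = max(len(finishes_a), len(finishes_b))
    padded_a = list(finishes_a) + [0] * (depth - len(finishes_a))
    padded_b = list(finishes_b) + [0] * (depth - len(finishes_b))
    for count_a, count_b in zip(padded_a, padded_b):
        if count_a != count_b:
            return "A" if count_a > count_b else "B"
    return None</pre>
<p>So, for example, <tt>countback_winner(100, [3, 1], 100, [2, 4])</tt> returns <tt>&quot;A&quot;</tt>: the drivers are level on points, but A has more wins.</p>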
<p>To run the app, put the server.R and ui.R files in a directory such as <tt>shinychamp</tt>; something like the following should then get the Shiny app running:</p>
<pre class="brush: r; title: ; notranslate">library(shiny)
runApp(&quot;~/path/to/my/shinychamp&quot;)</pre>
<p><a href="http://glimmer.rstudio.com/psychemedia/f1champ/">Here&#8217;s what it looks like</a>:</p>
<p><a href="http://glimmer.rstudio.com/psychemedia/f1champ/"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-scenario-1.png?w=700&#038;h=254" alt="" title="f1 2012 scenario 1" width="700" height="254" class="alignnone size-full wp-image-9024" /></a></p>
<p><a href="http://glimmer.rstudio.com/psychemedia/f1champ/"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-scenario-2.png?w=700&#038;h=250" alt="" title="f1 2012 scenario 2" width="700" height="250" class="alignnone size-full wp-image-9025" /></a></p>
<p><a href="http://glimmer.rstudio.com/psychemedia/f1champ/"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-scenario-31.png?w=700&#038;h=264" alt="" title="f1 2012 scenario 3" width="700" height="264" class="alignnone size-full wp-image-9026" /></a></p>
<p><a href="http://glimmer.rstudio.com/psychemedia/f1champ/"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-scenario-42.png?w=700&#038;h=258" alt="" title="f1 2012 scenario 4" width="700" height="258" class="alignnone size-full wp-image-9027" /></a></p>
<p>You can find the code on GitHub here: <a href="https://github.com/psychemedia/f1DataJunkie/tree/master/f1djR/shinychamp">F1 Championship 2012 &#8211; scenarios if the race gets to Brazil&#8230;</a></p>
<p>Unfortunately, until a hosted service is available, you&#8217;ll have to run it yourself if you want to try it out&#8230;</p>
<p>Disclaimer: I&#8217;ve been rushing to get this posted before the start of the race&#8230; If you spot errors, please shout!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/11/18/interactive-scenarios-with-shiny-the-race-to-the-f1-2012-drivers-championship/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-drivers-championship-brazil-scenarios.png" medium="image">
			<media:title type="html">F1 Drivers&#039; championship - Brazil scenarios</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-scenario-1.png" medium="image">
			<media:title type="html">f1 2012 scenario 1</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-scenario-2.png" medium="image">
			<media:title type="html">f1 2012 scenario 2</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-scenario-31.png" medium="image">
			<media:title type="html">f1 2012 scenario 3</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-scenario-42.png" medium="image">
			<media:title type="html">f1 2012 scenario 4</media:title>
		</media:content>
	</item>
		<item>
		<title>Paths to the F1 2012 Championship Based on How They Might Finish in the US Grand Prix</title>
		<link>http://blog.ouseful.info/2012/11/18/paths-to-the-f1-2012-championship-based-on-how-they-might-finish-in-the-us-grand-prix/</link>
		<comments>http://blog.ouseful.info/2012/11/18/paths-to-the-f1-2012-championship-based-on-how-they-might-finish-in-the-us-grand-prix/#comments</comments>
		<pubDate>Sun, 18 Nov 2012 12:59:27 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Infoskills]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[ddj]]></category>
		<category><![CDATA[f1datajunkie]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9009</guid>
		<description><![CDATA[If you haven&#8217;t already seen it, one of the breakthrough visualisations of the US elections was the New York Times Paths to the Election scenario builder. With the F1 drivers&#8217; championship in the balance this weekend, I wondered what chances were of VET claiming the championship this weekend. The only contender is ALO, who is [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>If you haven&#8217;t already seen it, one of the breakthrough visualisations of the US elections was the New York Times <a href="http://chartsnthings.tumblr.com/post/35616801795/some-sketches-from-the-times-scenario-builder">Paths to the Election</a> scenario builder. With the F1 drivers&#8217; championship in the balance this weekend, I wondered what the chances were of VET claiming the title today. The only other contender is ALO, who is currently ten points behind.</p>
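<p>A quick Python script shows the outcome depending on the relative classification of ALO and VET at the end of today&#8217;s race. Since the most ALO could then gain in Brazil is 25 points (a win, with VET out of the points), VET takes the title today if he comes away with a lead of 25 points or more. (If the drivers are exactly 25 points apart, and ALO then wins in Brazil with VET out of the points, they finish level; I think VET will win on countback, having won more races.)</p>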
<pre class="brush: python; title: ; notranslate">#The current points standings
vetPoints=255
aloPoints=245

#The points awarded for each place in the top 10; 0 points otherwise
points=[25,18,15,12,10,8,6,4,2,1,0]

#Print a header row (Python 3)
print('\t'.join(['VET\\ALO']+[str(x) for x in range(1,11)]+['11+']))

#I'm going to construct a grid, VET's position down the rows, ALO across the columns
for i in range(len(points)):
	#Build up each row - start with VET's classification
	row=[str(i+1)]
	#Now for the columns - that is, ALO's classification
	for j in range(len(points)):
		#Work out the points if VET is placed i+1 and ALO at j+1 (i and j start at 0)
		#Find the difference between the points scores
		#If the difference is &gt;= 25 (the biggest points diff ALO could achieve in Brazil), VET wins
		if ((vetPoints+points[i])-(aloPoints+points[j])&gt;=25):
			row.append(&quot;VET&quot;)
		else: row.append(&quot;?&quot;)
	#Print the row as a tab-separated line
	print('\t'.join(row))</pre>
<p>(Now I wonder &#8211; how would I write that script in R?)</p>
<p>And the result?</p>
<pre>VET\ALO	1	2	3	4	5	6	7	8	9	10	11+	
1	?	?	?	?	VET	VET	VET	VET	VET	VET	VET
2	?	?	?	?	?	?	?	?	VET	VET	VET
3	?	?	?	?	?	?	?	?	?	?	VET
4	?	?	?	?	?	?	?	?	?	?	?
5	?	?	?	?	?	?	?	?	?	?	?
6	?	?	?	?	?	?	?	?	?	?	?
7	?	?	?	?	?	?	?	?	?	?	?
8	?	?	?	?	?	?	?	?	?	?	?
9	?	?	?	?	?	?	?	?	?	?	?
10	?	?	?	?	?	?	?	?	?	?	?
11	?	?	?	?	?	?	?	?	?	?	?
</pre>
<p>Which is to say, VET wins if:</p>
<ul>
<li>VET wins the race and ALO is placed 5th or lower;</li>
<li>VET is second in the race and ALO is placed 9th or lower;</li>
<li>VET is third in the race and ALO is out of the points (11th or lower)</li>
</ul>
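<p>The summary above can also be read off the grid mechanically: for each VET finishing position, find the highest-placed finish ALO can have while VET still leads by at least 25 points. A small Python 3 sketch, with the standings and points table redefined so it runs on its own (the function name is my own):</p>
<pre class="brush: python; title: ; notranslate">vet_points, alo_points = 255, 245
points = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1, 0]  # index 10 means 11th or lower


def clinching_finishes(vet_points, alo_points, points, margin=25):
    """Map each VET finish to the best ALO finish that still hands VET the title.

    VET clinches if he leads by at least `margin` (the most ALO could gain
    in the last race). Because points fall with position, any ALO finish
    below the one returned also clinches. VET finishes with no clinching
    ALO finish are left out.
    """
    rules = {}
    for i, vet_scored in enumerate(points):
        for j, alo_scored in enumerate(points):
            if (vet_points + vet_scored) - (alo_points + alo_scored) >= margin:
                rules[i + 1] = j + 1
                break
    return rules


for vet_pos, alo_pos in clinching_finishes(vet_points, alo_points, points).items():
    print(f"VET {vet_pos}: ALO {alo_pos} or lower")</pre>
<p>This prints one line per VET finishing position that can clinch the title, matching the three cases listed above.</p>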
<p>We can also look at the points differences themselves: build a second list, row2, in the same way as row, but append the actual difference using <tt>row2.append(str((vetPoints+points[i])-(aloPoints+points[j])))</tt>, and print that instead:</p>
<pre>VET\ALO	1	2	3	4	5	6	7	8	9	10	11+	
1	10	17	20	23	25	27	29	31	33	34	35
2	3	10	13	16	18	20	22	24	26	27	28
3	0	7	10	13	15	17	19	21	23	24	25
4	-3	4	7	10	12	14	16	18	20	21	22
5	-5	2	5	8	10	12	14	16	18	19	20
6	-7	0	3	6	8	10	12	14	16	17	18
7	-9	-2	1	4	6	8	10	12	14	15	16
8	-11	-4	-1	2	4	6	8	10	12	13	14
9	-13	-6	-3	0	2	4	6	8	10	11	12
10	-14	-7	-4	-1	1	3	5	7	9	10	11
11	-15	-8	-5	-2	0	2	4	6	8	9	10
</pre>
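<p>A tidier Python 3 take on that row2 tweak might return the grid as a list of lists, so it can be reused for other scenarios, and then print it. This is a sketch; the function name is my own:</p>
<pre class="brush: python; title: ; notranslate">vet_points, alo_points = 255, 245
points = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1, 0]  # index 10 means 11th or lower


def difference_grid(vet_points, alo_points, points):
    """VET's lead after the race: rows are VET's finish, columns ALO's."""
    return [[(vet_points + pv) - (alo_points + pa) for pa in points] for pv in points]


print("\t".join(["VET\\ALO"] + [str(n) for n in range(1, 11)] + ["11+"]))
for pos, diffs in enumerate(difference_grid(vet_points, alo_points, points), start=1):
    print("\t".join([str(pos)] + [str(d) for d in diffs]))</pre>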
<p>We could then do a similar exercise for the Brazil race, and essentially get all the information we need to do a scenario builder like the New York Times election scenario builder&#8230; Which I would try to do, but I&#8217;ve had enough screen time for the weekend already&#8230;:-(</p>
<p>PS FWIW, here&#8217;s a quick table showing the difference in points awarded to driver A over driver B, depending on their classifications in a race:</p>
<pre>A\B	1	2	3	4	5	6	7	8	9	10	11+
1	X	7	10	13	15	17	19	21	23	24	25
2	-7	X	3	6	8	10	12	14	16	17	18
3	-10	-3	X	3	5	7	9	11	13	14	15
4	-13	-6	-3	X	2	4	6	8	10	11	12
5	-15	-8	-5	-2	X	2	4	6	8	9	10
6	-17	-10	-7	-4	-2	X	2	4	6	7	8
7	-19	-12	-9	-6	-4	-2	X	2	4	5	6
8	-21	-14	-11	-8	-6	-4	-2	X	2	3	4
9	-23	-16	-13	-10	-8	-6	-4	-2	X	1	2
10	-24	-17	-14	-11	-9	-7	-5	-3	-1	X	1
11	-25	-18	-15	-12	-10	-8	-6	-4	-2	-1	X
</pre>
<p>Here&#8217;s how to use this chart together with the previous one. Looking at the previous chart, if VET finishes second and ALO third today, VET leads by 13 points going into Brazil. In the chart immediately above, let VET = A and ALO = B, so the rows correspond to VET&#8217;s placement in Brazil and the columns to ALO&#8217;s. VET needs to lose 14 or more points to lose the championship (dropping exactly 13 would leave the pair level, and I think VET would then win on countback), so we&#8217;re looking for values of -14 or less. In particular, ALO (B, columns) needs to finish 1st with VET (A, rows) 5th or worse, 2nd with VET 8th or worse, or 3rd with VET 10th or worse.</p>
<p>And the script:</p>
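<pre class="brush: python; title: ; notranslate">#Points for places 1 to 10; 0 points for 11th or lower
points=[25,18,15,12,10,8,6,4,2,1,0]

print('\t'.join(['A\\B','1','2','3','4','5','6','7','8','9','10','11+']))
for i in range(len(points)):
	row=[str(i+1)]
	for j in range(len(points)):
		#Points A gains over B if A is placed i+1 and B j+1; X where they'd share a place
		if i!=j: row.append(str(points[i]-points[j]))
		else: row.append('X')
	print('\t'.join(row))</pre>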
<p>And now for the rest of the weekend&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/11/18/paths-to-the-f1-2012-championship-based-on-how-they-might-finish-in-the-us-grand-prix/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>
	</item>
		<item>
		<title>The Race to the F1 2012 Drivers&#8217; Championship &#8211; Initial Sketches</title>
		<link>http://blog.ouseful.info/2012/11/16/the-race-to-the-f1-2012-drivers-championship/</link>
		<comments>http://blog.ouseful.info/2012/11/16/the-race-to-the-f1-2012-drivers-championship/#comments</comments>
		<pubDate>Fri, 16 Nov 2012 23:59:13 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Rstats]]></category>
		<category><![CDATA[ddj]]></category>
		<category><![CDATA[f1datajunkie]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=8986</guid>
		<description><![CDATA[In part inspired by the chart described in The electoral map sans the map, I thought I&#8217;d start mulling over a quick sketch showing the race to the 2012 Formula One Drivers&#8217; Championship. The chart needs to show tension somehow, so in this first really quick and simple rough sketch, you really do have to [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>In part inspired by the chart described in <a href="http://junkcharts.typepad.com/junk_charts/2012/11/the-electoral-map-sans-the-map.html">The electoral map sans the map</a>, I thought I&#8217;d start mulling over a quick sketch showing the race to the 2012 Formula One Drivers&#8217; Championship.</p>
<p>The chart needs to show tension somehow, so in this first, very rough sketch you really do have to put yourself in the graph and read it from left to right:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-to-champ-rnd-18.png"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-to-champ-rnd-18.png?w=700" alt="" title="f1 2012 race to champ rnd 18"   class="alignnone size-full wp-image-8988" /></a></p>
<p>The data is pulled in from the <a href="http://ergast.com/mrd/">Ergast API</a> as JSON data, which is then parsed and visualised using R:</p>
<pre class="brush: r; title: ; notranslate">require(RJSONIO)
require(ggplot2)

#initialise a data frame
champ &lt;- data.frame(round=numeric(),
                 driverID=character(), 
                 position=numeric(), points=numeric(),wins=numeric(),
                 stringsAsFactors=FALSE)

#This is a fudge at the moment - should be able to use a different API call to 
#get the list of races to date, rather than hardcoding latest round number
for (j in 1:18){
  resultsURL=paste(&quot;http://ergast.com/api/f1/2012/&quot;,j,&quot;/driverStandings&quot;,&quot;.json&quot;,sep='')
  print(resultsURL)
  results.data.json=fromJSON(resultsURL,simplify=FALSE)
  rd=results.data.json$MRData$StandingsTable$StandingsLists[[1]]$DriverStandings
  for (i in 1:length(rd)){
    champ=rbind(champ,data.frame(round=j, driverID=rd[[i]]$Driver$driverId,
                                 position=as.numeric(as.character(rd[[i]]$position)),
                                 points=as.numeric(as.character(rd[[i]]$points)),
                                 wins=as.numeric(as.character(rd[[i]]$wins))))
  }
}
champ

#Keep just the six drivers to be plotted
test2=subset(champ, driverID %in% c('vettel','alonso','raikkonen','webber','hamilton','button'))

#Really rough sketch, in part inspired by http://junkcharts.typepad.com/junk_charts/2012/11/the-electoral-map-sans-the-map.html
ggplot(test2)+geom_line(aes(x=round,y=points,group=driverID,col=driverID))+labs(title=&quot;F1 2012 - Race to the Championship&quot;)

#I wonder if it would be worth annotating the chart with labels explaining any DNF reasons at parts where points stall?</pre>
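<p>For comparison, here&#8217;s a rough Python sketch of the same fetch-and-flatten step. It follows the URL pattern and the MRData/StandingsTable/StandingsLists/DriverStandings nesting used in the R code above; the function names are my own. As in the R code, the numbers arrive as strings and are converted. <tt>fetch_standings</tt> needs network access and assumes the Ergast endpoint is still being served:</p>
<pre class="brush: python; title: ; notranslate">import json
from urllib.request import urlopen


def standings_url(year, round_no):
    # Same URL pattern as the paste() call in the R code
    return f"http://ergast.com/api/f1/{year}/{round_no}/driverStandings.json"


def fetch_standings(year, round_no):
    """Download and decode one round's standings (needs network access)."""
    with urlopen(standings_url(year, round_no)) as response:
        return json.load(response)


def standings_rows(payload, round_no):
    """Flatten the DriverStandings list into one dict per driver."""
    table = payload["MRData"]["StandingsTable"]["StandingsLists"][0]
    return [
        {
            "round": round_no,
            "driverID": entry["Driver"]["driverId"],
            "position": int(entry["position"]),
            "points": float(entry["points"]),
            "wins": int(entry["wins"]),
        }
        for entry in table["DriverStandings"]
    ]</pre>
<p>Something like <tt>standings_rows(fetch_standings(2012, 18), 18)</tt> would then give one record per driver for round 18, ready to be collected across rounds.</p>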
<p>So, that&#8217;s the quickest and dirtiest chart I could think of &#8211; where to take this next? One way would be to start making the chart look cleaner; another possibility would be to start looking at adding labels, highlights, and maybe pushing all but ALO and VET into the background? (GDS do some nice work in this vein, eg <a href="http://digital.cabinetoffice.gov.uk/2012/11/12/updating-performance-dashboard/">Updating the GOV.UK Performance Dashboard</a>; this StoryTellingWithData <a href="http://www.storytellingwithdata.com/2012/11/to-stack-or-not-to-stack.html">post on stacked bar charts</a> also has some great ideas about how to make simple, clean and effective use of text and highlighting&#8230;).</p>
<p>Let&#8217;s try cleaning it up a little, and then highlight the championship contenders?</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-cleaner.png"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-cleaner.png?w=700" alt="" title="f1 2012 race cleaner"   class="alignnone size-full wp-image-8996" /></a></p>
<pre class="brush: r; title: ; notranslate">#VET and ALO, plus the other four drivers to show in grey
test3=subset(champ, driverID %in% c('vettel','alonso'))
test4=subset(champ, driverID %in% c('raikkonen','webber','hamilton','button'))

ggplot(test4) + geom_line(aes(x=round,y=points,group=driverID),col='lightgrey') + geom_line(data=test3,aes(x=round,y=points,group=driverID,col=driverID)) + labs(title=&quot;F1 2012 - Race to the Championship&quot;)
</pre>
<p>Hmm&#8230; I&#8217;m not sure about those colours? Maybe use Blue for VET and Red for ALO?</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/f1-chanps-redblue.png"><img src="http://ouseful.files.wordpress.com/2012/11/f1-chanps-redblue.png?w=700" alt="" title="f1 chanps redblue"   class="alignnone size-full wp-image-8998" /></a></p>
<p>I really hacked the path to this &#8211; there must be a cleaner way?!</p>
<pre class="brush: r; title: ; notranslate">ggplot(test4)+geom_line(aes(x=round,y=points,group=driverID),col='lightgrey') + geom_line(data=subset(test3,driverID=='vettel'),aes(x=round,y=points),col='blue') + geom_line(data=subset(test3,driverID=='alonso'),aes(x=round,y=points),col='red') + labs(title=&quot;F1 2012 - Race to the Championship&quot;)</pre>
<p>Other chart types are possible too, I suppose? Such as something in the style of a lap chart?</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-chart.png"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-chart.png?w=700" alt="" title="f1 2012 race chart"   class="alignnone size-full wp-image-8992" /></a></p>
<pre class="brush: r; title: ; notranslate">ggplot(test2)+geom_line(aes(x=round,y=position,group=driverID,col=driverID))+labs(title=&quot;F1 2012 - Race to the Championship&quot;)</pre>
<p>Hmmm&#8230; Just like the first sketch, this one is cluttered and confusing too&#8230; How about if we clean it as above to highlight just the contenders?</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-lapchart-style-cleaner.png"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-lapchart-style-cleaner.png?w=700" alt="" title="f1 2012 race lapchart style cleaner"   class="alignnone size-full wp-image-8997" /></a></p>
<pre class="brush: r; title: ; notranslate">ggplot(test4) + geom_line(aes(x=round,y=position,group=driverID),col='lightgrey') + geom_line(data=test3,aes(x=round,y=position,group=driverID,col=driverID)) + labs(title=&quot;F1 2012 - Race to the Championship&quot;)</pre>
<p>A little cleaner, maybe? And with the colour tweak:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/f1-2102-champ-lapchart-redblue.png"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2102-champ-lapchart-redblue.png?w=700" alt="" title="f1 2102 champ lapchart redblue"   class="alignnone size-full wp-image-9000" /></a></p>
<pre class="brush: r; title: ; notranslate">ggplot(test4) + geom_line(aes(x=round,y=position,group=driverID),col='lightgrey') + geom_line(data=subset(test3,driverID=='vettel'),aes(x=round,y=position),col='blue') + geom_line(data=subset(test3,driverID=='alonso'),aes(x=round,y=position),col='red') + labs(title=&quot;F1 2012 - Race to the Championship&quot;)</pre>
<p>Something that really jumps out at me in this chart are the gridlines &#8211; they really need fixing? But what would be best to show?</p>
<p>Hmm, before we do that, how about an animation? (Does WordPress.com allow animated gifs?)</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/animation.gif"><img src="http://ouseful.files.wordpress.com/2012/11/animation.gif?w=700" alt="" title="animation"   class="alignnone size-full wp-image-9004" /></a></p>
<p>Here&#8217;s the code (it requires the <tt>animation</tt> package):</p>
<pre class="brush: r; title: ; notranslate">library(animation)
race.ani= function(...) {
  for (i in 1:18) {
    #Other contenders in grey, then VET (blue) and ALO (red) up to round i
    g=ggplot(subset(test4, round&lt;=i)) +
      geom_line(aes(x=round,y=position,group=driverID),col='lightgrey') +
      geom_line(data=subset(test3,driverID=='vettel' &amp; round&lt;=i),aes(x=round,y=position),col='blue') +
      geom_line(data=subset(test3,driverID=='alonso' &amp; round&lt;=i),aes(x=round,y=position),col='red') +
      labs(title=&quot;F1 2012 - Race to the Championship&quot;) + xlim(1,18)
    print(g)
  }
}
saveMovie(race.ani(), interval = 0.4, outdir = getwd())</pre>
<p>And for the other chart:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/animation1.gif"><img src="http://ouseful.files.wordpress.com/2012/11/animation1.gif?w=700" alt="" title="animation"   class="alignnone size-full wp-image-9005" /></a></p>
<p>Hmmm&#8230;</p>
<p>How&#8217;s about another sort of view &#8211; the points difference between VET and ALO?</p>
<p><a href="http://ouseful.files.wordpress.com/2012/11/f1-2012-vet-v-alo.png"><img src="http://ouseful.files.wordpress.com/2012/11/f1-2012-vet-v-alo.png?w=700" alt="" title="f1 2012 vet v alo"   class="alignnone size-full wp-image-9007" /></a></p>
<pre class="brush: r; title: ; notranslate">alo=subset(test3,driverID=='alonso')
vet=subset(test3,driverID=='vettel')
colnames(vet)=c(&quot;round&quot;,&quot;driverID&quot;,&quot;vposition&quot;,&quot;vpoints&quot;,&quot;vwins&quot;)
colnames(alo)=c(&quot;round&quot;,&quot;driverID&quot;,&quot;aposition&quot;,&quot;apoints&quot;,&quot;awins&quot;)
cf= merge(alo,vet,by=c('round'))
ggplot(cf) + geom_bar( aes(x=round,y=vpoints-apoints,fill=(vpoints-apoints)&gt;0), stat='identity') + labs(title=&quot;F1 2012 Championship - VET vs ALO&quot;)</pre>
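<p>The merge-and-subtract step can be sketched in Python too, assuming the standings are held as a list of per-driver, per-round dicts like the rows the R loop builds. The function and parameter names are my own; the result mirrors <tt>merge(alo,vet,by=c('round'))</tt> followed by <tt>vpoints-apoints</tt>:</p>
<pre class="brush: python; title: ; notranslate">def points_gap_by_round(rows, leader="vettel", rival="alonso"):
    """Return {round: leader's points minus rival's points}.

    Only rounds in which both drivers appear are kept, as with an inner
    merge. Positive values mean the leader is ahead.
    """
    by_driver = {leader: {}, rival: {}}
    for row in rows:
        if row["driverID"] in by_driver:
            by_driver[row["driverID"]][row["round"]] = row["points"]
    shared_rounds = sorted(set(by_driver[leader]).intersection(by_driver[rival]))
    return {r: by_driver[leader][r] - by_driver[rival][r] for r in shared_rounds}</pre>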
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/11/16/the-race-to-the-f1-2012-drivers-championship/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-to-champ-rnd-18.png" medium="image">
			<media:title type="html">f1 2012 race to champ rnd 18</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-cleaner.png" medium="image">
			<media:title type="html">f1 2012 race cleaner</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-chanps-redblue.png" medium="image">
			<media:title type="html">f1 chanps redblue</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-chart.png" medium="image">
			<media:title type="html">f1 2012 race chart</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-race-lapchart-style-cleaner.png" medium="image">
			<media:title type="html">f1 2012 race lapchart style cleaner</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2102-champ-lapchart-redblue.png" medium="image">
			<media:title type="html">f1 2102 champ lapchart redblue</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/animation.gif" medium="image">
			<media:title type="html">animation</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/animation1.gif" medium="image">
			<media:title type="html">animation</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/11/f1-2012-vet-v-alo.png" medium="image">
			<media:title type="html">f1 2012 vet v alo</media:title>
		</media:content>
	</item>
		<item>
		<title>F1 2012 Mid-Season Review</title>
		<link>http://blog.ouseful.info/2012/08/30/f1-2012-mid-season-review/</link>
		<comments>http://blog.ouseful.info/2012/08/30/f1-2012-mid-season-review/#comments</comments>
		<pubDate>Thu, 30 Aug 2012 08:26:12 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[Rstats]]></category>
		<category><![CDATA[f1datajunkie]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=8448</guid>
		<description><![CDATA[Rather belatedly, I got around to posting a series of posts summarising the Formula One season to date: F1 2012 Mid-Season Review &#8211; Grid/Classification Analysis: for example, how do the drivers&#8217; grid and final classifications compare? F1 2012 Mid-Season Review &#8211; Pit Stops: for example, how does pit stop performance across the teams compare? F1 [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Rather belatedly, I got around to posting a series of posts summarising the Formula One season to date:</p>
<ul>
<li><a href="http://f1datajunkie.blogspot.co.uk/2012/08/f1-2012-mid-season-review.html">F1 2012 Mid-Season Review &#8211; Grid/Classification Analysis</a>: for example, how do the drivers&#8217; grid and final classifications compare?<br />
<a href="http://f1datajunkie.blogspot.co.uk/2012/08/f1-2012-mid-season-review.html"><img src="http://ouseful.files.wordpress.com/2012/08/f12012midsummarygridclasschange.png?w=300&#038;h=225" alt="" title="f12012midSummaryGridClassChange" width="300" height="225" class="alignnone size-medium wp-image-8449" /></a>
</li>
<li><a href="http://f1datajunkie.blogspot.co.uk/2012/08/f1-2012-mid-season-review-pit-stops.html">F1 2012 Mid-Season Review &#8211; Pit Stops</a>: for example, how does pit stop performance across the teams compare?<br />
<a href="http://f1datajunkie.blogspot.co.uk/2012/08/f1-2012-mid-season-review-pit-stops.html"><img src="http://ouseful.files.wordpress.com/2012/08/f12012midseasonpitloess.png?w=300" alt="" title="f12012midseasonpitloess" width="300" height="225" class="alignnone size-full wp-image-8450" /></a></li>
<li><a href="http://f1datajunkie.blogspot.co.uk/2012/08/f1-2012-mid-season-review-qualifying.html">F1 2012 Mid-Season Review &#8211; Qualifying Analysis</a>: for example, how do normalised qualifying lap times compare across the teams over the season so far?<br />
<a href="http://f1datajunkie.blogspot.co.uk/2012/08/f1-2012-mid-season-review-qualifying.html"><img src="http://ouseful.files.wordpress.com/2012/08/f12012midseasonteambestloess.png?w=300" alt="" title="f12012midseasonteambestloess" width="300" height="225" class="alignnone size-full wp-image-8451" /></a></li>
</ul>
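<p>Here&#8217;s a minimal Python sketch of two of the calculations behind those charts, for anyone who wants to experiment without R. It assumes that qualifying times are &#8220;normalised&#8221; by expressing each team&#8217;s best lap as a percentage of the fastest lap in the same session, and that position change is simply grid position minus final classification. The linked R code may well do things differently, and the function names and sample numbers are hypothetical.</p>

```python
# Sketch only: the normalisation (percentage of the session's fastest lap)
# and the position-change definition are assumptions, not the linked R code.

def normalise_laps(best_laps):
    """Express each team's best qualifying lap (in seconds) as a percentage
    of the fastest lap in the same session."""
    fastest = min(best_laps.values())
    return {team: 100.0 * t / fastest for team, t in best_laps.items()}


def position_changes(grid, classification):
    """Grid position minus final classification for each classified driver
    who also has a grid slot; positive values mean places gained."""
    return {d: grid[d] - pos for d, pos in classification.items() if d in grid}


# Made-up numbers for illustration only (not real 2012 data).
session = {"Team P": 80.0, "Team Q": 80.8, "Team R": 88.0}
print(normalise_laps(session))
print(position_changes({"Driver A": 5, "Driver B": 1},
                       {"Driver A": 2, "Driver B": 3}))
# {'Driver A': 3, 'Driver B': -2}
```

<p>Normalising against each session&#8217;s fastest lap puts times from circuits of very different lengths on a comparable scale, so they can share a single chart across the season.</p>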
<p>It&#8217;s also worth comparing the charts to the <a href="http://www.google.co.uk/search?q=site%3Af1datajunkie.blogspot.com+intitle%3A%222011+review%22">F1 2011 Season Review charts</a>.</p>
<p>The R code used to generate the graphics can be found here: <a href="https://gist.github.com/3524075">F1 2012 Mid-Season Review &#8211; R Markdown</a>.</p>
<p>Comments, suggestions, code improvements, extensions etc. are all welcome&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/08/30/f1-2012-mid-season-review/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/08/f12012midsummarygridclasschange.png?w=300" medium="image">
			<media:title type="html">f12012midSummaryGridClassChange</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/08/f12012midseasonpitloess.png?w=300" medium="image">
			<media:title type="html">f12012midseasonpitloess</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/08/f12012midseasonteambestloess.png?w=300" medium="image">
			<media:title type="html">f12012midseasonteambestloess</media:title>
		</media:content>
	</item>
		<item>
		<title>F1 Championship Points as a d3.js Powered Sankey Diagram</title>
		<link>http://blog.ouseful.info/2012/05/24/f1-championship-points-as-a-d3-js-powered-sankey-diagram/</link>
		<comments>http://blog.ouseful.info/2012/05/24/f1-championship-points-as-a-d3-js-powered-sankey-diagram/#comments</comments>
		<pubDate>Thu, 24 May 2012 08:39:26 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[ergastAPI]]></category>
		<category><![CDATA[f1datajunkie]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=7899</guid>
		<description><![CDATA[d3.js crossed my path a couple of times yesterday: firstly, in the form of an enquiry about whether I&#8217;d be interested in writing a book on d3.js (I&#8217;m not sure I&#8217;m qualified: as I responded, I&#8217;m more of a script kiddie who spots things I can reuse than someone with any understanding at all of [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>d3.js crossed my path a couple of times yesterday: firstly, in the form of an enquiry about whether I&#8217;d be interested in writing a book on d3.js (I&#8217;m not sure I&#8217;m qualified: as I responded, I&#8217;m more of a script kiddie who spots things I can reuse than someone with any understanding at all of how d3.js does what it does&#8230;); secondly, via a link to d3.js creator Mike Bostock&#8217;s new demo of <a href="http://bost.ocks.org/mike/sankey/">Sankey diagrams built using d3.js</a>:</p>
<p><a href="http://bost.ocks.org/mike/sankey/"><img src="http://ouseful.files.wordpress.com/2012/05/sankeydiagram-d3js.png?w=700&#038;h=560" alt="" title="sankeyDiagram-d3js" width="700" height="560" class="alignnone size-full wp-image-7901" /></a></p>
<p>Hmm&#8230; Sankey diagrams are good for visualising flow, so to get to grips with the component and see whether I could plug-and-play with it, I needed an appropriate data set. F1 related data is usually my first thought as far as testbed data goes (no confidences to break, the STEM/innovation outreach/tech transfer context, etc etc), so what things flow in F1? What quantities are conserved whilst being passed between different classes of entity? How about points&#8230; Points are awarded on a per-race basis to drivers, who are members of teams. It&#8217;s also a championship sport, run over several races. The Drivers&#8217; Championship is a competition between drivers to accumulate the most points over the course of the season, and the Constructors&#8217; Championship is a battle between teams. Which suggests to me that a Sankey plot of points flowing from races to drivers, and then on to constructors, might work.</p>
<p>So what do we need to do? First up, look at the source code for the demo using View Source. Here&#8217;s the relevant bit:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/05/sankey-get-data.png"><img src="http://ouseful.files.wordpress.com/2012/05/sankey-get-data.png?w=700" alt="" title="sankey get data"   class="alignnone size-full wp-image-7903" /></a></p>
<p>Data is being pulled in from a file referenced by a relative URL, <em>energy.json</em>. Let&#8217;s see what it looks like:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/05/sanky-data.png"><img src="http://ouseful.files.wordpress.com/2012/05/sanky-data.png?w=700" alt="" title="sankey data"   class="alignnone size-full wp-image-7902" /></a></p>
<p>Okay &#8211; a node list and an edge list. From previous experience, I know that there is a d3.js <a href="http://networkx.lanl.gov/reference/readwrite.json_graph.html">JSON exporter</a> built into the Python networkx library, so maybe we can generate the data file from a network representation of the data in networkx?</p>
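<p>Assuming the file follows the layout I believe the demo uses (each node an object with a <code>name</code>; each link giving its <code>source</code> and <code>target</code> as positions in the node list, plus a <code>value</code> for the width of the flow), a quick Python sanity check for a data file in that shape might look like the sketch below. <code>check_sankey_json</code> is a hypothetical helper, not part of d3.js.</p>

```python
import json


def check_sankey_json(text):
    """Parse Sankey-style JSON and check every link points at a real node.

    Assumes {"nodes": [{"name": ...}, ...],
             "links": [{"source": i, "target": j, "value": v}, ...]}
    where i and j are positions in the node list. Returns (nodes, links).
    """
    data = json.loads(text)
    nodes, links = data["nodes"], data["links"]
    valid = range(len(nodes))  # allowed index values for source/target
    for link in links:
        for end in ("source", "target"):
            if link[end] not in valid:
                raise ValueError(f"bad {end} index in link {link}")
        if not link["value"] >= 0:
            raise ValueError(f"negative flow in link {link}")
    return len(nodes), len(links)


example = ('{"nodes": [{"name": "Race 1"}, {"name": "Driver A"}],'
           ' "links": [{"source": 0, "target": 1, "value": 25}]}')
print(check_sankey_json(example))  # (2, 1)
```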
<p>Here we are: <a href="http://networkx.lanl.gov/reference/generated/networkx.readwrite.json_graph.node_link_data.html#networkx.readwrite.json_graph.node_link_data">node_link_data(G)</a> &#8220;[r]eturn data in node-link format that is suitable for JSON serialization and use in Javascript documents.&#8221;</p>
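<p>One caveat: depending on the networkx version, the links produced by <code>node_link_data</code> may refer to nodes by identifier rather than by list position (more recent releases use identifiers), whereas the demo&#8217;s data file suggests the Sankey code expects positions. As a dependency-free alternative, here&#8217;s a minimal sketch that turns (source, target, value) triples straight into index-based node/link JSON; the function name and sample edges are hypothetical.</p>

```python
import json


def to_sankey_json(edges):
    """Turn (source_name, target_name, value) triples into index-based
    Sankey JSON: nodes are listed once each, in order of first appearance,
    and links refer to them by position in that list."""
    index = {}  # node name -> position in the node list
    nodes = []
    for src, tgt, _ in edges:
        for name in (src, tgt):
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name})
    links = [{"source": index[s], "target": index[t], "value": v}
             for s, t, v in edges]
    return json.dumps({"nodes": nodes, "links": links})


# Hypothetical edges: one race -> driver flow and one driver -> team flow.
edges = [("Race 1", "Driver A", 25), ("Driver A", "Team P", 25)]
print(to_sankey_json(edges))
```

<p>Because nodes are keyed by name, a race, a driver and a team that happened to share a name would collapse into one node; prefixing names by type would avoid that if it ever mattered.</p>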
<p>Next step &#8211; getting the data. I&#8217;ve already done a demo of visualising <a href="https://views.scraperwiki.com/run/treemap_demo_-_multi-round_sports_event/?">F1 championship points sourced from the Ergast motor racing API as a treemap</a> (but not blogged it? Hmmm&#8230; must fix that) that draws on a JSON data feed <a href="https://scraperwiki.com/views/ergast_championship_gviz/">constructed from data extracted from the Ergast API</a>, so I can clone that code and use it as the basis for constructing a directed graph that represents points allocations. Race nodes are linked to driver nodes by edges weighted by the points scored in that race, and driver nodes are connected to teams by edges weighted according to the total number of points the driver has earned so far. (Hmm, that gives me an idea for a better way of coding the weight for that edge&#8230;)</p>
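<p>The graph-building step can be sketched in a few lines of Python. This assumes results arrive as hypothetical (race, driver, team, points) tuples rather than in the Ergast API&#8217;s own response format, and it keys driver-to-team edges on the (driver, team) pair, so a driver who switched teams mid-season would feed points to both. Because every point passes from a race to a driver and then on to a team, the two edge layers should carry the same total, so the sketch returns them separately to make that easy to check.</p>

```python
from collections import defaultdict


def points_edges(results):
    """Build weighted edges for a race -> driver -> team points flow.

    results: iterable of (race, driver, team, points) tuples, a hypothetical
    layout rather than the Ergast API's own response format.
    Returns (race_to_driver_edges, driver_to_team_edges) as lists of
    (source, target, value) triples. Race -> driver edges carry the points
    scored in that race; driver -> team edges carry the driver's total
    across all the races supplied.
    """
    race_driver = defaultdict(float)
    driver_team = defaultdict(float)
    for race, driver, team, points in results:
        if points:  # non-scoring finishes add no flow
            race_driver[(race, driver)] += points
            driver_team[(driver, team)] += points
    return ([(r, d, v) for (r, d), v in race_driver.items()],
            [(d, t, v) for (d, t), v in driver_team.items()])


# Made-up results for illustration only.
results = [
    ("Race 1", "Driver A", "Team P", 25), ("Race 1", "Driver B", "Team Q", 18),
    ("Race 2", "Driver A", "Team P", 15), ("Race 2", "Driver B", "Team Q", 25),
    ("Race 2", "Driver C", "Team Q", 0),
]
race_edges, team_edges = points_edges(results)
print(sum(v for *_, v in race_edges), sum(v for *_, v in team_edges))  # 83.0 83.0
edges = race_edges + team_edges  # one edge list covering both layers
```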
<p>I don&#8217;t have time to blog the how-to of the code right now &#8211; train and boat to catch &#8211; but will do so later. If you want to look at the code, it&#8217;s here: <a href="https://scraperwiki.com/views/ergast_championship_nodelist/">Ergast Championship nodelist</a>. And here&#8217;s the result &#8211; <a href="https://views.scraperwiki.com/run/f1_championship_sankey_demo/">F1 Championship 2012 Points as a Sankey Diagram</a>:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/05/f1-points-sankey-diagram.png"><img src="http://ouseful.files.wordpress.com/2012/05/f1-points-sankey-diagram.png?w=700&#038;h=369" alt="" title="F1 points sankey diagram" width="700" height="369" class="alignnone size-full wp-image-7900" /></a></p>
<p>See what I mean about being a cut-and-paste script kiddie?!;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/05/24/f1-championship-points-as-a-d3-js-powered-sankey-diagram/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/05/sankeydiagram-d3js.png" medium="image">
			<media:title type="html">sankeyDiagram-d3js</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/05/sankey-get-data.png" medium="image">
			<media:title type="html">sankey get data</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/05/sanky-data.png" medium="image">
			<media:title type="html">sankey data</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/05/f1-points-sankey-diagram.png" medium="image">
			<media:title type="html">F1 points sankey diagram</media:title>
		</media:content>
	</item>
	</channel>
</rss>
