I spent most of today, err, yesterday, failing to hold back the tears as the medal performances from the Team GB Olympians kept rolling in… So to celebrate one of those wonderful performances, here are a couple of quick sketches of how Jessica Ennis made her medal in the Heptathlon. (The data is cut and pasted from the BBC website and available here: data; the script used to generate the images is pasted below. The images posted aren’t the best quality because the data doesn’t appear to be freely available, presumably to prevent freeloaders like me competing with the way results are displayed by media orgs that have paid for data and presumably the right to publish it… (Anyone know any good readings about how sports data is protected by IP and data laws?))
And a zoom in above 600 points…
I think the original data I grabbed included info about whether scores were season bests or personal bests, which could also be used to add richness to the chart, for example, using colour or different symbols to denote SB or PB. I also considered a colouring based on track vs field, or track vs throwing vs jumping, to see whether or not we could identify athletes with strong preferences in any of those areas, but it’s getting a bit late/early hours and I need to get some sleep!
For a proper infographic based on the macroscopic view presented by the top two charts, it would probably make sense to use icons rather than text to identify each event, as well as denoting PB/SB; if you peer at the top two charts closely, you’ll notice there’s a dot marking the point score for each Athlete in each event. If we let icons float a little to avoid collisions, we could use an arrow or other connective pointing device to associate event icons with the corresponding Points registration point.
As and when results come in from the Olympics relating to the medal winning performances in the pure events (Women’s Javelin, 800m etc) it could be quite interesting comparing those to the heptathlon performances? A comparison of where the three heptathlon medallists would have finished in each of the pure events might also be interesting (would any of them have made any of the finals, or semi-finals, for example?)
Here’s the script I used to generate the plots…
hd <- read.csv("~/Downloads/Heptathlon - Sheet2.csv") #Generate a summary stats table overall=aggregate(hd$Points, by=list(Athlete=hd$Athlete), FUN=sum) #And order it by result overall=overall[order(overall$x,decreasing=T),] #Mix in overal points to the original data hd2=merge(hd,overall,by='Athlete') #and then use the overall points data to reorder Athlete factors accordingly hd2= transform(hd2, Athlete=reorder(Athlete, x) ) #ALso order the events properly hd2$Event = with(hd2, factor(Event, levels = c('100m Hurdles','High Jump','Shot Put','200m','Long Jump','Javelin','800m'))) #Now generate the plot (this was the first sketch that came to mind...) require(ggplot2) ggplot(hd2)+geom_point(aes(x=Athlete,y=Points,col=Event),size=1) + geom_text(aes(x=Athlete,y=Points,label=Event,col=Event),size=2,,angle=45) + opts(axis.text.x=theme_text(angle=90,size=5),title="2012 Olympics Heptathlon") + scale_x_discrete(expand = c(0.05,0)) #for over 600 points: +ylim(600,1200) #Faceted plot of high points achieving events by athlete ggplot(hd2) + geom_point(aes(Event,Points)) + facet_wrap( ~ Athlete) + ylim(500,1200) + opts(axis.text.x=theme_text(angle=90,size=5))
Please let me know via the comments if you come up with any other interesting views over this data…:-)
PS It suddenly struck me that there may be variability in the range of points awarded in each discipline, so I threw a quick chart together to explore that:
Additional code is:
ggplot(hd2) + geom_point(aes(x=Event,y=Points),size=1) + geom_text(aes(x=Event,y=Points,label=Athlete),size=2,,angle=45) + opts(axis.text.x=theme_text(angle=90,size=5),title="2012 Olympics Heptathlon") + scale_x_discrete(expand = c(0.05,0)) + ylim(600,1200)
I guess I should do some distribution plots too? And maybe figure in personal performances relative to eg median scores in each event?
PPS Here are the points scored by the medalists per event in the context of each other and the other participants:
#Hack to partition results into top 3 vs the rest #Should do this as top 3 proper, rather than above points score hd2b=subset(hd2,subset=(x<6620)) hd2a=subset(hd2,subset=(x>6620)) g=ggplot(hd2b) + geom_point(aes(x=Event,y=Points),col='lightgrey',size=1) + geom_text(aes(x=Event,y=Points,label=Athlete),size=2,col='lightgrey',angle=45) + opts(axis.text.x=theme_text(angle=90,size=6),title="2012 Olympics Heptathlon") + scale_x_discrete(expand = c(0.05,0)) + ylim(600,1200) g = g + geom_point(data=hd2a,aes(x=Event,y=Points),size=1,col='red') + geom_text(data=hd2a,aes(x=Event,y=Points,label=Athlete),size=2,col='red',angle=45) print(g)