And a zoom in above 600 points…

Another view of the over 600 point scores, by athlete:

I think the original data I grabbed included info about whether scores were season bests or personal bests, which could also be used to add richness to the chart, for example, using colour or different symbols to denote SB or PB. I also considered a colouring based on track vs field, or track vs throwing vs jumping, to see whether or not we could identify athletes with strong preferences in any of those areas, but it’s getting a bit late/early hours and I need to get some sleep!

For a proper infographic based on the macroscopic view presented by the top two charts, it would probably make sense to use icons rather than text to identify each event, as well as denoting PB/SB; if you peer at the top two charts closely, you’ll notice there’s a dot marking the point score for each Athlete in each event. If we let icons float a little to avoid collisions, we could use an arrow or other connective pointing device to associate event icons with the corresponding Points registration point.

As and when results come in from the Olympics relating to the medal winning performances in the pure events (Women’s Javelin, 800m etc) it could be quite interesting comparing those to the heptathlon performances? A comparison of where the three heptathlon medallists would have finished in each of the pure events might also be interesting (would any of them have made any of the finals, or semi-finals, for example?)

Here’s the script I used to generate the plots…

hd <- read.csv("~/Downloads/Heptathlon - Sheet2.csv") #Generate a summary stats table overall=aggregate(hd$Points, by=list(Athlete=hd$Athlete), FUN=sum) #And order it by result overall=overall[order(overall$x,decreasing=T),] #Mix in overal points to the original data hd2=merge(hd,overall,by='Athlete') #and then use the overall points data to reorder Athlete factors accordingly hd2= transform(hd2, Athlete=reorder(Athlete, x) ) #ALso order the events properly hd2$Event = with(hd2, factor(Event, levels = c('100m Hurdles','High Jump','Shot Put','200m','Long Jump','Javelin','800m'))) #Now generate the plot (this was the first sketch that came to mind...) require(ggplot2) ggplot(hd2)+geom_point(aes(x=Athlete,y=Points,col=Event),size=1) + geom_text(aes(x=Athlete,y=Points,label=Event,col=Event),size=2,,angle=45) + opts(axis.text.x=theme_text(angle=90,size=5),title="2012 Olympics Heptathlon") + scale_x_discrete(expand = c(0.05,0)) #for over 600 points: +ylim(600,1200) #Faceted plot of high points achieving events by athlete ggplot(hd2) + geom_point(aes(Event,Points)) + facet_wrap( ~ Athlete) + ylim(500,1200) + opts(axis.text.x=theme_text(angle=90,size=5))

Please let me know via the comments if you come up with any other interesting views over this data…:-)

PS It suddenly struck me that there may be variability in the range of points awarded in each discipline, so I threw a quick chart together to explore that:

Additional code is:

`ggplot(hd2) + geom_point(aes(x=Event,y=Points),size=1) + geom_text(aes(x=Event,y=Points,label=Athlete),size=2,,angle=45) + opts(axis.text.x=theme_text(angle=90,size=5),title="2012 Olympics Heptathlon") + scale_x_discrete(expand = c(0.05,0)) + ylim(600,1200)`

I guess I should do some distribution plots too? And maybe figure in personal performances relative to eg median scores in each event?

PPS Here are the points scored by the medalists per event in the context of each other and the other participants:

#Hack to partition results into top 3 vs the rest #Should do this as top 3 proper, rather than above points score hd2b=subset(hd2,subset=(x<6620)) hd2a=subset(hd2,subset=(x>6620)) g=ggplot(hd2b) + geom_point(aes(x=Event,y=Points),col='lightgrey',size=1) + geom_text(aes(x=Event,y=Points,label=Athlete),size=2,col='lightgrey',angle=45) + opts(axis.text.x=theme_text(angle=90,size=6),title="2012 Olympics Heptathlon") + scale_x_discrete(expand = c(0.05,0)) + ylim(600,1200) g = g + geom_point(data=hd2a,aes(x=Event,y=Points),size=1,col='red') + geom_text(data=hd2a,aes(x=Event,y=Points,label=Athlete),size=2,col='red',angle=45) print(g)]]>