I spent most of today, err, yesterday, failing to hold back the tears as the medal performances from the Team GB Olympians kept rolling in… So to celebrate one of those wonderful performances, here are a couple of quick sketches of how Jessica Ennis made her medal in the Heptathlon. (The data is cut and pasted from the BBC website and available here: data; the script used to generate the images is pasted below. The images posted aren’t the best quality because the data doesn’t appear to be freely available, presumably to prevent freeloaders like me competing with the way results are displayed by media orgs that have paid for data and presumably the right to publish it… (Anyone know any good readings about how sports data is protected by IP and data laws?))
And a zoom in above 600 points…
Another view of the over 600 point scores, by athlete:
I think the original data I grabbed included info about whether scores were season bests or personal bests, which could also be used to add richness to the chart, for example, using colour or different symbols to denote SB or PB. I also considered a colouring based on track vs field, or track vs throwing vs jumping, to see whether or not we could identify athletes with strong preferences in any of those areas, but it’s getting a bit late/early hours and I need to get some sleep!
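A rough sketch of how the track vs throwing vs jumping grouping might be set up, in case anyone wants to pick it up. The `event_type` lookup and the `toy` data frame are my own illustrative assumptions (the athlete labels and points are made up, not taken from the results); the real thing would be applied to the `hd2` data frame built in the script below.

```r
# Hypothetical lookup from event name to discipline type (an assumed grouping)
event_type <- c('100m Hurdles'='Track', '200m'='Track', '800m'='Track',
                'High Jump'='Jumping', 'Long Jump'='Jumping',
                'Shot Put'='Throwing', 'Javelin'='Throwing')

# Toy rows standing in for hd2; athlete labels and points are made up
toy <- data.frame(Athlete=c('A','A','B','B'),
                  Event=c('200m','Javelin','200m','Long Jump'),
                  Points=c(1000, 800, 950, 900))

# Add a Type column by looking each event up in the named vector
toy$Type <- unname(event_type[as.character(toy$Event)])

# Total points per athlete per discipline type
by_type <- aggregate(Points ~ Athlete + Type, data=toy, FUN=sum)
print(by_type)
```

With a `Type` column in place, using `col=Type` rather than `col=Event` in the `aes()` call should give the colouring by discipline type.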
For a proper infographic based on the macroscopic view presented by the top two charts, it would probably make sense to use icons rather than text to identify each event, as well as denoting PB/SB; if you peer at the top two charts closely, you’ll notice there’s a dot marking the point score for each Athlete in each event. If we let icons float a little to avoid collisions, we could use an arrow or other connecting line to associate each event icon with the dot that marks its Points value.
As and when results come in from the Olympics relating to the medal winning performances in the pure events (Women’s Javelin, 800m etc) it could be quite interesting comparing those to the heptathlon performances? A comparison of where the three heptathlon medallists would have finished in each of the pure events might also be interesting (would any of them have made any of the finals, or semi-finals, for example?)
Here’s the script I used to generate the plots…
hd <- read.csv("~/Downloads/Heptathlon - Sheet2.csv")

#Generate a summary stats table (total points per athlete; the total lands in column x)
overall=aggregate(hd$Points, by=list(Athlete=hd$Athlete), FUN=sum)
#And order it by result
overall=overall[order(overall$x,decreasing=TRUE),]
#Mix in overall points to the original data
hd2=merge(hd,overall,by='Athlete')
#and then use the overall points data to reorder Athlete factors accordingly
hd2=transform(hd2, Athlete=reorder(Athlete, x))
#Also order the events properly
hd2$Event = with(hd2, factor(Event, levels = c('100m Hurdles','High Jump','Shot Put','200m','Long Jump','Javelin','800m')))

#Now generate the plot (this was the first sketch that came to mind...)
library(ggplot2)
ggplot(hd2) + geom_point(aes(x=Athlete,y=Points,col=Event),size=1) +
  geom_text(aes(x=Athlete,y=Points,label=Event,col=Event),size=2,angle=45) +
  theme(axis.text.x=element_text(angle=90,size=5)) + ggtitle("2012 Olympics Heptathlon") +
  scale_x_discrete(expand = c(0.05,0))
#for over 600 points: +ylim(600,1200)

#Faceted plot of high points achieving events by athlete
ggplot(hd2) + geom_point(aes(Event,Points)) + facet_wrap( ~ Athlete) + ylim(500,1200) +
  theme(axis.text.x=element_text(angle=90,size=5))
Please let me know via the comments if you come up with any other interesting views over this data…:-)
PS It suddenly struck me that there may be variability in the range of points awarded in each discipline, so I threw a quick chart together to explore that:
Additional code is:
ggplot(hd2) + geom_point(aes(x=Event,y=Points),size=1) +
  geom_text(aes(x=Event,y=Points,label=Athlete),size=2,angle=45) +
  theme(axis.text.x=element_text(angle=90,size=5)) + ggtitle("2012 Olympics Heptathlon") +
  scale_x_discrete(expand = c(0.05,0)) + ylim(600,1200)
I guess I should do some distribution plots too? And maybe figure in personal performances relative to eg median scores in each event?
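As a starting point for the median idea, here is a minimal sketch that works out each score relative to the median for its event, along with the spread of points per event. The `toy` data frame and the `EventMedian`, `VsMedian` and `spread` names are assumptions for illustration only (the numbers are made up); the same steps should carry over to `hd2`.

```r
# Toy rows standing in for hd2; athlete labels and points are made up
toy <- data.frame(Athlete=rep(c('A','B','C'), times=2),
                  Event=rep(c('200m','Javelin'), each=3),
                  Points=c(1000, 950, 900, 700, 800, 750))

# Median points per event, repeated alongside each row of that event
toy$EventMedian <- ave(toy$Points, toy$Event, FUN=median)

# Positive values mean the athlete scored above the event median
toy$VsMedian <- toy$Points - toy$EventMedian

# Range of points awarded in each event (max minus min)
spread <- aggregate(Points ~ Event, data=toy, FUN=function(p) max(p) - min(p))
print(toy)
print(spread)
```

Plotting `VsMedian` rather than `Points` on the y axis would put every event on a common baseline, which might make individual strengths and weaknesses easier to spot.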
PPS Here are the points scored by the medallists per event in the context of each other and the other participants:
#Hack to partition results into top 3 vs the rest
#Should do this as top 3 proper, rather than above points score
hd2b=subset(hd2,subset=(x<6620))
hd2a=subset(hd2,subset=(x>6620))
#The rest of the field in light grey...
g=ggplot(hd2b) + geom_point(aes(x=Event,y=Points),col='lightgrey',size=1) +
  geom_text(aes(x=Event,y=Points,label=Athlete),size=2,col='lightgrey',angle=45) +
  theme(axis.text.x=element_text(angle=90,size=6)) + ggtitle("2012 Olympics Heptathlon") +
  scale_x_discrete(expand = c(0.05,0)) + ylim(600,1200)
#...with the medallists overlaid in red
g = g + geom_point(data=hd2a,aes(x=Event,y=Points),size=1,col='red') +
  geom_text(data=hd2a,aes(x=Event,y=Points,label=Athlete),size=2,col='red',angle=45)
print(g)
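One possible way of doing the "top 3 proper" partition, rather than relying on a points threshold, is sketched below. The `overall` and `hd2` data frames here are toy stand-ins with made-up values that mimic the column names used in the script, and `top3` is a name introduced just for this example.

```r
# Toy totals standing in for the overall table (Athlete, x); values are made up
overall <- data.frame(Athlete=c('A','B','C','D','E'),
                      x=c(6000, 6900, 6500, 6700, 6100))

# Names of the athletes with the three highest overall totals
top3 <- head(overall$Athlete[order(overall$x, decreasing=TRUE)], 3)

# Toy per-event rows standing in for hd2
hd2 <- data.frame(Athlete=rep(overall$Athlete, each=2),
                  Event=rep(c('200m','800m'), times=5),
                  Points=seq(700, 1150, by=50))

# Medallists vs the rest, split by membership of the top 3 rather than by score
hd2a <- subset(hd2, Athlete %in% top3)
hd2b <- subset(hd2, !(Athlete %in% top3))
print(top3)
```

Note that this takes the first three rows after sorting, so a tie for third place would need handling separately.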
8 thoughts on “At A Glance View of the 2012 Olympics Heptathlon Performances”
I don’t think the actual data is protected by copyright, since it’s considered a “fact” who won what medals (but I’m not a lawyer)
Such detailed data; good job. Very pleased for Jessica Ennis winning the Olympic heptathlon gold.
The labels are much too small for me to read… Otherwise very cool!
Not just you Erik. Nice post all the same. While I like seeing the 45 degree text for seeing the sample of code, isn’t colouring the words based on event and then having a legend redundant? I would have preferred points on the first few graphs over words. As for other ways to view it, your first plot under the PS, the black text plot, would be nice to see that as a violin plot! Cheers.
Thanks for the comment – re: the redundant legend: agreed:-) Please feel free to tweak the code and post a link back to improved versions of the charts here in the comments:-) It’d be interesting to see what alternative views over the data folk can come up with, both minor tweaks/improvements, as well as radically alternative views. Part of the reason for doing the charts this way was to try to capture suggested improvements:-)
Hi Erik – yes, agreed re the labels being too small. That was in part so that this image doesn’t necessarily get reused… Part of my agenda is to try to get folk engaging with the R scripts themselves and fixing the dodgy bits, as well as posting suggested improvements back here in the comments so we can try and capture some of the reasoning that goes on when developing an effective chart… A lot of tutorials seem to assume the best practice – what I want to capture is something of the process of what decisions/reasoning goes on when thinking about how we can *improve* the display of a particular chart:-)
What would interest me is to measure the effect on the second day results if you are tested for doping after the first day. There was an outrage in Lithuania (the delegation made an official protest), since our athlete (Austra Skujyte) was the only one who got tested after the first day, thus losing valuable recovery time.