I did the Two Castles Run today; it’s a 10km race between Warwick and Kenilworth castles. The organizers were very quick to put the results online and even went the extra mile of offering them as a CSV file. It was therefore very tempting to launch R and see what the distribution looked like (and how I fared compared to the rest of the runners).
After a quick R script to read and parse the data:
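```r
library(ggplot2)

results <- read.csv("2011TwoCastlesRun.csv")

# Convert chip times of the form "H:MM:SS" into minutes
results$Minutes <- sapply(as.character(results$ChipTime),
                          FUN = function(s) sum(as.integer(strsplit(s, ":")[[1]]) * c(60, 1, 1/60)))

summary(results$Minutes[results$M.F == "M"])

# Density of finishing times, one curve per gender
p <- ggplot(results, aes(Minutes, colour = M.F)) + geom_density()
print(p)

# My own result (bib number 2474)
print(results[results$Bib == 2474, ])
```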
the distribution of the results (in minutes) looks like this:
As expected, men are faster on average than women, but it’s funny to see how similar the two curves are; they even have the same small bump just after the median. I wonder what causes those bumps.
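The conversion in the script above assumes every chip time has the form H:MM:SS. If some times were recorded without the hours field (I haven't checked whether the file contains any), R would recycle `c(60, 1, 1/60)` against a two-element vector and return a wrong value, flagged only by a recycling warning. Here is a slightly more defensive sketch; `chip_to_minutes()` and `to_minutes()` are hypothetical helpers, and the two-field "MM:SS" form is an assumption rather than something observed in the data:

```r
# Convert one chip time string to minutes.
# Accepts "H:MM:SS", and also "MM:SS" (assumed format, padded with 0 hours).
# Returns NA for anything else instead of a silently wrong number.
chip_to_minutes <- function(s) {
  parts <- suppressWarnings(as.numeric(strsplit(s, ":", fixed = TRUE)[[1]]))
  if (length(parts) == 2) parts <- c(0, parts)  # missing hours field
  if (length(parts) != 3 || anyNA(parts)) return(NA_real_)
  sum(parts * c(60, 1, 1/60))
}

# Vectorised wrapper, e.g. results$Minutes <- to_minutes(results$ChipTime)
to_minutes <- function(x) {
  vapply(as.character(x), chip_to_minutes, numeric(1), USE.NAMES = FALSE)
}
```

Returning `NA` for unparseable strings means a bad row shows up in `summary()` as an NA count rather than as an outlier in the density plot.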
My time today was 48’29 (48.4833 minutes), which put me in 740th place. How good is that? Well,
```r
> summary(results$Minutes)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  32.08   48.72   55.13   55.86   61.46   99.50 
```
So I’m in the first quartile!
But wait, looking at men only:
```r
> summary(results$Minutes[results$M.F == "M"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  32.08   46.68   51.85   52.84   57.62   99.50 
```
I’m not any more, though I’m still closer to the first quartile than to the median!
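Rather than eyeballing quartiles, the empirical CDF gives the exact share of finishers who ran at least as fast as a given time. A minimal sketch using base R's `ecdf()`; the helper name `share_at_or_faster()` is hypothetical:

```r
# Fraction of finishers whose time is <= my_time, optionally restricted
# to one group (e.g. group = results$M.F, level = "M").
share_at_or_faster <- function(times, my_time, group = NULL, level = NULL) {
  if (!is.null(group)) times <- times[group == level]
  times <- times[!is.na(times)]  # ignore runners without a parsed time
  ecdf(times)(my_time)
}
```

With the race data this would be `share_at_or_faster(results$Minutes, 48.4833)` for the whole field and `share_at_or_faster(results$Minutes, 48.4833, results$M.F, "M")` for men only. Given the summaries above, the first should land a little under 0.25 and the second above it.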
A reader sent me the distribution for the 2010 results:
The two years look very similar. Overlaying them for a direct comparison makes the similarity even more obvious:
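```r
# results2010 and results2011 are read and converted to minutes as above,
# and must have the same columns for rbind() to work
results <- rbind(results2010, results2011)
results$year <- c(rep(2010, nrow(results2010)), rep(2011, nrow(results2011)))

# One panel per gender, one line type per year
p <- ggplot(results) +
  geom_density(aes(Minutes, linetype = factor(year))) +
  facet_grid(. ~ M.F)
print(p)
```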
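For runners who took part in both years, matching them on forename and surname gives the distribution of their year-on-year change and the percentage who improved:

```r
# Runners present in both years, matched on forename and surname
m <- merge(results2011, results2010, by = c("Forename", "Surname"),
           suffixes = c(".2011", ".2010"))

# Distribution of the change in time (negative = faster in 2011)
p <- ggplot(m, aes(Minutes.2011 - Minutes.2010)) + geom_density()
print(p)

# Percentage of matched runners who improved on their 2010 time
mean(m$Minutes.2011 < m$Minutes.2010, na.rm = TRUE) * 100
```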
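Matching on names has one catch: `merge()` pairs every row with every row that shares the same key, so two runners with the same name in one year would each be matched against the other year's namesake. A cautious sketch, using hypothetical helpers `unique_names()` and `match_runners()`, drops any name that is not unique within its year before merging:

```r
# Keep only rows whose (Forename, Surname) occurs exactly once in df,
# so namesakes cannot produce spurious cross-year matches.
unique_names <- function(df) {
  key <- paste(df$Forename, df$Surname, sep = "\t")
  df[!(duplicated(key) | duplicated(key, fromLast = TRUE)), ]
}

# Same merge as above, applied to the de-duplicated tables
match_runners <- function(r2011, r2010) {
  merge(unique_names(r2011), unique_names(r2010),
        by = c("Forename", "Surname"), suffixes = c(".2011", ".2010"))
}
```

This throws away some genuine repeat runners who happen to share a name, trading a little coverage for fewer false pairings; bib numbers would be a better key if they stayed the same across years, which I wouldn't assume.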