I did the Two Castles Run today; it’s a 10 km race between Warwick and Kenilworth castles. The organizers were very quick to put the results online and even went the extra mile by offering them as a CSV file. It was therefore very tempting to launch R and see what the distribution looked like (and how I fared compared to the rest of the runners).
After a quick R script to read and parse the data:
    library(ggplot2)

    # Load the race results
    results <- read.csv("2011TwoCastlesRun.csv")

    # Convert ChipTime ("hh:mm:ss") to minutes
    results$Minutes <- sapply(as.character(results$ChipTime),
                              FUN = function(s) sum(as.integer(strsplit(s, ':')[[1]]) * c(60, 1, 1/60)))

    summary(results$Minutes[results$M.F == "M"])

    # Density of finishing times, one curve per sex
    p <- ggplot(results, aes(Minutes, colour = M.F)) + geom_density()
    print(p)

    # Look up a single runner by bib number
    print(results[results$Bib == 2474, ])
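As a quick sanity check of the time conversion, here is a minimal sketch applied to a single value. It assumes ChipTime is stored as "hh:mm:ss"; the helper name `to_minutes` and the sample string are made up for illustration and are not taken from the race file.

```r
# Hypothetical helper: convert an "hh:mm:ss" string to minutes.
to_minutes <- function(s) {
  parts <- as.integer(strsplit(s, ":", fixed = TRUE)[[1]])
  # hours * 60 + minutes + seconds / 60
  sum(parts * c(60, 1, 1/60))
}

# Assumed sample value: a 48'29 finish, about 48.4833 minutes.
to_minutes("00:48:29")
```

The weights `c(60, 1, 1/60)` only line up if every time has exactly three fields, so a file that drops the hours for faster runners would need extra handling.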
the distribution of the results (in minutes) looks like this:

As expected, men are faster on average than women but it’s funny to see how similar the two curves are; they even have the same small bump after the median. I wonder what makes those bumps.
My time today was 48’29 (or 48.4833 minutes), which puts me in 740th place. How good is that? Well,
    summary(results$Minutes)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      32.08   48.72   55.13   55.86   61.46   99.50
So I’m in the first quartile!
But wait, looking at men only:
    summary(results$Minutes[results$M.F=="M"])
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      32.08   46.68   51.85   52.84   57.62   99.50
I’m not any more. Still closer to the quartile than the median though!
UPDATE:
A reader sent me the distribution for the 2010 results:

They both look very similar year on year. Actually, putting them on top of each other for direct comparison, the similarity is even more obvious:
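    # results2010 and results2011 are assumed to be loaded and parsed as above
    results <- rbind(results2010, results2011)
    results$year <- c(rep(2010, nrow(results2010)), rep(2011, nrow(results2011)))

    # One panel per sex, one line type per year
    p <- ggplot(results) + geom_density(aes(Minutes, linetype = factor(year))) + facet_grid(. ~ M.F)
    print(p)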

Did people do better this year? A simple merge shows us the improvement (or lack thereof) of the runners who ran both races (assuming no two runners share the same name):
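    library(plyr)  # count() comes from plyr

    # Keep only runners who appear in both years, matched by name
    m <- merge(results2011, results2010, by = c("Forename", "Surname"), suffixes = c(".2011", ".2010"))

    # Negative values mean a faster time in 2011
    p <- ggplot(m, aes(Minutes.2011 - Minutes.2010)) + geom_density()
    print(p)

    # Percentage of runners who did not improve (FALSE) and who improved (TRUE)
    count(m$Minutes.2011 < m$Minutes.2010)$freq * 100 / nrow(m)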

[1] 27.98077 72.01923
Indeed! Most people (72%) improved their time, some by as much as 10 minutes. That’s encouraging!
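To make the merge step concrete, here is a small sketch on invented data. The names and times are purely illustrative; only the column names (`Forename`, `Surname`, `Minutes`) follow the ones used in this post. It also uses `mean()` on a logical vector as an alternative way to get the share of improved runners, which avoids the extra package.

```r
# Toy data, invented for illustration (not from the race results).
r2010 <- data.frame(Forename = c("Ann", "Bob", "Cat"),
                    Surname  = c("Lee", "Ray", "Fox"),
                    Minutes  = c(55.0, 60.0, 50.0))
r2011 <- data.frame(Forename = c("Ann", "Bob", "Dan"),
                    Surname  = c("Lee", "Ray", "Kim"),
                    Minutes  = c(52.5, 61.0, 47.0))

# merge() does an inner join by default: only runners found in
# both years survive, so Cat and Dan are dropped.
m <- merge(r2011, r2010, by = c("Forename", "Surname"),
           suffixes = c(".2011", ".2010"))

# The mean of a logical vector is the proportion of TRUE values.
improved <- mean(m$Minutes.2011 < m$Minutes.2010) * 100
improved
```

Here Ann improved and Bob did not, so half of the matched runners got faster. Matching on name alone is the weak point: two different people with the same name would be treated as one runner.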

Practising for running to useR 2011 then?
You can find out exactly where you came in the distribution, by using ecdf, e.g.,
ecdf(results$Minutes)(48.4833)
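For anyone unfamiliar with it, `ecdf()` returns a function, and calling that function on a time gives the fraction of runners who finished at or under it. A minimal sketch on invented numbers (the vectors `times` and `sex` are hypothetical, not the race data):

```r
# Toy finishing times in minutes, invented for illustration.
times <- c(40, 45, 50, 55, 60)
sex   <- c("M", "F", "M", "F", "M")

# Share of all runners at or under 48 minutes: 2 out of 5.
all_share <- ecdf(times)(48)

# Same question restricted to men: 1 out of 3.
men_share <- ecdf(times[sex == "M"])(48)

c(all = all_share, men = men_share)
```

On the real data the same subsetting idea would presumably be `ecdf(results$Minutes[results$M.F == "M"])`, reusing the `M.F` column from the post.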
Ah ah, yes Coventry wasn’t too far from the finish line.
Good call on ecdf:
which looks good. But limiting the selection to men is more humbling:
I am impressed by the fact that there are categories for every 5 year group! This makes more groups, thus more chances for top positions, and it is also fairer! The route seemed rather flat, overall…
And having so many runners of unknown category makes it even easier too I guess:
The route isn’t that flat! It’s actually (slightly) uphill for a substantial length, and almost never downhill.