It is less than a week before the 2012 Olympic games start in London. No surprise, therefore, that the papers are all over it, including a lot of data and statistics around the games.
The Economist investigated the potential financial impact on sponsors (some benefits), taxpayers (no benefits) and the athletes (if they are lucky) in its recent issue and video.
100m men's final
The biggest event of the Olympics will be one of the shortest: the men's 100 metres final. It will be all over in less than 10 seconds. In 1968 Jim Hines was the first gold medal winner to achieve a sub-ten-second time, and since 1984 all gold medal winners have run faster than 10 seconds. The historical winning times of past Olympics, going back to 1896, are available from databaseolympics.com.
Looking at the data, it appears that a simple log-linear model will give a reasonable forecast for the 2012 Olympic result (ignoring the 1896 time). Of course such a model doesn't make sense forever, as it would suggest that future winning times will continue to shrink. Hence, some kind of logistic model might be a better approach, but I have no idea what a sensible floor for it would be. Others have used ideas from extreme value theory to investigate the 100m sprint; see the paper by Einmahl and Smeets, which would suggest a floor greater than 9 seconds.
|Historical winning times for the 100m men's final.|
Red line: log-linear regression, black line: logistic regression.
My simple log-linear model forecasts a winning time of 9.68 seconds, which is 1/100 of a second faster than Usain Bolt's winning time in Beijing in 2008, but still 1/10 of a second slower than his 2009 World Record (9.58s) in Berlin.
Never mind, I shall stick to my forecast. The 100m final will be held on 5 August 2012. Now even I am getting excited about the Olympics, even if only for less than 10 seconds.
Here is the R code used in this post:
library(XML)
library(drc)

## Scrape the historical 100m results from databaseolympics.com
url <- "http://www.databaseolympics.com/sport/sportevent.htm?enum=110&sp=ATH"
data <- readHTMLTable(readLines(url), which=2, header=TRUE)

## Keep only the gold medal winners and convert columns to numeric
golddata <- subset(data, Medal %in% "GOLD")
golddata$Year <- as.numeric(as.character(golddata$Year))
golddata$Result <- as.numeric(as.character(golddata$Result))
tail(golddata, 10)

## Fit a four-parameter logistic model and a log-linear model,
## ignoring the 1896 result
logistic <- drm(Result~Year, data=subset(golddata, Year>=1900), fct=L.4())
log.linear <- lm(log(Result)~Year, data=subset(golddata, Year>=1900))

## Predict winning times up to 2012 from the log-linear model
years <- seq(1896, 2012, 4)
predictions <- exp(predict(log.linear, newdata=data.frame(Year=years)))

plot(logistic, xlim=c(1896, 2012), ylim=c(9.5, 12),
     xlab="Year", main="Olympic 100 metre",
     ylab="Winning time for the 100m men's final (s)")
points(golddata$Year, golddata$Result)
lines(years, predictions, col="red")
points(2012, predictions[length(years)], pch=19, col="red")
text(2012, 9.55, round(predictions[length(years)], 2))
Update 5 August 2012: You can find a comparison of my forecast with the final outcome, Usain Bolt's winning time of 9.63s, in my follow-up post.
The other day I saw a fantastic exhibition of work by Bridget Riley. Karsten Schubert, who is Riley's main agent, has some of her most famous and influential artwork from 1960 to 1966 on display, including the seminal Moving Squares from 1961.
|Photo of Moving Squares by Bridget Riley, 1961|
Emulsion on board, 123.2 x 121.3cm
In the 1960s Bridget Riley created some great black and white artwork, which at first glance may look simple and deterministic, or sometimes random, but which has fascinated me since I first saw her work about nine years ago at Tate Modern.
Her work prompted a very simple question for me: when does a pattern appear random? As human beings, much of our life is focused on pattern recognition. It is about making sense of the world around us: being able to understand what people are saying, seeing lots of different things and yet knowing when something is a table and when it is not. No surprise, I suppose, that pattern recognition is such a big topic in statistics and machine learning.
Of course I couldn't resist trying to reproduce the Moving Squares in R. Here it is:
## Inspired by Bridget Riley's Moving Squares
## x gives the (narrowing, then widening) column boundaries,
## y the regular row boundaries
x <- c(0, 70, 140, 208, 268, 324, 370, 404, 430, 450, 468, 482, 496,
       506, 516, 523, 528, 533, 536, 542, 549, 558, 568, 581, 595,
       613, 633, 659, 688, 722, 764, 810)
y <- seq(from=0, to=840, by=70)
m <- length(y)
n <- length(x)
## Alternating 0/1 matrix provides the black and white squares
z <- t(matrix(rep(c(0, 1), m*n/2), nrow=m))
image(x[-n], y[-m], z[-n, -m], col=c("black", "white"),
      axes=FALSE, xlab="", ylab="")
However, what may look similar on screen is quite different when you see the actual painting. Thus, if you are in London and have time, make your way to the gallery in Soho. I recommend it!
The second Cologne R user meeting took place last Friday, 6 July 2012, at the Institute of Sociology. Thanks to Bernd Weiß, who provided the meeting room, we didn't have to worry about the infrastructure, like we did at our first gathering.
Again, we had an interesting mix of people turning up, with a very diverse background from chemistry to geo-science, energy, finance, sociology, pharma, physics, psychology, mathematics, statistics, computer science, telco, etc. Yet, the gender mix was still somewhat biased, with only one female attendee.
We had two fantastic talks by Bernd Weiß and Stephan Sprenger. Bernd talked about Emacs' Org-mode and R. He highlighted the differences between literate programming and reproducible research, and if you follow his slides, you get the impression that Emacs with Org-mode is the all-singing, all-dancing editor, or for those who speak German: eine eierlegende Wollmilchsau ("an egg-laying wool-milk-sow", a jack of all trades).
Reduce allows me to apply a function successively over a vector.
What does that mean? Well, if I wanted to add up the figures 1 to 5, I could say:
add <- function(x,y) x+y
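The section breaks off here, so what follows is my own sketch of how the example presumably continues: adding up 1 to 5 with `add` means nesting the calls by hand, which is exactly the folding that `Reduce` automates.

```r
add <- function(x, y) x + y

## By hand: nest the calls successively
add(add(add(add(1, 2), 3), 4), 5)
# 15

## With Reduce: fold add over the vector
Reduce(add, 1:5)
# 15

## accumulate=TRUE also returns the intermediate sums
Reduce(add, 1:5, accumulate = TRUE)
# 1 3 6 10 15
```

The `accumulate` argument is handy for seeing each step of the fold; for a simple sum, of course, `sum(1:5)` would do the same job directly.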