Predicting events, when they haven't happened yet

Suppose you have to predict the probabilities of events which haven't happened yet. How do you do this?

Here is an example from the 1950s: Longley-Cook, an actuary at an insurance company, was asked to price the risk of a mid-air collision between two planes, an event which, as far as he knew, had never happened before. The civilian airline industry was still very young but growing rapidly, and all Longley-Cook knew was that there had been no collisions in the previous 5 years [1].

Where do you start?

Although the probability of a mid-air collision should be very low for any given plane, the probability of at least one such event somewhere in a given year will be higher.

Let's think of the years as a series of Bernoulli trials with unknown probability \(p\). That's the likelihood. If I start with an uninformative prior, such as a Beta(\(\alpha,\beta\)) with \(\alpha=1, \beta=1\), then I can use the concept of Bayesian conjugate priors to update my prior belief.

In this case the posterior distribution of \(p\) is again a Beta, with hyper-parameters \(\alpha'=\alpha + \sum_{i=1}^n x_i,\, \beta'=\beta + n - \sum_{i=1}^n x_i\), where \(x_i=1\) if the event occurred in year \(i\) and 0 otherwise, and \(n\) is the number of years.
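The conjugate update can be written in a few lines of R. This is a minimal sketch, not the code behind the post; the helper name `update_beta` and its arguments are made up for illustration.

```r
# Beta-Bernoulli conjugate update (illustrative helper, not from the post).
# x is a vector of 0/1 outcomes, one per year; alpha and beta are the
# prior hyper-parameters.
update_beta <- function(x, alpha = 1, beta = 1) {
  stopifnot(all(x %in% c(0, 1)))
  n <- length(x)
  events <- sum(x)
  c(alpha = alpha + events, beta = beta + n - events)
}

# Five years without a collision, starting from a Beta(1, 1) prior
post <- update_beta(rep(0, 5))
post
```

Passing the yearly outcomes as a 0/1 vector keeps the function close to the formula; passing only the counts of events and years would work just as well.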

Thus, with \(n=5\) years and no events, the updated parameters are \(\alpha'=1, \beta'=6\), with a posterior predictive mean of \(\alpha'/(\alpha'+\beta')=1/7\). That is a 14.3% chance of a mid-air collision in the next year, with a one-sided 95% credible interval of [0, 39%]. Or, in other words, if I round 39% to 40%, a return period of 2.5 years (1/0.4), i.e. up to 4 incidents in 10 years should be allowed for. That's what Longley-Cook predicted.
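These figures can be checked with base R. The sketch below assumes the 39% is the 95% quantile of the Beta(1, 6) posterior, i.e. the upper end of a one-sided interval starting at 0; the variable names are illustrative.

```r
# Posterior after five collision-free years with a Beta(1, 1) prior
alpha_post <- 1
beta_post  <- 6

# Posterior predictive probability of a collision next year
post_mean <- alpha_post / (alpha_post + beta_post)

# Upper end of the one-sided 95% interval [0, upper]
upper <- qbeta(0.95, alpha_post, beta_post)

# With alpha' = 1 the quantile has a closed form: 1 - (1 - 0.95)^(1 / beta')
upper_closed <- 1 - 0.05^(1 / beta_post)

# Return period implied by the rounded upper bound of 40%
return_period <- 1 / 0.4

round(c(mean = post_mean, upper = upper, return_period = return_period), 3)
```

The closed form works because the Beta(1, \(\beta'\)) distribution function is \(1-(1-p)^{\beta'}\), so `qbeta` and the hand calculation should agree.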

Tragically, 128 people died in a mid-air collision over the Grand Canyon in 1956, and four years later 134 people died in another over New York City.

Wikipedia lists 51 notable civilian mid-air collisions since 1922, including helicopters and spacecraft. Since 1955 there have been 11 incidents with more than 100 fatalities, the most recent in 2006.

So, what would this mean to Mr. Longley-Cook today? Well, first of all, that his prediction wasn't bad at all. Perhaps he would set the probability at \((1+11)/((1+11)+(1+60-11)) = 12/62 \approx 19\%\), or roughly 20%, today. He may have argued that 2 x 20% = 40% of the average plane value should be included in the worldwide premium for airline hull to cover mid-air collisions.
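The same update with today's counts takes only a few lines. This is a sketch under the assumptions used above: a Beta(1, 1) prior, 60 years of observation, and the 11 incidents with more than 100 fatalities counted as events.

```r
# Counts quoted in the text; variable names are illustrative
alpha_prior <- 1
beta_prior  <- 1
events <- 11
years  <- 60

alpha_post <- alpha_prior + events          # 1 + 11
beta_post  <- beta_prior + years - events   # 1 + 60 - 11

# Posterior predictive mean, 12 / 62
post_mean <- alpha_post / (alpha_post + beta_post)
post_mean

# Doubling, as in the 2 x 20% figure above
2 * post_mean
```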

R code
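A minimal base R sketch, not the post's original listing, of how the flat prior and the two posteriors discussed above could be compared. The grid size, column names and plot labels are arbitrary choices.

```r
# Grid of candidate annual collision probabilities
p <- seq(0, 1, length.out = 501)

# Densities: flat prior, posterior after 5 collision-free years,
# and posterior after 11 events in 60 years
dens <- cbind(
  prior      = dbeta(p, 1, 1),
  post_5yrs  = dbeta(p, 1, 6),
  post_60yrs = dbeta(p, 12, 50)
)

# One line per distribution
matplot(p, dens, type = "l", lty = 1, col = 1:3,
        xlab = "Annual probability of a mid-air collision",
        ylab = "Density")
legend("topright",
       legend = c("Beta(1, 1) prior",
                  "Beta(1, 6) posterior",
                  "Beta(12, 50) posterior"),
       lty = 1, col = 1:3)
```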



[1] Arthur Charpentier (ed.), Computational Actuarial Science with R, Chapman and Hall/CRC, 656 pages.

Session Info

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] XML_3.98-1.1

loaded via a namespace (and not attached):
[1] tools_3.2.0


  1. Great post idea! Love it! It inspired me to use this concept for writing a blog post on prediction of extreme rare losses in financial trading. I understand that "n" can be freely interpreted, i.e. by choosing a specific time interval, right? Could you also explain why you interpret 39% (or 40%) as a return period here? I have a feeling, but it would be helpful to understand it better. Thanks again!!

  2. Well, a probability of 40% also means that you would expect 0.4 events per year, or better 4 in 10 years, or one every 2.5 years on average.

  3. Ok, got it. Last question. Why do you add 1 to 11 (and so on) in the last paragraph? What does this "1" mean?

  4. Well, if I keep my prior a Beta with alpha=1 and beta=1, then the updated hyper-parameters are given as alpha' = alpha+number of events = 1 + 11 and beta' = beta + n years - number of events = 1 + 60 - 11, and with that the posterior predictive mean is alpha'/(alpha'+beta')=(1+11)/(1+11 + 1+60-11).

  5. disqus_72GXGq6drQ7 May 2015 at 11:01

    Clearly, my university studies and all those actuarial exam passes are not wasted! Thanks for another informative article!

  6. disqus_72GXGq6drQ7 May 2015 at 11:47

    Must try and investigate this with U(0,1) prior or some others