mages' blog

How cold is it? A Bayesian attempt to measure temperature

It is getting colder in London, yet it is still quite mild considering that it is late November. Well, indoors it still feels like 20°C (68°F) to me, but I was told last week that I should switch on the heating.

Luckily I found an old thermometer to check. The thermometer showed 18°C. Is it really below 20°C?

The thermometer is quite old and I'm not sure that it works properly anymore. So, what shall I do now? Perhaps I should consider that both measurements are uncertain and try to combine them.

I believe that I can sense the temperature within ±3°C, while I think that the thermometer still works within ±2°C. Assuming that both measurements follow a Gaussian (Normal) distribution, with the uncertainties given as standard deviations, I can use Bayes' theorem to combine my hypothesis with the data. The posterior distribution will be Gaussian again, with conjugate hyper-parameters:
\[
\mu=\left.\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{s^2}\right)\right/\left(\frac{1}{\sigma_0^2} + \frac{n}{s^2}\right) \\
\sigma^2=\left(\frac{1}{\sigma_0^2} + \frac{n}{s^2}\right)^{-1}
\]With \(K := \frac{n\sigma_0^2}{s^2+n\sigma_0^2} \) this simplifies to:
\[
\mu = K\, \bar{x} + (1 - K)\, \mu_0 \mbox{, with } \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i\\
\sigma = s \,\sqrt{K/n}
\]In my case I have: \(n=1,\; x_1=18^{\circ}C,\; s=2^{\circ}C,\; \mu_0=20^{\circ}C,\; \sigma_0=3^{\circ}C\).

Hence, the posterior distribution has parameters \(\mu=18.6^{\circ}C\) and \(\sigma=1.7^{\circ}C\). Thus, my best guess is that it is actually a little colder than I thought. One could argue that the probability that it is below 20°C is 80%.

Over the last five days my perception of the temperature didn't change, neither did the weather forecast, but the measurements showed: 18°C, 19°C, 17.5°C, 18°C, 18.5°C.

With that information the parameters update to \(\mu=18.3^{\circ}C\) and \(\sigma=0.9^{\circ}C\). I can't deny it any longer: it has got colder. The probability that it is below 20°C is now 97% and the heating is on.


Without any prior knowledge I might have used a t-test to check the measurements. But here I believe that I have information about the thermometer and my own temperature-sensing abilities which I don't want to ignore.

R code
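The full script is not shown here, but a minimal sketch of the conjugate update with the numbers from above is easy to write down (variable names are my own; the session info below suggests the original post also used the BayesianFirstAid package):

# Conjugate Normal update with known measurement error s
mu0    <- 20   # prior mean: my own sense of the temperature
sigma0 <- 3    # prior standard deviation
s      <- 2    # thermometer standard deviation
x      <- c(18, 19, 17.5, 18, 18.5)  # five daily readings
n      <- length(x)
K      <- n * sigma0^2 / (s^2 + n * sigma0^2)
(mu    <- K * mean(x) + (1 - K) * mu0)   # 18.3
(sigma <- s * sqrt(K / n))               # 0.9
# Probability that the temperature is below 20°C
pnorm(20, mean = mu, sd = sigma)         # about 0.97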


Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] BayesianFirstAid_0.1 rjags_3-14           coda_0.16-1         
[4] lattice_0.20-29     

loaded via a namespace (and not attached):
[1] grid_3.1.2    MASS_7.3-35   mnormt_1.5-1  stringr_0.6.2

First steps with ChainLadder: Import triangle from Excel into R

Taking the first step is often the hardest: getting data from Excel into R.

Suppose you would like to use the ChainLadder package to forecast future claims payments for a run-off triangle that you have stored in Excel.


How do you get the triangle into R and execute a reserving function, such as MackChainLadder?

Well, there are many ways to do this, and the ChainLadder package vignette, as well as the R manual on Data Import/Export, have all the details, but here is a quick and dirty solution using a CSV-file.

Open a new Excel workbook and copy your triangle into cell A1, with the first column being the accident or origin period and the first row describing the development period or age. You can find an example CSV-file on GitHub.

Make sure that your triangle has no formatting, such as commas to separate thousands, as Excel will save those cells as characters. Now open R and go through the following commands:

# The first command will open a window and 
# ask you to select your CSV-file
myCSVfile <- file.choose()
# Read file into R
dat <- read.csv(file=myCSVfile) 
# use read.csv2 if semicolons are used as a separator
# likely to be the case if you are in continental Europe

dat # to see your data

     AY   X1    X2    X3    X4    X5    X6    X7    X8    X9   X10
1  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
2  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
3  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
4  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
5  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
6  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
7  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
8  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
9  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
10 1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
OK, the data is in R, but now you have to convert it into a triangle. A triangle is basically a matrix with extra attributes. To do this, execute the following steps.

# Load the ChainLadder package
library(ChainLadder)
# Ignore first column which holds accident year information 
tri <- dat[,-1]
# Convert to matrix
tri <- as.matrix(tri)
# Add dimension names
dimnames(tri) <- list(origin=dat[,1], dev=1:ncol(tri))
# Convert into a triangle class
tri <- as.triangle(tri)
tri
      dev
origin    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
With those preparations done you can execute the MackChainLadder function:

M <- MackChainLadder(tri, est.sigma = "Mack")
M
     Latest Dev.To.Date Ultimate   IBNR Mack.S.E CV(IBNR)
1981 18,834       1.000   18,834      0        0      NaN
1982 16,704       0.991   16,858    154      206    1.339
1983 23,466       0.974   24,083    617      623    1.010
1984 27,067       0.943   28,703  1,636      747    0.457
1985 26,180       0.905   28,927  2,747    1,469    0.535
1986 15,852       0.813   19,501  3,649    2,002    0.549
1987 12,314       0.694   17,749  5,435    2,209    0.406
1988 13,112       0.546   24,019 10,907    5,358    0.491
1989  5,395       0.336   16,045 10,650    6,333    0.595
1990  2,063       0.112   18,402 16,339   24,566    1.503

               Totals
Latest:    160,987.00
Dev:             0.76
Ultimate:  213,122.23
IBNR:       52,135.23
Mack S.E.:  26,909.01
CV(IBNR):        0.52

To copy the full triangle back into Excel you can use the clipboard:
write.table(M$FullTriangle, file="clipboard", sep="\t")
Go back to Excel and hit <Ctrl> + V on your keyboard to paste the data into Excel.
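Note that file="clipboard" works on Windows. On a Mac, as used in the session info below, you can achieve the same by piping the output to pbcopy:

write.table(M$FullTriangle, file=pipe("pbcopy"), sep="\t")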

For more details see the package vignette and Dan's post on pasting triangles from Excel into R via the clipboard.

If you are after a thorough overview of R in insurance, take a look at the book Computational Actuarial Science with R.

Finally, join the Special Interest Group on using R in actuarial science and insurance to share your questions and answers.

Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] ChainLadder_0.1.9 systemfit_1.1-14  lmtest_0.9-33    
[4] zoo_1.7-11        car_2.0-21        Matrix_1.1-4     

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3     actuar_1.1-6        cluster_1.15.3     
 [4] foreign_0.8-61      Formula_1.1-2       grid_3.1.2         
 [7] Hmisc_3.14-5        lattice_0.20-29     latticeExtra_0.6-26
[10] MASS_7.3-35         nnet_7.3-8          plyr_1.8.1         
[13] RColorBrewer_1.0-5  Rcpp_0.11.3         reshape2_1.4       
[16] rpart_4.1-8         sandwich_2.3-2      splines_3.1.2      
[19] statmod_1.4.20      stringr_0.6.2       survival_2.37-7    
[22] tools_3.1.2         tweedie_2.2.1 

Unknown pleasures

Have I missed unknown pleasures in Python by focusing on R?

A comment on my blog post of last week suggested just that. Reason enough to explore Python a little. Learning another computer language is like learning another human language - it takes time. Often it is helpful to start by translating from the new language back into the old one.

I found a Python script by Ludwig Schwardt that creates a plot like this:


It is only a small Python script, but it illustrates how to:
  • load packages
  • use conditional statements
  • add comments
  • write functions
  • deal with arrays
  • write loops
  • create plots
  • save output into a PDF file
Here is my translation into R, which actually generated the plot above.

R code
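The translated script itself is not reproduced here. As a rough illustration of the idea, and not Ludwig's algorithm, here is a simplified base-R sketch that stacks smoothed random lines with a central bump, drawing them back to front and masking whatever lies behind them:

# A simplified sketch, not the original translation
set.seed(1234)
n <- 80                            # number of lines
x <- seq(-2, 2, length.out = 200)  # points per line
op <- par(bg = "black", mar = rep(1, 4))
plot(NULL, xlim = range(x), ylim = c(0, n + 8), axes = FALSE,
     xlab = "", ylab = "")
for (i in n:1) {                   # draw from back to front
  y <- i + 5 * runif(1) * exp(-x^2 / 0.05) +
       0.4 * as.numeric(smooth(rnorm(length(x))))
  # fill below the line with black to hide the lines behind it
  polygon(c(x, rev(x)), c(y, rep(0, length(x))), col = "black", border = NA)
  lines(x, y, col = "white", lwd = 0.8)
}
par(op)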


Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] signal_0.7-4 e1071_1.6-4 

loaded via a namespace (and not attached):
[1] class_7.3-11 MASS_7.3-35  tools_3.1.2 

Phase plane analysis in R

The forthcoming R Journal has an interesting article about phaseR: An R Package for Phase Plane Analysis of Autonomous ODE Systems by Michael J. Grayling. The package has some nice functions to analyse one- and two-dimensional dynamical systems.

As an example I use here the FitzHugh-Nagumo system introduced earlier:
\begin{align}
\dot{v} &= 2 \left(w + v - \frac{1}{3}v^3\right) + I_0 \\
\dot{w} &= \frac{1}{2}(1 - v - w)
\end{align}
The FitzHugh-Nagumo system is a simplification of the Hodgkin-Huxley model of spike generation in the squid giant axon. Here \(I_0\) is a bifurcation parameter. As I decrease \(I_0\) from 0, the system dynamics change (Hopf bifurcation): a stable equilibrium solution transforms into a limit cycle.

Following Michael's paper, I can use phaseR to plot the velocity field, add nullclines and plot trajectories from different starting points.

Here I plot the FitzHugh-Nagumo system for four different values of \(I_0\) and three different initial starting values. The blue line shows the nullcline of \(w\), i.e. \(\dot{w}=0\), while the red line shows the nullcline of \(v\). For \(I_0=-2\) I can observe the limit cycle.
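Here is a minimal sketch of how one of these panels can be produced (the function and argument names follow the phaseR 1.x release listed in the session info; the starting values are illustrative):

library(phaseR)
# FitzHugh-Nagumo system in the form expected by phaseR/deSolve;
# parameters[1] is the bifurcation parameter I0
FHN <- function(t, y, parameters) {
  v <- y[1]; w <- y[2]
  I0 <- parameters[1]
  dv <- 2 * (w + v - v^3/3) + I0
  dw <- (1 - v - w) / 2
  list(c(dv, dw))
}
I0 <- -2
flowField(FHN, x.lim = c(-3, 3), y.lim = c(-2, 2), parameters = I0,
          points = 15, add = FALSE, xlab = "v", ylab = "w")
nullclines(FHN, x.lim = c(-3, 3), y.lim = c(-2, 2), parameters = I0,
           points = 200)
trajectory(FHN, y0 = rbind(c(-2, 1), c(0, 0), c(2, -1)),
           t.end = 50, parameters = I0)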



Yet, I was a little surprised that the paper didn't make any reference to the XPPAUT software by Bard Ermentrout, which has been around for many years as a tool to analyse dynamical systems.


The GUI to the software itself gives many more options to analyse dynamical systems, including an interface to the popular bifurcation program AUTO. A good tutorial with the FitzHugh-Nagumo model was given by Mathieu Desroches at the ICS summer school 2012.

Of course I could use XPPAUT as a pure integration engine from R as well:
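The snippet is not shown here, but the basic idea is to run XPPAUT in batch mode and read its output file back into R. A sketch, assuming xppaut is on the search path and the model is stored in an ODE file fhn.ode:

# Run XPPAUT in silent (batch) mode; it writes its results to output.dat
system("xppaut fhn.ode -silent")
out <- read.table("output.dat")
names(out) <- c("t", "v", "w")
plot(v ~ t, data = out, type = "l", xlab = "Time", ylab = "v")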


Considering that R started as a tool for statisticians, it has made a remarkable journey: here it competes with more traditional engineering tools like Matlab with MatCont, or specialised software like XPPAUT. If someday someone found the time and motivation to write an interface to AUTO, then R would indeed be a very good environment for the analysis of dynamical systems.

Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] phaseR_1.3     deSolve_1.10-9

loaded via a namespace (and not attached):
[1] tools_3.1.2

It is the small data that matters the most

Everyone is talking about Big Data¹, but it is the small data that is holding everything together. The small, slowly changing reference tables are the linchpins. Unfortunately, too often politics gets in the way, and those small tables, maintained by humans, don't get the attention they deserve. Their owners, if they exist at all (many of these little tables are orphans), make changes without understanding the potential consequences for downstream systems.

Here is a little story as it happened over the weekend.

A friend of mine flew on Saturday night from Vienna to Yerevan. The plane was scheduled to leave Austria at 10:15 PM and to arrive at 4:35 AM in Armenia. However, last weekend was the weekend when the summer time or daylight saving time (DST) ended in many European countries, including Austria, but not in Armenia.

Armenia used DST in 1981-1995 and 1997-2011. This may look a little odd, but Armenia is by far not the only region that has changed its policy over the years. Russia decided in 2011 not to switch to winter time and to keep DST all year round, only to switch to permanent winter time again last weekend.

Back to my friend: she arrives early Sunday morning at the airport in Yerevan, but her friend, who promised to pick her up, is not there. After a little while she takes a cab to her friend's house, making it just in time before he was about to leave for the airport. What happened? Well, her friend relied on his mobile phone for his wake-up call. However, his mobile phone wasn't aware of the fact that Armenia didn't have DST anymore and set his clock back by one hour, allowing him more sleep. His phone might either be a little old or getting its signal from Russia, I don't know. Yet, this little story illustrates nicely how reliant we are nowadays on our interconnected devices having access to the correct reference data.

Imagine being in an intensive care unit relying on medical devices that have to work correctly together on time signals, when they are sourced from different companies and countries and running on different systems. Try to stay healthy, particularly in late March and October, would be my advice.

So, how do systems/computers actually know which timezone to use?

Most computers will use the tz reference code and database, maintained by a group of volunteers until 2011 and since then by IANA. Still, how do they get informed that a region or state decided to change the rules? By keeping their ears on the ground!
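R itself has access to the same database, so a quick sanity check can be done from the console. Here I render the same two moments in time, one on either side of the October 2014 clock change, in both zones:

# Same moments in time, rendered in two zones from the tz database
x <- as.POSIXct(c("2014-10-25 23:00:00", "2014-10-26 23:00:00"), tz = "UTC")
format(x, tz = "Europe/Vienna", usetz = TRUE)  # CEST before, CET after
format(x, tz = "Asia/Yerevan", usetz = TRUE)   # AMT (UTC+4) all year round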

The following extract from the Asia file of the tz database 2014b release is quite telling:


# Armenia
# From Paul Eggert (2006-03-22):
# Shanks & Pottenger have Yerevan switching to 3:00 (with Russian DST)
# in spring 1991, then to 4:00 with no DST in fall 1995, then
# readopting Russian DST in 1997.  Go with Shanks & Pottenger, even
# when they disagree with others.  Edgar Der-Danieliantz
# reported (1996-05-04) that Yerevan probably wouldn't use DST
# in 1996, though it did use DST in 1995.  IATA SSIM (1991/1998) reports 
# that Armenia switched from 3:00 to 4:00 in 1998 and observed DST 
# after 1991, but started switching at 3:00s in 1998.

# From Arthur David Olson (2011-06-15):
# While Russia abandoned DST in 2011, Armenia may choose to
# follow Russia's "old" rules.

# From Alexander Krivenyshev (2012-02-10):
# According to News Armenia, on Feb 9, 2012,
# http://newsarmenia.ru/society/20120209/42609695.html
#
# The Armenia National Assembly adopted final reading of Amendments to the
# Law "On procedure of calculation time on the territory of the Republic of
# Armenia" according to which Armenia [is] abolishing Daylight Saving Time.
# or
# (brief)
# http://www.worldtimezone.com/dst_news/dst_news_armenia03.html
# Zone NAME  GMTOFF RULES FORMAT [UNTIL]
Zone Asia/Yerevan 2:58:00 - LMT 1924 May  2
   3:00 - YERT 1957 Mar    # Yerevan Time
   4:00 RussiaAsia YER%sT 1991 Mar 31 2:00s
   3:00 1:00 YERST 1991 Sep 23 # independence
   3:00 RussiaAsia AM%sT 1995 Sep 24 2:00s
   4:00 - AMT 1997
   4:00 RussiaAsia AM%sT 2012 Mar 25 2:00s
   4:00 - AMT

Does this all sound familiar to you, and the challenges in your own organisation? Well, I gave a talk on Small & Big Data recently with a colleague of mine, should you be interested in finding out more about this topic.

1. or Tiny Data in Rasmus' case

Approximating the impact of inflation

The other day someone mentioned to me a rule of thumb he was using to estimate the number of years \(n\) it would take for inflation to destroy half of the purchasing power of today's money:
\[ n = \frac{70}{p}\]
Here \(p\) is the inflation rate in percent, e.g. if the inflation rate is \(2\%\), then today's money would buy only half of today's goods and services in 35 years. You can also think of a savings account with an interest rate of \(2\%\) that would double your money in 35 years.

It is not difficult to derive this formula, and I will do so below. I just wonder whether the craft of approximating answers to questions is slowly eroding, as we have ever more powerful computers and access to more and more data at our fingertips. Well, I'd better write down my derivation before I forget it again.

The starting point is:
\[
2K = K (1 + \frac{p}{100})^n
\]
This is equivalent to:
\[
2 = (1 + \frac{p}{100})^n
\]
Taking the log gives:
\[
\log(2) = n \log(1 + \frac{p}{100})
\]
The first term of the Taylor series approximation of \(\log(1+x)\) for small \(x\) is \(x\). Hence for small \(p\) I can set:
\[
\log(2) \doteq n \, \frac{p}{100}
\]
Next I have to estimate the value for \(\log(2)\). Writing it as an integral leads to:
\[
\log(2) = \int_1^2 \frac{1}{x} \,dx
\]
Using Simpson's rule I can approximate the integral with:
\[
\int_1^2 \frac{1}{x} \,dx \doteq \frac{2-1}{6} (1+4\frac{2}{1+2}+\frac{1}{2} )
= \frac{25}{36} \doteq 0.7
\]
Thus,
\[
n \doteq \frac{70}{p}
\]
Plotting the two formulas against each other reveals that the approximation works pretty well, even for inflation rates up to 10%.

R Code

Here is the R code to reproduce the plot.
curve(70/x, from=1, to=10, 
      xlab="Inflation rate p%", 
      ylab="Number of years for purchasing power to halve", 
      main="Impact of inflation on purchasing power",
      col="blue", 
      type="p", pch=16, cex=0.5)
curve(log(2)/(log(1+x/100)), 
      from=1, to=10, add=TRUE, 
      col="red")
# Show the blue approximation as points and the red exact formula as a line
legend("topright", 
       legend=c("70/p","log(2)/log(1+p/100)"), 
       bty="n",
       col=c("blue", "red"), 
       pch=c(16, NA), lty=c(NA, 1))

googleVis 0.5.6 released on CRAN

Version 0.5.6 of googleVis was released on CRAN over the weekend. This version fixes a bug in gvisMotionChart. Its arguments xvar, yvar, sizevar and colorvar were not always picked up correctly.

Thanks to Juuso Parkkinen for reporting this issue.

Example: Love, or to love

A few years ago Martin Hilpert posted an interesting case study for motion charts. Martin is a linguist, and he researched how the usage of words in American English changed over time, e.g. some words were more often used as nouns in the past and then became more popular as verbs. Do you talk about love, or do you tell someone that you love her/him? Visit his motion chart web page for more information and details!


R code
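The code for Martin's chart is not reproduced here, but the arguments fixed in this release are easy to demonstrate with the Fruits demo data that ships with googleVis (a minimal example, not Martin's actual chart):

library(googleVis)
M <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year",
                     xvar = "Sales", yvar = "Expenses",
                     sizevar = "Profit", colorvar = "Location")
plot(M)  # opens the chart in your browser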


Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] googleVis_0.5.6

loaded via a namespace (and not attached):
[1] RJSONIO_1.3-0 tools_3.1.1