# mages' blog

We released googleVis version 0.5.8 on CRAN last week. The update is a maintenance release for the forthcoming release of R 3.2.0.

 Screen shot of some of the Google Charts
New to googleVis? The package provides an interface between R and the Google Charts Tools, allowing you to create interactive web charts from R without uploading your data to Google. The charts are displayed by default via the R internal help browser.

Visit the examples of all googleVis charts on CRAN and review the vignettes.

## Communicating Risk and Uncertainty

David Spiegelhalter gave a fascinating talk on Communicating Risk and Uncertainty to the Public & Policymakers at the Grantham Institute of the Imperial College in London last Tuesday.

In a very engaging way David gave many examples and anecdotes from his career in academia and advisory. I believe his talk will be published on the Grantham Institute's YouTuble channel, so I will only share a few highlights and thoughts that stuck in my mind here.

### Framing: Opportunity vs. Risk

Do you prefer survival ratios or mortality rates? The way you present probabilities/frequencies will make a difference in the way the audience perceives them and therefore how they will make decisions. David suggested one should aim to give a balanced view to mitigate the framing effect; e.g. show the opportunity and risk, the survival ratio and mortality rate. Interestingly, that is exactly how Wolfram Alpha displays life expectancies.

 Life expectance: UK, male, 62 years old (Source: Wolfram Alpha)

### Absolute and relative risk

What did you have for breakfast this morning? Fried up bacon? Perhaps you have fried up bacon every day? It could increase your chances of developing pancreatic cancer by almost a fifth, as an article by the Daily Express on 16th January 2012 suggested.

 Daily Express, 16th January 2012

Well, this might be true, but fortunately pancreatic cancer is quite rare. About 5 people out of 400 will develop this cancer even without eating fried up bacon every day (absolute risk). Eating fry ups every morning will increase your chances by 20% (relative risk) to 6 out of 400 - still pretty low. Less of a scary headline.

Small increases in absolute risks can result in large increases in relative risk. Depending on how you choose to frame the story you can generate very different reactions. David mentioned also the work of Gerd Gigerenzer, who is promoting the use of natural frequencies in the context of medical conversations, e.g. say '1 out of 200 people will experience a certain effect', instead of 'the probability for this effect is 0.5%'.

### Conditional probabilities

One of the big challenges in communicating risk are conditional probabilities. I believe, conditional probabilities are not intuitive for most of us. As an example David talked about breast cancer screening. The test itself is not a 100% reliable. No test is. There are always results which are false positives (test predicts cancer when there is none) and false negatives (test predicts no cancer when there is one). Tests are usually designed to have a higher false alarm rate than no alarm at all. Hence, even when the test result is positive the likelihood of cancer could be low, because cancer is rare and the test is not that good. David presented a great diagram of the NHS breast screening - Helping you decide leaflet to illustrate the conditional probabilities for breast screening.

 NHS breast screening - Helping you decide
I think this is a wonderful example of trying to give a balanced view of the underlying risk, communicated in a very clear and understandable way.

### Translating probabilities into words

The use of natural frequencies is not always appropriate and sometimes probabilities are more useful. However, instead of saying that something has a 50% chance of occurring, you might also say it is equally likely to happen than not. David mentioned the work of the Intergovernmental Panel on Climate Change (IPCC) that aims to use a consistent language in all their reports. To achieve that, they publish guidance on consistent treatments of uncertainty. This is their translation of probabilities into words:

 Source: IPCC Uncertainty guidance note
I think this table will become handy in the future when I have to elicit expert opinions.

### Speaking with confidence

The IPCC uncertainty guidance also talks about confidence and interestingly they correlate the level of agreement with the level of evidence. Note, even with a lot of evidence, without agreement the confidence is relatively low.

 Source: IPCC Uncertainty guidance note
I wonder, what low agreement with robust evidence could mean. Perhaps, even if there is strong evidence that I should change my diet, e.g. drink less coffee, I might not agree with it and as a result wouldn't take any actions.

### Fan charts

The last example from David's talk I will mention here is the Bank of England's fan chart of GDP projection. The Bank of England has been using this chart now for a number of years and tweaked it only a little over that time. Until the financial crisis the Bank of England showed bands of confidence within the 5-95% interval. The financial crisis happened outside that interval, which of course doesn't mean that it couldn't happen, it was just very unlikely. Since then they illustrate the 0-100% confidence interval as a grey background. The other interesting aspect is, that they don't show the mean projection line initially to overcome the anchoring effect. Note also that the term projection is used and not prediction.

 November 2014 GDP Fan chart (Source: Bank of England)

Finally, David announced that he has a new book in the pipeline. I am sure it will sell well, as it has the catchy title Sex by numbers.

## Extended Kalman filter example in R

Last week's post about the Kalman filter focused on the derivation of the algorithm. Today I will continue with the extended Kalman filter (EKF) that can deal also with nonlinearities. According to Wikipedia the EKF has been considered the de facto standard in the theory of nonlinear state estimation, navigation systems and GPS.

### Kalman filter

I had the following dynamic linear model for the Kalman filter last week:
\begin{align} x_{t+1} & = A x_t + w_t,\quad w_t \sim N(0,Q)\\ y_t &=G x_t + \nu_t, \quad \nu_t \sim N(0,R)\\ x_1 & \sim N(\hat{x}_0, \Sigma_0) \end{align}With $$x_t$$ describing the state space evolution, $$y_t$$ the observations, $$A, Q, G, R, \Sigma_0$$ matrices of appropriate dimensions, $$w_t$$ the evolution error and $$\nu_t$$ the observation error. The Kalman filter provides recursive estimators for $$x_t$$ via:
\begin{align} K_{t} &= A \Sigma_t G' (G \Sigma_t G' + R)^{-1}\\ \hat{x}_{t+1} &= A \hat{x_t} + K_{t} (y_t - G \hat{x}_t) \\ \Sigma_{t+1} &= A \Sigma_t A' - K_{t} G \Sigma_t A' + Q \end{align}In the case of nonlinearities on the right hand side of either the state ($$x_t$$) or observation ($$y_t$$) equation the extended Kalman filter uses a simple and elegant trick: Taylor series of the first order, or in other words, I simply linearise the right hand side. The matrices $$A$$ and $$G$$ will be the Jacobian matrices of the respected vector functions.

### Logistic growth

As an example I will use a logistic growth model, inspired by the Hakell example given by Dominic Steinitz.

The logistic growth model can be written as a time-invariant dynamical system with growth rate $$r$$ and carrying capacity $$k$$:
\begin{aligned} \dot{p} & = r p\Big(1 - \frac{p}{k}\Big) \end{aligned}The above ordinary differential equation has the well known analytical solution:
$p = \frac{kp_0\exp(r\,t)}{k + p_0(\exp(r\,t) - 1)}$Suppose I observe data of a population for which I know the carrying capacity $$k$$, but where the growth rate $$r$$ is noisy.

### Extended Kalman filter

The state space and observation model can then be written as:
\begin{aligned} r_i &= r_{i-1} \\ p_i &= \frac{kp_{i-1}\exp(r_{i-1}\Delta T)}{k + p_{i-1}(\exp(r_{i-1}\Delta T) - 1)} \\ y_i &= \begin{bmatrix}0 & 1\end{bmatrix} \begin{bmatrix}r_i \\ p_i\end{bmatrix} + \nu \end{aligned}Or with $$x_i:=\begin{bmatrix}r_i & p_i\end{bmatrix}'$$ as:
\begin{aligned} x_i &= a(x_i)\\ y_i &= G x_i + \nu_i, \quad \nu_i \sim N(0,R) \end{aligned}In my example the state space model is purely deterministic, so there isn't any evolution noise and hence $$Q=0$$.

With an initial prior guess for $$x_0$$ and $$\Sigma_0$$ and I am ready to go. The R code below shows my implementation with the algorithm above. Note that I use the jacobian function of the numDeriv package.
After a few time steps the extended Kalman filter does a fantastic job in reducing the noise. Perhaps this shouldn't be too surprising as a local linearisation of the logistic growth function will give a good fit. The situation might be different for highly nonlinear functions. The Wikipedia article about the Kalman filter suggests the unscented version in those cases.

### Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] numDeriv_2012.9-1

loaded via a namespace (and not attached):
[1] tools_3.1.2

## Kalman filter example visualised with R

At the last Cologne R user meeting Holger Zien gave a great introduction to dynamic linear models (dlm). One special case of a dlm is the Kalman filter, which I will discuss in this post in more detail. I kind of used it earlier when I measured the temperature with my Arduino at home.

Over the last week I came across the wonderful quantitative economic modelling site quant-econ.net, designed and written by Thomas J. Sargent and John Stachurski. The site not only provides access to their lecture notes, including the Kalman fitler, but also code in Python and Julia.

I will take their example of the Kalman filter and go through it with R. I particularly liked their visuals of the various steps of the Kalman filter.

### Motivation

Suppose I have a little robot that moves autonomously over my desk. How would the robot know where it is? Well, suppose the robot can calculate from its wheels' movements how far it went and an additional sensor (e.g. GPS-like) provides also information about its location. Of course both pieces of information will be imprecise. How can the various signals be combined?

This is a classic scenario for the Kalman filter. Its key assumptions are that the errors/noise are Gaussian and that the state space evolution $$x_t$$ from one time step to the next is linear, so is the mapping to the sensor signals $$y_t$$. For the example I will use it reads:
\begin{align} x_{t+1} & = A x_t + w,\quad w \sim N(0,Q)\\ y_t &=G x_t + \nu, \quad \nu \sim N(0,R)\\ x_1 & \sim N(\hat{x}_0, \Sigma_0) \end{align}With $$A, Q, G, R, \Sigma$$ matrices of appropriate dimensions. The Kalman filter provides recursive estimators for $$x_t$$:
\begin{align} K_{t} &= A \Sigma_t G' (G \Sigma_t G' + R)^{-1}\\ \hat{x}_{t+1} &= A \hat{x_t} + K_{t} (y_t - G \hat{x}_t) \\ \Sigma_{t+1} &= A \Sigma_t A' - K_{t} G \Sigma_t A' + Q \end{align}

Let's say at time $$t_0$$ the robot has the expected position $$\hat{x} = (0.2, -0.2)$$. That means, the robot doesn't know its location on the table with certainty. Further I shall assume that the probability density function of the position follows a Normal distribution with covariance matrix $$\Sigma = \left[\begin{array}{cc} 0.4 & 0.3\\ 0.3 & 0.45 \end{array} \right]$$
The prior distribution of the robot's position can be visualised in R with a contour plot.

library(mnormt)
xhat <- c(0.2, -0.2)
Sigma <- matrix(c(0.4, 0.3,
0.3, 0.45), ncol=2)
x1 <- seq(-2, 4,length=151)
x2 <- seq(-4, 2,length=151)
f <- function(x1,x2, mean=xhat, varcov=Sigma)
dmnorm(cbind(x1,x2), mean,varcov)
z <- outer(x1,x2, f)
mycols <- topo.colors(100,0.5)
image(x1,x2,z, col=mycols, main="Prior density",
xlab=expression('x'[1]), ylab=expression('x'[2]))
points(0.2, -0.2, pch=19)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1)

### Sensor information

The 'GPS' sensor provides also data about the location of the robot, again the sensor signal can be regarded as the expected position with some noise. Suppose the reading of the sensor is $$y=(2.3, -1.9)$$. I will assume that the sensor signal has an error following a Normal distribution, with a covariance matrix $$R=0.5 \Sigma$$.

Conceptually I think about it like this, given that the robot is at position $$x$$, what would be the likelihood that the sensor reading is $$y\,|\,x\,$$? The Kalman filter assumes a linear mapping from $$x$$ to $$y$$:
$y = G x + \nu \mbox{ with } \nu \sim N(0,R)$In my case $$G$$ will be the identity matrix. Let's add the sensor information to the plot.

R <- 0.5 * Sigma
z2 <- outer(x1,x2, f, mean=c(2.3, -1.9), varcov=R)
image(x1, x2, z2, col=mycols, main="Sensor density")
points(2.3, -1.9, pch=19)
text(2.2, -1.9, labels = "y", adj = 1)
points(0.2, -0.2, pch=19)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1)

### Filtering

I have two pieces of information about the location of the robot, both with a Normal distribution, my prior is $$x \sim N(\hat{x},\Sigma)$$ and for the distribution of my sensor I have $$y\,|\,x \sim N(Gx, R)$$.

What I would like to know is the location of the robot given the sensor reading:
$p(x \,|\, y) = \frac{p(y \,|\, x) \, p(x)} {p(y)}$Now, because $$x, y$$ are Normal and $$G$$ is the identity matrix, I know that the posterior distribution of $$x \,|\, y$$ is Normal again, with parameters (see conjugate prior):
$\hat{x}_f = (\Sigma^{-1} + R^{-1})^{-1} (\Sigma^{-1} \hat{x} + R^{-1} y) \\ \Sigma_f = (\Sigma^{-1} + R^{-1})^{-1}$Using the matrix inversion identity $(A^{-1} + B^{-1})^{-1} = A - A (A + B)^{-1}A = A (A + B)^{-1} B$ I can write the above as:
\begin{align} \hat{x}_f & = (\Sigma - \Sigma (\Sigma + R)^{-1}\Sigma)(\Sigma^{-1} \hat{x} + R^{-1}y)\\ & =\hat{x} - \Sigma (\Sigma + R)^{-1} \hat{x} + \Sigma R^{-1} y -\Sigma (\Sigma + R)^{-1}\Sigma R^{-1}y\\ & = \hat{x} + \Sigma (\Sigma + R)^{-1} (y - \hat{x})\\ & = (1.667, -1.333)\\ \Sigma_f &= \Sigma - \Sigma (\Sigma + R)^{-1} \Sigma\\ &=\left[\begin{array}{lll} 0.133 & 0.10\\ 0.100 & 0.15 \end{array} \right] \end{align}In the more general case when $$G$$ is not the identity matrix I have
\begin{align} \hat{x}_f & = \hat{x} + \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat{x})\\ \Sigma_f &= \Sigma - \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma \end{align}The updated posterior probability density function $$p(x\,|\,y)=N(\hat{x}_f, \Sigma_f)$$ is visualised in the chart below.
I can see how the prior and likelihood distributions have influenced the posterior distribution of the robot's location. The sensor information is regarded more credible than the prior information and hence the posterior is closer to the sensor data.

G = diag(2)
y <- c(2.4, -1.9)
xhatf <- xhat + Sigma %*% t(G) %*% solve(G %*% Sigma %*% t(G) + R) %*% (y - G %*% xhat)
Sigmaf <- Sigma - Sigma %*% t(G) %*% solve(G %*% Sigma %*% t(G) + R) %*% G %*% Sigma
z3 <- outer(x1, x2, f, mean=c(xhatf), varcov=Sigmaf)
image(x1, x2, z3, col=mycols,
xlab=expression('x'[1]), ylab=expression('x'[2]),
main="Filtered density")
points(xhatf[1], xhatf[2], pch=19)
text(xhatf[1]-0.1, xhatf[2],
labels = expression(hat(x)[f]), adj = 1)
points(0.2, -0.2, pch=19, col=lb)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1, col=lb)
points(2.3, -1.9, pch=19, col=lb)
text(2.2, -1.9,labels = "y", adj = 1, col=lb)

### Forecasting

Ok, I have a better understanding of the robot's position at time $$t_0$$, but the robot is moving and what I'd like to know is where it will be after one time step.

Suppose I have a linear model that explains how the state evolves over time, e.g. based on wheel spin. Again, I will assume a Gaussian process. In this toy example I follow the assumptions of Sargent and Stachurski, although this doesn't make much sense for the robot:
$x_{t+1} = A x_t + w_{t+1} \mbox{, where } w_t \sim N(0, Q)$with$\begin{split}A = \left( \begin{array}{cc} 1.2 & 0.0 \\ 0.0 & -0.2 \end{array} \right), \qquad Q = 0.3 \Sigma\end{split}$As all the processes are linear and Normal the calculations are straightforward:
\begin{align} \mathbb{E} [A x_f + w] &= A \mathbb{E} [x_f] + \mathbb{E} [w]\\ &= A \hat{x}_f\\ &= A \hat{x} + A \Sigma G' (G \Sigma G' + R)^{-1}(y - G \hat x) \end{align}\begin{align} \operatorname{Var} [A x_f + w] &= A \operatorname{Var}[x_f] A' + Q\\ &= A \Sigma_f A' + Q\\ &= A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q \end{align}The matrix $$A \Sigma G' (G \Sigma G' + R)^{-1}$$ is often written as $$K_\Sigma$$ and called the Kalman gain. Using this notation, I can summarise the results as follows:\begin{split}\begin{align} \hat{x}_{new} &:= A \hat{x} + K_{\Sigma} (y - G \hat{x}) \\ \Sigma_{new} &:= A \Sigma A' - K_{\Sigma} G \Sigma A' + Q \nonumber \end{align}\end{split}Adding the prediction to my chart gives me:

A <- matrix(c(1.2, 0,
0, -0.2), ncol=2)
Q <- 0.3 * Sigma
K <- A %*% Sigma %*% t(G) %*% solve(G%*% Sigma %*% t(G) + R)
xhatnew <- A %*% xhat + K %*% (y - G %*% xhat)
Sigmanew <- A %*% Sigma %*% t(A) - K %*% G %*% Sigma %*% t(A) + Q
z4 <- outer(x1,x2, f, mean=c(xhatnew), varcov=Sigmanew)
image(x1, x2, z4, col=mycols,
xlab=expression('x'[1]), ylab=expression('x'[2]),
main="Predictive density")
points(xhatnew[1], xhatnew[2], pch=19)
text(xhatnew[1]-0.1, xhatnew[2],
labels = expression(hat(x)[new]), adj = 1)
points(xhatf[1], xhatf[2], pch=19, col=lb)
text(xhatf[1]-0.1, xhatf[2], col=lb,
labels = expression(hat(x)[f]), adj = 1)
points(0.2, -0.2, pch=19, col=lb)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1, col=lb)
points(2.3, -1.9, pch=19, col=lb)
text(2.2, -1.9,labels = "y", adj = 1, col=lb)
That's it, having gone through the updating process for one time step gives me the recursive Kalman filter iteration mentioned above.

Another way to visualise the four steps can be achieved with the lattice package:
library(lattice)
grid <- expand.grid(x=x1,y=x2)
grid$Prior <- as.vector(z) grid$Likelihood <- as.vector(z2)
grid$Posterior <- as.vector(z3) grid$Predictive <- as.vector(z4)
contourplot(Prior + Likelihood + Posterior + Predictive ~ x*y,
data=grid, col.regions=mycols, region=TRUE,
as.table=TRUE,
xlab=expression(x[1]),
ylab=expression(x[2]),
main="Kalman Filter",
panel=function(x,y,...){
panel.grid(h=-1, v=-1)
panel.contourplot(x,y,...)
})

You can find the code of this post on Github.

### Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lattice_0.20-29 mnormt_1.5-1

loaded via a namespace (and not attached):
[1] grid_3.1.2  tools_3.1.2