# mages' blog

## Communicating Risk and Uncertainty

David Spiegelhalter gave a fascinating talk on Communicating Risk and Uncertainty to the Public & Policymakers at the Grantham Institute of the Imperial College in London last Tuesday.

In a very engaging way David gave many examples and anecdotes from his career in academia and advisory. I believe his talk will be published on the Grantham Institute's YouTuble channel, so I will only share a few highlights and thoughts that stuck in my mind here.

### Framing: Opportunity vs. Risk

Do you prefer survival ratios or mortality rates? The way you present probabilities/frequencies will make a difference in the way the audience perceives them and therefore how they will make decisions. David suggested one should aim to give a balanced view to mitigate the framing effect; e.g. show the opportunity and risk, the survival ratio and mortality rate. Interestingly, that is exactly how Wolfram Alpha displays life expectancies.

 Life expectance: UK, male, 62 years old (Source: Wolfram Alpha)

### Absolute and relative risk

What did you have for breakfast this morning? Fried up bacon? Perhaps you have fried up bacon every day? It could increase your chances of developing pancreatic cancer by almost a fifth, as an article by the Daily Express on 16th January 2012 suggested.

 Daily Express, 16th January 2012

Well, this might be true, but fortunately pancreatic cancer is quite rare. About 5 people out of 400 will develop this cancer even without eating fried up bacon every day (absolute risk). Eating fry ups every morning will increase your chances by 20% (relative risk) to 6 out of 400 - still pretty low. Less of a scary headline.

Small increases in absolute risks can result in large increases in relative risk. Depending on how you choose to frame the story you can generate very different reactions. David mentioned also the work of Gerd Gigerenzer, who is promoting the use of natural frequencies in the context of medical conversations, e.g. say '1 out of 200 people will experience a certain effect', instead of 'the probability for this effect is 0.5%'.

### Conditional probabilities

One of the big challenges in communicating risk are conditional probabilities. I believe, conditional probabilities are not intuitive for most of us. As an example David talked about breast cancer screening. The test itself is not a 100% reliable. No test is. There are always results which are false positives (test predicts cancer when there is none) and false negatives (test predicts no cancer when there is one). Tests are usually designed to have a higher false alarm rate than no alarm at all. Hence, even when the test result is positive the likelihood of cancer could be low, because cancer is rare and the test is not that good. David presented a great diagram of the NHS breast screening - Helping you decide leaflet to illustrate the conditional probabilities for breast screening.

 NHS breast screening - Helping you decide
I think this is a wonderful example of trying to give a balanced view of the underlying risk, communicated in a very clear and understandable way.

### Translating probabilities into words

The use of natural frequencies is not always appropriate and sometimes probabilities are more useful. However, instead of saying that something has a 50% chance of occurring, you might also say it is equally likely to happen than not. David mentioned the work of the Intergovernmental Panel on Climate Change (IPCC) that aims to use a consistent language in all their reports. To achieve that, they publish guidance on consistent treatments of uncertainty. This is their translation of probabilities into words:

 Source: IPCC Uncertainty guidance note
I think this table will become handy in the future when I have to elicit expert opinions.

### Speaking with confidence

The IPCC uncertainty guidance also talks about confidence and interestingly they correlate the level of agreement with the level of evidence. Note, even with a lot of evidence, without agreement the confidence is relatively low.

 Source: IPCC Uncertainty guidance note
I wonder, what low agreement with robust evidence could mean. Perhaps, even if there is strong evidence that I should change my diet, e.g. drink less coffee, I might not agree with it and as a result wouldn't take any actions.

### Fan charts

The last example from David's talk I will mention here is the Bank of England's fan chart of GDP projection. The Bank of England has been using this chart now for a number of years and tweaked it only a little over that time. Until the financial crisis the Bank of England showed bands of confidence within the 5-95% interval. The financial crisis happened outside that interval, which of course doesn't mean that it couldn't happen, it was just very unlikely. Since then they illustrate the 0-100% confidence interval as a grey background. The other interesting aspect is, that they don't show the mean projection line initially to overcome the anchoring effect. Note also that the term projection is used and not prediction.

 November 2014 GDP Fan chart (Source: Bank of England)

Finally, David announced that he has a new book in the pipeline. I am sure it will sell well, as it has the catchy title Sex by numbers.

## Extended Kalman filter example in R

Last week's post about the Kalman filter focused on the derivation of the algorithm. Today I will continue with the extended Kalman filter (EKF) that can deal also with nonlinearities. According to Wikipedia the EKF has been considered the de facto standard in the theory of nonlinear state estimation, navigation systems and GPS.

### Kalman filter

I had the following dynamic linear model for the Kalman filter last week:
\begin{align} x_{t+1} & = A x_t + w_t,\quad w_t \sim N(0,Q)\\ y_t &=G x_t + \nu_t, \quad \nu_t \sim N(0,R)\\ x_1 & \sim N(\hat{x}_0, \Sigma_0) \end{align}With $$x_t$$ describing the state space evolution, $$y_t$$ the observations, $$A, Q, G, R, \Sigma_0$$ matrices of appropriate dimensions, $$w_t$$ the evolution error and $$\nu_t$$ the observation error. The Kalman filter provides recursive estimators for $$x_t$$ via:
\begin{align} K_{t} &= A \Sigma_t G' (G \Sigma_t G' + R)^{-1}\\ \hat{x}_{t+1} &= A \hat{x_t} + K_{t} (y_t - G \hat{x}_t) \\ \Sigma_{t+1} &= A \Sigma_t A' - K_{t} G \Sigma_t A' + Q \end{align}In the case of nonlinearities on the right hand side of either the state ($$x_t$$) or observation ($$y_t$$) equation the extended Kalman filter uses a simple and elegant trick: Taylor series of the first order, or in other words, I simply linearise the right hand side. The matrices $$A$$ and $$G$$ will be the Jacobian matrices of the respected vector functions.

### Logistic growth

As an example I will use a logistic growth model, inspired by the Hakell example given by Dominic Steinitz.

The logistic growth model can be written as a time-invariant dynamical system with growth rate $$r$$ and carrying capacity $$k$$:
\begin{aligned} \dot{p} & = r p\Big(1 - \frac{p}{k}\Big) \end{aligned}The above ordinary differential equation has the well known analytical solution:
$p = \frac{kp_0\exp(r\,t)}{k + p_0(\exp(r\,t) - 1)}$Suppose I observe data of a population for which I know the carrying capacity $$k$$, but where the growth rate $$r$$ is noisy.

### Extended Kalman filter

The state space and observation model can then be written as:
\begin{aligned} r_i &= r_{i-1} \\ p_i &= \frac{kp_{i-1}\exp(r_{i-1}\Delta T)}{k + p_{i-1}(\exp(r_{i-1}\Delta T) - 1)} \\ y_i &= \begin{bmatrix}0 & 1\end{bmatrix} \begin{bmatrix}r_i \\ p_i\end{bmatrix} + \nu \end{aligned}Or with $$x_i:=\begin{bmatrix}r_i & p_i\end{bmatrix}'$$ as:
\begin{aligned} x_i &= a(x_i)\\ y_i &= G x_i + \nu_i, \quad \nu_i \sim N(0,R) \end{aligned}In my example the state space model is purely deterministic, so there isn't any evolution noise and hence $$Q=0$$.

With an initial prior guess for $$x_0$$ and $$\Sigma_0$$ and I am ready to go. The R code below shows my implementation with the algorithm above. Note that I use the jacobian function of the numDeriv package.
After a few time steps the extended Kalman filter does a fantastic job in reducing the noise. Perhaps this shouldn't be too surprising as a local linearisation of the logistic growth function will give a good fit. The situation might be different for highly nonlinear functions. The Wikipedia article about the Kalman filter suggests the unscented version in those cases.

### Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] numDeriv_2012.9-1

loaded via a namespace (and not attached):
[1] tools_3.1.2

## Kalman filter example visualised with R

At the last Cologne R user meeting Holger Zien gave a great introduction to dynamic linear models (dlm). One special case of a dlm is the Kalman filter, which I will discuss in this post in more detail. I kind of used it earlier when I measured the temperature with my Arduino at home.

Over the last week I came across the wonderful quantitative economic modelling site quant-econ.net, designed and written by Thomas J. Sargent and John Stachurski. The site not only provides access to their lecture notes, including the Kalman fitler, but also code in Python and Julia.

I will take their example of the Kalman filter and go through it with R. I particularly liked their visuals of the various steps of the Kalman filter.

### Motivation

Suppose I have a little robot that moves autonomously over my desk. How would the robot know where it is? Well, suppose the robot can calculate from its wheels' movements how far it went and an additional sensor (e.g. GPS-like) provides also information about its location. Of course both pieces of information will be imprecise. How can the various signals be combined?

This is a classic scenario for the Kalman filter. Its key assumptions are that the errors/noise are Gaussian and that the state space evolution $$x_t$$ from one time step to the next is linear, so is the mapping to the sensor signals $$y_t$$. For the example I will use below it reads:
\begin{align} x_{t+1} & = A x_t + w,\quad w \sim N(0,Q)\\ y_t &=G x_t + \nu, \quad \nu \sim N(0,R)\\ x_1 & \sim N(\hat{x}_0, \Sigma_0) \end{align}With $$A, Q, G, R, \Sigma$$ matrices of appropriate dimensions. The Kalman filter provides recursive estimators for $$x_t$$:
\begin{align} K_{t} &= A \Sigma_t G' (G \Sigma_t G' + R)^{-1}\\ \hat{x}_{t+1} &= A \hat{x_t} + K_{t} (y_t - G \hat{x}_t) \\ \Sigma_{t+1} &= A \Sigma_t A' - K_{t} G \Sigma_t A' + Q \end{align}

Let's say at time $$t_0$$ the robot has the expected position $$\hat{x} = (0.2, -0.2)$$. That means, the robot doesn't know its location on the table with certainty. Further I shall assume that the probability density function of the position follows a Normal distribution with covariance matrix $$\Sigma = \left[\begin{array}{cc} 0.4 & 0.3\\ 0.3 & 0.45 \end{array} \right]$$
The prior distribution of the robot's position can be visualised in R with a contour plot.

library(mnormt)
xhat <- c(0.2, -0.2)
Sigma <- matrix(c(0.4, 0.3,
0.3, 0.45), ncol=2)
x1 <- seq(-2, 4,length=151)
x2 <- seq(-4, 2,length=151)
f <- function(x1,x2, mean=xhat, varcov=Sigma)
dmnorm(cbind(x1,x2), mean,varcov)
z <- outer(x1,x2, f)
mycols <- topo.colors(100,0.5)
image(x1,x2,z, col=mycols, main="Prior density",
xlab=expression('x'[1]), ylab=expression('x'[2]))
points(0.2, -0.2, pch=19)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1)

### Sensor information

The 'GPS' sensor provides also data about the location of the robot, again the sensor signal can be regarded as the expected position with some noise. Suppose the reading of the sensor is $$y=(2.3, -1.9)$$. I will assume that the sensor signal has an error following a Normal distribution, with a covariance matrix $$R=0.5 \Sigma$$.

Conceptually I think about it like this, given that the robot is at position $$x$$, what would be the likelihood that the sensor reading is $$y\,|\,x\,$$? The Kalman filter assumes a linear mapping from $$x$$ to $$y$$:
$y = G x + \nu \mbox{ with } \nu \sim N(0,R)$In my case $$G$$ will be the identity matrix. Let's add the sensor information to the plot.

R <- 0.5 * Sigma
z2 <- outer(x1,x2, f, mean=c(2.3, -1.9), varcov=R)
image(x1, x2, z2, col=mycols, main="Sensor density")
points(2.3, -1.9, pch=19)
text(2.2, -1.9, labels = "y", adj = 1)
points(0.2, -0.2, pch=19)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1)

### Filtering

I have two pieces of information about the location of the robot, both with a Normal distribution, my prior is $$x \sim N(\hat{x},\Sigma)$$ and for the distribution of my sensor I have $$y\,|\,x \sim N(Gx, R)$$.

What I would like to know is the location of the robot given the sensor reading:
$p(x \,|\, y) = \frac{p(y \,|\, x) \, p(x)} {p(y)}$Now, because $$x, y$$ are Normal and $$G$$ is the identity matrix, I know that the posterior distribution of $$x \,|\, y$$ is Normal again, with parameters (see conjugate prior):
$\hat{x}_f = (\Sigma^{-1} + R^{-1})^{-1} (\Sigma^{-1} \hat{x} + R^{-1} y) \\ \Sigma_f = (\Sigma^{-1} + R^{-1})^{-1}$Using the matrix inversion identity $(A^{-1} + B^{-1})^{-1} = A - A (A + B)^{-1}A = A (A + B)^{-1} B$ I can write the above as:
\begin{align} \hat{x}_f & = (\Sigma - \Sigma (\Sigma + R)^{-1}\Sigma)(\Sigma^{-1} \hat{x} + R^{-1}y)\\ & =\hat{x} - \Sigma (\Sigma + R)^{-1} \hat{x} + \Sigma R^{-1} y -\Sigma (\Sigma + R)^{-1}\Sigma R^{-1}y\\ & = \hat{x} + \Sigma (\Sigma + R)^{-1} (y - \hat{x})\\ & = (1.667, -1.333)\\ \Sigma_f &= \Sigma - \Sigma (\Sigma + R)^{-1} \Sigma\\ &=\left[\begin{array}{lll} 0.133 & 0.10\\ 0.100 & 0.15 \end{array} \right] \end{align}In the more general case when $$G$$ is not the identity matrix I have
\begin{align} \hat{x}_f & = \hat{x} + \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat{x})\\ \Sigma_f &= \Sigma - \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma \end{align}The updated posterior probability density function $$p(x\,|\,y)=N(\hat{x}_f, \Sigma_f)$$ is visualised in the chart below.
I can see how the prior and likelihood distributions have influenced the posterior distribution of the robot's location. The sensor information is regarded more credible than the prior information and hence the posterior is closer to the sensor data.

G = diag(2)
y <- c(2.4, -1.9)
xhatf <- xhat + Sigma %*% t(G) %*% solve(G %*% Sigma %*% t(G) + R) %*% (y - G %*% xhat)
Sigmaf <- Sigma - Sigma %*% t(G) %*% solve(G %*% Sigma %*% t(G) + R) %*% G %*% Sigma
z3 <- outer(x1, x2, f, mean=c(xhatf), varcov=Sigmaf)
image(x1, x2, z3, col=mycols,
xlab=expression('x'[1]), ylab=expression('x'[2]),
main="Filtered density")
points(xhatf[1], xhatf[2], pch=19)
text(xhatf[1]-0.1, xhatf[2],
labels = expression(hat(x)[f]), adj = 1)
points(0.2, -0.2, pch=19, col=lb)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1, col=lb)
points(2.3, -1.9, pch=19, col=lb)
text(2.2, -1.9,labels = "y", adj = 1, col=lb)

### Forecasting

Ok, I have a better understanding of the robot's position at time $$t_0$$, but the robot is moving and what I'd like to know is where the it will be after one time step.

Suppose I have a linear model that explains how the state evolves over time, e.g. based on wheel spin. Again, I will assume a Gaussian process. In this toy example I follow the assumptions of Sargent and Stachurski, although this doesn't make much sense for the robot:
$x_{t+1} = A x_t + w_{t+1} \mbox{, where } w_t \sim N(0, Q)$with$\begin{split}A = \left( \begin{array}{cc} 1.2 & 0.0 \\ 0.0 & -0.2 \end{array} \right), \qquad Q = 0.3 \Sigma\end{split}$As all the processes are linear and Normal the calculations are straightforward:
\begin{align} \mathbb{E} [A x_f + w] &= A \mathbb{E} x_f + \mathbb{E} w\\ &= A \hat{x}_f\\ &= A \hat{x} + A \Sigma G' (G \Sigma G' + R)^{-1}(y - G \hat x) \end{align}\begin{align} \operatorname{Var} [A x_f + w] &= A \operatorname{Var}[x_f] A' + Q\\ &= A \Sigma_f A' + Q\\ &= A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q \end{align}The matrix $$A \Sigma G' (G \Sigma G' + R)^{-1}$$ is often written as $$K_\Sigma$$ and called the Kalman gain. Using this notation, I can summarise the results as follows:\begin{split}\begin{align} \hat{x}_{new} &:= A \hat{x} + K_{\Sigma} (y - G \hat{x}) \\ \Sigma_{new} &:= A \Sigma A' - K_{\Sigma} G \Sigma A' + Q \nonumber \end{align}\end{split}Adding the prediction to my chart gives me:

A <- matrix(c(1.2, 0,
0, -0.2), ncol=2)
Q <- 0.3 * Sigma
K <- A %*% Sigma %*% t(G) %*% solve(G%*% Sigma %*% t(G) + R)
xhatnew <- A %*% xhat + K %*% (y - G %*% xhat)
Sigmanew <- A %*% Sigma %*% t(A) - K %*% G %*% Sigma %*% t(A) + Q
z4 <- outer(x1,x2, f, mean=c(xhatnew), varcov=Sigmanew)
image(x1, x2, z4, col=mycols,
xlab=expression('x'[1]), ylab=expression('x'[2]),
main="Predictive density")
points(xhatnew[1], xhatnew[2], pch=19)
text(xhatnew[1]-0.1, xhatnew[2],
labels = expression(hat(x)[new]), adj = 1)
points(xhatf[1], xhatf[2], pch=19, col=lb)
text(xhatf[1]-0.1, xhatf[2], col=lb,
labels = expression(hat(x)[f]), adj = 1)
points(0.2, -0.2, pch=19, col=lb)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1, col=lb)
points(2.3, -1.9, pch=19, col=lb)
text(2.2, -1.9,labels = "y", adj = 1, col=lb)
That's it, having gone through the updating process for one time step gives me the recursive Kalman filter iteration mentioned above.

Another way to visualise the four steps can be achieved with the lattice package:
library(lattice)
grid <- expand.grid(x=x1,y=x2)
grid$Prior <- as.vector(z) grid$Likelihood <- as.vector(z2)
grid$Posterior <- as.vector(z3) grid$Predictive <- as.vector(z4)
contourplot(Prior + Likelihood + Posterior + Predictive ~ x*y,
data=grid, col.regions=mycols, region=TRUE,
as.table=TRUE,
xlab=expression(x[1]),
ylab=expression(x[2]),
main="Kalman Filter",
panel=function(x,y,...){
panel.grid(h=-1, v=-1)
panel.contourplot(x,y,...)
})

You can find the code of this post on Github.

### Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lattice_0.20-29 mnormt_1.5-1

loaded via a namespace (and not attached):
[1] grid_3.1.2  tools_3.1.2

## Notes from the Kölner R meeting, 12 December 2014

Last week's Cologne R user group meeting was the best attended so far, and it was a remarkable event - I believe not a single line of R code was shown. Still, it was an R user group meeting with two excellent talks, and you will understand shortly why not much R code needed to be displayed.

### Introduction to Julia for R Users

Hans Werner Borchers joined us from Mannheim to give an introduction to Julia for R users. Julia is a high-level, high-performance dynamic programming language for technical computing. The language has gained some considerable traction over the last two years and it was great to get an overview from a familiar perspective.

Interestingly, as Hans Werner pointed out, Julia is by far not the only new language around the block. Indeed, over the last decade nearly every year saw the announcement of a new language. Also big tech companies such as Microsoft, Google, Mozilla and Apple are trying to push their own programming languages: F# (2005), Go (2009), Rust (2010) and Swift (2014) respectively.

Over the more recent years we notice a movement towards the use of LLVM (Low Level Virtual Machine), on which Julia is based as well and which makes it fast. The just in time compilation demands a little mind shift if you come from R, where the mantra for speed is: vectorise - remove all for-loops. Well, the opposite is true for Julia, because your code will be compiled. For-loops are much easier to understand for the underlying compiler. Hans Werner's slides provide some good examples to get you started and pointers to further resources.

### Dynamic Linear Models and Kalman Filtering

Forecasting time series was the topic of Holger Zien's talk. Holger gained his first experience with time series during his PhD, when he worked with experimental sensor data. That meant he had lots of data, which could often be regarded stationary as well.

Nowadays, his challenges can be very different, sometimes only a few data points from a non-stationary process are available, and yet he is still expected to predict the future.

Dynamic linear models (dlm) can provide a remedy in those situations. In their simplest version a dlm links system and observational equations in the following way:
$y_t = F \theta_t + \nu_t\quad\mbox{observation eq. }\\ \theta_t = G \theta_{t-1} + \omega_t\quad\mbox{system eq.}$ with $$\nu_t, \omega_t$$ mutually independent random variables. A special case of dynamic linear models is the well known Kalman filter. In the more general case $$y_t$$ and $$\theta_t$$ are vectors and $$F_t, G_t$$ are time variant matrices.

Holger explained that a dlm can principally be used for three purposes:
• Filtering: Estimate of the current value of the state/system variable.
• Smoothing: Estimate of past values of the state/system variable, i.e., estimating at time $$t$$ given measurements up to time $$t' > t$$.
• Forecasting: Forecasting future observations or values of the state/system variable.
With this introduction outlined Holger showed us how various ARMA and linear regression models can be expressed as dlm. His talk concluded with some remarks about his personal experience and references to various R packages, articles and books.

### Next Kölner R meeting

The next meeting is scheduled for 6 March 2015.

Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.

Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.

## Next Kölner R User Meeting: Friday, 12 December 2014

The next Cologne R user group meeting is scheduled for this Friday, 12 December 2014.

We have an exciting agenda with two talks on Julia and Dynamic Linear Models:

### Introduction to Julia for R Users

Hans Werner Borchers

Julia is a high-performance dynamic programming language for scientific computing, with a syntax that is familiar to users of other technical computing environments (Matlab, Python, R, etc.). It provides a sophisticated compiler, high performance with numerical accuracy, and extensive mathematical function libraries.

Some of the highlights of Julia are an implementation of automated differentiation, an optimisation modelling language, also integrating some of the optimisation solvers available from the COIN-OR project, and user-contributed packages for time series, statistics and machine learning, or operations research.

### Dynamic Linear Models and Kalman Filtering

Holger Zien

One of the problems most commonly presented to everybody working in statistics is forecasting time series. The textbook answer is to fit an ARMA model to the data and to use the model for prediction. This approach works well for long and stationary time series. However, often the actual time series one is given to analyse are more complicated. They are ridiculously short and clearly non-stationary. Dynamic Linear Models (DLM) and Kalman Filtering go one step beyond ARMA models and may be applied to these more complicated data. I will give a brief introduction into their mathematical background and will talk about my experience using them in practice.

### Drinks and Networking

The event will be followed by drinks and schnitzel at the Lux.

The organisers, Bernd Weiß and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their vector programme.

View Larger Map

## Measuring temperature with my Arduino

It is really getting colder in London - it is now about 5°C outside. The heating is on and I have got better at measuring the temperature at home as well. Or, so I believe.

Last week's approach of me guessing/feeling the temperature combined with an old thermometer was perhaps too simplistic and too unreliable. This week's attempt to measure the temperature with my Arduino might be a little OTT (over the top), but at least I am using the micro-controller again.

Yet, I have to overcome similar challenges again as last week. The Arduino with the temperature sensor (TMP36) should be a better device to measure the temperature than my thermometer, but it isn't perfect either and the readings will have some uncertainty as well. But, it will be easy to make frequent measurements and with a little trick I should be able to filter out the noise and get a much better view of the 'true' temperature.

The trick is Bayesian filtering, or as it turns out, a simplified Kalman filter (univariate, no control actions, no transition matrices).

I can't observe the temperature directly, I can only try to measure it. I think of this scenario as a series of Bayesian networks (or Hidden Markov Model). The hidden or latent variable is the 'true' temperature and the observable variable is the reading of my Arduino sensor.

I start with a prior guess of the Temperature at time $$k=0$$, just like last week (note, here I will use $$x_k$$ instead of $$\mu_k$$). My initial prior is $$N(x_0, \sigma_0^2)$$. Now I predict the state (temperature) $$\hat{x}_1$$ at the next time step. I expect the temperature to be largely unchanged and therefore my prediction is just $$\hat{x}_1 = x_0 + \varepsilon_P$$ with $$\varepsilon_P \sim N(0, \sigma_P^2)$$ describing the additional process variance going from one time step to the next. The process variance is independent from the uncertainty of $$x_0$$, hence I can add them up and thus my prior for $$x_1$$ is $$N(x_0, \sigma_0^2+\sigma_P^2)$$.

The measurement of the temperature should reflect the 'true' temperature with some measurement uncertainty $$\sigma_M$$. My prediction of the measurement is $$\hat{z}_1 = \hat{x}_1 + \varepsilon_M$$ with $$\varepsilon_M \sim N(0, \sigma_M^2)$$. The best estimator for $$\hat{z}_1$$ is the actual measurement $$z_1$$ and therefore the likelihood distribution of $$z_1|x_1$$ is $$\,N(z_1, \sigma_M)$$. Applying the formulas for Bayesian conjugates again gives the hyper-parameters for the posterior normal distribution $$x_1|\,z_1, x_0$$, the temperature given the measurement and my prediction from the previous time step:

$x_1=\left.\left(\frac{x_0}{\sigma_0^2+\sigma_P^2} + \frac{z_1}{\sigma_M^2}\right)\right/\left(\frac{1}{\sigma_0^2+\sigma_P^2} + \frac{1}{\sigma_M^2}\right) \\ \sigma_1^2=\left(\frac{1}{\sigma_0^2+\sigma_P^2} + \frac{1}{\sigma_M^2}\right)^{-1}$With $$K_1 := \frac{\sigma_0^2+\sigma_P^2}{\sigma_M^2+\sigma_0^2+\sigma_P^2}$$ this simplifies to:
$x_1 = K_1\, z_1 + (1 - K_1)\, x_0\\ \sigma_1^2 = (1 - K_1) (\sigma_0^2 + \sigma_P^2)$
This gives the simple iterative updating procedure (or Kalman filter):
$K_{k+1} = \frac{\sigma_k^2+\sigma_P^2}{\sigma_M^2+\sigma_k^2+\sigma_P^2} \\ \sigma_{k+1}^2 = (1 - K_{k+1}) (\sigma_k^2 + \sigma_P^2)\\ x_{k+1} = K_{k+1}\, z_{k+1} + (1 - K_{k+1})\, x_k$
In my code below I set the process variance to $$\sigma_P^2=0.01$$ and the measurement variance to $$\sigma_M^2=0.5$$, which led to a fairly stable reading of 21.5°C.

Changing the parameters $$\sigma_P$$ and $$\sigma_M$$ influence how quickly the filter reacts to changes. Here is an example where I briefly touch the temperature sensor with the above parameters. The time axis in those screen shots is about 6 seconds.

I have to do a bit more reading on the Kalman filter. It appears to be an immensely powerful tool to extract the signal from the noise. Well, it helped to put a man on the moon.

### Code

This is the Processing and Arduino code I used in this post. You may have to change the port number in line 28 to your own settings.

## How cold is it? A Bayesian attempt to measure temperature

It is getting colder in London, yet it is still quite mild considering that it is late November. Well, indoors it still feels like 20°C (68°F) to me, but I have been told last week that I should switch on the heating.

Luckily I found an old thermometer to check. The thermometer showed 18°C. Is it really below 20°C?

The thermometer is quite old and I'm not sure that is works properly anymore. So, what shall I do now? Perhaps I should consider that both measurements are uncertain and try to combine them.

I believe that I can sense the temperature within ±3°C, while I think that the thermometer still works within ±2°C. Assuming that both measurements follow a Gaussian (Normal) distribution, with the uncertainties given as standard deviations, I can use Bayes' theorem to combine my hypothesis with the data. The posterior distribution will be Gaussian again with conjugated hyper-parameters:
$\mu=\left.\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{s^2}\right)\right/\left(\frac{1}{\sigma_0^2} + \frac{n}{s^2}\right) \\ \sigma^2=\left(\frac{1}{\sigma_0^2} + \frac{n}{s^2}\right)^{-1}$With $$K := \frac{n\sigma_0^2}{s^2+n\sigma_0^2}$$ this simplifies to:
$\mu = K\, \bar{x} + (1 - K)\, \mu_0 \mbox{, with } \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i\\ \sigma = s \,\sqrt{K/n}$In my case I have: $$n=1,\; x_1=18^{\circ}C,\; s=2^{\circ}C,\; \mu_0=20^{\circ}C,\; \sigma_0=3^{\circ}C$$.

Hence, the posterior distribution has parameters $$\mu=18.6^{\circ}C$$ and $$\sigma=1.7^{\circ}C$$. Thus, my best guess would be that is actually a little colder than I thought. One could argue that the probability that is below 20° is 80%.

Over the last five days my perception of the temperature didn't change, neither did the weather forecast, but the measurements showed: 18°C, 19°C, 17.5°C, 18°C, 18.5°C.

With that information the parameters update to $$\mu=18.3^{\circ}C$$ and $$\sigma=0.9^{\circ}C$$. I can't deny it any longer it has got colder. The probability that is below 20°C is now at 97% and the heating is on.

Without any prior knowledge I may have used a t-test to check the measurements. But here I believe that I have information about the thermometer and my own temperature sensing abilities which I don't want to ignore.

### Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] BayesianFirstAid_0.1 rjags_3-14           coda_0.16-1
[4] lattice_0.20-29

loaded via a namespace (and not attached):
[1] grid_3.1.2    MASS_7.3-35   mnormt_1.5-1  stringr_0.6.2