# How to use optim in R

A friend of mine asked me the other day how she could use the function`optim`

in R to fit data. Of course there are built in functions for fitting data in R and I wrote about this earlier. However, she wanted to understand how to do this from scratch using `optim`

. The function

`optim`

provides algorithms for general purpose optimisations and the documentation is perfectly reasonable, but I remember that it took me a little while to get my head around how to pass data and parameters to `optim`

. Thus, here are two simple examples.I start with a linear regression by minimising the residual sum of square and discuss how to carry out a maximum likelihood estimation in the second example.

### Minimise residual sum of squares

I start with an x-y data set, which I believe has a linear relationship and therefore I'd like to fit y against x by minimising the residual sum of squares.```
dat=data.frame(x=c(1,2,3,4,5,6),
y=c(1,3,5,6,8,12))
```

Next, I create a function that calculates the residual sum of square of my data against a linear model with two parameter. Think of `y = par[1] + par[2] * x`

.```
min.RSS <- function(data, par) {
with(data, sum((par[1] + par[2] * x - y)^2))
}
```

Optim minimises a function by varying its parameters. The first argument of `optim`

are the parameters I'd like to vary, `par`

in this case; the second argument is the function to be minimised, `min.RSS`

. The tricky bit is to understand how to apply `optim`

to your data. The solution is the `...`

argument in `optim`

, which allows me to pass other arguments through to `min.RSS`

, here my data. Therefore I can use the following statement:```
result <- optim(par = c(0, 1), min.RSS, data = dat)
# I find the optimised parameters in result$par
# the minimised RSS is stored in result$value
result
## $par
## [1] -1.267 2.029
##
## $value
## [1] 2.819
##
## $counts
## function gradient
## 89 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
```

Let me plot the result:```
plot(y ~ x, data = dat, main="Least square regression")
abline(a = result$par[1], b = result$par[2], col = "red")
```

Great, this looks reasonable. How does it compare against the built in linear regression in R?

```
lm(y ~ x, data = dat)
##
## Call:
## lm(formula = y ~ x, data = dat)
##
## Coefficients:
## (Intercept) x
## -1.27 2.03
```

Spot on! I get the same answer.### Maximum likelihood

In my second example I look at count data and I would like to fit a Poisson distribution to this data.Here is my data:

```
obs = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 42, 43)
freq = c(1392, 1711, 914, 468, 306, 192, 96, 56, 35, 17, 15, 6, 2, 2, 1, 1)
x <- rep(obs, freq)
plot(table(x), main="Count data")
```

To fit a Poisson distribution to

`x`

I don't minimise the residual sum of squares, instead I maximise the likelihood for the chosen parameter lambda.The likelihood function is given by:

`lklh.poisson <- function(x, lambda) lambda^x/factorial(x) * exp(-lambda)`

and the sum of the log-liklihood function is:```
log.lklh.poisson <- function(x, lambda){
-sum(x * log(lambda) - log(factorial(x)) - lambda)
}
```

By default `optim`

searches for parameters, which minimise the function `fn`

. In order to find a maximium, all I have to do is change the sign of the function, hence `-sum(...)`

.```
optim(par = 2, log.lklh.poisson, x = x)
## Warning: one-diml optimization by Nelder-Mead is unreliable: use "Brent"
## or optimize() directly
## $par
## [1] 2.704
##
## $value
## [1] 9966
##
## $counts
## function gradient
## 24 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
```

Ok, the warning message tells me that I shoud use another optimisation algorithm, as I have a one dimensional problem - a single parameter. Thus, I follow the advise and get:```
optim(par = 2, log.poisson, x = x, method = "Brent", lower = 2, upper = 3)
## $par
## [1] 2.704
##
## $value
## [1] 9966
##
## $counts
## function gradient
## NA NA
##
## $convergence
## [1] 0
##
## $message
## NULL
```

It's actually the same result. Let's compare the result to `fitdistr`

, which uses maximum liklihood as well.```
library(MASS)
fitdistr(x, "Poisson")
## lambda
## 2.70368
## (0.02277)
```

No surprise here, it gives back the mean, which is the maximum likelihood parameter.```
mean(x)
## [1] 2.704
```

For more information on optimisation and mathematical programming with R see the CRAN Task View on this subject.### Session Info

```
sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] MASS_7.3-22
```

Subscribe to:
Post Comments
(
Atom
)

Hi,

ReplyDeleteI tried the same thing as yours. But I am getting different answers from optim and lm(). Please look into it.

Following is the data.

Int Ext y

1 89 21 2625

2 93 24 2700

3 91 21 3100

4 122 23 3150

5 115 27 3175

6 100 18 3100

7 98 19 2700

8 105 16 2475

9 112 23 3625

10 109 28 3525

11 130 20 3225

12 104 25 3450

13 104 20 2425

14 111 26 3025

15 97 28 3625

16 115 29 2750

17 113 25 3150

18 88 23 2600

19 108 19 2525

20 101 16 2650

Following is the code.

min.function <- function(data, par)

{

with(data, sum((par[1]+par[2]*Int+par[3]*Ext - y)^2))

}

result <- optim(par =c(0,1,2), min.function, data=dat)

result

$par

[1] -52.53001 16.02561 59.25839

$value

[1] 2069061

$counts

function gradient

192 NA

$convergence

[1] 0

$message

NULL

From built in lm package

lm(formula = y ~ Int+Ext)

Coefficients:

(Intercept) Int Ext

993.92 8.22 49.71

Please help.

Why am I getting different answers from both ?

Change you initial parameters from c(0,1,2) to c(100, 1, 2) and see if this has an impact.

ReplyDeleteHi Markus,

ReplyDeleteIt worked. Awesome. But could you please explain why it worked ? I just followed your code. I really do not understand how par =c(0,1,2) or par = c(100,1,2) are related to par[1], par[2] and par[3] ?

Thank you for the help.

ayush

The parameters par[1], par[2], par[3] are initialised via the vector par=c(0,1,2). I believe, you should have used a different optimisation method for your problem, as optim(par=c(0,1,2), min.function, data=dat, method="BFGS") would give you the answer you are looking for. I don't know which fitting method 'lm' is using in the background, but I am sure the answer is in the documentation or online.

ReplyDeleteOk. But, how does initialization of parameters affects the answer. Acc. to me answer should come out to be the same from both initial values (100 or 0).

ReplyDeleteAlso, what other optimization method would you suggest for my problem.

Thank you.

Well, optim is looking for minima and what do you do if your function has serval minima and your start close a local one? I think you have to take a text book and look into this in more detail and read the documentation.

ReplyDeleteHi Markus,

ReplyDeleteI am grad student and I have written a custom function for piece wise continuous fit with constraints on boundaries. I have used optim() and I am getting decent results but in order to improve these results I am trying to program in the gradient as well, but when I put the gradient in optim returns the exact same parameters which I give it to initialize. If you have any suggestions please share. Also, I faced same problem with optimx() and lab.optim(). Thanks in advance.

I suggest you post your question with a small reproducible example to R-help: http://www.r-project.org/mail.html.

ReplyDeleteHello Markus

ReplyDeleteThe blog you have posted has been very informative and helpful!I have been trying to calculate the maximum likelihood estimate of an ARMA process using optim.

In the process I have to recursively find the error terms(white noise) using a for loop which involves the use of parameters.I want to optimize the sum of the square of the error terms and hence estimate the parameters. I believe that both the processes cannot be done under the same function(fn). If I calculate the for loop under a different function the error message of "parameters" are missing crops up.

Moreover I have found that if we optimized a function using two different parameters an error message of "second parameter" missing with no default comes up.

Could you please advice me on these two issues.

Thanks