How to use optim in R

9 comments
A friend of mine asked me the other day how she could use the function optim in R to fit data. Of course there are functions for fitting data in R and I wrote about this earlier. However, she wanted to understand how to do this from scratch using optim.

The function optim provides algorithms for general purpose optimisations and the documentation is perfectly reasonable, but I remember that it took me a little while to get my head around how to pass data and parameters to optim. Thus, here are two simple examples.

I start with a linear regression by minimising the residual sum of square and discuss how to carry out a maximum likelihood estimation in the second example.

Minimise residual sum of squares

I start with an x-y data set, which I believe has a linear relationship and therefore I'd like to fit y against x by minimising the residual sum of squares.
dat=data.frame(x=c(1,2,3,4,5,6), 
               y=c(1,3,5,6,8,12))
Next, I create a function that calculates the residual sum of square of my data against a linear model with two parameter. Think of y = par[1] + par[2] * x.
min.RSS <- function(data, par) {
              with(data, sum((par[1] + par[2] * x - y)^2))
             }
Optim minimises a function by varying its parameters. The first argument of optim are the parameters I'd like to vary, par in this case; the second argument is the function to be minimised, min.RSS. The tricky bit is to understand how to apply optim to your data. The solution is the ... argument in optim, which allows me to pass other arguments through to min.RSS, here my data. Therefore I can use the following statement:
result <- optim(par = c(0, 1), min.RSS, data = dat)
# I find the optimised parameters in result$par
# the minimised RSS is stored in result$value
result
## $par
## [1] -1.267  2.029
## 
## $value
## [1] 2.819
## 
## $counts
## function gradient 
##       89       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
Let me plot the result:
plot(y ~ x, data = dat)
abline(a = result$par[1], b = result$par[2], col = "red")


Great, this looks reasonable. How does it compare against the built in linear regression in R?
lm(y ~ x, data = dat)
## 
## Call:
## lm(formula = y ~ x, data = dat)
## 
## Coefficients:
## (Intercept)            x  
##       -1.27         2.03
Spot on! I get the same answer.

Maximum likelihood

In my second example I look at count data and I would like to fit a Poisson distribution to this data.

Here is my data:
obs = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 42, 43)
freq = c(1392, 1711, 914, 468, 306, 192, 96, 56, 35, 17, 15, 6, 2, 2, 1, 1)
x <- rep(obs, freq)
plot(table(x))

To fit a Poisson distribution to x I don't minimise the residual sum of squares, instead I maximise the likelihood for the chosen parameter lambda.

The likelihood function is given by:
lklh.poisson <- function(x, lambda) lambda^x/factorial(x) * exp(-lambda)
and the sum of the log-liklihood function is:
log.lklh.poisson <- function(x, lambda){ 
                     -sum(x * log(lambda) - log(factorial(x)) - lambda) 
                     }
By default optim searches for parameters, which minimise the function fn. In order to find a maximium, all I have to do is change the sign of the function, hence -sum(...).
optim(par = 2, log.lklh.poisson, x = x)
## Warning: one-diml optimization by Nelder-Mead is unreliable: use "Brent"
## or optimize() directly
## $par
## [1] 2.704
## 
## $value
## [1] 9966
## 
## $counts
## function gradient 
##       24       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
Ok, the warning message tells me that I shoud use another optimisation algorithm, as I have a one dimensional problem - a single parameter. Thus, I follow the advise and get:
optim(par = 2, log.poisson, x = x, method = "Brent", lower = 2, upper = 3)
## $par
## [1] 2.704
## 
## $value
## [1] 9966
## 
## $counts
## function gradient 
##       NA       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
It's actually the same result. Let's compare the result to fitdistr, which uses maximum liklihood as well.
library(MASS)
fitdistr(x, "Poisson")
##    lambda 
##   2.70368 
##  (0.02277)
No surprise here, it gives back the mean, which is the maximum likelihood parameter.
mean(x)
## [1] 2.704
For more information on optimisation and mathematical programming with R see the CRAN Task View on this subject.

Session Info

sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MASS_7.3-22

9 comments :

  1. Hi,

    I tried the same thing as yours. But I am getting different answers from optim and lm(). Please look into it.

    Following is the data.
    Int Ext y

    1 89 21 2625
    2 93 24 2700
    3 91 21 3100
    4 122 23 3150
    5 115 27 3175
    6 100 18 3100
    7 98 19 2700
    8 105 16 2475
    9 112 23 3625
    10 109 28 3525
    11 130 20 3225
    12 104 25 3450
    13 104 20 2425
    14 111 26 3025
    15 97 28 3625
    16 115 29 2750
    17 113 25 3150
    18 88 23 2600
    19 108 19 2525
    20 101 16 2650

    Following is the code.

    min.function <- function(data, par)
    {
    with(data, sum((par[1]+par[2]*Int+par[3]*Ext - y)^2))
    }
    result <- optim(par =c(0,1,2), min.function, data=dat)
    result

    $par

    [1] -52.53001 16.02561 59.25839

    $value

    [1] 2069061

    $counts

    function gradient

    192 NA

    $convergence

    [1] 0

    $message

    NULL

    From built in lm package

    lm(formula = y ~ Int+Ext)

    Coefficients:

    (Intercept) Int Ext

    993.92 8.22 49.71

    Please help.
    Why am I getting different answers from both ?

    ReplyDelete
  2. Change you initial parameters from c(0,1,2) to c(100, 1, 2) and see if this has an impact.

    ReplyDelete
  3. Ayush Raj Singh23 May 2013 04:38

    Hi Markus,

    It worked. Awesome. But could you please explain why it worked ? I just followed your code. I really do not understand how par =c(0,1,2) or par = c(100,1,2) are related to par[1], par[2] and par[3] ?

    Thank you for the help.



    ayush

    ReplyDelete
  4. The parameters par[1], par[2], par[3] are initialised via the vector par=c(0,1,2). I believe, you should have used a different optimisation method for your problem, as optim(par=c(0,1,2), min.function, data=dat, method="BFGS") would give you the answer you are looking for. I don't know which fitting method 'lm' is using in the background, but I am sure the answer is in the documentation or online.

    ReplyDelete
  5. Ok. But, how does initialization of parameters affects the answer. Acc. to me answer should come out to be the same from both initial values (100 or 0).

    Also, what other optimization method would you suggest for my problem.

    Thank you.

    ReplyDelete
  6. Well, optim is looking for minima and what do you do if your function has serval minima and your start close a local one? I think you have to take a text book and look into this in more detail and read the documentation.

    ReplyDelete
  7. Ashish Verma28 June 2013 19:57

    Hi Markus,

    I am grad student and I have written a custom function for piece wise continuous fit with constraints on boundaries. I have used optim() and I am getting decent results but in order to improve these results I am trying to program in the gradient as well, but when I put the gradient in optim returns the exact same parameters which I give it to initialize. If you have any suggestions please share. Also, I faced same problem with optimx() and lab.optim(). Thanks in advance.

    ReplyDelete
  8. I suggest you post your question with a small reproducible example to R-help: http://www.r-project.org/mail.html.

    ReplyDelete
  9. Arjun Kavallur31 January 2014 09:35

    Hello Markus
    The blog you have posted has been very informative and helpful!I have been trying to calculate the maximum likelihood estimate of an ARMA process using optim.
    In the process I have to recursively find the error terms(white noise) using a for loop which involves the use of parameters.I want to optimize the sum of the square of the error terms and hence estimate the parameters. I believe that both the processes cannot be done under the same function(fn). If I calculate the for loop under a different function the error message of "parameters" are missing crops up.
    Moreover I have found that if we optimized a function using two different parameters an error message of "second parameter" missing with no default comes up.
    Could you please advice me on these two issues.
    Thanks

    ReplyDelete