5 Chapter 5 exercises
Suppose that we want to generate random variables from a distribution with cdf \[ F(y) = \dfrac{2}{\pi} \arcsin(\sqrt{y}) \quad \text{for}~y \in [0, 1]. \] We can generate a single random variable, \(Y^*\), say, by generating a random Uniform([0, 1]) random variable, \(U^*\), say, and then finding \(Y^*\) such that \(F(Y^*) = U^*\). Write a function in R, `rarcsine(n)`, that generates \(n\) random variables with cdf \(F(y)\) above using R’s `uniroot()` function, and then generate \(n = 10\) variates. Note that `asin(y)` evaluates \(\arcsin(y)\) in R for \(y =\) `y`.
Solution
We’ll start by writing a function to evaluate \(F()\), called `F`.
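A sketch of such a function is

```r
F <- function(y) {
  # cdf F(y) = (2 / pi) * arcsin(sqrt(y)) for y in [0, 1]
  2 * asin(sqrt(y)) / pi
}
```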
Then we want the function for which we want the root, which we’ll call `F_root()`.
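A sketch, with the target value passed as a second argument `u`:

```r
F_root <- function(y, u) {
  # F(y) - u, whose root in y solves F(y) = u
  F(y) - u
}
```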
Next we’ll use `uniroot()` to find \(y\) such that \(F(y) = u\).
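For example, for a single uniform draw `u`, say, something like

```r
u <- runif(1)
uniroot(F_root, interval = c(0, 1), u = u)$root
```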
## [1] 0.09332532
Putting this together into the function `rarcsine(n)`, we get
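for example (a sketch built on `F_root()` above, searching over the cdf’s support \([0, 1]\))

```r
rarcsine <- function(n) {
  # generate n variates with cdf F by inversion, using uniroot()
  # n is an integer; returns an n-vector
  u <- runif(n)
  sapply(u, function(u_i) uniroot(F_root, interval = c(0, 1), u = u_i)$root)
}
```

and so

```r
rarcsine(10)
```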
## [1] 0.9034382 0.3821925 0.8172672 0.2686639 0.9919419 0.9232536 0.6268923
## [8] 0.8650127 0.1923155 0.8662630
gives \(n = 10\) random variates.
Consider that the independent sample \(\mathbf{y} = (y_1, \ldots, y_n)\) is from the pdf \[ f(y) = 2\theta y\exp\left\{-\theta y^2\right\} \quad \text{for}~y > 0 \] with parameter \(\theta > 0\).
Show that the maximum likelihood estimate of \(\theta\) is given by \(\hat \theta = n / (\sum_{i = 1}^n y_i^2)\).
Solution
The log-likelihood is \[ \log f(\mathbf{y} \mid \theta) = n \log (\theta) - \theta \sum_{i = 1}^n y_i^2 + \text{constant} \] and so \[ \dfrac{\partial \log f(\mathbf{y} \mid \theta)}{\partial \theta} = \dfrac{n}{\theta} - \sum_{i = 1}^n y_i^2. \] Setting \(\partial \log f(\mathbf{y} \mid \theta) / \partial \theta = 0\) gives \(\hat \theta = n / \sum_{i = 1}^n y_i^2\).
Given that \(y_1, \ldots, y_n\) take the values \[ 0.15, 0.24, 0.33, 0.43, 0.49, 0.57, 0.57, 0.63, 0.71, 0.93, 1.15, 1.22, 1.23, 1.23, 1.28 \] find \(\hat \theta\).
Solution
We can calculate this in R with
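a few lines such as the following (a sketch, entering the data from Question 2(b)):

```r
y <- c(0.15, 0.24, 0.33, 0.43, 0.49, 0.57, 0.57, 0.63, 0.71, 0.93,
       1.15, 1.22, 1.23, 1.23, 1.28)
theta_hat <- length(y) / sum(y^2)
theta_hat
```

and so \(\hat \theta = 1.428\).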
Use R’s `optimize()` function to verify \(\hat \theta\), assuming \(\hat \theta \in [0.1, 4]\).
Solution
We now need a function to evaluate the negative log-likelihood (ignoring the constant, because this doesn’t vary with \(\theta\)), as we want to find the \(\theta\) that minimises this. We’ll call this function `nd0(theta, y)`.
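A sketch of such a function is

```r
nd0 <- function(theta, y) {
  # Function to evaluate negative log-likelihood (ignoring the constant)
  # theta is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  -(n * log(theta) - theta * sum(y^2))
}
```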
Then we call `optimize()` accordingly.
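A sketch of the call is

```r
optimize(nd0, interval = c(0.1, 4), y = y)
```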
## $minimum
## [1] 1.427901
## 
## $objective
## [1] 9.656731
and we see that element `minimum` in the list confirms \(\hat \theta\).
Consider using Newton’s method to find \(\hat \theta\) from Question 2.
Find the second derivative of the log-likelihood w.r.t. \(\theta\).
Solution
The second derivative of the log-likelihood is \[ \dfrac{\partial^2 \log f(\mathbf{y} \mid \theta)}{\partial \theta^2} = -\dfrac{n}{\theta^2}. \]
To find maximum likelihood estimates using Newton’s method, we want to minimise the negative log-likelihood. Hence show that if we want to minimise \(-\log f(\mathbf{y} \mid \theta)\) then a step of Newton’s method, given \(\theta\), is given by \[ \dfrac{\partial [-\log f(\mathbf{y} \mid \theta)] / \partial \theta}{\partial^2 [-\log f(\mathbf{y} \mid \theta)] / \partial \theta^2} = \dfrac{\sum_{i = 1}^n y_i^2 - n / \theta}{n / \theta^2}. \]
Solution
Negating the first derivative from Question 2(a) and the second derivative from Question 3(a), we have \(\partial [-\log f(\mathbf{y} \mid \theta)]/\partial \theta = \sum_{i = 1}^n y_i^2 - n/\theta\) and \(\partial^2 [-\log f(\mathbf{y} \mid \theta)]/\partial \theta^2 = n/\theta^2\), and so the step is \[ \dfrac{\partial [-\log f(\mathbf{y} \mid \theta)] / \partial \theta}{\partial^2 [-\log f(\mathbf{y} \mid \theta)] / \partial \theta^2} = \dfrac{\sum_{i = 1}^n y_i^2 - n / \theta}{n / \theta^2} \]
as required.
Write functions in R, `nd1(theta, y)` and `nd2(theta, y)`, that return the first and second derivatives of \(-\log f(\mathbf{y} \mid \theta)\) w.r.t. \(\theta\) based on analytical expressions for the derivatives.
Solution
Here we’ll ensure that `nd1()` and `nd2()` return a vector and a matrix, respectively, for generality if we’re working with higher dimensions, but returning scalars is also fine here.

```r
nd1 <- function(theta, y) {
  # Function to evaluate first derivative of -log f w.r.t. theta
  # theta is a scalar
  # y is a vector
  # returns a 1-vector
  n <- length(y)
  as.vector(sum(y^2) - n / theta)
}

nd2 <- function(theta, y) {
  # Function to evaluate second derivative of -log f w.r.t. theta
  # theta is a scalar
  # y is a vector
  # returns a 1x1 matrix
  n <- length(y)
  matrix(n / theta^2, 1, 1)
}
```
Use your functions from Question 3(c) to find the first Newton step, given \(\theta_0 = 1\) and the sample of data from Question 2(b).
Solution
We’ll load the data
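Entering the values from Question 2(b) directly, a sketch is

```r
y <- c(0.15, 0.24, 0.33, 0.43, 0.49, 0.57, 0.57, 0.63, 0.71, 0.93,
       1.15, 1.22, 1.23, 1.23, 1.28)
```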
and then the first step is
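A sketch, taking \(\theta_0 = 1\) and printing the change applied to \(\theta_0\) (i.e. minus the step as written in Question 3(b)):

```r
theta0 <- 1
# theta_1 - theta_0, i.e. minus the Newton step
drop(-solve(nd2(theta0, y), nd1(theta0, y)))
```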
## [1] 0.29968
Perform four further steps of Newton’s method. After how many steps does Newton’s method agree with \(\hat \theta\) from Question 2(b) to within three decimal places?
Solution
The following code performs five iterations of Newton’s method in total, starting at \(\theta_0\).
```r
iterations <- 5
theta_i <- numeric(iterations + 1)
theta_i[1] <- 1
for (i in 1 + 1:iterations) {
  theta_i[i] <- theta_i[i - 1] - solve(nd2(theta_i[i - 1], y), nd1(theta_i[i - 1], y))
}
theta_i
```
## [1] 1.000000 1.299680 1.416402 1.427826 1.427919 1.427919
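For comparison we recall \(\hat \theta\) from Question 2(b), for example via

```r
theta_hat <- length(y) / sum(y^2)
theta_hat
```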
## [1] 1.427919
If we compare these to \(\hat \theta\) by calculating the absolute difference we get
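for example via

```r
abs(theta_i - theta_hat)
```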
## [1] 4.279187e-01 1.282387e-01 1.151687e-02 9.288927e-05 6.042653e-09
## [6] 0.000000e+00
which first drops below \(10^{-3}\) at its fourth element, i.e. after the third iteration of Newton’s method.
Evaluate the Hessian of the negative log-likelihood at \(\hat \theta\). This should be positive definite if \(\hat \theta\) is the maximum likelihood estimate, which can be assessed by all of its eigenvalues being positive. Check whether this is the case.
Solution
The following calculates the Hessian and its eigenvalues
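A sketch, storing the eigenvalues as `eigs` (a name we choose here) for the check below:

```r
hess <- nd2(theta_hat, y)
eigs <- eigen(hess)$values
```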
and we can then check whether they’re all positive
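for example with

```r
all(eigs > 0)
```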
## [1] TRUE
which they are.
Suppose that the sample of data \[ -7.7, -0.5, -0.1, 1.2, 1.3, 2.4, 3.7, 5.7, 6.2, 8.4, 13.9, 24.4 \] can be modelled as independent realisations from the pdf \[ f(y \mid \mu) = \dfrac{2}{\pi \left[(y - \mu)^2 + 4\right]}, \] where \(-\infty < \mu < \infty\) is an unknown parameter.
Find the maximum likelihood estimate of \(\mu\), denoted \(\hat \mu\), using Newton’s method with a suitable number of iterations.
Solution
We’ll start by finding the log-likelihood and its first and second derivatives w.r.t. \(\mu\). The log-likelihood is given by \[ \log f(\mathbf{y} \mid \mu) = n \log 2 - n \log \pi - \sum_{i = 1}^n \log\left([y_i - \mu]^2 + 4\right) \] and so \[ \dfrac{\partial \log f(\mathbf{y} \mid \mu)}{\partial \mu} = 2\sum_{i = 1}^n \dfrac{y_i - \mu}{(y_i - \mu)^2 + 4} \] and therefore \[ \dfrac{\partial^2 \log f(\mathbf{y} \mid \mu)}{\partial \mu^2} = 4 \sum_{i = 1}^n \left[\dfrac{y_i - \mu}{(y_i - \mu)^2 + 4}\right]^2 - 2 \sum_{i = 1}^n \dfrac{1}{(y_i - \mu)^2 + 4}. \]
Then we’ll write functions in R to evaluate these, `d0(mu, y, mult = 1)`, `d1(mu, y, mult = 1)` and `d2(mu, y, mult = 1)`, respectively, where `mult` can be set to `-1` when we want to deal with the negative log-likelihood.

```r
d0 <- function(mu, y, mult = 1) {
  # function to evaluate log-likelihood
  # mu is a scalar
  # y is a vector
  # mult is a scalar; defaults to 1
  # returns a scalar
  n <- length(y)
  mult * (n * log(2) - n * log(pi) - sum(log((y - mu)^2 + 4)))
}

d1 <- function(mu, y, mult = 1) {
  # function to evaluate first derivative of log-likelihood w.r.t. mu
  # mu is a scalar
  # y is a vector
  # mult is a scalar; defaults to 1
  # returns a scalar
  mult * (2 * sum((y - mu) / ((y - mu)^2 + 4)))
}

d2 <- function(mu, y, mult = 1) {
  # function to evaluate second derivative of log-likelihood w.r.t. mu
  # mu is a scalar
  # y is a vector
  # mult is a scalar; defaults to 1
  # returns a scalar
  2 * mult * (2 * sum(((y - mu) / ((y - mu)^2 + 4))^2) - sum(1 / ((y - mu)^2 + 4)))
}
```
Next we’ll perform our iterations of Newton’s method. We’ll deem a ‘suitable’ number of iterations to be when the parameter estimates for one iteration are within \(10^{-4}\) of those of the previous iteration.
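First, a sketch that enters the data from the question as `y2`, the name used in the code below:

```r
y2 <- c(-7.7, -0.5, -0.1, 1.2, 1.3, 2.4, 3.7, 5.7, 6.2, 8.4, 13.9, 24.4)
```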
```r
theta_i <- 1 # set theta_0
while (TRUE) {
  theta_last <- theta_i[length(theta_i)]
  theta_next <- theta_last - solve(d2(theta_last, y2, -1), d1(theta_last, y2, -1))
  theta_i <- c(theta_i, theta_next)
  if (abs(theta_last - theta_next) < 1e-4) break
}
theta_i
```
## [1] 1.000000 2.099045 2.215480 2.219332 2.219336
We see that our fourth iteration has met our stopping criterion. Hence we have \(\hat \mu = 2.2193\), to four decimal places.
Then use `optimize()` to verify your estimate of \(\mu\) based on Newton’s method from Question 4(a).
Solution
For `optimize()` we can use the following call, and will assume \(\hat \mu \in [1, 3]\).
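A sketch of the call is

```r
# minimise the negative log-likelihood over [1, 3]
optimize(d0, interval = c(1, 3), y = y2, mult = -1)
```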
## $minimum
## [1] 2.219318
## 
## $objective
## [1] 41.79389
Looking at the `minimum` element of the list that `optimize()` returns, we see agreement between its estimate of \(\mu\) and that of Question 4(a).
Consider a log-Normal model for the wind speeds of Example 5.5, so that \(Y \sim \text{log-Normal}(\mu, \sigma^2)\) and hence \[ f_Y(y) = \dfrac{1}{y\sqrt {2\pi \sigma^2}}\exp \left\{-\dfrac{\left[\log(y) - \mu \right]^2}{2\sigma^2}\right\} \quad y > 0 \] and for \(\sigma^2 > 0\). The log-likelihood for an independent sample \(\mathbf{y} = (y_1, \ldots, y_n)\) is given by \[ \log f(\mathbf{y} \mid \mu, \sigma^2) = -\dfrac{n}{2} \log(2 \pi) - \dfrac{n}{2} \log \sigma^2 - \sum_{i = 1}^n \log(y_i) - \dfrac{1}{2 \sigma^2} \sum_{i = 1}^n (\log y_i- \mu)^2 \] and its gradient operator is given by \[ \begin{pmatrix} \dfrac{\partial \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \mu}\\ \dfrac{\partial \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sigma^2} \sum_{i = 1}^n (\log y_i - \mu)\\ -\dfrac{n}{2 \sigma^2} + \dfrac{1}{2\sigma^4} \sum_{i = 1}^n (\log y_i - \mu)^2 \end{pmatrix}. \]
For the wind speed data, \[ s_1 = \sum_{i = 1}^n \log y_i = -12.755, \quad s_2 = \sum_{i = 1}^n (\log y_i)^2 = 155.675 \] and \(n = 31\). Write \(\log f(\mathbf{y} \mid \mu, \sigma^2)\) in terms of the summary statistics \(s_1\) and \(s_2\), so that \(\log f(\mathbf{y} \mid \mu, \sigma^2)\) can be evaluated without knowing \(y_1, \ldots, y_n\).
Solution
\[ \log f(\mathbf{y} \mid \mu, \sigma^2) = -\dfrac{n}{2} \log(2 \pi) - \dfrac{n}{2} \log \sigma^2 - s_1 - \dfrac{1}{2 \sigma^2} (s_2 - 2\mu s_1 + n \mu^2). \]
Write a function in R, `ln(pars, s1, s2, n, mult = 1)`, that evaluates \(m \log f(\mathbf{y} \mid \mu, \sigma^2)\), where `pars` is a 2-vector such that `pars[1]` \(= \mu\) and `pars[2]` \(= \sigma^2\), `s1` \(= s_1\), `s2` \(= s_2\), `n` \(= n\) and `mult` \(= m\). Your function should ensure that \(\sigma^2 > 0\).
Solution
```r
ln <- function(pars, s1, s2, n, mult = 1) {
  # Function to evaluate log-Normal(mu, sig^2) log-likelihood
  # pars is a 2-vector: pars[1] = mu, pars[2] = sig^2
  # s1 and s2 are scalars
  # n is an integer
  # mult is a scalar; defaults to 1
  # returns a scalar
  mu <- pars[1]
  sigsq <- pars[2]
  if (sigsq <= 0) return(mult * -1e8)
  out <- -.5 * n * (log(2 * pi) + log(sigsq))
  out <- out - .5 * (s2 - 2 * mu * s1 + n * mu^2) / sigsq
  mult * (out - s1)
}
```
Find \[ \begin{pmatrix} \dfrac{\partial \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \mu}\\ \dfrac{\partial \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \sigma^2} \end{pmatrix} \] in terms of \(s_1\) and \(s_2\).
Solution
\[ \begin{pmatrix} \dfrac{\partial \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \mu}\\ \dfrac{\partial \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sigma^2} (s_1 - n\mu)\\ -\dfrac{n}{2 \sigma^2} + \dfrac{1}{2\sigma^4} (s_2 - 2 \mu s_1 + n\mu^2) \end{pmatrix}. \]
Write a function in R, `ln_d1(pars, s1, s2, n, mult = 1)`, that evaluates the first derivative of \(m \log f(\mathbf{y} \mid \mu, \sigma^2)\) w.r.t. \((\mu, \sigma^2)\), where `pars` is a 2-vector such that `pars[1]` \(= \mu\) and `pars[2]` \(= \sigma^2\), `s1` \(= s_1\), `s2` \(= s_2\), `n` \(= n\) and `mult` \(= m\).
Solution
```r
ln_d1 <- function(pars, s1, s2, n, mult = 1) {
  # Function to evaluate first derivative of log-Normal(mu, sig^2)
  # log-likelihood w.r.t (mu, sigma^2)
  # pars is a 2-vector: pars[1] = mu, pars[2] = sig^2
  # s1 and s2 are scalars
  # n is an integer
  # mult is a scalar; defaults to 1
  # returns a 2-vector
  mu <- pars[1]
  sigsq <- pars[2]
  out <- numeric(2)
  out[1] <- (s1 - n * mu) / sigsq
  out[2] <- -.5 * n / sigsq + .5 * (s2 - 2 * mu * s1 + n * mu^2) / (sigsq^2)
  mult * out
}
```
Using values for \((\mu, \sigma^2)\) of \((\mu_0, \sigma_0^2) = (1, 2)\), check your function `ln_d1()` by finite-differencing.
Solution
We’ll use the function `fd()` from the lecture notes.

```r
fd <- function(x, f, delta = 1e-6, ...) {
  # Function to evaluate derivative by finite-differencing
  # x is a p-vector
  # f is the function for which the derivative is being calculated
  # delta is the finite-differencing step, which defaults to 10^{-6}
  # returns a vector of length(x)
  f0 <- f(x, ...)
  p <- length(x)
  f1 <- numeric(p)
  for (i in 1:p) {
    x1 <- x
    x1[i] <- x[i] + delta
    f1[i] <- f(x1, ...)
  }
  (f1 - f0) / delta
}
```
Then we’ll load \(\mu_0\) and \(\sigma_0^2\).
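For example

```r
mu0 <- 1
sigsq0 <- 2
```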
Finally we’ll evaluate the gradient and its finite-differencing counterpart
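A sketch of both evaluations, first entering the summary statistics from Question 5(a) since both calls need them:

```r
# summary statistics from Question 5(a)
s1 <- -12.755
s2 <- 155.675
n <- 31
ln_d1(c(mu0, sigsq0), s1, s2, n)
fd(c(mu0, sigsq0), ln, s1 = s1, s2 = s2, n = n)
```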
## [1] -21.87750 18.77313
## [1] -21.87751 18.77311
which agree to within finite-differencing error, so it looks as if our function to evaluate the gradient is returning the correct values. (We could check with more values of \((\mu, \sigma^2)\) if we’re really keen, but one check is usually sufficient!)
Using \((\mu_0, \sigma_0^2)\) from Question 5(e) as starting values, use `optim()` together with `ln()` and `ln_d1()` to find the maximum likelihood estimates of \((\mu, \sigma^2)\), \((\hat \mu, \hat \sigma^2)\), via the BFGS method.
Solution
```r
s1 <- -12.755
s2 <- 155.675
n <- 31
hats <- optim(c(mu0, sigsq0), ln, ln_d1, s1 = s1, s2 = s2, n = n,
              mult = -1, method = 'BFGS')$par
hats
```
## [1] -0.4114503 4.8524782
Evaluate the first derivative of \(\log f(\mathbf{y} \mid \mu, \sigma^2)\) at \((\hat \mu, \hat \sigma^2)\) and comment on whether \((\hat \mu, \hat \sigma^2)\) are likely to be maximum likelihood estimates.
Solution
The first derivatives at \((\hat \mu, \hat \sigma^2)\) are given by
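for example via

```r
ln_d1(hats, s1, s2, n)
```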
## [1] -8.394066e-06 2.346390e-06
which are very close to zero, indicating we’re likely to have found the maximum likelihood estimates.
For the log-Normal wind speed model of Question 5(a), the Hessian matrix of second derivatives is given by \[ \nabla^2 \log f(\mathbf{y} \mid \mu, \sigma^2) = \begin{pmatrix} \dfrac{\partial^2 \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \mu^2} & \dfrac{\partial^2 \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \mu \partial \sigma^2}\\ \dfrac{\partial^2 \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \mu \partial \sigma^2} & \dfrac{\partial^2 \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial (\sigma^2)^2} \end{pmatrix} \] where \[\begin{align*} \dfrac{\partial^2 \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \mu^2} &= -\dfrac{n}{\sigma^2}\\ \dfrac{\partial^2 \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial \mu \partial \sigma^2} &= -\dfrac{1}{\sigma^4} \sum_{i = 1}^n (\log y_i - \mu)\\ \dfrac{\partial^2 \log f(\mathbf{y} \mid \mu, \sigma^2)}{\partial (\sigma^2)^2} &= \dfrac{n}{2 \sigma^4} - \dfrac{1}{\sigma^6} \sum_{i = 1}^n (\log y_i - \mu)^2. \end{align*}\]
Write a function, `ln_d2(pars, s1, s2, n, mult = 1)`, that evaluates the Hessian matrix in terms of \(s_1\) and \(s_2\) given in Question 5(a), assuming the same arguments as `ln_d1()` of Question 5(d).
Solution
```r
ln_d2 <- function(pars, s1, s2, n, mult = 1) {
  # Function to evaluate second derivatives of log-Normal(mu, sig^2)
  # log-likelihood w.r.t (mu, sigma^2)
  # pars is a 2-vector: pars[1] = mu, pars[2] = sig^2
  # s1 and s2 are scalars
  # n is an integer
  # mult is a scalar; defaults to 1
  # returns a 2x2 matrix
  mu <- pars[1]
  sigsq <- pars[2]
  out <- matrix(0, 2, 2)
  out[1, 1] <- -n / sigsq
  out[1, 2] <- out[2, 1] <- -(s1 - n * mu) / (sigsq^2)
  out[2, 2] <- .5 * n / sigsq^2 - (s2 - 2 * mu * s1 + n * mu^2) / (sigsq^3)
  mult * out
}
```
Using `ln()` and `ln_d1()` from Question 5 and `ln_d2()` from Question 6(a), find \((\hat \mu_2, \hat \sigma_2^2)\), the maximum likelihood estimates, using Newton’s method via `nlminb()` in R.
Solution
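A sketch of the call (passing `mult = -1` so that `ln()`, `ln_d1()` and `ln_d2()` supply the negative log-likelihood and its derivatives, and storing the estimates as `hats2`, a name we choose here):

```r
hats2 <- nlminb(c(mu0, sigsq0), ln, gradient = ln_d1, hessian = ln_d2,
                s1 = s1, s2 = s2, n = n, mult = -1)$par
hats2
```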
## [1] -0.4114516 4.8524818
Evaluate the first derivative of \(\log f(\mathbf{y} \mid \mu, \sigma^2)\) at \((\hat \mu_2, \hat \sigma_2^2)\) from Question 6(b) and comment on whether \((\hat \mu_2, \hat \sigma_2^2)\) are likely to be maximum likelihood estimates.
Solution
We can proceed as in Question 5(g).
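A sketch:

```r
ln_d1(hats2, s1, s2, n)
```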
## [1] -1.098215e-15 5.506706e-14
which are still very close to zero, indicating we’re likely to have found the maximum likelihood estimates.
Evaluate the Hessian matrix of \(\log f(\mathbf{y} \mid \mu, \sigma^2)\) at \((\hat \mu_2, \hat \sigma_2^2)\) from Question 6(b) and comment on whether \((\hat \mu_2, \hat \sigma_2^2)\) are likely to be maximum likelihood estimates.
Solution
We’ll calculate the Hessian
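A sketch, storing it as `hess2` (a name we choose here):

```r
hess2 <- ln_d2(hats2, s1, s2, n)
```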
and then, after it’s been negated, want all of its eigenvalues to be positive
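for example with

```r
all(eigen(-hess2)$values > 0)
```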
## [1] TRUE
which they are.
Recall the wind speed data of Example 5.5. Consider these as independent realisations from the Gamma(\(\alpha\), \(\beta\)) distribution, i.e. with pdf \[ f_Y(y \mid \alpha, \beta) = \dfrac{y^{\alpha -1} e^{-\beta y}\beta^\alpha}{\Gamma(\alpha)}, \quad y > 0, \] for parameters \(\alpha, \beta > 0\).
Find the maximum likelihood estimates of \((\alpha, \beta)\) using the BFGS method with `optim()` in R by supplying a function that evaluates the negative log-likelihood’s gradient vector w.r.t. \((\alpha, \beta)\). [Note that in R `lgamma(x)` evaluates \(\log \Gamma(x)\) and `digamma(x)` evaluates \(\text{d} \log \Gamma(x) / \text{d}x\) for `x` \(= x\), where `digamma()` relies on the so-called polygamma function.]
Solution
We’ll load the wind speed data as `y0`.

```r
y0 <- c(3.52, 1.95, 0.62, 0.02, 5.13, 0.02, 0.01, 0.34, 0.43, 15.5,
        4.99, 6.01, 0.28, 1.83, 0.14, 0.97, 0.22, 0.02, 1.87, 0.13,
        0.01, 4.81, 0.37, 8.61, 3.48, 1.81, 37.21, 1.85, 0.04, 2.32, 1.06)
```
Then the following functions evaluate the negative log-likelihood and its gradient, respectively.
```r
nldgamma <- function(pars, y) {
  # Function to evaluate Gamma(alpha, beta) negative log-likelihood
  # pars is a 2-vector
  # y is a vector
  # returns a scalar
  alpha <- pars[1]
  beta <- pars[2]
  if (min(c(alpha, beta)) <= 0) return(1e8)
  n <- length(y)
  -n * alpha * log(beta) + n * lgamma(alpha) - (alpha - 1) * sum(log(y)) + beta * sum(y)
}

nldgamma_d1 <- function(pars, y) {
  # Function to evaluate first derivative of Gamma(alpha, beta)
  # negative log-likelihood w.r.t. (alpha, beta)
  # pars is a 2-vector
  # y is a vector
  # returns a 2-vector
  alpha <- pars[1]
  beta <- pars[2]
  n <- length(y)
  out <- numeric(2)
  out[1] <- -n * log(beta) + n * digamma(alpha) - sum(log(y))
  out[2] <- -n * alpha / beta + sum(y)
  out
}
```
We’ll choose starting values for \((\alpha, \beta)\) as \((\alpha_0, \beta_0) = (1, 1)\) and store these as `pars0`,
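that is,

```r
pars0 <- c(1, 1)
```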
and then call `optim()` to implement the BFGS method.
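A sketch of the call, supplying `nldgamma_d1()` as the gradient:

```r
pars_bfgs <- optim(pars0, nldgamma, nldgamma_d1, y = y0, method = 'BFGS')$par
pars_bfgs
```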
## [1] 0.4017348 0.1179670
storing the resulting maximum likelihood estimates as `pars_bfgs`.
Find the maximum likelihood estimates of \((\alpha, \beta)\) using Newton’s method with `nlminb()` in R by supplying the function that evaluates the negative log-likelihood’s gradient vector w.r.t. \((\alpha, \beta)\) created in Question 7(a), and by supplying a function that evaluates the negative log-likelihood’s Hessian matrix w.r.t. \((\alpha, \beta)\). [Note that in R `trigamma(x)` evaluates \(\text{d}^2 \log \Gamma(x) / \text{d}x^2\) for `x` \(= x\).]
Solution
We can re-use a few functions and objects from above, so next we’ll write a function to evaluate the second derivatives of the negative log-likelihood w.r.t. \((\alpha, \beta)\).
```r
nldgamma_d2 <- function(pars, y) {
  # Function to evaluate second derivative of Gamma(alpha, beta)
  # negative log-likelihood w.r.t. (alpha, beta)
  # pars is a 2-vector
  # y is a vector
  # returns a 2x2 matrix
  alpha <- pars[1]
  beta <- pars[2]
  n <- length(y)
  out <- matrix(0, 2, 2)
  out[1, 1] <- n * trigamma(alpha)
  out[1, 2] <- out[2, 1] <- -n / beta
  out[2, 2] <- n * alpha / beta^2
  out
}
```
We supply this to `nlminb()` with `pars0` from above.
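A sketch of the call:

```r
pars_newt <- nlminb(pars0, nldgamma, gradient = nldgamma_d1,
                    hessian = nldgamma_d2, y = y0)$par
pars_newt
```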
## [1] 0.4017348 0.1179670
and store the maximum likelihood estimates as `pars_newt`.

By considering the gradient at your estimates in Questions 7(a) and 7(b), verify that on both occasions the maximum likelihood estimates have been reached.
Solution
We’ll evaluate the gradient of the negative log-likelihood at `pars_bfgs` and `pars_newt`,
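for example via

```r
nldgamma_d1(pars_bfgs, y0)
nldgamma_d1(pars_newt, y0)
```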
## [1] -1.200319e-06 8.312922e-06
## [1] 1.999826e-06 -1.875220e-06
and find both are sufficiently close to the zero vector that we think we’ve reached the maximum likelihood estimates.
Use the Nelder-Mead method to find the maximum likelihood estimates of \((\mu, \sigma^2)\) for the log-Normal model, using the starting values and data of Question 5.
Solution
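A sketch of the call, storing the estimates as `hats3` (a name we choose here) and noting that `'Nelder-Mead'` is in fact `optim()`'s default method:

```r
hats3 <- optim(c(mu0, sigsq0), ln, s1 = s1, s2 = s2, n = n, mult = -1,
               method = 'Nelder-Mead')$par
hats3
```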
## [1] -0.4113588 4.8537235
As a quick aside we’ll have a look at the gradient at our estimates,
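for example via

```r
ln_d1(hats3, s1, s2, n)
```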
## [1] -0.0005929934 -0.0008170016
which we see are still close to zero, but not as close as with the gradient-based optimisation methods. This is something to expect with the Nelder-Mead method because it doesn’t use gradients in its iterations, and doesn’t use gradient-based criteria to determine convergence, whereas the gradient-based methods do (amongst other criteria).
Find the maximum likelihood estimates for the wind speed data of Example 5.5 based on the Gamma(\(\alpha\), \(\beta\)) model of Question 7 using the Nelder-Mead method.
Solution
We’ll use the following call to `optim()`,
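for example (a sketch re-using `pars0`, `nldgamma()` and the data `y0` from Question 7)

```r
optim(pars0, nldgamma, y = y0, method = 'Nelder-Mead')$par
```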
## [1] 0.4017200 0.1179219
and see that we get similar estimates to Question 7.
Use simulated annealing with \(N = 1000\) iterations (i.e. `control = list(maxit = 1e3)`) to approximate the maximum likelihood estimates of \((\mu, \sigma^2)\) for the wind speed data and log-Normal model of Question 5.
Solution
```r
hats4 <- optim(c(mu0, sigsq0), ln, s1 = s1, s2 = s2, n = n, mult = -1,
               method = 'SANN', control = list(maxit = 1e3))
hats4
```
## $par
## [1] -0.394841 4.795629
## 
## $value
## [1] 55.71617
## 
## $counts
## function gradient 
##     1000       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
Find the maximum likelihood estimates for the wind speed data of Example 5.5 based on the Gamma(\(\alpha\), \(\beta\)) model of Question 7 using simulated annealing with \(N = 10^4\) iterations, and then evaluate the gradient w.r.t. (\(\alpha\), \(\beta\)) at the maximum likelihood estimates using your gradient function from Question 7(a).
Solution
We’ll use the following call to `optim()`,
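for example (a sketch re-using `pars0`, `nldgamma()` and `y0` from Question 7, and storing the estimates as `pars_sann`, a name we choose here)

```r
pars_sann <- optim(pars0, nldgamma, y = y0, method = 'SANN')$par
pars_sann
```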
## [1] 0.3995571 0.1175635
which recognises that \(N = 10^4\) iterations is `optim()`’s default for the `'SANN'` method. We get similar estimates to Questions 7 and 9. The following evaluates the gradient,
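for example via

```r
nldgamma_d1(pars_sann, y0)
```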
## [1] -0.3835148 0.2118473
which we see is near zero, but not nearly as close as for the estimates we’ve seen previously. This is to be expected with simulated annealing: we’re only going to get a final gradient incredibly close to zero by chance, or if we use lots of iterations and allow the temperature to decrease sufficiently. In practice, we’d probably not want to wait this long.