Chapter 5

Week 9 lecture 3

Challenges I

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\lambda\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \lambda) = n \log \lambda - \lambda \sum_{i = 1}^n y_i \] and hence that \[ \dfrac{\partial \log f(\mathbf{y} \mid \lambda)}{\partial \lambda} = \dfrac{n}{\lambda} - \sum_{i = 1}^n y_i \quad \text{and} \quad \dfrac{\partial^2 \log f(\mathbf{y} \mid \lambda)}{\partial \lambda^2} = -\dfrac{n}{\lambda^2}, \] write functions called nd1(lambda, y) and nd2(lambda, y) that evaluate the first and second derivatives of the negative log-likelihood, i.e. of \(-\log f(\mathbf{y} \mid \lambda)\).
Solution
nd1 <- function(lambda, y) {
  # Function to evaluate first derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  sum(y) - n / lambda
}

nd2 <- function(lambda, y) {
  # Function to evaluate second derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  n / lambda^2
}
  2. Find the first Newton step given a starting value of \(\lambda_0 = 1\) and the sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5. \]
Solution
y <- c(0.2, 0.5, 0.6, 1.0, 1.8, 2.5)
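# Newton step: lambda_1 - lambda_0 = -nd1(lambda_0, y) / nd2(lambda_0, y)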
-nd1(1, y) / nd2(1, y)

Week 10 lecture 1

Challenges I

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\lambda\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \lambda) = n \log \lambda - \lambda \sum_{i = 1}^n y_i, \] and that \(\lambda > 0\), write a function called nd0(lambda, y) that evaluates the negative log-likelihood, i.e. \(-\log f(\mathbf{y} \mid \lambda)\).
Solution
nd0 <- function(lambda, y) {
  # Function to evaluate exponential negative log-likelihood
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  -n * log(lambda) + lambda * sum(y)
}
  2. Use nlminb() with nd0(), nd1() and nd2() to find \(\hat \lambda\), the maximum likelihood estimate, given the independent sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5 \] and a starting value of \(\lambda_0 = 1\). [Note that you will need to ensure that nd2() returns a matrix, albeit a trivial \(1 \times 1\) one. Using as.matrix() is probably easiest.]
Solution
nd1 <- function(lambda, y) {
  # Function to evaluate first derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  sum(y) - n / lambda
}

nd2 <- function(lambda, y) {
  # Function to evaluate second derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a 1 x 1 matrix
  n <- length(y)
  matrix(n / lambda^2, 1, 1)
}

y <- c(.2, .5, .6, 1, 1.8, 2.5)
lambda_0 <- 1
nlminb(lambda_0, nd0, nd1, nd2, y = y)
  3. Experiment with different starting values \(\lambda_0\) to investigate the sensitivity of Newton’s method to starting values (on this occasion).
Solution
lambda0_seq <- 10^seq(-4, 4)
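# m.l.e. obtained from each starting value: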
sapply(lambda0_seq, function(x) nlminb(x, nd0, nd1, nd2, y = y)$par)

Challenges II

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\text{e}^\tau\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \tau) = n \tau - \text{e}^\tau \sum_{i = 1}^n y_i, \] use nlminb() by supplying the negative log-likelihood and its first and second derivatives w.r.t. \(\tau\) to find \(\hat \tau\), its maximum likelihood estimate, given the independent sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5 \] and a starting value of \(\tau_0 = 0\).
Solution
nd0_tau <- function(tau, y) {
  # function to evaluate Exp(exp(tau)) neg. log lik.
  # tau is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  - n * tau + exp(tau) * sum(y)
}

y0 <- c(.2, .5, .6, 1, 1.8, 2.5)
tau0 <- 0
nd0_tau(tau0, y0)

nd1_tau <- function(tau, y) {
  # Function to evaluate the 1st derivative w.r.t. tau
  # of the Exp(exp(tau)) neg. log-likelihood
  # tau is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  - n + exp(tau) * sum(y)
}

nd2_tau <- function(tau, y) {
  # Function to evaluate the 2nd derivative w.r.t. tau
  # of the Exp(exp(tau)) neg. log-likelihood
  # tau is a scalar
  # y is a vector
  # returns a 1 x 1 matrix
  as.matrix(exp(tau) * sum(y))
}

nd1_tau(tau0, y0)
nd2_tau(tau0, y0)

fit <- nlminb(tau0, nd0_tau, nd1_tau, nd2_tau, y = y0)
tau_hat <- fit$par
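# by invariance of m.l.e.s, the m.l.e. of lambda is exp(tau_hat)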
exp(tau_hat)

Week 10 lecture 2

Challenges I

  1. Consider the sample \(y_1, \ldots, y_n\) given by \[ 1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4 \] as \(n = 10\) independent realisations from the Normal(\(\mu\), \(\sigma^2\)) distribution with \(\mu\) unknown and \(\sigma^2 = 1\), i.e. with pdf \[ f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(y - \mu)^2/2\sigma^2}. \] Find \(\log f(\mathbf{y} \mid \mu = 1, \sigma^2 = 1) = \sum_{i = 1}^n \log f(y_i \mid \mu = 1, \sigma^2 = 1)\).
Solution
y <- c(1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4)
sum(dnorm(y, 1, 1, log = TRUE))
  2. Write a function in R, negloglik(mu, y), that evaluates \(-\log f(\mathbf{y} \mid \mu, \sigma^2 = 1)\) for mu \(= \mu\) and y \(= y\).
Solution
negloglik <- function(mu, y) {
  # Function to evaluate the Normal(mu, 1) negative log-likelihood
  # mu is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  n * log(2 * pi) / 2 + sum((y - mu)^2) / 2
}

negloglik(1, y)
  3. Find \(-\partial \log f(\mathbf{y} \mid \mu, \sigma^2 = 1) / \partial \mu\) and write a function in R, negloglik_d1(mu, y), that evaluates it for mu \(= \mu\) and y \(= y\).
Solution
negloglik_d1 <- function(mu, y) {
  # Function to evaluate derivative of 
  # negative log-likelihood w.r.t. mu
  # mu is a scalar
  # y is a vector
  # returns a scalar
  -sum(y - mu)
}

negloglik_d1(1, y)
  4. Use optim() in R to find \(\hat \mu\), the maximum likelihood estimate of \(\mu\), using the BFGS method. You should supply negloglik_d1(mu, y) to evaluate the first derivative w.r.t. \(\mu\), and use \(\mu_0 = 1\) as a starting value.
Solution
fit_bfgs <- optim(1, negloglik, negloglik_d1, y = y, method = 'BFGS')
mu_hat <- fit_bfgs$par
  5. Evaluate \(-\partial \log f(\mathbf{y} \mid \mu, \sigma^2 = 1) / \partial \mu\) at \(\hat \mu\) to verify that \(\hat \mu\) is the m.l.e.
Solution
negloglik_d1(mu_hat, y)
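# closed-form m.l.e. for comparison: the sample mean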
mean(y)

Challenges II

Consider the sample \(y_1, \ldots, y_n\) given by \[ 1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4 \] as \(n = 10\) independent realisations from the Normal(\(\mu\), \(\sigma^2\)) with \(\mu\) and \(\sigma^2\) unknown, i.e. with pdf \[ f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(y - \mu)^2/2\sigma^2}. \]

  1. Write a function in R, negloglik(pars, y), that evaluates \(-\log f(\mathbf{y} \mid \mu, \sigma^2)\) for pars[1] \(= \mu\), pars[2] \(= \sigma\) and y \(=y\).
Solution
negloglik <- function(pars, y) {
  # Function to evaluate the Normal(mu, sigma^2) negative log-likelihood
  # pars is a 2-vector, where
  # pars[1] is mu and
  # pars[2] is sigma
  # y is a vector
  # returns a scalar
  mu <- pars[1]
  sigma <- pars[2]
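  # return a large value for invalid (non-positive) sigma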
  if (sigma <= 0)
    return(1e6)
  n <- length(y)
  n * log(2 * pi) / 2 + n * log(sigma) + sum((y - mu)^2) / 2 / sigma^2
}

y <- c(1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4)

negloglik(c(1.5, 1.5), y)

-sum(dnorm(y, 1.5, 1.5, log = TRUE))
  2. Find \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\), i.e. the 2-vector comprising \(-\partial \log f(\mathbf{y} \mid \mu, \sigma) / \partial \mu\) and \(-\partial \log f(\mathbf{y} \mid \mu, \sigma) / \partial \sigma\), and then write a function in R that evaluates \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\).
Solution
negloglik_d1 <- function(pars, y) {
  # Function to evaluate derivative of 
  # negative log-likelihood w.r.t. mu and sigma
  # pars is a 2-vector, where
  # pars[1] is mu and
  # pars[2] is sigma
  # y is a vector
  # returns a 2-vector
  mu <- pars[1]
  sigma <- pars[2]
  n <- length(y)
  out <- numeric(2)
  res <- y - mu
  out[1] <- -sum(res) / sigma^2
  out[2] <- n / sigma - sum(res^2) / sigma^3
  out
}

negloglik_d1(c(1.5, 1.5), y)
  3. Use the BFGS method with optim() in R to find \(\hat \mu\) and \(\hat \sigma\), the maximum likelihood estimates of \(\mu\) and \(\sigma\). Use \(\mu_0 = 1.5\) and \(\sigma_0 = 1.5\) as starting values.
Solution
fit_bfgs <- optim(c(1.5, 1.5), negloglik, negloglik_d1, y = y, method = 'BFGS')
mles <- fit_bfgs$par
  4. Evaluate \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\) at \(\hat \mu\) and \(\hat \sigma\) to verify that the m.l.e.s have been found.
Solution
negloglik_d1(mles, y)
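# closed-form m.l.e.s for comparison: the sample mean and the (biased) standard deviation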
mean(y)
sqrt(var(y) * (length(y) - 1) / length(y))

Week 10 lecture 3

Challenges I

  1. Consider the alternative parameterisation of the Weibull pdf in terms of parameters \((\log \lambda, \log k)\) given by \[ f(y \mid \tilde \lambda, \tilde k) = \frac{\exp(\tilde k)}{\exp(\tilde \lambda)}\left(\frac{y}{\exp(\tilde \lambda)}\right)^{\exp(\tilde k)-1}e^{-(y/\exp(\tilde \lambda))^{\exp(\tilde k)}} \quad y > 0, \] where \(-\infty < \tilde \lambda, \tilde k < \infty\). Write a function in R, weib2_d0(pars, y, mult = 1), that evaluates the scaled log-likelihood of the model, i.e. \(m \log f(\mathbf{y} \mid \tilde \lambda, \tilde k)\), where pars is a 2-vector such that pars[1] \(= \tilde \lambda\) and pars[2] \(= \tilde k\), y is the data \(\mathbf{y}\) and mult \(= m\) is a multiplier as for weib_d0().
Solution
weib2_d0 <- function(pars, y, mult = 1) {
  # Function to evaluate re-parameterised Weibull log-likelihood
  # pars is a vector
  # pars[1] is $\tilde \lambda$
  # pars[2] is $\tilde k$
  # y can be scalar or vector
  # mult is a scalar defaulting to 1; so -1 returns neg. log likelihood
  # returns a scalar
  n <- length(y)
  epars <- exp(pars)
  out <- n * (pars[2] - epars[2] * pars[1]) + 
    (epars[2] - 1) * sum(log(y)) - sum((y / epars[1])^epars[2])
  mult * out
}
  2. Find the maximum likelihood estimates of \((\tilde \lambda, \tilde k)\) using the Nelder-Mead method, starting at \((\tilde \lambda_0, \tilde k_0) = \big(\log(1.6), \log(0.6)\big)\).
Solution
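# assuming y0 holds the wind speed data of Example 5.5, as used with weib_d0()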
fit_nm2 <- optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1)
fit_nm2$par
  3. Confirm that these are related to those of the original parameterisation through the logarithm.
Solution
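# assuming fit_nm holds the Nelder-Mead fit under the original parameterisation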
fit_nm$par
exp(fit_nm2$par)

Week 11 lecture 1

Challenges I

  • Recall the Weibull parameterisation in terms of \((\log \lambda, \log k)\) \[ f(y \mid \tilde \lambda, \tilde k) = \frac{\exp(\tilde k)}{\exp(\tilde \lambda)}\left(\frac{y}{\exp(\tilde \lambda)}\right)^{\exp(\tilde k)-1}e^{-(y/\exp(\tilde \lambda))^{\exp(\tilde k)}} \quad y > 0, \] where \(-\infty < \tilde \lambda, \tilde k < \infty\).
  1. Use simulated annealing to find the maximum likelihood estimates of \((\tilde \lambda, \tilde k)\) using a starting temperature of \(T = 10\) and \(N = 10^3\) iterations, i.e. control = list(temp = 10, maxit = 1e3), for the wind speed data of Example 5.5.
Solution
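A minimal sketch, assuming weib2_d0() is as defined above and that y0 holds the wind speed data of Example 5.5 (not reproduced here). optim()'s 'SANN' method performs simulated annealing, and mult = -1 turns the scaled log-likelihood into the negative log-likelihood to be minimised.
set.seed(1) # simulated annealing is stochastic, so fix a seed for reproducibility
fit_sann <- optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1,
  method = 'SANN', control = list(temp = 10, maxit = 1e3))
fit_sann$par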
  2. Repeat the above optimisation 20 times, and comment on how the resulting parameter estimates vary.
Solution
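One possible approach, under the same assumptions: wrap the fit in replicate() and inspect the spread of the estimates. Because simulated annealing is stochastic, the 20 runs typically return different estimates.
fits_sann <- replicate(20,
  optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1,
    method = 'SANN', control = list(temp = 10, maxit = 1e3))$par)
# range of the estimates of lambda-tilde and k-tilde over the 20 runs
apply(fits_sann, 1, range)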
  3. Change \(T\) to \(T = 0.1\) and then repeat Question 2.
Solution
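As in Question 2, under the same assumptions, but with the starting temperature lowered to \(T = 0.1\).
fits_sann2 <- replicate(20,
  optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1,
    method = 'SANN', control = list(temp = 0.1, maxit = 1e3))$par)
apply(fits_sann2, 1, range)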
  4. Comment on what you observe between Question 2 and Question 3.
Solution
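Broadly, one would expect the runs with \(T = 0.1\) to vary less between repetitions than those with \(T = 10\): with a lower starting temperature, uphill moves are accepted less often, so the search behaves more locally, whereas the higher temperature explores more widely and, for a fixed number of iterations, tends to produce more variable estimates.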

Challenges II

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\lambda\)) random variables, with log-likelihood \[ \log f(\mathbf{y} \mid \lambda) = n \log \lambda - \lambda \sum_{i = 1}^n y_i. \] Using nd1(lambda, y), which was previously written to evaluate the first derivative of the negative log-likelihood w.r.t. \(\lambda\), use uniroot() to find \(\hat \lambda\), the m.l.e. of \(\lambda\), which satisfies \(\partial \log f(\mathbf{y} \mid \lambda) / \partial \lambda = 0\). You may assume \(0.1 < \hat \lambda < 2\).
Solution
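A minimal sketch, assuming nd1() is as defined in Week 9 lecture 3. Since nd1() evaluates \(-\partial \log f(\mathbf{y} \mid \lambda) / \partial \lambda\), its root coincides with that of \(\partial \log f(\mathbf{y} \mid \lambda) / \partial \lambda\).
y <- c(0.2, 0.5, 0.6, 1.0, 1.8, 2.5)
uniroot(nd1, interval = c(0.1, 2), y = y)$root
# closed-form m.l.e. for comparison: n / sum(y)
length(y) / sum(y)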

Week 11 lecture 2

Challenges I

  1. Consider a random variable \(Y\) with pdf \[ f(y \mid \theta) = \dfrac{\Gamma(2\theta) [y(1 - y)]^{\theta - 1}}{[\Gamma(\theta)]^2} \quad 0 \leq y \leq 1 \] and for \(\theta > 0\). The log-likelihood for an independent sample, \(y_1, \ldots, y_n\), is then given by \[ \log f(\mathbf{y} \mid \theta) = n\big[\log\big(\Gamma(2\theta)\big) - 2\log\big(\Gamma(\theta)\big)\big] + (\theta - 1) \sum_{i = 1}^n \left\{\log y_i + \log(1 - y_i)\right\}. \] Write a function, negloglik(theta, y), to evaluate the negative of the log-likelihood above, and then for the data \[ 0.16, 0.35, 0.37, 0.46, 0.67, 0.69, 0.79, 0.8, 0.91, 0.92 \] find the maximum likelihood estimate \(\hat \theta\) using optimize(), assuming that \(1 \leq \hat \theta \leq 3\). [Recall that lgamma(x) evaluates \(\log\big(\Gamma(x)\big)\) in R if \(x =\) x.]
Solution
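One way to do this is to code the log-likelihood above with lgamma(), negate it, and pass it to optimize() over the interval \([1, 3]\).
negloglik <- function(theta, y) {
  # Function to evaluate the negative log-likelihood
  # theta is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  -n * (lgamma(2 * theta) - 2 * lgamma(theta)) -
    (theta - 1) * sum(log(y) + log(1 - y))
}

y <- c(0.16, 0.35, 0.37, 0.46, 0.67, 0.69, 0.79, 0.8, 0.91, 0.92)
optimize(negloglik, interval = c(1, 3), y = y)$minimum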