Chapter 5

Week 9 lecture 3

Challenges I

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\lambda\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \lambda) = n \log \lambda - \lambda \sum_{i = 1}^n y_i \] and hence that \[ \dfrac{\partial \log f(\mathbf{y} \mid \lambda)}{\partial \lambda} = \dfrac{n}{\lambda} - \sum_{i = 1}^n y_i \quad \text{and} \quad \dfrac{\partial^2 \log f(\mathbf{y} \mid \lambda)}{\partial \lambda^2} = -\dfrac{n}{\lambda^2}, \] write functions called nd1(lambda, y) and nd2(lambda, y) that evaluate the first and second derivatives of the negative log-likelihood, i.e. of \(-\log f(\mathbf{y} \mid \lambda)\).
Solution
nd1 <- function(lambda, y) {
  # Function to evaluate first derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  sum(y) - n / lambda
}

nd2 <- function(lambda, y) {
  # Function to evaluate second derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  n / lambda^2
}
  2. Find the first Newton step given a starting value of \(\lambda_0 = 1\) and the sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5. \]
Solution
y <- c(0.2, 0.5, 0.6, 1.0, 1.8, 2.5)
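# Newton step: lambda_1 - lambda_0 = -nd1(lambda_0, y) / nd2(lambda_0, y)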
-nd1(1, y) / nd2(1, y)

Week 10 lecture 1

Challenges I

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\lambda\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \lambda) = n \log \lambda - \lambda \sum_{i = 1}^n y_i, \] and that \(\lambda > 0\), write a function called nd0(lambda, y) that evaluates the negative log-likelihood, i.e. \(-\log f(\mathbf{y} \mid \lambda)\).
Solution
nd0 <- function(lambda, y) {
  # Function to evaluate exponential negative log-likelihood
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  -n * log(lambda) + lambda * sum(y)
}
  2. Use nlminb() with nd0(), nd1() and nd2() to find \(\hat \lambda\), the maximum likelihood estimate, given the independent sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5 \] and a starting value of \(\lambda_0 = 1\). [Note that you will need to ensure that nd2() returns a matrix, albeit a trivial \(1 \times 1\) one. Using as.matrix() is probably easiest.]
Solution
nd1 <- function(lambda, y) {
  # Function to evaluate first derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  sum(y) - n / lambda
}

nd2 <- function(lambda, y) {
  # Function to evaluate second derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a 1 x 1 matrix
  n <- length(y)
  matrix(n / lambda^2, 1, 1)
}

y <- c(.2, .5, .6, 1, 1.8, 2.5)
lambda_0 <- 1
nlminb(lambda_0, nd0, nd1, nd2, y = y)
  3. Experiment with different starting values \(\lambda_0\) to investigate the sensitivity of Newton’s method to starting values (on this occasion).
Solution
lambda0_seq <- 10^seq(-4, 4)
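# m.l.e. obtained from each starting value: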
sapply(lambda0_seq, function(x) nlminb(x, nd0, nd1, nd2, y = y)$par)

Challenges II

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\text{e}^\tau\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \tau) = n \tau - \text{e}^\tau \sum_{i = 1}^n y_i, \] use nlminb() by supplying the negative log-likelihood and its first and second derivatives w.r.t. \(\tau\) to find \(\hat \tau\), its maximum likelihood estimate, given the independent sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5 \] and a starting value of \(\tau_0 = 0\).
Solution
nd0_tau <- function(tau, y) {
  # function to evaluate Exp(exp(tau)) neg. log lik.
  # tau is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  - n * tau + exp(tau) * sum(y)
}

y0 <- c(.2, .5, .6, 1, 1.8, 2.5)
tau0 <- 0
nd0_tau(tau0, y0)

nd1_tau <- function(tau, y) {
  # Function to evaluate the 1st derivative w.r.t. tau
  # of the Exp(exp(tau)) neg. log-likelihood
  # tau is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  - n + exp(tau) * sum(y)
}

nd2_tau <- function(tau, y) {
  # Function to evaluate the 2nd derivative w.r.t. tau
  # of the Exp(exp(tau)) neg. log-likelihood
  # tau is a scalar
  # y is a vector
  # returns a 1 x 1 matrix
  as.matrix(exp(tau) * sum(y))
}

nd1_tau(tau0, y0)
nd2_tau(tau0, y0)

fit <- nlminb(tau0, nd0_tau, nd1_tau, nd2_tau, y = y0)
tau_hat <- fit$par
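# by invariance of m.l.e.s, the m.l.e. of lambda is exp(tau_hat)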
exp(tau_hat)

Week 10 lecture 2

Challenges I

  1. Consider the sample \(y_1, \ldots, y_n\) given by \[ 1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4 \] as \(n = 10\) independent realisations from the Normal(\(\mu\), \(\sigma^2\)) distribution with \(\mu\) unknown and \(\sigma^2 = 1\), i.e. with pdf \[ f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(y - \mu)^2/2\sigma^2}. \] Find \(\log f(\mathbf{y} \mid \mu = 1, \sigma^2 = 1) = \sum_{i = 1}^n \log f(y_i \mid \mu = 1, \sigma^2 = 1)\).
Solution
y <- c(1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4)
sum(dnorm(y, 1, 1, log = TRUE))
  2. Write a function in R, negloglik(mu, y), that evaluates \(-\log f(\mathbf{y} \mid \mu, \sigma^2 = 1)\) for mu \(= \mu\) and y \(= y\).
Solution
negloglik <- function(mu, y) {
  # Function to evaluate the Normal(mu, 1) negative log-likelihood
  # mu is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  n * log(2 * pi) / 2 + sum((y - mu)^2) / 2
}

negloglik(1, y)
  3. Find \(-\partial \log f(\mathbf{y} \mid \mu, \sigma^2 = 1) / \partial \mu\) and write a function in R, negloglik_d1(mu, y), that evaluates it for mu \(= \mu\) and y \(= y\).
Solution
negloglik_d1 <- function(mu, y) {
  # Function to evaluate derivative of 
  # negative log-likelihood w.r.t. mu
  # mu is a scalar
  # y is a vector
  # returns a scalar
  -sum(y - mu)
}

negloglik_d1(1, y)
  4. Use optim() in R to find \(\hat \mu\), the maximum likelihood estimate of \(\mu\), using the BFGS method. You should supply negloglik_d1(mu, y) to evaluate the first derivative w.r.t. \(\mu\), and use \(\mu_0 = 1\) as a starting value.
Solution
fit_bfgs <- optim(1, negloglik, negloglik_d1, y = y, method = 'BFGS')
mu_hat <- fit_bfgs$par
  5. Evaluate \(-\partial \log f(\mathbf{y} \mid \mu, \sigma^2 = 1) / \partial \mu\) at \(\hat \mu\) to verify that \(\hat \mu\) is the m.l.e.
Solution
negloglik_d1(mu_hat, y)
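# closed-form m.l.e. for comparison: the sample mean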
mean(y)

Challenges II

Consider the sample \(y_1, \ldots, y_n\) given by \[ 1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4 \] as \(n = 10\) independent realisations from the Normal(\(\mu\), \(\sigma^2\)) with \(\mu\) and \(\sigma^2\) unknown, i.e. with pdf \[ f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(y - \mu)^2/2\sigma^2}. \]

  1. Write a function in R, negloglik(pars, y), that evaluates \(-\log f(\mathbf{y} \mid \mu, \sigma^2)\) for pars[1] \(= \mu\), pars[2] \(= \sigma\) and y \(=y\).
Solution
negloglik <- function(pars, y) {
  # Function to evaluate the Normal(mu, sigma^2) negative log-likelihood
  # pars is a 2-vector, where
  # pars[1] is mu and
  # pars[2] is sigma
  # y is a vector
  # returns a scalar
  mu <- pars[1]
  sigma <- pars[2]
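  # return a large value for invalid (non-positive) sigma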
  if (sigma <= 0)
    return(1e6)
  n <- length(y)
  n * log(2 * pi) / 2 + n * log(sigma) + sum((y - mu)^2) / 2 / sigma^2
}

y <- c(1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4)

negloglik(c(1.5, 1.5), y)

-sum(dnorm(y, 1.5, 1.5, log = TRUE))
  2. Find \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\), i.e. the 2-vector comprising \(-\partial \log f(\mathbf{y} \mid \mu, \sigma) / \partial \mu\) and \(-\partial \log f(\mathbf{y} \mid \mu, \sigma) / \partial \sigma\), and then write a function in R that evaluates \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\).
Solution
negloglik_d1 <- function(pars, y) {
  # Function to evaluate derivative of 
  # negative log-likelihood w.r.t. mu and sigma
  # pars is a 2-vector, where
  # pars[1] is mu and
  # pars[2] is sigma
  # y is a vector
  # returns a 2-vector
  mu <- pars[1]
  sigma <- pars[2]
  n <- length(y)
  out <- numeric(2)
  res <- y - mu
  out[1] <- -sum(res) / sigma^2
  out[2] <- n / sigma - sum(res^2) / sigma^3
  out
}

negloglik_d1(c(1.5, 1.5), y)
  3. Use the BFGS method with optim() in R to find \(\hat \mu\) and \(\hat \sigma\), the maximum likelihood estimates of \(\mu\) and \(\sigma\). Use \(\mu_0 = 1.5\) and \(\sigma_0 = 1.5\) as starting values.
Solution
fit_bfgs <- optim(c(1.5, 1.5), negloglik, negloglik_d1, y = y, method = 'BFGS')
mles <- fit_bfgs$par
  4. Evaluate \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\) at \(\hat \mu\) and \(\hat \sigma\) to verify that the m.l.e.s have been found.
Solution
negloglik_d1(mles, y)
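# closed-form m.l.e.s for comparison: the sample mean and the (biased) standard deviation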
mean(y)
sqrt(var(y) * (length(y) - 1) / length(y))

Week 10 lecture 3

Challenges I

  1. Consider the alternative parameterisation of the Weibull pdf in terms of parameters \((\log \lambda, \log k)\) given by \[ f(y \mid \tilde \lambda, \tilde k) = \frac{\exp(\tilde k)}{\exp(\tilde \lambda)}\left(\frac{y}{\exp(\tilde \lambda)}\right)^{\exp(\tilde k)-1}e^{-(y/\exp(\tilde \lambda))^{\exp(\tilde k)}} \quad y > 0, \] where \(-\infty < \tilde \lambda, \tilde k < \infty\). Write a function in R, weib2_d0(pars, y, mult = 1), that evaluates the scaled log-likelihood of the model, i.e. \(m \log f(\mathbf{y} \mid \tilde \lambda, \tilde k)\), where pars is a 2-vector such that pars[1] \(= \tilde \lambda\) and pars[2] \(= \tilde k\), y is the data \(\mathbf{y}\) and mult \(= m\) is a multiplier as for weib_d0().
Solution
weib2_d0 <- function(pars, y, mult = 1) {
  # Function to evaluate re-parameterised Weibull log-likelihood
  # pars is a vector
  # pars[1] is $\tilde \lambda$
  # pars[2] is $\tilde k$
  # y can be scalar or vector
  # mult is a scalar defaulting to 1; so -1 returns neg. log likelihood
  # returns a scalar
  n <- length(y)
  epars <- exp(pars)
  out <- n * (pars[2] - epars[2] * pars[1]) + 
    (epars[2] - 1) * sum(log(y)) - sum((y / epars[1])^epars[2])
  mult * out
}
  2. Find the maximum likelihood estimates of \((\tilde \lambda, \tilde k)\) using the Nelder-Mead method, starting at \((\tilde \lambda_0, \tilde k_0) = \big(\log(1.6), \log(0.6)\big)\).
Solution
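# assuming y0 holds the wind speed data of Example 5.5, as used with weib_d0()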
fit_nm2 <- optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1)
fit_nm2$par
  3. Confirm that these are related to those of the original parameterisation through the logarithm.
Solution
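# assuming fit_nm holds the Nelder-Mead fit under the original parameterisation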
fit_nm$par
exp(fit_nm2$par)

Week 11 lecture 1

Challenges I

  • Recall the Weibull parameterisation in terms of \((\log \lambda, \log k)\) \[ f(y \mid \tilde \lambda, \tilde k) = \frac{\exp(\tilde k)}{\exp(\tilde \lambda)}\left(\frac{y}{\exp(\tilde \lambda)}\right)^{\exp(\tilde k)-1}e^{-(y/\exp(\tilde \lambda))^{\exp(\tilde k)}} \quad y > 0, \] where \(-\infty < \tilde \lambda, \tilde k < \infty\).
  1. Use simulated annealing to find the maximum likelihood estimates of \((\tilde \lambda, \tilde k)\) using a starting temperature of \(T = 10\) and \(N = 10^3\) iterations, i.e. control = list(temp = 10, maxit = 1e3), for the wind speed data of Example 5.5.
Solution
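A minimal sketch, assuming weib2_d0() is as defined above and that y0 holds the wind speed data of Example 5.5 (not reproduced here). optim()'s 'SANN' method performs simulated annealing, and mult = -1 turns the scaled log-likelihood into the negative log-likelihood to be minimised.
set.seed(1) # simulated annealing is stochastic, so fix a seed for reproducibility
fit_sann <- optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1,
  method = 'SANN', control = list(temp = 10, maxit = 1e3))
fit_sann$par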
  2. Repeat the above optimisation 20 times, and comment on how the resulting parameter estimates vary.
Solution
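One possible approach, under the same assumptions: wrap the fit in replicate() and inspect the spread of the estimates. Because simulated annealing is stochastic, the 20 runs typically return different estimates.
fits_sann <- replicate(20,
  optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1,
    method = 'SANN', control = list(temp = 10, maxit = 1e3))$par)
# range of the estimates of lambda-tilde and k-tilde over the 20 runs
apply(fits_sann, 1, range)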
  3. Change \(T\) to \(T = 0.1\) and then repeat Question 2.
Solution
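As in Question 2, under the same assumptions, but with the starting temperature lowered to \(T = 0.1\).
fits_sann2 <- replicate(20,
  optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1,
    method = 'SANN', control = list(temp = 0.1, maxit = 1e3))$par)
apply(fits_sann2, 1, range)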
  4. Comment on what you observe between Question 2 and Question 3.
Solution
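Broadly, one would expect the runs with \(T = 0.1\) to vary less between repetitions than those with \(T = 10\): with a lower starting temperature, uphill moves are accepted less often, so the search behaves more locally, whereas the higher temperature explores more widely and, for a fixed number of iterations, tends to produce more variable estimates.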

Challenges II

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\lambda\)) random variables, with log-likelihood \[ \log f(\mathbf{y} \mid \lambda) = n \log \lambda - \lambda \sum_{i = 1}^n y_i. \] Using nd1(lambda, y), which was previously written to evaluate the first derivative of the negative log-likelihood w.r.t. \(\lambda\), use uniroot() to find \(\hat \lambda\), the m.l.e. of \(\lambda\), which satisfies \(\partial \log f(\mathbf{y} \mid \lambda) / \partial \lambda = 0\). You may assume \(0.1 < \hat \lambda < 2\).
Solution
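A minimal sketch, assuming nd1() is as defined in Week 9 lecture 3. Since nd1() evaluates \(-\partial \log f(\mathbf{y} \mid \lambda) / \partial \lambda\), its root coincides with that of \(\partial \log f(\mathbf{y} \mid \lambda) / \partial \lambda\).
y <- c(0.2, 0.5, 0.6, 1.0, 1.8, 2.5)
uniroot(nd1, interval = c(0.1, 2), y = y)$root
# closed-form m.l.e. for comparison: n / sum(y)
length(y) / sum(y)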

Week 11 lecture 2

Challenges I

  1. Consider a random variable \(Y\) with pdf \[ f(y \mid \theta) = \dfrac{\Gamma(2\theta) [y(1 - y)]^{\theta - 1}}{[\Gamma(\theta)]^2} \quad 0 \leq y \leq 1 \] and for \(\theta > 0\). The log-likelihood for an independent sample, \(y_1, \ldots, y_n\), is then given by \[ \log f(\mathbf{y} \mid \theta) = n\big[\log\big(\Gamma(2\theta)\big) - 2\log\big(\Gamma(\theta)\big)\big] + (\theta - 1) \sum_{i = 1}^n \left\{\log y_i + \log(1 - y_i)\right\}. \] Write a function, negloglik(theta, y), to evaluate the negative of the log-likelihood above, and then for the data \[ 0.16, 0.35, 0.37, 0.46, 0.67, 0.69, 0.79, 0.8, 0.91, 0.92 \] find the maximum likelihood estimate \(\hat \theta\) using optimize(), assuming that \(1 \leq \hat \theta \leq 3\). [Recall that lgamma(x) evaluates \(\log\big(\Gamma(x)\big)\) in R if \(x =\) x.]
Solution
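One way to do this is to code the log-likelihood above with lgamma(), negate it, and pass it to optimize() over the interval \([1, 3]\).
negloglik <- function(theta, y) {
  # Function to evaluate the negative log-likelihood
  # theta is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  -n * (lgamma(2 * theta) - 2 * lgamma(theta)) -
    (theta - 1) * sum(log(y) + log(1 - y))
}

y <- c(0.16, 0.35, 0.37, 0.46, 0.67, 0.69, 0.79, 0.8, 0.91, 0.92)
optimize(negloglik, interval = c(1, 3), y = y)$minimum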