Chapter 5

Week 9 lecture 2

Challenges I

  1. Consider a random variable \(Y\) with pdf \[ f(y \mid \theta) = \dfrac{\Gamma(2\theta) [y(1 - y)]^{\theta - 1}}{[\Gamma(\theta)]^2}, \quad 0 \leq y \leq 1, \] for \(\theta > 0\). The log-likelihood for an independent sample, \(y_1, \ldots, y_n\), is then given by \[ \log f(\mathbf{y} \mid \theta) = n\big[\log\big(\Gamma(2\theta)\big) - 2\log\big(\Gamma(\theta)\big)\big] + (\theta - 1) \sum_{i = 1}^n \left\{\log y_i + \log(1 - y_i)\right\}. \] Write a function, negloglik(theta, y), to evaluate the negative of the log-likelihood above, and then for the data \[ 0.16, 0.35, 0.37, 0.46, 0.67, 0.69, 0.79, 0.8, 0.91, 0.92 \] find the maximum likelihood estimate \(\hat \theta\) using optimize(), assuming that \(1 \leq \hat \theta \leq 3\). [Recall that lgamma(x) evaluates \(\log\big(\Gamma(x)\big)\) in R.]
Solution
negloglik <- function(theta, y) {
  # Function to evaluate negative log-likelihood
  # theta is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  loglik <- n * (lgamma(2 * theta) - 2 * lgamma(theta))
  loglik <- loglik + (theta - 1) * sum(log(y) + log(1 - y))
  -loglik
}

y <- c(.16, .35, .37, .46, .67, .69, .79, .8, .91, .92)
negloglik(2, y)
optimize(negloglik, c(1, 3), y = y)
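
As a quick cross-check (not part of the original solution), note that this pdf is exactly the Beta(\(\theta, \theta\)) density, so R's dbeta() should agree with negloglik():

# Cross-check: f(y | theta) is the Beta(theta, theta) density
-sum(dbeta(y, 2, 2, log = TRUE))  # should match negloglik(2, y)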

Week 9 lecture 3

Challenges I

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\lambda\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \lambda) = n \log \lambda - \lambda \sum_{i = 1}^n y_i \] and then that \[ \dfrac{\partial \log f(\mathbf{y} \mid \lambda)}{\partial \lambda} = \dfrac{n}{\lambda} - \sum_{i = 1}^n y_i \quad \text{and} \quad \dfrac{\partial^2 \log f(\mathbf{y} \mid \lambda)}{\partial \lambda^2} = -\dfrac{n}{\lambda^2}, \] write functions called nd1(lambda, y) and nd2(lambda, y) that evaluate the first and second derivatives of the negative log-likelihood, i.e. of \(-\log f(\mathbf{y} \mid \lambda)\).
Solution
nd1 <- function(lambda, y) {
  # Function to evaluate first derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  sum(y) - n / lambda
}

nd2 <- function(lambda, y) {
  # Function to evaluate second derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  n / lambda^2
}
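
As a sanity check (not part of the original solution), the analytical derivative in nd1() can be compared against a central finite difference of the negative log-likelihood; y_check, nd0_check and h below are illustrative names and choices:

# Sanity check: compare nd1() against a central finite difference
y_check <- c(0.2, 0.5, 0.6, 1.0, 1.8, 2.5)
nd0_check <- function(lambda) -(length(y_check) * log(lambda) - lambda * sum(y_check))
h <- 1e-6
(nd0_check(1 + h) - nd0_check(1 - h)) / (2 * h)  # approx. 0.6
nd1(1, y_check)                                  # exactly sum(y) - n / lambda = 0.6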
  2. Find the first Newton step given a starting value of \(\lambda_0 = 1\) and the sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5. \]
Solution
y <- c(0.2, 0.5, 0.6, 1.0, 1.8, 2.5)
-nd1(1, y) / nd2(1, y)
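
Although the question asks only for the first step, a minimal sketch of the full Newton iteration simply repeats this update until the step is negligible; it should converge to the closed-form m.l.e. \(\hat \lambda = n / \sum_{i = 1}^n y_i\):

# Sketch: iterate Newton updates until the step size is negligible
lambda <- 1
repeat {
  step <- -nd1(lambda, y) / nd2(lambda, y)
  lambda <- lambda + step
  if (abs(step) < 1e-8) break
}
lambda               # converged estimate
length(y) / sum(y)   # closed-form m.l.e. n / sum(y) for comparison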

Week 10 lecture 1

Challenges I

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\lambda\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \lambda) = n \log \lambda - \lambda \sum_{i = 1}^n y_i, \] write a function called nd0(lambda, y) that evaluates the negative log-likelihood, i.e. \(-\log f(\mathbf{y} \mid \lambda)\).
Solution
nd0 <- function(lambda, y) {
  # Function to evaluate exponential negative log-likelihood
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  -n * log(lambda) + lambda * sum(y)
}
  2. Use nlminb() with nd0(), nd1() and nd2() to find \(\hat \lambda\), the maximum likelihood estimate, given the independent sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5 \] and a starting value of \(\lambda_0 = 1\). [Note that you will need to ensure that nd2() returns a matrix, albeit a trivial \(1 \times 1\) one. Using as.matrix() is probably easiest.]
Solution
nd1 <- function(lambda, y) {
  # Function to evaluate first derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  sum(y) - n / lambda
}

nd2 <- function(lambda, y) {
  # Function to evaluate second derivative of exponential
  # negative log-likelihood w.r.t. lambda
  # lambda is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  matrix(n / lambda^2, 1, 1)
}

y <- c(.2, .5, .6, 1, 1.8, 2.5)
lambda_0 <- 1
nlminb(lambda_0, nd0, nd1, nd2, y = y)
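
Since the exponential m.l.e. has the closed form \(\hat \lambda = n / \sum_{i = 1}^n y_i\), the result can be checked directly:

length(y) / sum(y)  # closed-form m.l.e., should match nlminb()'s $par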
  3. Confirm your result with optimize() and optim(..., method = 'Brent').
Solution
optimize(nd0, c(.5, 1.5), y = y)
optim(1, nd0, lower = .5, upper = 1.5, method = 'Brent', y = y)
  4. Experiment with different starting values \(\lambda_0\) in order to investigate the sensitivity of Newton’s method to starting values (on this occasion).
Solution
lambda0_seq <- 10^seq(-4, 4)
sapply(lambda0_seq, function(x) nlminb(x, nd0, nd1, nd2, y = y)$par)
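
Because this log-likelihood is strictly concave in \(\lambda\), the Newton-type iteration in nlminb() should recover essentially the same \(\hat \lambda\) across this wide range of starting values, though the most extreme ones make the early steps poor and rely on nlminb()'s step-length safeguards.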

Week 10 lecture 2

Challenges I

  1. Consider modelling the sample \(y_1, \ldots, y_n\) as independent Exp(\(\text{e}^\tau\)) random variables. Given that their log-likelihood is \[ \log f(\mathbf{y} \mid \tau) = n \tau - \text{e}^\tau \sum_{i = 1}^n y_i, \] use nlminb() by supplying the negative log-likelihood and its first and second derivatives w.r.t. \(\tau\) to find \(\hat \tau\), its maximum likelihood estimate, given the independent sample \[ 0.2, 0.5, 0.6, 1.0, 1.8, 2.5 \] and a starting value of \(\tau_0 = 1\).
Solution
nd0_tau <- function(tau, y) {
  # Function to evaluate Exp(exp(tau)) neg. log-likelihood
  # tau is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  - n * tau + exp(tau) * sum(y)
}

y0 <- c(.2, .5, .6, 1, 1.8, 2.5)
tau0 <- 1
nd0_tau(tau0, y0)

nd1_tau <- function(tau, y) {
  # Function to evaluate 1st deriv. w.r.t. tau
  # of Exp(exp(tau)) neg. log-likelihood
  # tau is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  - n + exp(tau) * sum(y)
}

nd2_tau <- function(tau, y) {
  # Function to evaluate 2nd deriv. w.r.t. tau
  # of Exp(exp(tau)) neg. log-likelihood
  # tau is a scalar
  # y is a vector
  # returns a scalar
  as.matrix(exp(tau) * sum(y))
}

nd1_tau(tau0, y0)
nd2_tau(tau0, y0)

fit <- nlminb(tau0, nd0_tau, nd1_tau, nd2_tau, y = y0)
tau_hat <- fit$par
exp(tau_hat)
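
By the invariance of maximum likelihood estimates, \(\exp(\hat \tau)\) should equal the closed-form m.l.e. \(\hat \lambda = n / \sum_{i = 1}^n y_i\) found in the previous lecture:

length(y0) / sum(y0)  # should equal exp(tau_hat)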
  2. Confirm the result with optimize().
Solution
optimize(nd0_tau, c(-1, 1), y = y0)

Week 10 lecture 3

Challenges I

  1. Consider the sample \(y_1, \ldots, y_n\) given by \[ 1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4 \] as \(n = 10\) independent realisations from the Normal(\(\mu\), \(\sigma^2\)) distribution with \(\mu\) unknown and \(\sigma^2 = 1\), i.e. with pdf \[ f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(y - \mu)^2/2\sigma^2}. \] Find \(\log f(\mathbf{y} \mid \mu = 1, \sigma^2 = 1) = \sum_{i = 1}^n \log f(y_i \mid \mu = 1, \sigma^2 = 1)\).
Solution
y <- c(1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4)
sum(dnorm(y, 1, 1, log = TRUE))
  2. Write a function in R, negloglik(mu, y), that evaluates \(-\log f(\mathbf{y} \mid \mu, \sigma^2 = 1)\) for mu \(= \mu\) and y \(=y\).
Solution
negloglik <- function(mu, y) {
  # Function to evaluate negative log-likelihood
  # mu is a scalar
  # y is a vector
  # returns a scalar
  n <- length(y)
  n * log(2 * pi) / 2 + sum((y - mu)^2) / 2
}

negloglik(1, y)
  3. Find \(-\partial \log f(\mathbf{y} \mid \mu, \sigma^2 = 1) / \partial \mu\) and write a function in R, negloglik_d1(mu, y), that evaluates this derivative for mu \(= \mu\) and y \(=y\).
Solution
negloglik_d1 <- function(mu, y) {
  # Function to evaluate derivative of 
  # negative log-likelihood w.r.t. mu
  # mu is a scalar
  # y is a vector
  # returns a scalar
  -sum(y - mu)
}

negloglik_d1(1, y)
  4. Use optim() in R to find \(\hat \mu\), the maximum likelihood estimate of \(\mu\), using the BFGS method. You should supply negloglik_d1(mu, y) to evaluate the first derivative w.r.t. \(\mu\), and use \(\mu_0 = 1\) as a starting value.
Solution
fit_bfgs <- optim(1, negloglik, negloglik_d1, y = y, method = 'BFGS')
mu_hat <- fit_bfgs$par
  5. Evaluate \(-\partial \log f(\mathbf{y} \mid \mu, \sigma^2 = 1) / \partial \mu\) at \(\hat \mu\) to verify that \(\hat \mu\) is the m.l.e.
Solution
negloglik_d1(mu_hat, y)
mean(y)

Week 11 lecture 1

Extra example

Consider the sample \(y_1, \ldots, y_n\) given by \[ 1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4 \] as \(n = 10\) independent realisations from the Normal(\(\mu\), \(\sigma^2\)) distribution with \(\mu\) and \(\sigma^2\) unknown, i.e. with pdf \[ f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(y - \mu)^2/2\sigma^2}. \]

  1. Write a function in R, negloglik(pars, y), that evaluates \(-\log f(\mathbf{y} \mid \mu, \sigma)\) for pars[1] \(= \mu\), pars[2] \(= \sigma\) and y \(=y\).
Solution
negloglik <- function(pars, y) {
  # Function to evaluate negative log-likelihood
  # pars is a 2-vector, where
  # pars[1] is mu and
  # pars[2] is sigma
  # y is a vector
  # returns a scalar
  mu <- pars[1]
  sigma <- pars[2]
  if (sigma <= 0)
    return(1e6)
  n <- length(y)
  n * log(2 * pi) / 2 + n * log(sigma) + sum((y - mu)^2) / 2 / sigma^2
}

y <- c(1.8, 2.1, 2.3, 2.4, 2.5, 2.7, 3.1, 4.3, 4.3, 4.4)

negloglik(c(1.5, 1.5), y)

-sum(dnorm(y, 1.5, 1.5, log = TRUE))
  2. Find \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\), i.e. the 2-vector comprising \(-\partial \log f(\mathbf{y} \mid \mu, \sigma) / \partial \mu\) and \(-\partial \log f(\mathbf{y} \mid \mu, \sigma) / \partial \sigma\), and then write a function in R that evaluates \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\).
Solution
negloglik_d1 <- function(pars, y) {
  # Function to evaluate derivative of 
  # negative log-likelihood w.r.t. mu and sigma
  # pars is a 2-vector, where
  # pars[1] is mu and
  # pars[2] is sigma
  # y is a vector
  # returns a 2-vector
  mu <- pars[1]
  sigma <- pars[2]
  n <- length(y)
  out <- numeric(2)
  res <- y - mu
  out[1] <- -sum(res) / sigma^2
  out[2] <- n / sigma - sum(res^2) / sigma^3
  out
}

negloglik_d1(c(1.5, 1.5), y)
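
As a sanity check (not part of the original example), the analytical gradient can be compared against central finite differences of negloglik(); h below is an illustrative step size:

# Sanity check: central finite differences of negloglik at (1.5, 1.5)
h <- 1e-6
c((negloglik(c(1.5 + h, 1.5), y) - negloglik(c(1.5 - h, 1.5), y)) / (2 * h),
  (negloglik(c(1.5, 1.5 + h), y) - negloglik(c(1.5, 1.5 - h), y)) / (2 * h))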
  3. Use the BFGS method with optim() in R to find \(\hat \mu\) and \(\hat \sigma\), the maximum likelihood estimates of \(\mu\) and \(\sigma\). Use \(\mu_0 = 1.5\) and \(\sigma_0 = 1.5\) as starting values.
Solution
fit_bfgs <- optim(c(1.5, 1.5), negloglik, negloglik_d1, y = y, method = 'BFGS')
mles <- fit_bfgs$par
  4. Use \(-\nabla \log f(\mathbf{y} \mid \mu, \sigma)\) evaluated at \(\hat \mu\) and \(\hat \sigma\) to verify that the m.l.e.s have been found.
Solution
negloglik_d1(mles, y)
mean(y)
sqrt(var(y) * (length(y) - 1) / length(y))

Challenges I

  1. Consider the alternative parameterisation of the Weibull pdf in terms of parameters \((\log \lambda, \log k)\) given by \[ f(y \mid \tilde \lambda, \tilde k) = \frac{\exp(\tilde k)}{\exp(\tilde \lambda)}\left(\frac{y}{\exp(\tilde \lambda)}\right)^{\exp(\tilde k)-1}e^{-(y/\exp(\tilde \lambda))^{\exp(\tilde k)}} \quad y > 0, \] where \(-\infty < \tilde \lambda, \tilde k < \infty\). Write a function in R, weib2_d0(pars, y, mult = 1), that evaluates the scaled log-likelihood of the model, i.e. \(m \log f(\mathbf{y} \mid \tilde \lambda, \tilde k)\), where pars is a 2-vector such that pars[1] \(= \tilde \lambda\) and pars[2] \(= \tilde k\), y is the data \(\mathbf{y}\) and mult \(= m\) is a multiplier as for weib_d0().
Solution
weib2_d0 <- function(pars, y, mult = 1) {
  # Function to evaluate re-parameterised Weibull log-likelihood
  # pars is a vector
  # pars[1] is $\tilde \lambda$
  # pars[2] is $\tilde k$
  # y can be scalar or vector
  # mult is a scalar defaulting to 1; so -1 returns neg. log likelihood
  # returns a scalar
  n <- length(y)
  epars <- exp(pars)
  out <- n * (pars[2] - epars[2] * pars[1]) + 
    (epars[2] - 1) * sum(log(y)) - sum((y / epars[1])^epars[2])
  mult * out
}
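
As a quick cross-check (not part of the original solution), this is R's dweibull() density with shape \(\exp(\tilde k)\) and scale \(\exp(\tilde \lambda)\). The wind speed data of Example 5.5, y0, is reproduced here (it also appears in the next lecture's solution) so that the check and the fits below run stand-alone:

y0 <- c(3.52, 1.95, 0.62, 0.02, 5.13, 0.02, 0.01, 0.34, 0.43, 15.5, 
        4.99, 6.01, 0.28, 1.83, 0.14, 0.97, 0.22, 0.02, 1.87, 0.13, 0.01, 
        4.81, 0.37, 8.61, 3.48, 1.81, 37.21, 1.85, 0.04, 2.32, 1.06)

# Cross-check: dweibull() with shape = exp(k~) and scale = exp(lambda~)
weib2_d0(log(c(1.6, 0.6)), y0)
sum(dweibull(y0, shape = 0.6, scale = 1.6, log = TRUE))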
  2. Find the maximum likelihood estimates of \((\tilde \lambda, \tilde k)\) using the Nelder-Mead method, starting at \((\tilde \lambda_0, \tilde k_0) = \big(\log(1.6), \log(0.6)\big)\).
Solution
fit_nm2 <- optim(log(c(1.6, 0.6)), weib2_d0, y = y0, mult = -1)
fit_nm2$par
  3. Confirm that these are related to those of the original parameterisation through the logarithm.
Solution
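The comparison below uses fit_nm, the Nelder-Mead fit under the original \((\lambda, k)\) parameterisation from the lecture, which is not defined in these notes; a minimal reconstruction is sketched here, assuming weib_d0() shares the interface of weib2_d0(), so that the comparison runs stand-alone:

weib_d0 <- function(pars, y, mult = 1) {
  # Original-parameterisation Weibull log-likelihood
  # pars[1] is lambda, pars[2] is k
  # mult is a scalar defaulting to 1, as in weib2_d0()
  n <- length(y)
  out <- n * (log(pars[2]) - pars[2] * log(pars[1])) + 
    (pars[2] - 1) * sum(log(y)) - sum((y / pars[1])^pars[2])
  mult * out
}

fit_nm <- optim(c(1.6, 0.6), weib_d0, y = y0, mult = -1)
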
fit_nm$par
exp(fit_nm2$par)

Week 11 lecture 2

Challenges I

  • Recall the Weibull parameterisation in terms of \((\log \lambda, \log k)\) \[ f(y \mid \tilde \lambda, \tilde k) = \frac{\exp(\tilde k)}{\exp(\tilde \lambda)}\left(\frac{y}{\exp(\tilde \lambda)}\right)^{\exp(\tilde k)-1}e^{-(y/\exp(\tilde \lambda))^{\exp(\tilde k)}} \quad y > 0, \] where \(-\infty < \tilde \lambda, \tilde k < \infty\).
  1. Use simulated annealing to find the maximum likelihood estimates of \((\tilde \lambda, \tilde k)\) using a starting temperature of \(T = 10\) and \(N = 10^3\) iterations, i.e. control = list(temp = 10, maxit = 1e3), for the wind speed data of Example 5.5.
Solution
weib2_d0 <- function(pars, y, mult = 1) {
  # Function to evaluate re-parameterised Weibull log-likelihood
  # pars is a vector
  # pars[1] is $\tilde \lambda$
  # pars[2] is $\tilde k$
  # y can be scalar or vector
  # mult is a scalar defaulting to 1; so -1 returns neg. log likelihood
  # returns a scalar
  n <- length(y)
  epars <- exp(pars)
  out <- n * (pars[2] - epars[2] * pars[1]) + 
    (epars[2] - 1) * sum(log(y)) - sum((y / epars[1])^epars[2])
  mult * out
}

y0 <- c(3.52, 1.95, 0.62, 0.02, 5.13, 0.02, 0.01, 0.34, 0.43, 15.5, 
        4.99, 6.01, 0.28, 1.83, 0.14, 0.97, 0.22, 0.02, 1.87, 0.13, 0.01, 
        4.81, 0.37, 8.61, 3.48, 1.81, 37.21, 1.85, 0.04, 2.32, 1.06)

optim(c(log(1.6), log(0.6)), weib2_d0, y = y0, mult = -1, method = 'SANN',
      control = list(maxit = 1e3, temp = 10))
  2. Repeat the above optimisation 20 times, and comment on how the resulting parameter estimates vary.
Solution
pars1 <- replicate(20,
                   optim(c(log(1.6), log(0.6)), weib2_d0, y = y0, mult = -1,
                         method = 'SANN',
                         control = list(maxit = 1e3, temp = 10))$par)

apply(pars1, 1, sd)
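
With the high starting temperature \(T = 10\) and only \(10^3\) iterations, the annealer is still accepting many uphill moves when it stops, so the 20 sets of estimates typically show noticeable run-to-run variation.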
  3. Change \(T\) to \(T = 0.1\) and then repeat Question 2.
Solution
pars2 <- replicate(20,
                   optim(c(log(1.6), log(0.6)), weib2_d0, y = y0, mult = -1, method = 'SANN',
                         control = list(maxit = 1e3, temp = 0.1))$par)

apply(pars2, 1, sd)
  4. Comment on what you observe between Question 2 and Question 3.
Solution
rbind(apply(pars1, 1, sd),
      apply(pars2, 1, sd))
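
Typically the standard deviations are smaller for \(T = 0.1\): the cooler chain accepts fewer uphill moves and so stays closer to the mode, at the price of a greater risk of getting trapped away from the global optimum from a poor starting value.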