4 Chapter 4 exercises

  1. For \(f(x) = \text{e}^{2x}\) approximate \(f'(x)\) by finite-differencing with \(\delta = 10^{-6}\). Use \(x = -1.0, -0.9, \ldots, 0.9, 1.0\) and find the difference between your approximations and the true derivatives.


    Solution

    The following calculates the finite-difference approximation to the derivative for \(x = -1.0, -0.9, \ldots, 0.9, 1.0\) and \(\delta = 10^{-6}\).

    x <- seq(-1, 1, by = .1)
    delta <- 1e-6
    fd <- (exp(2 * (x + delta)) - exp(2 * x)) / delta

    Then the following calculates the true derivative for the stated \(x\) values as true and calculates its difference from the finite-difference approximations.

    true <- 2 * exp(2 * x)
    fd - true
    ##  [1] 2.706593e-07 3.306135e-07 4.038067e-07 4.932252e-07 6.023919e-07
    ##  [6] 7.357217e-07 8.986189e-07 1.097651e-06 1.340685e-06 1.637477e-06
    ## [11] 2.000013e-06 2.442704e-06 2.983658e-06 3.644212e-06 4.450992e-06
    ## [16] 5.436657e-06 6.640438e-06 8.110137e-06 9.906360e-06 1.210034e-05
    ## [21] 1.477673e-05

    Looking at the range of the differences

    range(fd - true)
    ## [1] 2.706593e-07 1.477673e-05

    we see that finite-differencing overestimates the true derivative for all of the \(x\) values, but only by a small amount, i.e. by less than \(2 \times 10^{-5}\).
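
    In fact the sign and size of these differences match the leading error term of the forward difference, since \([f(x + \delta) - f(x)]/\delta \simeq f'(x) + (\delta/2) f''(x)\) and here \(f''(x) = 4\text{e}^{2x} > 0\). As a quick sketch (reusing x and delta from above), subtracting this term leaves only a much smaller residual.

    err_pred <- .5 * delta * 4 * exp(2 * x) # leading error term (delta / 2) * f''(x)
    range(fd - true - err_pred)             # residual is far smaller than fd - true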


  2. The normal pdf, denoted \(\phi(y; \mu, \sigma^2)\), where \(\mu\) is its mean and \(\sigma^2\) is its variance, is defined as \[ \phi(y; \mu, \sigma^2) = \dfrac{\text{d} \Phi(y; \mu, \sigma^2)}{\text{d} y}\] where \(\Phi(y; \mu, \sigma^2)\) denotes its corresponding cdf. Confirm this result by finite-differencing for \(y = -2, -1, 0, 1, 2\), \(\mu = 0\) and \(\sigma^2 = 1\) using pnorm() to evaluate \(\Phi(y; \mu, \sigma^2)\).


    Solution

    We’ll start by defining the \(y\) values

    y <- seq(-2, 2)

    Then we’ll evaluate \(\Phi(y; \mu, \sigma^2)\) and \(\Phi(y + \delta; \mu, \sigma^2)\) for these \(y\) values, choosing \(\delta = 10^{-6}\).

    Fy <- pnorm(y)
    delta <- 1e-6
    Fy2 <- pnorm(y + delta)

    The finite-difference approximation is then given by \([\Phi(y + \delta; \mu, \sigma^2) - \Phi(y; \mu, \sigma^2)] / \delta\), which we can calculate in R with the following.

    fy_fd <- (Fy2 - Fy) / delta
    fy_fd
    ## [1] 0.05399102 0.24197085 0.39894228 0.24197060 0.05399091

    We can then compare this to R’s dnorm() function

    fy <- dnorm(y)
    all.equal(fy, fy_fd)
    ## [1] "Mean relative difference: 3.53179e-07"

    and see that the two are approximately equal. (We shouldn’t expect exact equality because pnorm() is only an approximation to \(\Phi(y; \mu, \sigma^2)\), since the normal distribution’s cdf doesn’t have closed form, and because finite-differencing is only an approximation.)
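
    As a small extension (not asked for in the question), a central difference, \([\Phi(y + \delta; \mu, \sigma^2) - \Phi(y - \delta; \mu, \sigma^2)] / (2\delta)\), usually has a smaller error than the forward difference used above, because its leading error term is \(O(\delta^2)\) rather than \(O(\delta)\); a sketch, reusing y and delta from above:

    fy_cd <- (pnorm(y + delta) - pnorm(y - delta)) / (2 * delta)
    all.equal(fy, fy_cd)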


  3. The Rosenbrock function (sometimes called the banana function) is given by \[f(\mathbf{x})=(a - x_1)^2 + b (x_2 - x_1^2)^2,\] where \(\mathbf{x} = (x_1, x_2)^{\text{T}}\).

    1. Write a function, rosenbrock(x, a, b), to evaluate the Rosenbrock function in R for vector \(\mathbf{x} =\) x and scalars \(a =\) a and \(b =\) b, with default values a = 1 and b = 100, and evaluate the function at \(\mathbf{x}_0 = (1, 2)^{\text{T}}\) with the default values of \(a\) and \(b\).


      Solution
      rosenbrock <- function(x, a = 1, b = 100) {
        # function to evaluate Rosenbrock's banana function
        # x is a 2-vector; a and b are scalars
        # returns a scalar
        (a - x[1])^2 + b * (x[2] - x[1]^2)^2
      }
      x0 <- c(1, 2)
      rosenbrock(x0)
      ## [1] 100

      It’s also useful to plot the function, so that we know what we’re dealing with. The following gives a contour plot and coloured surface plot.

      par(mfrow = 1:2)
      x1 <- seq(-2, 2, by = .1)
      x2 <- seq(-1, 1.5, by = .1)
      x12_grid <- expand.grid(x1 = x1, x2 = x2)
      f12 <- matrix(apply(x12_grid, 1, rosenbrock), length(x1))
      contour(x1, x2, f12, nlevels = 20)
      x1_mat <- matrix(x12_grid$x1, length(x1))
      x2_mat <- matrix(x12_grid$x2, length(x1))
      plot3D::surf3D(x1_mat, x2_mat, f12, colvar = f12, colkey = TRUE, 
        box = TRUE, bty = "b", phi = 20, theta = 15)

      Figure 4.1: Rosenbrock’s function as a contour plot (left) and surface plot (right).

      As we can see, Rosenbrock’s function has a long, curved valley containing its minimum, which makes it a challenging test function for optimisation methods.


    2. Find \(\partial f(\mathbf{x}) / \partial x_1\) and \(\partial f(\mathbf{x}) / \partial x_2\) and then write a function in R, rosenbrock_d1(x, a, b), that returns the vector \((\partial f(\mathbf{x}) / \partial x_1, \partial f(\mathbf{x}) / \partial x_2)^{\text{T}}\) and has the same default values as rosenbrock(), and then evaluate this gradient operator for \(\mathbf{x}_0 = (1, 2)^{\text{T}}\) and the default values of \(a\) and \(b\).


      Solution

      The partial derivatives are \[\dfrac{\partial f(\mathbf{x})}{\partial x_1} = -2(a - x_1) - 4bx_1(x_2 - x_1^2) \text{ and } \dfrac{\partial f(\mathbf{x})}{\partial x_2} = 2b(x_2 - x_1^2),\] which can be evaluated in R with

      rosenbrock_d1 <- function(x, a = 1, b = 100) {
        # function to evaluate first partial derivatives
        # of Rosenbrock's banana function
        # x is a 2-vector; a and b are scalars
        # returns a 2-vector
        d_x2 <- 2 * b * (x[2] - x[1]^2)
        c(-2 * (a - x[1]) - 2 * x[1] * d_x2, d_x2)
      }

      and for \(\mathbf{x}_0\) and the default values of \(a\) and \(b\) we get

      rosenbrock_d1(x0)
      ## [1] -400  200

    3. Find \(\partial^2 f(\mathbf{x}) / \partial x_1^2\), \(\partial^2 f(\mathbf{x}) / \partial x_1 \partial x_2\) and \(\partial^2 f(\mathbf{x}) / \partial x_2^2\) and then write a function in R, rosenbrock_d2(x, a, b), that returns the \(2 \times 2\) Hessian matrix of \(f(\mathbf{x})\) and has the same default values as rosenbrock(), and then evaluate this Hessian matrix for \(\mathbf{x}_0 = (1, 2)^{\text{T}}\) and the default values of \(a\) and \(b\).


      Solution

      The second partial derivatives are \[\dfrac{\partial^2 f(\mathbf{x})}{\partial x_1^2} = 2 - 4b(x_2 - 3x_1^2),~\dfrac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} = - 4bx_1 \text{ and } \dfrac{\partial^2 f(\mathbf{x})}{\partial x_2^2} = 2b,\] which can be evaluated in R with

      rosenbrock_d2 <- function(x, a = 1, b = 100) {
        # function to evaluate second partial derivatives
        # of Rosenbrock's banana function
        # x is a 2-vector; a and b are scalars
        # returns a 2 x 2 matrix
        d_x1x1 <- 2 - 4 * b * (x[2] - 3 * x[1]^2)
        d_x1x2 <- -4 * b * x[1]
        d_x2x2 <- 2 * b
        matrix(c(d_x1x1, d_x1x2, d_x1x2, d_x2x2), 2, 2)
      }

      and for \(\mathbf{x}_0\) and the default values of \(a\) and \(b\) we get

      rosenbrock_d2(x0)
      ##      [,1] [,2]
      ## [1,]  402 -400
      ## [2,] -400  200

    4. Use rosenbrock_d1() to confirm that \(\tilde{\mathbf{x}} = (1, 1)^{\text{T}}\) is a minimum of \(f(\mathbf{x})\), i.e. that gradient operator is approximately a vector of zeros at \(\tilde{\mathbf{x}} = (1, 1)^{\text{T}}\).


      Solution

      We’ll store \(\tilde{\mathbf{x}}\) as x_tilde,

      x_tilde <- c(1, 1)

      and then evaluating the first derivatives

      rosenbrock_d1(x_tilde)
      ## [1] 0 0

      we see that these are zero. To confirm that the Hessian matrix is also positive definite, we can check that all its eigenvalues are positive

      H <- rosenbrock_d2(x_tilde) # Hessian matrix
      lambda <- eigen(H, only.values = TRUE, symmetric = TRUE)$values
      all(lambda > 0)
      ## [1] TRUE

      which they are, and so \(\tilde{\mathbf{x}}\) is a minimum (although we don’t know whether it’s a local or global minimum from these two calculations).
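
      As an extra check (not required by the question), we could hand rosenbrock() and rosenbrock_d1() to a general-purpose optimiser and see whether it converges to \(\tilde{\mathbf{x}}\) from a different starting point; the following is just a sketch using base R's optim() with the BFGS method.

      opt <- optim(c(-1.5, 1), rosenbrock, rosenbrock_d1, method = "BFGS")
      opt$par         # should be approximately c(1, 1)
      opt$convergence # 0 indicates successful convergence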


  4. Recall the \(MVN_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) log-likelihood, e.g. from Example 3.2.

    1. Find \(\partial \log f(\mathbf{y} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) / \partial \boldsymbol{\mu}\) analytically and evaluate this for \(\mathbf{y}\), \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) as given in Example 3.2.


      Solution

      \[\dfrac{\partial \log f(\mathbf{y} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})}{\partial \boldsymbol{\mu}} = \boldsymbol{\Sigma}^{-1} ({\bf y} - {\boldsymbol \mu}).\]

      Evaluating this for \(\mathbf{y}\), \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) as given can be done with the following code

      y <- c(.7, 1.3, 2.6)
      mu <- 1:3
      Sigma <- matrix(c(4, 2, 1, 2, 3, 2, 1, 2, 2), 3, 3)
      (d_mu <- solve(Sigma, y - mu))
      ## [1]  0.08 -0.38  0.14

      so that \[\dfrac{\partial \log f(\mathbf{y} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})}{\partial \boldsymbol{\mu}} \simeq \begin{pmatrix}0.08 \\-0.38 \\0.14 \\\end{pmatrix}.\]


    2. Confirm the above result by finite-differencing.


      Solution

      We’ll use function fd() again from Example 4.4 of the notes.

      fd <- function(x, f, delta = 1e-6, ...) {
        # Function to evaluate a derivative by finite-differencing
        # x is a p-vector
        # f is the function for which the derivative is being calculated
        # delta is the finite-differencing step, which defaults to 10^{-6}
        # returns a vector of the same length as x
        f0 <- f(x, ...)
        p <- length(x)
        f1 <- numeric(p)
        for (i in 1:p) {
          x1 <- x
          x1[i] <- x[i] + delta
          f1[i] <- f(x1, ...)
        }
        (f1 - f0) / delta
      }

      We’ll also need function dmvn3() from Example 3.13.

      dmvn3 <- function(y, mu, Sigma, log = TRUE) {
        # Function to evaluate multivariate Normal pdf by solving
        # a system of linear equations via Cholesky decomposition
        # y and mu are vectors
        # Sigma is a square matrix
        # log is a logical
        # Returns scalar, on log scale, if log == TRUE.
        p <- length(y)
        res <- y - mu
        L <- t(chol(Sigma))
        out <- - sum(log(diag(L))) - 0.5 * p * log(2 * pi) -
          0.5 * sum(forwardsolve(L, res)^2)
        if (!log) 
          out <- exp(out)
        out
      }

      Then the following approximates \(\partial \log f(\mathbf{y} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) / \partial \boldsymbol{\mu}\) by finite-differencing

      d_mu_fd <- fd(mu, dmvn3, y = y, Sigma = Sigma)

      Note above that we’ve given y = y and Sigma = Sigma, then the first free argument to dmvn3() is mu, and hence fd() takes this as x. The finite-differencing approximation is

      d_mu_fd
      ## [1]  0.0799998 -0.3800007  0.1399992

      and is the same as the analytical result

      all.equal(d_mu, d_mu_fd)
      ## [1] "Mean relative difference: 2.833149e-06"

      once we allow for error in the finite-difference approximation.
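
      Alternatively, if the numDeriv package is available (an assumption, as it isn't used elsewhere in these solutions), its grad() function uses Richardson extrapolation and would typically get even closer to the analytical result; it takes extra arguments to dmvn3() in the same way as fd() above.

      numDeriv::grad(dmvn3, mu, y = y, Sigma = Sigma)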


  5. The Gamma(\(\alpha\), \(\beta\)) pdf is given by \[\begin{equation} f(y \mid \alpha, \beta) = \dfrac{\beta^\alpha}{\Gamma(\alpha)} y^{\alpha - 1} \exp(-\beta y) \quad {\text{ for }}y>0, \tag{4.1} \end{equation}\] with \(\alpha,\beta > 0\). The gamma distribution’s cdf, however, has no closed form. Approximate its cdf \(F(y \mid \alpha, \beta)\) for \(y = 1.5\), \(\alpha = 1\) and \(\beta = 2\) using the midpoint rule with \(N = 14\) and compare your result to R’s pgamma(y, alpha, beta) function.


    Solution

    We can use dgamma(y, alpha, beta) to evaluate the Gamma(\(\alpha\), \(\beta\)) pdf (although it's fine to write a function in R to evaluate \(f(y \mid \alpha, \beta)\) explicitly). Then we want to specify equally spaced integration nodes over \([0, 1.5]\): taking spacing \(h = 0.1\) gives midpoints \(0.05, 0.15, \ldots, 1.35, 1.45\).

    The following then approximates the integral as I_midpoint.

    nodes <- seq(.05, 1.45, by = .1)
    h <- .1
    I_midpoint <- h * sum(dgamma(nodes, 1, 2))
    I_midpoint
    ## [1] 0.9486311

    Using R’s pgamma() function we get

    I_pgamma <- pgamma(1.5, 1, 2)
    I_pgamma
    ## [1] 0.9502129

    Assuming R’s approximation to be the more accurate (which, in MTH3045, we'll always assume is the case for R's functions for evaluating cdfs), we'll compare the relative absolute error of the midpoint approximation against pgamma(), which we'll call rel_err

    rel_err <- abs((I_pgamma - I_midpoint) / I_pgamma)
    rel_err
    ## [1] 0.001664724

    which we see is about 0.17%.
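
    The midpoint rule's error is \(O(h^2)\), so halving the spacing should roughly quarter this error; a quick sketch with \(h = 0.05\), i.e. twice as many midpoints:

    nodes2 <- seq(.025, 1.475, by = .05)
    I_midpoint2 <- .05 * sum(dgamma(nodes2, 1, 2))
    abs((I_pgamma - I_midpoint2) / I_pgamma) # roughly a quarter of rel_err above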


  6. Consider integrating a function \(f(x)\) over an arbitrary interval \([a, b]\).

    1. Write a function to integrate \(f(x)\) over \([a, b]\) using the midpoint rule with \(N\) midpoints.


      Solution

      The following function, integrate_midpoint(), integrates a function over \([a, b]\) using the midpoint rule with \(N\) midpoints.

      integrate_midpoint <- function(f, a, b, N) {
        # Function to approximate integral of f by midpoint rule over [a, b]
        # f is a function
        # a and b are scalars
        # N is integer number of midpoints
        # returns scalar
        ends <- seq(a, b, l = N + 1)
        h <- ends[2] - ends[1]
        mids <- ends[-1] - .5 * h
        h * sum(f(mids))
      }
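
      Before using it, we can sanity-check integrate_midpoint() on an integral with a known answer, for example \(\int_0^1 \text{e}^x \text{d}x = \text{e} - 1\); a quick sketch:

      integrate_midpoint(exp, 0, 1, 100)
      exp(1) - 1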

    2. Then use this to approximate \(\int_0^1 f(x) \text{d} x\), where \[f(x) = 1 + 2 \cos(2 x) + 2 \sin(2 x)\] using the midpoint rule with \(N = 8\).


      Solution

      We’ll first create function \(f(x)\), which we’ll call f,

      f <- function(x) 1 + 2 * cos(2 * x) + 2 * sin(2 * x)

      and then we’ll use integrate_midpoint() to approximate its integral over \([0, 1]\) with \(N = 8\).

      midpoint <- integrate_midpoint(f, 0, 1, 8)
      midpoint
      ## [1] 3.331511

    3. Compare your integral approximation to the exact result in terms of its relative absolute error.


      Solution

      As \[\begin{align*} \int_0^1 1 + 2 \cos(2 x) + 2 \sin(2 x)\text{d}x &= \left[x + \sin(2 x) - \cos(2 x)\right]_0^1 \\ &= \left[1 + \sin(2) - \cos(2)\right] - \left[\sin(0) - \cos(0) \right]\\ &= 2 + \sin(2) - \cos(2),\end{align*}\]

      then the true integral is \(2 + \sin(2) - \cos(2) \simeq 3.3254\), which we’ll store as true. The relative absolute error, rel_err, is then

      true <- 2 + sin(2) - cos(2)
      rel_err <- abs((true - midpoint) / true)
      rel_err
      ## [1] 0.001824388

      and on this occasion we see a relative absolute error of approximately 0.18%, for just \(N = 8\).


  7. The composite trapezium rule for integration is given by \[\int_a^b f(x) \text{d}x \simeq \dfrac{h}{2}\left[f(x_1^*) + f(x_N^*) + 2\sum_{i = 2}^{N - 1} f(x_i^*)\right],\] for \(N\) integration nodes \(x_i^* = a + (i - 1)h\) where \(h = (b - a) / (N - 1)\).

    1. Write a function, integrate_trapezium(f, a, b, N), to implement the trapezium rule for an integral \(\int_a^b f(x)\text{d}x\) with \(N\) integration nodes.


      Solution
      integrate_trapezium <- function(f, a, b, N) {
        # Function to approximate integral of f by trapezium rule over [a, b]
        # f is a function
        # a and b are scalars
        # N is integer number of integration nodes
        # returns scalar
        nodes <- seq(a, b, l = N)
        h <- (b - a) / (N - 1)
        int <- 2 * sum(f(nodes[2:(N - 1)]))
        int <- int + sum(f(nodes[c(1, N)]))
        .5 * h * int
      }
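
      As with integrate_midpoint(), a quick sanity check on an integral with a known answer, such as \(\int_0^1 \text{e}^x \text{d}x = \text{e} - 1\), gives some reassurance before we use it.

      integrate_trapezium(exp, 0, 1, 100)
      exp(1) - 1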

    2. Then use integrate_trapezium() to approximate \(F(y \mid \alpha, \beta) = \int_0^y f(z \mid \alpha, \beta) \text{d} z\) using \(N = 14\), where \(f(y \mid \alpha, \beta)\) is the gamma pdf as given in Equation (4.1).


      Solution

      We’ll start with a function to evaluate the Gamma(1, 2) pdf, dgamma2(y).

      dgamma2 <- function(y) dgamma(y, 1, 2)

      Then the trapezium rule approximation to \(F(1.5 \mid 1, 2)\) is given below as I_trapezium.

      I_trapezium <- integrate_trapezium(dgamma2, 0, 1.5, 14)
      I_trapezium
      ## [1] 0.9544261

    3. Compare your integral approximation by the trapezium rule to that of the midpoint rule by using pgamma() as the benchmark of ‘the truth’.


      Solution
      rel_err <- c(midpoint = abs((I_pgamma - I_midpoint) / I_pgamma),
                   trapezium = abs((I_pgamma - I_trapezium) / I_pgamma))
      rel_err
      ##    midpoint   trapezium 
      ## 0.001664724 0.004433936

      We see that the relative absolute error of the trapezium rule is 0.44% and so is greater than that of the midpoint rule, given the same number of integration nodes.


  8. Repeat Question 5 using Simpson’s rule and \(N = 14\), comparing your approximation to the actual integral and your midpoint rule approximation.


    Solution

    For the composite Simpson's rule with \(N\) subintervals of width \(h = (b - a)/N\), we need the subinterval midpoints \(x_{1i} = a + h(2i - 1)/2\), \(i = 1, \ldots, N\), and the interior nodes \(x_{2i} = a + ih\), \(i = 1, \ldots, N - 1\), in addition to the endpoints \(a\) and \(b\).

    a <- 0
    b <- 1.5
    N <- 140
    h <- (b - a) / N
    nodes1 <- a + h * (2*c(1:N) - 1) / 2
    nodes2 <- a + h * c(1:(N - 1))
    I_simpson <- dgamma2(a) + dgamma2(b)
    I_simpson <- I_simpson + 4 * sum(dgamma2(nodes1)) + 2 * sum(dgamma2(nodes2))
    I_simpson <- h * I_simpson / 6
    I_simpson
    ## [1] 0.9502129
    rel_err <- abs((c(I_pgamma, I_midpoint) - I_simpson) / c(I_pgamma, I_midpoint))
    rel_err
    ## [1] 7.321086e-11 1.667500e-03

  9. Repeat Question 6 using Simpson’s rule and \(N = 8\), comparing your approximation to the actual integral and your midpoint rule approximation.


    Solution
    integrate_simpson <- function(f, a, b, N) {
      # Function to approximate integral of f by Simpson's rule over [a, b]
      # f is a function
      # a and b are scalars
      # N is integer number of integration nodes
      # returns scalar
      h <- (b - a) / N
      nodes1 <- a + h * (2 * c(1:N) - 1) / 2
      nodes2 <- a + h * c(1:(N - 1))
      h * (f(a) + 4 * sum(f(nodes1)) + 2 * sum(f(nodes2)) + f(b)) / 6
    }
    simpson <- integrate_simpson(f, 0, 1, 8)
    simpson
    ## [1] 3.325447
    rel_err <- abs((c(true, midpoint) - simpson) / c(true, midpoint))
    rel_err
    ## [1] 9.502376e-07 1.820118e-03

  10. Simpson’s composite 3/8 rule approximates an integral as \[\int_a^b f(x) \text{d}x \simeq \dfrac{3h}{8} \left(f(a) + 3 \sum_{i=1}^{N/3} f(x_{3i - 2}^*) + 3 \sum_{i=1}^{N/3} f(x_{3i-1}^*) + 2\sum_{i = 1}^{N/3 - 1} f(x_{3i}^*) + f(b)\right),\] for integration nodes \(x_i^* = a + ih\), \(i = 1, \ldots, N - 1\), with \(h = (b - a) / N\) and where \(N\) must be a multiple of three. Approximate \(I = \int_0^1 \text{e}^x \text{d}x\) using Simpson’s 3/8 rule with \(N = 9\) nodes and find the relative absolute error of your approximation.


    Solution
    f <- function(x) exp(x)
    true <- exp(1) - 1
    N <- 9
    a <- 0
    b <- 1
    h <- (b - a) / N
    nodes <- a + h * (1:(N - 1))
    id2 <- 3 * 1:((N / 3) - 1) # indices of nodes weighted by two
    id3 <- c(3 * (1:(N / 3)) - 1, 3 * (1:(N / 3)) - 2) # indices of nodes weighted by three
    simpson38 <- f(a) + 3 * sum(f(nodes[id3])) + 2 * sum(f(nodes[id2])) + f(b)
    (simpson38 <- 3 * h * simpson38 / 8)
    ## [1] 1.718285
    rel_err <- abs((true - simpson38) / true)
    rel_err
    ## [1] 1.899613e-06
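
    The calculation above is specific to \(N = 9\); a reusable version in the style of integrate_midpoint() and integrate_trapezium() might look like the following sketch (the function name and the check on N are our own additions, not part of the question).

    integrate_simpson38 <- function(f, a, b, N) {
      # Function to approximate integral of f by Simpson's 3/8 rule over [a, b]
      # f is a function
      # a and b are scalars
      # N is an integer number of subintervals and must be a multiple of three
      # returns scalar
      if (N %% 3 != 0) stop("N must be a multiple of three")
      h <- (b - a) / N
      nodes <- a + h * (1:(N - 1))
      id2 <- if (N > 3) 3 * (1:(N / 3 - 1)) else integer(0) # nodes weighted by two
      id3 <- setdiff(1:(N - 1), id2)                        # nodes weighted by three
      3 * h * (f(a) + 3 * sum(f(nodes[id3])) + 2 * sum(f(nodes[id2])) + f(b)) / 8
    }
    integrate_simpson38(f, 0, 1, 9) # should match simpson38 above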

  11. Repeat Question 5 using Gaussian quadrature with \(N = 7\) integration nodes, comparing your approximation to the actual integral and your midpoint and Simpson’s rule approximations of questions 5 and 8, respectively.


    Solution
    N <- 7
    gq_nodes <- pracma::gaussLegendre(N, 0, 1.5)
    I_gq <- sum(gq_nodes$w * dgamma(gq_nodes$x, 1, 2))
    I_gq
    ## [1] 0.9502129
    I_vec <- c(I_pgamma, I_midpoint, I_simpson)
    rel_err <- abs((I_vec - I_gq) / I_vec)
    rel_err
    ## [1] 2.281873e-13 1.667500e-03 7.343905e-11
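
    Gaussian quadrature with \(N\) nodes integrates polynomials of degree up to \(2N - 1\) exactly, which is why it does so well here; the following quick check suggests that even a much smaller \(N\) is already very accurate for this smooth integrand.

    gq3 <- pracma::gaussLegendre(3, 0, 1.5)
    abs(sum(gq3$w * dgamma(gq3$x, 1, 2)) - I_pgamma) / I_pgamma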

  12. Approximate \[I = \int_{-1}^1 \int_{-1}^1 \int_{-1}^1 (x_1 x_2 x_3)^2 \text{d}x_1\text{d}x_2\text{d}x_3\] using the midpoint rule with \(N = 20\) and evaluate the approximation’s relative absolute error.


    Solution

    Let’s start by finding the true integral \[I = \int_{-1}^1 \int_{-1}^1 \int_{-1}^1 (x_1 x_2 x_3)^2 \text{d}x_1\text{d}x_2\text{d}x_3 = \int_{-1}^1 \int_{-1}^1 (x_2 x_3)^2 \left[\dfrac{x_1^3}{3}\right]_{-1}^1 \text{d}x_2\text{d}x_3\] \[ = \dfrac{2}{3} \int_{-1}^1 \int_{-1}^1 (x_2 x_3)^2 \text{d}x_2\text{d}x_3 = \ldots = \left(\dfrac{2}{3}\right)^3.\]

    true <- (2 / 3)^3

    For the midpoint rule, we’ll first find the midpoints, which can be the same for each dimension in this case.

    N <- 20
    edges <- seq(-1, 1, l = N + 1)
    h <- edges[2] - edges[1]
    mids <- edges[-1] - .5 * h

    Then we’ll find all combinations of these, x123, and evaluate the integrand at them, f123.

    x123 <- expand.grid(x1 = mids, x2 = mids, x3 = mids)
    f123 <- apply(x123, 1, prod)^2

    We’ll call our midpoint approximation midpoint, which can be calculated with

    midpoint <- sum(h^3 * f123)
    midpoint
    ## [1] 0.2940796

    Then its relative absolute error, rel_err_mid, is

    rel_err_mid <- abs((true - midpoint) / true)
    rel_err_mid
    ## [1] 0.007481266

    which is reasonably small.
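
    As a check on the grid calculation, the integrand factorises as \(x_1^2 x_2^2 x_3^2\), so the three-dimensional midpoint approximation is just the cube of the one-dimensional midpoint approximation to \(\int_{-1}^1 x^2 \text{d}x\); the following should reproduce midpoint up to floating-point rounding.

    (h * sum(mids^2))^3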


  13. Approximate the integral of Question 12 by applying Simpson’s rule with \(N = 20\) to each dimension, instead of the midpoint rule.


    Solution

    The following creates the nodes and weights needed to apply Simpson’s rule.

    N <- 20
    a <- -1
    b <- 1
    h <- (b - a) / N
    x1i <- a + h * (2 * (1:N) - 1) / 2
    x2i <- a + h * (1:(N - 1))
    weights <- rep(c(1, 4, 2, 1), c(1, length(x1i), length(x2i), 1))
    nodes <- c(a, x1i, x2i, b)
    weights <- h * weights / 6
    x123 <- expand.grid(x1 = nodes, x2 = nodes, x3 = nodes)
    f123 <- apply(x123, 1, prod)^2
    w123 <- weights %x% weights %x% weights

    We’ll call our Simpson’s rule approximation simpson, which can be calculated with

    simpson <- sum(w123 * f123)
    simpson
    ## [1] 0.2962963

    Then its relative absolute error, rel_err_simp, is

    rel_err_simp <- abs((true - simpson) / true)
    rel_err_simp
    ## [1] 5.620504e-16

    which is effectively machine precision: Simpson’s rule is exact for polynomials of degree three or lower, the integrand is quadratic in each of \(x_1\), \(x_2\) and \(x_3\), and so the only error left is floating-point rounding.


  14. Approximate the integral \[I = \int_0^2 \int_{-\pi}^\pi \left[\sin(x_1) + x_2 + 1\right] \text{d}x_1 \text{d}x_2\] using Gaussian quadrature with \(N = 9\) integration nodes per dimension and estimate the relative absolute error of your approximation.


    Solution

    We’ll start by calculating \(I\), the true integral, \[\begin{align*} I &= \int_0^2 \int_{-\pi}^\pi \left[\sin(x_1) + x_2 + 1\right] \text{d}x_1 \text{d}x_2\\ &= \int_0^2 \Big\{\big[- \cos(x_1) + x_1 x_2 + x_1\big]_{-\pi}^\pi\Big\} \text{d}x_2\\ &= \int_0^2 \big[ \left(1 + \pi x_2 + \pi\right) - \left(1 - \pi x_2 - \pi\right)\big] \text{d}x_2\\ &= \int_0^2 2\pi(x_2 + 1) \text{d}x_2 = 2\pi \left[\dfrac{x_2^2}{2} + x_2\right]_0^2 = 8\pi, \end{align*}\]

    and then store this as true.

    (true <- 8 * pi)
    ## [1] 25.13274

    To calculate this in R we want to set our number of nodes, N, and write a function, f(), to evaluate the integrand.

    N <- 9
    f <- function(x1, x2) sin(x1) + x2 + 1

    Then we want to set our integration nodes for \(x_1\), gq1, and for \(x_2\), gq2.

    gq1 <- pracma::gaussLegendre(N, -pi, pi)
    gq2 <- pracma::gaussLegendre(N, 0, 2)

    Then we want to evaluate f() at each combination of our \(x_1\) and \(x_2\) integration nodes, which we’ll call f12.

    f12 <- outer(gq1$x, gq2$x, f)

    The weights corresponding to these nodes, w12, are

    w12 <- gq1$w %o% gq2$w

    and then the integral approximation is

    Ihat <- sum(w12 * f12)
    Ihat
    ## [1] 25.13274

    Its relative absolute error, rel_err, is

    rel_err <- abs((true - Ihat) / true)
    rel_err
    ## [1] 2.82716e-16

    and again is incredibly small.


  15. Repeat examples 4.10 and 4.11 by avoiding for() loops. [You might want to consider the functions expand.grid() or outer().]


    Solution

    We’ll start with Example 4.10 and will use expand.grid(). Then we’ll set the dimension d, the number of integration nodes per dimension, N, the integration nodes, x1 and x2, and calculate the true value of the integral, true, as in the Lecture Notes.

    d <- 2
    true <- (exp(1) - 1)^d
    N <- 10
    x1 <- x2 <- (1:N - .5) / N

    Then we want to use expand.grid() to give us all combinations of the integration nodes for both dimensions, and we’ll quickly use head() to see what it’s done.

    x12 <- expand.grid(x1 = x1, x2 = x2)
    head(x12)
    ##     x1   x2
    ## 1 0.05 0.05
    ## 2 0.15 0.05
    ## 3 0.25 0.05
    ## 4 0.35 0.05
    ## 5 0.45 0.05
    ## 6 0.55 0.05

    We’ll calculate the integration weights, w, as in the Lecture Notes,

    w <- 1/(N^d)

    and then approximate the integral, which we’ll call midpoint, by taking the first column of x12 as the \(x_{1i}\) nodes, and the second column as the \(x_{2j}\) nodes, which gives

    midpoint <- w * sum(exp(x12[, 1] + x12[, 2]))
    midpoint
    ## [1] 2.950033

    and is the same as in the Lecture Notes.

    For Example 4.11 we’ll use outer() instead, using d and true from above. Then we need to set the number of integration nodes, N, and find the nodes for each margin, i.e. \(x_{1i}\) and \(x_{2j}\) for \(i, j = 1, \ldots, N\), which we’ll store as xw.

    N <- 4
    xw <- pracma::gaussLegendre(N, 0, 1)

    Next we want to calculate \(\exp(x_{1i} + x_{2j})\) for \(i, j = 1, \ldots, N\), which we’ll store as f12.

    f12 <- exp(outer(xw$x, xw$x, FUN = '+'))
    f12
    ##          [,1]     [,2]     [,3]     [,4]
    ## [1,] 1.148967 1.490991 2.094725 2.718282
    ## [2,] 1.490991 1.934829 2.718282 3.527458
    ## [3,] 2.094725 2.718282 3.818971 4.955800
    ## [4,] 2.718282 3.527458 4.955800 6.431040

    And then we want to calculate the integration weights, \(w_{ij} = w_{1i} w_{2j}\), for \(i, j = 1, \ldots, N\), where the \(w_{1i}\)s and \(w_{2j}\)s can both be taken from xw$w; we’ll store these as w12, obtaining them through outer().

    w12 <- outer(xw$w, xw$w)
    w12
    ##            [,1]       [,2]       [,3]       [,4]
    ## [1,] 0.03025075 0.05671296 0.05671296 0.03025075
    ## [2,] 0.05671296 0.10632333 0.10632333 0.05671296
    ## [3,] 0.05671296 0.10632333 0.10632333 0.05671296
    ## [4,] 0.03025075 0.05671296 0.05671296 0.03025075

    which uses that FUN = '*' is outer()’s default. Then we’ll calculate the integral approximation, and store this as gq

    gq <- sum(w12 * f12)
    gq
    ## [1] 2.952492

    which is the same approximation as in the Lecture Notes. [On this occasion outer() is perhaps a bit tidier, but is less easy to implement for \((d > 2)\)-dimensional integral approximations.]


  16. Consider a random vector \(\mathbf{X} = (X_1, X_2)^{\text{T}}\) from the standard bivariate normal distribution, which we write as \[\mathbf{X} = \left(\begin{array}{c} X_1\\ X_2 \end{array}\right) \sim BVN\left(\left(\begin{array}{c} 0\\ 0 \end{array}\right), \left(\begin{array}{cc} 1 & \rho\\ \rho & 1 \end{array}\right)\right)\] so that \(-1 \leq \rho \leq 1\) is the correlation between \(X_1\) and \(X_2\). The standard bivariate normal pdf is given by \[f(\mathbf{x} ; \rho) = \dfrac{1}{2 \pi \sqrt{1 - \rho^2}} \exp \left(-\dfrac{x_1^2 - 2 \rho x_1 x_2 + x_2^2}{2(1 - \rho^2)} \right)\] but its cdf, which is given by \[\text{Pr}(a_1 \leq x_1 \leq b_1, a_2 \leq x_2 \leq b_2; \rho) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} f(\mathbf{x} ; \rho) \text{d}x_2 \text{d} x_1,\] doesn’t have closed form. Estimate \(\text{Pr}(-1 \leq x_1 \leq 1, -1 \leq x_2 \leq 1; 0.9)\) using the midpoint rule with \(N=10\) integration nodes per dimension, and then using Gaussian quadrature with \(N = 7\) integration nodes per dimension.


    Solution

    We’ll start with a function to evaluate \(f(\mathbf{x} ; \rho)\), which we’ll call f_bvn().

    f_bvn <- function(x1, x2, rho) {
      # Function to evaluate the bivariate Normal pdf
      # x1, x2 can be scalar, vector or matrix
      # rho is a scalar
      # Returns object in same format as x1
      temp <- 1 - rho^2
      out <- x1^2 - 2 * rho * x1 * x2 + x2^2
      out <- out / (2 * temp)
      exp(-out) / (2 * pi * sqrt(temp))
    }

    Then we’ll write a function to implement the two-dimensional midpoint rule, which we’ll call integrate_bvn_midpoint(); it is only useful when we want the same integration limits, and hence the same nodes, in each dimension.

    integrate_bvn_midpoint <- function(N, a, b, f, ...) {
      # Function to evaluate a cdf by midpoint rule
      # N is an integer giving the number of integration nodes per dimension
      # a and b are scalars
      # f is the corresponding pdf
      # Returns a scalar
      edges <- seq(a, b, l = N + 1)
      h <- edges[2] - edges[1]
      mids <- edges[-1] - .5 * h
      grid <- expand.grid(x1 = mids, x2 = mids)
      sum(h * h * f(grid$x1, grid$x2, ...))
    }
    I_bvn_midpoint <- integrate_bvn_midpoint(10, -1, 1, f_bvn, rho = .9)
    I_bvn_midpoint
    ## [1] 0.5986239

    Next we’ll repeat the process, writing a function integrate_bvn_gq() to approximate the integral by Gaussian quadrature.

    integrate_bvn_gq <- function(N, a, b, f, ...) {
      # Function to evaluate a cdf by Gaussian quadrature
      # N is an integer giving the number of integration nodes per dimension
      # a and b are scalars
      # f is the corresponding pdf
      # Returns a scalar
      x1 <- x2 <- pracma::gaussLegendre(N, a, b)
      x_grid <- expand.grid(x1 = x1$x, x2 = x2$x)
      w_vec <- x1$w %x% x2$w 
      sum(w_vec * f(x_grid$x1, x_grid$x2, ...))
    }
    I_bvn_gq <- integrate_bvn_gq(7, -1, 1, f_bvn, rho = .9)
    I_bvn_gq
    ## [1] 0.5963602

    Note that on this occasion the true integral isn’t available in closed form, so we can’t calculate exact relative absolute errors.
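
    If the mvtnorm package is available (an assumption, as it isn't used in these solutions), its pmvnorm() function gives a high-accuracy benchmark that both approximations could be compared against; a sketch:

    Sigma_bvn <- matrix(c(1, .9, .9, 1), 2, 2)
    mvtnorm::pmvnorm(lower = c(-1, -1), upper = c(1, 1), sigma = Sigma_bvn)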


  17. Repeat Example 4.12 using the midpoint rule and \(N = 10\).


    Solution

    On this occasion, we can effectively merge code from Question 15 and Example 4.12.

    N <- 10
    d <- 5
    x <- (1:N - .5) / N
    xx <- lapply(1:d, function(i) x)
    X <- expand.grid(xx)
    w <- 1/(N^d)
    f <- sum(w * exp(rowSums(X)))
    true <- (exp(1) - 1)^d
    c(true = true, midpoint = f, rel.err = abs((true - f) / true))
    ##         true     midpoint      rel.err 
    ## 14.978626322 14.947455928  0.002080991

    We’ve now got a relative absolute error about six orders of magnitude larger than for Gaussian quadrature in Example 4.12, and yet here we’ve had to evaluate the integrand \(10^5\) times, as opposed to 1024 times in Example 4.12. Here we see that when dealing with high-dimensional integrals, we may want to carefully choose our integration scheme, especially if the integrand is an expensive function to evaluate.


  18. Consider a single random variable \(Y \mid \mu \sim \text{N}(\mu, \sigma^2)\), where we can characterise our prior beliefs about \(\mu\) as \(\mu \sim \text{N}(\alpha, \beta)\). The marginal distribution of \(y\) is given by \[f(y) = \int_{-\infty}^\infty f(y \mid \mu) f(\mu) \text{d}\mu, \] and that the marginal distribution of \(Y\) is \(Y \sim \text{N}(\alpha, \sigma^2 + \beta)\) can be derived in various ways. Approximate \(f(y)\) using Laplace’s method for \(\sigma = 1.5\), \(\alpha = 2\) and \(\beta = 0.5\), and then compare the relative absolute error of your approximation to the true marginal pdf for \(y = -2, 0, 2, 4, 6\).


    Solution

    For the Laplace approximation, we take \(n = 1\) and \(f()\) from Equation (4.4) as \(-\log \phi(y; \mu, \sigma^2) - \log \phi(\mu; \alpha, \beta)\), where we swap \(x\) for \(\mu\). Its first derivative w.r.t. \(\mu\) is \(-(y - \mu)/\sigma^2 + (\mu - \alpha)/\beta\) and its second derivative w.r.t. \(\mu\) is \(1/\sigma^2 + 1/\beta\). Setting the first derivative to zero gives \(\hat \mu = (y \beta + \alpha \sigma^2) / (\beta + \sigma^2)\). Our Laplace approximation is then \[\begin{align*} \hat f(y) &\simeq \sqrt{\dfrac{2 \pi}{\frac{1}{\sigma^2} + \frac{1}{\beta}}} \text{exp}\left\{\log \phi(y; \hat \mu, \sigma^2) + \log \phi(\hat \mu; \alpha, \beta)\right\}\\ &= \sqrt{\dfrac{2 \pi}{\frac{1}{\sigma^2} + \frac{1}{\beta}}} \phi(y; \hat \mu, \sigma^2) \phi(\hat \mu; \alpha, \beta). \end{align*}\]

    We can evaluate this in R with the following. We’ll start by setting \(\sigma\), \(\alpha\) and \(\beta\).

    alpha <- 2
    beta <- .5
    sigma <- 1.5

    Then we’ll create fy(), which evaluates the true marginal pdf, i.e. \(f(y)\), given \(y\), \(\sigma\), \(\alpha\) and \(\beta\),

    fy <- function(y, sigma, alpha, beta) {
      # Function to evaluate the marginal Normal(alpha, sigma^2 + beta) pdf
      # y is scalar, vector or matrix
      # sigma, alpha and beta are scalars
      # returns objects in same format as y
      dnorm(y, alpha, sqrt(beta + sigma^2))
    }

    evaluate this for the \(y\) values specified, y_vals,

    y_vals <- seq(-2, 6)

    and store this as f_true

    f_true <- fy(y_vals, sigma, alpha, beta)

    The following function calculates the Laplace approximation, by finding \(\hat \mu\), called mu_hat, and then substituting this into the approximation above.

    fla <- function(y, sigma, alpha, beta) {
      # Function to compute the Laplace approximation for a Normal
      # prior on the mean mu and a Normal likelihood for the data
      # y is scalar or vector
      # sigma, alpha and beta are scalars
      # Returns object of same format as y
      mu_hat <- (y * beta + alpha * sigma^2) / (beta + sigma^2)
      mult <- 2 * pi / abs(1 / beta + 1 / sigma^2)
      sqrt(mult) * dnorm(y, mu_hat, sigma) * dnorm(mu_hat, alpha, sqrt(beta))
    }

    Then we use this to evaluate the approximation for the specified \(y\) values, f_la.

    f_la <- fla(y_vals, sigma, alpha, beta)

    We’ll look at these alongside each other

    rbind(true = f_true, laplace = f_la)
    ##               [,1]       [,2]      [,3]     [,4]      [,5]     [,6]      [,7]
    ## true    0.01311721 0.04683602 0.1162501 0.200577 0.2405712 0.200577 0.1162501
    ## laplace 0.01311721 0.04683602 0.1162501 0.200577 0.2405712 0.200577 0.1162501
    ##               [,8]       [,9]
    ## true    0.04683602 0.01311721
    ## laplace 0.04683602 0.01311721

    and then by calculating the relative absolute error, i.e.

    abs((f_true - f_la) / f_true)
    ## [1] 6.612393e-16 0.000000e+00 1.193788e-16 1.383787e-16 1.153736e-16
    ## [6] 1.383787e-16 0.000000e+00 0.000000e+00 1.322479e-16

    we see that the relative absolute error is effectively zero, once we allow for R’s machine tolerance. [In fact if we carry on with the algebra above by substituting in the expression for the Normal pdfs, i.e. \[\begin{align*} I_n &= \sqrt{\dfrac{2 \pi}{\frac{1}{\sigma^2} + \frac{1}{\beta}}} \left(\dfrac{1}{\sqrt{2 \pi \sigma^2}} \text{exp}\left\{-\dfrac{(y - \hat \mu)^2}{2 \sigma^2}\right\} \right) \left(\dfrac{1}{\sqrt{2 \pi \beta}} \text{exp}\left\{-\dfrac{(\hat \mu - \alpha)^2}{2 \beta}\right\} \right), \end{align*}\] we eventually find that on this occasion the Laplace approximation is exact (which you’re welcome to confirm in your own time).]


  19. The negative binomial distribution, denoted NegBin\((r, p)\), has probability mass function (pmf) \[f(y) = \dfrac{\Gamma(y + r)}{y!\Gamma(r)} p^y (1 - p)^r.\] If \(Y \mid \lambda \sim \text{Poisson}(\lambda)\) and \(\lambda \sim \text{Gamma}(r, [1 - p]/p)\), then \(Y \sim \text{NegBin}(r, p)\), so that the marginal distribution of \(Y\) is negative binomial, i.e.  \[\begin{equation} f(y) = \int_{-\infty}^\infty f(y \mid \lambda) f(\lambda) \text{d} \lambda \tag{4.2} \end{equation}\] if \(f(y \mid \lambda)\) is the \(\text{Poisson}(\lambda)\) pmf and \(f(\lambda)\) is the \(\text{Gamma}(\alpha, \beta)\) pdf with \(\alpha = r\) and \(\beta = (1 - p)/p\). Use Laplace’s method to approximate the marginal pmf, i.e. to approximate \(f(y)\), for \(y = 0, 1, \ldots, 20\), \(\alpha= 5\) and \(\beta = 0.5\).


    Solution

    For Laplace’s method, we first note that \[\begin{align*} f(y, \lambda) &= f(y \mid \lambda) f(\lambda)\\ &= \text{e}^{-\lambda}\dfrac{\lambda^y}{y!} \times \dfrac {\lambda^{\alpha -1}\text{e}^{-\beta \lambda} \beta^\alpha}{\Gamma(\alpha)} = \text{e}^{-\lambda(1 + \beta)} \dfrac{\lambda^{y + \alpha - 1} \beta^\alpha}{y!\Gamma(\alpha)} \end{align*}\] Then, considering only \(\lambda\), \[ \log f(y, \lambda) = -\lambda(\beta + 1) + (y + \alpha - 1) \log(\lambda) + \text{constant} \] and so \[ \dfrac{\partial \log f(y, \lambda)}{\partial \lambda} = -(\beta + 1) + (y + \alpha - 1) / \lambda \] which we set to zero at \(\tilde \lambda\) so that \(\beta + 1 = (y + \alpha - 1) / \tilde{\lambda}\), i.e. \[ \tilde \lambda = \dfrac{y + \alpha - 1}{\beta + 1}. \] Then \[ \dfrac{\partial^2 \log f(y, \lambda)}{\partial \lambda^2} = -(y + \alpha - 1) / \lambda^2. \] Using the formula for Laplace’s method from the notes, we take \(-nf(x)\) as \(\log f(y, \lambda)\) so that \[ f(y) \simeq \text{e}^{\log f(y, \tilde \lambda)} \sqrt{\dfrac{2\pi}{nf''(\tilde \lambda)}} \] where \(f''(\tilde \lambda) = -\partial^2 \log f(y, \lambda) / \partial \lambda^2\). Therefore \[\begin{align*} f(y) &\simeq \text{e}^{-\tilde \lambda(1 + \beta)} \dfrac{\tilde{\lambda}^{y + \alpha - 1} \beta^\alpha}{y!\Gamma(\alpha)} \sqrt{\dfrac{2\pi\tilde{\lambda}^2}{y + \alpha - 1}}\\ &= \text{e}^{-\tilde \lambda(1 + \beta)} \dfrac{\tilde{\lambda}^{y + \alpha} \beta^\alpha}{y!\Gamma(\alpha)} \sqrt{\dfrac{2\pi}{y + \alpha - 1}}. \end{align*}\]

    Next we’ll set the \(y\) values and \(\alpha\) and \(\beta\).

    y <- 0:20
    alpha <- 5
    beta <- .5

    The following function can then give the Laplace approximation to \(f(y)\) for given \(y\), \(\alpha\) and \(\beta\).

    fhat <- function(y, alpha, beta) {
      # Function to give Laplace approximation to marginal distribution
      # y can be scalar or vector
      # alpha and beta are scalar
      # returns scalar if y scalar, and vector if y vector
      tlambda <- (y + alpha - 1) / (beta + 1)
      dpois(y, tlambda) * dgamma(tlambda, alpha, beta) * 
        tlambda * sqrt(2 * pi / (y + alpha - 1))
    }

    Then we can evaluate this for the specific values of \(y\).

    fhat(y, alpha, beta)
    ##  [1] 0.004030551 0.013490989 0.027056782 0.042171720 0.056312548 0.067653206
    ##  [7] 0.075239786 0.078882315 0.078932092 0.076049269 0.071011807 0.064581799
    ## [13] 0.057425971 0.050079002 0.042936547 0.036266361 0.030228594 0.024899075
    ## [19] 0.020291794 0.016378531 0.013104802

    It’s useful to plot this against the \(\text{NegBin}(\alpha, p)\) pmf. First we’ll write a function dnbinom2(y, r, p) to evaluate this pmf

    dnbinom2 <- function(y, r, p) 
      gamma(y + r) * p^y * (1 - p)^r / factorial(y) / gamma(r)

    and then we’ll plot it against the Laplace approximation.

    fy <- dnbinom2(y, alpha, 1 / (beta + 1))
    plot(y, fy, ylim = c(0, max(fy)))
    lines(y, fhat(y, alpha, beta), col = 2)

    Figure 4.2: Comparison of true negative binomial marginal pmf with its Laplace approximation.

    The plot shows that the agreement between the true marginal pmf and its Laplace approximation is very good.
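
    As a final check on dnbinom2() itself, R's built-in dnbinom(y, size, prob) parameterises the same pmf with prob corresponding to \(1 - p\) here, so the following should agree with dnbinom2(y, alpha, 1 / (beta + 1)) up to floating-point rounding.

    all.equal(fy, dnbinom(y, alpha, beta / (beta + 1)))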


  20. Recall the integral \[I = \int_0^1 \big[1 + 2 \cos(2 x) + 2 \sin(2 x)\big] \text{d} x\] from above. Use Monte Carlo integration to approximate \(I\) with Monte Carlo samples of size \(N = 100\), 1000 and 10000 and calculate the relative absolute error of each of your approximations.


    Solution

    For a given value of \(N\), we want to generate \(N\) Uniform([0, 1]) random variates, evaluate the integrand at each, and take the mean as the approximation to \(I\). The following code approximates the integral as I_hat for \(N = 100\)

    N <- 100
    x <- runif(N)
    fx <- 1 + 2 * cos(2 * x) + 2 * sin(2 * x)
    I_hat <- mean(fx)
    I_hat
    ## [1] 3.330527

    and its relative absolute error, rel_err, is

    true <- 2 + sin(2) - cos(2)
    rel_err <- abs((true - I_hat) / true)
    rel_err
    ## [1] 0.001528359

    The following code then loops this over the \(N\) values \(N = 100\), 1000 and 10000.

    N_vals <- c(100, 1000, 10000)
    I_hat <- rel_err <- numeric(length(N_vals))
    for (i in 1:length(N_vals)) {
      N <- N_vals[i]
      x <- runif(N)
      fx <- 1 + 2 * cos(2 * x) + 2 * sin(2 * x)
      I_hat[i] <- mean(fx)
      rel_err[i] <- abs((true - I_hat[i]) / true)
    }

    which gives the integral approximations

    I_hat
    ## [1] 3.316098 3.311553 3.325624

    and corresponding relative absolute errors

    rel_err
    ## [1] 2.810404e-03 4.177179e-03 5.404947e-05

    We might expect our relative absolute errors to decrease as \(N\) increases. Probabilistically they will, but our approximations are subject to the variability of the random samples, so a smaller Monte Carlo sample size may occasionally give a better approximation than a larger one. In general, though, we should rely on the probabilistic properties and use the largest Monte Carlo sample size that is feasible.
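
    We can also attach an uncertainty estimate to a Monte Carlo approximation: \(\hat I\) is a sample mean, so its standard error can be estimated by \(\text{sd}[f(x^*)] / \sqrt{N}\). A sketch for a fresh sample of size \(N = 1000\):

    N <- 1000
    x <- runif(N)
    fx <- 1 + 2 * cos(2 * x) + 2 * sin(2 * x)
    c(estimate = mean(fx), std.error = sd(fx) / sqrt(N))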


  21. If \(Y \sim \text{Uniform}([a, b])\) then \[E(Y) = \int_a^b \frac{u}{b - a} \text{d} u = \frac{1}{2}(a + b).\] Use the mean of \(m = 1000\) Monte Carlo replicates of Monte Carlo samples of size \(N = 500\) to approximate the above integral, and hence to approximate \(E(Y)\), for \(a = -1\) and \(b = 1\).


    Solution

    We’ll start by setting the Monte Carlo sample size, N, and then the number of replicates, m.

    N <- 500
    m <- 1e3

    For a given replicate we need to generate \(N\) points from the Uniform([-1, 1]) distribution, \(u_1^*, \ldots, u_N^*\), say. The integrand is \(u / (b - a)\) and the volume of the domain is \(b - a\), so our Monte Carlo approximation to the integral is \((b - a) \times N^{-1} \sum_{i = 1}^N u_i^* / (b - a) = N^{-1} \sum_{i = 1}^N u_i^*\), i.e. simply the sample mean of the \(u_i^*\).

    The following calculates this for a single replicate, where \(a =\) a and \(b =\) b; the result should be close to the true value \(E(Y) = 0\).

    a <- -1
    b <- 1
    u <- runif(N, a, b)
    mean(u)

    and then replicate() can perform the \(m\) replicates

    uu <- replicate(m, runif(N, -1, 1))
    dim(uu)
    ## [1]  500 1000

    which stores the sampled values in an \(N \times m\) matrix uu. Then we can use colMeans() to calculate the mean of each sample, and mean() to average these over replicates.

    mean(colMeans(uu))

    [Note that, because every replicate uses the same sample size, this is equivalent to

    mean(uu)

    and so \(m\) Monte Carlo samples of size \(N\) is equivalent to one Monte Carlo sample of size \(mN\).]


  22. Approximate the earlier integral \[I = \int_0^2 \int_{-\pi}^\pi \left[\sin(x_1) + x_2 + 1\right] \text{d}x_1\text{d}x_2 \] using Monte Carlo with samples of size \(N = 100\), \(200\) and \(400\) and \(m = 1000\) replicates for each. Calculate the mean and variance of your approximations for each value of \(N\), i.e. across replicates. What do you observe?


    Solution

    We’ll start with a single replicate for \(N = 100\), generating \(x_1\) from the Uniform([\(-\pi, \pi\)]) distribution and \(x_2\) from the Uniform([0, 2]) distribution. Let \(x_{11}^*, \ldots, x_{1N}^*\) and \(x_{21}^*, \ldots, x_{2N}^*\) denote these respective samples. Then the integral approximation is given by \[\hat I = \dfrac{4\pi}{N} \sum_{i = 1}^N \big[\sin(x_{1i}^*) + x_{2i}^* + 1\big]\] as our domain for \((x_1, x_2)\), denoted by \(\mathcal{X}\) in the Lecture Notes, is \([-\pi, \pi] \times [0, 2]\), and hence \(V(\mathcal{X}) = 4\pi\). The following then calculates this in R.

    N <- 100
    x1 <- runif(N, -pi, pi)
    x2 <- runif(N, 0, 2)
    f <- function(x1, x2) sin(x1) + x2 + 1
    4 * pi * mean(f(x1, x2))
    ## [1] 26.53961

    Now we want to replicate this \(m\) times for the different \(N\) values. For explicitness, we’ll use loops.

    N_vals <- c(100, 200, 400)
    m <- 1000
    I_hat <- matrix(NA, length(N_vals), m)
    for (i in 1:length(N_vals)) {
      N <- N_vals[i]
      for (j in 1:m) {
        x1 <- runif(N, -pi, pi)
        x2 <- runif(N, 0, 2)
        I_hat[i, j] <- 4 * pi * mean(f(x1, x2))
      }
    }

    The following gives the mean and standard deviation of the approximations across replicates; squaring the standard deviations gives the variances asked for in the question.

    rowMeans(I_hat)
    ## [1] 25.06577 25.08629 25.16046
    apply(I_hat, 1, sd)
    ## [1] 1.1376463 0.8392819 0.6005032

    Our main observation here is that the variance decreases as \(N\) increases, roughly halving as \(N\) doubles (equivalently, the standard deviations above shrink by a factor of about \(1/\sqrt{2}\)), which is consistent with the formula for \(\text{Var}(\hat I_{\mathcal{X}})\) given in the Lecture Notes, i.e. being inversely proportional to \(N\).
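
    For this integrand the variance can also be worked out exactly: \(\text{Var}[f(X_1, X_2)] = \text{Var}[\sin(X_1)] + \text{Var}(X_2) = 1/2 + 1/3 = 5/6\) for \(X_1 \sim \text{Uniform}([-\pi, \pi])\) and \(X_2 \sim \text{Uniform}([0, 2])\), so \(\text{sd}(\hat I) = 4\pi\sqrt{5 / (6N)}\), roughly 1.15, 0.81 and 0.57 for the three values of \(N\), which is close to the standard deviations observed above.

    4 * pi * sqrt(5 / (6 * N_vals))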


  23. Approximate \[I = \int_{-1}^1 \int_{-1}^1 \int_{-1}^1 (x_1 x_2 x_3)^2 \text{d}x_1\text{d}x_2\text{d}x_3\] using a Monte Carlo sample of size \(N=10^4\) and evaluate the approximation’s relative absolute error.


    Solution

    Let’s start by finding the true integral \[I = \int_{-1}^1 \int_{-1}^1 \int_{-1}^1 (x_1 x_2 x_3)^2 \text{d}x_1\text{d}x_2\text{d}x_3 = \dfrac{2}{3}\int_{-1}^1 \int_{-1}^1 (x_2 x_3)^2 \text{d}x_2\text{d}x_3 = \ldots = \left(\dfrac{2}{3}\right)^3 = \dfrac{8}{27}.\]

    We’ll store this as true.

    true <- 8 / 27

    Then we want to generate three sets of \(N\) draws from the Uniform([-1, 1]) distribution, representing \(x_1\), \(x_2\) and \(x_3\), which we’ll store as x123.

    N <- 1e4
    x123 <- matrix(runif(3 * N, -1, 1), N)

    Our domain is \([-1, 1]^3\) which has volume \(2^3 = 8\). Our approximation to the integral, I_hat, is

    I_hat <- 8 * mean(apply(x123^2, 1, prod))
    I_hat
    ## [1] 0.2993306

    which has a relative absolute error, rel_err, of

    rel_err <- abs((true - I_hat) / true)
    rel_err
    ## [1] 0.01024071

    i.e. about 1%.
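
    We can also gauge whether an error of this size is in line with what Monte Carlo theory predicts by estimating the standard error of I_hat directly from the sample. A rough sketch, noting that I_hat above is the mean of the fx values below:

    fx <- 8 * apply(x123^2, 1, prod)
    sd(fx) / sqrt(N) # estimated Monte Carlo standard error of I_hat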


  24. Approximate the negative binomial marginal distribution given in Equation (4.2) using Monte Carlo sampling with \(N = 10^3\), \(\alpha = 5\), \(\beta = 0.5\) and for \(y = 0, 1, \ldots, 20\), and calculate the relative absolute error of your approximation.


    Solution

    We’ll start by storing \(N\), \(\alpha\), \(\beta\) and \(y = 0, 1, \ldots, 20\).

    N <- 1e3
    alpha <- 5
    beta <- .5
    y_vals <- 0:20

    Then we want to generate \(N\) values for \(\lambda\) from the \(\text{Gamma}(\alpha, \beta)\) distribution, which we’ll call lambda_samp.

    lambda_samp <- rgamma(N, alpha, beta)

    Our approximation to the marginal distribution is obtained by evaluating the Poisson(\(\lambda\)) distribution for the \(y\) values specified at each sampled \(\lambda\) value, which we’ll call fy_samp,

    fy_samp <- sapply(lambda_samp, function(lambda) dpois(y_vals, lambda))

    and then averaging over the samples, which we’ll call fy_hat.

    fy_hat <- rowMeans(fy_samp)
    fy_hat
    ##  [1] 0.003646912 0.012868946 0.026607133 0.042330466 0.057387922 0.069712704
    ##  [7] 0.078056072 0.081994808 0.081816047 0.078298252 0.072440391 0.065222290
    ## [13] 0.057453215 0.049716276 0.042382039 0.035655861 0.029631056 0.024332628
    ## [19] 0.019746861 0.015838362 0.012558630

    The relative absolute error, rel_err, relative to the true marginal pmf, fy, can be calculated with

    fy <- dnbinom2(y_vals, alpha, 1 / (beta + 1))
    rel_err <- abs((fy - fy_hat) / fy)
    rel_err
    ##  [1] 0.113800441 0.061853809 0.030169988 0.008106467 0.008541485 0.020948770
    ##  [7] 0.028824333 0.031614646 0.029365575 0.022995284 0.014064485 0.004323243
    ## [13] 0.004722164 0.012097060 0.017473383 0.021136090 0.023842795 0.026619715
    ## [19] 0.030533783 0.036477436 0.044998852

    and is reasonably small for each \(y\) value. Finally, as in Question 19, we’ll plot the approximation against the true marginal pmf

    plot(y_vals, fy, ylim = c(0, max(fy_hat)), xlab = 'y', ylab = 'f(y)')
    lines(y_vals, fy_hat, col = 2)

    Figure 4.3: Comparison of true negative binomial marginal pmf with its Monte Carlo approximation.

    and see that the approximation (in red) closely follows the true marginal pmf.