2 Chapter 2 exercises

    1. Compute \(\pi + \text{e}\), where \(\pi = 3.1415927 \times 10^0\) and \(\text{e} = 2.7182818 \times 10^0\), using floating point addition in base 10, working to five decimal places.


      Solution

      As \(\pi\) and \(\text{e}\) have a common exponent, we sum their mantissas, i.e. \[\begin{align*} \pi + \text{e} &= (3.14159 \times 10^0) + (2.71828 \times 10^0)\\ &= 5.85987 \times 10^0 \end{align*}\]
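
      As a quick check in R, signif() rounds to a given number of significant digits, and six significant digits mimic a five-decimal-place mantissa here (bearing in mind that R itself computes in binary, not base 10):

      signif(3.14159 + 2.71828, 6)
      ## [1] 5.85987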


    2. Now compute \(10^6\pi + \text{e}\), using floating point addition in base 10, now working to seven decimal places.


      Solution

      We first need to put the numbers on a common exponent, which is that of \(10^6 \pi\). \[\begin{align*} 10^6 \pi &= 3.1415927 \times 10^6\\ \text{e} & = 2.7182818 \times 10^0 = 0.0000027 \times 10^6. \end{align*}\] Then summing their mantissas gives \[\begin{align*} 10^6 \pi + \text{e} &= (3.1415927 \times 10^6) + (2.7182818 \times 10^0)\\ &= (3.1415927 \times 10^6) + (0.0000027 \times 10^6)\\ &= (3.1415927 + 0.0000027) \times 10^6\\ &= 3.1415954 \times 10^6. \end{align*}\]


    3. What would happen if we computed \(10^6\pi + \text{e}\) using a base 10 floating point representation, but only worked with five decimal places?


      Solution

      If we shift \(2.7182818 \times 10^0\) onto the exponent \(10^6\), we get \(0.00000 \times 10^6\) to five decimal places, and so \(\text{e}\) becomes negligible alongside \(10^6 \pi\) when we only work to five decimal places.
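
      Again we can mimic this in R with signif(): adding \(\text{e}\) to the five-decimal-place representation of \(10^6\pi\) and rounding back to six significant digits leaves it unchanged.

      signif(3.14159e6 + 2.71828, 6)
      ## [1] 3141590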


  1. Write \(2\pi\) in binary form using single- and double-precision arithmetic. [Hint: I recommend you consider the binary forms for \(\pi\) given in the lecture notes, and you might also use bit2decimal() from the lecture notes to check your answer.]


    Solution

    The key here is to note that we want to calculate \(\pi \times 2\). As we have a number of the form \(S \times (1 + F) \times 2^{E - e}\), we just need to increase \(E\) by one. For both single and double precision, this corresponds to changing the final bit of \(E\), a zero, to a one.

    bit2decimal <- function(x, e, dp = 20) {
      # function to convert bits to decimal form
      # x: the bits as a character string, with appropriate spaces
      # e: the excess
      # dp: the decimal places to report the answer to
      bl <- strsplit(x, ' ')[[1]] # split x into S, E and F components by spaces
      # and then into a list of three character vectors, each element one bit
      bl <- lapply(bl, function(z) as.integer(strsplit(z, '')[[1]]))
      names(bl) <- c('S', 'E', 'F') # give names, to simplify next few lines
      S <- (-1)^bl$S # calculate sign, S
      E <- sum(bl$E * 2^((length(bl$E) - 1):0)) # ditto for exponent, E
      F <- sum(bl$F * 2^(-(1:length(bl$F)))) # ditto for fraction, F
      z <- S * 2^(E - e) * (1 + F) # calculate z
      out <- format(z, nsmall = dp) # use format() for specific dp
      # add (S, E, F) as attributes, for reference
      attr(out, '(S,E,F)') <- c(S = S, E = E, F = F)
      out
    }
    bit2decimal('0 10000001 10010010000111111011011', 127)
    ## [1] "6.28318548202514648438"
    ## attr(,"(S,E,F)")
    ##           S           E           F 
    ##   1.0000000 129.0000000   0.5707964
    bit2decimal('0 10000000001 1001001000011111101101010100010001000010110100011000', 1023)
    ## [1] "6.28318530717958623200"
    ## attr(,"(S,E,F)")
    ##            S            E            F 
    ##    1.0000000 1025.0000000    0.5707963

    Note that the single-precision result agrees with \(2\pi = 6.2831853\ldots\) to only about seven or eight significant figures, whereas the double-precision result is accurate to about sixteen.

  2. Find the calculation error in R of \(b - a\) where \(a = 10^{16}\) and \(b = 10^{16} + \exp(0.5)\).


    Solution
    a <- 1e16
    b <- 1e16 + exp(.5)
    b - a
    ## [1] 2
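
    The exact difference is \(b - a = \exp(0.5) \approx 1.6487\), so the computed difference of 2 carries an absolute error of about 0.35:

    (b - a) - exp(0.5)
    ## [1] 0.3512787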

  3. Create the following: the row vector \[ \mathbf{a} = (2, 4, 6), \] the \(2 \times 3\) matrix \[ \mathbf{B} = \left(\begin{array}{ccc} 6 & 5 & 4\\ 3 & 2 & 1\end{array}\right), \] a list with first element \(\mathbf{a}\) and second element \(\mathbf{B}\), and an arbitrary (i.e. with whatever values you like) \(5 \times 3 \times 2\) array with a 'name' attribute that is 'array1'.
    For each of the above, consider whether your code could be simpler.


    Solution
    a <- t(c(2, 4, 6))
    B <- matrix(6:1, 2, byrow = TRUE)
    l <- list(a, B)
    arr <- array(rnorm(30), c(5, 3, 2))
    attr(arr, 'name') <- 'array1'
    arr
    ## , , 1
    ## 
    ##            [,1]       [,2]       [,3]
    ## [1,]  0.5021744 -0.3243478 -0.1800251
    ## [2,] -0.9137393 -0.5555424 -0.9410917
    ## [3,]  0.3861424  1.1738392 -1.5345555
    ## [4,] -1.3246890 -0.1276600  0.9497845
    ## [5,]  1.1743350  2.0280488 -1.2774335
    ## 
    ## , , 2
    ## 
    ##            [,1]       [,2]        [,3]
    ## [1,]  0.4627193 -1.1517148  0.82157939
    ## [2,]  0.8607064  0.2187305 -0.63916765
    ## [3,] -1.0594128  0.3703816  0.16419412
    ## [4,]  1.2615024 -1.0162768 -0.48462806
    ## [5,]  1.1556080 -0.2807050  0.06279438
    ## 
    ## attr(,"name")
    ## [1] "array1"

  4. Produce a \(3 \times 4 \times 4 \times 2\) array containing Uniform(0, 1) random variates and compute the mean over its second and third margins, both using apply(..., ..., mean) and using rowMeans() or colMeans().


    Solution
    arr <- array(NA, c(3, 4, 4, 2))
    arr[] <- runif(prod(dim(arr))) # prod(dim(arr)) saves having to get the number of variates right
    apply(arr, 2:3, mean)
    ##           [,1]      [,2]      [,3]      [,4]
    ## [1,] 0.3942134 0.3820306 0.2997983 0.5703195
    ## [2,] 0.3481094 0.4652640 0.2625658 0.7657332
    ## [3,] 0.6368024 0.5111707 0.6808062 0.5063143
    ## [4,] 0.4917663 0.4106000 0.2843431 0.7682020
    rowMeans(aperm(arr, c(2, 3, 1, 4)), dims = 2) # permute so margins 2 and 3 come first
    ##           [,1]      [,2]      [,3]      [,4]
    ## [1,] 0.3942134 0.3820306 0.2997983 0.5703195
    ## [2,] 0.3481094 0.4652640 0.2625658 0.7657332
    ## [3,] 0.6368024 0.5111707 0.6808062 0.5063143
    ## [4,] 0.4917663 0.4106000 0.2843431 0.7682020
    colMeans(aperm(arr, c(1, 4, 2, 3)), dims = 2) # permute so margins 2 and 3 come last
    ##           [,1]      [,2]      [,3]      [,4]
    ## [1,] 0.3942134 0.3820306 0.2997983 0.5703195
    ## [2,] 0.3481094 0.4652640 0.2625658 0.7657332
    ## [3,] 0.6368024 0.5111707 0.6808062 0.5063143
    ## [4,] 0.4917663 0.4106000 0.2843431 0.7682020
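
    All three approaches return the same \(4 \times 4\) matrix of means, which we can confirm with, e.g.,

    all.equal(apply(arr, 2:3, mean), rowMeans(aperm(arr, c(2, 3, 1, 4)), dims = 2))
    ## [1] TRUE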

    1. Create a 3-element list of vectors comprising Uniform(0, 1) variates of length 3, 5 and 7, respectively.


      Solution
      lst <- list(runif(3), runif(5), runif(7))

    2. Create another list in which the vectors above are sorted into descending order.


      Solution
      lst2 <- lapply(lst, sort, decreasing = TRUE)

    3. Then create a vector comprising the minimum of each vector in the list, and another stating which element is the minimum. [Hint: for the latter you may want to consult the ‘See Also’ part of the min() function’s help file.]


      Solution
      sapply(lst, min)
      ## [1] 0.49689597 0.08217604 0.16759284
      sapply(lst, which.min)
      ## [1] 1 3 5
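
      As an aside, a type-safe alternative to sapply() is vapply(), which guarantees the shape and type of the result; the following would return the same values as above.

      vapply(lst, min, numeric(1))
      vapply(lst, which.min, integer(1))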

  5. Use a for() loop to produce the following. [Hint: the function paste() might be useful.]

    ## [1] "iteration 1"
    ## [1] "iteration 2"
    ## [1] "iteration 3"
    ## [1] "iteration 4"
    ## [1] "iteration 5"
    ## [1] "iteration 6"
    ## [1] "iteration 7"
    ## [1] "iteration 8"
    ## [1] "iteration 9"
    ## [1] "iteration 10"

    Solution
    for (i in 1:10) print(paste('iteration', i))
    ## [1] "iteration 1"
    ## [1] "iteration 2"
    ## [1] "iteration 3"
    ## [1] "iteration 4"
    ## [1] "iteration 5"
    ## [1] "iteration 6"
    ## [1] "iteration 7"
    ## [1] "iteration 8"
    ## [1] "iteration 9"
    ## [1] "iteration 10"

  6. Consider the following two infinite series that represent \(\pi\).\[\begin{align*} \pi &= 4 \left[1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \frac{1}{13} -\cdots\right]\\ \pi &= 3 + \frac{4}{2 \times 3 \times 4} - \frac{4}{4 \times 5 \times 6} + \frac{4}{6 \times 7 \times 8} - \frac{4}{8 \times 9 \times 10} + \cdots \end{align*}\] Use a while() loop to find which converges to pi in R to within \(\epsilon_m^{1/3}\) using the fewest terms, where \(\epsilon_m\) is R’s machine precision, .Machine$double.eps.


    Solution

    Here’s the first approximation…

    quarter_pi <- 1
    terms1 <- 1
    denom <- 3
    multiplier <- -1
    my_pi <- 4 * quarter_pi
    while(abs(my_pi - pi) > .Machine$double.eps^(1/3)) {
      terms1 <- terms1 + 1
      quarter_pi <- quarter_pi + multiplier / denom
      my_pi <- 4 * quarter_pi
      denom <- denom + 2
      multiplier <- -1 * multiplier
    }
    terms1
    ## [1] 165141

    …and here’s the second approximation…

    my_pi <- 3
    terms2 <- 2
    denoms <- c(2, 3, 4)
    multiplier <- 1
    while(abs(my_pi - pi) > .Machine$double.eps^(1/3)) {
      my_pi <- my_pi + multiplier * 4 / prod(denoms)
      denoms <- denoms + 2
      multiplier <- -1 * multiplier
      terms2 <- terms2 + 1
    }
    terms2
    ## [1] 36

    So the second series converges far more quickly, needing only 36 terms against 165,141 for the first.
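
    For reference, the tolerance that both loops converge to is

    .Machine$double.eps^(1/3)
    ## [1] 6.055454e-06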

    1. Write a function based on a for() loop to calculate the cumulative sum, \(\bf y\), of a vector \(\bf x\), i.e. so that its \(i\)th value, \(y_i\) say, is \[y_i = \sum_{j = 1}^i x_j.\]


      Solution

      Either of the following two functions are options (although others exist).

      my_cumsum <- function(x) {
        # function 1 to calculate cumulative sum of a vector
        # x is a vector
        # returns a vector of length length(x)
        out <- numeric(length(x))
        for (i in 1:length(x)) {
          out[i] <- sum(x[1:i])
        }
        out
      }

      my_cumsum2 <- function(x) {
        # function 2 to calculate cumulative sum of a vector
        # x is a vector (assumed here to have length at least 2)
        # returns a vector of length length(x)
        out <- x
        for (i in 2:length(x)) {
          out[i] <- out[i] + out[i - 1]
        }
        out
      }

      We see that both agree with R’s built-in cumsum().

      x <- runif(10)
      cumsum(x)
      ##  [1] 0.3782105 1.0333816 1.1626501 1.6891643 2.2303038 2.6320114 3.1995439
      ##  [8] 3.8348091 4.7733853 5.7455502
      my_cumsum(x)
      ##  [1] 0.3782105 1.0333816 1.1626501 1.6891643 2.2303038 2.6320114 3.1995439
      ##  [8] 3.8348091 4.7733853 5.7455502
      my_cumsum2(x)
      ##  [1] 0.3782105 1.0333816 1.1626501 1.6891643 2.2303038 2.6320114 3.1995439
      ##  [8] 3.8348091 4.7733853 5.7455502

    2. Then benchmark its execution time against R’s vectorised function cumsum() for summing 1000 iid \(Uniform[0, 1]\) random variables.


      Solution
      x <- runif(1e3)
      microbenchmark::microbenchmark(
        cumsum(x),
        my_cumsum(x),
        my_cumsum2(x)
      )
      ## Unit: nanoseconds
      ##           expr     min        lq       mean    median      uq      max neval
      ##      cumsum(x)     769     852.5    1193.62    1158.0    1406     2724   100
      ##   my_cumsum(x) 1816834 1872994.0 2698061.30 1895991.0 1949558 44749972   100
      ##  my_cumsum2(x)   63870   66673.5   68575.99   67585.5   69117    85055   100
      ##  cld
      ##   a 
      ##    b
      ##   a
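
      As expected, cumsum() is by far the fastest, since its loop runs in compiled code. my_cumsum() is much the slowest because sum(x[1:i]) re-sums from scratch at every iteration, making the loop \(O(n^2)\), whereas my_cumsum2() updates a running total and so is \(O(n)\).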

    1. To start a board game, a player must throw three sixes using a conventional die. Write a function to simulate this, which returns the total number of throws that the player has taken.


      Solution
      sixes1 <- function() {
        # function to simulate the number of throws needed
        # to reach three sixes in total
        # returns the total number of throws taken
        out <- sample(1:6, 3, replace = TRUE)
        while (sum(out == 6) < 3) {
          out <- c(out, sample(1:6, 1))
        }
        return(length(out))
      }
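
      As an aside (not required by the question), the total number of throws is the number of trials up to and including the third six, so an equivalent-in-distribution sketch uses the negative binomial distribution, where rnbinom() counts the non-six throws before the third six:

      sixes1_nb <- function() {
        # 3 sixes plus the failures before the third six
        3 + rnbinom(1, size = 3, prob = 1/6)
      }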

    2. Then use 1000 simulations to empirically estimate the distribution of the number of throws needed.


      Solution
      samp1 <- replicate(1e3, sixes1())
      table(samp1)
      ## samp1
      ##  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 
      ##  3 12 23 42 15 32 61 49 51 57 35 48 50 43 39 49 31 32 27 30 28 25 24 23 18 22 
      ## 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 49 50 51 53 56 62 64 
      ## 13 17  8 13 13  9  9  9  4  5  5  3  3  5  2  1  2  3  1  1  1  1  1  1  1
      hist(samp1, xlab = 'Number of throws', main = 'Histogram of number of throws')

      Figure 2.1: Histogram of empirical distribution of number of throws for starting method 1.


    3. I think you’ll agree that this would be a rather dull board game! It is therefore proposed that a player should instead throw two consecutive sixes. Write a function to simulate this new criterion.


      Solution
      sixes2 <- function() {
        # function to simulate the number of throws needed
        # to get two consecutive sixes
        # returns the total number of throws taken
        out <- sample(1:6, 2, replace = TRUE)
        cond <- TRUE
        while (cond) {
          # consecutive throws summing to 12 must both be sixes
          if (sum(out[1:(length(out) - 1)] + out[2:length(out)] == 12) > 0) {
            cond <- FALSE
          } else {
            out <- c(out, sample(1:6, 1))
          }
        }
        return(length(out))
      }
      sixes2()
      ## [1] 166
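
      Note that sixes2() re-scans all pairs of throws on every iteration; a sketch of a leaner variant checks only the last two throws:

      sixes2_tail <- function() {
        # function to simulate the number of throws needed
        # for two consecutive sixes, checking only the last two
        out <- sample(1:6, 2, replace = TRUE)
        n <- length(out)
        while (out[n - 1] != 6 || out[n] != 6) {
          out <- c(out, sample(1:6, 1))
          n <- n + 1
        }
        n
      }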

    4. Then use 1000 simulations to empirically estimate the distribution of the number of throws needed.


      Solution
      samp2 <- replicate(1e3, sixes2())
      table(samp2)
      ## samp2
      ##   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
      ##  30  24  25  20  17  23  19  19  17  21  14  11  12  12  13  11  17  21  21  18 
      ##  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41 
      ##  20  21  18   8  16  14  11  11  17  15  12  11  13  15   9   4  11   9  13   8 
      ##  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61 
      ##   8   9   5   6  10   7   8   7  12   7   5   7   7   5   7   4   4  10   5   5 
      ##  62  63  64  65  66  67  68  70  71  72  73  74  75  76  77  79  80  81  82  83 
      ##   9   5   1   4   6  10   2   5   5   2   4   6   5   4   4   2   4   5   2   1 
      ##  84  85  86  87  88  89  90  91  92  93  94  95  96  98  99 100 101 102 103 104 
      ##   1   7   3   1   2   3   4   3   3   3   3   5   4   2   3   2   1   4   2   2 
      ## 105 106 107 108 109 111 112 115 116 117 118 119 121 124 126 127 130 132 133 134 
      ##   1   3   1   3   7   2   2   1   2   2   1   2   3   2   4   3   4   2   1   1 
      ## 135 137 138 139 140 144 146 147 148 150 151 152 155 156 157 158 161 167 171 173 
      ##   1   3   1   2   3   3   2   1   2   1   1   1   1   2   1   3   2   1   1   1 
      ## 176 182 183 185 188 192 197 203 211 212 214 215 220 236 253 
      ##   1   1   1   1   1   1   1   1   1   1   1   3   1   1   1
      hist(samp2, xlab = 'Number of throws', main = 'Histogram of number of throws')

      Figure 2.2: Histogram of empirical distribution of number of throws for starting method 2.


    5. By comparing the sample means of the numbers of throws taken, which starting criterion should get a player into the game quickest?


      Solution
      mean(samp1) - mean(samp2)
      ## [1] -26.304

      So, comparing sample means, the first criterion should get a player into the game quickest: the second takes about 26 more throws on average before the game can begin.


  7. Consider the following two functions for calculating \[d_i = x_{i + 1} - x_i, \hspace{2cm} i = 1, \ldots, n - 1\] where \({\bf x}' = (x_1, \ldots, x_n)\).

    diff1 <- function(x) {
      # function to calculate differences of a vector
      # based on a for loop
      # x is a vector
      # returns a vector of length (length(x) - 1)
      out <- numeric(length(x) - 1)
      for (i in 1:(length(x) - 1)) {
        out[i] <- x[i + 1] - x[i]
      }
      out
    }
    diff2 <- function(x) {
      # function to calculate differences of a vector
      # based on vectorisation
      # x is a vector
      # returns a vector of length (length(x) - 1)
      id <- 1:(length(x) - 1)
      x[id + 1] - x[id]
    }

    The first, diff1(), uses a straightforward for() loop to calculate \(d_i\) for \(i = 1, \ldots, n - 1\), whereas diff2() can be seen as a vectorised alternative. Benchmark the two for a vector of \(n = 1000\) iid \(N(0, 1)\) random variates by comparing their median execution times.


    Solution
    x <- rnorm(1000)
    microbenchmark::microbenchmark(
      diff1(x),
      diff2(x)
      )
    ## Unit: microseconds
    ##      expr    min     lq      mean median      uq      max neval cld
    ##  diff1(x) 63.412 65.537 100.11289 66.206 67.4145 3440.630   100   a
    ##  diff2(x)  8.603  9.040  33.05097  9.228  9.5360 2366.981   100   a
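
    Comparing medians (about 66 versus 9 microseconds here), the vectorised diff2() is roughly seven times faster. R also provides a built-in diff() function, and we can check that all three agree:

    all.equal(diff1(x), diff2(x))
    ## [1] TRUE
    all.equal(diff2(x), diff(x))
    ## [1] TRUE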

  8. The following function assesses whether all elements of a logical vector are TRUE.

    all2 <- function(x) {
      # function to calculate whether all elements are TRUE
      # returns a scalar
      # x is a logical vector
      sum(x) == length(x)
    }
    1. Run the following, and then benchmark all2() against R’s built-in function all(), which does the same.

      n <- 1e4
      x1 <- !logical(n)

      Solution
      all2(x1)
      ## [1] TRUE
      all(x1)
      ## [1] TRUE
      microbenchmark::microbenchmark(
        all2(x1),
        all(x1)
      )
      ## Unit: microseconds
      ##      expr    min      lq     mean median     uq      max neval cld
      ##  all2(x1)  4.763  4.8875 16.92501  5.051  5.128 1193.384   100   a
      ##   all(x1) 12.990 13.0405 13.61019 13.587 13.612   29.071   100   a

      We see that both take a similar amount of time.


    2. Now swap the first element of x1 so that it’s FALSE and repeat the benchmarking.


      Solution
      x1[1] <- FALSE
      all2(x1)
      ## [1] FALSE
      all(x1)
      ## [1] FALSE
      microbenchmark::microbenchmark(
        all2(x1),
        all(x1)
      )
      ## Unit: nanoseconds
      ##      expr  min   lq    mean median     uq   max neval cld
      ##  all2(x1) 9191 9476 9802.10   9566 9732.5 18963   100   b
      ##   all(x1)  265  291  353.41    313  330.5  3617   100  a

      Now all2() is much slower. This is because it’s performed a calculation on the entire x1 vector, whereas all() has stopped as soon as it’s found a FALSE.
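
      To see the short-circuiting at work, we could instead place the single FALSE at the end of the vector, so that all() has to scan everything before it can answer; re-running the benchmark below should show its advantage largely disappear (timings will vary by machine).

      x_late <- !logical(n)
      x_late[n] <- FALSE
      microbenchmark::microbenchmark(
        all2(x_late),
        all(x_late)
      )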


    3. Evaluate the function any2() below against R’s built-in function any() similarly.

      any2 <- function(x) {
        # function to calculate whether any elements are TRUE
        # returns a scalar
        # x is a logical vector
        sum(x) > 0
      }

      Solution
      x2 <- logical(n)
      any2(x2)
      ## [1] FALSE
      any(x2)
      ## [1] FALSE
      microbenchmark::microbenchmark(
        any2(x2),
        any(x2)
      )
      ## Unit: microseconds
      ##      expr    min     lq     mean median     uq      max neval cld
      ##  any2(x2)  4.780  4.994 15.96975  5.034  5.097 1087.368   100   a
      ##   any(x2) 13.016 14.971 15.23015 15.602 15.637   18.150   100   a

      We again see that both take a similar amount of time.


    4. Now swap the first element of x2 so that it’s TRUE and repeat the benchmarking.


      Solution
      x2[1] <- TRUE
      microbenchmark::microbenchmark(  
        any2(x2),
        any(x2)
      )
      ## Unit: nanoseconds
      ##      expr  min     lq    mean median     uq  max neval cld
      ##  any2(x2) 4897 5001.5 5103.98 5051.0 5091.0 9518   100   b
      ##   any(x2)  130  139.0  189.47  181.5  191.5 1321   100  a

      This time any2() is much slower, for similar reasoning to all2() being much slower than all(), except that any() stops as soon as it finds a TRUE.
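
      Speed aside, the two are not fully interchangeable: with missing values, any() can still return TRUE if it finds one, whereas any2() propagates the NA through sum(). For example:

      any(c(TRUE, NA))
      ## [1] TRUE
      any2(c(TRUE, NA))
      ## [1] NA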