MTH3045: Statistical Computing: Exercises
16/01/2024
1 Chapter 1 exercises
- Generate a sample of \(n = 20\) \(N(1, 3^2)\) random variates, and without using mean(), var() or sd(), write R functions to calculate the sample mean, \(\bar x\), sample variance, \(s^2\), and sample standard deviation, \(s\), where \[ \bar x = \dfrac{1}{n} \sum_{i = 1}^n x_i \text{ and } s^2 = \dfrac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar x)^2. \] Note that sum() may be used.
Solution
n <- 20
y <- rnorm(n, 1, 3)
# mean
mean2 <- function(x) {
  # function to calculate mean of a vector
  # x is a vector
  # returns a scalar
  sum(x) / length(x)
}
mean2(y)
## [1] 0.5542768
all.equal(mean2(y), mean(y)) # check against built-in (assumed form of the comparison)
## [1] TRUE
# variance
var2 <- function(x) {
  # function to calculate variance of a vector
  # x is a vector
  # returns a scalar
  xbar <- mean2(x)
  sum((x - xbar)^2) / (length(x) - 1)
}
var2(y)
## [1] 7.871484
all.equal(var2(y), var(y)) # check against built-in (assumed form of the comparison)
## [1] TRUE
# standard deviation
sd2 <- function(x) {
  # function to calculate standard deviation of a vector
  # x is a vector
  # returns a scalar
  sqrt(var2(x))
}
sd2(y)
## [1] 2.805617
all.equal(sd2(y), sd(y)) # check against built-in (assumed form of the comparison)
## [1] TRUE
Note that above for mean2() we’ve not used a for loop. A for loop could be used, but usually if one can be avoided, then it should be. Then for var2() we’ve re-used mean2(), and for sd2() we’ve re-used var2(). It’s often a good idea to re-use functions. In this case, we break the function sd2() into various smaller functions, which can often be good practice, as whether the final function is correct can be assessed by whether the simpler functions it comprises are correct.
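For comparison, a for loop version of the same calculation might look like the sketch below, with mean3() just an illustrative name; the vectorised sum() version above is shorter and typically faster in R.
mean3 <- function(x) {
  # for-loop version of the mean calculation, for illustration only
  # x is a vector
  # returns a scalar
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
  }
  total / length(x)
}
mean3(y) # should agree with mean2(y)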
- Consider computing \(\text{Pr}(Z \geq z) = 1 - \Phi(z)\) where \(Z \sim \text{Normal}(0, 1)\), or, for short, \(Z \sim N(0, 1)\). For \(z = 0, 0.5, 1, 1.5, 2, \ldots\) compute this in R in three different ways using the following three commands, and find the lowest value of \(z\) for which the three don’t give the same answer.
Solution
## z
## [1,] 0.0 5.000000e-01 5.000000e-01 5.000000e-01
## [2,] 0.5 3.085375e-01 3.085375e-01 3.085375e-01
## [3,] 1.0 1.586553e-01 1.586553e-01 1.586553e-01
## [4,] 1.5 6.680720e-02 6.680720e-02 6.680720e-02
## [5,] 2.0 2.275013e-02 2.275013e-02 2.275013e-02
## [6,] 2.5 6.209665e-03 6.209665e-03 6.209665e-03
## [7,] 3.0 1.349898e-03 1.349898e-03 1.349898e-03
## [8,] 3.5 2.326291e-04 2.326291e-04 2.326291e-04
## [9,] 4.0 3.167124e-05 3.167124e-05 3.167124e-05
## [10,] 4.5 3.397673e-06 3.397673e-06 3.397673e-06
## [11,] 5.0 2.866516e-07 2.866516e-07 2.866516e-07
## [12,] 5.5 1.898956e-08 1.898956e-08 1.898956e-08
## [13,] 6.0 9.865876e-10 9.865877e-10 9.865876e-10
## [14,] 6.5 4.016001e-11 4.015999e-11 4.016001e-11
## [15,] 7.0 1.279813e-12 1.279865e-12 1.279813e-12
Above we see that for \(z = 6.0\), the second approximation to the standard normal tail probability doesn’t give the same answer as the other two approximations (although the discrepancy is tiny).
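As a minimal sketch, a table like the one above can be produced as follows, assuming the three commands are pnorm(z, lower.tail = FALSE), 1 - pnorm(z) and pnorm(-z); their identity and ordering here are assumptions consistent with the output, not taken from the exercise.
z <- seq(0, 7, by = 0.5)
p1 <- pnorm(z, lower.tail = FALSE) # upper tail computed directly
p2 <- 1 - pnorm(z)                 # subtracts a number very close to 1
p3 <- pnorm(-z)                    # upper tail via symmetry of the normal
cbind(z, p1, p2, p3)
If the middle approximation is indeed 1 - pnorm(z), the discrepancy for large \(z\) arises because subtracting \(\Phi(z) \approx 1\) from 1 discards most of its significant digits, whereas the other two commands compute the small tail probability directly.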
- The formula \(\text{Var}(Y) = \text{E}(Y^2) - [\text{E}(Y)]^2\) is sometimes called the ‘short-cut’ variance formula, i.e. a short-cut for \(\text{Var}(Y) = \text{E}[Y - \text{E}(Y)]^2\). Compare computing the biased version of \(\text{Var}(Y)\) using the two formulae above for the samples y1 and y2 below.
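As a quick check that the two formulae agree in exact arithmetic, expanding the square gives \[ \text{E}[Y - \text{E}(Y)]^2 = \text{E}(Y^2) - 2[\text{E}(Y)]^2 + [\text{E}(Y)]^2 = \text{E}(Y^2) - [\text{E}(Y)]^2, \] so any difference between the two computations must come from finite-precision arithmetic.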
Solution
We’ll start with a function to calculate the short-cut formula
bvar1 <- function(x) {
  # function to calculate short-cut variance
  # x is a vector
  # returns a scalar
  mean(x^2) - mean(x)^2
}
and then we’ll write a function to calculate the other formula.
bvar2 <- function(x) {
  # function to calculate variance
  # x is a vector
  # returns a scalar
  mean((x - mean(x))^2)
}
For y1 we have
bvar1(y1)
## [1] 8.25
bvar2(y1)
## [1] 8.25
and so both formulae give the same answer, but for y2 we have
bvar1(y2)
## [1] 0
bvar2(y2)
## [1] 8.25
and see that they don’t. The short-cut formula is clearly wrong, because the variance of a sample doesn’t change if we add a constant, which is the only difference between y1 and y2. (We’ll learn about the cause of this in Chapter 2.)
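The exercise’s y1 and y2 aren’t reproduced here, but as a hypothetical illustration consistent with the output above (the values below are assumed, not the exercise’s data), a small sample shifted by a large constant shows the same failure:
y1 <- 1:10     # hypothetical sample whose biased variance is 8.25
y2 <- y1 + 1e9 # the same sample shifted by a large constant
bvar2(y2)      # two-pass formula: still returns 8.25
bvar1(y2)      # short-cut formula: cancellation destroys the answer (likely 0)
Here mean(y2^2) and mean(y2)^2 are both near \(10^{18}\), so their true difference of 8.25 is far smaller than the spacing between representable double-precision numbers at that magnitude.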