We’ll start with an example fitting the generalized extreme value (GEV) distribution to a annual maxima for locations represented on a grid, where we use a Gaussian Markov random field (GMRF) to capture variation in the GEV distribution’s parameters over the grid. We’ll later look at other distributions, some GMRF properties, and other GRMF specifications.
We’ll load a few packages, to make for slightly tidier code later.
library(evgmrf)
library(MASS)
library(lattice)
We’ll just generate some multivariate Gaussian data as an example. We’ll consider a \(15 \times 10\) grid, with each cell having 50 years or data, from which we’ll extract annual maxima.
nx <- 15
ny <- 10
nyrs <- 50
ndy <- 365
x <- 1:nx
y <- 1:ny
xy <- expand.grid(x = x, y = y)
D <- sqrt(outer(xy[, 1], xy[, 1], FUN = '-')^2 + outer(xy[, 2], xy[, 2], FUN = '-')^2)
C <- exp(-D^1.9999)
mu <- xy[,1] + xy[,2]
sigma <- 1 + xy[,2] / 20
z <- mvrnorm(nyrs * ndy, mu, sigma * C)
z2 <- array(t(z), c(nx, ny, ndy, nyrs))
z_max <- apply(z2, c(1, 2, 4), max)
Here’s a quick plot of the data, in terms of overall maxima at each location.
levelplot(apply(z_max, 1:2, max), aspect = 'fill')
We need to re-order our array of maxima so that we have \((t, i, j)\), where \(t\) is the time index, \(i\) is the row index, and \(j\) is the column index.
z_max2 <- aperm(z_max, c(3, 1, 2))
And this is enough to fit a GMRF model with evgmrf with
its default arguments.
m0 <- evgmrf(z_max2)
We can plot the model’s estimates
plot(m0, type = 'response')
or we can estimate quantiles from the model
plot(m0, prob = .9)
We can extract the model’s estimates with standard errors
pred0 <- predict(m0, se.fit = TRUE, progress = FALSE)
lapply(pred0, function(x) lapply(x, function(y) y[1:2, 1:5]))
## $fitted
## $fitted$location
## [,1] [,2] [,3] [,4] [,5]
## [1,] 4.714300 5.978334 6.955998 8.071688 9.05263
## [2,] 5.929518 6.863698 8.003890 9.108325 10.19242
##
## $fitted$logscale
## [,1] [,2] [,3] [,4] [,5]
## [1,] -1.0853721 -1.133766 -1.080218 -1.111603 -1.054478
## [2,] -0.9922232 -1.103613 -1.073816 -1.097977 -1.067473
##
## $fitted$transshape
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.4463217 0.4302799 0.4281980 0.4175678 0.4144532
## [2,] 0.4926327 0.4414815 0.4272417 0.4232714 0.4074620
##
##
## $se
## $se$location
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.04948484 0.04698397 0.05008356 0.04805460 0.05095972
## [2,] 0.05624265 0.04836904 0.05019169 0.04867384 0.05036493
##
## $se$logscale
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.06528790 0.05999707 0.05538770 0.05770479 0.05653681
## [2,] 0.05402853 0.05279869 0.04978733 0.05209812 0.05124136
##
## $se$transshape
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.09862342 0.09084202 0.08402685 0.08317886 0.08353744
## [2,] 0.07573394 0.07771057 0.07458366 0.07568578 0.07524902
We can extract the model’s estimates with standard errors
pred1 <- predict(m0, se.fit = TRUE, progress = FALSE, prob = .99)
lapply(pred1, function(x) lapply(x, function(y) y[1:2, 1:5]))
## $fitted
## $fitted$q_0.99
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5.999369 7.187725 8.229879 9.296197 10.34604
## [2,] 7.391145 8.120861 9.285013 10.355069 11.46231
##
##
## $se
## $se$q_0.99
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.1326147 0.1179958 0.1105297 0.1115269 0.1143054
## [2,] 0.1124677 0.1087401 0.1041876 0.1060653 0.1077551
evgmrfThe following distributions can be estimated with
evgmrf(..., family = '...'):
family = 'gev', the generalized extreme value (GEV)
distribution (as seen above)family = 'gpd', the generalized Pareto distribution
(GPD)family = 'rlarge', the \(r\)-largest order statistics modelsfamily = 'ald', the asymmetric Laplace
distributionfamily = 'gev'The GEV distribution is appropriate for block maxima of sufficiently large blocks. Here years will be considered as blocks, to help intuition; henceforth we will refer to maxima. A random variable \(Y\) that is GEV distributed has cumulative distribution function (CDF) \[ F_\text{GEV}(y; \mu, \psi, \xi) = \exp\left\{-\left[1 + \xi\left(\dfrac{y - \mu}{\psi}\right)\right]^{-1/\xi}\right\}, \] which is defined for \(\{y : 1 + \xi (y - \mu) / \psi > 0\}\) with \((\mu, \psi, \xi) \in \mathbb{R} \times \mathbb{R}^+ \times \mathbb{R}/\{0\}\). The limit \(\xi \to 0\) is used for the \(\xi = 0\) case, which corresponds to the Gumbel CDF, \(\exp(-\exp\{-[(y - \mu)/\psi]\})\). For all models this limit is invoked in evgmrf if \(|\xi| < 10^{-6}\).
The \(r\)-largest order statistics models capture the GEV as the special case of \(r=1\); i.e., instead of modelling just block maxima, the \(r\)-largest values in a block are modelled. This uses the same parameterisation as the GEV, with a resulting joint pdf for \(y^{(1)} \geq \ldots \geq y^{(r)}\) given by \[ f_r(y^{(1)}, \ldots, y^{(r)}; \mu, \psi, \xi) = \exp\left\{-\left[1 + \xi\left(\dfrac{y^{(r)} - \mu}{\psi}\right)\right]^{-1/\xi}\right\} \prod_{i = 1}^r \psi^{-1} \left[1 + \xi\left(\dfrac{y^{(i)} - \mu}{\psi}\right)\right]^{-1/\xi - 1}, \] which is defined as above and for for \(\{y^{(i)} : 1 + \xi (y^{(i)} - \mu) / \psi > 0\}\), for \(i = 1, \ldots, r\).
For these model, we can take the earlier data, and extract the \(r\)-largest order statistics instead. Let’s take \(r = 3\).
r <- 3
z_r <- apply(z2, c(1, 2, 4), function(x) sort(x, decreasing = TRUE)[1:r])
z_r2 <- aperm(z_r, c(4, 1, 2, 3))
m_rlarge <- evgmrf(z_r2, family = 'rlarge')
When we plot the \(r\)-largest order statistics model’s estimates, we’re plotting the same values as for the GEV
plot(m_rlarge, type = 'response')
and see good agreement between the estimates.
The GPD is used to model excesses of a high threshold \(u\). For a random variable \(Y\), it is a model for the conditional distribution \((Y-u) \mid (Y > u)\) with CDF \[ F_\text{GPD}^{(u)}(y; \psi_u, \xi) = 1 - \left[1 + \xi\left(\dfrac{y}{\psi_u}\right)\right]^{-1/\xi}, \] which is defined for \(\{y : y > 0 \text{ and } 1 + \xi y / \psi_u > 0\}\) with \((\psi_u, \xi) \in \mathbb{R}^+ \times \mathbb{R}/\{0\}\). The exponential CDF, \(1 - \exp(-y/\psi_u)\), is used for the \(\xi = 0\) case.
To demonstrate this model, we’ll take the earlier data and use a 98%
empirical quantile estimate as the threshold, and set anything below the
threshold as NA.
z3 <- apply(z2, 1:2, c)
u <- round(apply(z2, 1:2, quantile, .98), 1)
z_exc <- z3 - rep(u, each = nyrs * ndy)
z_exc[z_exc <= 0] <- NA
Then, before fitting the model, we’ll re-order data across locations
to put NAs at the end, and then drop any redundant
dimensions using tighten().
z_gpd <- tighten(z_exc)
m_gpd <- evgmrf(z_gpd, family = 'gpd')
A plot of this model’s estimates is consistent with that seen earlier.
plot(m_gpd, type = 'response')
The ALD is not an EVD in the usual sense. It is useful in threshold-based extreme value analyses for allowing quantile estimation (Yu and Moyeed 2001). The GPD and Poisson-GPD models rely on a `high’ threshold. Coles (2001, chap. 4) discusses assessing its choice. High can be sometimes be intuitively defined through a high quantile, e.g., 0.9, 0.95 or 0.99. Quantile regression can be used to estimate such thresholds, especially covariate-dependent thresholds. The ALD has density function \[ f_{\text{ALD}, \tau}(y; u, \sigma, \tau) = \dfrac{\tau(1 - \tau)}{\sigma} \exp\left\{-\rho_\tau\left(\dfrac{y - u}{\sigma}\right)\right\}, \] where \(\rho_\tau(y) = y(\tau - I\{y < 0\})\) is the check function, for indicator function \(I\{\}\); see Koenker (2005) for an overview of quantile regression. The modified check function of Oh, Lee, and Nychka (2011) is used in evgam to ease inference.
For this model, we’ll again consider the 98th percentile. However, we’ll subset the data to the largest 10% at each location, and then estimate their 80th percentile. This is somewhat informative for occasions when we want to use quantile regression to estimate a high threshold, but want to significantly speed up its estimation.
The following picks out the highest 10% of the data
z_ald <- apply(z3, 2:3, function(x) sort(x, decreasing = TRUE)[1:(.1 * nyrs * ndy)])
and then this evgmrf call uses quantile regression to
estimate the 80th percentile.
m_ald <- evgmrf(z_ald, family = 'ald', args = list(tau = .8))
We can plot the ALD’s location and (log) scale parameters
plot(m_ald)
and we can also see that this approach gives good agreement to the earlier empirical 98th percentile estimates.
plot(predict(m_ald)$location, u, asp = 1)
abline(0, 1, col = 2)