Package 'spStack'

Title: Bayesian Geostatistics Using Predictive Stacking
Description: Fits Bayesian hierarchical spatial process models for point-referenced Gaussian, Poisson, binomial, and binary data using stacking of predictive densities. It involves sampling from analytically available posterior distributions conditional upon some candidate values of the spatial process parameters and, subsequently assimilate inference from these individual posterior distributions using Bayesian predictive stacking. Our algorithm is highly parallelizable and hence, much faster than traditional Markov chain Monte Carlo algorithms while delivering competitive predictive performance. See Zhang, Tang, and Banerjee (2024) <doi:10.48550/arXiv.2304.12414>, and, Pan, Zhang, Bradley, and Banerjee (2024) <doi:10.48550/arXiv.2406.04655> for details.
Authors: Soumyakanti Pan [aut, cre] , Sudipto Banerjee [aut]
Maintainer: Soumyakanti Pan <[email protected]>
License: GPL-3
Version: 1.0.1
Built: 2025-01-08 06:37:47 UTC
Source: https://github.com/span-18/spstack-dev

Help Index


spStack: Bayesian Geostatistics Using Predictive Stacking

Description

This package delivers functions to fit Bayesian hierarchical spatial process models for point-referenced Gaussian, Poisson, binomial, and binary data using stacking of predictive densities. It involves sampling from analytically available posterior distributions conditional upon some candidate values of the spatial process parameters for both Gaussian response model as well as non-Gaussian responses, and, subsequently assimilate inference from these individual posterior distributions using Bayesian predictive stacking. Our algorithm is highly parallelizable and hence, much faster than traditional Markov chain Monte Carlo algorithms while delivering competitive predictive performance.

In context of inference for spatial point-referenced data, Bayesian hierarchical models involve latent spatial processes characterized by spatial process parameters, which besides lacking substantive relevance in scientific contexts, are also weakly identified and hence, impedes convergence of MCMC algorithms. This motivates us to build methodology that involves fast sampling from posterior distributions conditioned on a grid of the weakly identified model parameters and combine the inference by stacking of predictive densities (Yao et. al 2018). We exploit the Bayesian conjugate linear modeling framework for the Gaussian case (Zhang, Tang and Banerjee 2024) and the generalized conjugate multivariate distribution theory (Pan, Zhang, Bradley and Banerjee 2024) to analytically derive the individual posterior distributions.

Details

Package: spStack
Type: Package
Version: 1.0.1
License: GPL-3

Accepts a formula, e.g., y~x1+x2, for most regression models accompanied by candidate values of spatial process parameters, and returns posterior samples of the regression coefficients and the latent spatial random effects. Posterior inference or prediction of any quantity of interest proceed from these samples. Main functions are -
spLMexact()
spGLMexact()
spLMstack()
spGLMstack()

Author(s)

Maintainer: Soumyakanti Pan [email protected] (ORCID)

Authors:

References

Zhang L, Tang W, Banerjee S (2024). "Bayesian Geostatistics Using Predictive Stacking."
doi:10.48550/arXiv.2304.12414.

Pan S, Zhang L, Bradley JR, Banerjee S (2024). "Bayesian Inference for Spatial-temporal Non-Gaussian Data Using Predictive Stacking." doi:10.48550/arXiv.2406.04655.

Yao Y, Vehtari A, Simpson D, Gelman A (2018). "Using Stacking to Average Bayesian Predictive Distributions (with Discussion)." Bayesian Analysis, 13(3), 917-1007. doi:10.1214/17-BA1091.

See Also

Useful links:


Different Cholesky factor updates

Description

Provides functions that implements different types of updates of a Cholesky factor that includes rank-one update, single row/column deletion update and a block deletion update.

Usage

cholUpdateRankOne(A, v, alpha, beta, lower = TRUE)

cholUpdateDel(A, del.index, lower = TRUE)

cholUpdateDelBlock(A, del.start, del.end, lower = TRUE)

Arguments

A

an n×nn\times n triangular matrix

v

an n×1n\times 1 matrix/vector

alpha

scalar; if not supplied, default is 1

beta

scalar; if not supplied, default is 1

lower

logical; if A is lower-triangular or not

del.index

an integer from 1 to nn indicating the row/column to be deleted

del.start

an integer from 1 to nn indicating the first row/column of a block to be deleted, must be at least 1 less than del.end

del.end

an integer from 1 to nn indicating the last row/column of a block to be deleted, must be at least 1 more than del.start

Details

Suppose B=AAB = AA^\top is a n×nn \times n matrix with AA being its lower-triangular Cholesky factor. Then rank-one update corresponds to finding the Cholesky factor of the matrix C=αB+βvvC = \alpha B + \beta vv^\top for some α,βR\alpha,\beta\in\mathbb{R} given AA (see, Krause and Igel 2015). Similarly, single row/column deletion update corresponds to finding the Cholesky factor of the (n1)×(n1)(n-1)\times(n-1) matrix BiB_i which is obtained by removing the ii-th row and column of BB, given AA for some i1,,ni - 1, \ldots, n. Lastly, block deletion corresponds to finding the Cholesky factor of the (nnk)×(nnk)(n-n_k)\times(n-n_k) matrix BIB_{I} for a subset II of {1,,n}\{1, \ldots, n\} containing nkn_k consecutive indices, given the factor AA.

Value

An m×mm \times m lower-triangular matrix with m=nm = n in case of cholUpdateRankOne(), m=n1m = n - 1 in case of cholUpdateDel(), and, m=nnkm = n - n_k in case of cholUpdateDelBlock() where nkn_k is the size of the block removed.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

References

Oswin Krause and Christian Igel. 2015. "A More Efficient Rank-one Covariance Matrix Update for Evolution Strategies". In Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII (FOGA '15). Association for Computing Machinery, New York, NY, USA, 129-136. doi:10.1145/2725494.2725496.

Examples

n <- 10
A <- matrix(rnorm(n^2), n, n)
A <- crossprod(A)
cholA <- chol(A)

## Rank-1 update
v <- 1:n
APlusvvT <- A + tcrossprod(v)
cholA1 <- t(chol(APlusvvT))
cholA2 <- cholUpdateRankOne(cholA, v, lower = FALSE)
print(all(abs(cholA1 - cholA2) < 1E-9))

## Single Row-deletion update
ind <- 2
A1 <- A[-ind, -ind]
cholA1 <- t(chol(A1))
cholA2 <- cholUpdateDel(cholA, del.index = ind, lower = FALSE)
print(all(abs(cholA1 - cholA2) < 1E-9))

## Block-deletion update
start_ind <- 2
end_ind <- 6
del_ind <- c(start_ind:end_ind)
A1 <- A[-del_ind, -del_ind]
cholA1 <- t(chol(A1))
cholA2 <- cholUpdateDelBlock(cholA, start_ind, end_ind, lower = FALSE)
print(all(abs(cholA1 - cholA2) < 1E-9))

Optimal stacking weights

Description

Obtains optimal stacking weights given leave-one-out predictive densities for each candidate model.

Usage

get_stacking_weights(log_loopd, solver = "ECOS")

Arguments

log_loopd

an n×Mn \times M matrix with ii-th row containing the leave-one-out predictive densities for the ii-th data point for the MM candidate models.

solver

specifies the solver to use for obtaining optimal weights. Default is "ECOS". Internally calls CVXR::psolve().

Value

A list of length 2.

weights

optimal stacking weights as a numeric vector of length MM

status

solver status, returns "optimal" if solver succeeded.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

References

Yao Y, Vehtari A, Simpson D, Gelman A (2018). "Using Stacking to Average Bayesian Predictive Distributions (with Discussion)." Bayesian Analysis, 13(3), 917-1007. doi:10.1214/17-BA1091.

See Also

CVXR::psolve(), spLMstack(), spGLMstack()

Examples

data(simGaussian)
dat <- simGaussian[1:100, ]

mod1 <- spLMstack(y ~ x1, data = dat,
                  coords = as.matrix(dat[, c("s1", "s2")]),
                  cor.fn = "matern",
                  params.list = list(phi = c(1.5, 3),
                                     nu = c(0.5, 1),
                                     noise_sp_ratio = c(1)),
                  n.samples = 1000, loopd.method = "exact",
                  parallel = FALSE, solver = "ECOS", verbose = TRUE)

loopd_mat <- do.call('cbind', mod1$loopd)
w_hat <- get_stacking_weights(loopd_mat)
print(round(w_hat$weights, 4))
print(w_hat$status)

Calculate distance matrix

Description

Computes the inter-site Euclidean distance matrix for one or two sets of points.

Usage

iDist(coords.1, coords.2, ...)

Arguments

coords.1

an n×pn\times p matrix with each row corresponding to a point in pp-dimensional space.

coords.2

an m×pm\times p matrix with each row corresponding to a point in pp dimensional space. If this is missing then coords.1 is used.

...

currently no additional arguments.

Value

The n×nn\times n or n×mn\times m inter-site Euclidean distance matrix.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

Examples

n <- 10
p1 <- cbind(runif(n),runif(n))
m <- 5
p2 <- cbind(runif(m),runif(m))
D <- iDist(p1, p2)

Simulate spatial data on unit square

Description

Generates synthetic spatial data of different types where the spatial co-ordinates are sampled uniformly on an unit square. Different types include point-referenced Gaussian, Poisson, binomial and binary data. The design includes an intercept and fixed covariates sampled from a standard normal distribution.

Usage

sim_spData(n, beta, cor.fn, spParams, spvar, deltasq, family, n_binom)

Arguments

n

sample size.

beta

a pp-dimensional vector of fixed effects.

cor.fn

a quoted keyword that specifies the correlation function used to model the spatial dependence structure among the observations. Supported covariance model key words are: 'exponential' and 'matern'.

spParams

a numeric vector containing spatial process parameters - e.g., spatial decay and smoothness.

spvar

value of spatial variance parameter.

deltasq

value of noise-to-spatial variance ratio.

family

a character specifying the distribution of the response as a member of the exponential family. Valid inputs are 'gaussian', 'poisson', 'binary', and 'binomial'.

n_binom

necessary only when family = 'binomial'. Must be a vector of length n that will specify the number of trials for each observation. If it is of length 1, then that value is considered to be the common value for the number of trials for all n observations.

Value

a data.frame object containing the columns -

⁠s1, s2⁠

2D-coordinates in unit square

⁠x1, x2, ...⁠

covariates, not including intercept

y

response

n_trials

present only when binomial data is generated

z_true

true spatial effects with which the data is generated

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

Examples

set.seed(1729)
n <- 10
beta <- c(2, 5)
phi0 <- 2
nu0 <- 0.5
spParams <- c(phi0, nu0)
spvar <- 0.4
deltasq <- 1
sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern",
                   spParams = spParams, spvar = spvar, deltasq = deltasq,
                   family = "gaussian")

Synthetic point-referenced binary data

Description

Dataset of size 500, with a binary response variable indexed by spatial coordinates sampled uniformly from the unit square. The model includes one covariate and spatial random effects induced by a Matérn covariogram.

Usage

data(simBinary)

Format

a data.frame object.

⁠s1, s2⁠

2-D coordinates; latitude and longitude.

x1

a covariate sampled from the standard normal distribution.

y

response vector (0/1).

z_true

true spatial random effects that generated the data.

Details

With n=500n = 500, the binary data is simulated using

y(si)Bernoulli(π(si)),i=1,,n,π(si)=ilogit(x(si)β+z(si))\begin{aligned} y(s_i) &\sim \mathrm{Bernoulli}(\pi(s_i)), i = 1, \ldots, n,\\ \pi(s_i) &= \mathrm{ilogit}(x(s_i)^\top \beta + z(s_i)) \end{aligned}

where the function ilogit\mathrm{ilogit} refers to the inverse-logit function, the spatial effects zN(0,σ2R)z \sim N(0, \sigma^2 R) with RR being a n×nn \times n correlation matrix given by the Matérn covariogram

R(s,s)=(ϕss)νΓ(ν)2ν1Kν(ϕss),R(s, s') = \frac{(\phi |s-s'|)^\nu}{\Gamma(\nu) 2^{\nu - 1}} K_\nu(\phi |s-s'|),

where ϕ\phi is the spatial decay parameter and ν\nu the spatial smoothness parameter. We have sampled the data with β=(0.5,0.5)\beta = (0.5, -0.5), ϕ=5\phi = 5, ν=0.5\nu = 0.5, and σ2=0.4\sigma^2 = 0.4. This data can be generated with the code as given in the example below.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

See Also

simGaussian, simPoisson, simBinom

Examples

set.seed(1729)
n <- 500
beta <- c(0.5, -0.5)
phi0 <- 5
nu0 <- 0.5
spParams <- c(phi0, nu0)
spvar <- 0.4
sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern",
                   spParams = spParams, spvar = spvar, deltasq = deltasq,
                   family = "binary")
plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true")

library(ggplot2)
plot2 <- ggplot(sim1, aes(x = s1, y = s2)) +
  geom_point(aes(color = factor(y)), alpha = 0.75) +
  scale_color_manual(values = c("red", "blue"), labels = c("0", "1")) +
  guides(alpha = 'none') +
  theme_bw() +
  theme(axis.ticks = element_line(linewidth = 0.25),
        panel.background = element_blank(),
        panel.grid = element_blank(),
        legend.title = element_text(size = 10, hjust = 0.25),
        legend.box.just = "center", aspect.ratio = 1)

Synthetic point-referenced binomial count data

Description

Dataset of size 500, with a binomial response variable indexed by spatial coordinates sampled uniformly from the unit square. The model includes one covariate and spatial random effects induced by a Matérn covariogram. The number of trials at each location is sampled from a Poisson distribution with mean 20.

Usage

data(simBinom)

Format

a data.frame object.

⁠s1, s2⁠

2-D coordinates; latitude and longitude.

x1

a covariate sampled from the standard normal distribution.

y

response vector.

n_trials

Number of trials at that location.

z_true

true spatial random effects that generated the data.

Details

With n=500n = 500, the count data is simulated using

y(si)Binomial(m(si),π(si)),i=1,,n,π(si)=ilogit(x(si)β+z(si))\begin{aligned} y(s_i) &\sim \mathrm{Binomial}(m(s_i), \pi(s_i)), i = 1, \ldots, n,\\ \pi(s_i) &= \mathrm{ilogit}(x(s_i)^\top \beta + z(s_i)) \end{aligned}

where the function ilogit\mathrm{ilogit} refers to the inverse-logit function, the number of trials m(si)m(s_i) is sampled from a Poisson distribution with mean 20, the spatial effects zN(0,σ2R)z \sim N(0, \sigma^2 R) with RR being a n×nn \times n correlation matrix given by the Matérn covariogram

R(s,s)=(ϕss)νΓ(ν)2ν1Kν(ϕss),R(s, s') = \frac{(\phi |s-s'|)^\nu}{\Gamma(\nu) 2^{\nu - 1}} K_\nu(\phi |s-s'|),

where ϕ\phi is the spatial decay parameter and ν\nu the spatial smoothness parameter. We have sampled the data with β=(0.5,0.5)\beta = (0.5, -0.5), ϕ=3\phi = 3, ν=0.5\nu = 0.5, and σ2=0.4\sigma^2 = 0.4. This data can be generated with the code as given in the example below.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

See Also

simGaussian, simPoisson, simBinary

Examples

set.seed(1729)
n <- 500
beta <- c(0.5, -0.5)
phi0 <- 3
nu0 <- 0.5
spParams <- c(phi0, nu0)
spvar <- 0.4
sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern",
                   spParams = spParams, spvar = spvar, deltasq = deltasq,
                   n_binom = rpois(n, 20),
                   family = "binomial")
plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true")

library(ggplot2)
plot2 <- ggplot(sim1, aes(x = s1, y = s2)) +
  geom_point(aes(color = y), alpha = 0.75) +
  scale_color_distiller(palette = "RdYlGn", direction = -1,
                        label = function(x) sprintf("%.0f", x)) +
  guides(alpha = 'none') +
  theme_bw() +
  theme(axis.ticks = element_line(linewidth = 0.25),
        panel.background = element_blank(),
        panel.grid = element_blank(),
        legend.title = element_text(size = 10, hjust = 0.25),
        legend.box.just = "center", aspect.ratio = 1)

Synthetic point-referenced Gaussian data

Description

Dataset of size 500 with a Gaussian response variable, simulated with spatial coordinates sampled uniformly from the unit square. The model includes one covariate and spatial random effects induced by a Matérn covariogram.

Usage

data(simGaussian)

Format

a data.frame object.

⁠s1, s2⁠

2-D coordinates; latitude and longitude.

x1

a covariate sampled from the standard normal distribution.

y

response vector.

z_true

true spatial random effects that generated the data.

Details

The data is generated using the model

y=Xβ+z+ϵ,y = X \beta + z + \epsilon,

where the spatial effects zN(0,σ2R)z \sim N(0, \sigma^2 R) is independent of the measurement error ϵN(0,δ2σ2In)\epsilon \sim N(0, \delta^2 \sigma^2 I_n) with δ2\delta^2 being the noise-to-spatial variance ratio and RR being a n×nn \times n correlation matrix given by the Matérn covariogram

R(s,s)=(ϕss)νΓ(ν)2ν1Kν(ϕss),R(s, s') = \frac{(\phi |s-s'|)^\nu}{\Gamma(\nu) 2^{\nu - 1}} K_\nu(\phi |s-s'|),

where ϕ\phi is the spatial decay parameter and ν\nu the spatial smoothness parameter. We have sampled the data with β=(2,5)\beta = (2, 5), ϕ=2\phi = 2, ν=0.5\nu = 0.5, δ2=1\delta^2 = 1 and σ2=0.4\sigma^2 = 0.4. This data can be generated with the code as given in the example.

Author(s)

Soumyakanti Pan [email protected]

See Also

simPoisson, simBinom, simBinary

Examples

set.seed(1729)
n <- 500
beta <- c(2, 5)
phi0 <- 2
nu0 <- 0.5
spParams <- c(phi0, nu0)
spvar <- 0.4
deltasq <- 1
sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern",
                   spParams = spParams, spvar = spvar, deltasq = deltasq,
                   family = "gaussian")
plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true",
                     mark_points = TRUE)
plot1

library(ggplot2)
plot2 <- ggplot(sim1, aes(x = s1, y = s2)) +
  geom_point(aes(color = y), alpha = 0.75) +
  scale_color_distiller(palette = "RdYlGn", direction = -1,
                        label = function(x) sprintf("%.0f", x)) +
  guides(alpha = 'none') + theme_bw() +
  theme(axis.ticks = element_line(linewidth = 0.25),
        panel.background = element_blank(), panel.grid = element_blank(),
        legend.title = element_text(size = 10, hjust = 0.25),
        legend.box.just = "center", aspect.ratio = 1)
plot2

Synthetic point-referenced Poisson count data

Description

Dataset of size 500, with a Poisson distributed response variable indexed by spatial coordinates sampled uniformly from the unit square. The model includes one covariate and spatial random effects induced by a Matérn covariogram.

Usage

data(simPoisson)

Format

a data.frame object.

⁠s1, s2⁠

2-D coordinates; latitude and longitude.

x1

a covariate sampled from the standard normal distribution.

y

response vector.

z_true

true spatial random effects that generated the data.

Details

With n=500n = 500, the count data is simulated using

y(si)Poisson(λ(si)),i=1,,n,logλ(si)=x(si)β+z(si)\begin{aligned} y(s_i) &\sim \mathrm{Poisson}(\lambda(s_i)), i = 1, \ldots, n,\\ \log \lambda(s_i) &= x(s_i)^\top \beta + z(s_i) \end{aligned}

where the spatial effects zN(0,σ2R)z \sim N(0, \sigma^2 R) with RR being a n×nn \times n correlation matrix given by the Matérn covariogram

R(s,s)=(ϕss)νΓ(ν)2ν1Kν(ϕss),R(s, s') = \frac{(\phi |s-s'|)^\nu}{\Gamma(\nu) 2^{\nu - 1}} K_\nu(\phi |s-s'|),

where ϕ\phi is the spatial decay parameter and ν\nu the spatial smoothness parameter. We have sampled the data with β=(2,0.5)\beta = (2, -0.5), ϕ=5\phi = 5, ν=0.5\nu = 0.5, and σ2=0.4\sigma^2 = 0.4. This data can be generated with the code as given in the example below.

See Also

simGaussian, simBinom, simBinary

Examples

set.seed(1729)
n <- 500
beta <- c(2, -0.5)
phi0 <- 5
nu0 <- 0.5
spParams <- c(phi0, nu0)
spvar <- 0.4
sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern",
                   spParams = spParams, spvar = spvar, deltasq = deltasq,
                   family = "poisson")

# Plot an interpolated spatial surface of the true random spatial effects
plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true")

# Plot the simulated count data
library(ggplot2)
plot2 <- ggplot(sim1, aes(x = s1, y = s2)) +
  geom_point(aes(color = y), alpha = 0.75) +
  scale_color_distiller(palette = "RdYlGn", direction = -1,
                        label = function(x) sprintf("%.0f", x)) +
  guides(alpha = 'none') + theme_bw() +
  theme(axis.ticks = element_line(linewidth = 0.25),
        panel.background = element_blank(), panel.grid = element_blank(),
        legend.title = element_text(size = 10, hjust = 0.25),
        legend.box.just = "center", aspect.ratio = 1)

Univariate Bayesian spatial generalized linear model

Description

Fits a Bayesian spatial generalized linear model with fixed values of spatial process parameters and some auxiliary model parameters. The output contains posterior samples of the fixed effects, spatial random effects and, if required, finds leave-one-out predictive densities.

Usage

spGLMexact(
  formula,
  data = parent.frame(),
  family,
  coords,
  cor.fn,
  priors,
  spParams,
  boundary = 0.5,
  n.samples,
  loopd = FALSE,
  loopd.method = "exact",
  CV.K = 10,
  loopd.nMC = 500,
  verbose = TRUE,
  ...
)

Arguments

formula

a symbolic description of the regression model to be fit. See example below.

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spGLMexact is called.

family

Specifies the distribution of the response as a member of the exponential family. Supported options are 'poisson', 'binomial' and 'binary'.

coords

an n×2n \times 2 matrix of the observation coordinates in R2\mathbb{R}^2 (e.g., easting and northing).

cor.fn

a quoted keyword that specifies the correlation function used to model the spatial dependence structure among the observations. Supported covariance model key words are: 'exponential' and 'matern'. See below for details.

priors

(optional) a list with each tag corresponding to a hyperparameter name and containing hyperprior details. Valid tags include V.beta, nu.beta, nu.z and sigmaSq.xi. Values of nu.beta and nu.z must be at least 2.1. If not supplied, uses defaults.

spParams

fixed values of spatial process parameters.

boundary

Specifies the boundary adjustment parameter. Must be a real number between 0 and 1. Default is 0.5.

n.samples

number of posterior samples to be generated.

loopd

logical. If loopd=TRUE, returns leave-one-out predictive densities, using method as given by loopd.method. Default is FALSE.

loopd.method

character. Ignored if loopd=FALSE. If loopd=TRUE, valid inputs are 'exact', 'CV' and 'PSIS'. The option 'exact' corresponds to exact leave-one-out predictive densities which requires computation almost equivalent to fitting the model nn times. The options 'CV' and 'PSIS' are faster and they implement KK-fold cross validation and Pareto-smoothed importance sampling to find approximate leave-one-out predictive densities (Vehtari et al. 2017).

CV.K

An integer between 10 and 20. Considered only if loopd.method='CV'. Default is 10 (as recommended in Vehtari et. al 2017).

loopd.nMC

Number of Monte Carlo samples to be used to evaluate leave-one-out predictive densities when loopd.method is set to either 'exact' or 'CV'.

verbose

logical. If verbose = TRUE, prints model description.

...

currently no additional argument.

Details

With this function, we fit a Bayesian hierarchical spatial generalized linear model by sampling exactly from the joint posterior distribution utilizing the generalized conjugate multivariate distribution theory (Bradley and Clinch 2024). Suppose χ=(s1,,sn)\chi = (s_1, \ldots, s_n) denotes the nn spatial locations the response yy is observed. Let y(s)y(s) be the outcome at location ss endowed with a probability law from the natural exponential family, which we denote by

y(s)EF(x(s)β+z(s);b,ψ)y(s) \sim \mathrm{EF}(x(s)^\top \beta + z(s); b, \psi)

for some positive parameter b>0b > 0 and unit log partition function ψ\psi. We consider the following response models based on the input supplied to the argument family.

'poisson'

It considers point-referenced Poisson responses y(s)Poisson(ex(s)β+z(s))y(s) \sim \mathrm{Poisson}(e^{x(s)^\top \beta + z(s)}). Here, b=1b = 1 and ψ(t)=et\psi(t) = e^t.

'binomial'

It considers point-referenced binomial counts y(s)Binomial(m(s),π(s))y(s) \sim \mathrm{Binomial}(m(s), \pi(s)) where, m(s)m(s) denotes the total number of trials and probability of success π(s)=ilogit(x(s)β+z(s))\pi(s) = \mathrm{ilogit}(x(s)^\top \beta + z(s)) at location ss. Here, b=m(s)b = m(s) and ψ(t)=log(1+et)\psi(t) = \log(1+e^t).

'binary'

It considers point-referenced binary data (0 or, 1) i.e., y(s)Bernoulli(π(s))y(s) \sim \mathrm{Bernoulli}(\pi(s)), where probability of success π(s)=ilogit(x(s)β+z(s))\pi(s) = \mathrm{ilogit}(x(s)^\top \beta + z(s)) at location ss. Here, b=1b = 1 and ψ(t)=log(1+et)\psi(t) = \log(1 + e^t).

The hierarchical model is given as

y(si)β,z,ξEF(x(si)β+z(si)+ξiμi;bi,ψy),i=1,,nξβ,z,σξ2,αϵGCMc(),βσβ2N(0,σβ2Vβ),σβ2IG(νβ/2,νβ/2)zσz2N(0,σz2R(χ;ϕ,ν)),σz2IG(νz/2,νz/2),\begin{aligned} y(s_i) &\mid \beta, z, \xi \sim EF(x(s_i)^\top \beta + z(s_i) + \xi_i - \mu_i; b_i, \psi_y), i = 1, \ldots, n\\ \xi &\mid \beta, z, \sigma^2_\xi, \alpha_\epsilon \sim \mathrm{GCM_c}(\cdots),\\ \beta &\mid \sigma^2_\beta \sim N(0, \sigma^2_\beta V_\beta), \quad \sigma^2_\beta \sim \mathrm{IG}(\nu_\beta/2, \nu_\beta/2)\\ z &\mid \sigma^2_z \sim N(0, \sigma^2_z R(\chi; \phi, \nu)), \quad \sigma^2_z \sim \mathrm{IG}(\nu_z/2, \nu_z/2), \end{aligned}

where μ=(μ1,,μn)\mu = (\mu_1, \ldots, \mu_n)^\top denotes the discrepancy parameter. We fix the spatial process parameters ϕ\phi and ν\nu and the hyperparameters VβV_\beta, νβ\nu_\beta, νz\nu_z and σξ2\sigma^2_\xi. The term ξ\xi is known as the fine-scale variation term which is given a conditional generalized conjugate multivariate distribution as prior. For more details, see Pan et al. 2024. Default values for VβV_\beta, νβ\nu_\beta, νz\nu_z, σξ2\sigma^2_\xi are diagonal with each diagonal element 100, 2.1, 2.1 and 0.1 respectively.

Value

An object of class spGLMexact, which is a list with the following tags -

priors

details of the priors used, containing the values of the boundary adjustment parameter (boundary), the variance parameter of the fine-scale variation term (simasq.xi) and others.

samples

a list of length 3, containing posterior samples of fixed effects (beta), spatial effects (z) and the fine-scale variation term (xi).

loopd

If loopd=TRUE, contains leave-one-out predictive densities.

model.params

Values of the fixed parameters that includes phi (spatial decay), nu (spatial smoothness).

The return object might include additional data that can be used for subsequent prediction and/or model fit evaluation.

Author(s)

Soumyakanti Pan [email protected]

References

Bradley JR, Clinch M (2024). "Generating Independent Replicates Directly from the Posterior Distribution for a Class of Spatial Hierarchical Models." Journal of Computational and Graphical Statistics, 0(0), 1-17. doi:10.1080/10618600.2024.2365728.

Pan S, Zhang L, Bradley JR, Banerjee S (2024). "Bayesian Inference for Spatial-temporal Non-Gaussian Data Using Predictive Stacking." doi:10.48550/arXiv.2406.04655.

Vehtari A, Gelman A, Gabry J (2017). "Practical Bayesian Model Evaluation Using Leave-One-out Cross-Validation and WAIC." Statistics and Computing, 27(5), 1413-1432. ISSN 0960-3174. doi:10.1007/s11222-016-9696-4.

See Also

spLMexact()

Examples

# Example 1: Analyze spatial poisson count data
data(simPoisson)
dat <- simPoisson[1:10, ]
mod1 <- spGLMexact(y ~ x1, data = dat, family = "poisson",
                   coords = as.matrix(dat[, c("s1", "s2")]),
                   cor.fn = "matern",
                   spParams = list(phi = 4, nu = 0.4),
                   n.samples = 100, verbose = TRUE)

# summarize posterior samples
post_beta <- mod1$samples$beta
print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))

# Example 2: Analyze spatial binomial count data
data(simBinom)
dat <- simBinom[1:10, ]
mod2 <- spGLMexact(cbind(y, n_trials) ~ x1, data = dat, family = "binomial",
                   coords = as.matrix(dat[, c("s1", "s2")]),
                   cor.fn = "matern",
                   spParams = list(phi = 3, nu = 0.4),
                   n.samples = 100, verbose = TRUE)

# summarize posterior samples
post_beta <- mod2$samples$beta
print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))

# Example 3: Analyze spatial binary data
data(simBinary)
dat <- simBinary[1:10, ]
mod3 <- spGLMexact(y ~ x1, data = dat, family = "binary",
                   coords = as.matrix(dat[, c("s1", "s2")]),
                   cor.fn = "matern",
                   spParams = list(phi = 4, nu = 0.4),
                   n.samples = 100, verbose = TRUE)

# summarize posterior samples
post_beta <- mod3$samples$beta
print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))

Bayesian spatial generalized linear model using predictive stacking

Description

Fits Bayesian spatial generalized linear model on a collection of candidate models constructed based on some candidate values of some model parameters specified by the user and subsequently combines inference by stacking predictive densities. See Pan, Zhang, Bradley, and Banerjee (2024) for more details.

Usage

spGLMstack(
  formula,
  data = parent.frame(),
  family,
  coords,
  cor.fn,
  priors,
  params.list,
  n.samples,
  loopd.controls,
  parallel = FALSE,
  solver = "ECOS",
  verbose = TRUE,
  ...
)

Arguments

formula

a symbolic description of the regression model to be fit. See example below.

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spLMstack is called.

family

Specifies the distribution of the response as a member of the exponential family. Supported options are 'poisson', 'binomial' and 'binary'.

coords

an n×2n \times 2 matrix of the observation coordinates in R2\mathbb{R}^2 (e.g., easting and northing).

cor.fn

a quoted keyword that specifies the correlation function used to model the spatial dependence structure among the observations. Supported covariance model key words are: 'exponential' and 'matern'. See below for details.

priors

(optional) a list with each tag corresponding to a parameter name and containing prior details. Valid tags include V.beta, nu.beta, nu.z and sigmaSq.xi.

params.list

a list containing candidate values of spatial process parameters for the cor.fn used, and, the boundary parameter.

n.samples

number of posterior samples to be generated.

loopd.controls

a list with details on how leave-one-out predictive densities (LOO-PD) are to be calculated. Valid tags include method, CV.K and nMC. The tag method can be either 'exact' or 'CV'. If sample size is more than 100, then the default is 'CV' with CV.K equal to its default value 10 (Gelman et al. 2024). The tag nMC decides how many Monte Carlo samples will be used to evaluate the leave-one-out predictive densities, which must be at least 500 (default).

parallel

logical. If parallel=FALSE, the parallelization plan, if set up by the user, is ignored. If parallel=TRUE, the function inherits the parallelization plan that is set by the user via the function future::plan() only. Depending on the parallel backend available, users may choose their own plan. More details are available at https://cran.R-project.org/package=future.

solver

(optional) Specifies the name of the solver that will be used to obtain optimal stacking weights for each candidate model. Default is 'ECOS'. Users can use other solvers supported by the CVXR-package package.

verbose

logical. If TRUE, prints model-specific optimal stacking weights.

...

currently no additional argument.

Details

Instead of assigning a prior on the process parameters ϕ\phi and ν\nu, the boundary adjustment parameter ϵ\epsilon, we consider a set of candidate models based on some candidate values of these parameters supplied by the user. Suppose the set of candidate models is M={M1,,MG}\mathcal{M} = \{M_1, \ldots, M_G\}. Then for each g=1,,Gg = 1, \ldots, G, we sample from the posterior distribution p(σ2,β,zy,Mg)p(\sigma^2, \beta, z \mid y, M_g) under the model MgM_g and find leave-one-out predictive densities p(yiyi,Mg)p(y_i \mid y_{-i}, M_g). Then we solve the optimization problem

maxw1,,wG1ni=1nlogg=1Gwgp(yiyi,Mg)subject towg0,g=1Gwg=1\begin{aligned} \max_{w_1, \ldots, w_G}& \, \frac{1}{n} \sum_{i = 1}^n \log \sum_{g = 1}^G w_g p(y_i \mid y_{-i}, M_g) \\ \text{subject to} & \quad w_g \geq 0, \sum_{g = 1}^G w_g = 1 \end{aligned}

to find the optimal stacking weights w^1,,w^G\hat{w}_1, \ldots, \hat{w}_G.

Value

An object of class spGLMstack, which is a list including the following tags -

family

the distribution of the responses as indicated in the function call

samples

a list of length equal to total number of candidate models with each entry corresponding to a list of length 3, containing posterior samples of fixed effects (beta), spatial effects (z) and fine-scale variation term (xi) for that particular model.

loopd

a list of length equal to total number of candidate models with each entry containing leave-one-out predictive densities under that particular model.

loopd.method

a list containing details of the algorithm used for calculation of leave-one-out predictive densities.

n.models

number of candidate models that are fit.

candidate.models

a matrix with n_model rows with each row containing details of the model parameters and its optimal weight.

stacking.weights

a numeric vector of length equal to the number of candidate models storing the optimal stacking weights.

run.time

a proc_time object with runtime details.

solver.status

solver status as returned by the optimization routine.

The return object might include additional data that is useful for subsequent prediction, model fit evaluation and other utilities.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

References

Pan S, Zhang L, Bradley JR, Banerjee S (2024). "Bayesian Inference for Spatial-temporal Non-Gaussian Data Using Predictive Stacking." doi:10.48550/arXiv.2406.04655.

Vehtari A, Simpson D, Gelman A, Yao Y, Gabry J (2024). "Pareto Smoothed Importance Sampling." Journal of Machine Learning Research, 25(72), 1-58. URL https://jmlr.org/papers/v25/19-556.html.

See Also

spGLMexact(), spLMstack()

Examples

data("simPoisson")
dat <- simPoisson[1:100,]
mod1 <- spGLMstack(y ~ x1, data = dat, family = "poisson",
                   coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern",
                  params.list = list(phi = c(3, 7, 10), nu = c(0.25, 0.5, 1.5),
                                     boundary = c(0.5, 0.6)),
                  n.samples = 1000,
                  loopd.controls = list(method = "CV", CV.K = 10, nMC = 1000),
                  parallel = TRUE, solver = "ECOS", verbose = TRUE)

# print(mod1$solver.status)
# print(mod1$run.time)

post_samps <- stackedSampler(mod1)
post_beta <- post_samps$beta
print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))

post_z <- post_samps$z
post_z_summ <- t(apply(post_z, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))

z_combn <- data.frame(z = dat$z_true,
                      zL = post_z_summ[, 1],
                      zM = post_z_summ[, 2],
                      zU = post_z_summ[, 3])

library(ggplot2)
plot_z <- ggplot(data = z_combn, aes(x = z)) +
 geom_errorbar(aes(ymin = zL, ymax = zU),
               width = 0.05, alpha = 0.15,
               color = "skyblue") +
 geom_point(aes(y = zM), size = 0.25,
            color = "darkblue", alpha = 0.5) +
 geom_abline(slope = 1, intercept = 0,
             color = "red", linetype = "solid") +
 xlab("True z") + ylab("Posterior of z") +
 theme_bw() +
 theme(panel.background = element_blank(),
       aspect.ratio = 1)

Univariate Bayesian spatial linear model

Description

Fits a Bayesian spatial linear model with spatial process parameters and the noise-to-spatial variance ratio fixed to a value supplied by the user. The output contains posterior samples of the fixed effects, variance parameter, spatial random effects and, if required, leave-one-out predictive densities.

Usage

spLMexact(
  formula,
  data = parent.frame(),
  coords,
  cor.fn,
  priors,
  spParams,
  noise_sp_ratio,
  n.samples,
  loopd = FALSE,
  loopd.method = "exact",
  verbose = TRUE,
  ...
)

Arguments

formula

a symbolic description of the regression model to be fit. See example below.

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spLMexact is called.

coords

an n×2n \times 2 matrix of the observation coordinates in R2\mathbb{R}^2 (e.g., easting and northing).

cor.fn

a quoted keyword that specifies the correlation function used to model the spatial dependence structure among the observations. Supported covariance model key words are: 'exponential' and 'matern'. See below for details.

priors

a list with each tag corresponding to a parameter name and containing prior details.

spParams

fixed value of spatial process parameters.

noise_sp_ratio

noise-to-spatial variance ratio.

n.samples

number of posterior samples to be generated.

loopd

logical. If loopd=TRUE, returns leave-one-out predictive densities, using method as given by loopd.method. Default is FALSE.

loopd.method

character. Ignored if loopd=FALSE. If loopd=TRUE, valid inputs are 'exact' and 'PSIS'. The option 'exact' corresponds to exact leave-one-out predictive densities which requires computation almost equivalent to fitting the model nn times. The option 'PSIS' is faster and finds approximate leave-one-out predictive densities using Pareto-smoothed importance sampling (Gelman et al. 2024).

verbose

logical. If verbose = TRUE, prints model description.

...

currently no additional argument.

Details

Suppose χ=(s1,,sn)\chi = (s_1, \ldots, s_n) denotes the nn spatial locations the response yy is observed. With this function, we fit a conjugate Bayesian hierarchical spatial model

yz,β,σ2N(Xβ+z,δ2σ2In),zσ2N(0,σ2R(χ;ϕ,ν)),βσ2N(μβ,σ2Vβ),σ2IG(aσ,bσ)\begin{aligned} y \mid z, \beta, \sigma^2 &\sim N(X\beta + z, \delta^2 \sigma^2 I_n), \quad z \mid \sigma^2 \sim N(0, \sigma^2 R(\chi; \phi, \nu)), \\ \beta \mid \sigma^2 &\sim N(\mu_\beta, \sigma^2 V_\beta), \quad \sigma^2 \sim \mathrm{IG}(a_\sigma, b_\sigma) \end{aligned}

where we fix the spatial process parameters ϕ\phi and ν\nu, the noise-to-spatial variance ratio δ2\delta^2 and the hyperparameters μβ\mu_\beta, VβV_\beta, aσa_\sigma and bσb_\sigma. We utilize a composition sampling strategy to sample the model parameters from their joint posterior distribution which can be written as

p(σ2,β,zy)=p(σ2y)×p(βσ2,y)×p(zβ,σ2,y).p(\sigma^2, \beta, z \mid y) = p(\sigma^2 \mid y) \times p(\beta \mid \sigma^2, y) \times p(z \mid \beta, \sigma^2, y).

We proceed by first sampling σ2\sigma^2 from its marginal posterior, then given the samples of σ2\sigma^2, we sample β\beta and subsequently, we sample zz conditioned on the posterior samples of β\beta and σ2\sigma^2 (Banerjee 2020).

Value

An object of class spLMexact, which is a list with the following tags -

samples

a list of length 3, containing posterior samples of fixed effects (beta), variance parameter (sigmaSq), spatial effects (z).

loopd

If loopd=TRUE, contains leave-one-out predictive densities.

model.params

Values of the fixed parameters that includes phi (spatial decay), nu (spatial smoothness) and noise_sp_ratio (noise-to-spatial variance ratio).

The return object might include additional data used for subsequent prediction and/or model fit evaluation.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

References

Banerjee S (2020). "Modeling massive spatial datasets using a conjugate Bayesian linear modeling framework." Spatial Statistics, 37, 100417. ISSN 2211-6753. doi:10.1016/j.spasta.2020.100417.

Vehtari A, Simpson D, Gelman A, Yao Y, Gabry J (2024). "Pareto Smoothed Importance Sampling." Journal of Machine Learning Research, 25(72), 1-58. URL https://jmlr.org/papers/v25/19-556.html.

See Also

spLMstack()

Examples

# load data
data(simGaussian)
dat <- simGaussian[1:100, ]

# setup prior list
muBeta <- c(0, 0)
VBeta <- cbind(c(1.0, 0.0), c(0.0, 1.0))
sigmaSqIGa <- 2
sigmaSqIGb <- 0.1
prior_list <- list(beta.norm = list(muBeta, VBeta),
                   sigma.sq.ig = c(sigmaSqIGa, sigmaSqIGb))

# supply fixed values of model parameters
phi0 <- 3
nu0 <- 0.75
noise.sp.ratio <- 0.8

mod1 <- spLMexact(y ~ x1, data = dat,
                  coords = as.matrix(dat[, c("s1", "s2")]),
                  cor.fn = "matern",
                  priors = prior_list,
                  spParams = list(phi = phi0, nu = nu0),
                  noise_sp_ratio = noise.sp.ratio,
                  n.samples = 100,
                  loopd = TRUE, loopd.method = "exact")

beta.post <- mod1$samples$beta
z.post.median <- apply(mod1$samples$z, 1, median)
dat$z.post.median <- z.post.median
plot1 <- surfaceplot(dat, coords_name = c("s1", "s2"),
                     var_name = "z_true")
plot2 <- surfaceplot(dat, coords_name = c("s1", "s2"),
                     var_name = "z.post.median")
plot1
plot2

Bayesian spatial linear model using predictive stacking

Description

Fits Bayesian spatial linear model on a collection of candidate models constructed based on some candidate values of some model parameters specified by the user and subsequently combines inference by stacking predictive densities. See Zhang, Tang and Banerjee (2024) for more details.

Usage

spLMstack(
  formula,
  data = parent.frame(),
  coords,
  cor.fn,
  priors,
  params.list,
  n.samples,
  loopd.method,
  parallel = FALSE,
  solver = "ECOS",
  verbose = TRUE,
  ...
)

Arguments

formula

a symbolic description of the regression model to be fit. See example below.

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spLMstack is called.

coords

an n×2n \times 2 matrix of the observation coordinates in R2\mathbb{R}^2 (e.g., easting and northing).

cor.fn

a quoted keyword that specifies the correlation function used to model the spatial dependence structure among the observations. Supported covariance model key words are: 'exponential' and 'matern'. See below for details.

priors

a list with each tag corresponding to a parameter name and containing prior details. If not supplied, uses defaults.

params.list

a list containing candidate values of spatial process parameters for the cor.fn used, and, noise-to-spatial variance ratio.

n.samples

number of posterior samples to be generated.

loopd.method

character. Valid inputs are 'exact' and 'PSIS'. The option 'exact' corresponds to exact leave-one-out predictive densities. The option 'PSIS' is faster, as it finds approximate leave-one-out predictive densities using Pareto-smoothed importance sampling (Gelman et al. 2024).

parallel

logical. If parallel=FALSE, the parallelization plan, if set up by the user, is ignored. If parallel=TRUE, the function inherits the parallelization plan that is set by the user via the function future::plan() only. Depending on the parallel backend available, users may choose their own plan. More details are available at https://cran.R-project.org/package=future.

solver

(optional) Specifies the name of the solver that will be used to obtain optimal stacking weights for each candidate model. Default is "ECOS". Users can use other solvers supported by the CVXR-package package.

verbose

logical. If TRUE, prints model-specific optimal stacking weights.

...

currently no additional argument.

Details

Instead of assigning a prior on the process parameters ϕ\phi and ν\nu, noise-to-spatial variance ratio δ2\delta^2, we consider a set of candidate models based on some candidate values of these parameters supplied by the user. Suppose the set of candidate models is M={M1,,MG}\mathcal{M} = \{M_1, \ldots, M_G\}. Then for each g=1,,Gg = 1, \ldots, G, we sample from the posterior distribution p(σ2,β,zy,Mg)p(\sigma^2, \beta, z \mid y, M_g) under the model MgM_g and find leave-one-out predictive densities p(yiyi,Mg)p(y_i \mid y_{-i}, M_g). Then we solve the optimization problem

maxw1,,wG1ni=1nlogg=1Gwgp(yiyi,Mg)subject towg0,g=1Gwg=1\begin{aligned} \max_{w_1, \ldots, w_G}& \, \frac{1}{n} \sum_{i = 1}^n \log \sum_{g = 1}^G w_g p(y_i \mid y_{-i}, M_g) \\ \text{subject to} & \quad w_g \geq 0, \sum_{g = 1}^G w_g = 1 \end{aligned}

to find the optimal stacking weights w^1,,w^G\hat{w}_1, \ldots, \hat{w}_G.

Value

An object of class spLMstack, which is a list including the following tags -

samples

a list of length equal to total number of candidate models with each entry corresponding to a list of length 3, containing posterior samples of fixed effects (beta), variance parameter (sigmaSq), spatial effects (z) for that model.

loopd

a list of length equal to total number of candidate models with each entry containing leave-one-out predictive densities under that particular model.

n.models

number of candidate models that are fit.

candidate.models

a matrix with n_model rows with each row containing details of the model parameters and its optimal weight.

stacking.weights

a numeric vector of length equal to the number of candidate models storing the optimal stacking weights.

run.time

a proc_time object with runtime details.

solver.status

solver status as returned by the optimization routine.

The return object might include additional data that is useful for subsequent prediction, model fit evaluation and other utilities.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

References

Vehtari A, Simpson D, Gelman A, Yao Y, Gabry J (2024). "Pareto Smoothed Importance Sampling." Journal of Machine Learning Research, 25(72), 1-58. URL https://jmlr.org/papers/v25/19-556.html.

Zhang L, Tang W, Banerjee S (2024). "Bayesian Geostatistics Using Predictive Stacking."
doi:10.48550/arXiv.2304.12414.

See Also

spLMexact(), spGLMstack()

Examples

# load data and work with first 100 rows
data(simGaussian)
dat <- simGaussian[1:100, ]

# setup prior list
muBeta <- c(0, 0)
VBeta <- cbind(c(1.0, 0.0), c(0.0, 1.0))
sigmaSqIGa <- 2
sigmaSqIGb <- 2
prior_list <- list(beta.norm = list(muBeta, VBeta),
                   sigma.sq.ig = c(sigmaSqIGa, sigmaSqIGb))

mod1 <- spLMstack(y ~ x1, data = dat,
                  coords = as.matrix(dat[, c("s1", "s2")]),
                  cor.fn = "matern",
                  priors = prior_list,
                  params.list = list(phi = c(1.5, 3),
                                     nu = c(0.5, 1),
                                     noise_sp_ratio = c(1)),
                  n.samples = 1000, loopd.method = "exact",
                  parallel = FALSE, solver = "ECOS", verbose = TRUE)

post_samps <- stackedSampler(mod1)
post_beta <- post_samps$beta
print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))

post_z <- post_samps$z
post_z_summ <- t(apply(post_z, 1,
                       function(x) quantile(x, c(0.025, 0.5, 0.975))))

z_combn <- data.frame(z = dat$z_true,
                      zL = post_z_summ[, 1],
                      zM = post_z_summ[, 2],
                      zU = post_z_summ[, 3])

library(ggplot2)
plot1 <- ggplot(data = z_combn, aes(x = z)) +
  geom_point(aes(y = zM), size = 0.25,
             color = "darkblue", alpha = 0.5) +
  geom_errorbar(aes(ymin = zL, ymax = zU),
                width = 0.05, alpha = 0.15) +
  geom_abline(slope = 1, intercept = 0,
              color = "red", linetype = "solid") +
  xlab("True z") + ylab("Stacked posterior of z") +
  theme_bw() +
  theme(panel.background = element_blank(),
        aspect.ratio = 1)

Sample from the stacked posterior distribution

Description

A helper function to sample from the stacked posterior distribution to obtain final posterior samples that can be used for subsequent analysis. This function applies on outputs of functions spLMstack() and spGLMstack().

Usage

stackedSampler(mod_out, n.samples)

Arguments

mod_out

an object of class spLMstack or spGLMstack.

n.samples

(optional) If missing, inherits the number of posterior samples from the original output. Otherwise, it specifies number of posterior samples to draw from the stacked posterior. If it exceeds the number of posterior draws used in the original function, then a warning is thrown and the samples are obtained by resampling. It is recommended, to run the original function with enough samples.

Details

After obtaining the optimal stacking weights w^1,,w^G\hat{w}_1, \ldots, \hat{w}_G, posterior inference of quantities of interest subsequently proceed from the stacked posterior,

p~(y)=g=1Gw^gp(y,Mg),\tilde{p}(\cdot \mid y) = \sum_{g = 1}^G \hat{w}_g p(\cdot \mid y, M_g),

where M={M1,,Mg}\mathcal{M} = \{M_1, \ldots, M_g\} is the collection of candidate models.

Value

An object of class stacked_posterior, which is a list that includes the following tags -

beta

samples of the fixed effect from the stacked joint posterior.

z

samples of the spatial random effects from the stacked joint posterior.

In case of model output of class spLMstack, the list additionally contains sigmaSq which are the samples of the variance parameter from the stacked joint posterior of the spatial linear model. For model output of class spGLMstack, the list also contains xi which are the samples of the fine-scale variation term from the stacked joint posterior of the spatial generalized linear model.

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

See Also

spLMstack(), spGLMstack()

Examples

data(simGaussian)
dat <- simGaussian[1:100, ]

mod1 <- spLMstack(y ~ x1, data = dat,
                  coords = as.matrix(dat[, c("s1", "s2")]),
                  cor.fn = "matern",
                  params.list = list(phi = c(1.5, 3),
                                     nu = c(0.5, 1),
                                     noise_sp_ratio = c(1)),
                  n.samples = 1000, loopd.method = "exact",
                  parallel = FALSE, solver = "ECOS", verbose = TRUE)
print(mod1$solver.status)
print(mod1$run.time)

post_samps <- stackedSampler(mod1)
post_beta <- post_samps$beta
print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))

Bayesian spatially-temporally varying generalized linear model

Description

Fits a Bayesian generalized linear model with spatially-temporally varying coefficients under fixed values of spatial-temporal process parameters and some auxiliary model parameters. The output contains posterior samples of the fixed effects, spatial-temporal random effects and, if required, finds leave-one-out predictive densities.

Usage

stvcGLMexact(
  formula,
  data = parent.frame(),
  family,
  sp_coords,
  time_coords,
  cor.fn,
  sptParams,
  sptShared = FALSE,
  priors,
  boundary = 0.5,
  n.samples,
  loopd = FALSE,
  loopd.method = "exact",
  CV.K = 10,
  loopd.nMC = 500,
  verbose = TRUE,
  ...
)

Arguments

formula

a symbolic description of the regression model to be fit. Variables in parenthesis are assigned spatially-temporally varying coefficients. See example below.

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spGLMexact is called.

family

Specifies the distribution of the response as a member of the exponential family. Supported options are 'poisson', 'binomial' and 'binary'.

sp_coords

an n×2n \times 2 matrix of the observation spatial coordinates in R2\mathbb{R}^2 (e.g., easting and northing).

time_coords

an n×1n \times 1 matrix of the observation temporal coordinates in T[0,)\mathcal{T} \subseteq [0, \infty).

cor.fn

a quoted keyword that specifies the correlation function used to model the spatial-temporal dependence structure among the observations. Supported covariance model key words are: 'gneiting-decay' (Gneiting and Guttorp 2010). See below for details.

sptParams

fixed values of spatial-temporal process parameters in usually a list of length 2. If cor.fn='gneiting-decay', then it is a list of length 2 with tags phi_s and phi_t. If sptShared=TRUE, then phi_s and phi_t contain scalars, otherwise it will contain fixed values of the rr spatial-temporal processes. See examples below.

sptShared

If TRUE, then model assumes shared spatial and temporal decay parameters across all processes. If FALSE, fixed values of process parameters must be supplied within the sptParams argument.

priors

(optional) a list with each tag corresponding to a hyperparameter name and containing hyperprior details. Valid tags include V.beta, nu.beta, nu.z and sigmaSq.xi. Values of nu.beta and nu.z must be at least 2.1. If not supplied, uses defaults.

boundary

Specifies the boundary adjustment parameter. Must be a real number between 0 and 1. Default is 0.5.

n.samples

number of posterior samples to be generated.

loopd

logical. If loopd=TRUE, returns leave-one-out predictive densities, using method as given by loopd.method. Default is FALSE.

loopd.method

character. Ignored if loopd=FALSE. If loopd=TRUE, valid inputs are 'exact', 'CV' and 'PSIS'. The option 'exact' corresponds to exact leave-one-out predictive densities which requires computation almost equivalent to fitting the model nn times. The options 'CV' and 'PSIS' are faster and they implement KK-fold cross validation and Pareto-smoothed importance sampling to find approximate leave-one-out predictive densities (Vehtari et al. 2017).

CV.K

An integer between 10 and 20. Considered only if loopd.method='CV'. Default is 10 (as recommended in Vehtari et. al 2017).

loopd.nMC

Number of Monte Carlo samples to be used to evaluate leave-one-out predictive densities when loopd.method is set to either 'exact' or 'CV'.

verbose

logical. If verbose = TRUE, prints model description.

...

currently no additional argument.

Details

With this function, we fit a Bayesian hierarchical spatially-temporally varying generalized linear model by sampling exactly from the joint posterior distribution utilizing the generalized conjugate multivariate distribution theory (Bradley and Clinch 2024). Suppose χ=(1,,n)\chi = (\ell_1, \ldots, \ell_n) denotes the nn spatial-temporal co-ordinates in L=S×T\mathcal{L} = \mathcal{S} \times \mathcal{T}, the response yy is observed. Let y()y(\ell) be the outcome at the co-ordinate \ell endowed with a probability law from the natural exponential family, which we denote by

y()EF(x()β+x~()z();b(),ψ)y(\ell) \sim \mathrm{EF}(x(\ell)^\top \beta + \tilde{x}(\ell)^\top z(\ell); b(\ell), \psi)

for some positive parameter b()>0b(\ell) > 0 and unit log partition function ψ\psi. Here, x~()\tilde{x}(\ell) denotes covariates with spatially-temporally varying coefficients We consider the following response models based on the input supplied to the argument family.

'poisson'

It considers point-referenced Poisson responses y()Poisson(ex()β+x~()z())y(\ell) \sim \mathrm{Poisson}(e^{x(\ell)^\top \beta + \tilde{x}(\ell)^\top z(\ell)}). Here, b()=1b(\ell) = 1 and ψ(t)=et\psi(t) = e^t.

'binomial'

It considers point-referenced binomial counts y()Binomial(m(),π())y(\ell) \sim \mathrm{Binomial}(m(\ell), \pi(\ell)) where, m()m(\ell) denotes the total number of trials and probability of success π()=ilogit(x()β+x~()z())\pi(\ell) = \mathrm{ilogit}(x(\ell)^\top \beta + \tilde{x}(\ell)^\top z(\ell)) at spatial-temporal co-ordinate \ell. Here, b=m()b = m(\ell) and ψ(t)=log(1+et)\psi(t) = \log(1+e^t).

'binary'

It considers point-referenced binary data (0 or, 1) i.e., y()Bernoulli(π())y(\ell) \sim \mathrm{Bernoulli}(\pi(\ell)), where probability of success π()=ilogit(x()β+x~()z())\pi(\ell) = \mathrm{ilogit}(x(\ell)^\top \beta + \tilde{x}(\ell)^\top z(\ell)) at spatial-temporal co-ordinate \ell. Here, b()=1b(\ell) = 1 and ψ(t)=log(1+et)\psi(t) = \log(1 + e^t).

The hierarchical model is given as

y(i)β,z,ξEF(x(i)β+x~(i)z(si)+ξiμi;bi,ψy),i=1,,nξβ,z,σξ2,αϵGCMc(),βσβ2N(0,σβ2Vβ),σβ2IG(νβ/2,νβ/2)zjσzj2N(0,σzj2R(χ;ϕs,ϕt)),σzj2IG(νz/2,νz/2),j=1,,r\begin{aligned} y(\ell_i) &\mid \beta, z, \xi \sim EF(x(\ell_i)^\top \beta + \tilde{x}(\ell_i)^\top z(s_i) + \xi_i - \mu_i; b_i, \psi_y), i = 1, \ldots, n\\ \xi &\mid \beta, z, \sigma^2_\xi, \alpha_\epsilon \sim \mathrm{GCM_c}(\cdots),\\ \beta &\mid \sigma^2_\beta \sim N(0, \sigma^2_\beta V_\beta), \quad \sigma^2_\beta \sim \mathrm{IG}(\nu_\beta/2, \nu_\beta/2)\\ z_j &\mid \sigma^2_{z_j} \sim N(0, \sigma^2_{z_j} R(\chi; \phi_s, \phi_t)), \quad \sigma^2_{z_j} \sim \mathrm{IG}(\nu_z/2, \nu_z/2), j = 1, \ldots, r \end{aligned}

where μ=(μ1,,μn)\mu = (\mu_1, \ldots, \mu_n)^\top denotes the discrepancy parameter. We fix the spatial-temporal process parameters ϕs\phi_s and ϕt\phi_t and the hyperparameters VβV_\beta, νβ\nu_\beta, νz\nu_z and σξ2\sigma^2_\xi. The term ξ\xi is known as the fine-scale variation term which is given a conditional generalized conjugate multivariate distribution as prior. For more details, see Pan et al. 2024. Default values for VβV_\beta, νβ\nu_\beta, νz\nu_z, σξ2\sigma^2_\xi are diagonal with each diagonal element 100, 2.1, 2.1 and 0.1 respectively.

Value

An object of class stvcGLMexact, which is a list with the following tags -

priors

details of the priors used, containing the values of the boundary adjustment parameter (boundary), the variance parameter of the fine-scale variation term (simasq.xi) and others.

samples

a list of length 3, containing posterior samples of fixed effects (beta), spatial-temporal effects (z) and the fine-scale variation term (xi). The element with tag z will again be a list of length rr, each containing posterior samples of the spatial-temporal random effects corresponding to each varying coefficient.

loopd

If loopd=TRUE, contains leave-one-out predictive densities.

model.params

Values of the fixed parameters that includes phi_s (spatial decay), phi_t (temporal smoothness).

The return object might include additional data that can be used for subsequent prediction and/or model fit evaluation.

Author(s)

Soumyakanti Pan [email protected]

References

Bradley JR, Clinch M (2024). "Generating Independent Replicates Directly from the Posterior Distribution for a Class of Spatial Hierarchical Models." Journal of Computational and Graphical Statistics, 0(0), 1-17. doi:10.1080/10618600.2024.2365728.

T. Gneiting and P. Guttorp (2010). "Continuous-parameter spatio-temporal processes." In A.E. Gelfand, P.J. Diggle, M. Fuentes, and P Guttorp, editors, Handbook of Spatial Statistics, Chapman & Hall CRC Handbooks of Modern Statistical Methods, p 427–436. Taylor and Francis.

Pan S, Zhang L, Bradley JR, Banerjee S (2024). "Bayesian Inference for Spatial-temporal Non-Gaussian Data Using Predictive Stacking." doi:10.48550/arXiv.2406.04655.

Vehtari A, Gelman A, Gabry J (2017). "Practical Bayesian Model Evaluation Using Leave-One-out Cross-Validation and WAIC." Statistics and Computing, 27(5), 1413-1432. ISSN 0960-3174. doi:10.1007/s11222-016-9696-4.

See Also

spGLMexact()


Make a surface plot

Description

Make a surface plot

Usage

surfaceplot(tab, coords_name, var_name, h = 8, col.pal, mark_points = FALSE)

Arguments

tab

a data-frame containing spatial co-ordinates and the variable to plot

coords_name

name of the two columns that contains the co-ordinates of the points

var_name

name of the column containing the variable to be plotted

h

integer; (optional) controls smoothness of the spatial interpolation as appearing in the MBA::mba.surf() function. Default is 8.

col.pal

Optional; color palette, preferably divergent, use colorRampPalette function from grDevices. Default is 'RdYlBu'.

mark_points

Logical; if TRUE, the input points are marked. Default is FALSE.

Value

a ggplot object containing the surface plot

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

Examples

data(simGaussian)
plot1 <- surfaceplot(simGaussian, coords_name = c("s1", "s2"),
                     var_name = "z_true")
plot1

# try your favourite color palette
col.br <- colorRampPalette(c("blue", "white", "red"))
col.br.pal <- col.br(100)
plot2 <- surfaceplot(simGaussian, coords_name = c("s1", "s2"),
                     var_name = "z_true", col.pal = col.br.pal)
plot2

Make two side-by-side surface plots

Description

Make two side-by-side surface plots, particularly useful towards a comparative study of two spatial surfaces.

Usage

surfaceplot2(
  tab,
  coords_name,
  var1_name,
  var2_name,
  h = 8,
  col.pal,
  mark_points = FALSE
)

Arguments

tab

a data-frame containing spatial co-ordinates and the variables to plot

coords_name

name of the two columns that contains the co-ordinates of the points

var1_name

name of the column containing the first variable to be plotted

var2_name

name of the column containing the second variable to be plotted

h

integer; (optional) controls smoothness of the spatial interpolation as appearing in the MBA::mba.surf() function. Default is 8.

col.pal

Optional; color palette, preferably divergent, use colorRampPalette function from grDevices. Default is 'RdYlBu'.

mark_points

Logical; if TRUE, the input points are marked. Default is FALSE.

Value

a list containing two ggplot objects

Author(s)

Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]

Examples

data(simGaussian)
plots_2 <- surfaceplot2(simGaussian, coords_name = c("s1", "s2"),
                        var1_name = "z_true", var2_name = "y")
plots_2