Title: | Bayesian Geostatistics Using Predictive Stacking |
---|---|
Description: | Fits Bayesian hierarchical spatial process models for point-referenced Gaussian, Poisson, binomial, and binary data using stacking of predictive densities. It involves sampling from analytically available posterior distributions conditional upon some candidate values of the spatial process parameters and, subsequently assimilate inference from these individual posterior distributions using Bayesian predictive stacking. Our algorithm is highly parallelizable and hence, much faster than traditional Markov chain Monte Carlo algorithms while delivering competitive predictive performance. See Zhang, Tang, and Banerjee (2024) <doi:10.48550/arXiv.2304.12414>, and, Pan, Zhang, Bradley, and Banerjee (2024) <doi:10.48550/arXiv.2406.04655> for details. |
Authors: | Soumyakanti Pan [aut, cre] , Sudipto Banerjee [aut] |
Maintainer: | Soumyakanti Pan <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2025-01-08 06:37:47 UTC |
Source: | https://github.com/span-18/spstack-dev |
This package delivers functions to fit Bayesian hierarchical spatial process models for point-referenced Gaussian, Poisson, binomial, and binary data using stacking of predictive densities. It involves sampling from analytically available posterior distributions conditional upon some candidate values of the spatial process parameters for both Gaussian response model as well as non-Gaussian responses, and, subsequently assimilate inference from these individual posterior distributions using Bayesian predictive stacking. Our algorithm is highly parallelizable and hence, much faster than traditional Markov chain Monte Carlo algorithms while delivering competitive predictive performance.
In context of inference for spatial point-referenced data, Bayesian hierarchical models involve latent spatial processes characterized by spatial process parameters, which besides lacking substantive relevance in scientific contexts, are also weakly identified and hence, impedes convergence of MCMC algorithms. This motivates us to build methodology that involves fast sampling from posterior distributions conditioned on a grid of the weakly identified model parameters and combine the inference by stacking of predictive densities (Yao et. al 2018). We exploit the Bayesian conjugate linear modeling framework for the Gaussian case (Zhang, Tang and Banerjee 2024) and the generalized conjugate multivariate distribution theory (Pan, Zhang, Bradley and Banerjee 2024) to analytically derive the individual posterior distributions.
Package: | spStack |
Type: | Package |
Version: | 1.0.1 |
License: | GPL-3 |
Accepts a formula, e.g., y~x1+x2
, for most regression models
accompanied by candidate values of spatial process parameters, and returns
posterior samples of the regression coefficients and the latent spatial
random effects. Posterior inference or prediction of any quantity of interest
proceed from these samples. Main functions are - spLMexact()
spGLMexact()
spLMstack()
spGLMstack()
Maintainer: Soumyakanti Pan [email protected] (ORCID)
Authors:
Sudipto Banerjee [email protected]
Zhang L, Tang W, Banerjee S (2024). "Bayesian Geostatistics Using
Predictive Stacking."
doi:10.48550/arXiv.2304.12414.
Pan S, Zhang L, Bradley JR, Banerjee S (2024). "Bayesian Inference for Spatial-temporal Non-Gaussian Data Using Predictive Stacking." doi:10.48550/arXiv.2406.04655.
Yao Y, Vehtari A, Simpson D, Gelman A (2018). "Using Stacking to Average Bayesian Predictive Distributions (with Discussion)." Bayesian Analysis, 13(3), 917-1007. doi:10.1214/17-BA1091.
Useful links:
Provides functions that implements different types of updates of a Cholesky factor that includes rank-one update, single row/column deletion update and a block deletion update.
cholUpdateRankOne(A, v, alpha, beta, lower = TRUE) cholUpdateDel(A, del.index, lower = TRUE) cholUpdateDelBlock(A, del.start, del.end, lower = TRUE)
cholUpdateRankOne(A, v, alpha, beta, lower = TRUE) cholUpdateDel(A, del.index, lower = TRUE) cholUpdateDelBlock(A, del.start, del.end, lower = TRUE)
A |
an |
v |
an |
alpha |
scalar; if not supplied, default is 1 |
beta |
scalar; if not supplied, default is 1 |
lower |
logical; if |
del.index |
an integer from 1 to |
del.start |
an integer from 1 to |
del.end |
an integer from 1 to |
Suppose is a
matrix with
being its lower-triangular Cholesky factor. Then rank-one update corresponds
to finding the Cholesky factor of the matrix
for some
given
(see, Krause and Igel 2015). Similarly, single row/column
deletion update corresponds to finding the Cholesky factor of the
matrix
which is obtained by removing the
-th row and column of
, given
for some
. Lastly, block deletion corresponds to finding the
Cholesky factor of the
matrix
for a
subset
of
containing
consecutive
indices, given the factor
.
An lower-triangular
matrix
with in
case of
cholUpdateRankOne()
, in case of
cholUpdateDel()
,
and, in case of
cholUpdateDelBlock()
where is
the size of the block removed.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
Oswin Krause and Christian Igel. 2015. "A More Efficient Rank-one Covariance Matrix Update for Evolution Strategies". In Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII (FOGA '15). Association for Computing Machinery, New York, NY, USA, 129-136. doi:10.1145/2725494.2725496.
n <- 10 A <- matrix(rnorm(n^2), n, n) A <- crossprod(A) cholA <- chol(A) ## Rank-1 update v <- 1:n APlusvvT <- A + tcrossprod(v) cholA1 <- t(chol(APlusvvT)) cholA2 <- cholUpdateRankOne(cholA, v, lower = FALSE) print(all(abs(cholA1 - cholA2) < 1E-9)) ## Single Row-deletion update ind <- 2 A1 <- A[-ind, -ind] cholA1 <- t(chol(A1)) cholA2 <- cholUpdateDel(cholA, del.index = ind, lower = FALSE) print(all(abs(cholA1 - cholA2) < 1E-9)) ## Block-deletion update start_ind <- 2 end_ind <- 6 del_ind <- c(start_ind:end_ind) A1 <- A[-del_ind, -del_ind] cholA1 <- t(chol(A1)) cholA2 <- cholUpdateDelBlock(cholA, start_ind, end_ind, lower = FALSE) print(all(abs(cholA1 - cholA2) < 1E-9))
n <- 10 A <- matrix(rnorm(n^2), n, n) A <- crossprod(A) cholA <- chol(A) ## Rank-1 update v <- 1:n APlusvvT <- A + tcrossprod(v) cholA1 <- t(chol(APlusvvT)) cholA2 <- cholUpdateRankOne(cholA, v, lower = FALSE) print(all(abs(cholA1 - cholA2) < 1E-9)) ## Single Row-deletion update ind <- 2 A1 <- A[-ind, -ind] cholA1 <- t(chol(A1)) cholA2 <- cholUpdateDel(cholA, del.index = ind, lower = FALSE) print(all(abs(cholA1 - cholA2) < 1E-9)) ## Block-deletion update start_ind <- 2 end_ind <- 6 del_ind <- c(start_ind:end_ind) A1 <- A[-del_ind, -del_ind] cholA1 <- t(chol(A1)) cholA2 <- cholUpdateDelBlock(cholA, start_ind, end_ind, lower = FALSE) print(all(abs(cholA1 - cholA2) < 1E-9))
Obtains optimal stacking weights given leave-one-out predictive densities for each candidate model.
get_stacking_weights(log_loopd, solver = "ECOS")
get_stacking_weights(log_loopd, solver = "ECOS")
log_loopd |
an |
solver |
specifies the solver to use for obtaining optimal weights.
Default is |
A list of length 2.
weights
optimal stacking weights as a numeric vector of
length
status
solver status, returns "optimal"
if solver
succeeded.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
Yao Y, Vehtari A, Simpson D, Gelman A (2018). "Using Stacking to Average Bayesian Predictive Distributions (with Discussion)." Bayesian Analysis, 13(3), 917-1007. doi:10.1214/17-BA1091.
CVXR::psolve()
, spLMstack()
, spGLMstack()
data(simGaussian) dat <- simGaussian[1:100, ] mod1 <- spLMstack(y ~ x1, data = dat, coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", params.list = list(phi = c(1.5, 3), nu = c(0.5, 1), noise_sp_ratio = c(1)), n.samples = 1000, loopd.method = "exact", parallel = FALSE, solver = "ECOS", verbose = TRUE) loopd_mat <- do.call('cbind', mod1$loopd) w_hat <- get_stacking_weights(loopd_mat) print(round(w_hat$weights, 4)) print(w_hat$status)
data(simGaussian) dat <- simGaussian[1:100, ] mod1 <- spLMstack(y ~ x1, data = dat, coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", params.list = list(phi = c(1.5, 3), nu = c(0.5, 1), noise_sp_ratio = c(1)), n.samples = 1000, loopd.method = "exact", parallel = FALSE, solver = "ECOS", verbose = TRUE) loopd_mat <- do.call('cbind', mod1$loopd) w_hat <- get_stacking_weights(loopd_mat) print(round(w_hat$weights, 4)) print(w_hat$status)
Computes the inter-site Euclidean distance matrix for one or two sets of points.
iDist(coords.1, coords.2, ...)
iDist(coords.1, coords.2, ...)
coords.1 |
an |
coords.2 |
an |
... |
currently no additional arguments. |
The or
inter-site Euclidean
distance matrix.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
n <- 10 p1 <- cbind(runif(n),runif(n)) m <- 5 p2 <- cbind(runif(m),runif(m)) D <- iDist(p1, p2)
n <- 10 p1 <- cbind(runif(n),runif(n)) m <- 5 p2 <- cbind(runif(m),runif(m)) D <- iDist(p1, p2)
Generates synthetic spatial data of different types where the spatial co-ordinates are sampled uniformly on an unit square. Different types include point-referenced Gaussian, Poisson, binomial and binary data. The design includes an intercept and fixed covariates sampled from a standard normal distribution.
sim_spData(n, beta, cor.fn, spParams, spvar, deltasq, family, n_binom)
sim_spData(n, beta, cor.fn, spParams, spvar, deltasq, family, n_binom)
n |
sample size. |
beta |
a |
cor.fn |
a quoted keyword that specifies the correlation function used
to model the spatial dependence structure among the observations. Supported
covariance model key words are: |
spParams |
a numeric vector containing spatial process parameters - e.g., spatial decay and smoothness. |
spvar |
value of spatial variance parameter. |
deltasq |
value of noise-to-spatial variance ratio. |
family |
a character specifying the distribution of the response as a
member of the exponential family. Valid inputs are |
n_binom |
necessary only when |
a data.frame
object containing the columns -
s1, s2
2D-coordinates in unit square
x1, x2, ...
covariates, not including intercept
y
response
n_trials
present only when binomial data is generated
z_true
true spatial effects with which the data is generated
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
set.seed(1729) n <- 10 beta <- c(2, 5) phi0 <- 2 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 deltasq <- 1 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, family = "gaussian")
set.seed(1729) n <- 10 beta <- c(2, 5) phi0 <- 2 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 deltasq <- 1 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, family = "gaussian")
Dataset of size 500, with a binary response variable indexed by spatial coordinates sampled uniformly from the unit square. The model includes one covariate and spatial random effects induced by a Matérn covariogram.
data(simBinary)
data(simBinary)
a data.frame
object.
s1, s2
2-D coordinates; latitude and longitude.
x1
a covariate sampled from the standard normal distribution.
y
response vector (0/1).
z_true
true spatial random effects that generated the data.
With , the binary data is simulated using
where the function refers to the inverse-logit
function, the spatial effects
with
being
a
correlation matrix given by the Matérn covariogram
where is the spatial decay parameter and
the spatial
smoothness parameter. We have sampled the data with
,
,
, and
. This data can be generated with the code as given in
the example below.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
simGaussian, simPoisson, simBinom
set.seed(1729) n <- 500 beta <- c(0.5, -0.5) phi0 <- 5 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, family = "binary") plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true") library(ggplot2) plot2 <- ggplot(sim1, aes(x = s1, y = s2)) + geom_point(aes(color = factor(y)), alpha = 0.75) + scale_color_manual(values = c("red", "blue"), labels = c("0", "1")) + guides(alpha = 'none') + theme_bw() + theme(axis.ticks = element_line(linewidth = 0.25), panel.background = element_blank(), panel.grid = element_blank(), legend.title = element_text(size = 10, hjust = 0.25), legend.box.just = "center", aspect.ratio = 1)
set.seed(1729) n <- 500 beta <- c(0.5, -0.5) phi0 <- 5 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, family = "binary") plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true") library(ggplot2) plot2 <- ggplot(sim1, aes(x = s1, y = s2)) + geom_point(aes(color = factor(y)), alpha = 0.75) + scale_color_manual(values = c("red", "blue"), labels = c("0", "1")) + guides(alpha = 'none') + theme_bw() + theme(axis.ticks = element_line(linewidth = 0.25), panel.background = element_blank(), panel.grid = element_blank(), legend.title = element_text(size = 10, hjust = 0.25), legend.box.just = "center", aspect.ratio = 1)
Dataset of size 500, with a binomial response variable indexed by spatial coordinates sampled uniformly from the unit square. The model includes one covariate and spatial random effects induced by a Matérn covariogram. The number of trials at each location is sampled from a Poisson distribution with mean 20.
data(simBinom)
data(simBinom)
a data.frame
object.
s1, s2
2-D coordinates; latitude and longitude.
x1
a covariate sampled from the standard normal distribution.
y
response vector.
n_trials
Number of trials at that location.
z_true
true spatial random effects that generated the data.
With , the count data is simulated using
where the function refers to the inverse-logit
function, the number of trials
is sampled from a Poisson
distribution with mean 20, the spatial effects
with
being a
correlation matrix given by the Matérn
covariogram
where is the spatial decay parameter and
the spatial
smoothness parameter. We have sampled the data with
,
,
, and
. This data can be generated with the code as given in
the example below.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
simGaussian, simPoisson, simBinary
set.seed(1729) n <- 500 beta <- c(0.5, -0.5) phi0 <- 3 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, n_binom = rpois(n, 20), family = "binomial") plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true") library(ggplot2) plot2 <- ggplot(sim1, aes(x = s1, y = s2)) + geom_point(aes(color = y), alpha = 0.75) + scale_color_distiller(palette = "RdYlGn", direction = -1, label = function(x) sprintf("%.0f", x)) + guides(alpha = 'none') + theme_bw() + theme(axis.ticks = element_line(linewidth = 0.25), panel.background = element_blank(), panel.grid = element_blank(), legend.title = element_text(size = 10, hjust = 0.25), legend.box.just = "center", aspect.ratio = 1)
set.seed(1729) n <- 500 beta <- c(0.5, -0.5) phi0 <- 3 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, n_binom = rpois(n, 20), family = "binomial") plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true") library(ggplot2) plot2 <- ggplot(sim1, aes(x = s1, y = s2)) + geom_point(aes(color = y), alpha = 0.75) + scale_color_distiller(palette = "RdYlGn", direction = -1, label = function(x) sprintf("%.0f", x)) + guides(alpha = 'none') + theme_bw() + theme(axis.ticks = element_line(linewidth = 0.25), panel.background = element_blank(), panel.grid = element_blank(), legend.title = element_text(size = 10, hjust = 0.25), legend.box.just = "center", aspect.ratio = 1)
Dataset of size 500 with a Gaussian response variable, simulated with spatial coordinates sampled uniformly from the unit square. The model includes one covariate and spatial random effects induced by a Matérn covariogram.
data(simGaussian)
data(simGaussian)
a data.frame
object.
s1, s2
2-D coordinates; latitude and longitude.
x1
a covariate sampled from the standard normal distribution.
y
response vector.
z_true
true spatial random effects that generated the data.
The data is generated using the model
where the spatial effects is independent of the
measurement error
with
being the noise-to-spatial variance ratio and
being a
correlation matrix given by the Matérn covariogram
where is the spatial decay parameter and
the spatial
smoothness parameter. We have sampled the data with
,
,
,
and
. This data can be generated with the code as given in
the example.
Soumyakanti Pan [email protected]
simPoisson, simBinom, simBinary
set.seed(1729) n <- 500 beta <- c(2, 5) phi0 <- 2 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 deltasq <- 1 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, family = "gaussian") plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true", mark_points = TRUE) plot1 library(ggplot2) plot2 <- ggplot(sim1, aes(x = s1, y = s2)) + geom_point(aes(color = y), alpha = 0.75) + scale_color_distiller(palette = "RdYlGn", direction = -1, label = function(x) sprintf("%.0f", x)) + guides(alpha = 'none') + theme_bw() + theme(axis.ticks = element_line(linewidth = 0.25), panel.background = element_blank(), panel.grid = element_blank(), legend.title = element_text(size = 10, hjust = 0.25), legend.box.just = "center", aspect.ratio = 1) plot2
set.seed(1729) n <- 500 beta <- c(2, 5) phi0 <- 2 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 deltasq <- 1 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, family = "gaussian") plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true", mark_points = TRUE) plot1 library(ggplot2) plot2 <- ggplot(sim1, aes(x = s1, y = s2)) + geom_point(aes(color = y), alpha = 0.75) + scale_color_distiller(palette = "RdYlGn", direction = -1, label = function(x) sprintf("%.0f", x)) + guides(alpha = 'none') + theme_bw() + theme(axis.ticks = element_line(linewidth = 0.25), panel.background = element_blank(), panel.grid = element_blank(), legend.title = element_text(size = 10, hjust = 0.25), legend.box.just = "center", aspect.ratio = 1) plot2
Dataset of size 500, with a Poisson distributed response variable indexed by spatial coordinates sampled uniformly from the unit square. The model includes one covariate and spatial random effects induced by a Matérn covariogram.
data(simPoisson)
data(simPoisson)
a data.frame
object.
s1, s2
2-D coordinates; latitude and longitude.
x1
a covariate sampled from the standard normal distribution.
y
response vector.
z_true
true spatial random effects that generated the data.
With , the count data is simulated using
where the spatial effects with
being a
correlation matrix given by the Matérn covariogram
where is the spatial decay parameter and
the spatial
smoothness parameter. We have sampled the data with
,
,
, and
. This data can be
generated with the code as given in the example below.
simGaussian, simBinom, simBinary
set.seed(1729) n <- 500 beta <- c(2, -0.5) phi0 <- 5 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, family = "poisson") # Plot an interpolated spatial surface of the true random spatial effects plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true") # Plot the simulated count data library(ggplot2) plot2 <- ggplot(sim1, aes(x = s1, y = s2)) + geom_point(aes(color = y), alpha = 0.75) + scale_color_distiller(palette = "RdYlGn", direction = -1, label = function(x) sprintf("%.0f", x)) + guides(alpha = 'none') + theme_bw() + theme(axis.ticks = element_line(linewidth = 0.25), panel.background = element_blank(), panel.grid = element_blank(), legend.title = element_text(size = 10, hjust = 0.25), legend.box.just = "center", aspect.ratio = 1)
set.seed(1729) n <- 500 beta <- c(2, -0.5) phi0 <- 5 nu0 <- 0.5 spParams <- c(phi0, nu0) spvar <- 0.4 sim1 <- sim_spData(n = n, beta = beta, cor.fn = "matern", spParams = spParams, spvar = spvar, deltasq = deltasq, family = "poisson") # Plot an interpolated spatial surface of the true random spatial effects plot1 <- surfaceplot(sim1, coords_name = c("s1", "s2"), var_name = "z_true") # Plot the simulated count data library(ggplot2) plot2 <- ggplot(sim1, aes(x = s1, y = s2)) + geom_point(aes(color = y), alpha = 0.75) + scale_color_distiller(palette = "RdYlGn", direction = -1, label = function(x) sprintf("%.0f", x)) + guides(alpha = 'none') + theme_bw() + theme(axis.ticks = element_line(linewidth = 0.25), panel.background = element_blank(), panel.grid = element_blank(), legend.title = element_text(size = 10, hjust = 0.25), legend.box.just = "center", aspect.ratio = 1)
Fits a Bayesian spatial generalized linear model with fixed values of spatial process parameters and some auxiliary model parameters. The output contains posterior samples of the fixed effects, spatial random effects and, if required, finds leave-one-out predictive densities.
spGLMexact( formula, data = parent.frame(), family, coords, cor.fn, priors, spParams, boundary = 0.5, n.samples, loopd = FALSE, loopd.method = "exact", CV.K = 10, loopd.nMC = 500, verbose = TRUE, ... )
spGLMexact( formula, data = parent.frame(), family, coords, cor.fn, priors, spParams, boundary = 0.5, n.samples, loopd = FALSE, loopd.method = "exact", CV.K = 10, loopd.nMC = 500, verbose = TRUE, ... )
formula |
a symbolic description of the regression model to be fit. See example below. |
data |
an optional data frame containing the variables in the model.
If not found in |
family |
Specifies the distribution of the response as a member of the
exponential family. Supported options are |
coords |
an |
cor.fn |
a quoted keyword that specifies the correlation function used
to model the spatial dependence structure among the observations. Supported
covariance model key words are: |
priors |
(optional) a list with each tag corresponding to a
hyperparameter name and containing hyperprior details. Valid tags include
|
spParams |
fixed values of spatial process parameters. |
boundary |
Specifies the boundary adjustment parameter. Must be a real number between 0 and 1. Default is 0.5. |
n.samples |
number of posterior samples to be generated. |
loopd |
logical. If |
loopd.method |
character. Ignored if |
CV.K |
An integer between 10 and 20. Considered only if
|
loopd.nMC |
Number of Monte Carlo samples to be used to evaluate
leave-one-out predictive densities when |
verbose |
logical. If |
... |
currently no additional argument. |
With this function, we fit a Bayesian hierarchical spatial
generalized linear model by sampling exactly from the joint posterior
distribution utilizing the generalized conjugate multivariate distribution
theory (Bradley and Clinch 2024). Suppose
denotes the
spatial locations the response
is observed. Let
be the outcome at location
endowed with a probability law
from the natural exponential family, which we denote by
for some positive parameter and unit log partition function
. We consider the following response models based on the input
supplied to the argument
family
.
'poisson'
It considers point-referenced Poisson responses
. Here,
and
.
'binomial'
It considers point-referenced binomial counts
where,
denotes
the total number of trials and probability of success
at location
.
Here,
and
.
'binary'
It considers point-referenced binary data (0 or, 1) i.e.,
, where probability of success
at location
.
Here,
and
.
The hierarchical model is given as
where denotes the discrepancy
parameter. We fix the spatial process parameters
and
and
the hyperparameters
,
,
and
. The term
is known as the fine-scale variation
term which is given a conditional generalized conjugate multivariate
distribution as prior. For more details, see Pan et al. 2024. Default
values for
,
,
,
are diagonal with each diagonal element 100, 2.1, 2.1 and 0.1 respectively.
An object of class spGLMexact
, which is a list with the
following tags -
details of the priors used, containing the values of the
boundary adjustment parameter (boundary
), the variance parameter of the
fine-scale variation term (simasq.xi
) and others.
a list of length 3, containing posterior samples of fixed
effects (beta
), spatial effects (z
) and the fine-scale
variation term (xi
).
If loopd=TRUE
, contains leave-one-out predictive
densities.
Values of the fixed parameters that includes
phi
(spatial decay), nu
(spatial smoothness).
The return object might include additional data that can be used for subsequent prediction and/or model fit evaluation.
Soumyakanti Pan [email protected]
Bradley JR, Clinch M (2024). "Generating Independent Replicates Directly from the Posterior Distribution for a Class of Spatial Hierarchical Models." Journal of Computational and Graphical Statistics, 0(0), 1-17. doi:10.1080/10618600.2024.2365728.
Pan S, Zhang L, Bradley JR, Banerjee S (2024). "Bayesian Inference for Spatial-temporal Non-Gaussian Data Using Predictive Stacking." doi:10.48550/arXiv.2406.04655.
Vehtari A, Gelman A, Gabry J (2017). "Practical Bayesian Model Evaluation Using Leave-One-out Cross-Validation and WAIC." Statistics and Computing, 27(5), 1413-1432. ISSN 0960-3174. doi:10.1007/s11222-016-9696-4.
# Example 1: Analyze spatial poisson count data data(simPoisson) dat <- simPoisson[1:10, ] mod1 <- spGLMexact(y ~ x1, data = dat, family = "poisson", coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", spParams = list(phi = 4, nu = 0.4), n.samples = 100, verbose = TRUE) # summarize posterior samples post_beta <- mod1$samples$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))) # Example 2: Analyze spatial binomial count data data(simBinom) dat <- simBinom[1:10, ] mod2 <- spGLMexact(cbind(y, n_trials) ~ x1, data = dat, family = "binomial", coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", spParams = list(phi = 3, nu = 0.4), n.samples = 100, verbose = TRUE) # summarize posterior samples post_beta <- mod2$samples$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))) # Example 3: Analyze spatial binary data data(simBinary) dat <- simBinary[1:10, ] mod3 <- spGLMexact(y ~ x1, data = dat, family = "binary", coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", spParams = list(phi = 4, nu = 0.4), n.samples = 100, verbose = TRUE) # summarize posterior samples post_beta <- mod3$samples$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))
# Example 1: Analyze spatial poisson count data data(simPoisson) dat <- simPoisson[1:10, ] mod1 <- spGLMexact(y ~ x1, data = dat, family = "poisson", coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", spParams = list(phi = 4, nu = 0.4), n.samples = 100, verbose = TRUE) # summarize posterior samples post_beta <- mod1$samples$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))) # Example 2: Analyze spatial binomial count data data(simBinom) dat <- simBinom[1:10, ] mod2 <- spGLMexact(cbind(y, n_trials) ~ x1, data = dat, family = "binomial", coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", spParams = list(phi = 3, nu = 0.4), n.samples = 100, verbose = TRUE) # summarize posterior samples post_beta <- mod2$samples$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))) # Example 3: Analyze spatial binary data data(simBinary) dat <- simBinary[1:10, ] mod3 <- spGLMexact(y ~ x1, data = dat, family = "binary", coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", spParams = list(phi = 4, nu = 0.4), n.samples = 100, verbose = TRUE) # summarize posterior samples post_beta <- mod3$samples$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))
Fits Bayesian spatial generalized linear model on a collection of candidate models constructed based on some candidate values of some model parameters specified by the user and subsequently combines inference by stacking predictive densities. See Pan, Zhang, Bradley, and Banerjee (2024) for more details.
spGLMstack( formula, data = parent.frame(), family, coords, cor.fn, priors, params.list, n.samples, loopd.controls, parallel = FALSE, solver = "ECOS", verbose = TRUE, ... )
spGLMstack( formula, data = parent.frame(), family, coords, cor.fn, priors, params.list, n.samples, loopd.controls, parallel = FALSE, solver = "ECOS", verbose = TRUE, ... )
formula |
a symbolic description of the regression model to be fit. See example below. |
data |
an optional data frame containing the variables in the model.
If not found in |
family |
Specifies the distribution of the response as a member of the
exponential family. Supported options are |
coords |
an |
cor.fn |
a quoted keyword that specifies the correlation function used
to model the spatial dependence structure among the observations. Supported
covariance model key words are: |
priors |
(optional) a list with each tag corresponding to a parameter
name and containing prior details. Valid tags include |
params.list |
a list containing candidate values of spatial process
parameters for the |
n.samples |
number of posterior samples to be generated. |
loopd.controls |
a list with details on how leave-one-out predictive
densities (LOO-PD) are to be calculated. Valid tags include |
parallel |
logical. If |
solver |
(optional) Specifies the name of the solver that will be used
to obtain optimal stacking weights for each candidate model. Default is
|
verbose |
logical. If |
... |
currently no additional argument. |
Instead of assigning a prior on the process parameters
and
, the boundary adjustment parameter
, we consider
a set of candidate models based on some candidate values of these parameters
supplied by the user. Suppose the set of candidate models is
. Then for each
, we sample from the posterior distribution
under the model
and find
leave-one-out predictive densities
. Then we
solve the optimization problem
to find the optimal stacking weights .
An object of class spGLMstack
, which is a list including the
following tags -
family
the distribution of the responses as indicated in the function call
samples
a list of length equal to total number of candidate models
with each entry corresponding to a list of length 3, containing posterior
samples of fixed effects (beta
), spatial effects (z
) and
fine-scale variation term (xi
) for that particular model.
loopd
a list of length equal to total number of candidate models with each entry containing leave-one-out predictive densities under that particular model.
loopd.method
a list containing details of the algorithm used for calculation of leave-one-out predictive densities.
n.models
number of candidate models that are fit.
candidate.models
a matrix with n_model
rows with each row
containing details of the model parameters and its optimal weight.
stacking.weights
a numeric vector of length equal to the number of candidate models storing the optimal stacking weights.
run.time
a proc_time
object with runtime details.
solver.status
solver status as returned by the optimization routine.
The return object might include additional data that is useful for subsequent prediction, model fit evaluation and other utilities.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
Pan S, Zhang L, Bradley JR, Banerjee S (2024). "Bayesian Inference for Spatial-temporal Non-Gaussian Data Using Predictive Stacking." doi:10.48550/arXiv.2406.04655.
Vehtari A, Simpson D, Gelman A, Yao Y, Gabry J (2024). "Pareto Smoothed Importance Sampling." Journal of Machine Learning Research, 25(72), 1-58. URL https://jmlr.org/papers/v25/19-556.html.
data("simPoisson") dat <- simPoisson[1:100,] mod1 <- spGLMstack(y ~ x1, data = dat, family = "poisson", coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", params.list = list(phi = c(3, 7, 10), nu = c(0.25, 0.5, 1.5), boundary = c(0.5, 0.6)), n.samples = 1000, loopd.controls = list(method = "CV", CV.K = 10, nMC = 1000), parallel = TRUE, solver = "ECOS", verbose = TRUE) # print(mod1$solver.status) # print(mod1$run.time) post_samps <- stackedSampler(mod1) post_beta <- post_samps$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))) post_z <- post_samps$z post_z_summ <- t(apply(post_z, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))) z_combn <- data.frame(z = dat$z_true, zL = post_z_summ[, 1], zM = post_z_summ[, 2], zU = post_z_summ[, 3]) library(ggplot2) plot_z <- ggplot(data = z_combn, aes(x = z)) + geom_errorbar(aes(ymin = zL, ymax = zU), width = 0.05, alpha = 0.15, color = "skyblue") + geom_point(aes(y = zM), size = 0.25, color = "darkblue", alpha = 0.5) + geom_abline(slope = 1, intercept = 0, color = "red", linetype = "solid") + xlab("True z") + ylab("Posterior of z") + theme_bw() + theme(panel.background = element_blank(), aspect.ratio = 1)
data("simPoisson") dat <- simPoisson[1:100,] mod1 <- spGLMstack(y ~ x1, data = dat, family = "poisson", coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", params.list = list(phi = c(3, 7, 10), nu = c(0.25, 0.5, 1.5), boundary = c(0.5, 0.6)), n.samples = 1000, loopd.controls = list(method = "CV", CV.K = 10, nMC = 1000), parallel = TRUE, solver = "ECOS", verbose = TRUE) # print(mod1$solver.status) # print(mod1$run.time) post_samps <- stackedSampler(mod1) post_beta <- post_samps$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))) post_z <- post_samps$z post_z_summ <- t(apply(post_z, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))) z_combn <- data.frame(z = dat$z_true, zL = post_z_summ[, 1], zM = post_z_summ[, 2], zU = post_z_summ[, 3]) library(ggplot2) plot_z <- ggplot(data = z_combn, aes(x = z)) + geom_errorbar(aes(ymin = zL, ymax = zU), width = 0.05, alpha = 0.15, color = "skyblue") + geom_point(aes(y = zM), size = 0.25, color = "darkblue", alpha = 0.5) + geom_abline(slope = 1, intercept = 0, color = "red", linetype = "solid") + xlab("True z") + ylab("Posterior of z") + theme_bw() + theme(panel.background = element_blank(), aspect.ratio = 1)
Fits a Bayesian spatial linear model with spatial process parameters and the noise-to-spatial variance ratio fixed to a value supplied by the user. The output contains posterior samples of the fixed effects, variance parameter, spatial random effects and, if required, leave-one-out predictive densities.
spLMexact( formula, data = parent.frame(), coords, cor.fn, priors, spParams, noise_sp_ratio, n.samples, loopd = FALSE, loopd.method = "exact", verbose = TRUE, ... )
spLMexact( formula, data = parent.frame(), coords, cor.fn, priors, spParams, noise_sp_ratio, n.samples, loopd = FALSE, loopd.method = "exact", verbose = TRUE, ... )
formula |
a symbolic description of the regression model to be fit. See example below. |
data |
an optional data frame containing the variables in the model.
If not found in |
coords |
an |
cor.fn |
a quoted keyword that specifies the correlation function used
to model the spatial dependence structure among the observations. Supported
covariance model key words are: |
priors |
a list with each tag corresponding to a parameter name and containing prior details. |
spParams |
fixed value of spatial process parameters. |
noise_sp_ratio |
noise-to-spatial variance ratio. |
n.samples |
number of posterior samples to be generated. |
loopd |
logical. If |
loopd.method |
character. Ignored if |
verbose |
logical. If |
... |
currently no additional argument. |
Suppose denotes the
spatial locations the response
is observed. With this function, we
fit a conjugate Bayesian hierarchical spatial model
where we fix the spatial process parameters and
, the
noise-to-spatial variance ratio
and the hyperparameters
,
,
and
. We utilize
a composition sampling strategy to sample the model parameters from their
joint posterior distribution which can be written as
We proceed by first sampling from its marginal posterior,
then given the samples of
, we sample
and
subsequently, we sample
conditioned on the posterior samples of
and
(Banerjee 2020).
An object of class spLMexact
, which is a list with the
following tags -
a list of length 3, containing posterior samples of fixed
effects (beta
), variance parameter (sigmaSq
), spatial effects
(z
).
If loopd=TRUE
, contains leave-one-out predictive
densities.
Values of the fixed parameters that includes
phi
(spatial decay), nu
(spatial smoothness) and
noise_sp_ratio
(noise-to-spatial variance ratio).
The return object might include additional data used for subsequent prediction and/or model fit evaluation.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
Banerjee S (2020). "Modeling massive spatial datasets using a conjugate Bayesian linear modeling framework." Spatial Statistics, 37, 100417. ISSN 2211-6753. doi:10.1016/j.spasta.2020.100417.
Vehtari A, Simpson D, Gelman A, Yao Y, Gabry J (2024). "Pareto Smoothed Importance Sampling." Journal of Machine Learning Research, 25(72), 1-58. URL https://jmlr.org/papers/v25/19-556.html.
# load data data(simGaussian) dat <- simGaussian[1:100, ] # setup prior list muBeta <- c(0, 0) VBeta <- cbind(c(1.0, 0.0), c(0.0, 1.0)) sigmaSqIGa <- 2 sigmaSqIGb <- 0.1 prior_list <- list(beta.norm = list(muBeta, VBeta), sigma.sq.ig = c(sigmaSqIGa, sigmaSqIGb)) # supply fixed values of model parameters phi0 <- 3 nu0 <- 0.75 noise.sp.ratio <- 0.8 mod1 <- spLMexact(y ~ x1, data = dat, coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", priors = prior_list, spParams = list(phi = phi0, nu = nu0), noise_sp_ratio = noise.sp.ratio, n.samples = 100, loopd = TRUE, loopd.method = "exact") beta.post <- mod1$samples$beta z.post.median <- apply(mod1$samples$z, 1, median) dat$z.post.median <- z.post.median plot1 <- surfaceplot(dat, coords_name = c("s1", "s2"), var_name = "z_true") plot2 <- surfaceplot(dat, coords_name = c("s1", "s2"), var_name = "z.post.median") plot1 plot2
# load data data(simGaussian) dat <- simGaussian[1:100, ] # setup prior list muBeta <- c(0, 0) VBeta <- cbind(c(1.0, 0.0), c(0.0, 1.0)) sigmaSqIGa <- 2 sigmaSqIGb <- 0.1 prior_list <- list(beta.norm = list(muBeta, VBeta), sigma.sq.ig = c(sigmaSqIGa, sigmaSqIGb)) # supply fixed values of model parameters phi0 <- 3 nu0 <- 0.75 noise.sp.ratio <- 0.8 mod1 <- spLMexact(y ~ x1, data = dat, coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", priors = prior_list, spParams = list(phi = phi0, nu = nu0), noise_sp_ratio = noise.sp.ratio, n.samples = 100, loopd = TRUE, loopd.method = "exact") beta.post <- mod1$samples$beta z.post.median <- apply(mod1$samples$z, 1, median) dat$z.post.median <- z.post.median plot1 <- surfaceplot(dat, coords_name = c("s1", "s2"), var_name = "z_true") plot2 <- surfaceplot(dat, coords_name = c("s1", "s2"), var_name = "z.post.median") plot1 plot2
Fits Bayesian spatial linear model on a collection of candidate models constructed based on some candidate values of some model parameters specified by the user and subsequently combines inference by stacking predictive densities. See Zhang, Tang and Banerjee (2024) for more details.
spLMstack( formula, data = parent.frame(), coords, cor.fn, priors, params.list, n.samples, loopd.method, parallel = FALSE, solver = "ECOS", verbose = TRUE, ... )
spLMstack( formula, data = parent.frame(), coords, cor.fn, priors, params.list, n.samples, loopd.method, parallel = FALSE, solver = "ECOS", verbose = TRUE, ... )
formula |
a symbolic description of the regression model to be fit. See example below. |
data |
an optional data frame containing the variables in the model.
If not found in |
coords |
an |
cor.fn |
a quoted keyword that specifies the correlation function used
to model the spatial dependence structure among the observations. Supported
covariance model key words are: |
priors |
a list with each tag corresponding to a parameter name and containing prior details. If not supplied, uses defaults. |
params.list |
a list containing candidate values of spatial process
parameters for the |
n.samples |
number of posterior samples to be generated. |
loopd.method |
character. Valid inputs are |
parallel |
logical. If |
solver |
(optional) Specifies the name of the solver that will be used
to obtain optimal stacking weights for each candidate model. Default is
|
verbose |
logical. If |
... |
currently no additional argument. |
Instead of assigning a prior on the process parameters
and
, noise-to-spatial variance ratio
, we consider
a set of candidate models based on some candidate values of these parameters
supplied by the user. Suppose the set of candidate models is
. Then for each
, we sample from the posterior distribution
under the model
and find
leave-one-out predictive densities
. Then we
solve the optimization problem
to find the optimal stacking weights .
An object of class spLMstack
, which is a list including the
following tags -
samples
a list of length equal to total number of candidate models
with each entry corresponding to a list of length 3, containing posterior
samples of fixed effects (beta
), variance parameter
(sigmaSq
), spatial effects (z
) for that model.
loopd
a list of length equal to total number of candidate models with each entry containing leave-one-out predictive densities under that particular model.
n.models
number of candidate models that are fit.
candidate.models
a matrix with n_model
rows with each row
containing details of the model parameters and its optimal weight.
stacking.weights
a numeric vector of length equal to the number of candidate models storing the optimal stacking weights.
run.time
a proc_time
object with runtime details.
solver.status
solver status as returned by the optimization routine.
The return object might include additional data that is useful for subsequent prediction, model fit evaluation and other utilities.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
Vehtari A, Simpson D, Gelman A, Yao Y, Gabry J (2024). "Pareto Smoothed Importance Sampling." Journal of Machine Learning Research, 25(72), 1-58. URL https://jmlr.org/papers/v25/19-556.html.
Zhang L, Tang W, Banerjee S (2024). "Bayesian Geostatistics Using
Predictive Stacking."
doi:10.48550/arXiv.2304.12414.
# load data and work with first 100 rows data(simGaussian) dat <- simGaussian[1:100, ] # setup prior list muBeta <- c(0, 0) VBeta <- cbind(c(1.0, 0.0), c(0.0, 1.0)) sigmaSqIGa <- 2 sigmaSqIGb <- 2 prior_list <- list(beta.norm = list(muBeta, VBeta), sigma.sq.ig = c(sigmaSqIGa, sigmaSqIGb)) mod1 <- spLMstack(y ~ x1, data = dat, coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", priors = prior_list, params.list = list(phi = c(1.5, 3), nu = c(0.5, 1), noise_sp_ratio = c(1)), n.samples = 1000, loopd.method = "exact", parallel = FALSE, solver = "ECOS", verbose = TRUE) post_samps <- stackedSampler(mod1) post_beta <- post_samps$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))) post_z <- post_samps$z post_z_summ <- t(apply(post_z, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))) z_combn <- data.frame(z = dat$z_true, zL = post_z_summ[, 1], zM = post_z_summ[, 2], zU = post_z_summ[, 3]) library(ggplot2) plot1 <- ggplot(data = z_combn, aes(x = z)) + geom_point(aes(y = zM), size = 0.25, color = "darkblue", alpha = 0.5) + geom_errorbar(aes(ymin = zL, ymax = zU), width = 0.05, alpha = 0.15) + geom_abline(slope = 1, intercept = 0, color = "red", linetype = "solid") + xlab("True z") + ylab("Stacked posterior of z") + theme_bw() + theme(panel.background = element_blank(), aspect.ratio = 1)
# load data and work with first 100 rows data(simGaussian) dat <- simGaussian[1:100, ] # setup prior list muBeta <- c(0, 0) VBeta <- cbind(c(1.0, 0.0), c(0.0, 1.0)) sigmaSqIGa <- 2 sigmaSqIGb <- 2 prior_list <- list(beta.norm = list(muBeta, VBeta), sigma.sq.ig = c(sigmaSqIGa, sigmaSqIGb)) mod1 <- spLMstack(y ~ x1, data = dat, coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", priors = prior_list, params.list = list(phi = c(1.5, 3), nu = c(0.5, 1), noise_sp_ratio = c(1)), n.samples = 1000, loopd.method = "exact", parallel = FALSE, solver = "ECOS", verbose = TRUE) post_samps <- stackedSampler(mod1) post_beta <- post_samps$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975))))) post_z <- post_samps$z post_z_summ <- t(apply(post_z, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))) z_combn <- data.frame(z = dat$z_true, zL = post_z_summ[, 1], zM = post_z_summ[, 2], zU = post_z_summ[, 3]) library(ggplot2) plot1 <- ggplot(data = z_combn, aes(x = z)) + geom_point(aes(y = zM), size = 0.25, color = "darkblue", alpha = 0.5) + geom_errorbar(aes(ymin = zL, ymax = zU), width = 0.05, alpha = 0.15) + geom_abline(slope = 1, intercept = 0, color = "red", linetype = "solid") + xlab("True z") + ylab("Stacked posterior of z") + theme_bw() + theme(panel.background = element_blank(), aspect.ratio = 1)
A helper function to sample from the stacked posterior
distribution to obtain final posterior samples that can be used for
subsequent analysis. This function applies on outputs of functions
spLMstack()
and spGLMstack()
.
stackedSampler(mod_out, n.samples)
stackedSampler(mod_out, n.samples)
mod_out |
an object of class |
n.samples |
(optional) If missing, inherits the number of posterior samples from the original output. Otherwise, it specifies number of posterior samples to draw from the stacked posterior. If it exceeds the number of posterior draws used in the original function, then a warning is thrown and the samples are obtained by resampling. It is recommended, to run the original function with enough samples. |
After obtaining the optimal stacking weights
, posterior inference of quantities of
interest subsequently proceed from the stacked posterior,
where is the collection of candidate
models.
An object of class stacked_posterior
, which is a list that
includes the following tags -
samples of the fixed effect from the stacked joint posterior.
samples of the spatial random effects from the stacked joint posterior.
In case of model output of class spLMstack
, the list additionally contains
sigmaSq
which are the samples of the variance parameter from the stacked
joint posterior of the spatial linear model. For model output of class
spGLMstack
, the list also contains xi
which are the samples of the
fine-scale variation term from the stacked joint posterior of the spatial
generalized linear model.
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
data(simGaussian) dat <- simGaussian[1:100, ] mod1 <- spLMstack(y ~ x1, data = dat, coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", params.list = list(phi = c(1.5, 3), nu = c(0.5, 1), noise_sp_ratio = c(1)), n.samples = 1000, loopd.method = "exact", parallel = FALSE, solver = "ECOS", verbose = TRUE) print(mod1$solver.status) print(mod1$run.time) post_samps <- stackedSampler(mod1) post_beta <- post_samps$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))
data(simGaussian) dat <- simGaussian[1:100, ] mod1 <- spLMstack(y ~ x1, data = dat, coords = as.matrix(dat[, c("s1", "s2")]), cor.fn = "matern", params.list = list(phi = c(1.5, 3), nu = c(0.5, 1), noise_sp_ratio = c(1)), n.samples = 1000, loopd.method = "exact", parallel = FALSE, solver = "ECOS", verbose = TRUE) print(mod1$solver.status) print(mod1$run.time) post_samps <- stackedSampler(mod1) post_beta <- post_samps$beta print(t(apply(post_beta, 1, function(x) quantile(x, c(0.025, 0.5, 0.975)))))
Fits a Bayesian generalized linear model with spatially-temporally varying coefficients under fixed values of spatial-temporal process parameters and some auxiliary model parameters. The output contains posterior samples of the fixed effects, spatial-temporal random effects and, if required, finds leave-one-out predictive densities.
stvcGLMexact( formula, data = parent.frame(), family, sp_coords, time_coords, cor.fn, sptParams, sptShared = FALSE, priors, boundary = 0.5, n.samples, loopd = FALSE, loopd.method = "exact", CV.K = 10, loopd.nMC = 500, verbose = TRUE, ... )
stvcGLMexact( formula, data = parent.frame(), family, sp_coords, time_coords, cor.fn, sptParams, sptShared = FALSE, priors, boundary = 0.5, n.samples, loopd = FALSE, loopd.method = "exact", CV.K = 10, loopd.nMC = 500, verbose = TRUE, ... )
formula |
a symbolic description of the regression model to be fit. Variables in parenthesis are assigned spatially-temporally varying coefficients. See example below. |
data |
an optional data frame containing the variables in the model.
If not found in |
family |
Specifies the distribution of the response as a member of the
exponential family. Supported options are |
sp_coords |
an |
time_coords |
an |
cor.fn |
a quoted keyword that specifies the correlation function used
to model the spatial-temporal dependence structure among the observations.
Supported covariance model key words are: |
sptParams |
fixed values of spatial-temporal process parameters in
usually a list of length 2. If |
sptShared |
If |
priors |
(optional) a list with each tag corresponding to a
hyperparameter name and containing hyperprior details. Valid tags include
|
boundary |
Specifies the boundary adjustment parameter. Must be a real number between 0 and 1. Default is 0.5. |
n.samples |
number of posterior samples to be generated. |
loopd |
logical. If |
loopd.method |
character. Ignored if |
CV.K |
An integer between 10 and 20. Considered only if
|
loopd.nMC |
Number of Monte Carlo samples to be used to evaluate
leave-one-out predictive densities when |
verbose |
logical. If |
... |
currently no additional argument. |
With this function, we fit a Bayesian hierarchical
spatially-temporally varying generalized linear model by sampling exactly
from the joint posterior distribution utilizing the generalized conjugate
multivariate distribution theory (Bradley and Clinch 2024). Suppose
denotes the
spatial-temporal
co-ordinates in
, the
response
is observed. Let
be the outcome at the
co-ordinate
endowed with a probability law from the natural
exponential family, which we denote by
for some positive parameter and unit log partition function
. Here,
denotes covariates with
spatially-temporally varying coefficients We consider the following response
models based on the input supplied to the argument
family
.
'poisson'
It considers point-referenced Poisson responses
. Here,
and
.
'binomial'
It considers point-referenced binomial counts
where,
denotes the total number of trials and probability of success
at spatial-temporal co-ordinate
. Here,
and
.
'binary'
It considers point-referenced binary data (0 or, 1) i.e.,
, where probability of
success
at spatial-temporal co-ordinate
.
Here,
and
.
The hierarchical model is given as
where denotes the discrepancy
parameter. We fix the spatial-temporal process parameters
and
and the hyperparameters
,
,
and
. The term
is known as the
fine-scale variation term which is given a conditional generalized conjugate
multivariate distribution as prior. For more details, see Pan et al. 2024.
Default values for
,
,
,
are diagonal with each diagonal element 100, 2.1, 2.1 and
0.1 respectively.
An object of class stvcGLMexact
, which is a list with the
following tags -
details of the priors used, containing the values of the
boundary adjustment parameter (boundary
), the variance parameter of the
fine-scale variation term (simasq.xi
) and others.
a list of length 3, containing posterior samples of fixed
effects (beta
), spatial-temporal effects (z
) and the
fine-scale variation term (xi
). The element with tag z
will
again be a list of length , each containing posterior samples of the
spatial-temporal random effects corresponding to each varying coefficient.
If loopd=TRUE
, contains leave-one-out predictive
densities.
Values of the fixed parameters that includes phi_s
(spatial decay), phi_t
(temporal smoothness).
The return object might include additional data that can be used for subsequent prediction and/or model fit evaluation.
Soumyakanti Pan [email protected]
Bradley JR, Clinch M (2024). "Generating Independent Replicates Directly from the Posterior Distribution for a Class of Spatial Hierarchical Models." Journal of Computational and Graphical Statistics, 0(0), 1-17. doi:10.1080/10618600.2024.2365728.
T. Gneiting and P. Guttorp (2010). "Continuous-parameter spatio-temporal processes." In A.E. Gelfand, P.J. Diggle, M. Fuentes, and P Guttorp, editors, Handbook of Spatial Statistics, Chapman & Hall CRC Handbooks of Modern Statistical Methods, p 427–436. Taylor and Francis.
Pan S, Zhang L, Bradley JR, Banerjee S (2024). "Bayesian Inference for Spatial-temporal Non-Gaussian Data Using Predictive Stacking." doi:10.48550/arXiv.2406.04655.
Vehtari A, Gelman A, Gabry J (2017). "Practical Bayesian Model Evaluation Using Leave-One-out Cross-Validation and WAIC." Statistics and Computing, 27(5), 1413-1432. ISSN 0960-3174. doi:10.1007/s11222-016-9696-4.
Make a surface plot
surfaceplot(tab, coords_name, var_name, h = 8, col.pal, mark_points = FALSE)
surfaceplot(tab, coords_name, var_name, h = 8, col.pal, mark_points = FALSE)
tab |
a data-frame containing spatial co-ordinates and the variable to plot |
coords_name |
name of the two columns that contains the co-ordinates of the points |
var_name |
name of the column containing the variable to be plotted |
h |
integer; (optional) controls smoothness of the spatial interpolation
as appearing in the |
col.pal |
Optional; color palette, preferably divergent, use
|
mark_points |
Logical; if |
a ggplot
object containing the surface plot
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
data(simGaussian) plot1 <- surfaceplot(simGaussian, coords_name = c("s1", "s2"), var_name = "z_true") plot1 # try your favourite color palette col.br <- colorRampPalette(c("blue", "white", "red")) col.br.pal <- col.br(100) plot2 <- surfaceplot(simGaussian, coords_name = c("s1", "s2"), var_name = "z_true", col.pal = col.br.pal) plot2
data(simGaussian) plot1 <- surfaceplot(simGaussian, coords_name = c("s1", "s2"), var_name = "z_true") plot1 # try your favourite color palette col.br <- colorRampPalette(c("blue", "white", "red")) col.br.pal <- col.br(100) plot2 <- surfaceplot(simGaussian, coords_name = c("s1", "s2"), var_name = "z_true", col.pal = col.br.pal) plot2
Make two side-by-side surface plots, particularly useful towards a comparative study of two spatial surfaces.
surfaceplot2( tab, coords_name, var1_name, var2_name, h = 8, col.pal, mark_points = FALSE )
surfaceplot2( tab, coords_name, var1_name, var2_name, h = 8, col.pal, mark_points = FALSE )
tab |
a data-frame containing spatial co-ordinates and the variables to plot |
coords_name |
name of the two columns that contains the co-ordinates of the points |
var1_name |
name of the column containing the first variable to be plotted |
var2_name |
name of the column containing the second variable to be plotted |
h |
integer; (optional) controls smoothness of the spatial interpolation
as appearing in the |
col.pal |
Optional; color palette, preferably divergent, use
|
mark_points |
Logical; if |
a list containing two ggplot
objects
Soumyakanti Pan [email protected],
Sudipto Banerjee [email protected]
data(simGaussian) plots_2 <- surfaceplot2(simGaussian, coords_name = c("s1", "s2"), var1_name = "z_true", var2_name = "y") plots_2
data(simGaussian) plots_2 <- surfaceplot2(simGaussian, coords_name = c("s1", "s2"), var1_name = "z_true", var2_name = "y") plots_2