Efficient Bayesian modeling of large lattice data using spectral properties of Laplacian matrix

Abstract

Spatial data observed on a group of areal units is common in scientific applications. The usual hierarchical approach to modeling such data is to introduce a spatial random effect with an autoregressive prior. However, the usual Markov chain Monte Carlo scheme for this hierarchical framework samples the spatial effects from their full conditional posteriors one at a time, which results in poor mixing. More importantly, it makes the model computationally inefficient for datasets with a large number of units. In this article, we propose a Bayesian approach that uses the spectral structure of the adjacency graph, through its Laplacian matrix, to construct a low-rank expansion for modeling spatial dependence. We propose a pair of computationally efficient estimation schemes that select the basis functions most important for capturing the variation in the response. Through simulation studies, we validate the computational efficiency as well as the predictive accuracy of our method. Finally, we present an important real-world application of the proposed methodology to a massive plant abundance dataset from the Cape Floristic Region in South Africa.
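The abstract does not spell out the model, so the following is a minimal sketch under stated assumptions, not the authors' method. It builds the graph Laplacian L = D - W for a lattice with rook (4-neighbour) adjacency, keeps the eigenvectors with the smallest nonzero eigenvalues as a low-rank basis, and places a prior on their coefficients by projecting an intrinsic CAR prior onto that basis (coefficient j gets precision tau * lambda_j). Because the basis columns are orthonormal, all k coefficients can be drawn jointly in a single Gibbs block, rather than updating spatial effects one at a time. The helper names `grid_adjacency`, `laplacian_basis` and `sample_coefficients`, the 20 x 20 grid, k = 25, and the Gaussian likelihood are hypothetical choices for illustration. The paper's own basis-selection schemes are not reproduced here.

```python
import numpy as np


def grid_adjacency(nrow, ncol):
    """Binary rook (4-neighbour) adjacency matrix W for an nrow x ncol lattice."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:          # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrow:          # neighbour below
                W[i, i + ncol] = W[i + ncol, i] = 1.0
    return W


def laplacian_basis(W, k):
    """Eigenpairs 2..k+1 of the graph Laplacian L = D - W (smallest eigenvalues first).

    The first eigenvector (eigenvalue 0, constant on a connected graph) is
    dropped here because it is confounded with the intercept (an assumption).
    """
    L = np.diag(W.sum(axis=1)) - W
    lam, vecs = np.linalg.eigh(L)     # symmetric L: eigenvalues in ascending order
    return lam[1:k + 1], vecs[:, 1:k + 1]


def sample_coefficients(y, Phi, lam, tau, sigma2, rng):
    """Draw all k basis coefficients jointly from their Gaussian full conditional.

    Hypothetical model: y = Phi @ beta + noise, noise ~ N(0, sigma2 * I),
    beta_j ~ N(0, 1 / (tau * lam_j)), i.e. an ICAR prior projected onto the basis.
    Since Phi has orthonormal columns, the posterior precision is diagonal.
    """
    prec = 1.0 / sigma2 + tau * lam
    mean = (Phi.T @ y / sigma2) / prec
    return mean + rng.standard_normal(len(lam)) / np.sqrt(prec)


rng = np.random.default_rng(1)
W = grid_adjacency(20, 20)                 # n = 400 areal units
lam, Phi = laplacian_basis(W, k=25)        # 400 x 25 low-rank basis

# Simulate a smooth spatial surface and noisy observations (illustrative only).
beta_true = rng.standard_normal(25) / np.sqrt(lam)
y = Phi @ beta_true + rng.normal(scale=0.5, size=400)

beta_draw = sample_coefficients(y, Phi, lam, tau=1.0, sigma2=0.25, rng=rng)
phi_draw = Phi @ beta_draw                 # one posterior draw of the spatial effect
```

Each Gibbs step then costs O(nk) rather than a sweep over all n units. For large irregular lattices, the dense `np.linalg.eigh` call would typically be replaced by a sparse iterative eigensolver, such as the implicitly restarted Lanczos method in ARPACK, which SciPy exposes as `scipy.sparse.linalg.eigsh`.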