A Computationally Efficient Projection-Based Approach for Spatial Generalized Linear Mixed Models

ABSTRACT Inference for spatial generalized linear mixed models (SGLMMs) for high-dimensional non-Gaussian spatial data is computationally intensive. The challenge arises because the random effects are high-dimensional and because Markov chain Monte Carlo (MCMC) algorithms for these models tend to mix slowly. Moreover, spatial confounding inflates the variance of fixed effect (regression coefficient) estimates. Our approach addresses both the computational and confounding issues by replacing the high-dimensional spatial random effects with a reduced-dimensional representation based on random projections. With this representation, standard MCMC algorithms mix well, and the reduced dimension speeds up computation per iteration. We show, via simulated examples, that Bayesian inference for this reduced-dimensional approach works well for both inference and prediction; our methods also compare favorably to existing "reduced-rank" approaches. We also apply our methods to two real-world data examples, one on bird counts and the other on classifying rock types. Supplementary material for this article is available online.
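The abstract does not spell out the projection algorithm, so the following is only a minimal sketch of the general idea, assuming a generic randomized low-rank eigendecomposition of a spatial covariance matrix. The exponential covariance, the function names, and the parameters `phi`, `sigma2`, `m`, and `oversample` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def exp_covariance(coords, phi=0.2, sigma2=1.0):
    """Exponential covariance matrix (an assumed, illustrative choice)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def random_projection_eig(K, m, oversample=10, seed=0):
    """Approximate the m leading eigenpairs of a symmetric PSD matrix K
    using a Gaussian random projection (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    # Project K onto a random (m + oversample)-dimensional subspace;
    # the extra multiplication by K is one power iteration.
    omega = rng.standard_normal((n, m + oversample))
    Y = K @ (K @ omega)
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the range of Y
    B = Q.T @ K @ Q                     # small (m + oversample)-square matrix
    evals, evecs = np.linalg.eigh(B)    # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:m]   # keep the m largest
    return Q @ evecs[:, idx], evals[idx]

# Illustrative use: n = 200 random effects replaced by m = 15 coefficients.
coords = np.random.default_rng(1).uniform(size=(200, 2))
K = exp_covariance(coords)
U, lam = random_projection_eig(K, m=15)

# Reduced-dimensional stand-in for the spatial random effects:
# W is approximated by U diag(sqrt(lam)) delta, with delta ~ N(0, I_m).
delta = np.random.default_rng(2).standard_normal(15)
W_approx = (U * np.sqrt(lam)) @ delta
print(U.shape, W_approx.shape)
```

In a sketch like this, an MCMC sampler would update the 15 entries of `delta` rather than the 200 original random effects, which is where the reduction in cost per iteration would come from.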
