Methods for Analyzing Large Spatial Data: A Review and Comparison

The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote its own implementation of its method to produce predictions at the given locations, and each implementation was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.
