Vecchia Approximations of Gaussian-Process Predictions

Gaussian processes (GPs) are highly flexible function estimators used for geospatial analysis, nonparametric regression, and machine learning, but they are computationally infeasible for large datasets. Vecchia approximations of GPs have been used to enable fast evaluation of the likelihood for parameter inference. Here, we study Vecchia approximations of spatial predictions at observed and unobserved locations, including obtaining joint predictive distributions at large sets of locations. We consider a general Vecchia framework for GP predictions, which contains some novel and some existing special cases. We study the accuracy and computational properties of these approaches theoretically and numerically, proving that our new methods exhibit linear computational complexity in the total number of spatial locations. We show that certain choices within the framework can have a strong effect on uncertainty quantification and computational cost, which leads to specific recommendations on which methods are most suitable for various settings. We also apply our methods to a satellite dataset of chlorophyll fluorescence, showing that the new methods are faster or more accurate than existing methods and reduce unrealistic artifacts in prediction maps. Supplementary materials accompanying this paper appear on-line.
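As a rough illustration of the idea behind a Vecchia approximation (a minimal sketch, not the implementation studied in the paper), the joint Gaussian density can be replaced by a product of univariate conditional densities, each conditioning on at most m previously ordered points. The sketch below makes several assumptions that do not come from the abstract: a zero-mean GP, an exponential covariance function, the ordering in which the locations are supplied, and nearest-neighbor conditioning sets. The helper names `exp_cov`, `vecchia_loglik`, and `exact_loglik` are hypothetical.

```python
import numpy as np

def exp_cov(a, b, range_=0.3, var=1.0):
    """Exponential covariance between two sets of locations (assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return var * np.exp(-d / range_)

def vecchia_loglik(y, locs, m, nugget=1e-6):
    """Sum of conditional Gaussian log-densities, where point i conditions
    on its (at most) m nearest neighbors among points 0..i-1."""
    total = 0.0
    for i in range(len(y)):
        prior_var = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0] + nugget
        if i == 0:
            # The first point has nothing to condition on.
            mean, var = 0.0, prior_var
        else:
            # Nearest previously ordered points form the conditioning set.
            d = np.linalg.norm(locs[:i] - locs[i], axis=1)
            nb = np.argsort(d)[:m]
            k_nn = exp_cov(locs[nb], locs[nb]) + nugget * np.eye(len(nb))
            k_in = exp_cov(locs[i:i + 1], locs[nb])[0]
            w = np.linalg.solve(k_nn, k_in)
            mean = w @ y[nb]
            var = prior_var - k_in @ w
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

def exact_loglik(y, locs, nugget=1e-6):
    """Exact zero-mean GP log-likelihood via a dense Cholesky factor."""
    n = len(y)
    k = exp_cov(locs, locs) + nugget * np.eye(n)
    chol = np.linalg.cholesky(k)
    z = np.linalg.solve(chol, y)
    return -0.5 * n * np.log(2.0 * np.pi) - np.log(np.diag(chol)).sum() - 0.5 * z @ z
```

When m is at least n - 1, every conditioning set contains all previously ordered points and the product recovers the exact joint density. For fixed small m, each term requires solving only an m-by-m system, which is why the overall cost of this kind of scheme grows linearly in the number of locations once the neighbor sets are known (the brute-force neighbor search used here for brevity does not scale that way).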
