Hierarchical Low Rank Approximation of Likelihoods for Large Spatial Datasets

ABSTRACT Datasets in the fields of climate and environment are often very large and irregularly spaced. Gaussian process models, widely used in spatial statistics, face tremendous challenges with such datasets because of their prohibitive computational burden. Various approximation methods have been introduced to reduce the computational cost. However, most of them rely on unrealistic assumptions about the underlying process, and retaining statistical efficiency remains an issue. We develop a new approximation scheme for maximum likelihood estimation. We show how the composite likelihood method can be adapted to provide different types of hierarchical low rank approximations that are both computationally and statistically efficient. The improvement offered by the proposed method is explored theoretically; its performance is investigated by numerical and simulation studies; and its practicality is illustrated by applying our methods to two million measurements of soil moisture in the area of the Mississippi River basin, which facilitates a better understanding of climate variability. Supplementary material for this article is available online.
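The abstract does not spell out the approximation scheme, so the following is only a minimal sketch of the starting point it names: a block composite likelihood, which sums exact Gaussian log-likelihoods over disjoint blocks and ignores dependence between blocks. The exponential covariance, its parameter values, the quadrant-based blocks, and all function names are assumptions made for illustration. The hierarchical low rank treatment of between-block dependence that the article proposes is not reproduced here.

```python
import numpy as np

def exp_cov(locs, variance=1.0, range_=0.2, nugget=1e-6):
    """Exponential covariance matrix (an assumed model, not from the article)."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return variance * np.exp(-d / range_) + nugget * np.eye(len(locs))

def gaussian_loglik(z, cov):
    """Exact zero-mean Gaussian log-likelihood via a Cholesky factor, O(n^3)."""
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, z)          # chol^{-1} z
    log_det_half = np.log(np.diag(chol)).sum()  # 0.5 * log det(cov)
    return -0.5 * alpha @ alpha - log_det_half - 0.5 * len(z) * np.log(2 * np.pi)

def block_composite_loglik(z, locs, blocks, **cov_params):
    """Sum of per-block log-likelihoods; between-block dependence is ignored."""
    return sum(
        gaussian_loglik(z[b], exp_cov(locs[b], **cov_params)) for b in blocks
    )

# Simulate irregularly spaced data on the unit square.
rng = np.random.default_rng(0)
n = 400
locs = rng.uniform(size=(n, 2))
z = np.linalg.cholesky(exp_cov(locs)) @ rng.standard_normal(n)

# Hypothetical blocking: split the locations by quadrant.
quadrant = 2 * (locs[:, 0] > 0.5).astype(int) + (locs[:, 1] > 0.5).astype(int)
blocks = [np.where(quadrant == k)[0] for k in range(4)]

full = gaussian_loglik(z, exp_cov(locs))
comp = block_composite_loglik(z, locs, blocks)
print(f"exact log-likelihood:     {full:.2f}")
print(f"composite log-likelihood: {comp:.2f}")
```

Each block costs a Cholesky factorization of its own, much smaller, covariance matrix, which is where the computational saving comes from. The gap between the two printed values reflects the between-block dependence that the block-independent version discards.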
