Limitations on low rank approximations for covariance matrices of spatial data

Abstract

Evaluating the likelihood function for Gaussian models of an irregularly observed spatial process is problematic for large datasets because of memory and computational constraints. If the covariance matrix can be approximated by a diagonal matrix plus a low rank matrix, the memory and computation needed to evaluate the likelihood are greatly reduced. When neighboring observations are strongly correlated, much of the variation in the observations can be captured by low frequency components, so the low rank approach might be expected to work well in this setting. Assuming the diagonal matrix is a multiple of the identity, this paper shows through both theory and numerical results that the low rank approximation sometimes performs poorly in this setting. In particular, an approximation that splits the observations into contiguous blocks and assumes independence across blocks often approximates the likelihood much better than a low rank approximation requiring similar memory and computation. An example with satellite-based measurements of total column ozone shows that these results are relevant to real data and that low rank models can also be highly statistically inefficient for spatial interpolation.
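The computational argument in the abstract can be illustrated with a small sketch. Assuming a mean-zero Gaussian model with covariance tau2 * I + U U^T (U of size n x r), the Woodbury identity and the matrix determinant lemma reduce the likelihood to operations on an r x r matrix. The block approximation instead sums exact log-likelihoods over contiguous blocks. The function names, the exponential covariance with range 0.1, the nugget tau2 = 0.1, and the choice of U as the leading eigenpairs of the smooth part are illustrative assumptions, not the paper's setup.

```python
import numpy as np


def exact_loglik(z, cov):
    """Exact log-likelihood of z ~ N(0, cov) via a dense Cholesky factor.

    Costs O(n^3) time and O(n^2) memory for n observations.
    """
    n = z.size
    L = np.linalg.cholesky(cov)          # cov = L L^T, L lower triangular
    w = np.linalg.solve(L, z)            # w = L^{-1} z, so z^T cov^{-1} z = w^T w
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + w @ w)


def low_rank_loglik(z, U, tau2):
    """Log-likelihood when cov = tau2 * I + U U^T, with U of shape (n, r).

    Uses the Woodbury identity for the quadratic form and the matrix
    determinant lemma for the log-determinant, so only an r x r matrix is
    factored: O(n r^2) time and O(n r) memory.
    """
    n, r = U.shape
    M = np.eye(r) + (U.T @ U) / tau2     # r x r matrix I + U^T U / tau2
    LM = np.linalg.cholesky(M)
    b = np.linalg.solve(LM, U.T @ z)     # b^T b = (U^T z)^T M^{-1} (U^T z)
    quad = (z @ z) / tau2 - (b @ b) / tau2**2
    logdet = n * np.log(tau2) + 2.0 * np.sum(np.log(np.diag(LM)))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def block_loglik(z, cov, blocks):
    """Block-independence approximation: sum of exact log-likelihoods over blocks.

    `blocks` is a list of index arrays. For clarity the full cov is passed in;
    a memory-saving version would build only the diagonal blocks.
    """
    return sum(exact_loglik(z[idx], cov[np.ix_(idx, idx)]) for idx in blocks)


# Illustrative setup (assumed, not from the paper): sorted 1-D locations,
# exponential covariance with range 0.1, and nugget tau2 = 0.1.
rng = np.random.default_rng(0)
n, r, tau2 = 200, 20, 0.1
x = np.sort(rng.uniform(0.0, 1.0, n))
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)   # smooth (non-nugget) part
cov = K + tau2 * np.eye(n)
z = np.linalg.cholesky(cov) @ rng.standard_normal(n)  # one simulated field

# Low rank part: leading r eigenpairs of K (eigh returns ascending eigenvalues).
vals, vecs = np.linalg.eigh(K)
U = vecs[:, -r:] * np.sqrt(vals[-r:])

# n // r contiguous blocks of size r, so both approximations store about n * r numbers.
blocks = np.array_split(np.arange(n), n // r)

print("exact    :", exact_loglik(z, cov))
print("low rank :", low_rank_loglik(z, U, tau2))
print("blocks   :", block_loglik(z, cov, blocks))
```

Choosing the block size equal to the rank r makes the comparison like-for-like: each approximation stores about n * r numbers and costs O(n r^2) operations. Varying the range, nugget, and r and comparing the two printed values with the exact log-likelihood is one way to explore the trade-off the abstract describes.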
