Estimation and Prediction in Spatial Models With Block Composite Likelihoods

This article develops a block composite likelihood for estimation and prediction in large spatial datasets. The composite likelihood (CL) is constructed from the joint densities of pairs of adjacent spatial blocks. This allows a large dataset to be split into many smaller datasets, each of which can be evaluated separately, with the results combined through a simple summation. Estimates of the unknown parameters are obtained by maximizing the block CL function. In addition, a new method for optimal spatial prediction under the block CL is presented. Asymptotic variances for both parameter estimates and predictions are computed using Godambe sandwich matrices. The approach considerably improves computational efficiency, and the composite structure obviates the need to load the entire dataset into memory at once, which avoids the memory limitations imposed by massive datasets. Moreover, computing time can be reduced further by distributing the operations using parallel computing. A simulation study shows that CL estimates and predictions, as well as their corresponding asymptotic confidence intervals, are competitive with those based on the full likelihood. The procedure is demonstrated on one dataset from the mining industry and one dataset of satellite retrievals. The real-data examples show that the block composite results tend to outperform two competitors: the predictive process model and fixed-rank kriging. Supplementary materials for this article are available online on the journal web site.
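
The construction described above, a sum of joint log densities over pairs of adjacent blocks, can be illustrated with a minimal sketch. It is not the article's implementation. The zero-mean Gaussian model, the exponential covariance with a nugget, the 2 x 2 block layout, the edge-sharing definition of adjacency, and all function and parameter names (`exp_cov`, `gauss_loglik`, `block_composite_loglik`, `sigma2`, `phi`, `tau2`) are assumptions chosen for illustration.

```python
import numpy as np

def exp_cov(a, b, sigma2, phi):
    """Exponential covariance between two sets of 2-D locations (assumed model)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def gauss_loglik(y, coords, sigma2, phi, tau2):
    """Zero-mean Gaussian log density of y observed at coords (nugget tau2)."""
    n = len(y)
    cov = exp_cov(coords, coords, sigma2, phi) + tau2 * np.eye(n)
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, y)
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)

def block_composite_loglik(y, coords, block_id, neighbor_pairs, sigma2, phi, tau2):
    """Sum the joint log densities of each pair of adjacent blocks."""
    total = 0.0
    for k, l in neighbor_pairs:
        # Only the data in blocks k and l are needed for this term, so each
        # term could be evaluated separately (or in parallel) and then summed.
        idx = np.flatnonzero((block_id == k) | (block_id == l))
        total += gauss_loglik(y[idx], coords[idx], sigma2, phi, tau2)
    return total

# Hypothetical example: 200 sites on the unit square, split into 2 x 2 blocks.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))
block_id = (coords[:, 0] >= 0.5).astype(int) + 2 * (coords[:, 1] >= 0.5).astype(int)
neighbor_pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]  # blocks sharing an edge

# Simulate data from the assumed model, then evaluate the block CL.
cov = exp_cov(coords, coords, 1.0, 0.2) + 0.1 * np.eye(200)
y = np.linalg.cholesky(cov) @ rng.standard_normal(200)
cl = block_composite_loglik(y, coords, block_id, neighbor_pairs, 1.0, 0.2, 0.1)
```

In this sketch each term factorizes a covariance matrix only as large as two blocks, rather than one for the whole dataset. Parameter estimates would be obtained by passing the negative of `block_composite_loglik` to a numerical optimizer. With only two blocks and a single pair, the block CL coincides with the full likelihood.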
