Competition on Spatial Statistics for Large Datasets

As spatial datasets grow increasingly large and unwieldy, exact inference on spatial models becomes computationally prohibitive. Various approximation methods have been proposed to reduce the computational burden. Although comprehensive reviews of these approximation methods exist, comparisons of their performance are limited to small and medium-sized datasets and to a few selected methods. To achieve a comprehensive comparison covering as many methods as possible, we organized the Competition on Spatial Statistics for Large Datasets. This competition had the following novel features: (1) we generated synthetic datasets with the ExaGeoStat software so that the number of generated realizations ranged from 100 thousand to 1 million; (2) we systematically designed the data-generating models to represent spatial processes with a wide range of statistical properties for both Gaussian and non-Gaussian cases; (3) the competition tasks included both estimation and prediction, and the results were assessed by multiple criteria; and (4) we have made all the datasets and competition results publicly available to serve as a benchmark for other approximation methods. In this paper, we disclose all the competition details and results, along with some analysis of the competition outcomes.
