Nonstationary Spatial Modeling of Massive Global Satellite Data

Earth-observing satellite instruments collect a massive number of observations every day. For example, the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument collects tens of millions of sea surface temperature (SST) observations on a global scale daily. Despite their size, such datasets are incomplete and noisy, so spatial statistical inference is needed to obtain complete, high-resolution fields with quantified uncertainties. Such inference is challenging because of the high computational cost, the nonstationary behavior of environmental processes on a global scale, and land barriers that affect the dependence of SST. In this work, we develop a multi-resolution approximation (M-RA) of a Gaussian process (GP) whose nonstationary, global covariance function is obtained using local fits. The M-RA requires a partitioning of the domain, which can be tailored to the application. In the SST case, we choose the partition deliberately so that it accounts for land barriers and weakens the dependence across them. Our M-RA implementation is tailored to distributed-memory computation in high-performance-computing environments. We analyze a MODIS SST dataset consisting of more than 43 million observations, which is, to our knowledge, the largest dataset ever analyzed using a probabilistic GP model. We show that our nonstationary model based on local fits provides substantially improved predictive performance relative to a stationary approach.
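
As a rough illustration of how locally fitted parameters can be combined into a single valid nonstationary covariance function, the following Python sketch uses the well-known kernel-convolution form of Paciorek and Schervish with an exponential correlation. This is an assumption made for illustration only: the abstract does not state which construction the authors use. The function name `nonstationary_cov`, the planar coordinates, the two-region split at x = 0.5, and all parameter values are hypothetical.

```python
import numpy as np


def nonstationary_cov(coords, sigma, rho):
    """Nonstationary exponential covariance (Paciorek-Schervish form).

    coords : (n, d) array of locations in Euclidean space (assumed planar here)
    sigma  : (n,) local standard deviations, e.g. from local fits
    rho    : (n,) local isotropic range parameters, e.g. from local fits
    """
    coords = np.asarray(coords, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    rho = np.asarray(rho, dtype=float)
    dim = coords.shape[1]

    # Average of the two local (isotropic) kernel scales for every pair.
    avg = 0.5 * (rho[:, None] ** 2 + rho[None, :] ** 2)

    # Squared Euclidean distance between every pair of locations.
    diff = coords[:, None, :] - coords[None, :, :]
    sqdist = np.sum(diff ** 2, axis=-1)

    # Normalizing factor that keeps the correlation equal to 1 on the diagonal
    # and the resulting matrix positive definite.
    prefactor = (rho[:, None] * rho[None, :] / avg) ** (dim / 2.0)

    corr = prefactor * np.exp(-np.sqrt(sqdist / avg))
    return sigma[:, None] * sigma[None, :] * corr


# Toy setup: two hypothetical regions with different "locally fitted" parameters.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))
in_west = coords[:, 0] < 0.5
sigma = np.where(in_west, 1.0, 2.0)   # hypothetical local standard deviations
rho = np.where(in_west, 0.1, 0.3)     # hypothetical local ranges

K = nonstationary_cov(coords, sigma, rho)
```

The matrix `K` has the local variances on its diagonal and remains positive definite even though the parameters change abruptly between regions. A dense matrix like this is only feasible for small examples; for tens of millions of observations, an approximation such as the M-RA is needed.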
