Multi-scale process modelling and distributed computation for spatial data

Recent years have seen rapid development in spatial modelling and prediction methodology, driven by the increased availability of remote-sensing data and the reduced cost of distributed-processing technology. It is well known that modelling and prediction with infinite-dimensional process models is computationally infeasible for large data sets, and that approximate models and, often, approximate-inference methods are needed. The problem of fitting simple global spatial models to large data sets has been addressed through methods such as multi-resolution approximations and nearest-neighbour techniques. Here we tackle the next challenge: fitting complex, nonstationary, multi-scale models to large data sets. We propose doing this by superposing spatial processes with increasing spatial scale and increasing degrees of nonstationarity. Computation is facilitated by the use of Gaussian Markov random fields and parallel Markov chain Monte Carlo based on graph colouring. The resulting model allows for both distributed computing and distributed data. Importantly, it provides opportunities for genuine model and data scalability while still being able to borrow strength across large spatial scales. We illustrate a two-scale version on a sea-surface-temperature data set containing on the order of one million observations, and compare our approach with state-of-the-art spatial modelling and prediction methods.
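The graph-colouring idea behind the parallel MCMC can be illustrated on a small example. The Python sketch below is not the authors' implementation. It assumes a single-scale GMRF on a regular lattice with rook (four-nearest-neighbour) adjacency, a precision matrix Q = kappa2 * I + (D - W) (D the diagonal degree matrix, W the adjacency matrix), Gaussian measurement error with variance sigma2, and a greedy colouring. All function names are hypothetical, and dense matrices are used only for brevity. Because nodes of the same colour share no edge in the GMRF graph, they are conditionally independent given the remaining nodes. Each colour class can therefore be drawn in a single block, and this is the step that could be distributed across processors.

```python
import numpy as np


def lattice_neighbours(nrow, ncol):
    """First-order (rook) neighbour lists for an nrow x ncol lattice, row-major indexing."""
    nbrs = [[] for _ in range(nrow * ncol)]
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r > 0:
                nbrs[i].append(i - ncol)
            if r < nrow - 1:
                nbrs[i].append(i + ncol)
            if c > 0:
                nbrs[i].append(i - 1)
            if c < ncol - 1:
                nbrs[i].append(i + 1)
    return nbrs


def greedy_colouring(nbrs):
    """Give each node the smallest colour not already used by a coloured neighbour."""
    colour = [-1] * len(nbrs)
    for i in range(len(nbrs)):
        used = {colour[j] for j in nbrs[i] if colour[j] >= 0}
        k = 0
        while k in used:
            k += 1
        colour[i] = k
    return np.array(colour)


def gmrf_precision(nbrs, kappa2=1.0):
    """Q = kappa2 * I + (D - W): a lattice Laplacian made positive definite by kappa2 > 0 (assumed form)."""
    n = len(nbrs)
    Q = kappa2 * np.eye(n)
    for i, js in enumerate(nbrs):
        Q[i, i] += len(js)
        for j in js:
            Q[i, j] -= 1.0
    return Q


def chromatic_gibbs(Q_post, b, colour, n_iter, rng):
    """Gibbs sampler for x ~ N(Q_post^{-1} b, Q_post^{-1}), updating one colour class at a time.

    Same-colour nodes are not neighbours, so they are conditionally independent given
    all other nodes and can be drawn together (vectorised here; on separate workers
    in a distributed setting)."""
    n = len(b)
    d = np.diag(Q_post)
    x = np.zeros(n)
    samples = np.empty((n_iter, n))
    classes = [np.flatnonzero(colour == k) for k in range(colour.max() + 1)]
    for t in range(n_iter):
        for idx in classes:
            # Off-diagonal part of Q_post[idx] @ x; entries of Q_post within a class are zero.
            off = Q_post[idx] @ x - d[idx] * x[idx]
            mean = (b[idx] - off) / d[idx]
            x[idx] = mean + rng.standard_normal(idx.size) / np.sqrt(d[idx])
        samples[t] = x
    return samples


# Usage on a 6 x 6 lattice with synthetic stand-in data (not real observations).
rng = np.random.default_rng(1)
nbrs = lattice_neighbours(6, 6)
colour = greedy_colouring(nbrs)
print(colour.max() + 1)  # 2: greedy row-major colouring of a rook lattice is a checkerboard
Q = gmrf_precision(nbrs, kappa2=1.0)
sigma2 = 0.5  # assumed measurement-error variance
y = rng.standard_normal(len(nbrs))
Q_post = Q + np.eye(len(nbrs)) / sigma2  # posterior precision of x given y
b = y / sigma2
samples = chromatic_gibbs(Q_post, b, colour, n_iter=5000, rng=rng)
mc_mean = samples[500:].mean(axis=0)  # discard burn-in
exact_mean = np.linalg.solve(Q_post, b)
print(np.abs(mc_mean - exact_mean).max())  # Monte Carlo error of the posterior mean
```

On a rook lattice the greedy colouring yields two colours, so each sweep consists of two block updates regardless of grid size. Irregular graphs may need more colours, and for large problems the precision matrix would be stored in a sparse format rather than densely.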
