NONSEPARABLE DYNAMIC NEAREST NEIGHBOR GAUSSIAN PROCESS MODELS FOR LARGE SPATIO-TEMPORAL DATA WITH AN APPLICATION TO PARTICULATE MATTER ANALYSIS.

Particulate matter (PM) is a class of environmental pollutants known to be detrimental to human health. Regulatory efforts aimed at curbing PM levels in different countries often require high-resolution space-time maps that can identify red-flag regions exceeding statutory concentration limits. Continuous spatio-temporal Gaussian Process (GP) models can deliver maps depicting predicted PM levels and quantify predictive uncertainty. However, GP-based approaches are usually thwarted by the computational challenges posed by large datasets. We construct a novel class of scalable Dynamic Nearest Neighbor Gaussian Process (DNNGP) models that can provide a sparse approximation to any spatio-temporal GP (e.g., one with a nonseparable covariance structure). The DNNGP we develop here can be used as a sparsity-inducing prior for spatio-temporal random effects in any Bayesian hierarchical model to deliver full posterior inference. Storage and memory requirements for a DNNGP model are linear in the size of the dataset, thereby delivering massive scalability without sacrificing inferential richness. Extensive numerical studies reveal that the DNNGP approximates the underlying process substantially better than low-rank approximations do. Finally, we use the DNNGP, in conjunction with the LOTOS-EUROS chemistry transport model (CTM), to analyze a massive air quality dataset and substantially improve predictions of PM levels across Europe.
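The sparsity and linear-storage claims can be illustrated with a minimal sketch of a generic nearest-neighbor (Vecchia-type) approximation. This is not the DNNGP algorithm itself. Several things here are assumptions made for illustration: the Gneiting-type nonseparable covariance and its parameter values, the function names `nonsep_cov`, `nn_factors` and `approx_precision`, and the simplified rule that each point conditions on the `m` earlier points with the highest covariance. Each point then stores at most `m` weights and one conditional variance, so storage grows like n times m rather than n squared.

```python
import numpy as np

def nonsep_cov(S, T, sigma2=1.0, a=1.0, c=1.0, beta=0.5):
    """Illustrative Gneiting-type nonseparable space-time covariance matrix.

    S: (n, 2) spatial coordinates, T: (n,) time points.
    The parameter values are hypothetical, not taken from the paper.
    """
    h = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)  # spatial lags
    u = np.abs(T[:, None] - T[None, :])                         # temporal lags
    psi = a * u**2 + 1.0
    return sigma2 / psi * np.exp(-c * h / psi ** (beta / 2.0))

def nn_factors(C, m):
    """Neighbor sets, kriging weights and conditional variances.

    Point i conditions on at most m earlier points (in the given ordering)
    with the highest covariance to it. This is a simplified neighbor rule
    assumed for the sketch. The dense matrix C is used only for brevity; a
    scalable version would evaluate covariances within neighbor sets only.
    """
    n = C.shape[0]
    nbrs = [np.array([], dtype=int)]
    wts = [np.array([])]
    d = np.empty(n)
    d[0] = C[0, 0]
    for i in range(1, n):
        k = min(m, i)
        idx = np.argsort(-C[i, :i])[:k]                 # k most correlated earlier points
        b = np.linalg.solve(C[np.ix_(idx, idx)], C[idx, i])
        nbrs.append(idx)
        wts.append(b)
        d[i] = C[i, i] - C[i, idx] @ b                  # conditional variance
    return nbrs, wts, d

def approx_precision(nbrs, wts, d):
    """Dense (I - A)^T D^{-1} (I - A), built here only to check the sketch."""
    n = len(d)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, nbrs[i]] = wts[i]
    IA = np.eye(n) - A
    return IA.T @ (IA / d[:, None])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 200, 10
    S = rng.uniform(size=(n, 2))
    T = rng.uniform(0.0, 5.0, size=n)
    nbrs, wts, d = nn_factors(nonsep_cov(S, T), m)
    stored = sum(len(w) for w in wts) + n
    print(f"stored numbers: {stored} (dense covariance would need {n * n})")
```

When `m` equals n - 1, every point conditions on all earlier points and the factorization reproduces the exact precision matrix; smaller `m` trades accuracy for sparsity.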
