Improving the performance of predictive process modeling for large datasets

Advances in Geographical Information Systems (GIS) and Global Positioning Systems (GPS) enable accurate geocoding of locations where scientific data are collected. This has encouraged the collection of large spatial datasets in many fields and has generated considerable interest in statistical modeling of location-referenced spatial data. This paper considers the setting in which the number of observed locations is too large to fit the desired hierarchical spatial random effects models using Markov chain Monte Carlo methods. The problem is exacerbated in spatial-temporal and multivariate settings, where many observations occur at each location. The recently proposed predictive process, motivated by kriging ideas, aims to maintain the richness of the desired hierarchical spatial modeling specifications in the presence of large datasets. A shortcoming of the original formulation of the predictive process is that it induces a positive bias in the non-spatial error term of the models. A modified predictive process is proposed to address this problem. The predictive process approach is knot-based, which raises questions regarding knot design. An algorithm is designed to achieve approximately optimal spatial placement of knots. Detailed illustrations of the modified predictive process, using multivariate spatial regression with both a simulated and a real dataset, are offered.
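
The source of the bias can be illustrated numerically. The sketch below is a minimal example under stated assumptions, not the paper's implementation: it assumes an exponential covariance function, a 5 x 5 knot grid on the unit square, and arbitrary parameter values (`sigma2`, `phi`), and all function names are hypothetical. It computes the variance of the knot-based predictive process, c(s)^T C*^{-1} c(s), at each location and compares it with the parent process variance C(s, s) = sigma2. The shortfall is non-negative, and when it is left unmodeled it tends to be absorbed by the non-spatial error variance. One way to read the modification described above is as adding an independent, location-specific term whose variance equals this shortfall.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * distance) between point sets a and b.
    (Assumed covariance family, chosen only for illustration.)"""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def predictive_process_variance(locs, knots, sigma2=1.0, phi=3.0):
    """Variance of the predictive process at each location: c(s)^T C*^{-1} c(s)."""
    c = exp_cov(locs, knots, sigma2, phi)         # n x m cross-covariance
    c_star = exp_cov(knots, knots, sigma2, phi)   # m x m covariance among knots
    solved = np.linalg.solve(c_star, c.T)         # C*^{-1} c(s) for every s, m x n
    return np.einsum("ij,ji->i", c, solved)       # diagonal of c C*^{-1} c^T

# Hypothetical setup: 200 random locations and a regular 5 x 5 knot grid.
rng = np.random.default_rng(0)
locs = rng.uniform(0.0, 1.0, size=(200, 2))
gx, gy = np.meshgrid(np.linspace(0.1, 0.9, 5), np.linspace(0.1, 0.9, 5))
knots = np.column_stack([gx.ravel(), gy.ravel()])

sigma2 = 1.0
pp_var = predictive_process_variance(locs, knots, sigma2=sigma2)

# Variance lost by projecting the parent process onto the knots.
shortfall = sigma2 - pp_var

# Adding an independent term with this variance restores the parent variance.
corrected_var = pp_var + shortfall

print("mean predictive process variance:", pp_var.mean())
print("mean variance shortfall:", shortfall.mean())
```

Placing more knots, or placing them closer to the observed locations, shrinks the shortfall, which is one reason knot design matters for this class of models.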
