Gaussian Orthogonal Latent Factor Processes for Large Incomplete Matrices of Correlated Data

We introduce Gaussian orthogonal latent factor processes for modeling and predicting large correlated data. To handle the computational challenge, we first decompose the likelihood function of the Gaussian random field with a multi-dimensional input domain into a product of densities at the orthogonal components with lower-dimensional inputs. The continuous-time Kalman filter is then used to compute the likelihood function efficiently and without approximation. We also show that the posterior distributions of the factor processes are independent, as a consequence of the prior independence of the factor processes and the orthogonality of the factor loading matrix. For studies with a large sample size, we propose a flexible way to model the mean and derive the closed-form marginal posterior distribution. Both simulated and real data applications confirm the outstanding performance of this method.
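
The likelihood decomposition described above can be illustrated with a small numerical check. The sketch below is not the authors' implementation: it assumes a zero-mean model Y = A Z + noise, where the k x d loading matrix A has orthonormal columns, the rows of Z are independent Gaussian processes over n inputs, and the noise is i.i.d. with variance sigma0_sq. The exponential kernel, the dimensions, and all variable names are illustrative assumptions. Each factor density is evaluated with a dense Cholesky factorization for clarity, whereas the abstract states that a continuous-time Kalman filter is used for this step.

```python
import numpy as np

def gauss_logpdf(y, cov):
    """Log-density of N(0, cov) evaluated at the vector y (dense Cholesky)."""
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def exp_kernel(t, length):
    """Exponential correlation matrix on the 1-D inputs t (illustrative choice)."""
    return np.exp(-np.abs(t[:, None] - t[None, :]) / length)

rng = np.random.default_rng(0)
k, n, d = 6, 8, 2            # output rows, inputs per row, latent factors
sigma0_sq = 0.3              # noise variance (assumed value)
t = np.linspace(0.0, 1.0, n)

# Loading matrix with orthonormal columns: A.T @ A = I_d.
A, _ = np.linalg.qr(rng.standard_normal((k, d)))
Sigmas = [exp_kernel(t, 0.2), exp_kernel(t, 0.7)]

# Any k x n matrix can be used to check the identity between the two forms.
Y = rng.standard_normal((k, n))

# Dense form: one (k*n)-dimensional Gaussian on the row-major flattening of Y.
cov = sigma0_sq * np.eye(k * n)
for l in range(d):
    cov += np.kron(np.outer(A[:, l], A[:, l]), Sigmas[l])
dense_ll = gauss_logpdf(Y.reshape(-1), cov)

# Factorized form: d independent n-dimensional Gaussians on the projected
# data A.T @ Y, plus an i.i.d. noise term for the orthogonal complement of A.
Y_tilde = A.T @ Y
factor_ll = sum(
    gauss_logpdf(Y_tilde[l], Sigmas[l] + sigma0_sq * np.eye(n))
    for l in range(d)
)
rss = np.sum(Y**2) - np.sum(Y_tilde**2)   # energy outside the column span of A
factor_ll += (-0.5 * rss / sigma0_sq
              - 0.5 * (k - d) * n * np.log(2 * np.pi * sigma0_sq))

print(dense_ll, factor_ll)
```

Under these assumptions the two values agree up to floating-point error, while the factorized form only factorizes d matrices of size n x n instead of one matrix of size kn x kn.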
