Variational inference for sparse spectrum Gaussian process regression

We develop a fast variational approximation scheme for Gaussian process (GP) regression in which the spectrum of the covariance function is subjected to a sparse approximation. Our approach allows uncertainty in the covariance function hyperparameters to be treated without Monte Carlo methods and is robust to overfitting. The article makes three contributions. First, we present a variational Bayes algorithm for fitting sparse spectrum GP regression models that uses nonconjugate variational message passing to derive fast and efficient updates. Second, we propose a novel adaptive neighbourhood technique for predictive inference that is effective in dealing with nonstationarity. Regression is performed locally at each point to be predicted, and the neighbourhood is determined by a measure based on lengthscales estimated from an initial fit. Because dimensions are weighted according to their lengthscales, variables of little relevance are downweighted, leading to automatic variable selection and improved prediction. Third, we introduce a technique for accelerating convergence in nonconjugate variational message passing by adapting step sizes in the direction of the natural gradient of the lower bound. This adaptive strategy is easy to implement, and empirical results indicate significant speedups.
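
As an illustration of the neighbourhood idea described in the abstract, the following is a minimal sketch and not the article's actual procedure. It assumes the measure is a Euclidean distance in which each coordinate difference is divided by that dimension's lengthscale, and that a large lengthscale marks a weakly relevant input. The function names `weighted_distance` and `select_neighbourhood`, the neighbourhood size `k`, and the toy data are all hypothetical.

```python
import math


def weighted_distance(x, x_star, lengthscales):
    """Euclidean distance after dividing each coordinate difference
    by its lengthscale (assumed form of the measure)."""
    return math.sqrt(
        sum(((a - b) / l) ** 2 for a, b, l in zip(x, x_star, lengthscales))
    )


def select_neighbourhood(X, x_star, lengthscales, k):
    """Return indices of the k training points nearest to x_star
    under the lengthscale-weighted distance."""
    order = sorted(
        range(len(X)),
        key=lambda i: weighted_distance(X[i], x_star, lengthscales),
    )
    return order[:k]


# Toy data: the second input has a very large (assumed) lengthscale,
# so differences in that dimension barely affect the ranking.
X = [[0.0, 0.0], [0.1, 5.0], [1.0, 0.1], [0.2, -4.0]]
lengthscales = [0.5, 100.0]
neighbours = select_neighbourhood(X, [0.0, 0.0], lengthscales, k=3)
```

In this toy example the points that are far away only in the second dimension are still selected ahead of the point that is far away in the first, which is the downweighting effect the abstract describes. A local regression would then be fitted to the selected points only.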
