Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE)

In applications of Gaussian processes where quantification of uncertainty is of primary interest, it is necessary to accurately characterize the posterior distribution over covariance parameters. This paper proposes an adaptation of the Stochastic Gradient Langevin Dynamics algorithm that draws samples from this posterior with negligible bias and without the need to compute the marginal likelihood. In Gaussian process regression, this has the considerable advantage that stochastic gradients can be computed by solving linear systems only. A novel unbiased linear system solver based on parallelizable covariance matrix-vector products is developed to accelerate the unbiased estimation of gradients. The results demonstrate that scalable and exact (in a Monte Carlo sense) quantification of uncertainty in Gaussian processes is possible without imposing any special structure on the covariance or reducing the number of input vectors.
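
To make the claim that stochastic gradients need only linear solves concrete, the following is a minimal sketch and not the paper's implementation. It makes several assumptions: an RBF covariance with log-parameters `theta = (log lengthscale, log signal variance, log noise variance)`, a standard normal prior on `theta`, and Rademacher probe vectors for the trace term. All function names are hypothetical. The linear systems are solved with ordinary conjugate gradients run to a tight tolerance, which stands in for the unbiased solver (ULISSE) described in the abstract. The covariance is also formed densely for brevity, whereas the abstract emphasizes matrix-vector products.

```python
import numpy as np

def kernel_and_grads(theta, X):
    """RBF covariance plus noise, and its derivatives w.r.t. the log-parameters."""
    log_ell, log_sf2, log_noise = theta
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    ell2 = np.exp(2.0 * log_ell)
    Kf = np.exp(log_sf2) * np.exp(-0.5 * D / ell2)
    noise_I = np.exp(log_noise) * np.eye(len(X))
    K = Kf + noise_I
    dKs = [Kf * D / ell2, Kf, noise_I]  # dK/dlog_ell, dK/dlog_sf2, dK/dlog_noise
    return K, dKs

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve K x = b using only matrix-vector products with K."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        Kp = matvec(p)
        alpha = rs / (p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def draw_probes(n_probes, n, rng):
    """Rademacher probe vectors (entries +1 or -1 with equal probability)."""
    return rng.choice([-1.0, 1.0], size=(n_probes, n))

def stochastic_gradient(theta, X, y, probes):
    """Estimate the gradient of the log marginal likelihood w.r.t. theta.

    Uses  d/dtheta_i = -0.5 * Tr(K^-1 dK_i) + 0.5 * y^T K^-1 dK_i K^-1 y,
    with the trace replaced by the average of r^T K^-1 dK_i r over the probes.
    """
    K, dKs = kernel_and_grads(theta, X)
    matvec = lambda v: K @ v
    Kinv_y = conjugate_gradient(matvec, y)
    Kinv_r = [conjugate_gradient(matvec, r) for r in probes]
    grad = np.empty(len(dKs))
    for i, dK in enumerate(dKs):
        # K is symmetric, so r^T K^-1 dK r = (K^-1 r)^T (dK r).
        trace_est = np.mean([kr @ (dK @ r) for kr, r in zip(Kinv_r, probes)])
        grad[i] = -0.5 * trace_est + 0.5 * Kinv_y @ (dK @ Kinv_y)
    return grad

def sgld_step(theta, grad_log_post, step, rng):
    """One Langevin update: half-step along the gradient plus N(0, step) noise."""
    noise = rng.normal(scale=np.sqrt(step), size=theta.shape)
    return theta + 0.5 * step * grad_log_post + noise

def run_sgld(X, y, theta0, n_iter=100, n_probes=4, step=1e-3, seed=0):
    """Draw a chain of covariance parameters (assumed N(0, I) prior on theta)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    samples = []
    for _ in range(n_iter):
        probes = draw_probes(n_probes, len(y), rng)
        grad = stochastic_gradient(theta, X, y, probes) - theta  # add prior gradient
        theta = sgld_step(theta, grad, step, rng)
        samples.append(theta.copy())
    return np.array(samples)
```

Calling `run_sgld(X, y, theta0=[0.0, 0.0, 0.0])` with an `(n, d)` input array `X` and targets `y` returns an array of sampled log-parameters, one row per iteration. The marginal likelihood itself is never evaluated, only solves against the covariance matrix. The fixed step size is a simplification, since Stochastic Gradient Langevin Dynamics is usually run with a decreasing step-size schedule.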
