On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes

The variational framework for learning inducing variables (Titsias, 2009a) has had a large impact on the Gaussian process literature. The framework may be interpreted as minimizing a rigorously defined Kullback-Leibler divergence between the approximating process and the posterior process. To our knowledge, this connection has so far gone unremarked in the literature. In this paper we give a substantial generalization of the existing results on this topic. We give a new proof of the result for infinite index sets, which allows inducing points that are not data points and likelihoods that depend on all function values. We then discuss augmented index sets and show that, contrary to previous works, marginal consistency of the augmentation is not enough to guarantee consistency of variational inference with the original model. We then characterize an extra condition under which such a guarantee is obtainable. Finally, we show how our framework sheds light on interdomain sparse approximations and sparse approximations for Cox processes.
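
As a concrete illustration of this connection (a sketch in standard sparse-GP notation; the inducing inputs $Z$, inducing values $u = f(Z)$, variational distribution $q(u)$, data $\{(x_n, y_n)\}_{n=1}^N$ and the factorizing likelihood below are notational assumptions rather than quantities defined in the abstract): with the approximating process $Q$ defined by $q(f) = p(f \mid u)\, q(u)$ and $\hat{P}$ the posterior process,
\[
\mathrm{KL}\big[\,Q \,\|\, \hat{P}\,\big] \;=\; \log p(y) \;-\; \mathcal{L}(q),
\qquad
\mathcal{L}(q) \;=\; \sum_{n=1}^{N} \mathbb{E}_{q(f(x_n))}\big[\log p\!\left(y_n \mid f(x_n)\right)\big] \;-\; \mathrm{KL}\big[\,q(u)\,\|\,p(u)\,\big],
\]
so maximizing the familiar sparse variational bound $\mathcal{L}(q)$ is precisely minimizing a well-defined Kullback-Leibler divergence between stochastic processes.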

[1] Neil D. Lawrence, et al. Bayesian Gaussian Process Latent Variable Model. AISTATS, 2010.

[2] Matthias W. Seeger, et al. Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. 2003.

[3] Chong Wang, et al. Stochastic variational inference. J. Mach. Learn. Res., 2012.

[4] Neil D. Lawrence, et al. Nested Variational Compression in Deep Gaussian Processes. arXiv:1412.1370, 2014.

[5] Mauricio A. Álvarez. Convolved Gaussian process priors for multivariate regression with applications to dynamical systems. 2011.

[6] James Hensman, et al. Scalable Variational Gaussian Process Classification. AISTATS, 2014.

[7] Kian Ming Adam Chai, et al. Variational Multinomial Logit Gaussian Process. J. Mach. Learn. Res., 2012.

[8] Stephen J. Roberts, et al. Variational Inference for Gaussian Process Modulated Poisson Processes. ICML, 2014.

[9] R. Gray. Entropy and Information Theory. Springer, New York, 1990.

[10] Neil D. Lawrence, et al. Variational Inference for Uncertainty on the Inputs of Gaussian Process Models. arXiv, 2014.

[11] Michalis K. Titsias, et al. Variational Model Selection for Sparse Gaussian Process Regression. 2008.

[12] Richard F. Bass. Kolmogorov extension theorem. 2011.

[13] F. Y. Edgeworth, et al. The theory of statistics. 1996.

[14] Aníbal R. Figueiras-Vidal, et al. Inter-domain Gaussian Processes for Sparse Inference using Inducing Features. NIPS, 2009.

[15] Michalis K. Titsias, et al. Variational Learning of Inducing Variables in Sparse Gaussian Processes. AISTATS, 2009.

[16] Neil D. Lawrence, et al. Computationally Efficient Convolved Multiple Output Gaussian Processes. J. Mach. Learn. Res., 2011.

[17] L. Csató. Gaussian processes: iterative sparse approximations. 2002.

[18] Lehel Csató, et al. Sparse On-Line Gaussian Processes. Neural Computation, 2002.

[19] P. E. Kopp, et al. Measure, Integral and Probability. 1998.

[20] Neil D. Lawrence, et al. Gaussian Processes for Big Data. UAI, 2013.

[21] Neil D. Lawrence, et al. Deep Gaussian Processes. AISTATS, 2012.

[22] B. Hunt. Prevalence: a translation-invariant "almost every" on infinite-dimensional spaces. arXiv:math/9210220, 1992.

[23] Neil D. Lawrence, et al. Variational Inference for Latent Variables and Uncertain Inputs in Gaussian Processes. J. Mach. Learn. Res., 2016.

[24] Sheldon M. Ross, et al. Stochastic Processes.

[25] Matthias W. Seeger, et al. PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification. J. Mach. Learn. Res., 2003.

[26] J. Norris. Appendix: probability and measure. 1997.