Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds

Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied to large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets.

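As context for the proposed training objective, a standard PAC-Bayesian bound of the kind being optimized is the following (shown here in Maurer's form for losses bounded in [0, 1]; this is a generic sketch, not necessarily the exact bound derived in the paper for GP regression). For a prior P over predictors fixed before seeing the data, with probability at least 1 - \delta over an i.i.d. training sample of size n, simultaneously for all posteriors Q:

    \mathrm{kl}\!\left( \hat{R}_n(Q) \,\|\, R(Q) \right) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln(2\sqrt{n}/\delta)}{n},

where \hat{R}_n(Q) and R(Q) are the empirical and true risks of the Gibbs predictor and \mathrm{kl}(q \,\|\, p) = q \ln(q/p) + (1-q) \ln\!\big((1-q)/(1-p)\big) is the binary KL divergence. Inverting the bound yields an explicit upper bound on R(Q); learning then amounts to choosing the GP posterior Q (kernel hyperparameters and, for sparse approximations, inducing inputs) to minimize that upper bound rather than to maximize the log marginal likelihood \ln p(y \mid X).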