PAC-Bayes with Backprop

We explore the family of methods "PAC-Bayes with Backprop" (PBB) for training probabilistic neural networks by minimizing PAC-Bayes bounds. We present two training objectives: one derived from a previously known PAC-Bayes bound, and the other from a novel PAC-Bayes bound. Both objectives are evaluated on MNIST and on various UCI data sets. Our experiments yield two striking observations: we obtain competitive test-set error estimates (~1.4% on MNIST) while at the same time computing non-vacuous risk bounds that are much tighter (~2.3% on MNIST) than previous results. These observations suggest that neural nets trained by PBB may lead to self-bounding learning, where the available data can be used to simultaneously learn a predictor and certify its risk, with no need to follow a data-splitting protocol.
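To make the idea concrete, here is a minimal sketch of how a PAC-Bayes bound can serve directly as a training objective. It assumes a factorized Gaussian posterior Q = N(mu, sigma^2) over the weights, a fixed zero-mean Gaussian prior P, and a McAllester-style surrogate bound of the form "empirical loss + sqrt((KL(Q||P) + log(2*sqrt(n)/delta)) / (2n))". This is an illustrative assumption, not the paper's exact objective; the layer sizes, prior scale, and use of cross-entropy as a surrogate for the bounded 0-1 loss are likewise illustrative choices.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Linear layer with a factorized Gaussian distribution over weights."""
    def __init__(self, in_f, out_f, prior_sigma=0.1):
        super().__init__()
        self.w_mu = nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        self.w_rho = nn.Parameter(torch.full((out_f, in_f), -3.0))  # sigma = softplus(rho)
        self.b_mu = nn.Parameter(torch.zeros(out_f))
        self.b_rho = nn.Parameter(torch.full((out_f,), -3.0))
        self.prior_sigma = prior_sigma

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps keeps the sampled
        # weights differentiable with respect to (mu, rho).
        w_sigma = F.softplus(self.w_rho)
        b_sigma = F.softplus(self.b_rho)
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        return F.linear(x, w, b)

    def kl(self):
        # KL(N(mu, sigma^2) || N(0, prior_sigma^2)), summed over all parameters.
        def kl_term(mu, sigma):
            return (torch.log(self.prior_sigma / sigma)
                    + (sigma ** 2 + mu ** 2) / (2 * self.prior_sigma ** 2) - 0.5).sum()
        return (kl_term(self.w_mu, F.softplus(self.w_rho))
                + kl_term(self.b_mu, F.softplus(self.b_rho)))

def pac_bayes_objective(model, x, y, n, delta=0.05):
    """McAllester-style surrogate bound: empirical loss + KL complexity penalty."""
    nll = F.cross_entropy(model(x), y)  # differentiable surrogate for 0-1 loss
    kl = sum(m.kl() for m in model.modules() if isinstance(m, BayesLinear))
    penalty = torch.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))
    return nll + penalty
```

A training loop would sample a minibatch, call `pac_bayes_objective` with `n` set to the full training-set size, and backpropagate through both terms; because the weight sample is reparameterized, gradients flow into the posterior means and scales, so minimizing the objective tightens the very bound that certifies the risk.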
