PAC-Bayesian Neural Network Bounds

Bayesian neural networks, which both use the negative log-likelihood loss function and average their predictions under a learned posterior over the parameters, have been used successfully across many scientific fields, partly due to their ability to 'effortlessly' extract useful representations from large-scale datasets. However, generalization bounds for this setting are still missing. In this paper, we present a new PAC-Bayesian generalization bound for the negative log-likelihood loss which uses the Herbst argument for the log-Sobolev inequality to bound the moment generating function of the learner's risk. We explore the generalization and calibration properties of the learned posterior on several image classification benchmarks, showing that the proposed approach provides better generalization and uncertainty estimates.
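The exact statement of the proposed bound is not reproduced in this abstract. As context only, a minimal sketch of one standard McAllester-style PAC-Bayesian bound for a loss bounded in $[0,1]$ (not the unbounded negative log-likelihood setting treated here) reads: for any prior $P$ fixed before seeing the data and any $\delta \in (0,1)$, with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$, simultaneously for all posteriors $Q$,

\[
\mathbb{E}_{w \sim Q}\bigl[L(w)\bigr] \;\le\; \mathbb{E}_{w \sim Q}\bigl[\hat{L}_S(w)\bigr] \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}},
\]

where $L$ and $\hat{L}_S$ denote the expected and empirical risks and $\mathrm{KL}$ is the Kullback-Leibler divergence. Because the negative log-likelihood loss is unbounded, bounds of this classical form do not apply directly; the contribution here is instead to control the moment generating function of the risk via the Herbst argument for the log-Sobolev inequality.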
