On the generalization of Bayesian deep nets for multi-class classification

Generalization bounds, which assess the gap between the true risk and the empirical risk, have been studied extensively. However, to obtain such bounds, current techniques rely on strict assumptions such as a uniformly bounded or a Lipschitz loss function. To avoid these assumptions, in this paper we propose a new generalization bound for Bayesian deep nets by exploiting the contractivity of the Log-Sobolev inequalities. Using these inequalities introduces a loss-gradient norm term into the generalization bound, which intuitively serves as a surrogate for model complexity. Empirically, we analyze the effect of this loss-gradient norm term using different deep nets.
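To make the loss-gradient norm term concrete, the sketch below shows one natural way to estimate such a quantity empirically: a Monte Carlo estimate of the expected squared norm of the loss gradient with respect to the weights, taken under a mean-field Gaussian posterior over the weights of a small multi-class classifier. This is only an illustrative approximation under assumed choices, not the paper's exact quantity; the MLP architecture, the posterior scale sigma, the number of posterior samples, and the random stand-in data are all hypothetical placeholders.

    # Illustrative sketch (assumed setup, not the paper's exact procedure):
    # Monte Carlo estimate of E_{w~Q}[ ||grad_w CE(f_w(x), y)||^2 ] under a
    # mean-field Gaussian posterior Q centered at the trained weight means.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    class MLP(nn.Module):
        # Small placeholder architecture for a 10-class problem.
        def __init__(self, in_dim=784, hidden=128, classes=10):
            super().__init__()
            self.fc1 = nn.Linear(in_dim, hidden)
            self.fc2 = nn.Linear(hidden, classes)

        def forward(self, x):
            return self.fc2(F.relu(self.fc1(x)))

    def expected_sq_grad_norm(mean_net, x, y, sigma=0.05, n_samples=8):
        # Estimate E_{w ~ N(mean, sigma^2 I)} ||grad_w cross_entropy(f_w(x), y)||^2
        # by drawing weight samples around the mean network and averaging.
        total = 0.0
        for _ in range(n_samples):
            net = MLP()
            net.load_state_dict(mean_net.state_dict())
            with torch.no_grad():
                for p in net.parameters():
                    p.add_(sigma * torch.randn_like(p))  # draw w ~ Q
            loss = F.cross_entropy(net(x), y)
            grads = torch.autograd.grad(loss, list(net.parameters()))
            total += sum(g.pow(2).sum().item() for g in grads)
        return total / n_samples

    # Toy usage with random data of (Fashion-)MNIST-like dimensions.
    mean_net = MLP()
    x = torch.randn(64, 784)
    y = torch.randint(0, 10, (64,))
    print("estimated E[||grad_w loss||^2]:", expected_sq_grad_norm(mean_net, x, y))

On real data, one would replace the random tensors with mini-batches from a dataset such as Fashion-MNIST and track this estimate alongside the empirical risk to compare the behavior of different architectures.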
