Exploring Generalization in Deep Learning

With the goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness, and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization and making a connection between sharpness and PAC-Bayes theory. We then investigate how well these measures explain different observed phenomena.
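For concreteness, the sketch below computes two of the measures discussed above for a small feedforward network: a norm-based measure (the product of the layers' spectral norms) and a crude perturbation-based sharpness estimate. This is a minimal illustration under stated assumptions, not the paper's exact protocol; the model, the random data, the perturbation scale `alpha`, and the trial count are all hypothetical choices.

```python
# Minimal sketch (not the paper's exact protocol): two generalization
# measures for a small feedforward net. The model, data, perturbation
# scale `alpha`, and trial count are illustrative assumptions.
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
loss_fn = nn.CrossEntropyLoss()

def spectral_norm_product(model):
    """Norm-based measure: product of each layer's largest singular value."""
    prod = 1.0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            prod *= torch.linalg.matrix_norm(m.weight, ord=2).item()
    return prod

def sharpness_estimate(model, x, y, alpha=0.01, trials=10):
    """Sharpness-like measure: largest loss increase observed under random
    weight perturbations of relative size alpha (a crude stand-in for the
    maximization over a perturbation ball)."""
    with torch.no_grad():
        base = loss_fn(model(x), y).item()
        worst = 0.0
        for _ in range(trials):
            pert = copy.deepcopy(model)
            for p in pert.parameters():
                # Perturb each weight relative to its own magnitude.
                p.add_(alpha * p.abs() * torch.randn_like(p))
            worst = max(worst, loss_fn(pert(x), y).item() - base)
    return worst

print("spectral norm product:", spectral_norm_product(model))
print("sharpness estimate:", sharpness_estimate(model, x, y))
```

Because layer-wise rescalings can leave a ReLU network's function unchanged while changing additive sharpness, the sketch perturbs each weight relative to its own magnitude; the scale-normalization point in the abstract concerns exactly this kind of sensitivity.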
