Exploring Generalization in Deep Learning

With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures explain different observed phenomena.

Nathan Srebro | Behnam Neyshabur | David McAllester | Srinadh Bhojanapalli

[1] Peter L. Bartlett, et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, 1998, IEEE Trans. Inf. Theory.

[2] Bernhard Schölkopf, et al. The connection between regularization operators and support vector kernels, 1998, Neural Networks.

[3] Peter L. Bartlett, et al. Almost Linear VC-Dimension Bounds for Piecewise Polynomial Networks, 1998, Neural Computation.

[4] David A. McAllester. PAC-Bayesian model averaging, 1999, COLT '99.

[5] Peter L. Bartlett, et al. Neural Network Learning - Theoretical Foundations, 1999.

[6] Tomaso A. Poggio, et al. Regularization Networks and Support Vector Machines, 2000, Adv. Comput. Math.

[7] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.

[8] John Langford, et al. (Not) Bounding the True Error, 2001, NIPS.

[9] David A. McAllester. Simplified PAC-Bayesian Margin Bounds, 2003, COLT.

[10] Ulrike von Luxburg, et al. Distance-Based Classification with Lipschitz Functions, 2004, J. Mach. Learn. Res.

[11] David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.

[12] Tommi S. Jaakkola, et al. Maximum-Margin Matrix Factorization, 2004, NIPS.

[13] Adi Shraibman, et al. Rank, Trace-Norm and Max-Norm, 2005, COLT.

[14] Shie Mannor, et al. Robustness and generalization, 2010, Machine Learning.

[15] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[16] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.

[17] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.

[18] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.

[19] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[20] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.

[21] Ruslan Salakhutdinov, et al. Data-Dependent Path Normalization in Neural Networks, 2015, ICLR.

[22] Ruslan Salakhutdinov, et al. Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations, 2016, NIPS.

[23] Gintare Karolina Dziugaite, et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.

[24] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.

[25] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.

[26] Abbas Mehrabian, et al. Nearly-tight VC-dimension bounds for piecewise linear neural networks, 2017, COLT.

[27] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.

[28] Guillermo Sapiro, et al. Generalization Error of Invariant Classifiers, 2016, AISTATS.

[29] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.

[30] Peter L. Bartlett, et al. Nearly-tight VC-dimension and Pseudodimension Bounds for Piecewise Linear Neural Networks, 2017, J. Mach. Learn. Res.