Modern Neural Networks Generalize on Small Data Sets

In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated, which leads to an internal regularization process much like that of a random forest and helps explain why neural networks are surprisingly resistant to overfitting. We then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository. These data sets contain far fewer training examples than the image classification tasks generally studied in the deep learning literature, and many exhibit non-trivial label noise. We show that even in this setting, deep neural networks achieve superior classification accuracy without overfitting.
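
The decomposition mentioned above can be posed as a linear program. The sketch below is one plausible formulation, not necessarily the authors' exact one: it assumes the fitted network's logit can be written as f(x) = w · h(x) over the final-hidden-layer features h (ignoring the output bias), splits w into K sub-network weight vectors that sum to w, and encourages low bias in each sub-network through hinge-style margin constraints with slack. The function name decompose_final_layer, the ensemble size K, and the margin value are illustrative choices.

```python
# A minimal sketch (assumed formulation, not the paper's exact LP): split the fitted
# final-layer weight vector w into K sub-network weight vectors w_1, ..., w_K with
# sum_k w_k = w, so that the ensemble average of the sub-network logits
# g_k(x) = K * w_k . h(x) reproduces the original logit.  Each sub-network is pushed
# toward zero training error (low bias) via margin constraints with slack variables.
import numpy as np
from scipy.optimize import linprog

def decompose_final_layer(H, y, w, K=5, margin=1.0):
    """H: (n, p) final-hidden-layer activations on the training set,
    y: (n,) labels in {-1, +1}, w: (p,) fitted final-layer weights."""
    n, p = H.shape
    n_w, n_slack = K * p, K * n          # sub-network weights + slack variables

    # Objective: minimize total slack, i.e. total violation of the margin constraints.
    c = np.concatenate([np.zeros(n_w), np.ones(n_slack)])

    # Equality constraints: the K sub-network weight vectors must sum to w.
    A_eq = np.hstack([np.tile(np.eye(p), K), np.zeros((p, n_slack))])
    b_eq = w

    # Margin constraints: y_i * K * (H_i . w_k) + slack_{k,i} >= margin,
    # written in linprog form as  -y_i * K * (H_i . w_k) - slack_{k,i} <= -margin.
    signed_H = -K * (y[:, None] * H)                       # (n, p)
    A_ub = np.zeros((K * n, n_w + n_slack))
    for k in range(K):
        rows = slice(k * n, (k + 1) * n)
        A_ub[rows, k * p:(k + 1) * p] = signed_H
        A_ub[rows, n_w + k * n:n_w + (k + 1) * n] = -np.eye(n)
    b_ub = -margin * np.ones(K * n)

    bounds = [(None, None)] * n_w + [(0, None)] * n_slack  # weights free, slacks >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:n_w].reshape(K, p)                       # one weight vector per sub-network
```

Given the recovered weight vectors, each sub-network's prediction is sign(K * H @ w_k); checking their individual training accuracy and the pairwise correlation of their predictions on held-out data mirrors the "low-bias, relatively uncorrelated" description above.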
