Estimating the Generalization in Deep Neural Networks via Sparsity

Generalization is a key capability of deep neural networks (DNNs). However, it is challenging to reliably measure the generalization ability of a DNN from the network itself. In this paper, we propose a novel method for estimating the generalization gap based on network sparsity. Our method first introduces two key quantities that are closely related to generalization ability and can be computed directly from the training results alone. A simple linear model involving these two quantities is then constructed to give an accurate estimate of the generalization gap. By training DNNs with a wide range of generalization gaps on popular datasets, we show that the two key quantities and the linear model are effective tools for estimating the generalization gap of DNNs.
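
To make the estimation pipeline concrete, here is a minimal sketch of the general idea: compute sparsity-based statistics from trained networks and regress the measured generalization gap on them. The specific sparsity quantity (fraction of near-zero activations), the placeholder numbers, and the helper names below are illustrative assumptions, not the quantities or results defined in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def activation_sparsity(activations, eps=1e-3):
    """Hypothetical sparsity statistic: fraction of activations with magnitude below eps."""
    return float(np.mean(np.abs(activations) < eps))

# Illustrative placeholder values (not results from the paper):
# q1, q2 are two sparsity-derived quantities measured on the training set
# for several trained DNNs; gap is each network's measured generalization
# gap (training accuracy minus test accuracy).
q1 = np.array([0.12, 0.25, 0.31, 0.47, 0.58])
q2 = np.array([0.80, 0.65, 0.55, 0.40, 0.30])
gap = np.array([0.02, 0.05, 0.08, 0.14, 0.21])

# Simple linear model: gap ≈ a * q1 + b * q2 + c
X = np.stack([q1, q2], axis=1)
model = LinearRegression().fit(X, gap)

# Estimate the generalization gap of a new network from its two quantities.
new_net = np.array([[0.40, 0.45]])
print("estimated generalization gap:", model.predict(new_net)[0])
```

The design choice mirrored here is that the predictor uses only quantities available at training time, so no held-out test data is needed once the linear model has been fit.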
