Generalization Error in Deep Learning

Deep learning models have recently achieved impressive performance in fields such as computer vision, speech recognition, speech translation, and natural language processing. Yet, despite this state-of-the-art performance, the source of their ability to generalize remains largely unexplained. A central question is therefore what enables deep neural networks to generalize well from the training set to new data. In this chapter, we provide an overview of the existing theory and bounds for characterizing the generalization error of deep neural networks, combining classical and more recent theoretical and empirical results.
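Throughout, "generalization error" is meant in the standard statistical-learning sense: the gap between a model's expected risk on unseen data and its empirical risk on the training set. As a minimal sketch of this definition (the symbols h, \ell, \mathcal{D}, and n below are generic notation, not notation fixed by this chapter):

\mathrm{GE}(h) \;=\; \Bigl|\, \mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[\ell(h(x),y)\bigr] \;-\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(h(x_i),y_i\bigr) \,\Bigr| ,

where h is the learned hypothesis, \ell is a loss function, \mathcal{D} is the (unknown) data distribution, and \{(x_i,y_i)\}_{i=1}^{n} is the training set. The bounds surveyed in the chapter aim to control this quantity, classically through capacity measures such as the VC dimension and Rademacher complexity, and more recently through norm-based, PAC-Bayesian, compression, stability, and robustness arguments.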
