Generalization error of deep neural networks: Role of classification margin and data structure

Understanding the generalization properties of deep learning models is critical for their successful use in many applications, especially in regimes where the number of training samples is limited. We study the generalization properties of deep neural networks (DNNs) via the Jacobian matrix of the network. Our analysis applies to arbitrary network structures, types of non-linearities, and pooling operations. We show that bounding the spectral norm of the network's Jacobian matrix reduces the generalization error. In addition, we tie this error to invariance in the data and in the network. Experiments on the MNIST and ImageNet datasets support these findings. This short paper summarizes our generalization error theorems for DNNs and for general invariant classifiers [1], [2].
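Since the result operates through the spectral norm of the network's input-output Jacobian, a small sketch can make the quantity concrete. The snippet below is a minimal illustration, not the authors' code: PyTorch, the toy fully connected ReLU architecture, and the input sizes are assumptions made purely for the example. It evaluates the Jacobian of the network at a single input, takes its largest singular value, and reports the output-space classification margin alongside it.

```python
# A minimal sketch, assuming PyTorch and a toy fully connected ReLU network
# (architecture and sizes are illustrative, not the networks used in the paper).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classifier: 784-dimensional input (e.g., a flattened MNIST image), 10 classes.
net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.randn(784)  # stand-in for a single training point

# Input-output Jacobian of the network at x: shape (10, 784).
J = torch.autograd.functional.jacobian(net, x)

# Spectral norm = largest singular value of the Jacobian.
spec_norm = torch.linalg.svdvals(J)[0].item()

# Output-space classification margin: gap between the top two class scores.
scores = net(x)
top2 = torch.topk(scores, 2).values
output_margin = (top2[0] - top2[1]).item()

print(f"Jacobian spectral norm at x : {spec_norm:.3f}")
print(f"Output-space margin at x    : {output_margin:.3f}")
print(f"Margin / spectral norm      : {output_margin / spec_norm:.3f}")
```

The final ratio is only meant to convey the flavor of the result: when the Jacobian's spectral norm is small around the training points, a given output-space margin translates into a larger input-space margin, which is the mechanism through which the bounds control the generalization error. The exact constants and the region over which the norm is bounded are given by the theorems in the full papers.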

[1] Sergey Ioffe et al., Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, ICML, 2015.

[2] Yann LeCun et al., The Loss Surfaces of Multilayer Networks, AISTATS, 2014.

[3] Guillermo Sapiro et al., Generalization Error of Invariant Classifiers, AISTATS, 2016.

[4] Vladimir Vapnik et al., An Overview of Statistical Learning Theory, IEEE Transactions on Neural Networks, 1999.

[5] Razvan Pascanu et al., On the Number of Linear Regions of Deep Neural Networks, NIPS, 2014.

[6] G. Lewicki et al., Approximation by Superpositions of a Sigmoidal Function, 2003.

[7] Jian Sun et al., Deep Residual Learning for Image Recognition, CVPR, 2016.

[8] Nakul Verma et al., Distance Preserving Embeddings for General n-Dimensional Manifolds, COLT, 2012.

[10] Ryota Tomioka et al., Norm-Based Capacity Control in Neural Networks, COLT, 2015.

[11] Shie Mannor et al., Robustness and Generalization, Machine Learning, 2010.

[12] Matus Telgarsky et al., Benefits of Depth in Neural Networks, COLT, 2016.

[13] Peter L. Bartlett et al., Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, Journal of Machine Learning Research, 2003.

[14] Tim Salimans et al., Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, NIPS, 2016.

[15] Stéphane Mallat et al., Group Invariant Scattering, arXiv, 2011.

[16] Guillermo Sapiro et al., Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?, IEEE Transactions on Signal Processing, 2015.

[17] Kurt Hornik et al., Approximation Capabilities of Multilayer Feedforward Networks, Neural Networks, 1991.

[18] Geoffrey E. Hinton et al., ImageNet Classification with Deep Convolutional Neural Networks, Communications of the ACM, 2012.

[19] Nadav Cohen et al., On the Expressive Power of Deep Learning: A Tensor Analysis, COLT, 2016.

[20] Geoffrey E. Hinton et al., Deep Learning, Nature, 2015.

[21] Koray Kavukcuoglu et al., Exploiting Cyclic Symmetry in Convolutional Neural Networks, ICML, 2016.

[22] Shai Ben-David et al., Understanding Machine Learning: From Theory to Algorithms, 2014.

[23] René Vidal et al., Global Optimality in Tensor Factorization, Deep Learning, and Beyond, arXiv, 2015.

[24] Stéphane Mallat et al., Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.