Exploring the Generalization Performance of Neural Networks via Diversity

Neural networks (NNs) have achieved excellent performance in many industrial tasks, but their interpretability remains a major challenge, and in particular their generalization ability is still not fully understood. Inspired by ensemble learning, this paper proposes a new evaluation indicator, called diversity, for the generalization ability of NNs. The key observation is that each hidden unit plays two roles in a network: it acts as an "ensemble" learner that integrates the features extracted by the preceding layer, and at the same time it serves as a base learner for the units in the next layer. We derive a diversity-based generalization bound for NNs and prove that network diversity is crucial for reducing generalization error. We verify the proposed indicator experimentally on two well-known datasets, CIFAR and MNIST, and the results support the proposed theory.
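The abstract does not spell out how diversity is measured over hidden units; as a minimal, illustrative sketch only, the snippet below computes one plausible proxy in the spirit of pairwise diversity measures for classifier ensembles: the mean pairwise decorrelation (1 - |Pearson r|) of a hidden layer's unit activations on a batch. The helper name hidden_unit_diversity, the choice of pairwise score, and the toy one-layer ReLU network are assumptions for illustration, not the paper's definition.

import numpy as np

def hidden_unit_diversity(activations: np.ndarray) -> float:
    """Mean pairwise decorrelation (1 - |Pearson r|) across hidden units.

    activations: array of shape (n_samples, n_units), e.g. the post-ReLU
    outputs of one hidden layer evaluated on a validation batch.
    NOTE: this is an assumed proxy for "diversity", not the paper's measure.
    """
    # Drop units that are constant on this batch (correlation undefined).
    active = activations[:, activations.std(axis=0) > 1e-12]
    if active.shape[1] < 2:
        return 0.0
    corr = np.corrcoef(active, rowvar=False)        # (n_units, n_units)
    n = corr.shape[0]
    off_diag = np.abs(corr[np.triu_indices(n, k=1)])  # upper-triangular pairs
    return float(1.0 - off_diag.mean())             # 1.0 = fully decorrelated units

# Toy usage: a random single-hidden-layer ReLU network on random inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))                      # batch of 256 inputs
W = rng.normal(size=(32, 64)) / np.sqrt(32)         # hidden-layer weights
H = np.maximum(X @ W, 0.0)                          # ReLU hidden activations
print(f"layer diversity: {hidden_unit_diversity(H):.3f}")

In this reading, a layer whose units produce highly redundant activations scores near 0, while decorrelated units score near 1, which is the qualitative behavior the diversity-based bound in the abstract suggests should matter for generalization.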
