Exploring the Generalization Performance of Neural Networks via Diversity

Neural networks (NNs) have achieved excellent performance in many industrial tasks, but their interpretability remains a major challenge, and in particular their generalization ability is still not fully understood. Inspired by ensemble learning, this paper proposes a new evaluation indicator, called diversity, for the generalization ability of NNs. The key observation is that each hidden unit plays two roles in a network: it acts as an "ensemble" learner that integrates the features extracted by the preceding layer, and at the same time it serves as a base learner for the units in the next layer. We derive a diversity-based generalization bound for NNs and prove that network diversity is crucial for reducing generalization error. We verify the proposed indicator experimentally on two well-known datasets, CIFAR and MNIST, and the results support the proposed theory.
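The abstract does not spell out how diversity is measured over hidden units; as a minimal, illustrative sketch only, the snippet below computes one plausible proxy in the spirit of pairwise diversity measures for classifier ensembles: the mean pairwise decorrelation (1 - |Pearson r|) of a hidden layer's unit activations on a batch. The helper name hidden_unit_diversity, the choice of pairwise score, and the toy one-layer ReLU network are assumptions for illustration, not the paper's definition.

import numpy as np

def hidden_unit_diversity(activations: np.ndarray) -> float:
    """Mean pairwise decorrelation (1 - |Pearson r|) across hidden units.

    activations: array of shape (n_samples, n_units), e.g. the post-ReLU
    outputs of one hidden layer evaluated on a validation batch.
    NOTE: this is an assumed proxy for "diversity", not the paper's measure.
    """
    # Drop units that are constant on this batch (correlation undefined).
    active = activations[:, activations.std(axis=0) > 1e-12]
    if active.shape[1] < 2:
        return 0.0
    corr = np.corrcoef(active, rowvar=False)        # (n_units, n_units)
    n = corr.shape[0]
    off_diag = np.abs(corr[np.triu_indices(n, k=1)])  # upper-triangular pairs
    return float(1.0 - off_diag.mean())             # 1.0 = fully decorrelated units

# Toy usage: a random single-hidden-layer ReLU network on random inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))                      # batch of 256 inputs
W = rng.normal(size=(32, 64)) / np.sqrt(32)         # hidden-layer weights
H = np.maximum(X @ W, 0.0)                          # ReLU hidden activations
print(f"layer diversity: {hidden_unit_diversity(H):.3f}")

In this reading, a layer whose units produce highly redundant activations scores near 0, while decorrelated units score near 1, which is the qualitative behavior the diversity-based bound in the abstract suggests should matter for generalization.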
