Eigenvalue decay: A new method for neural network regularization

This paper proposes two new training algorithms for multilayer perceptrons based on evolutionary computation, regularization, and transduction. Regularization is a commonly used technique for preventing a learning algorithm from overfitting the training data. In this context, this work introduces and analyzes a novel regularization scheme for neural networks (NNs), named eigenvalue decay, which aims to improve the classification margin. Eigenvalue decay led to the development of a new training method based on the same principles as the SVM, hence named Support Vector NN (SVNN). Finally, by analogy with the transductive SVM (TSVM), a transductive NN (TNN) is proposed, which builds on SVNN to address transductive learning. The effectiveness of the proposed algorithms is evaluated on seven benchmark datasets.
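The abstract does not state the form of the eigenvalue decay penalty. As a rough illustration only, the sketch below assumes the penalty is a constant times the sum of the smallest and largest eigenvalues of W W^T, where W is a weight matrix of the network. The names `eigenvalue_decay_penalty`, `regularized_loss`, and `kappa` are hypothetical and not taken from the paper.

```python
import numpy as np

def eigenvalue_decay_penalty(W, kappa=1e-3):
    """Assumed penalty: kappa * (lambda_min + lambda_max) of W @ W.T.

    This form is an assumption for illustration; the abstract does not
    define the regularizer.
    """
    gram = W @ W.T                      # symmetric positive semidefinite matrix
    eigvals = np.linalg.eigvalsh(gram)  # eigenvalues in ascending order
    return kappa * (eigvals[0] + eigvals[-1])

def regularized_loss(data_loss, W, kappa=1e-3):
    """Add the assumed eigenvalue penalty to a precomputed data loss."""
    return data_loss + eigenvalue_decay_penalty(W, kappa)

# Toy weight matrix: W @ W.T = diag(9, 1), so the extreme eigenvalues are 1 and 9.
W = np.array([[3.0, 0.0],
              [0.0, 1.0]])
penalty = eigenvalue_decay_penalty(W, kappa=0.1)
total = regularized_loss(0.5, W, kappa=0.1)
```

Under this assumption the penalty plays the role that the squared weight norm plays in ordinary weight decay, but it depends only on the spectrum of W W^T rather than on every individual weight.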