A novel weight pruning method for MLP classifiers based on the MAXCORE principle

We introduce a novel weight pruning methodology for MLP classifiers that can be used for model and/or feature selection. The method rests on the MAXCORE principle, which is based on the observation that relevant synaptic weights tend to produce high cross-correlations between the error signals of the neurons in a given layer and the error signals propagated back to the previous layer, whereas non-relevant (i.e., prunable) weights tend to produce small ones. Guided by this principle, we perform a cross-correlation analysis of the error signals at successive layers and gradually discard the weights whose cross-correlations fall below a user-defined error tolerance. Computer simulations on synthetic and real-world data sets show that the proposed method performs consistently better than standard pruning techniques, at a much lower computational cost.
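To make the idea concrete, the sketch below illustrates one way a MAXCORE-style pruning test could be applied to a single pair of successive layers. It is a minimal illustration under stated assumptions, not the authors' reference implementation: the function name `maxcore_prune`, the relevance score normalized by its maximum, and the one-shot thresholding of a whole weight matrix are all assumptions made for exposition (the abstract describes a gradual scheme).

```python
import numpy as np

def maxcore_prune(W, deltas_prev, deltas_next, tol):
    """Prune weights between two successive MLP layers by cross-correlation.

    W           : (n_prev, n_next) weight matrix between the two layers.
    deltas_prev : (N, n_prev) error signals backpropagated to the previous
                  layer, collected over N training samples.
    deltas_next : (N, n_next) error signals of the neurons in the next layer.
    tol         : user-defined error tolerance on the relative relevance
                  (a hypothetical scale chosen for this sketch).
    """
    N = deltas_prev.shape[0]
    # Sample cross-correlation between the error signals of the two layers,
    # scaled by w_ij: it is w_ij that carries delta_j back to neuron i
    # during backpropagation, so its contribution grows with the weight.
    corr = W * np.einsum('ni,nj->ij', deltas_prev, deltas_next) / N
    # Relative relevance in [0, 1]: the weight with the largest correlation
    # magnitude is the most relevant (the "max-core" idea); small values
    # flag prunable weights. The epsilon guards against division by zero.
    relevance = np.abs(corr) / (np.abs(corr).max() + 1e-12)
    # Zero out (prune) every weight whose relevance is below the tolerance.
    return np.where(relevance >= tol, W, 0.0)
```

In the gradual scheme described in the abstract, a test of this kind would be applied repeatedly, removing only the lowest-correlation weights at each pass and retraining between passes, rather than thresholding the entire matrix at once.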
