Feature Selection for MLP Neural Network: The Use of Random Permutation of Probabilistic Outputs

This paper presents a new wrapper-based feature selection method for multilayer perceptron (MLP) neural networks. It uses a feature ranking criterion that measures the importance of a feature by computing the aggregate difference, over the feature space, between the probabilistic outputs of the MLP with and without the feature. This criterion therefore assigns an importance score to every feature. Numerical experiments on several artificial and real-world data sets show that the proposed method generally outperforms several existing feature selection methods for MLPs, particularly when the data set is sparse or contains many redundant features. In addition, for a wrapper-based approach, the proposed method has a modest computational cost.
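
As a rough illustration of this kind of ranking criterion, the sketch below scores each feature by randomly permuting its column and measuring the aggregate change in the trained network's class-probability outputs. It is a minimal sketch, not the paper's implementation: the use of scikit-learn's MLPClassifier, the function name, and all parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def permutation_probability_importance(model, X, n_repeats=5, seed=None):
    """Score each feature by the aggregate absolute change in the model's
    probabilistic outputs when that feature's column is randomly permuted.
    (Hypothetical helper; a hedged reading of the ranking criterion.)"""
    rng = np.random.default_rng(seed)
    p_base = model.predict_proba(X)          # outputs with all features intact
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Permuting column j breaks its association with the target
            # while preserving its marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            p_perm = model.predict_proba(X_perm)
            scores[j] += np.abs(p_base - p_perm).sum() / n_repeats
    return scores / len(X)  # average per-sample output change

# Usage sketch (data loading and hyperparameters are illustrative):
# X, y = load_your_data()
# mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500).fit(X, y)
# importance = permutation_probability_importance(mlp, X, seed=0)
# ranking = np.argsort(importance)[::-1]  # most important features first
```

Permuting a column rather than deleting it avoids retraining the MLP for every candidate feature, which is one plausible reason a wrapper of this form can remain computationally modest.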
