Efficiently searching the important input variables using Bayesian discriminant

This paper aims to improve feature selection (FS) performance on classification data sets. First, a novel FS criterion based on the concept of the Bayesian discriminant is introduced. The proposed criterion directly measures the classification ability of a feature set (or of a combination of weighted features), which leads to strong FS results. Second, FS is carried out by optimizing the newly derived criterion in a continuous space rather than by heuristically searching a discrete feature space. This optimization strategy significantly improves FS efficiency. The proposed supervised FS scheme is compared with related methods on classification problems in which the number of features ranges from 33 to over 12,000. The results are promising and support the contributions of this study.
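The idea of replacing a discrete subset search with continuous optimization over feature weights can be illustrated with a minimal Python sketch. It does not reproduce the criterion derived in the paper. As a stand-in, it uses a leave-one-out Gaussian-kernel estimate of the posterior probability of each sample's true class, and maximizes its mean by projected gradient ascent with finite-difference gradients. The function names, the kernel width `h`, the L1 penalty `l1`, the step size `lr`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def loo_posterior_score(X, y, w, h=1.0):
    """Mean leave-one-out kernel estimate of P(true class | x) under weights w."""
    Xw = X * w  # scale every feature by its weight
    sq_dist = ((Xw[:, None, :] - Xw[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / (2.0 * h ** 2))  # Gaussian kernel, width h (assumed)
    np.fill_diagonal(K, 0.0)  # leave-one-out: a sample never votes for itself
    same_class = y[:, None] == y[None, :]
    num = (K * same_class).sum(axis=1)  # kernel mass from same-class samples
    den = K.sum(axis=1) + 1e-12  # total kernel mass, guarded against zero
    return float(np.mean(num / den))


def optimize_feature_weights(X, y, steps=60, lr=0.5, l1=0.05, h=1.0, eps=1e-4):
    """Projected gradient ascent on score - l1 * sum(w), keeping w >= 0."""
    w = np.ones(X.shape[1])
    for _ in range(steps):
        grad = np.empty_like(w)
        for k in range(w.size):
            step = np.zeros_like(w)
            step[k] = eps
            # Central finite difference; an analytic gradient would be faster.
            grad[k] = (loo_posterior_score(X, y, w + step, h)
                       - loo_posterior_score(X, y, w - step, h)) / (2.0 * eps)
        # Ascent step with the L1 shrinkage term, then projection onto w >= 0.
        w = np.clip(w + lr * (grad - l1), 0.0, None)
    return w


def make_toy_data(n_per_class=50, n_noise=3, seed=0):
    """Hypothetical data: column 0 separates two classes, the rest is noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    informative = rng.normal(loc=np.where(y == 0, -1.5, 1.5), scale=1.0)
    noise = rng.normal(size=(y.size, n_noise))
    return np.column_stack([informative, noise]), y


if __name__ == "__main__":
    X, y = make_toy_data()
    w = optimize_feature_weights(X, y)
    print("learned weights:", np.round(w, 3))
    print("features ranked by weight:", np.argsort(-w))
```

In this sketch, features are ranked by their learned weights and the top-ranked ones are kept, so the cost grows with the number of optimization steps rather than with the number of candidate subsets. The finite-difference gradient is chosen only for brevity and would be too slow for data sets with thousands of features.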
