A new feature selection method based on clustering

Feature selection is an effective technique for reducing the high dimensionality of data, a problem prevalent in many application domains such as text categorization and bioinformatics. It also gives learning algorithms several advantages, such as improved efficiency and reduced over-fitting. Many efforts have been made in this field, and various feature selection methods have been developed and shown to be competitive. Unlike these methods, in this paper we propose a new method that selects important features by clustering the features themselves. The main characteristic of our method is that it works like agglomerative data clustering. Each feature is treated as a data point, and features are grouped according to between-cluster and within-cluster distances. As a result, the selected feature subset has minimal redundancy among its members and maximal relevance to the class labels. Performance evaluations on seven benchmark datasets show that the classification performance achieved by the proposed method is better than that of the other feature selection methods compared.
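The agglomerative idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the distance or relevance measures, so this sketch makes several assumptions. It uses 1 - |Pearson correlation| as the distance between two features, which serves as a redundancy proxy. It uses |correlation with the label| as relevance. It measures between-cluster distance by average linkage and does not model within-cluster distance separately. Finally, it keeps the most relevant member of each cluster. The names `pearson` and `select_by_feature_clustering` and the toy data are hypothetical. This is not the authors' algorithm, whose criteria may differ.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_by_feature_clustering(X, y, k):
    """Return the indices of k features (k >= 1) chosen by agglomerative feature clustering.

    X: list of samples (rows), each a list of numeric feature values.
    y: numeric class label for each sample.
    Hypothetical choices: feature distance = 1 - |corr| (redundancy),
    relevance = |corr(feature, y)|, average linkage between clusters.
    """
    cols = [list(c) for c in zip(*X)]  # treat each feature column as a data point
    m = len(cols)
    dist = [[1.0 - abs(pearson(cols[i], cols[j])) for j in range(m)] for i in range(m)]
    relevance = [abs(pearson(c, y)) for c in cols]

    def linkage(ca, cb):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))

    clusters = [[i] for i in range(m)]  # start with one singleton cluster per feature
    while len(clusters) > k:
        # Merge the closest pair; combinations yields a < b, so deleting b keeps index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]

    # Keep one representative per cluster: the member most relevant to the labels.
    return sorted(max(c, key=lambda i: relevance[i]) for c in clusters)


# Toy data: feature 1 is a noisy copy of feature 0, feature 3 a noisy copy of feature 2.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = [0, 0, 0, 1, 1, 1, 1, 1]
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
f3 = [0, 1, 0, 1, 0, 1, 1, 1]
X = [list(row) for row in zip(f0, f1, f2, f3)]
print(select_by_feature_clustering(X, y, k=2))  # [0, 3]
```

On the toy data, features 0 and 1 are strongly correlated with each other, and so are features 2 and 3. With k=2, the two merges group them into {0, 1} and {2, 3}. The sketch then keeps the most label-relevant member of each group. This results in one feature per redundant group, which is the low-redundancy, high-relevance trade-off the abstract describes.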
