Boosting feature selection using information metric for classification

Feature selection plays an important role in pattern classification. Its purpose is to remove as many redundant features as possible from a data set. Useless features may not only degrade the performance of learning algorithms but also obscure important information (e.g., the intrinsic structure) behind the data. With new and emerging acquisition techniques, data sets in many domains are growing ever larger, and many irrelevant features are prevalent in them. This poses serious challenges to traditional learning algorithms, such as low efficiency and over-fitting. An efficient technique is therefore needed to eliminate redundant or irrelevant features from data sets. Many efforts have been devoted to this problem, and various effective feature selection methods have been proposed. Unlike other selection methods, in this paper we propose a general scheme for boosting feature selection using an information metric. The primary characteristic of our method is that it exploits sample weights to select salient features, and these weights are dynamically updated after each candidate feature is selected. As a result, the information criterion used in the feature selector accurately reflects the degree of relevance between the features and the class labels, so the selected feature subset has maximal relevance to the class labels. Simulation studies on UCI data sets show that, in most cases, the classification performance achieved by the proposed method is better than that of other selection methods.
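The abstract outlines the mechanism, greedy selection by an information metric computed under sample weights that are updated, boosting-style, after each pick, but gives no pseudocode. The sketch below is a minimal illustration of that idea under our own assumptions, not the authors' exact algorithm: the helper names (`weighted_mutual_information`, `boosting_feature_selection`), the one-feature decision stump used for AdaBoost-style reweighting, and the requirement that features be discrete (or pre-binned) are all choices made for this example.

```python
import numpy as np

def weighted_mutual_information(x, y, w):
    """Estimate I(X; Y) between a discrete feature x and labels y,
    with probabilities computed from boosting weights w rather than
    raw counts (assumption: features are categorical or pre-binned)."""
    w = w / w.sum()
    mi = 0.0
    for xv in np.unique(x):
        p_x = w[x == xv].sum()
        for yv in np.unique(y):
            p_xy = w[(x == xv) & (y == yv)].sum()
            p_y = w[y == yv].sum()
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def boosting_feature_selection(X, y, n_features):
    """Greedily pick features by weighted relevance to the labels;
    after each pick, reweight samples AdaBoost-style so the next
    feature is judged on the samples current features handle poorly."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # uniform initial sample weights
    selected, remaining = [], list(range(d))
    for _ in range(n_features):
        # choose the remaining feature with maximal weighted relevance
        scores = {j: weighted_mutual_information(X[:, j], y, w) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        # decision stump on the chosen feature: predict the
        # weighted-majority class for each feature value
        classes = np.unique(y)
        pred = np.empty(n, dtype=y.dtype)
        for xv in np.unique(X[:, best]):
            mask = X[:, best] == xv
            votes = [w[mask & (y == c)].sum() for c in classes]
            pred[mask] = classes[int(np.argmax(votes))]
        # AdaBoost-style update: up-weight misclassified samples
        err = min(max(w[pred != y].sum(), 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(alpha * np.where(pred == y, -1.0, 1.0))
        w /= w.sum()
    return selected

# Toy usage on a synthetic discretized data set.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 10))
y = (X[:, 0] + X[:, 3] > 2).astype(int)     # labels depend on features 0 and 3
print(boosting_feature_selection(X, y, n_features=3))
```

The key design point the abstract emphasizes is visible in the loop: because the weights change after every selection, the same mutual-information criterion scores each candidate against the samples that the already-selected features explain poorly, rather than against a static distribution.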
