Combining multiple classifiers for wrapper feature selection

Wrapper feature selection methods are widely used to select relevant features. However, a wrapper typically relies on a single classifier, and because each classifier has its own biases, different classifiers select very different feature subsets. To overcome the biases of individual classifiers, this study introduces a new data mining method called wrapper-based decision trees (WDT), which combines the feature subsets selected by several different classifiers and uses decision trees to classify on the selected features. Because WDT combines multiple classifiers, choosing which classifiers to include in the combination is an important issue, so we investigate how the number and the nature of the classifiers influence the feature selection results. Regarding the number of classifiers, combining a small number of classifiers selected more relevant features, whereas combining a large number selected fewer features. Regarding the nature of the classifiers, decision tree classifiers selected more features, and those features yielded classification accuracies much higher than the features selected by the other classifiers.
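
As a rough illustration of the idea, the sketch below (Python with scikit-learn) runs a wrapper-style feature selector with several base classifiers, combines the selected subsets, and evaluates a decision tree on the combined features. The particular base classifiers, the majority-vote combination rule, and the use of SequentialFeatureSelector are assumptions for illustration only and are not taken from the paper's exact WDT procedure.

```python
# Minimal sketch: wrapper feature selection with multiple classifiers,
# followed by a decision tree trained on the combined feature subset.
# The base classifiers and the majority-vote rule are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Wrapper step: each base classifier greedily picks its own feature subset,
# scored by cross-validated accuracy of that classifier.
base_classifiers = [
    GaussianNB(),
    KNeighborsClassifier(),
    SVC(),
    DecisionTreeClassifier(random_state=0),
]
votes = np.zeros(X.shape[1], dtype=int)
for clf in base_classifiers:
    sfs = SequentialFeatureSelector(clf, n_features_to_select=5, cv=5)
    sfs.fit(X, y)
    votes += sfs.get_support().astype(int)

# Combination step (assumed rule): keep a feature if at least half of the
# base classifiers selected it, then score a decision tree on that subset.
selected = votes >= len(base_classifiers) / 2
tree_score = cross_val_score(
    DecisionTreeClassifier(random_state=0), X[:, selected], y, cv=5
).mean()
print(f"{selected.sum()} features selected, decision-tree accuracy: {tree_score:.3f}")
```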
