An Ensemble-Based Feature Selection Algorithm Using Combination of Support Vector Machine and Filter Methods for Solving Classification Problems

A new feature selection algorithm for classification problems is proposed. The algorithm follows an ensemble-based methodology: it iteratively combines classifiers to assign each feature a weight that characterizes its importance for classification. It is based on the joint use of a filter method and the well-known support vector machine (SVM). Moreover, the filter method calculates the feature weights from the support vectors only, rather than from the entire training set. Numerical experiments on publicly available data sets show that the proposed algorithm improves classification accuracy.
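
The idea of weighting features from support vectors inside an ensemble can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the ensemble is built from bootstrap resamples, the filter is an absolute Welch t-statistic, the SVM is linear and trained by a simple Pegasos-style subgradient descent, and "support vectors" are approximated as samples with margin at most 1. All function names and parameters (`train_linear_svm`, `ensemble_feature_weights`, `lam`, `rounds`, and so on) are hypothetical.

```python
import math
import random


def train_linear_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Fit a linear soft-margin SVM by stochastic subgradient descent (Pegasos-style)."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 penalty
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def support_vector_indices(X, y, w, b, tol=1e-9):
    """Indices of samples on or inside the margin (assumed proxy for support vectors)."""
    return [i for i in range(len(X))
            if y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b) <= 1.0 + tol]


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples; 0.0 if it cannot be computed."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return abs(ma - mb) / denom if denom > 0 else 0.0


def ensemble_feature_weights(X, y, rounds=10, seed=0):
    """Sum support-vector-only t-statistics over bootstrap-trained SVMs, then normalize."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    totals = [0.0] * d
    for r in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample (assumption)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        w, b = train_linear_svm(Xb, yb, seed=seed + r)
        sv = support_vector_indices(Xb, yb, w, b)
        for j in range(d):
            pos = [Xb[i][j] for i in sv if yb[i] == 1]
            neg = [Xb[i][j] for i in sv if yb[i] == -1]
            totals[j] += welch_t(pos, neg)  # filter score uses support vectors only
    s = sum(totals)
    return [v / s for v in totals] if s > 0 else [1.0 / d] * d


def make_toy_data(n_per_class=40, seed=1):
    """Synthetic data: feature 0 carries the class signal, features 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)])
            y.append(label)
    return X, y


X, y = make_toy_data()
weights = ensemble_feature_weights(X, y)
print([round(v, 3) for v in weights])  # one normalized weight per feature
```

Restricting the filter to support vectors focuses the statistic on samples near the decision boundary, but it also shrinks the sample the statistic is computed from, so the weights in this sketch can be sensitive to the regularization strength `lam` and to the number of ensemble rounds.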
