Feature subset selection by SVM ensemble

Feature selection (FS) has proven useful for improving the generalization performance of classifiers. In applications with few training instances but many input features, FS methods that rely on evaluating a single classifier tend to be unstable. We propose a new FS algorithm based on SVM ensemble learning. First, an ensemble of SVM classifiers is trained on re-sampled subsets of the training data. Then, given a predefined feature ranking criterion, a stability criterion is defined over the ranking criterion values across the classifiers to measure the relevance of each feature. This measure favors features whose ranking criterion values are stable across the ensemble over features whose values vary widely. Unstable features usually carry little information relevant to the class label and can be removed to improve the classifier's generalization performance. Ranking the features requires training only a small number of SVM classifiers, so the method is very fast on problems with a large number of input features. Combined with a backward elimination procedure, the method is robust on feature selection problems with very small sample sizes. In this paper, we evaluate its performance on nonlinear selection tasks.
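A minimal sketch of the ensemble-based ranking idea described above, under assumptions not fixed by the abstract: linear SVMs, the per-feature ranking criterion taken as the squared weight w_j^2 (as in SVM-RFE), and the stability score taken as the mean criterion value divided by its standard deviation across the ensemble. The published method may differ in all of these choices.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.utils import resample

def ensemble_feature_scores(X, y, n_estimators=20, random_state=0):
    """Return a stability-weighted relevance score for each feature.

    Hypothetical helper illustrating the ensemble idea: train several SVMs on
    re-sampled data, collect a per-feature ranking criterion from each, and
    favor features whose criterion is both large and stable.
    """
    rng = np.random.RandomState(random_state)
    criteria = []  # one row of per-feature criterion values per classifier
    for _ in range(n_estimators):
        # Train each SVM on a bootstrap re-sample of the training data.
        Xb, yb = resample(X, y, random_state=rng.randint(1 << 30))
        clf = LinearSVC(C=1.0, max_iter=10000).fit(Xb, yb)
        w = clf.coef_.ravel()            # binary case: one weight per feature
        criteria.append(w ** 2)          # assumed ranking criterion: w_j^2
    criteria = np.asarray(criteria)      # shape (n_estimators, n_features)

    mean = criteria.mean(axis=0)
    std = criteria.std(axis=0) + 1e-12
    # Assumed stability measure: penalize features whose criterion values
    # fluctuate strongly across the ensemble.
    return mean / std

# Usage sketch: rank features by score, then drop the lowest-ranked ones,
# optionally inside a backward-elimination loop that retrains after removals.
# scores = ensemble_feature_scores(X_train, y_train)
# ranking = np.argsort(scores)[::-1]
```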
