A Novel RFE-SVM-based Feature Selection Approach for Classification

Feature selection for classification is a very active research field in data mining and optimization. Its combinatorial nature calls for specific techniques (such as filters, wrappers, genetic algorithms, and simulated annealing) or hybrid approaches that combine several optimization methods. In this context, support vector machine recursive feature elimination (SVM-RFE, referred to below as RFE-SVM) stands out as one of the most effective methods. However, RFE-SVM is a greedy method: it is not guaranteed to find the best combination of features for classification. To overcome this limitation, we propose an alternative approach that combines the RFE-SVM algorithm with local search operators drawn from operations research and artificial intelligence. To assess the contribution of our approach, we conducted a series of experiments on datasets from the UCI Machine Learning Repository. The experimental results are very promising and show the contribution of local search to the classification process. The main conclusion is that reusing features previously removed during the RFE-SVM process improves the quality of the final classifier.
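The idea of greedy elimination followed by a local search that reconsiders removed features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the local search operators, so a simple swap move is assumed here, and the names `greedy_elimination`, `local_search_reinsert`, and `toy_score` are hypothetical. In RFE-SVM the elimination step would rank features by the weights of a trained linear SVM; this sketch instead takes an arbitrary scoring function, with a toy integer score standing in for classifier accuracy.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Score = Callable[[FrozenSet[int]], int]


def greedy_elimination(n_features: int, score: Score, k: int) -> Tuple[Set[int], List[int]]:
    """Greedy backward elimination down to k features.

    At each step, drop the feature whose removal leaves the highest score.
    (Assumption: a generic score replaces the SVM weight ranking of RFE-SVM.)
    """
    selected = set(range(n_features))
    removed: List[int] = []
    while len(selected) > k:
        drop = max(sorted(selected), key=lambda f: score(frozenset(selected - {f})))
        selected.remove(drop)
        removed.append(drop)
    return selected, removed


def local_search_reinsert(selected: Set[int], removed: List[int], score: Score) -> Tuple[Set[int], int]:
    """Swap-based local search (assumed operator): exchange one kept feature
    for one previously removed feature whenever the score strictly improves."""
    current, pool = set(selected), set(removed)
    best = score(frozenset(current))
    improved = True
    while improved:
        improved = False
        for out_f in sorted(current):
            for in_f in sorted(pool):
                candidate = (current - {out_f}) | {in_f}
                value = score(frozenset(candidate))
                if value > best:
                    current, best = candidate, value
                    pool = (pool - {in_f}) | {out_f}
                    improved = True
                    break
            if improved:
                break
    return current, best


def toy_score(subset: FrozenSet[int]) -> int:
    """Hypothetical score: additive weights plus a bonus when features 0 and 1
    are used alone, so feature 0 looks weak until it is paired with feature 1."""
    weights = {0: 1, 1: 3, 2: 4, 3: 2}
    total = sum(weights[f] for f in subset)
    if subset == frozenset({0, 1}):
        total += 5
    return total


if __name__ == "__main__":
    kept, dropped = greedy_elimination(4, toy_score, 2)
    refined, value = local_search_reinsert(kept, dropped, toy_score)
    print(sorted(kept), dropped, sorted(refined), value)
```

On this toy problem the greedy pass discards feature 0 first and ends with features 1 and 2 (score 7), while the swap move brings feature 0 back and reaches features 0 and 1 (score 9). This mirrors, on a small scale, the abstract's claim that reusing removed features can improve the final subset.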