A novel embedded min-max approach for feature selection in nonlinear SVM classification

In recent years, feature selection has become a challenging problem in several machine learning fields, especially classification. The Support Vector Machine (SVM) is a well-known technique for (nonlinear) classification, and various methodologies have been proposed in the literature to select the most relevant features in SVM. Unfortunately, all of them either address feature selection only in the linear classification setting or propose ad-hoc approaches that are difficult to implement in practice. In contrast, we propose an embedded feature selection method based on a min-max optimization problem, in which a trade-off between model complexity and classification accuracy is sought. By leveraging duality theory, we equivalently reformulate the min-max problem and solve it directly with off-the-shelf software for nonlinear optimization. The efficiency and usefulness of our approach are tested on several benchmark data sets in terms of accuracy, number of selected features and interpretability.
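
The exact min-max formulation and its dual reformulation are developed in the paper itself. As a rough illustration of the embedded idea only (learning per-feature kernel weights jointly with the classifier, trading classification accuracy against the number of active features, and handing the resulting nonlinear problem to an off-the-shelf optimizer), consider the sketch below. The data set, solver, penalty term and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (NOT the paper's min-max formulation): embedded feature selection
# for a nonlinear (RBF-kernel) SVM by optimizing per-feature weights with an
# off-the-shelf nonlinear optimizer. The trade-off weight `lam`, the Powell solver
# and the breast-cancer data set are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
n_features = X.shape[1]

C = 1.0                      # SVM regularization parameter (kept fixed here)
gamma = 1.0 / n_features     # fixed RBF width; the feature weights do the scaling
lam = 0.01                   # accuracy-vs-sparsity trade-off (tune by validation)

def objective(w):
    """Cross-validated error of an RBF-SVM trained on feature-weighted inputs,
    plus an L1-type penalty that pushes irrelevant feature weights to zero."""
    w = np.abs(w)                           # keep feature weights non-negative
    Xw = X * w                              # embed the weights in the kernel input
    cv_acc = cross_val_score(SVC(C=C, kernel="rbf", gamma=gamma),
                             Xw, y, cv=3).mean()
    return (1.0 - cv_acc) + lam * np.sum(w)

# Off-the-shelf, derivative-free nonlinear optimization over the feature weights.
res = minimize(objective, x0=np.ones(n_features), method="Powell",
               options={"maxfev": 300})

w_opt = np.abs(res.x)
selected = np.flatnonzero(w_opt > 1e-2)     # features whose weight survived
print(f"Selected {selected.size} of {n_features} features: {selected.tolist()}")
```

Features whose optimized weight is (numerically) zero drop out of the kernel and are thus never used at prediction time, which is what makes the selection "embedded" rather than a separate filtering step.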
