Simultaneous Feature Selection and Classification for Data-Adaptive Kernel-Penalized SVM

Simultaneous feature selection and classification have been explored in the literature to extend the support vector machine (SVM) techniques by adding penalty terms to the loss function directly. However, it is the kernel function that controls the performance of the SVM, and an imbalance in the data will deteriorate the performance of an SVM. In this paper, we examine a new method of simultaneous feature selection and binary classification. Instead of incorporating the standard loss function of the SVM, a penalty is added to the data-adaptive kernel function directly to control the performance of the SVM, by firstly conformally transforming the kernel functions of the SVM, and then re-conducting an SVM classifier based on the sparse features selected. Both convex and non-convex penalties, such as least absolute shrinkage and selection (LASSO), moothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP) are explored, and the oracle property of the estimator is established accordingly. An iterative optimization procedure is applied as there is no analytic form of the estimated coefficients available. Numerical comparisons show that the proposed method outperforms the competitors considered when data are imbalanced, and it performs similarly to the competitors when data are balanced. The method can be easily applied in medical images from different platforms.

[1]  William Nick Street,et al.  Breast Cancer Diagnosis and Prognosis Via Linear Programming , 1995, Oper. Res..

[2]  Pat Langley,et al.  Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell..

[3]  Paul S. Bradley,et al.  Feature Selection via Concave Minimization and Support Vector Machines , 1998, ICML.

[4]  Si Wu,et al.  Improving support vector machine classifiers by modifying kernel functions , 1999, Neural Networks.

[5]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[6]  Fabio Roli,et al.  Support Vector Machines with Embedded Reject Option , 2002, SVM.

[7]  Robert Tibshirani,et al.  1-norm Support Vector Machines , 2003, NIPS.

[8]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[9]  Si Wu,et al.  Conformal Transformation of Kernel Functions: A Data-Dependent Way to Improve Support Vector Machine Classifiers , 2002, Neural Processing Letters.

[10]  Yi Lin,et al.  Support Vector Machines and the Bayes Rule in Classification , 2002, Data Mining and Knowledge Discovery.

[11]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[12]  Peter Williams,et al.  Scaling the Kernel Function to Improve Performance of the Support Vector Machine , 2005, ISNN.

[13]  H. Zou,et al.  The doubly regularized support vector machine , 2006 .

[14]  Christophe Croux,et al.  An Information Criterion for Variable Selection in Support Vector Machines , 2007 .

[15]  Hui Zou An Improved 1-norm SVM for Simultaneous Classification and Variable Selection , 2007, AISTATS.

[16]  Jiahua Chen,et al.  Extended Bayesian information criteria for model selection with large model spaces , 2008 .

[17]  H. Zou,et al.  The F ∞ -norm support vector machine , 2008 .

[18]  Li Wang,et al.  Hybrid huberized support vector machines for microarray classification and gene selection , 2008, Bioinform..

[19]  Richard Weber,et al.  A wrapper method for feature selection using Support Vector Machines , 2009, Inf. Sci..

[20]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[21]  T. Hastie,et al.  SparseNet: Coordinate Descent With Nonconvex Penalties , 2011, Journal of the American Statistical Association.

[22]  Richard Weber,et al.  Simultaneous feature selection and classification using kernel-penalized support vector machines , 2011, Inf. Sci..

[23]  Changyi Park,et al.  Oracle properties of SCAD-penalized support vector machine , 2012 .

[24]  Alfredo Petrosino,et al.  Adjusted F-measure and kernel scaling for imbalanced data learning , 2014, Inf. Sci..

[25]  Lang Zhang,et al.  A credit risk assessment model based on SVM for small and medium enterprises in supply chain finance , 2015, Financial Innovation.

[26]  Mahmoud Pesaran,et al.  A comprehensive overview on signal processing and artificial intelligence techniques applications in classification of power quality disturbances , 2015 .

[27]  James A. Rodger,et al.  Discovery of medical Big Data analytics: Improving the prediction of traumatic brain injury survival rates by data mining Patient Informatics Processing Software Hybrid Hadoop Hive , 2015 .

[28]  Runze Li,et al.  Variable selection for support vector machines in moderately high dimensions , 2016, Journal of the Royal Statistical Society. Series B, Statistical methodology.