Variable selection for the multicategory SVM via adaptive sup-norm regularization

The Support Vector Machine (SVM) is a popular classification paradigm in machine learning and has achieved great success in real applications. However, the standard SVM cannot select variables automatically, and therefore its solution typically utilizes all the input variables without discrimination. This makes it difficult to identify important predictor variables, which is often one of the primary goals in data analysis. In this paper, we propose two novel types of regularization in the context of the multicategory SVM (MSVM) for simultaneous classification and variable selection. The MSVM generally requires estimating multiple discriminating functions and applies the argmax rule for prediction. For each individual variable, we propose to characterize its importance by the sup-norm of its coefficient vector across the different functions, and then to minimize the MSVM hinge loss subject to a penalty on the sum of these sup-norms. To further improve the sup-norm penalty, we propose adaptive regularization, which allows different weights to be imposed on different variables according to their relative importance. Both types of regularization automate variable selection in the process of building classifiers and lead to sparse multicategory classifiers with enhanced interpretability and improved accuracy, especially for high-dimensional, low-sample-size data. One major advantage of the sup-norm penalty is its easy implementation via standard linear programming. Several simulated examples and one real gene data analysis demonstrate the outstanding performance of the adaptive sup-norm penalty in various data settings.
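For concreteness, the penalized problem described above can be sketched as follows. The notation here (linear discriminants f_k(x) = b_k + x'β_k, tuning parameter λ, weights τ_j, and one common form of the MSVM hinge loss) is our own shorthand for the quantities the abstract refers to, not a verbatim statement of the paper's formulation. With training data (x_i, y_i), i = 1, …, n, x_i ∈ R^p, y_i ∈ {1, …, K}, and the sum-to-zero constraint Σ_k f_k(x) = 0, the adaptive sup-norm MSVM solves

$$
\min_{b,\,\beta}\;\sum_{i=1}^{n}\sum_{k\neq y_i}\bigl[f_k(x_i)+1\bigr]_{+}
\;+\;\lambda\sum_{j=1}^{p}\tau_j\,\max_{1\le k\le K}\lvert\beta_{kj}\rvert,
$$

where [u]_+ = max(u, 0) and setting τ_j ≡ 1 recovers the plain sup-norm penalty. Introducing slack variables ξ_{ik} for the hinge terms and η_j for the per-variable sup-norms yields an equivalent linear program,

$$
\min_{b,\,\beta,\,\xi,\,\eta}\;\sum_{i=1}^{n}\sum_{k\neq y_i}\xi_{ik}+\lambda\sum_{j=1}^{p}\tau_j\,\eta_j
\quad\text{s.t.}\quad
\xi_{ik}\ge f_k(x_i)+1,\;\;\xi_{ik}\ge 0,\;\;
-\eta_j\le\beta_{kj}\le\eta_j,\;\;
\sum_{k=1}^{K}b_k=0,\;\;\sum_{k=1}^{K}\beta_k=0,
$$

which can be passed to any standard LP solver; this is the sense in which the sup-norm penalty admits easy implementation via linear programming.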
