Support vector machines with adaptive Lq penalty

The standard support vector machine (SVM) minimizes the hinge loss function subject to the L2 penalty or the roughness penalty. Recently, the L1 SVM was suggested for variable selection because it produces sparse solutions [Bradley, P., Mangasarian, O., 1998. Feature selection via concave minimization and support vector machines. In: Shavlik, J. (Ed.), ICML'98. Morgan Kaufmann, Los Altos, CA; Zhu, J., Hastie, T., Rosset, S., Tibshirani, R., 2003. 1-norm support vector machines. Neural Inform. Process. Systems 16]. These learning methods are non-adaptive, since their penalty forms are pre-determined before looking at the data, and each tends to perform well only in certain situations. For instance, the L2 SVM generally works well except when there are too many noise inputs, whereas the L1 SVM is preferred in the presence of many noise variables. In this article we propose and explore an adaptive learning procedure called the Lq SVM, in which the best q > 0 is chosen automatically from the data. Both two-class and multi-class classification problems are considered. We show that the new adaptive approach combines the benefits of a class of non-adaptive procedures and achieves the best performance of this class across a variety of situations. Moreover, we observe that the proposed Lq penalty is more robust to noise variables than the L1 and L2 penalties. An iterative algorithm is suggested to solve the Lq SVM efficiently. Simulations and real data applications support the effectiveness of the proposed procedure.
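A minimal sketch of the adaptive idea for a linear classifier: fit an Lq-penalized hinge-loss model for each candidate q and keep the q that performs best on held-out data. This is an illustration only, not the paper's own iterative algorithm; the subgradient-descent solver, the function names (`lq_svm_fit`, `select_q`), the candidate grid, and all tuning constants are assumptions made for the sketch.

```python
import numpy as np

def lq_svm_fit(X, y, lam, q, lr=0.05, n_iter=1000, eps=1e-4):
    """Toy linear SVM: minimize mean hinge loss + lam * sum(|w_j|^q)
    by subgradient descent. eps smooths the penalty gradient near zero
    so that q < 1 does not cause the step to blow up."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        viol = margins < 1                       # margin violators drive the hinge term
        gw = -(y[viol, None] * X[viol]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        # (sub)gradient of the Lq penalty term lam * |w_j|^q
        gw = gw + lam * q * np.sign(w) * (np.abs(w) + eps) ** (q - 1)
        w = w - lr * gw
        b = b - lr * gb
    return w, b

def select_q(X_tr, y_tr, X_val, y_val, lam=0.01, qs=(0.5, 1.0, 2.0)):
    """The adaptive step: fit one model per candidate q,
    keep the one with the best validation accuracy."""
    best = None
    for q in qs:
        w, b = lq_svm_fit(X_tr, y_tr, lam, q)
        acc = float(np.mean(np.sign(X_val @ w + b) == y_val))
        if best is None or acc > best[0]:
            best = (acc, q, w, b)
    return best
```

In this sketch the "data-driven" choice of q is a simple validation-set search over a small grid; the key point is only that q is treated as a tuning parameter rather than fixed in advance at 1 or 2.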

[1] J. Friedman, et al. A Statistical View of Some Chemometrics Regression Tools, 1993.

[2] Yi Lin, et al. Support Vector Machines and the Bayes Rule in Classification, 2002, Data Mining and Knowledge Discovery.

[3] Michael A. Saunders, et al. Atomic Decomposition by Basis Pursuit, 1998, SIAM J. Sci. Comput.

[4] Yi Lin. Multicategory Support Vector Machines, Theory, and Application to the Classification of . . ., 2003.

[5] I. Johnstone, et al. Ideal spatial adaptation by wavelet shrinkage, 1994.

[6] Chong Gu, et al. Soft Classification, a.k.a. Risk Estimation, via Penalized Log Likelihood and Smoothing Spline Analysis of Variance, 1993.

[7] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.

[8] Kazushi Ikeda, et al. Geometrical Properties of Nu Support Vector Machines with Different Norms, 2005, Neural Computation.

[9] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[10] Robert Tibshirani, et al. 1-norm Support Vector Machines, 2003, NIPS.

[11] Tong Tang, et al. Proceedings of the European Symposium on Artificial Neural Networks, 2006.

[12] J. Friedman, et al. [A Statistical View of Some Chemometrics Regression Tools]: Response, 1993.

[13] Koby Crammer, et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.

[14] Yi Lin, et al. Support Vector Machines for Classification in Nonstandard Situations, 2002, Machine Learning.

[15] Jason Weston, et al. Support vector machines for multi-class pattern recognition, 1999, ESANN.

[16] Vladimir Vapnik. Statistical learning theory, 1998.

[17] G. Wahba. Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, 1999.

[18] R. Tibshirani, et al. Least angle regression, 2004, math/0406456.

[19] Wenjiang J. Fu. Penalized Regressions: The Bridge versus the Lasso, 1998.

[20] Jianqing Fan, et al. Regularization of Wavelet Approximations, 2001.

[22] Russell Greiner, et al. Computational learning theory and natural learning systems, 1997.

[23] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.

[24] Wenjiang J. Fu, et al. Asymptotics for lasso-type estimators, 2000.

[25] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[26] Paul S. Bradley, et al. Feature Selection via Concave Minimization and Support Vector Machines, 1998, ICML.

[27] Xiaotong Shen, et al. Multi-Category Support Vector Machines, Feature Selection and Solution Path, 2006.