DC Algorithm for Extended Robust Support Vector Machine

Nonconvex variants of support vector machines (SVMs) have been developed for various purposes. For example, robust SVMs attain robustness to outliers by using a nonconvex loss function, while extended ν-SVM (Eν-SVM) extends the range of the hyperparameter by introducing a nonconvex constraint. Here, we consider an extended robust support vector machine (ER-SVM), a robust variant of Eν-SVM. ER-SVM combines two types of nonconvexity from robust SVMs and Eν-SVM. Because of the two nonconvexities, the existing algorithm we proposed needs to be divided into two parts depending on whether the hyperparameter value is in the extended range or not. The algorithm also heuristically solves the nonconvex problem in the extended range. In this letter, we propose a new, efficient algorithm for ER-SVM. The algorithm deals with two types of nonconvexity while never entailing more computations than either Eν-SVM or robust SVM, and it finds a critical point of ER-SVM. Furthermore, we show that ER-SVM includes the existing robust SVMs as special cases. Numerical experiments confirm the effectiveness of integrating the two nonconvexities.
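As a rough illustration of what a difference-of-convex (DC) algorithm does in general, the sketch below runs the generic DCA iteration on a one-dimensional toy objective. It is not the ER-SVM algorithm described in this letter: the objective f(x) = x^4 - 2x^2, its split into g(x) = x^4 and h(x) = 2x^2, and the names `dca`, `grad_h`, and `argmin_linearized` are all assumptions chosen for illustration. Each step replaces h by its linearization at the current point and minimizes the resulting convex function, and the iterates approach a critical point of f.

```python
import math


def dca(x0, grad_h, argmin_linearized, iters=100, tol=1e-12):
    """Generic DCA loop for f = g - h, with g and h convex (illustrative only)."""
    x = x0
    for _ in range(iters):
        s = grad_h(x)                 # gradient of h at the current iterate
        x_new = argmin_linearized(s)  # minimize the convex surrogate g(x) - s * x
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical toy problem: f(x) = x**4 - 2*x**2, split as g(x) = x**4, h(x) = 2*x**2.
def grad_h(x):
    return 4.0 * x


def argmin_linearized(s):
    # Stationarity of x**4 - s*x gives 4*x**3 = s, so x is the real cube root of s/4.
    t = s / 4.0
    return math.copysign(abs(t) ** (1.0 / 3.0), t)


x_star = dca(0.5, grad_h, argmin_linearized)
f_prime = 4.0 * x_star ** 3 - 4.0 * x_star  # derivative of f at the returned point
print(x_star, f_prime)
```

Starting from x0 = 0.5, the update reduces to taking the cube root of the previous iterate, so the sequence moves toward x = 1, where the derivative of f vanishes. Like any DCA run, this only guarantees a critical point, not a global minimum; in this toy case, starting exactly at x0 = 0 would leave the iterate at 0, which is a critical point but a local maximum of f.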
