Ramp loss linear programming support vector machine

The ramp loss is a robust but non-convex loss for classification. Compared with other non-convex losses, a local minimum of the ramp loss can be found efficiently; this effectiveness of local search comes from the piecewise linearity of the ramp loss. Motivated by the fact that the l1-penalty is piecewise linear as well, the l1-penalty is combined with the ramp loss, resulting in a ramp loss linear programming support vector machine (ramp-LPSVM). The proposed ramp-LPSVM is a piecewise linear minimization problem, so the related optimization techniques are applicable; moreover, the l1-penalty enhances sparsity. In this paper, the corresponding misclassification error and convergence behavior are discussed. Since the ramp loss is a truncated hinge loss, ramp-LPSVM shares several properties with hinge-loss SVMs. A local minimization algorithm and a global search strategy are discussed. The good optimization capability of the proposed algorithms makes ramp-LPSVM perform well in numerical experiments: its results are more robust than those of hinge-loss SVMs and sparser than those of ramp-SVM, which combines the ‖·‖_K-penalty (the RKHS norm) with the ramp loss.
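
To make the piecewise linear structure concrete, below is a minimal sketch of a ramp-loss local search for the linear-kernel case, in the spirit of the concave-convex procedure; it is an illustration under stated assumptions, not the paper's algorithm. The helper names (ramp_loss, ramp_lpsvm_linear) and the parameters C, s, and max_iter are hypothetical; only numpy and scipy.optimize.linprog are assumed. Each step linearizes the concave part of the ramp loss, so the remaining l1-penalized hinge subproblem is exactly a linear program.

```python
import numpy as np
from scipy.optimize import linprog

def ramp_loss(z, s=0.0):
    # Ramp loss: max(0, 1 - z) - max(0, s - z) = min(1 - s, max(0, 1 - z)),
    # i.e. a hinge loss truncated at the level 1 - s.
    return np.maximum(0.0, 1.0 - z) - np.maximum(0.0, s - z)

def ramp_lpsvm_linear(X, y, C=1.0, s=0.0, max_iter=20):
    # Hypothetical sketch: local search for the l1-penalized ramp loss with a
    # linear model f(x) = w.x + b. Each iteration linearizes the concave part
    # -C * max(0, s - z_i) of the objective at the current iterate and solves
    # the resulting l1-penalized hinge-loss linear program.
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_iter):
        z = y * (X @ w + b)
        beta = C * (z < s).astype(float)   # weights from the linearization
        # LP variables: [w+ (d), w- (d), b+, b-, xi (n)], all >= 0 by default.
        lin = (beta * y) @ X               # linear term sum_i beta_i y_i x_i
        c = np.concatenate([1.0 + lin, 1.0 - lin,
                            [np.sum(beta * y), -np.sum(beta * y)],
                            C * np.ones(n)])
        # Hinge constraints y_i((w+ - w-).x_i + b+ - b-) + xi_i >= 1,
        # written as A_ub @ vars <= b_ub for linprog.
        Yx = y[:, None] * X
        A_ub = np.hstack([-Yx, Yx, -y[:, None], y[:, None], -np.eye(n)])
        b_ub = -np.ones(n)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
        if res.status != 0:
            break                          # LP solver failed; keep last iterate
        w_new = res.x[:d] - res.x[d:2 * d]
        b_new = res.x[2 * d] - res.x[2 * d + 1]
        if np.allclose(w_new, w) and np.isclose(b_new, b):
            break                          # fixed point: a local minimum
        w, b = w_new, b_new
    return w, b
```

With this hypothetical interface, w, b = ramp_lpsvm_linear(X, y) fits a sparse linear classifier; restarting the local search from several initial points is a crude stand-in for the global search strategy that the paper discusses.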
