Outlier Path: A Homotopy Algorithm for Robust SVM

In recent applications with massive but less reliable data (e.g., labels obtained by a semi-supervised learning method or crowdsourcing), the non-robustness of the support vector machine (SVM) often causes considerable performance deterioration. Although improving the robustness of SVM has been investigated for a long time, robust SVM (RSVM) learning still poses two major challenges: obtaining a good (local) solution from a non-convex optimization problem and optimally controlling the robustness-efficiency trade-off. In this paper, we address these two issues simultaneously by introducing a novel homotopy approach to RSVM learning. Based on a theoretical investigation of the geometry of RSVM solutions, we show that a path of local RSVM solutions can be computed efficiently when the influence of outliers is gradually suppressed, as in simulated annealing. We experimentally demonstrate that our algorithm tends to produce better local solutions than the alternative approach based on the concave-convex procedure, and that it enables stable and efficient model selection for controlling the influence of outliers.
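To make the idea of gradually suppressing outlier influence concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a hinge loss whose slope beyond a truncation point is scaled by a homotopy parameter; the names `suppressed_hinge`, `loss_along_path`, `theta`, and `s` are hypothetical and do not come from the abstract. With `theta = 1` the loss is the ordinary hinge loss, and with `theta = 0` it is a truncated hinge loss that caps the penalty any single point can incur.

```python
def suppressed_hinge(margin, theta, s=-1.0):
    """Hinge loss whose slope is scaled by theta for margins below s.

    Assumption for illustration only: theta in [0, 1] interpolates between
    the standard hinge loss (theta = 1) and a truncated hinge (theta = 0).
    """
    base = max(0.0, 1.0 - margin)    # standard hinge loss
    excess = max(0.0, s - margin)    # portion of the loss beyond the truncation point
    return base - (1.0 - theta) * excess


def loss_along_path(margins, thetas, s=-1.0):
    """Evaluate the loss of each margin at every homotopy parameter value."""
    return [[suppressed_hinge(m, t, s) for m in margins] for t in thetas]


if __name__ == "__main__":
    margins = [-5.0, -1.0, 0.0, 1.0, 2.0]   # y * f(x); very negative means a likely outlier
    thetas = [1.0, 0.5, 0.0]                # outlier influence shrinks as theta decreases
    for theta, row in zip(thetas, loss_along_path(margins, thetas)):
        print(theta, row)
```

In this sketch only the point with margin -5 changes its loss as `theta` decreases, which is the sense in which the influence of an outlier is reduced while well-classified points are unaffected. A homotopy method would additionally track how the optimal solution changes along this parameter, which the sketch does not attempt.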
