Ignorance is Bliss: Non-Convex Online Support Vector Machines

In this paper, we propose a non-convex online Support Vector Machine (SVM) algorithm, LASVM-NC, based on the Ramp Loss, which strongly suppresses the influence of outliers. In the same online learning setting, we then propose an outlier filtering mechanism, LASVM-I, that approximates non-convex behavior within convex optimization. Both algorithms are built upon a third novel SVM algorithm, LASVM-G, which produces accurate intermediate models during its iterative steps by leveraging the primal/dual gap. We present experimental results demonstrating that our frameworks achieve significant robustness to outliers in noisy classification tasks where mislabeled training instances are abundant. The proposed approaches yield more scalable online SVM algorithms with sparser models and lower computational cost in both the training and recognition phases, without sacrificing generalization performance. We also point out the relation between non-convex behavior in SVMs and active learning.
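To make the outlier-suppression property of the Ramp Loss concrete, the following is a minimal sketch (not the paper's implementation; function names and the cutoff parameter `s` are illustrative) of the standard construction of the ramp loss as a difference of two hinge losses, R_s(z) = H_1(z) − H_s(z) with H_a(z) = max(0, a − z). Because the loss is capped at 1 − s, badly mislabeled points with very negative margins contribute a bounded penalty and bounded gradient, unlike the unbounded hinge loss:

```python
import numpy as np

def hinge(z, a=1.0):
    # Hinge loss H_a(z) = max(0, a - z), applied to margins z = y * f(x).
    return np.maximum(0.0, a - z)

def ramp(z, s=-1.0):
    # Ramp loss R_s(z) = H_1(z) - H_s(z), with s < 1.
    # The loss plateaus at 1 - s for z <= s, so extreme outliers
    # (very negative margins) stop pulling on the decision boundary.
    return hinge(z, 1.0) - hinge(z, s)

# Margins from a correctly classified point, a margin violation,
# a misclassified point, and a gross outlier:
margins = np.array([2.0, 0.5, -0.5, -5.0])
print(ramp(margins))  # -> [0.  0.5 1.5 2. ]; the outlier is capped at 1 - s = 2
```

The same capping idea underlies the non-convex optimization here: the ramp loss is the hinge loss minus a concave correction, which is what makes CCCP-style convex decompositions applicable.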
