SVM-Optimization and Steepest-Descent Line Search

We consider a subclass of convex quadratic optimization problems and analyze decomposition algorithms that perform, at least approximately, steepest descent with exact line search. We show that these algorithms, when implemented properly, are within ε of optimality after O(log(1/ε)) iterations for strictly convex cost functions, and after O(1/ε) iterations in the general case. Our analysis is general enough to cover the algorithms used in software packages such as SVMTorch and (first- or second-order) LibSVM. To the best of our knowledge, this is the first paper to establish a convergence rate for these algorithms without introducing unnecessarily restrictive assumptions.
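To make the notion of steepest descent with exact line search concrete, the following is a minimal sketch for the unconstrained, strictly convex case f(x) = ½ xᵀQx − bᵀx, where the exact step along the negative gradient g has the closed form (gᵀg)/(gᵀQg). It is a textbook illustration only and not the decomposition algorithms analyzed in the paper: it ignores the box and equality constraints of the SVM problem and performs no working-set selection. The function name `steepest_descent` and the example matrix are assumptions chosen for illustration.

```python
def steepest_descent(Q, b, x0, eps=1e-8, max_iter=10_000):
    """Minimize f(x) = 0.5 * x^T Q x - b^T x for symmetric positive definite Q.

    Illustrative sketch: unconstrained steepest descent with exact line search.
    Returns the final iterate and the objective value at every iterate.
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))

    def f(x):
        return 0.5 * dot(x, matvec(Q, x)) - dot(b, x)

    x = list(x0)
    history = [f(x)]
    for _ in range(max_iter):
        # Gradient of the quadratic: g = Qx - b.
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        gg = dot(g, g)
        if gg <= eps ** 2:  # stop once the gradient norm is at most eps
            break
        # Exact line search: the step minimizing f(x - t * g) over t.
        step = gg / dot(g, matvec(Q, g))
        x = [xi - step * gi for xi, gi in zip(x, g)]
        history.append(f(x))
    return x, history


# Hypothetical 2x2 example; the minimizer solves Qx = b, i.e. x = (1, 1).
Q = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 1.0]
x, history = steepest_descent(Q, b, [0.0, 0.0])
```

On a strictly convex problem like this one, the objective gap shrinks by a constant factor per iteration, which is the behavior behind an O(log(1/ε)) iteration count; when Q is only positive semidefinite, no such contraction factor is available in general.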
