On the convergence of the decomposition method for support vector machines

The decomposition method is currently one of the major methods for solving support vector machines (SVM). Its convergence properties have not been fully understood. The general asymptotic convergence was first proposed by Chang et al., but their working set selection does not coincide with existing implementations. A later breakthrough by Keerthi and Gilbert (2000, 2002) proved the finite termination of a practical version of the method, in which the size of the working set is restricted to two. In this paper, we prove the asymptotic convergence of the algorithm used by the software SVMlight and other later implementations. The size of the working set can be any even number. Extensions to other SVM formulations are also discussed.
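To make the setting concrete, the sketch below implements one decomposition iteration loop with working sets of size two (the SMO special case covered by Keerthi and Gilbert), using the maximal-violating-pair selection that underlies SVMlight-style implementations. It is a minimal illustration, not the paper's proof object: the function name smo_decomposition, the tolerance eps, and the toy data are illustrative choices. The problem solved is the standard C-SVM dual, min_a (1/2)a'Qa - e'a subject to 0 <= a_t <= C and y'a = 0, with Q_ts = y_t y_s K(x_t, x_s).

```python
import numpy as np

def smo_decomposition(K, y, C=1.0, eps=1e-3, max_iter=100000):
    """Decomposition method with |working set| = 2 (SMO):
    minimize 0.5*a'Qa - e'a  s.t.  0 <= a <= C,  y'a = 0,
    where Q[t, s] = y[t]*y[s]*K[t, s]."""
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K
    alpha = np.zeros(n)
    G = -np.ones(n)                                 # gradient Q a - e at a = 0
    for _ in range(max_iter):
        yG = -y * G
        up = ((y == 1) & (alpha < C)) | ((y == -1) & (alpha > 0))
        low = ((y == 1) & (alpha > 0)) | ((y == -1) & (alpha < C))
        if not up.any() or not low.any():
            break
        i = np.where(up)[0][np.argmax(yG[up])]      # maximal violating pair
        j = np.where(low)[0][np.argmin(yG[low])]
        if yG[i] - yG[j] <= eps:                    # approximate KKT holds: stop
            break
        # Two-variable subproblem along d with d_i = y_i, d_j = -y_j,
        # a direction that preserves the equality constraint y'a = 0.
        quad = Q[i, i] + Q[j, j] - 2.0 * y[i] * y[j] * Q[i, j]
        t = (yG[i] - yG[j]) / max(quad, 1e-12)      # unconstrained step, > 0
        t_max_i = C - alpha[i] if y[i] == 1 else alpha[i]   # box limits on t
        t_max_j = alpha[j] if y[j] == 1 else C - alpha[j]
        t = min(t, t_max_i, t_max_j)
        da_i, da_j = y[i] * t, -y[j] * t
        alpha[i] += da_i
        alpha[j] += da_j
        G += da_i * Q[:, i] + da_j * Q[:, j]        # rank-two gradient update
    return alpha

# Toy usage: two Gaussian blobs, linear kernel K = X X'.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
alpha = smo_decomposition(X @ X.T, y, C=1.0)
w = (alpha * y) @ X                                 # primal normal vector
```

Precomputing the full kernel matrix keeps the sketch short; practical solvers such as SVMlight instead cache kernel columns and shrink the active problem, and may use working sets larger than two.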

[1] G. Zoutendijk. Methods of Feasible Directions, 1962, The Mathematical Gazette.

[2] P. Wolfe. On the convergence of gradient methods under constraint, 1972.

[3] M. J. D. Powell. On search directions for minimization algorithms, 1973, Math. Program.

[4] C. Micchelli. Interpolation of scattered data: Distance matrices and conditionally positive definite functions, 1986.

[5] Federico Girosi, et al. Training support vector machines: an application to face detection, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6] Vladimir Vapnik. Statistical learning theory, 1998.

[7] Nello Cristianini, et al. The Kernel-Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines, 1998, ICML.

[8] Thorsten Joachims. Making large scale SVM learning practical, 1998.

[9] Alexander J. Smola, et al. Support Vector Machine Reference Manual, 1998.

[10] Pavel Laskov, et al. An Improved Decomposition Algorithm for Regression Support Vector Machines, 1999, NIPS.

[11] John C. Platt. Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods, 1999.

[12] David R. Musicant, et al. Successive overrelaxation for support vector machines, 1999, IEEE Trans. Neural Networks.

[13] B. Schölkopf, et al. Advances in kernel methods: support vector learning, 1999.

[14] Vladimir N. Vapnik. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[15] Bernhard Schölkopf, et al. New Support Vector Algorithms, 2000, Neural Computation.

[16] Chih-Jen Lin, et al. The analysis of decomposition methods for support vector machines, 2000, IEEE Trans. Neural Networks Learn. Syst.

[17] S. Sathiya Keerthi, et al. Improvements to the SMO algorithm for SVM regression, 2000, IEEE Trans. Neural Networks Learn. Syst.

[18] S. Sathiya Keerthi, et al. Improvements to Platt's SMO Algorithm for SVM Classifier Design, 2001, Neural Computation.

[19] Samy Bengio, et al. SVMTorch: Support Vector Machines for Large-Scale Regression Problems, 2001, J. Mach. Learn. Res.

[20] Bernhard Schölkopf, et al. Estimating the Support of a High-Dimensional Distribution, 2001, Neural Computation.