QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines

We describe polynomial-time algorithms that produce approximate solutions with guaranteed accuracy for a class of QP problems that arise in the design of support vector machine classifiers. These algorithms employ a two-stage process: the first stage produces an approximate solution to a dual QP problem, and the second stage maps this approximate dual solution to an approximate primal solution. For the second stage we describe an O(n log n) algorithm that maps an approximate dual solution with accuracy (2(2K_n)^(1/2) + 8λ^(1/2))^(-2) λ ε_p^2 to an approximate primal solution with accuracy ε_p, where n is the number of data samples, K_n is the maximum kernel value over the data, and λ > 0 is the SVM regularization parameter. For the first stage we present new results for decomposition algorithms and describe new decomposition algorithms with guaranteed accuracy and run time. In particular, for τ-rate certifying decomposition algorithms we establish the optimality of τ = 1/(n-1). In addition, we extend the recent τ = 1/(n-1) algorithm of Simon (2004) to form two new composite algorithms that also achieve the τ = 1/(n-1) iteration bound of List and Simon (2005), but yield faster run times in practice. We also exploit the τ-rate certifying property of these algorithms to produce new stopping rules that are computationally efficient and that guarantee a specified accuracy for the approximate dual solution. Furthermore, for the dual QP problem corresponding to the standard classification problem, we describe operational conditions under which the Simon and composite algorithms possess an upper bound of O(n) on the number of iterations. For this same problem we also describe general conditions under which a matching lower bound exists for any decomposition algorithm that uses working sets of size 2. For the Simon and composite algorithms we also establish an O(n^2) bound on the overall run time for the first stage.
Combining the first and second stages gives an overall run time of O(n^2(c_k + 1)), where c_k is an upper bound on the computation required for a single kernel evaluation. Pseudocode is presented for a complete algorithm that takes an accuracy ε_p as input and produces an approximate solution satisfying this accuracy in low-order polynomial time. Experiments are included to illustrate the new stopping rules and to compare the Simon and composite decomposition algorithms.
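The first stage described above is a decomposition method that repeatedly optimizes over working sets of size 2 and stops once a dual accuracy guarantee holds. As an illustrative sketch only — this is the classical SMO-style maximal-violating-pair scheme with a KKT-gap stopping rule, not the paper's Simon or composite algorithms, and all names and tolerances are assumptions — the flavor of such a decomposition loop for the standard soft-margin SVM dual can be written as:

```python
import numpy as np

def smo_decomposition(Q, y, C, eps=1e-3, max_iter=10000):
    """Minimize f(a) = 0.5 a^T Q a - sum(a) subject to 0 <= a <= C and
    y^T a = 0, using working sets of size 2 (maximal-violating pair).
    Assumes both classes appear in y. Illustrative sketch only."""
    n = len(y)
    a = np.zeros(n)
    grad = -np.ones(n)                      # gradient of f at a = 0
    for _ in range(max_iter):
        # Indices whose variable can still move "up"/"down" along y_i.
        up = ((a < C) & (y > 0)) | ((a > 0) & (y < 0))
        dn = ((a > 0) & (y > 0)) | ((a < C) & (y < 0))
        gy = -y * grad
        up_idx, dn_idx = np.where(up)[0], np.where(dn)[0]
        i = up_idx[np.argmax(gy[up_idx])]
        j = dn_idx[np.argmin(gy[dn_idx])]
        gap = gy[i] - gy[j]                 # maximal KKT violation
        if gap <= eps:                      # stopping rule: violation <= eps
            break
        # Exact line search along the feasible direction d_i = y_i, d_j = -y_j,
        # which preserves the equality constraint y^T a = 0.
        denom = Q[i, i] + Q[j, j] - 2.0 * y[i] * y[j] * Q[i, j]
        step = gap / max(denom, 1e-12)
        # Clip the step so both variables stay inside the box [0, C].
        step = min(step,
                   C - a[i] if y[i] > 0 else a[i],
                   C - a[j] if y[j] < 0 else a[j])
        a[i] += step * y[i]
        a[j] -= step * y[j]
        grad += step * (Q[:, i] * y[i] - Q[:, j] * y[j])
    return a
```

With a kernel matrix K, the dual Hessian is Q = (y yᵀ) ⊙ K; each iteration costs two kernel columns, which is what makes the per-iteration cost and hence bounds like the O(n^2) first-stage run time a question of how many iterations a given working-set selection needs.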

[1] Federico Girosi, et al., Support Vector Machines: Training and Applications, 1997.

[2] Vladimir Vapnik, et al., Statistical Learning Theory, 1998.

[3] Thorsten Joachims, et al., Making Large Scale SVM Learning Practical, 1998.

[4] Catherine Blake, et al., UCI Repository of Machine Learning Databases, 1998.

[5] John C. Platt, et al., Fast Training of Support Vector Machines Using Sequential Minimal Optimization, Advances in Kernel Methods, 1999.

[6] David R. Musicant, et al., Successive Overrelaxation for Support Vector Machines, IEEE Trans. Neural Networks, 1999.

[7] S. Sathiya Keerthi, et al., A Fast Iterative Nearest Point Algorithm for Support Vector Machine Classifier Design, IEEE Trans. Neural Networks, 2000.

[8] Nello Cristianini, et al., An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[9] Chih-Jen Lin, et al., The Analysis of Decomposition Methods for Support Vector Machines, IEEE Trans. Neural Networks, 2000.

[10] Chih-Jen Lin, et al., On the Convergence of the Decomposition Method for Support Vector Machines, IEEE Trans. Neural Networks, 2001.

[11] Chih-Jen Lin, Linear Convergence of a Decomposition Method for Support Vector Machines, 2001.

[12] David R. Musicant, et al., Lagrangian Support Vector Machines, J. Mach. Learn. Res., 2001.

[13] S. Sathiya Keerthi, et al., Improvements to Platt's SMO Algorithm for SVM Classifier Design, Neural Computation, 2001.

[14] Bernhard Schölkopf, et al., Estimating the Support of a High-Dimensional Distribution, Neural Computation, 2001.

[15] Hsuan-Tien Lin, et al., A Note on the Decomposition Methods for Support Vector Regression, Neural Computation, 2001.

[16] Chih-Jen Lin, et al., A Formal Analysis of Stopping Criteria of Decomposition Methods for Support Vector Machines, IEEE Trans. Neural Networks, 2002.

[17] Chih-Jen Lin, et al., Asymptotic Convergence of an SMO Algorithm Without Any Assumptions, IEEE Trans. Neural Networks, 2002.

[18] Don R. Hush, et al., Polynomial-Time Decomposition Algorithms for Support Vector Machines, Machine Learning, 2003.

[19] Hans Ulrich Simon, et al., A General Convergence Theorem for the Decomposition Method, COLT, 2004.

[20] Hans Ulrich Simon, et al., On the Complexity of Working Set Selection, Theor. Comput. Sci., 2007.

[21] Pavel Laskov, et al., Feasible Direction Decomposition Algorithms for Training Support Vector Machines, Machine Learning, 2002.

[22] Chih-Jen Lin, et al., A Simple Decomposition Method for Support Vector Machines, Machine Learning, 2002.

[23] S. Sathiya Keerthi, et al., Convergence of a Generalized SMO Algorithm for SVM Classifier Design, Machine Learning, 2002.

[24] Ingo Steinwart, et al., Fast Rates for Support Vector Machines, COLT, 2005.

[25] Don R. Hush, et al., A Classification Framework for Anomaly Detection, J. Mach. Learn. Res., 2005.

[26] Chih-Jen Lin, et al., Training Support Vector Machines via SMO-Type Decomposition Methods, ALT, 2005.

[27] Ingo Steinwart, et al., Learning Rates for Density Level Detection, 2005.

[28] Chih-Jen Lin, et al., Working Set Selection Using Second Order Information for Training Support Vector Machines, J. Mach. Learn. Res., 2005.

[29] Chih-Jen Lin, et al., A Study on SMO-Type Decomposition Methods for Support Vector Machines, IEEE Trans. Neural Networks, 2006.

[30] Yang Dai, et al., Provably Fast Training Algorithms for Support Vector Machines, Theory of Computing Systems, 2007.

[31] Hans Ulrich Simon, et al., General Polynomial Time Decomposition Algorithms, J. Mach. Learn. Res., 2005.

[32] Ingo Steinwart, et al., Fast Rates for Support Vector Machines Using Gaussian Kernels, 2007, arXiv:0708.1838.

[33] Hong Qiao, et al., A Simple Decomposition Algorithm for Support Vector Machines with Polynomial-Time Convergence, Pattern Recognit., 2007.

[34] Don R. Hush, et al., Stability of Unstable Learning Algorithms, Machine Learning, 2007.

[35] Ingo Steinwart, et al., Approximate Duality, 2007.