Convergence of decomposition methods for support vector machines

Abstract

Decomposition methods play an important role in solving the large-scale quadratic programming (QP) problems that arise in training support vector machines (SVMs). In this paper, we study the convergence of general decomposition methods for SVMs. We prove that, under a mild condition on the working set selection, a decomposition algorithm stops within a finite number of iterations after reaching a solution of the QP problem that satisfies a relaxed Karush–Kuhn–Tucker (KKT) condition, the stopping criterion commonly used in practice. Further, we show that the working set selection used in the implementation of SVMlight satisfies this condition, so SVMlight has the finite termination property without any assumption stronger than positive semidefiniteness of the Hessian matrix of the objective function.
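To make the terms above concrete, the following is a minimal sketch of a decomposition loop with a relaxed KKT stopping test. It is not the algorithm analyzed in the paper: it assumes the standard SVM dual (minimize 0.5 αᵀQα − eᵀα subject to yᵀα = 0 and 0 ≤ α ≤ C, with Q_ij = y_i y_j K_ij), a working set of size two chosen as the maximal violating pair (one common selection rule), and a tolerance `eps` on the KKT violation. The function name `smo_relaxed_kkt` and all parameter names are illustrative, and the labels are assumed to contain both classes.

```python
import numpy as np

def smo_relaxed_kkt(K, y, C=1.0, eps=1e-3, max_iter=100000):
    """Illustrative two-variable decomposition for the SVM dual (assumed form).

    K: (n, n) positive semidefinite kernel matrix; y: labels in {+1, -1}.
    Returns (alpha, gap, iterations), where gap is the final KKT violation.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K
    alpha = np.zeros(n)
    G = -np.ones(n)  # gradient Q @ alpha - e at alpha = 0
    gap = np.inf
    for it in range(max_iter):
        # Indices whose alpha can still move in the +y (up) or -y (low) direction.
        up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
        low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
        score = -y * G
        i = int(np.argmax(np.where(up, score, -np.inf)))
        j = int(np.argmin(np.where(low, score, np.inf)))
        gap = score[i] - score[j]
        if gap <= eps:  # relaxed KKT condition: violation within tolerance
            return alpha, gap, it
        # Curvature along the feasible direction d_i = y_i, d_j = -y_j.
        a = K[i, i] + K[j, j] - 2.0 * K[i, j]
        # Largest step that keeps both variables inside the box [0, C].
        t_i = C - alpha[i] if y[i] > 0 else alpha[i]
        t_j = alpha[j] if y[j] > 0 else C - alpha[j]
        t = min(t_i, t_j)
        if a > 0:
            t = min(t, gap / a)  # unconstrained minimizer along d
        new_i = alpha[i] + t * y[i]
        new_j = alpha[j] - t * y[j]
        # Snap to the bound exactly when the box constraint is active.
        if t == t_i:
            new_i = C if y[i] > 0 else 0.0
        if t == t_j:
            new_j = 0.0 if y[j] > 0 else C
        G += Q[:, i] * (new_i - alpha[i]) + Q[:, j] * (new_j - alpha[j])
        alpha[i], alpha[j] = new_i, new_j
    return alpha, gap, max_iter

# Usage on a tiny linearly separable problem with a linear kernel.
X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
alpha, gap, iters = smo_relaxed_kkt(X @ X.T, y, C=10.0, eps=1e-3)
```

The sketch only requires the kernel matrix to be positive semidefinite: when the curvature `a` is zero, the step simply runs to the nearest box bound. The loop stops as soon as the violation `gap` falls to `eps` or below, which is the kind of relaxed KKT test the abstract refers to.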
