Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems

This paper introduces parallel software for solving the quadratic program that arises in training support vector machines for classification. The software implements an iterative decomposition technique. It exploits both the storage and the computing resources of multiprocessor systems by distributing the heaviest computational tasks of each decomposition iteration. Building on a wide range of recent theoretical advances, the paper describes the key decomposition issues systematically: the solution of the quadratic subproblem, the gradient update, and the working set selection. It also discusses how to combine them carefully into an effective parallel tool. A comparison with state-of-the-art packages on benchmark problems shows that the proposed software achieves good accuracy and substantial time savings. Furthermore, challenging experiments on real-world data sets with millions of training samples show that the software makes large-scale standard nonlinear support vector machines tractable on common multiprocessor systems. None of the other available codes offers this capability.
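The decomposition loop and the idea of distributing its heaviest step can be illustrated with a small sketch. It is not the paper's solver. It swaps in the simplest SMO-type decomposition: a working set of size 2, chosen by the maximal-violating-pair rule and solved analytically. The gradient update is the step that needs kernel columns for every training sample. Here it is split by rows across workers, with a `ThreadPoolExecutor` standing in for processors. All names (`rbf`, `update_rows`, `train_svm`, `decision`) and the Gaussian kernel are hypothetical choices for illustration. Under CPython's GIL the threads show only the row partitioning, not a real speedup. A genuine multiprocessor code would place each row block in a separate process, for example an MPI rank. The sketch also assumes that both classes are present.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rbf(u, v, gamma):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2) (illustrative choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def update_rows(rows, grad, X, y, B, delta, gamma):
    """One worker's share of grad += G[:, B] @ delta, touching only its own rows."""
    for k in rows:
        grad[k] += sum(y[k] * y[b] * rbf(X[k], X[b], gamma) * d
                       for b, d in zip(B, delta))


def train_svm(X, y, C=10.0, gamma=0.5, tol=1e-3, max_iter=10_000, n_workers=2):
    """Hypothetical SMO-type decomposition (working set size 2) for the dual QP
        min 0.5 a^T G a - e^T a   s.t.  y^T a = 0,  0 <= a <= C,
    with G_ij = y_i y_j K(x_i, x_j). Assumes both classes occur in y."""
    n = len(y)
    alpha = [0.0] * n
    grad = [-1.0] * n  # gradient G a - e evaluated at a = 0
    # Static row partition: worker p owns rows p, p + P, p + 2P, ...
    blocks = [range(p, n, n_workers) for p in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            # Working set selection: maximal violating pair.
            up = [t for t in range(n)
                  if (y[t] > 0 and alpha[t] < C) or (y[t] < 0 and alpha[t] > 0)]
            low = [t for t in range(n)
                   if (y[t] < 0 and alpha[t] < C) or (y[t] > 0 and alpha[t] > 0)]
            i = max(up, key=lambda t: -y[t] * grad[t])
            j = min(low, key=lambda t: -y[t] * grad[t])
            m, M = -y[i] * grad[i], -y[j] * grad[j]
            if m - M < tol:  # approximate KKT conditions hold
                break
            # 2-variable subproblem along d (d_i = y_i, d_j = -y_j keeps y^T a = 0).
            curv = max(rbf(X[i], X[i], gamma) + rbf(X[j], X[j], gamma)
                       - 2.0 * rbf(X[i], X[j], gamma), 1e-12)
            step = (m - M) / curv
            # Clip the step so both variables stay inside [0, C].
            step = min(step,
                       C - alpha[i] if y[i] > 0 else alpha[i],
                       alpha[j] if y[j] > 0 else C - alpha[j])
            delta = [step * y[i], -step * y[j]]
            alpha[i] += delta[0]
            alpha[j] += delta[1]
            # Gradient update (the heavy step), split across workers by rows.
            list(pool.map(lambda rows: update_rows(rows, grad, X, y, [i, j], delta, gamma),
                          blocks))
    bias = (m + M) / 2.0  # midpoint of the final KKT violation interval
    return alpha, bias


def decision(x, X, y, alpha, bias, gamma=0.5):
    """Decision value sum_j alpha_j y_j K(x_j, x) + bias."""
    return sum(a * yj * rbf(xj, x, gamma) for a, yj, xj in zip(alpha, y, X)) + bias
```

Usage: with `X = [[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]]` and `y = [-1, -1, 1, 1]`, `train_svm(X, y, n_workers=3)` returns the same multipliers as `n_workers=1`. Each gradient entry is owned by exactly one worker, so the partitioning changes where the work runs, not the arithmetic.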
