P-packSVM: Parallel Primal grAdient desCent Kernel SVM

It is an extreme challenge to produce a nonlinear SVM classifier on very large scale data. In this paper we describe a novel P-packSVM algorithm that can solve the Support Vector Machine (SVM) optimization problem with an arbitrary kernel. This algorithm embraces the best known stochastic gradient descent method to optimize the primal objective, and has 1/¿ dependency in complexity to obtain a solution of optimization error ¿. The algorithm can be highly parallelized with a special packing strategy, and experiences sub-linear speed-up with hundreds of processors. We demonstrate that P-packSVM achieves accuracy sufficiently close to that of SVM-light, and overwhelms the state-of-the-art parallel SVM trainer PSVM in both accuracy and efficiency. As an illustration, our algorithm trains CCAT dataset with 800k samples in 13 minutes and 95% accuracy, while PSVM needs 5 hours but only has 92% accuracy. We at last demonstrate the capability of P-packSVM on 8 million training samples.

[1]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[2]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[3]  Nathan Srebro,et al.  SVM optimization: inverse dependence on training set size , 2008, ICML '08.

[4]  I. C. Mogotsi,et al.  Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to information retrieval , 2010, Information Retrieval.

[5]  G. Amdhal,et al.  Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).

[6]  Chen Xiu-hong Improved algorithm for support vector machines , 2009 .

[7]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[8]  Tong Zhang,et al.  Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.

[10]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[11]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[12]  Shai Shalev-Shwartz,et al.  Online learning: theory, algorithms and applications (למידה מקוונת.) , 2007 .

[13]  Dustin Boswell,et al.  Introduction to Support Vector Machines , 2002 .

[14]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[15]  Thorsten Joachims,et al.  Learning to classify text using support vector machines - methods, theory and algorithms , 2002, The Kluwer international series in engineering and computer science.

[16]  Luca Zanni,et al.  A parallel solver for large quadratic programs in training support vector machines , 2003, Parallel Computing.

[17]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[18]  Edward Y. Chang,et al.  Parallelizing Support Vector Machines on Distributed Computers , 2007, NIPS.

[19]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[20]  M. Aizerman,et al.  Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning , 1964 .

[21]  Sanjay Mehrotra,et al.  On the Implementation of a Primal-Dual Interior Point Method , 1992, SIAM J. Optim..

[22]  Alexander J. Smola,et al.  Online learning with kernels , 2001, IEEE Transactions on Signal Processing.

[23]  Katya Scheinberg,et al.  Efficient SVM Training Using Low-Rank Kernel Representations , 2002, J. Mach. Learn. Res..

[24]  S. Sathiya Keerthi,et al.  Parallel sequential minimal optimization for the training of support vector machines , 2006, IEEE Trans. Neural Networks.

[25]  S. Canu,et al.  Training Invariant Support Vector Machines using Selective Sampling , 2005 .

[26]  Stephen J. Wright Primal-Dual Interior-Point Methods , 1997, Other Titles in Applied Mathematics.

[27]  Hao Wang,et al.  PSVM : Parallelizing Support Vector Machines on Distributed Computers , 2007 .

[28]  Don R. Hush,et al.  QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines , 2006, J. Mach. Learn. Res..

[29]  Sham M. Kakade,et al.  Mind the Duality Gap: Logarithmic regret algorithms for online optimization , 2008, NIPS.