Fast Parallel SVM using Data Augmentation

As one of the most popular classifiers, linear SVMs still have challenges in dealing with very large-scale problems, even though linear or sub-linear algorithms have been developed recently on single machines. Parallel computing methods have been developed for learning large-scale SVMs. However, existing methods rely on solving local sub-optimization problems. In this paper, we develop a novel parallel algorithm for learning large-scale linear SVM. Our approach is based on a data augmentation equivalent formulation, which casts the problem of learning SVM as a Bayesian inference problem, for which we can develop very efficient parallel sampling methods. We provide empirical results for this parallel sampling SVM, and provide extensions for SVR, non-linear kernels, and provide a parallel implementation of the Crammer and Singer model. This approach is very promising in its own right, and further is a very useful technique to parallelize a broader family of general maximum-margin models.

[1]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[2]  Alexander J. Smola,et al.  An architecture for parallel topic models , 2010, Proc. VLDB Endow..

[3]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[4]  Max Welling,et al.  Distributed Algorithms for Topic Models , 2009, J. Mach. Learn. Res..

[5]  Hao Wang,et al.  PSVM : Parallelizing Support Vector Machines on Distributed Computers , 2007 .

[6]  Ning Chen,et al.  Gibbs Max-Margin Topic Models with Fast Sampling Algorithms , 2013, ICML.

[7]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[8]  Samy Bengio,et al.  A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.

[9]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[10]  Dan Roth,et al.  Selective block minimization for faster convergence of limited memory large-scale linear models , 2011, KDD.

[11]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[12]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[13]  Nicholas G. Polson,et al.  Data augmentation for support vector machines , 2011 .

[14]  Edward Y. Chang,et al.  Parallelizing Support Vector Machines on Distributed Computers , 2007, NIPS.

[15]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[16]  Jun Zhu,et al.  Online Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction , 2012, 2014 IEEE International Conference on Data Mining.

[17]  Igor Durdanovic,et al.  Parallel Support Vector Machines: The Cascade SVM , 2004, NIPS.

[18]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[19]  Alexander J. Smola,et al.  Linear support vector machines via dual cached loops , 2012, KDD.