CA-SVM : Communication-Avoiding Support Vector Machines on Clusters

We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Ω(P ), where W is the problem size and P the number of processors; this scaling is worse than even an one-dimensional block row dense matrix vector multiplication, which has W = Ω(P ). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CASVM) method that improves the isoefficiency to nearly W = Ω(P ). We evaluate these methods on 96 to 1536 processors, and show average speedups of 3 − 16× (7× on average) over Dis-SMO, and a 95% weak-scaling efficiency on six realworld datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1]. Keywords-distributed memory algorithms; communicationavoidance; statistical machine learning

[1]  Hao Wang,et al.  PSVM : Parallelizing Support Vector Machines on Distributed Computers , 2007 .

[2]  Kurt Keutzer,et al.  Fast support vector machine training and classification on graphics processors , 2008, ICML '08.

[3]  James Demmel,et al.  Perfect Strong Scaling Using No Additional Energy , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[4]  Luca Zanni,et al.  Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems , 2006, J. Mach. Learn. Res..

[5]  Edward Y. Chang,et al.  Incremental approximate matrix factorization for speeding up support vector machines , 2006, KDD '06.

[6]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[7]  Zbigniew J. Czech,et al.  Introduction to Parallel Computing , 2017 .

[8]  Jacek M. Zurada,et al.  Generalized Core Vector Machines , 2006, IEEE Transactions on Neural Networks.

[9]  F. Tay,et al.  Application of support vector machines in financial time series forecasting , 2001 .

[10]  Calton Pu,et al.  Introducing the Webb Spam Corpus: Using Email Spam to Identify Web Spam Automatically , 2006, CEAS.

[11]  Shao-Yi Chien,et al.  Support Vector Machines on GPU with Sparse Matrix Format , 2010, 2010 Ninth International Conference on Machine Learning and Applications.

[12]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[13]  Shuaiwen Song,et al.  MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[14]  Chih-Jen Lin,et al.  Working Set Selection Using Second Order Information for Training Support Vector Machines , 2005, J. Mach. Learn. Res..

[15]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[16]  Eleazar Eskin,et al.  The Spectrum Kernel: A String Kernel for SVM Protein Classification , 2001, Pacific Symposium on Biocomputing.

[17]  Jonathan J. Hull,et al.  A Database for Handwritten Text Recognition Research , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[19]  Inderjit S. Dhillon,et al.  A Divide-and-Conquer Solver for Kernel Support Vector Machines , 2013, ICML.

[20]  Steve R. Gunn,et al.  Result Analysis of the NIPS 2003 Feature Selection Challenge , 2004, NIPS.

[21]  Igor Durdanovic,et al.  Parallel Support Vector Machines: The Cascade SVM , 2004, NIPS.

[22]  E. Forgy,et al.  Cluster analysis of multivariate data : efficiency versus interpretability of classifications , 1965 .

[23]  S. Sathiya Keerthi,et al.  Parallel sequential minimal optimization for the training of support vector machines , 2006, IEEE Trans. Neural Networks.