Paper information - CA-SVM: Communication-Avoiding Support Vector Machines on Clusters

We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines (SVMs), a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Ω(P^3), where W is the problem size and P the number of processors; this scaling is worse than even a one-dimensional block-row dense matrix-vector multiplication, which has W = Ω(P^2). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CA-SVM) method that improves the isoefficiency to nearly W = Ω(P). We evaluate these methods on 96 to 1536 processors and show speedups of 3-16× (7× on average) over Dis-SMO, and a 95% weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].

Keywords: distributed memory algorithms; communication-avoidance; statistical machine learning
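To make the isoefficiency comparison in the abstract concrete, the following minimal sketch computes how much the problem size W would have to grow to hold parallel efficiency constant when moving from 96 to 1536 processors (the range evaluated above). It assumes the isoefficiency function behaves like W ∝ P^k with constants dropped, which simplifies the Ω bound. The function name `required_problem_size` and the exponents 1, 2, and 3 are illustrative choices, not taken from the paper's code.

```python
def required_problem_size(w_base: float, p_base: int, p: int, exponent: float) -> float:
    """Problem size needed at p processors to keep efficiency constant.

    Assumption: the isoefficiency function grows like W ~ P**exponent,
    with constant factors ignored (a simplification of the Omega bound).
    """
    return w_base * (p / p_base) ** exponent


if __name__ == "__main__":
    # Illustrative exponents only: linear, quadratic, and cubic isoefficiency.
    for k in (1, 2, 3):
        growth = required_problem_size(1.0, 96, 1536, k)
        print(f"exponent {k}: problem size must grow {growth:.0f}x")
```

Under these assumptions, a 16-fold increase in processor count requires a 16-fold larger problem for a linear isoefficiency function, but a 4096-fold larger problem for a cubic one, which is why lowering the exponent matters for scalability.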

James Demmel | Yang You | Richard Vuduc | Kenneth Czechowski | Le Song

[1] Hao Wang, et al. PSVM: Parallelizing Support Vector Machines on Distributed Computers, 2007.

[2] Kurt Keutzer, et al. Fast support vector machine training and classification on graphics processors, 2008, ICML '08.

[3] James Demmel, et al. Perfect Strong Scaling Using No Additional Energy, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[4] Luca Zanni, et al. Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems, 2006, J. Mach. Learn. Res.

[5] Edward Y. Chang, et al. Incremental approximate matrix factorization for speeding up support vector machines, 2006, KDD '06.

[6] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[7] Zbigniew J. Czech, et al. Introduction to Parallel Computing, 2017.

[8] Jacek M. Zurada, et al. Generalized Core Vector Machines, 2006, IEEE Transactions on Neural Networks.

[9] F. Tay, et al. Application of support vector machines in financial time series forecasting, 2001.

[10] Calton Pu, et al. Introducing the Webb Spam Corpus: Using Email Spam to Identify Web Spam Automatically, 2006, CEAS.

[11] Shao-Yi Chien, et al. Support Vector Machines on GPU with Sparse Matrix Format, 2010, 2010 Ninth International Conference on Machine Learning and Applications.

[12] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods, 1999.

[13] Shuaiwen Song, et al. MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[14] Chih-Jen Lin, et al. Working Set Selection Using Second Order Information for Training Support Vector Machines, 2005, J. Mach. Learn. Res.

[15] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[16] Eleazar Eskin, et al. The Spectrum Kernel: A String Kernel for SVM Protein Classification, 2001, Pacific Symposium on Biocomputing.

[17] Jonathan J. Hull, et al. A Database for Handwritten Text Recognition Research, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[18] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[19] Inderjit S. Dhillon, et al. A Divide-and-Conquer Solver for Kernel Support Vector Machines, 2013, ICML.

[20] Steve R. Gunn, et al. Result Analysis of the NIPS 2003 Feature Selection Challenge, 2004, NIPS.

[21] Igor Durdanovic, et al. Parallel Support Vector Machines: The Cascade SVM, 2004, NIPS.

[22] E. Forgy, et al. Cluster analysis of multivariate data: efficiency versus interpretability of classifications, 1965.

[23] S. Sathiya Keerthi, et al. Parallel sequential minimal optimization for the training of support vector machines, 2006, IEEE Trans. Neural Networks.