Paper Information - Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines

We consider the problem of how to design and implement communication-efficient versions of parallel kernel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as $W=\Omega(P^3)$, where $W$ is the problem size and $P$ the number of processors; this scaling is worse than even a one-dimensional block row dense matrix vector multiplication, which has $W=\Omega(P^2)$. This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM method that improves the isoefficiency to nearly $W=\Omega(P)$. We evaluate these methods on 96 to 1,536 processors, and show speedups of $3\times$ to $16\times$ ($7\times$ on average) over Dis-SMO, and a 95 percent weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].
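To make the isoefficiency comparison concrete, the following minimal sketch computes how much the problem size $W$ would have to grow to hold parallel efficiency constant when scaling from 96 to 1,536 processors (the range evaluated above). It assumes, purely for illustration, that each $\Omega(\cdot)$ lower bound is tight and that constant factors can be ignored; the function name `required_work_growth` is hypothetical and not taken from the paper's code.

```python
def required_work_growth(p_start: int, p_end: int, exponent: int) -> float:
    """Factor by which problem size W must grow to keep efficiency constant,
    assuming W scales exactly as P**exponent (an illustrative assumption:
    the Omega bounds are treated as tight and constants are dropped)."""
    return (p_end / p_start) ** exponent


# Processor counts taken from the evaluation range stated in the abstract.
P_START, P_END = 96, 1536

# Exponents: 3 = prior state-of-the-art implementation, 2 = 1D block-row
# dense matrix-vector multiply, 1 = the near-linear Communication-Avoiding SVM.
growth = {k: required_work_growth(P_START, P_END, k) for k in (3, 2, 1)}

for k, factor in growth.items():
    print(f"W ~ P^{k}: problem size must grow by {factor:g}x")
```

Under these assumptions, a 16-fold increase in processors requires a 4096-fold larger problem at $P^3$ scaling but only a 16-fold larger problem at linear scaling, which is why lowering the isoefficiency exponent matters for scalability.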

Le Song | James Demmel | Richard W. Vuduc | Yang You | Kenneth Czechowski

[1] References, 1971.

[2] Chih-Jen Lin, et al. Working Set Selection Using Second Order Information for Training Support Vector Machines, 2005, J. Mach. Learn. Res.

[3] Luca Zanni, et al. Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems, 2006, J. Mach. Learn. Res.

[4] Shao-Yi Chien, et al. Support Vector Machines on GPU with Sparse Matrix Format, 2010, 2010 Ninth International Conference on Machine Learning and Applications.

[5] Zbigniew J. Czech, et al. Introduction to Parallel Computing, 2017.

[6] E. Forgy, et al. Cluster analysis of multivariate data: efficiency versus interpretability of classifications, 1965.

[7] Igor Durdanovic, et al. Parallel Support Vector Machines: The Cascade SVM, 2004, NIPS.

[8] Jonathan J. Hull, et al. A Database for Handwritten Text Recognition Research, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[9] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[10] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[11] Kurt Keutzer, et al. Fast support vector machine training and classification on graphics processors, 2008, ICML '08.

[12] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[13] Ulrike von Luxburg, et al. A tutorial on spectral clustering, 2007, Stat. Comput.

[14] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15] James Demmel, et al. Perfect Strong Scaling Using No Additional Energy, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[16] Edward Y. Chang, et al. Parallelizing Support Vector Machines on Distributed Computers, 2007, NIPS.

[17] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[18] Jacek M. Zurada, et al. Generalized Core Vector Machines, 2006, IEEE Transactions on Neural Networks.

[19] Inderjit S. Dhillon, et al. A Divide-and-Conquer Solver for Kernel Support Vector Machines, 2013, ICML.

[20] Steve R. Gunn, et al. Result Analysis of the NIPS 2003 Feature Selection Challenge, 2004, NIPS.

[21] Hao Wang, et al. PSVM: Parallelizing Support Vector Machines on Distributed Computers, 2007.

[22] Le Song, et al. CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[23] Laura Schweitzer, et al. Advances In Kernel Methods Support Vector Learning, 2016.

[24] Edward Y. Chang, et al. Incremental approximate matrix factorization for speeding up support vector machines, 2006, KDD '06.

[25] Calton Pu, et al. Introducing the Webb Spam Corpus: Using Email Spam to Identify Web Spam Automatically, 2006, CEAS.

[26] Shuaiwen Song, et al. MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[27] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, advances in kernel methods, 1999.

[28] S. Sathiya Keerthi, et al. Parallel sequential minimal optimization for the training of support vector machines, 2006, IEEE Trans. Neural Networks.

[29] Wu Meng, et al. Application of Support Vector Machines in Financial Time Series Forecasting, 2007.

[30] Thierry Bertin-Mahieux, et al. The Million Song Dataset, 2011, ISMIR.

[31] Inderjit S. Dhillon, et al. Memory Efficient Kernel Approximation, 2014, ICML.

[32] Eleazar Eskin, et al. The Spectrum Kernel: A String Kernel for SVM Protein Classification, 2001, Pacific Symposium on Biocomputing.