Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines

We consider the problem of how to design and implement communication-efficient versions of parallel kernel support vector machines (SVMs), a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as $W=\Omega(P^3)$, where $W$ is the problem size and $P$ is the number of processors; this scaling is worse than even a one-dimensional block-row dense matrix-vector multiplication, which has $W=\Omega(P^2)$. This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM method that improves the isoefficiency to nearly $W=\Omega(P)$. We evaluate these methods on 96 to 1,536 processors and show speedups of $3$-$16\times$ ($7\times$ on average) over Dis-SMO, along with a 95 percent weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].
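
For context, the following is a minimal sketch of the standard isoefficiency argument for the one-dimensional block-row dense matrix-vector multiplication used as a baseline above, following the classical analysis in, e.g., [5]. The machine parameters $t_s$ (message startup latency) and $t_w$ (per-word transfer time) are assumptions of the textbook cost model, not quantities reported in this paper; $n$ is the matrix dimension, so the serial work is $W=n^2$.

\begin{align*}
T_P &= \underbrace{\tfrac{n^2}{P}}_{\text{local multiply}} \;+\; \underbrace{t_s \log P + t_w\, n}_{\text{all-gather of the input vector}}, \qquad W = n^2,\\
T_o &= P\,T_P - W \;=\; t_s\, P \log P + t_w\, n\, P,\\
W &= \Omega(T_o) \;\Longrightarrow\; n^2 = \Omega(t_w\, n\, P) \;\Longrightarrow\; n = \Omega(P) \;\Longrightarrow\; W = \Omega(P^2).
\end{align*}

In words, keeping efficiency constant for this simple kernel already requires the problem size to grow quadratically with the processor count; the $W=\Omega(P^3)$ isoefficiency of the baseline distributed SVM solver is worse still, which is why reducing communication is the focus of this work.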

[1] References, 1971.

[2] Chih-Jen Lin, et al. Working Set Selection Using Second Order Information for Training Support Vector Machines, 2005, J. Mach. Learn. Res.

[3] Luca Zanni, et al. Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems, 2006, J. Mach. Learn. Res.

[4] Shao-Yi Chien, et al. Support Vector Machines on GPU with Sparse Matrix Format, 2010, 2010 Ninth International Conference on Machine Learning and Applications.

[5] Zbigniew J. Czech, et al. Introduction to Parallel Computing, 2017.

[6] E. Forgy, et al. Cluster analysis of multivariate data: efficiency versus interpretability of classifications, 1965.

[7] Igor Durdanovic, et al. Parallel Support Vector Machines: The Cascade SVM, 2004, NIPS.

[8] Jonathan J. Hull, et al. A Database for Handwritten Text Recognition Research, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[9] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[10] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[11] Kurt Keutzer, et al. Fast support vector machine training and classification on graphics processors, 2008, ICML '08.

[12] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[13] Ulrike von Luxburg, et al. A tutorial on spectral clustering, 2007, Stat. Comput.

[14] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15] James Demmel, et al. Perfect Strong Scaling Using No Additional Energy, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[16] Edward Y. Chang, et al. Parallelizing Support Vector Machines on Distributed Computers, 2007, NIPS.

[17] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[18] Jacek M. Zurada, et al. Generalized Core Vector Machines, 2006, IEEE Transactions on Neural Networks.

[19] Inderjit S. Dhillon, et al. A Divide-and-Conquer Solver for Kernel Support Vector Machines, 2013, ICML.

[20] Steve R. Gunn, et al. Result Analysis of the NIPS 2003 Feature Selection Challenge, 2004, NIPS.

[21] Hao Wang, et al. PSVM: Parallelizing Support Vector Machines on Distributed Computers, 2007.

[22] Le Song, et al. CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[23] Laura Schweitzer, et al. Advances in Kernel Methods: Support Vector Learning, 2016.

[24] Edward Y. Chang, et al. Incremental approximate matrix factorization for speeding up support vector machines, 2006, KDD '06.

[25] Calton Pu, et al. Introducing the Webb Spam Corpus: Using Email Spam to Identify Web Spam Automatically, 2006, CEAS.

[26] Shuaiwen Song, et al. MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[27] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, 1999, Advances in Kernel Methods.

[28] S. Sathiya Keerthi, et al. Parallel sequential minimal optimization for the training of support vector machines, 2006, IEEE Trans. Neural Networks.

[29] Wu Meng, et al. Application of Support Vector Machines in Financial Time Series Forecasting, 2007.

[30] Thierry Bertin-Mahieux, et al. The Million Song Dataset, 2011, ISMIR.

[31] Inderjit S. Dhillon, et al. Memory Efficient Kernel Approximation, 2014, ICML.

[32] Eleazar Eskin, et al. The Spectrum Kernel: A String Kernel for SVM Protein Classification, 2001, Pacific Symposium on Biocomputing.