Efficient Multi-Class Probabilistic SVMs on GPUs

Recently, many researchers have been working on accelerating traditional machine learning algorithms (i.e., those other than deep learning) using high-performance hardware such as Graphics Processing Units (GPUs). The recent success of machine learning is due not only to more effective algorithms, but also to more efficient systems and implementations. In this paper, we propose a novel and efficient GPU-accelerated solution for multi-class SVMs with probabilistic output (MP-SVMs). MP-SVMs are an important technique in many pattern recognition applications. However, they are very time-consuming to use, because an MP-SVM classifier requires training many binary SVMs and estimating probabilities by combining the results of all the binary SVMs. GPUs offer much higher computational capability than CPUs and are potentially excellent hardware for accelerating MP-SVMs. Still, two key challenges stand in the way of efficient GPU acceleration of MP-SVMs: (i) many kernel values are computed repeatedly because a binary SVM classifier is trained iteratively, resulting in repeated accesses to the high-latency GPU memory; (ii) performing training or probability estimation in a highly parallel way requires a memory footprint much larger than the GPU memory. To overcome these challenges, we propose a solution called GMP-SVM, which exploits two-level optimization (at the binary SVM level and the MP-SVM level) for training MP-SVMs and high parallelism for estimating probabilities. GMP-SVM reduces high-latency memory accesses and memory consumption through batch processing, kernel value reuse and sharing, and support vector sharing. Experimental results show that GMP-SVM outperforms the GPU baseline by two to five times, and LibSVM with OpenMP by an order of magnitude. Moreover, GMP-SVM produces the same SVM classifier as LibSVM.
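To make the idea of kernel value sharing more concrete, the following is a minimal CPU-only sketch in plain Python, not the GMP-SVM implementation. It assumes a pairwise (one-vs-one) decomposition into binary SVMs, as used by LibSVM, and an RBF kernel. The names `SharedKernelRows`, `pairwise_problems`, and the toy data are hypothetical and chosen only for illustration. Each kernel row is computed once over the full training set and then sliced for every binary problem that contains the instance, so instances shared between binary problems do not trigger recomputation.

```python
import math
from itertools import combinations


def rbf(x, z, gamma):
    """RBF kernel value exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class SharedKernelRows:
    """Hypothetical cache: kernel rows over the full training set,
    shared by all binary SVM problems."""

    def __init__(self, X, gamma):
        self.X = X
        self.gamma = gamma
        self.rows = {}
        self.computed = 0  # number of rows actually computed

    def row(self, i):
        # Compute the row only on first request; reuse it afterwards.
        if i not in self.rows:
            self.rows[i] = [rbf(self.X[i], z, self.gamma) for z in self.X]
            self.computed += 1
        return self.rows[i]


def pairwise_problems(labels):
    """Map each class pair (a, b) to the indices of its training instances
    (assumes a one-vs-one decomposition)."""
    classes = sorted(set(labels))
    return {
        (a, b): [i for i, y in enumerate(labels) if y in (a, b)]
        for a, b in combinations(classes, 2)
    }


# Toy data: 3 classes with 2 instances each, giving 3 binary problems.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
y = [0, 0, 1, 1, 2, 2]

cache = SharedKernelRows(X, gamma=0.5)
problems = pairwise_problems(y)

requests = 0
for pair, idx in problems.items():
    for i in idx:
        full_row = cache.row(i)
        # Kernel values restricted to the instances of this binary problem.
        sub_row = [full_row[j] for j in idx]
        requests += 1

print(f"row requests: {requests}, rows computed: {cache.computed}")
```

In this toy setting every instance belongs to two of the three binary problems, so half of the row requests are served from the shared rows. A GPU implementation would additionally have to bound the memory used by such a cache, which this sketch does not attempt.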
